
Toronto, Canada · Open to opportunities
Bipina Poudel
Senior Data Engineer · CAPM®
Senior Data Engineer with 7+ years of experience building cloud-native data platforms that process multi-terabyte enterprise datasets across AI analytics, telecom, and healthcare. Expertise in AWS, Azure Databricks, Snowflake, Apache Spark (PySpark, Spark Streaming), Kafka, Airflow, and Terraform. Proven track record of optimizing ETL/ELT pipelines, enabling real-time analytics, and implementing scalable Lakehouse architectures for enterprise reporting and AI/ML workloads.
Experience
Where I've worked and what I've built
Senior Data Engineer
Skill Squirrel · Toronto, Ontario
Designed and developed scalable ETL/ELT pipelines using Azure Databricks, PySpark, Azure Data Factory, and AWS Glue, processing 5TB+ of workforce analytics data daily to support enterprise AI and workforce intelligence.
- ▸Built an enterprise Lakehouse architecture using Azure Data Lake Storage Gen2, AWS S3, Delta Lake, and Snowflake, following the Medallion Architecture pattern
- ▸Developed Kafka and Spark Structured Streaming pipelines processing 1M+ candidate activity events daily for real-time workforce intelligence analytics
- ▸Implemented Apache Airflow DAGs and Delta Live Tables (DLT) workflows, improving pipeline reliability and reducing manual intervention by 40%
- ▸Optimized Spark workloads using partitioning, caching, AQE, and broadcast joins, reducing ETL processing time by 45%
- ▸Developed reusable PySpark and dbt transformation frameworks supporting CDC-based incremental ingestion and Slowly Changing Dimension (SCD) Type 2 implementations
- ▸Integrated Great Expectations and data observability frameworks, improving enterprise data quality to 99.5% accuracy
- ▸Built CI/CD pipelines using Terraform, Azure DevOps, Jenkins, and GitHub Actions, automating deployment and infrastructure provisioning
- ▸Collaborated with AI/ML teams to build feature-engineered datasets and MLflow-integrated pipelines supporting predictive hiring models
Environment: AWS (S3, Glue, Lambda), Azure Databricks, PySpark, Spark SQL, Delta Lake, Snowflake, Kafka, Airflow, Azure Data Factory, dbt, Terraform, Python, SQL, Docker, Jenkins, GitHub Actions, CI/CD
Data Engineer
T-Mobile · Bellevue, WA
Developed scalable PySpark and Spark SQL pipelines processing 10TB+ of telecom customer, billing, subscriber, and network analytics data daily, supporting Customer 360 analytics for 50M+ subscribers.
- ▸Built large-scale streaming ingestion frameworks using AWS Glue, Kafka, Spark Structured Streaming, and Azure Event Hubs, supporting near real-time telecom analytics
- ▸Designed a cloud-native Lakehouse architecture using AWS S3, Delta Lake, Snowflake, and Azure Databricks for Customer 360 analytics
- ▸Developed Kafka-based real-time event processing pipelines, reducing network alert processing latency from 2 hours to under 15 minutes
- ▸Implemented Apache Airflow and Azure Data Factory workflows orchestrating 500+ enterprise ETL jobs with 99.9% SLA compliance
- ▸Used AWS EMR clusters to process large-scale telecom network and operational datasets across multi-region environments
- ▸Optimized Snowflake and Spark performance using clustering, partitioning, caching, and query tuning, improving analytics query performance by 60%
- ▸Automated infrastructure provisioning and CI/CD deployment pipelines using Terraform, Jenkins, Docker, and GitHub Actions, reducing deployment effort by 70%
Environment: AWS (Glue, EMR, S3, Lambda), Azure Databricks, PySpark, Spark SQL, Snowflake, Kafka, Spark Streaming, Airflow, Azure Data Factory, dbt, Terraform, Python, SQL, Docker, Jenkins, CI/CD
Data Engineer
Cedar Gate Technologies · Greenwich, CT
Developed ETL/ELT pipelines using AWS Glue, Databricks, PySpark, and Azure Data Factory, processing 100M+ healthcare records in a HIPAA-compliant environment.
- ▸Built an enterprise Lakehouse architecture using AWS S3, ADLS Gen2, Delta Lake, and Snowflake, following the Medallion Architecture pattern
- ▸Automated healthcare ETL workflows using Apache Airflow, improving SLA compliance to 99.9%
- ▸Designed dimensional models and CDC pipelines for enterprise claims, provider, and member reporting
- ▸Developed Kafka streaming pipelines for real-time healthcare event processing
- ▸Implemented IAM security policies and HIPAA-compliant governance controls
- ▸Optimized Spark and Snowflake workloads, reducing ETL failures by 35%
- ▸Automated deployment and monitoring using Terraform, Jenkins, and CloudWatch
Environment: AWS (Glue, S3, Redshift, IAM), Azure Databricks, PySpark, Spark SQL, Delta Lake, Snowflake, Kafka, Airflow, Terraform, Python, SQL Server, CI/CD
Skills & Technologies
Tools I work with day to day
Programming Languages
Big Data Technologies
Cloud Platforms
Data Warehousing & Databases
ETL & Orchestration
Streaming Technologies
DevOps & Infrastructure
Data Governance & Observability
Visualization & Reporting
Projects
Things I've built and problems I've solved
Workforce Analytics Lakehouse (Medallion Architecture)
Built an enterprise Lakehouse at Skill Squirrel processing 5TB+ of workforce data daily using Azure Databricks, Delta Lake, and Snowflake. Integrated Great Expectations to reach 99.5% data quality accuracy, and built MLflow pipelines supporting AI/ML hiring models.
Telecom Customer 360 Real-Time Platform
Designed a cloud-native Lakehouse for T-Mobile supporting Customer 360 analytics for 50M+ subscribers. Reduced network alert processing latency from 2 hours to under 15 minutes using Kafka and Spark Structured Streaming.
Healthcare Data Platform (Cedar Gate Technologies)
Developed ETL/ELT pipelines processing 100M+ healthcare records using AWS Glue, Databricks, and PySpark. Implemented a HIPAA-compliant Lakehouse with Delta Lake and Snowflake, and optimized Spark and Snowflake workloads, reducing ETL failures by 35%.
Real-time Workforce Intelligence Streaming
Developed Kafka and Spark Structured Streaming pipelines at Skill Squirrel processing 1M+ candidate activity events daily. Implemented Delta Live Tables workflows, reducing manual intervention by 40%.
CI/CD & Infrastructure Automation
Built automated CI/CD deployment pipelines using Terraform, Jenkins, Docker, and GitHub Actions across multiple enterprise environments, reducing deployment effort by 70%.
Data Quality & Observability Framework
Integrated Great Expectations, Prometheus, Grafana, and Data Observability frameworks for end-to-end data quality monitoring, improving enterprise data accuracy to 99.5% across multi-cloud environments.
More on github.com/bipinapoudel
Certifications
Professional credentials and qualifications
AWS Certified Data Engineer – Associate
Amazon Web Services
Microsoft Certified: Azure Databricks Data Engineer Associate
Microsoft
Certified Associate in Project Management (CAPM)®
Project Management Institute
Education
Academic background
P.G. in Project Management – IT
Seneca College
P.G. in Cyber Security and Threat Management
Seneca College
B.S. in Computing (Honours) – Information Technology
Leeds Beckett University
Get in Touch
Have a project or opportunity in mind? Let's talk.
I'm open to data engineering roles, freelance projects, and collaborations. Whether it's a pipeline problem or a full platform build, I'd love to hear from you.