Lead technical solution discovery for new capabilities and functionality.
Assist the Product Owner with technical user stories to maintain a healthy feature backlog.
Lead the development of real-time data pipelines using AWS DMS, MSK/Kafka, or Glue Streaming for CDC ingestion from multiple SQL Server sources (RDS and on-premises).
Build and optimize streaming and batch data pipelines using AWS Glue (PySpark) to validate, transform, and normalize data into Iceberg tables and DynamoDB.
Define and enforce data quality, lineage, and reconciliation logic with support for both streaming and batch use cases.
Integrate with S3 Bronze/Silver layers and implement efficient schema evolution and partitioning strategies using Iceberg.
Collaborate with architects, analysts, and downstream application teams to design API and file-based egress layers.
Implement monitoring, logging, and event-based alerting using CloudWatch, SNS, and EventBridge.
Mentor junior developers and enforce best practices for modular, secure, and scalable data pipeline development.
Requirements
6+ years of hands-on, expert-level data engineering experience in cloud-based environments (AWS preferred), including event-driven implementations
Strong experience with Apache Kafka / AWS MSK including topic design, partitioning, and Kafka Connect/Debezium
Proficiency in AWS Glue (PySpark) for both batch and streaming ETL
Working knowledge of AWS DMS, S3, Lake Formation, DynamoDB, and Iceberg
Solid grasp of schema evolution, CDC patterns, and data reconciliation frameworks
Experience with infrastructure-as-code (CDK/Terraform) and DevOps practices (CI/CD, Git)