Design, build, and maintain scalable data ingestion pipelines supporting a wide range of source systems (APIs, databases, streaming platforms, third-party data providers)
Develop batch and real-time ingestion frameworks capable of handling high data volumes and low-latency requirements
Build fault-tolerant and resilient pipelines with strong guarantees around data integrity, idempotency, and recovery
Establish and enhance observability, monitoring, and alerting for ingestion pipelines (latency, throughput, failures, data freshness)
Implement and enforce data quality checks and validation frameworks at ingestion points
Contribute to the development of Data Ingestion as a platform product, including reusable frameworks, standards, and best practices
Leverage AI-assisted development tools and automation to accelerate pipeline development, improve code quality, and enhance operational efficiency (e.g., code generation, testing, anomaly detection, workflow automation)
Partner with upstream system owners and downstream consumers to define data contracts, SLAs, and schema evolution strategies
Optimize pipelines for performance, cost, and efficiency across compute and storage layers
Use Infrastructure as Code (IaC) to provision and manage ingestion infrastructure in a consistent and scalable manner
Collaborate with platform, analytics, and data science teams to ensure seamless data availability and usability
Requirements
3+ years of experience in data engineering
Proven experience building high-volume data pipelines (batch and/or streaming) in cloud environments