Lead the design and architecture of end-to-end data pipelines and solutions on modern cloud-based platforms, including Snowflake, Databricks, and AWS.
Build and optimize robust, scalable data orchestration workflows using Apache Airflow, and establish best practices across multiple agile squads.
Design and implement data solutions using PostgreSQL for relational data and MongoDB for NoSQL requirements, ensuring optimal performance and scalability.
Architect and deploy containerized data applications using Docker, Kubernetes, and AWS EKS, incorporating GitHub Actions for automated deployments.
Design and implement CI/CD pipelines using GitHub Actions, establish branching strategies, and ensure automated testing, code quality checks, and security scanning.
Collaborate with cross-functional teams (including Data Scientists, Analytics teams, and business stakeholders) to translate requirements into scalable technical solutions.
Mentor and guide data engineers by promoting technical excellence, establishing coding standards, and conducting architecture reviews.
Drive data platform modernization initiatives and ensure data quality, reliability, and governance across all data systems.
Design and implement AI-enhanced data pipelines that leverage LLMs and Agentic AI frameworks to automate data quality checks, anomaly detection, and intelligent data transformation workflows.
Architect data infrastructure to support AI/ML workloads, including feature stores, vector databases, and real-time inference pipelines integrated with cloud-native services.
Apply established standards and best practices to integrate AI agents into data engineering workflows, including the Model Context Protocol (MCP) for seamless AI-to-data-platform communication.
Requirements
You have 8+ years of data engineering experience, including 3+ years in a lead role architecting large-scale data platforms.
You possess expert-level proficiency in Python and Java for building cloud-native data processing solutions.
You have deep hands-on experience with Apache Airflow, Snowflake (data warehousing, modeling, optimization), and Databricks.
You have strong AWS expertise, including S3, Lambda, Glue, EMR, Kinesis, EKS, and RDS.
You have production database experience with PostgreSQL (design, optimization, replication) and MongoDB (document modeling, sharding, replica sets).
You have solid experience with containerization and orchestration using Docker, Kubernetes, and AWS EKS, including cluster management and autoscaling.
You have proven CI/CD and GitOps experience using GitHub, GitHub Actions, and ArgoCD for automated deployments and multi-environment management.
You are proficient with agile tools such as JIRA for sprint management and Confluence for technical documentation and knowledge sharing.
You have excellent analytical, problem-solving, and communication skills, with the ability to explain complex concepts to non-technical stakeholders and drive initiatives in complex environments.
You have working knowledge of AI/ML frameworks (LangChain, LlamaIndex, AutoGen, etc.) and understand how Agentic AI can enhance data engineering workflows through automated data validation, intelligent orchestration, and self-healing pipelines.
You have a practical understanding of AI integration patterns in data platforms, including prompt engineering, RAG architectures, and vector database implementations.
You are familiar with Model Context Protocol (MCP) or similar frameworks for enabling AI agents to interact securely and efficiently with data sources, APIs, and tools.
You have experience with AI-powered development tools such as GitHub Copilot and Amazon Q.
Tech Stack
Airflow
Apache
AWS
Cloud
Docker
Java
Kubernetes
MongoDB
NoSQL
Postgres
Python
Benefits
Hybrid Work Environment: On-site presence required two days per week.
A Culture of Learning & Mobility: Access to dedicated training, leadership development, and mentorship programs to support continuous learning.
Investing in Your Future: Retirement planning and tuition reimbursement programs to help you meet your short- and long-term goals.
Promoting Health & Wellbeing: Comprehensive healthcare offerings that support physical, mental, financial, social, and occupational wellbeing.
Supportive Parenting Policies: Family-friendly policies, including a generous global parental leave plan, designed to help you balance work and family life.
Inclusive Work Environment: A collaborative workplace where all voices are valued, supported by Employee Resource Groups that unite and empower colleagues worldwide.
Dedication to Giving Back: Paid volunteer days, matched donation programs, and ample opportunities to volunteer in your community.