Design, develop, and maintain scalable data pipelines and architectures across Azure, AWS, and/or GCP using tools like Azure Data Lake Storage, Azure Data Factory, Databricks, Apache Spark/Scala, and equivalent multi-cloud services
Manage and optimize a growing cloud-based data ecosystem, ensuring the reliability and performance of corporate data lakes and analytics data marts
Collaborate with cross-functional teams to define and implement data strategy, including data acquisition, ingestion, and integration from diverse sources and formats
Develop and deploy custom data pipelines that support machine learning models, business intelligence dashboards, and new data products
Implement data quality assurance practices using automated validation frameworks such as dbt or Great Expectations
Establish metadata management standards, including data lineage and stewardship for high-value datasets
Identify and implement automation opportunities leveraging AI capabilities and advanced data modeling to enhance data workflows and business outcomes
Enforce strong development standards through code reviews, testing, and monitoring
Stay current with emerging trends in data engineering, AI, and cloud technologies to continuously improve data solutions and align with Marsh’s data strategy
Evangelize best practices in data engineering and AI adoption across the organization
Requirements
BE / B.Tech / MCA / BSc (IT)
6+ years of experience
Advanced proficiency in SQL and Python
Hands-on experience with cloud data platforms, preferably multi-cloud (Azure, AWS, GCP)
Strong knowledge of data pipeline frameworks and big data technologies such as Apache Spark and Databricks
Familiarity with AI/ML concepts and practical applications in data engineering
Experience with data quality frameworks and metadata management
Ability to work independently and collaboratively in a fast-paced environment
Professional certifications in Databricks, Microsoft Azure, AWS, GCP, or similar platforms
Experience with automation and AI-driven data engineering tools
Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes)
Exposure to modern data governance and security practices