Role Overview

Develop AI-driven systems to improve data capabilities, ensuring compliance with industry best practices
Implement efficient Retrieval-Augmented Generation (RAG) architectures and integrate with enterprise data infrastructure
Collaborate with cross-functional teams to integrate solutions into operational processes and systems supporting various functions
Stay up to date with industry advancements in AI and apply modern technologies and methodologies to our systems
Design, build, and maintain scalable and robust real-time data streaming pipelines using technologies such as GCP, Vertex AI, S3, AWS Bedrock, Spark Streaming, or similar
Develop data domains and data products for various consumption archetypes, including Reporting, Data Science, AI/ML, and Analytics
Ensure the reliability, availability, and scalability of data pipelines and systems through effective monitoring, alerting, and incident management
Implement best practices in reliability engineering, including redundancy, fault tolerance, and disaster recovery strategies
Collaborate closely with DevOps and infrastructure teams to ensure seamless deployment, operation, and maintenance of data systems
Mentor junior team members and engage in communities of practice to deliver high-quality data and AI solutions while promoting best practices, standards, and adoption of reusable patterns
Apply AI solutions to insurance-specific data use cases and challenges
Partner with architects and stakeholders to influence and implement the vision of the AI and data pipelines while safeguarding the integrity and scalability of the environment

Requirements

8+ years of strong hands-on programming experience in Python
7+ years of strong hands-on data engineering experience, including data solutions, SQL and NoSQL, Snowflake, ETL/ELT tools, CI/CD, big data, cloud technologies (AWS/Google Cloud/Azure), and Python/Spark
3+ years of data engineering experience focused on supporting Generative AI technologies
2+ years of strong hands-on experience implementing production-ready, enterprise-grade GenAI data solutions
3+ years of experience implementing Retrieval-Augmented Generation (RAG) pipelines, integrating retrieval mechanisms with language models
3+ years of experience with vector databases and graph databases, including implementation and optimization
3+ years of experience processing and leveraging unstructured data for GenAI applications
3+ years of experience implementing scalable AI-driven data systems supporting agentic solutions (AWS Lambda, S3, EC2, LangChain, LangGraph)
3+ years of experience building AI pipelines that bring together structured, semi-structured, and unstructured data, including pre-processing with extraction, chunking, embedding, and grounding strategies, semantic modeling, and preparing data for models and agentic solutions

Tech Stack

AWS
Azure
Cloud
EC2
ETL
Google Cloud Platform
NoSQL
Python
Spark
SQL

Benefits

Other rewards may include short-term or annual bonuses
Long-term incentives
On-the-spot recognition

Lead Data Engineer – Generative AI
