CVS Health is dedicated to building a more connected and compassionate health experience. In this role, you will lead the evolution of data infrastructure, developing critical data self-service platforms and modernizing data operations to empower data owners in managing their data quality.
Responsibilities:
- Architect Petabyte Pipelines: Engineer scalable, reliable, and performant data pipelines to assemble large and intricate datasets using SQL, DBT, and Snowflake, ensuring high data availability and integrity
- Build Data Platforms: Independently design and maintain internal React (TypeScript) interfaces and Python backend services that automate data ingestion and discovery, reducing lead times for application teams from weeks to minutes
- Develop Data APIs: Build and maintain production-grade REST and gRPC APIs that serve as the high-performance interface between our Snowflake data layer and downstream consumer touchpoints
- Modernize Data Operations: Implement a GitOps model for data using GitHub Actions and Argo/Kargo, integrating standardized logging, alerting, and automated observability into the heart of all data products
- Innovate with AI: Leverage Cursor AI, MCPs, and other AI tooling to accelerate the data engineering SDLC, from optimizing complex SQL queries to automating schema migrations
- Collaborate and Lead: Communicate with business leaders to translate complex data requirements into functional specifications while mentoring other engineers in modern data architecture and software best practices
- Data Architecture: Design and optimize high-volume ETL/ELT pipelines using SQL, DBT, and Snowflake, ensuring data is modeled for both analytical and operational use cases
- Internal Tooling (Full Stack): Develop and maintain internal-facing web applications using React that allow data owners to interact with, monitor, and configure their data pipelines
- API Development: Architect and implement REST and gRPC APIs in Python that serve as the interface between our Snowflake data layer and downstream consumer applications
- CI/CD & GitOps: Own the deployment lifecycle of data services and tools using GitHub Actions for CI and Argo/Kargo for continuous delivery and lifecycle management
- Self-Service Platforms: Build "Data-as-a-Service" features, such as automated UI-driven ingestion workflows, reducing the reliance on manual data engineering tickets
- AI Integration: Utilize modern AI development tools (e.g., Claude AI) to accelerate the development of both data pipelines and management interfaces
Requirements:
- 7+ years of experience in Data Engineering with a heavy focus on Python as the primary scripting and backend language
- 7+ years of experience with SQL and cloud data warehouses or platforms (e.g., Snowflake, AWS, GCP)
- 7+ years of experience building high-volume ETL/ELT pipelines and data modeling
- Bachelor's Degree in Computer Science, Data Engineering, or a related technical field
- 5+ years of experience with DBT (data build tool)
- 5+ years of experience building frontend applications with React and designing RESTful APIs
- 5+ years of experience with GitHub Actions and GitOps-based deployment tools (e.g., Argo or Kargo)
- Big Data Architecture: High-level understanding of big data design patterns and technologies, including Data Lake, Data Mesh, and Apache Iceberg, along with data normalization strategies
- GitOps & Deployment: Demonstrated experience with Argo/Kargo for Kubernetes-based deployments and advanced GitHub Actions for workflow automation
- Messaging & Streaming: Experience with message queuing technologies such as Kafka, SNS, or RabbitMQ to support real-time data movement
- AI-Enhanced Development: Proficiency in working with Cursor AI, GitHub Copilot, or similar AI-driven environments to accelerate engineering cycles
- Observability: Strong experience with metrics, logging, monitoring, and alerting tools to ensure production system reliability
- Software Fundamentals: Strong grasp of data structures, algorithms, async programming patterns, and parallel programming
- Healthcare Domain: High-level understanding of HL7 v2.x or FHIR-based interface messages