CVS Health is seeking a highly skilled Observability Data Infrastructure Engineer to join our Enterprise Observability and Security Engineering team. This role is responsible for building, scaling, and operationalizing an enterprise Observability Lakehouse that enables threat detection, incident response, and platform visibility across hybrid and multi-cloud environments.
Responsibilities:
- Design, build, and operate high-volume log, metric, and trace pipelines using Databricks, cloud data lakes, and distributed processing engines
- Architect and evolve an Observability Lakehouse aligned with OpenTelemetry (OTEL) data models and standards
- Implement ingestion and transformation workflows using pipeline technologies such as Cribl or Vector, automated through Jenkins, GitHub Actions, or equivalent tools
- Normalize, model, and enrich telemetry data to support detection engineering, forensics, and operational analytics
- Develop scalable ETL/ELT frameworks, Delta Lake architectures, and automated data quality validation for unstructured and semi-structured data
- Partner with Security Engineering, SRE, Cloud, and SOC teams to improve enterprise visibility and detection accuracy
- Build and maintain CI/CD pipelines and reusable Infrastructure-as-Code (IaC) patterns for observability platform deployment
- Identify and resolve performance, latency, cost, and reliability issues across telemetry pipelines
- Contribute to engineering standards, documentation, and knowledge sharing across observability and security platforms
Requirements:
- 7+ years of experience building and operating log, metric, and trace pipelines in Data, Security Data, or Observability Engineering roles
- 5+ years of hands-on experience with Databricks, Apache Spark, or other large-scale distributed data platforms
- 5+ years of experience working across cloud platforms (AWS, Azure, or GCP), including storage, compute, and event-driven services
- 5+ years of production experience using SQL and Python in data-intensive environments
- 3+ years of experience with enterprise observability platforms (Splunk, Datadog, Elastic, or equivalent)
- 3+ years of experience with high-throughput ingestion and streaming technologies such as Cribl, Vector, or Kafka
- 3+ years of experience designing telemetry systems aligned with OpenTelemetry (OTEL) or similar standards
- Background supporting SIEM/SOAR platforms, detection engineering, or threat analytics
- Familiarity with Delta Lake, Unity Catalog, metadata management, and data lineage
- Understanding of security governance, auditing, access controls, and sensitive data handling
- Hands-on experience with Infrastructure as Code (Terraform, ARM/Bicep, CloudFormation)
- Familiarity with cloud-native compute and orchestration services (Azure Functions, AWS Lambda, GCP Cloud Functions, Kubernetes)
- Strong communication skills with the ability to engage both engineering teams and senior stakeholders
- Demonstrated passion for observability, security, reliability, and continuous learning