Teladoc Health is leading the evolution of virtual care, empowering employees to help millions live healthier lives. The Principal DevOps Platform Engineer will accelerate platform delivery by combining software engineering skills with platform expertise, working collaboratively across teams to improve delivery speed and reliability.
Responsibilities:
- Act as a technical “force multiplier” on the highest-priority initiatives; clarify approach, resolve ambiguity, and drive work to completion with high quality and pragmatic trade-offs
- Reduce cross-team friction by defining clear interfaces, breaking work into deliverable increments, and enabling parallelization through strong architecture boundaries
- Establish and model best practices for engineering excellence: design docs/RFCs, architecture reviews, code review discipline, and effective automated testing strategies
- Drive API-first and “platform as a product” behaviors: define and promote consistent platform interfaces that reduce bespoke integrations and siloed solutions
- Create reusable platform capabilities (templates/modules/golden paths) that reduce reinvention and speed up delivery for teams
- Drive automation opportunities (including agentic/AI-enabled workflows) that improve operational and delivery efficiency
- Lead cross-cutting improvements that enhance stability and reduce toil: observability standards, alert hygiene, incident learning loops, and resilience patterns
- Partner with operations and platform stakeholders to measurably improve reliability outcomes and reduce operational drag on platform delivery teams
- Coach senior/staff engineers by pairing on real work, running reviews, and teaching pragmatic system-level thinking
- Set clear examples of technical leadership, collaboration, and accountability without formal people management responsibility
- Participate in the on-call rotation and contribute to restoration, root cause learning, and prevention
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- 15+ years of hands-on software engineering experience designing, building, testing, deploying, and operating large-scale distributed systems in cloud-native environments
- 5+ years operating at Staff or Principal scope, leading multi-quarter, cross-team technical initiatives that span 3+ teams and deliver organization-level outcomes
- 8+ years of experience designing and operating microservices-based systems, including API design and versioning, authentication and authorization frameworks (e.g. OAuth, OIDC, IAM), and Infrastructure-as-Code (e.g. Terraform, CloudFormation, ARM)
- Deep hands-on experience (5+ years) in at least three of the following: Kubernetes and container orchestration platforms; public cloud infrastructure (AWS/Azure/GCP); CI/CD systems and deployment automation; Infrastructure-as-Code and configuration management; production operations, reliability tooling, and on-call systems
- Demonstrated ownership of production systems supporting business-critical workloads, including participation in incident response, post-incident reviews, and reliability improvements at scale
- Proven ability to operate as a self-directed technical leader, navigating ambiguity, defining problem spaces, and driving clarity and alignment across multiple teams
- Demonstrated success influencing technical direction across globally distributed teams and multiple levels of the organization without formal authority
- Strong written and verbal communication skills, with the ability to translate complex technical concepts for engineering, product and executive audiences
- Experience designing or evolving internal platforms or self-service capabilities that materially improve developer experience, delivery throughput, or operational efficiency
- Strong background in observability (metrics, logs, traces), incident management, and reliability practices, with a track record of improving system health and reducing operational toil
- Deep understanding of performance optimization, system resilience, and observability in high-scale production environments
- Experience working in regulated industries such as healthcare or fintech, including familiarity with compliance-driven architectural and security considerations
- Familiarity with healthcare data standards (e.g. FHIR, HL7) and platform security best practices