The Hartford is an insurance company committed to making a difference and helping others achieve their goals. They are seeking a Cloud Reliability Engineering Lead to drive the reliability, scalability, and performance of API Hosting Platforms across multiple cloud providers, while building a team to ensure secure and continuously available Cloud API Platforms.

Responsibilities:

Lead the design and implementation of reliability strategies across the API hosting platform, including availability, performance, capacity planning, and operational readiness
Define and enforce reliability standards, SLIs/SLOs, and error budgets for platform services and customer-facing APIs
Oversee incident management, ensuring strong triage, root-cause analysis, and preventive action development for platform issues
Drive automation to reduce manual operations, improve deployment safety, and strengthen platform security baselines
Establish and maintain robust observability practices, including logging, metrics, tracing, and synthetic monitoring
Build and lead a team of reliability engineers, providing mentorship, coaching, and technical direction
Work with application owners to prioritize reliability-focused backlog items and improve platform health over time
Identify and implement cost-saving opportunities
Serve as a subject‑matter expert for reliability engineering best practices across the organization
Collaborate with security teams to ensure platform compliance with enterprise security standards
Integrate security practices into CI/CD workflows and platform architecture
Participate in risk assessments, audits, and compliance reviews for API platform services
Advocate for modern reliability practices (e.g., chaos engineering, resilience testing, auto‑remediation)
Evaluate and introduce new technologies, tooling, and methodologies to keep platform operations modern and efficient
Monitor industry trends and translate them into actionable platform improvements

Requirements:

8+ years of technical experience in engineering, platform management, and operations roles, with a demonstrated track record of technical innovation and experience leading technically diverse teams
Strong cloud engineering mindset, with experience across public cloud providers and the technologies most frequently used to engineer and manage highly reliable, automated technology environments
Strong experience with API management or hosting platforms (Apigee, AWS API Gateway)
Expertise with cloud-native technologies (Kubernetes, containers, distributed systems)
Deep knowledge of performance and observability tools such as Dynatrace, Splunk, CloudWatch, and CloudTrail
Proven track record leading engineering teams or technical initiatives
Strong understanding of CI/CD, release automation, and DevOps tooling
Excellent communication, stakeholder management, and problem‑solving skills
Knowledge of networking fundamentals, API security, and Zero Trust principles
Experience with incident command roles in major incident processes
Strong knowledge of and experience with cloud product management, cloud engineering, and Agile principles
Strong experience with automation tools such as Ansible and Terraform
Exceptional critical thinking and problem-solving skills
Able to influence diverse teams and build strong business relationships

API Management Platform - Cloud Reliability Engineering Lead
