Gifthealth is revolutionizing the way people experience healthcare by simplifying the process of managing prescriptions and health services. They are seeking a Lead Site Reliability Engineer to build reliable, scalable software systems and support DevOps practices that enhance application performance and resilience.

Responsibilities:

Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
Embeds reliability, performance, and operational best practices into application code and development workflows
Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
Leads incident response, debugging, and root cause analysis across application and platform layers
Implements and evolves observability (logging, metrics, tracing) within application and service code
Partners with engineering teams on architecture, capacity planning, and technical standards

Requirements:

Bachelor's degree in computer science, engineering, or related field OR equivalent professional experience in software engineering, SRE, or DevOps roles
5+ years of experience in software engineering, SRE, or DevOps roles
Hands-on experience building and operating Ruby on Rails applications in production
Experience owning production incidents and application-level reliability
Knowledge of Ruby on Rails application architecture and production operations; software reliability engineering principles (SLOs, SLIs, error budgets); and modern DevOps and CI/CD practices
Strong software engineering skills (Ruby and/or comparable backend languages)
Skills in debugging and performance optimization of production applications
Skills in CI/CD pipelines, deployment automation, and release tooling
Skills with monitoring and observability tooling (Datadog, New Relic, Prometheus, etc.)
Ability to write production-quality code that improves system reliability
Ability to collaborate with product and engineering teams to influence design decisions
Ability to troubleshoot complex, cross-system failures
Cloud platform certifications (AWS, GCP, Azure)
SRE or DevOps-focused certifications
Experience in high-growth or scaling engineering organizations
Experience working in regulated or customer-impact–sensitive environments
Knowledge of security and compliance considerations in production systems
Skills in Infrastructure as Code (Terraform or similar)
Skills in containerization and orchestration (Docker)
Ability to mentor engineers on operational ownership and reliability practices
Ability to balance speed of delivery with long-term system health
