Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. They are seeking a Staff Site Reliability Engineer to lead AI enablement across their engineering organization, focusing on integrating AI tools into the software development lifecycle and ensuring the reliability and security of AI-generated outputs.

Responsibilities:

Define and own the standards and best practices for AI-assisted development across the engineering organization, from tool selection to workflow integration
Evaluate, build, or adopt AI-powered tools that improve code quality, catch vulnerabilities earlier in the development process, and reduce review cycle times, whether by evolving internal solutions or by identifying and integrating third-party platforms
Partner with engineering teams to understand what's impacting their AI tool adoption, guide them through improvements, and lead org-wide enablement efforts such as lunch-and-learns, workshops, and documentation
Establish metrics and feedback loops to quantify the impact of AI tooling on developer productivity, code quality, and delivery speed
Contribute to the design and scaling of production environments using AWS and Terraform when on rotation or as needs arise
Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization
Participate in a low-volume on-call rotation

Requirements:

8–10+ years of experience in SRE, DevOps, Cloud Engineering, Platform Engineering, or Software Development roles
Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, or similar
Experience building AI/LLM-powered developer tools or integrations
Demonstrated ability to drive org-wide tooling adoption, including change management, training, and measuring outcomes
Proficiency in prompt engineering techniques
Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
Hands-on experience operating production environments in AWS
Strong experience with Terraform
Experience with container orchestration platforms like ECS or Kubernetes
Familiarity with CI/CD tools such as GitHub Actions
Solid understanding of observability practices, including system metrics, distributed tracing, and SLOs; experience with Datadog is a plus
Exceptional communication and presentation skills, both written and verbal
Experience troubleshooting complex distributed systems in a high-traffic production environment
Exposure to event streaming systems such as Kafka or Kinesis
Experience building Internal Developer Platforms (IDP) or designing self-service infrastructure workflows
Familiarity with systems security, compliance requirements, or infrastructure hardening
Experience with agentic AI workflows, MCP frameworks, or AI-powered automation beyond code generation
Track record of leading incident response or driving post-incident review processes
