Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. They are seeking a Staff Site Reliability Engineer to lead AI enablement across their engineering organization, focusing on integrating AI tools into the software development lifecycle and ensuring the reliability and security of AI-generated outputs.
Responsibilities:
- Define and own the standards and best practices for AI-assisted development across the engineering organization, from tool selection to workflow integration
- Evaluate, build, or adopt AI-powered tools that improve code quality, catch vulnerabilities earlier in the development process, and reduce review cycle times, whether by evolving internal solutions or by identifying and integrating third-party platforms
- Partner with engineering teams to understand what's impacting their AI tool adoption, guide them through improvements, and lead org-wide enablement efforts such as lunch-and-learns, workshops, and documentation
- Establish metrics and feedback loops to quantify the impact of AI tooling on developer productivity, code quality, and delivery speed
- Contribute to the design and scaling of production environments using AWS and Terraform when on rotation or as needs arise
- Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization
- Participate in a low-volume on-call rotation
Requirements:
- 8–10+ years of experience in SRE, DevOps, Cloud Engineering, Platform Engineering, or Software Development roles
- Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, or similar
- Experience building AI/LLM-powered developer tools or integrations
- Demonstrated ability to drive org-wide tooling adoption, including change management, training, and measuring outcomes
- Proficiency in prompt engineering techniques
- Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
- Hands-on experience operating production environments in AWS
- Strong experience with Terraform
- Experience with container orchestration platforms like ECS or Kubernetes
- Familiarity with CI/CD tools such as GitHub Actions
- Solid understanding of observability practices, including system metrics, distributed tracing, and SLOs; experience with Datadog is a plus
- Exceptional communication and presentation skills, both written and verbal
- Experience troubleshooting complex distributed systems in a high-traffic production environment
- Exposure to event streaming systems such as Kafka or Kinesis
- Experience building Internal Developer Platforms (IDP) or designing self-service infrastructure workflows
- Familiarity with systems security, compliance requirements, or infrastructure hardening
- Experience with agentic AI workflows, MCP frameworks, or AI-powered automation beyond code generation
- Track record of leading incident response or driving post-incident review processes