Kontakt.io is building a platform that automates and orchestrates clinical workflows in hospitals, enhancing efficiency through AI and real-time data. They are seeking a Lead Software Engineer - SRE to drive the reliability, scalability, and performance of their AWS-based platform, while mentoring engineers and shaping technical strategy.

Responsibilities:

Lead the design and implementation of scalable, fault-tolerant, and self-healing infrastructure and services across AWS and Kubernetes
Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs
Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform
Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling
Design and maintain infrastructure as code (Terraform) and drive GitOps best practices
Oversee major incident response and on-call practices, including incident reviews and long-term remediation planning
Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence
Contribute to the long-term reliability roadmap and architecture of high-throughput, real-time systems in healthcare operations
Drive process improvements in CI/CD, service ownership, chaos engineering, disaster recovery, and secure deployment

Requirements:

5+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering
5+ years of software engineering experience building production-grade systems (Java, Python, Go, or similar)
Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare environments
Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture
Hands-on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.)
Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure-as-code (Terraform, Helm, ArgoCD)
A track record of leading major incident response and running postmortems with a blameless, learning-focused approach
Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2)
A leadership mindset, with the ability to drive cross-functional alignment, lead initiatives, and mentor a high-performance SRE team
