Artisight transforms hospital operations with its Smart Hospital Platform, helping health systems reduce costs and improve efficiency. The Senior Site Reliability Engineer will architect healthcare technology systems, ensuring reliability and building resilient infrastructure that enhances patient care.

Responsibilities:

Serve as the go-to expert for complex L2 support issues, diving deep into our stack not just to fix problems but to eliminate their root causes
Engineer automation solutions that turn repetitive operational tasks into seamless, intelligent workflows, because manual work should be the exception, not the rule
Design and implement next-generation observability platforms that provide crystal-clear insights into system health before problems become incidents
Partner with development teams to bake reliability directly into new features and architectural decisions from day one
Lead incident response with surgical precision, conducting thorough post-mortems that transform failures into learning opportunities
Mentor emerging talent across engineering teams, spreading the SRE mindset and elevating our collective technical capabilities
Hunt down performance bottlenecks across our infrastructure and applications, optimizing for speed and efficiency at scale

Requirements:

Expert-level Python (scripting, automation, tooling)
Linux proficiency (Ubuntu preferred): system administration, networking, troubleshooting
Docker (containerization)
Kubernetes: deployment, management, and troubleshooting of clusters and applications
CI/CD pipelines: able to own and improve delivery workflows
Infrastructure as code (Terraform, Ansible, or similar): able to write and maintain IaC
Networking fundamentals (TCP/IP, DNS, load balancing, firewalls): sufficient to diagnose and resolve production issues independently
Monitoring and alerting tools (Prometheus, Grafana, ELK, Datadog, New Relic): able to design and implement coverage
