Altera Digital Health is a company focused on enhancing healthcare provider capabilities. As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of hosted healthcare platforms, while leading incident management and continuous improvement efforts.
Responsibilities:
- Maintain and improve the reliability, availability, and performance of our production environments
- Lead the investigation and resolution of complex application, database, and infrastructure issues
- Participate in incident management, conduct root cause analysis (RCA), and contribute to post-incident reviews to prevent future occurrences
- Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to meet our service commitments
- Develop proactive monitoring and alerting strategies to identify and resolve issues before they impact customers
- Automate operational tasks using scripting and Infrastructure as Code (IaC) to improve efficiency
- Partner with engineering and cloud teams to refine deployment, monitoring, and support processes
- Provide technical leadership during major incidents and act as a key escalation point for critical issues
Requirements:
- 7+ years of experience supporting enterprise applications, infrastructure, or cloud environments
- Strong experience with monitoring and APM tools such as LogicMonitor, AppDynamics, Azure Monitor, SentryOne, Dynatrace, Datadog, or New Relic
- Deep knowledge of Windows Server administration, IIS, .NET applications, Windows Clustering, MSMQ, Event Logs, and PerfMon
- Strong SQL Server experience, including performance tuning, query optimization, blocking analysis, and Always On Availability Groups
- Experience with Azure cloud environments and a solid understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls)
- Familiarity with ServiceNow (or other ITSM platforms) and ITIL principles
- Proficiency in scripting with PowerShell, Python, or similar languages
- Experience with Infrastructure as Code (Terraform, ARM templates, Bicep)
- Experience with CI/CD pipelines and deployment automation (Azure DevOps, GitHub Actions)
- Experience with Kubernetes and containerized workloads
- Experience implementing SLIs, SLOs, and error budgets
- Experience in a healthcare technology or patient care environment
- Bachelor's Degree in Computer Science, Information Technology, or Engineering is preferred; equivalent professional experience will be considered