LMI is a digital solutions provider focused on enhancing government impact through innovative technology. It is seeking a Health DevOps Engineer to support the management of public health systems by ensuring the stability and performance of healthcare technology infrastructure and applications.

Responsibilities:

Design, implement, and maintain monitoring and alerting systems for production and development environments to ensure high availability and reliability
Use tools such as Prometheus, Grafana, Datadog, the Elastic Stack, or equivalents to track system performance and application health
Proactively detect and troubleshoot performance bottlenecks, infrastructure issues, and failures
Optimize the performance and reliability of highly available systems supporting healthcare payment applications
Lead incident response efforts, including root cause analysis, and implement measures to prevent recurrence
Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure system reliability and availability
Ensure the consistent delivery of high-quality health-related data in compliance with regulations such as HIPAA
Create and maintain automated deployment pipelines (CI/CD) to reduce release cycle times and improve workflows for developers
Automate infrastructure provisioning and management through tools such as Terraform, Helm, Ansible, or CloudFormation
Improve the operational efficiency of development and deployment processes
Deploy, maintain, and scale cloud infrastructure on platforms such as AWS, Azure, or Google Cloud, ensuring compliance with healthcare-sector security and privacy requirements
Implement and manage containerization and orchestration platforms such as Docker and Kubernetes to ensure efficient resource usage and scalability
Optimize cloud resources for cost efficiency and streamline infrastructure provisioning
Partner with development and product teams to integrate observability tools and reliability practices into every stage of the software delivery lifecycle
Document architecture, processes, metrics, and troubleshooting guides to support scalability and knowledge sharing across the organization
Actively contribute to improving engineering workflows, reliability processes, and operational excellence

Requirements:

Bachelor's degree in Computer Science, Software Engineering, or a related field
Minimum of 2 years of professional experience in DevOps, Site Reliability Engineering (SRE), or a related role, preferably focused on observability and reliability
Hands-on experience with monitoring tools such as Prometheus, Grafana, the ELK Stack, Datadog, New Relic, or similar platforms
Experience with containerization and orchestration technologies like Docker and Kubernetes
Proficiency with cloud platforms such as AWS
Solid understanding of infrastructure as code (IaC) with tools like Terraform, Ansible, or CloudFormation
Knowledge of scripting languages such as Python, Bash, or PowerShell for automation tasks
Familiarity with CI/CD tools such as Jenkins, GitLab CI/CD, CircleCI, or similar
Strong understanding of network protocols, monitoring, and troubleshooting best practices
Strong problem-solving and analytical abilities, with meticulous attention to detail and a commitment to reliability and excellence
Excellent written and verbal communication skills, able to interact effectively with cross-functional teams
Ability to work under pressure and prioritize tasks in fast-paced environments
Experience with healthcare-focused projects or systems
Familiarity with healthcare compliance requirements such as HIPAA
Certifications such as AWS Certified DevOps Engineer, Microsoft Certified: Azure DevOps Engineer Expert, or Certified Kubernetes Administrator (CKA) from the Linux Foundation
Experience in federal consulting
Demonstrated experience with healthcare data projects (e.g., claims processing or payment systems) or financial/banking systems

Health DevOps Engineer - Observability/Reliability
