LMI is a digital solutions provider focused on enhancing government impact through innovative technology. The company is seeking a Health DevOps Engineer to support public health systems management by ensuring the stability and performance of healthcare technology infrastructure and applications.
Responsibilities:
- Design, implement, and maintain monitoring and alerting systems for production and development environments to ensure high availability and reliability
- Use tools such as Prometheus, Grafana, Datadog, Elastic Stack, or equivalents to track system performance and application health
- Proactively detect and troubleshoot performance bottlenecks, infrastructure issues, and failures
- Optimize the performance and reliability of highly available systems supporting healthcare payment applications
- Lead incident response efforts, including root cause analysis, and implement measures to prevent recurrence
- Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure system reliability and availability
- Ensure the consistent delivery of high-quality health-related data in compliance with regulations such as HIPAA
- Create and maintain automated deployment pipelines (CI/CD) to reduce release cycle times and improve workflows for developers
- Automate infrastructure provisioning and management through tools such as Terraform, Helm, Ansible, or CloudFormation
- Improve the operational efficiency of development and deployment processes
- Deploy, maintain, and scale cloud infrastructure on platforms such as AWS, Azure, or Google Cloud, ensuring compliance with healthcare-sector security and privacy requirements
- Implement and manage containerization and orchestration platforms such as Docker and Kubernetes to ensure efficient resource usage and scalability
- Optimize cloud resources for cost efficiency and streamline infrastructure provisioning
- Partner with development and product teams to ensure seamless integration of observability tools and reliability practices through every stage of the software delivery lifecycle
- Document architecture, processes, metrics, and troubleshooting guides to support scalability and knowledge sharing across the organization
- Actively contribute to improving engineering workflows, reliability processes, and operational excellence
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, or a related field
- Minimum of 2 years of professional experience in DevOps, Site Reliability Engineering (SRE), or a related role, preferably focused on observability and reliability
- Hands-on experience with monitoring tools such as Prometheus, Grafana, ELK Stack, Datadog, New Relic, or similar platforms
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Proficiency with cloud platforms such as AWS
- Solid understanding of infrastructure as code (IaC) with tools like Terraform, Ansible, or CloudFormation
- Knowledge of scripting languages such as Python, Bash, or PowerShell for automation tasks
- Familiarity with CI/CD tools such as Jenkins, GitLab CI/CD, CircleCI, or similar frameworks
- Strong understanding of network protocols, monitoring, and troubleshooting best practices
- Strong problem-solving and analytical abilities, with extreme attention to detail and a commitment to reliability and excellence
- Excellent written and verbal communication skills, able to interact effectively with cross-functional teams
- Ability to work under pressure and prioritize tasks in fast-paced environments
- Experience with healthcare-focused projects or systems
- Familiarity with healthcare compliance requirements such as HIPAA
- Certifications such as AWS Certified DevOps Engineer - Professional, Microsoft Certified: DevOps Engineer Expert, or Certified Kubernetes Administrator (CKA)
- Experience in federal consulting
- Demonstrated experience with healthcare data projects (e.g., claims processing or payment systems) or financial/banking systems