Availity, LLC is a leading healthcare engagement platform focused on transforming the healthcare landscape. The Automation Engineer IV will play a key role in deploying, managing, and maintaining Event Management processes and automation that enhance system reliability and resilience.
Responsibilities:
- Mentoring and providing technical expertise to other members of the team as well as the broader organization
- Acting as one of two senior-level resources on this four-person team
- Designing, building, and maintaining Event Management and remediation automation that detects issues, enriches alerts with context, and escalates or resolves incidents appropriately
- Implementing and maturing event ingestion, correlation, and noise reduction strategies to ensure actionable, high-fidelity alerts
- Developing automated remediation workflows and runbook-driven actions that are safe, repeatable, and continuously improved
- Assisting in the design, development, and maintenance of tools that automate daily operational tasks and incident response actions
- Collaborating with observability and service management teams to improve signal quality, incident triage, and mean time to recovery (MTTR)
- Solving problems creatively, including investigating the root cause of technical anomalies and creating innovative, automated solutions
- Managing production incidents and escalations, with a focus on automation-first recovery where appropriate
- Research! Availity has a large enterprise healthcare processing platform built on many technologies, which you will need to understand and support at a level ranging from familiar to expert
- Expect the unexpected, and be prepared to multitask to stay ahead of the workflow in a high-uptime, mission-critical environment
Requirements:
- 3–5 years of operations experience with large-scale, high-availability distributed systems
- Proven experience with AWS infrastructure, including EC2, S3, Lambda, and CloudFormation
- Experience with Splunk, New Relic, or another enterprise observability tool
- Experience designing event correlation, alert noise reduction, or automated remediation in ServiceNow
- Proficiency in Linux system administration
- Scripting experience with Bash or Python
- Experience building containerized applications
- Experience administering application container environments, specifically EKS
- Strong understanding of production monitoring, alerting, and incident response concepts
- Excellent problem-solving skills and the ability to work well in a fast-paced, collaborative environment
- Strong communication skills, with the ability to effectively convey technical concepts to both technical and non-technical stakeholders