Availity, LLC is a leading healthcare engagement platform focused on transforming the healthcare landscape. The Automation Engineer IV will play a key role in deploying, managing, and maintaining Event Management processes and automation that enhance system reliability and resilience.

Responsibilities:

Mentoring and providing technical expertise to other members of the team as well as the broader organization
Acting as one of two senior-level resources on this four-person team
Designing, building, and maintaining Event Management and remediation automation that detects issues, enriches alerts with context, and escalates or resolves incidents appropriately
Implementing and maturing event ingestion, correlation, and noise reduction strategies to ensure actionable, high-fidelity alerts
Developing automated remediation workflows and runbook-driven actions that are safe, repeatable, and continuously improved
Assisting in the design, development, and maintenance of tools that automate daily operational tasks and incident response actions
Collaborating with observability and service management teams to improve signal quality, incident triage, and mean time to recovery (MTTR)
Creative problem solving that includes investigating the root cause of technical anomalies and creating innovative, automated solutions
Production incident management and escalations, with a focus on automation-first recovery where appropriate
Research! Availity has a large, enterprise-scale healthcare processing platform with many technologies to learn, understand, and support, at anywhere from a familiar to an expert level
Expect the unexpected; be prepared to multitask to stay in front of the workflow in a high-uptime, mission-critical environment

Requirements:

3–5 years of operations experience with large-scale, high-availability distributed systems
Proven experience in AWS infrastructure, including EC2, S3, Lambda, and CloudFormation
Splunk, New Relic, or other enterprise observability tool experience
Experience designing event correlation, alert noise reduction, or automated remediation in ServiceNow
Proficiency in Linux system administration
Scripting experience with Bash or Python
Experience building containerized applications
Experience administering application container environments, specifically EKS
Strong understanding of production monitoring, alerting, and incident response concepts
Excellent problem-solving skills and the ability to work well in a fast-paced, collaborative environment
Strong communication skills, with the ability to effectively convey technical concepts to both technical and non-technical stakeholders
