Availity is a healthcare technology company that delivers revenue cycle and related business solutions for healthcare professionals. As a Platform Engineer IV, you will lead observability practices and ensure the health and stability of critical infrastructure systems that support U.S. healthcare services.

Responsibilities:

Own and evolve our observability practices at enterprise scale
Lead the tooling and support for observability and logging services (New Relic, Splunk, Cribl, OpenTelemetry) with reliability as your north star
Oversee the delivery of automated deployment solutions for observability tools, along with governance that ensures observability coverage for mission-critical internal platforms in our AWS private cloud
Guide and mentor engineers while setting the bar for operational excellence
Provide technical leadership for the infrastructure engineering and operations team focused on observability services
Own and advance the observability practices implemented across all enterprise technology groups
Manage observability and logging platforms, including:
Splunk (EKS + on-prem components, forwarders, deployment server)
Cribl operational pipelines (EKS-based)
New Relic SaaS integrations and Prometheus data ingestion
OpenTelemetry & KubeLogging/Banzai Operator for distributed tracing and logging pipelines
Prometheus/Grafana migrations from on-prem OCP to AWS for metrics scraping and synthetic monitoring
Oversee observability deployment solutions for platforms hosted in AWS
Drive infrastructure-as-code practices (Terraform, Helm, Ansible) for repeatable deployments and environment consistency
Collaborate with engineering, middleware, and product teams to define clear ownership, reduce friction, and ensure platform services enable, rather than block, delivery
Ensure upgrades, patching, and platform updates are proactively planned and executed without business disruption
Set reliability targets and define operational metrics (availability, latency, error budgets) in line with SRE methodologies

Requirements:

Bachelor's degree in computer science or related field, or equivalent work experience
7-10 years of relevant technical and business experience in IT systems delivery, operations, and support (preferably in healthcare or high-transaction environments)
3+ years of experience leading technical engineering efforts involving implementation and management of IT systems
Hands-on expertise leading observability practices and architecture at enterprise scale
Experience managing observability platforms and monitoring tools: Splunk, Cribl, Prometheus/Grafana, OpenTelemetry, New Relic
Proficiency with Terraform, Helm, Istio, and AWS services (VPC, IAM, EC2, EKS)
Experience bridging infrastructure and development teams, ensuring alignment of roadmaps and goals
Strong leadership skills with the ability to motivate and guide technical teams
Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders
SaaS experience supporting large-scale, mission-critical systems
Familiarity with IaC application deployment pipelines for packaged software (commercial and open source) and re-platforming to cloud-native environments
Knowledge of service mesh concepts (Istio, Linkerd, etc.)
Background in metrics-driven reliability engineering (SLOs, SLIs, error budgets)
Experience with scripting/programming (JavaScript for Cribl, Python, etc.)

Platform Engineer IV (Observability)
