Empower Pharmacy is a visionary healthcare company dedicated to making quality, affordable medication accessible to millions of patients nationwide. The Staff DevOps & Site Reliability Engineer drives enterprise infrastructure reliability, scalability, and security across hybrid and multi-cloud environments, directly impacting system uptime, product quality, and operational efficiency.
Responsibilities:
- Cloud Architecture: Design and operate scalable hybrid and multi-cloud infrastructure across Azure, AWS, and on-premises environments, ensuring high availability, resilience, and cost efficiency while leveraging AI-driven insights to optimize system performance, resource allocation, and architectural decisions
- Platform Automation: Build and maintain Infrastructure as Code frameworks using Terraform, Bicep, or similar tools, enabling consistent, auditable deployments while integrating AI-assisted automation to accelerate provisioning, reduce errors, and enhance infrastructure lifecycle management
- Network Design: Engineer secure, high-performance networking solutions including hybrid connectivity, segmentation, VPNs, and zero trust architectures, using AI-enhanced analytics to proactively detect vulnerabilities, optimize traffic flows, and ensure secure, compliant communication
- Reliability Engineering: Establish and evolve SRE practices including SLIs, SLOs, and error budgets, leveraging AI-driven observability platforms to improve system reliability, automate incident detection, and enable proactive remediation
- Incident Management: Lead incident response, root cause analysis, and post-incident improvements, applying AI-powered anomaly detection and predictive analytics to reduce mean time to resolution, prevent recurrence, and strengthen operational resilience
- Capacity Planning: Drive intelligent capacity forecasting and performance optimization using AI models, ensuring infrastructure scales efficiently with demand while maintaining cost discipline, system reliability, and alignment with business growth objectives
- AIOps Integration: Design and implement AI-driven operational capabilities, including predictive monitoring, anomaly detection, and automated remediation, transforming traditional operations into intelligent, self-healing systems
- Data Infrastructure: Build AI-ready infrastructure platforms that support advanced analytics, automation pipelines, and enterprise AI workloads, ensuring secure, scalable environments that enable innovation while maintaining regulatory compliance
- Decision Intelligence: Leverage AI and data-driven insights to inform infrastructure strategy, optimize performance, and enhance operational decision-making
Requirements:
- Bachelor's degree in Information Systems, Computer Science, Engineering, or a related field required; master's degree preferred
- 8–10 years of hands-on experience in infrastructure engineering, DevOps, or Site Reliability Engineering roles
- Proven experience designing, implementing, and operating hybrid infrastructure solutions across on-premises and cloud environments
- Hands-on experience designing and operating solutions across both Microsoft Azure and AWS in production environments
- Strong hands-on expertise in Microsoft Active Directory, Hybrid AD, Microsoft Entra ID, and Group Policy
- Experience working in regulated environments with knowledge of HIPAA, SOC 2 Type II, and/or HITRUST compliance requirements
- Experience operating infrastructure remotely using secure access methods, including VDI, bastion hosts, or privileged access workstations
- Strong problem-solving, documentation, and communication skills, with the ability to influence technical direction and drive engineering best practices across teams
- Preferred certifications include Azure Solutions Architect Expert, AWS Solutions Architect, Azure Administrator, AWS SysOps Administrator, Azure DevOps Engineer Expert, AWS DevOps Engineer Professional, HashiCorp Terraform Associate, CISSP, CISM, Microsoft Identity and Access Administrator, and Azure Security Engineer