iHeartMedia is the number one audio company in America, reaching 90% of Americans every month. The Senior Site Reliability Engineer will lead a team of SREs/DevOps Engineers to develop automated, resilient cloud services and ensure operational efficiency and high availability of applications.

Responsibilities:

Standardize and modernize Amazon EKS platforms & AWS serverless suites, including AWS's latest managed services, in line with DevOps best practices
Provide expertise and hands-on implementation of large-scale, mission-critical Kubernetes workloads with high resiliency and multi-region architecture
Work collaboratively with 2 to 5 Site Reliability Engineers
Champion accountability; take responsibility through both actions & thinking
Design and implement end-to-end CI/CD pipelines with AWS CDK and CodePipeline, including integration with source control, build tools, and deployment targets such as CloudFormation (CFT) stacks
Prioritize & re-align quickly to adapt to a demanding, fast-paced Shift Left environment
Maximize automation to improve speed and quality while relentlessly driving low-value, repetitive work out of our operational activities
Work with our application delivery teams to design and build scalable and maintainable solutions for our customers
Enforce a GitOps workflow (FluxCD/ArgoCD) in which Git is the source of truth for EKS cluster and application state across a multi-account, multi-region environment
Develop baselines for governance, consumption/cost, and performance to ensure that our elastic cloud-based applications operate efficiently, securely, and with zero downtime
Run reliability incident management processes, including root cause analysis, runbook development, & self-healing architecture
Instill standardization in DevOps processes across a wide range of applications

Requirements:

6+ years of hands-on experience in public cloud, specifically AWS
3+ years of leading SRE/DevOps teams across complex AWS ecosystems
Deep understanding of high-velocity SDLC best practices, along with CI/CD and application/infrastructure monitoring practices, to operate workloads at high scale
Expert proficiency in Kubernetes, Terraform, AWS CDK, Lambda, API Gateway, Route 53, S3, EC2, Elastic Load Balancing, DynamoDB, CloudWatch, IAM, networking, IoT, SQS, EventBridge, etc.
Adept at troubleshooting and resolving issues in high-volume distributed applications running on AWS
Demonstrated ability to design, build, and maintain AWS infrastructure using AWS CDK (TypeScript preferred) with strong modular patterns (multi-stack, multi-account, multi-region)
Strong understanding of GitOps methodologies, with experience implementing and managing multiple environments through declarative configuration versioned in Git repositories and applied via automated tools such as Flux or ArgoCD
Hands-on experience managing large-scale, production EKS clusters across multiple regions and AWS accounts
Deep knowledge of AWS cost optimization techniques such as Reserved Instances, Spot Instances, and lifecycle management
Proven ability to build highly secure AWS infrastructure with a security-first mindset
Proven ability to collaborate and build strong relationships with development teams, including resolving conflicts and driving decisions and initiatives
Strong software development background, including knowledge of microservices architecture and fluency in JavaScript, TypeScript (Node.js), or Python
At least one of the following certifications: AWS Solutions Architect Associate, AWS Solutions Architect Professional, AWS DevOps Associate, AWS DevOps Professional, or a professional Kubernetes certification