iHeartMedia is the number one audio company in America, reaching 90% of Americans every month. The Senior Site Reliability Engineer will lead a team of SREs/DevOps Engineers to modernize cloud services and implement automated, resilient infrastructure solutions.
Responsibilities:
- Standardize and modernize Amazon EKS platforms and AWS serverless services, including cutting-edge AWS managed services, in line with DevOps best practices
- Provide expertise and hands-on implementation of large-scale, mission-critical Kubernetes workloads with high resiliency and multi-region architecture
- Work collaboratively with 2 to 5 Site Reliability Engineers
- Champion accountability; take responsibility in both thought and action
- Design and implement end-to-end CI/CD pipelines with CDK and CodePipeline, including integration with source control, build tools, and deployment targets such as CloudFormation (CFT) stacks
- Prioritize and re-align quickly to adapt to a demanding, fast-paced shift-left environment
- Maximize automation to improve speed and quality while relentlessly driving low-value, repetitive work out of our operational activities
- Work with our application delivery teams to design and build scalable and maintainable solutions for our customers
- Enforce a GitOps workflow in which Git is the source of truth for EKS cluster and application state in a multi-account, multi-region environment (FluxCD/ArgoCD)
- Develop baselines for governance, consumption/cost, and performance to ensure that our elastic cloud-based applications operate efficiently, securely, and with zero downtime
- Run reliability incident management processes, including root cause analysis, runbook development, and self-healing architecture
- Instill standardization in DevOps processes across a wide range of applications
Requirements:
- 6+ years of hands-on experience in public cloud, specifically AWS
- 3+ years of leading SRE/DevOps teams across complex AWS ecosystems
- Deep understanding of high-velocity SDLC best practices, along with CI/CD and application/infrastructure monitoring practices, to operate workloads at high scale
- Expert proficiency in Kubernetes, Terraform, AWS CDK, Lambda, API Gateway, Route 53, S3, EC2, Load Balancing, DynamoDB, CloudWatch, IAM, Networking, IoT, SQS, EventBridge, etc.
- Adept at troubleshooting and resolving issues in high-volume, distributed applications running on AWS
- Demonstrated ability to design, build, and maintain AWS infrastructure using AWS CDK (TypeScript preferred) with strong modular patterns (multi-stack, multi-account, multi-region)
- Strong understanding of GitOps methodologies, with experience implementing and managing multiple environments through declarative configuration versioned in Git repos and applied via automated tools like Flux or ArgoCD
- Hands-on experience managing large-scale, production EKS clusters across multiple regions and AWS accounts
- Deep knowledge of AWS cost optimization techniques such as Reserved Instances, Spot Instances, and lifecycle management
- Proven ability to build highly secure AWS infrastructure with a security-first mindset
- Proven ability to collaborate and build strong relationships with development teams, including resolving conflicts and driving decisions and initiatives
- Strong software development background, including knowledge of microservices architecture and fluency in JavaScript, TypeScript, Node.js, or Python
- At least one of the following certifications: AWS Solutions Architect Associate, AWS Solutions Architect Professional, AWS DevOps Associate, AWS DevOps Professional, or a professional Kubernetes certification