Working on Internet technologies to improve the performance, availability, and scalability of large distributed content delivery systems
Collaborating with cross-functional teams to define and establish measurable Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Monitoring platform availability and performance, debugging issues through data analysis, and implementing corrective actions to prevent recurrence
Developing and implementing automation solutions to improve operational efficiency and reduce toil
Improving CI/CD pipelines and safe deployment practices for platform services
Participating in design reviews and providing technical guidance to ensure designs meet requirements for scalability, performance, and robustness
Requirements
Have 2 years of relevant experience and a Bachelor's degree in Computer Science or its equivalent
Have hands-on experience with containerization and compute platforms such as Docker and Kubernetes
Have experience with monitoring and alerting systems (e.g., Prometheus, Grafana, ADBMS, Datadog), including metric collection, alerting, dashboarding, and troubleshooting
Show fluency working in a UNIX/Linux computing environment
Have familiarity with infrastructure-as-code tools such as Terraform
Have proficiency with a configuration management tool such as Ansible, SaltStack, Chef, Puppet, or similar
Tech Stack
Ansible
Chef
Docker
Grafana
Kubernetes
Linux
Prometheus
Puppet
SaltStack
Terraform
Unix
Benefits
Healthcare
401(k) savings plan
Company holidays
Vacation (in the form of PTO)
Sick time
Family friendly benefits including parental leave
Employee assistance program including a focus on mental and financial wellness
Employee Stock Purchase Plan (ESPP)
Site Reliability Engineer II at Akamai Technologies