ZON Multimédia is seeking a fully hands-on AWS DevOps Engineer to join its engineering team. The role covers the end-to-end lifecycle of production-grade ML and data platforms, including implementing Amazon EKS, maintaining Infrastructure as Code, and building CI/CD pipelines for Machine Learning.
Responsibilities:
- Lead the end-to-end implementation and day-to-day operations of Amazon EKS, including cluster provisioning, networking, and hosting complex microservices/apps
- Maintain and scale our environment using Terraform as the primary Infrastructure as Code tool
- Build and manage CI/CD pipelines specifically for Machine Learning, including automated model registry updates, feature store synchronization, and automated deployments
- Implement rigorous IAM best practices specifically for ML workloads and ensure data isolation through VPCs, subnets, and private endpoints
- Manage S3 (lifecycle policies, performance tiering), Secrets Manager, KMS encryption, and Parameter Store
- Own the monitoring stack using CloudWatch, Prometheus, and Grafana to ensure system health and proactive alerting
Requirements:
- Minimum of 5 years of experience in an AWS DevOps or Site Reliability role
- Proven experience supporting Data Science teams and ML production environments
- Ability to troubleshoot complex networking and container orchestration issues in real time
- Must be 100% hands-on
- AWS Certifications are a significant plus
- Deep expertise in Kubernetes (EKS)
- Experience with Terraform as the primary Infrastructure as Code tool
- Experience with CI/CD pipelines specifically for Machine Learning
- Experience with Cloud Security & Identity best practices
- Experience managing S3, Secrets Manager, KMS encryption, and Parameter Store
- Experience with a monitoring stack built on CloudWatch, Prometheus, and Grafana
- Experience with Amazon SageMaker (training jobs, endpoints, pipelines, and notebooks)
- Experience with RDS/Aurora (PostgreSQL/MySQL)
- Expert-level capacity planning and data modeling with DynamoDB
- Experience with AWS Glue (ETL) and Amazon EMR (EMR on EKS, EMR Serverless)