Allstate Insurance Co. is dedicated to protecting families and their belongings from life’s uncertainties. The Cloud Platform Engineer (ML DevOps) will design and maintain infrastructure for machine learning experimentation, develop CI/CD pipelines, and collaborate with data scientists to ensure efficient ML workflows.
Responsibilities:
- Designs, builds, and maintains infrastructure for ML experimentation, model training, and deployment
- Develops and manages CI/CD pipelines for ML workflows (data ingestion, model training, testing, and deployment)
- Implements and manages ML platforms (e.g., Azure ML Studio, Microsoft Fabric, MLflow, Kubeflow, SageMaker, Vertex AI) to support reproducibility and scalability
- Creates tools and environments to automate data versioning, model tracking, and artifact management
- Collaborates with data scientists to enable self-service access to compute resources and production systems
- Monitors, logs, and alerts on ML system health and model performance in production
- Enforces MLOps best practices across teams, including governance, model validation, and rollback strategies
- Ensures infrastructure security, cost-efficiency, and compliance
- Practices daily pair programming and test-driven development when writing software and building products
- Participates in executing the product strategy, keeping customer needs and wants in mind
- Establishes continuous integration, continuous delivery, and continuous deployment pipelines and practices
- Participates in retrospectives to gather feedback and derive actionable items to improve the team and the product
- Participates in iteration planning meetings, ensuring that the team has a common understanding of each story and chore in the team’s backlog
Requirements:
- 4+ years of experience with software development languages such as Python or Java
- 4+ years of experience with cloud technologies such as Azure or AWS
- 4+ years of experience with DevOps
- 4+ years of experience with Infrastructure as Code technologies such as Terraform, Ansible, Chef, or Puppet
- Exposure to machine learning frameworks and distributed data processing tools such as Apache Spark or equivalent