Allstate protects families and their belongings from life’s uncertainties. The company is seeking a Cloud Platform Lead Engineer to build and operate cloud application development platforms and to lead engineering practices that improve system efficiency and team collaboration.
Responsibilities:
- Lead the design, build, and operation of cloud infrastructure supporting ML experimentation, training, and production deployments
- Define technical direction and best practices for ML platforms, MLOps, reliability, and cloud infrastructure
- Architect ML platforms for high availability, fault tolerance, and resiliency across supported environments
- Establish and own observability standards, including metrics, logging, tracing, alerting, and SLOs for ML platforms
- Build and oversee CI/CD pipelines and automation for infrastructure and ML workflows
- Drive infrastructure-as-code, automation, and reliability standards across the platform
- Proactively monitor, troubleshoot, and improve platform availability, performance, scalability, and recovery
- Champion MLOps best practices including model versioning, validation, promotion, monitoring, and rollback strategies
- Ensure platform security, compliance, and cost optimization
- Partner with data scientists, ML engineers, and product teams to deliver reliable, self-service ML capabilities
- Mentor engineers through design reviews, code reviews, and hands-on technical leadership
- Contribute to roadmap planning, prioritization, and execution aligned with business and customer needs
- Participate in agile ceremonies and drive continuous improvement across the team
Requirements:
- Proven experience leading cloud platform or infrastructure initiatives
- Strong hands-on experience with cloud platforms (Azure, AWS, and/or GCP)
- Deep knowledge of infrastructure as code, automation, CI/CD, and reliability engineering
- Experience designing highly available and resilient distributed systems
- Experience with ML platforms or MLOps tooling (e.g., MLflow, Kubeflow, Azure ML, SageMaker, Vertex AI)
- Familiarity with observability tools (e.g., Datadog, ELK, New Relic, Prometheus)
- Ability to influence technical direction and collaborate across teams
- Strong communication skills and a leadership mindset
- 6 or more years of experience (Preferred)
Skills:
- Amazon Web Services (AWS)
- Cloud Computing
- Cloud Engineering
- Cloud Management
- Cloud Software
- Cloud Technology
- DevOps
- Google Cloud Platform (GCP)
- Lead Engineering
- Microservice Framework
- Microsoft Azure
- Python (Programming Language)
- Python Software Development
- Terraform
- Terraform (Software)