Farmers Insurance is a company dedicated to making a real difference in people's lives through a results-driven culture. They are seeking a Cloud Platform Engineer to manage and optimize multi-cloud infrastructure while ensuring platform availability, scalability, and compliance, leveraging AI technologies for operational reliability.
Responsibilities:
- Ensure availability, scalability, reliability and security of cloud platforms and services
- Design, deploy, and govern AI-powered agents (e.g., using Azure Copilot/AWS Bedrock) to achieve autonomous self-healing capabilities and automated resource management
- Handle regular operational requests hands-on using Terraform, including EC2 changes, S3 updates, user access management, and managed services such as SageMaker, Bedrock, Storage Gateway, RDS, and Transfer Family
- Supervise and refine AI-generated Infrastructure-as-Code (IaC) (Terraform/Ansible) while developing and maintaining complex, scalable Terraform/Ansible/CloudBees (Jenkins) automation pipelines to provision, deploy, patch, and manage cloud infrastructure
- Implement AI-based automation solutions for Cloud Operations to monitor performance and scalability and to respond to incidents and operational issues autonomously
- Implement GenAI tools to perform real-time Root Cause Analysis (RCA), correlate complex event data (logs, metrics), and auto-generate runbooks and incident summaries
- Develop and train predictive ML models to analyze historical telemetry and forecast potential system outages or performance bottlenecks, and configure proactive monitoring and alerting for critical services
- Manage security and compliance by utilizing AI agents to detect configuration drift and auto-generate compliant updates for IAM, network, and security policies
- Manage security and compliance by remediating vulnerabilities, configuring notifications, supporting audits, and maintaining certifications and governance standards
- Collaborate with application, architecture, AIOps, FinOps and security teams to ensure production readiness
- Lead proof-of-concepts, implement new cloud services, and adapt quickly to new cloud releases and features to enhance operational capabilities
- Understand middleware components to provide end-to-end production support and troubleshoot complex operational scenarios
- Work with application teams, analyze logs and data, open service requests, and work with vendors and partners to drive problem resolution
- Review architectural designs for applications to ensure reliable and performant design patterns are implemented
- Deploy applications, workloads, and data to the cloud environment hands-on, often involving migration from on-premises infrastructure or other cloud providers
- Work with finance and procurement teams to implement cost optimization strategies based on changing workload patterns, business requirements, and new offerings from cloud providers
Requirements:
- 5 years of experience in public cloud operations (AWS, Azure, GCP), with a strong focus on AIOps integration
- Deep, demonstrable expertise designing and operationalizing solutions leveraging AWS Bedrock/Agent Frameworks and Azure Copilot for Cloud Operations
- Expertise in Infrastructure as Code (Terraform, CloudFormation), Ansible, and CI/CD pipelines, including supervising AI-generated infrastructure artifacts
- Proficiency in scripting languages (Python, Bash)
- Expertise in integrating observability platforms (Dynatrace, Prometheus) into AI/ML platforms for predictive analysis and anomaly detection
- Understanding of Site Reliability Engineering (SRE) and operational reliability principles
- Experience with monitoring tools (CloudWatch, Prometheus, Dynatrace, Azure Monitor) and ServiceNow
- Ability to adapt to new cloud releases and emerging technologies
- Excellent troubleshooting, problem-solving, and communication skills
- Hands-on experience with JBoss, Tomcat, IBM WAS, Oracle, DB2, MSSQL, and Postgres is a plus
- Bachelor's degree in computer science or other technical discipline
- Technical certifications are a plus