Life360 is a company dedicated to keeping families connected and safe through innovative mobile applications and tracking devices. They are seeking a Senior Machine Learning Operations Engineer II to design and manage the infrastructure and automated pipelines for machine learning models, ensuring their reliable deployment and monitoring in production environments.
Responsibilities:
- Pipeline Automation: Design, implement, and manage automated CI/CD and Continuous Training (CT) pipelines for machine learning model development, evaluation, and delivery
- Model Deployment: Containerize, deploy, and scale machine learning models as high-availability microservices or batch processing workflows
- Observability & Monitoring: Establish unified logging, alerting, and monitoring solutions to track model inference performance, system latency, resource utilization, data drift, and concept drift
- Infrastructure Management: Provision and optimize cloud-based ML infrastructure (including GPU/CPU computing clusters) utilizing Infrastructure as Code (IaC) paradigms
- Cross-Functional Collaboration: Work closely with product development teams to drive infrastructure adoption and efficiency gains through SDK/API development, automation, and efficient ML system maintenance
- Governance & Compliance: Implement robust lineage tracking for data, code, and model artifacts to ensure compliance, reproducibility, and security across the entire ML lifecycle
- Data Infrastructure & Tooling: Work with data engineering to improve the data ecosystem, ensuring robust, scalable pipelines for experimentation and ML (including streaming tools like Kafka and Flink for low-latency online inference)
- Thought Leadership: Act as a mentor and thought leader, helping to define best practices in machine learning engineering, scalable ML service ops, and agentic AI (AI-Native) development
Requirements:
- 5+ years of professional software engineering, DevOps, or data engineering experience, with at least 2 years dedicated to building and maintaining MLOps infrastructure
- Strong proficiency in Python, including deep familiarity with software engineering best practices (unit testing, modular design, version control via Git)
- Hands-on experience with containerization (Docker) and container orchestration platforms, specifically Kubernetes (EKS, GKE, or native clusters), plus experience with related tools like FastAPI
- Proven familiarity with specialized ML lifecycle and data processing tools and platforms such as MLflow, Kubeflow, SparkML, SynapseML, SQL, Spark/PySpark, dbt, and Airflow
- Practical experience operating within a major cloud ecosystem (e.g., AWS, GCP, Databricks), with a clear grasp of cloud networking, security, and storage tiers
- Strong communication and project leadership skills, with the ability to influence cross-functional teams
- Bachelor's or Master's degree in Computer Science, Data Science, Software Engineering, or a closely related quantitative field
- Experience implementing and scaling production feature stores (e.g., Feast, Tecton) and model registries
- Prior experience deploying and optimizing Large Language Models (LLMs) or foundation models utilizing serving frameworks like vLLM, Triton Inference Server, or TGI
- Proficiency with IaC frameworks, particularly Terraform, to manage reproducible environments
- Familiarity with distributed data computation engines such as Apache Spark, Ray, or Dask
- Relevant cloud or architecture credentials, such as AWS Certified Machine Learning Specialty, Google Cloud Professional Machine Learning Engineer, or Certified Kubernetes Administrator (CKA)
- Experience in subscription-based products, lifecycle marketing, or user acquisition
- Experience with geospatial data and mobile location-based services
- Experience in the consumer technology sector, particularly within a fast-paced and sometimes ambitious development setting