Zip Co is a digital financial services company offering innovative products. It is seeking a Senior Machine Learning Engineer to build and scale the systems that enable production-grade machine learning and AI. The role focuses on managing the ML lifecycle, collaborating with data science and engineering teams, and solving complex distributed systems problems.
Responsibilities:
- Own and scale the infrastructure that powers production ML and AI across Zip
- Build and maintain batch and streaming feature pipelines
- Design and manage offline and online feature store patterns
- Define MLflow model registry standards and promotion workflows
- Deploy and operate scalable model serving endpoints
- Implement CI/CD for ML pipelines and model deployment
- Develop pipelines using PySpark and Spark SQL
- Optimize joins, partitioning, and shuffle-heavy workloads
- Improve reliability and cost-efficiency of distributed data jobs
- Support streaming workloads using Delta Live Tables
- Manage Databricks clusters, jobs, and access controls
- Improve observability, alerting, and operational standards
- Contribute to Lakehouse architecture (Databricks and Snowflake)
- Implement governance, RBAC, and data quality standards
- Build infrastructure that accelerates experimentation and model deployment
- Support emerging AI use cases, including real-time and large-scale ML systems
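The registry and CI/CD responsibilities above (MLflow promotion workflows, CI/CD for model deployment) ultimately reduce to a promotion gate. A minimal sketch of such gating logic, in plain Python: the metric name (`auc`) and thresholds are hypothetical, and in practice the metrics would be fetched via the MLflow tracking client rather than passed in as dicts.

```python
# Minimal sketch of a model-promotion gate, as might back an MLflow
# registry workflow. Metric names and thresholds are hypothetical;
# real code would fetch metrics through the MLflow tracking API.

def should_promote(candidate: dict, production: dict,
                   min_auc: float = 0.75,
                   max_regression: float = 0.01) -> bool:
    """Promote a candidate model only if it clears an absolute AUC
    floor and does not regress the production model by more than
    max_regression."""
    if candidate["auc"] < min_auc:
        return False  # fails the absolute quality bar
    return candidate["auc"] >= production["auc"] - max_regression

# Candidate slightly better than production -> promote
print(should_promote({"auc": 0.82}, {"auc": 0.81}))  # True
```

A CI/CD pipeline would run a check like this after offline evaluation and, on success, transition the model version to the production stage (or alias) in the registry.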
Requirements:
- 8+ years of experience in Machine Learning with a strong focus on production-grade ML and distributed data systems
- Demonstrated experience owning and operating ML systems end-to-end in production environments
- Advanced experience with PySpark and Spark SQL
- Strong understanding of Spark execution (joins, shuffles, partitioning)
- Experience building and optimizing reliable, scalable data pipelines
- Strong data engineering fundamentals including medallion architecture design, incremental/idempotent ETL patterns, and Delta Lake optimization (partitioning)
- Experience operating ML systems in production
- Hands-on experience with MLflow (tracking + model registry)
- Experience managing feature stores (offline + online)
- Experience deploying and monitoring model serving endpoints
- Experience implementing CI/CD for ML workflows
- Experience working in Azure
- Production experience with Databricks and Delta Lake
- Experience integrating with Azure Cosmos DB or similar NoSQL key-value stores
- Experience designing orchestrated, production-grade data workflows (Databricks Workflows, Airflow, or ADF) with dependency management, backfills, and failure recovery
- Experience with Delta Live Tables and streaming pipelines
- Experience with Iceberg or Lakehouse Federation
- Experience with Snowflake
- Experience with vector databases or LLM infrastructure
- Experience with infrastructure-as-code
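One of the requirements above calls out incremental/idempotent ETL patterns. A minimal sketch of the high-watermark pattern in plain Python, with illustrative table and field names; on Databricks the upsert would be a Delta `MERGE` rather than a dict write.

```python
# Sketch of an incremental, idempotent ETL step: process only rows past
# the stored high watermark, and make the load safe to re-run by
# upserting on a primary key. Names are illustrative.

def incremental_load(source_rows, target: dict, watermark: int) -> int:
    """Upsert rows with event_time > watermark into target (keyed by id).
    Re-running with the same inputs leaves target unchanged (idempotent).
    Returns the new high watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row["event_time"] > watermark:
            target[row["id"]] = row  # upsert: insert or overwrite by key
            new_watermark = max(new_watermark, row["event_time"])
    return new_watermark

rows = [{"id": 1, "event_time": 5}, {"id": 2, "event_time": 7}]
tbl = {}
wm = incremental_load(rows, tbl, watermark=0)    # loads both rows, wm == 7
wm2 = incremental_load(rows, tbl, watermark=wm)  # re-run is a no-op, wm2 == 7
```

The key properties are the ones the requirement names: each run advances the watermark monotonically (incremental), and replaying the same batch changes nothing because writes are keyed upserts (idempotent).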