Pipeline Automation: Design, implement, and manage automated CI/CD and Continuous Training (CT) pipelines for machine learning model development, evaluation, and delivery.
Model Deployment: Containerize, deploy, and scale machine learning models as high-availability microservices or batch processing workflows.
Observability & Monitoring: Establish unified logging, alerting, and monitoring solutions to track model inference performance, system latency, resource utilization, data drift, and concept drift.
Infrastructure Management: Provision and optimize cloud-based ML infrastructure (including GPU/CPU computing clusters) utilizing Infrastructure as Code (IaC) paradigms.
Cross-Functional Collaboration: Work closely with product development teams to drive infrastructure adoption and efficiency gains through SDK/API development, automation, and efficient ML system maintenance.
Governance & Compliance: Implement robust lineage tracking for data, code, and model artifacts to ensure compliance, reproducibility, and security across the entire ML lifecycle.
Data Infrastructure & Tooling: Work with data engineering to improve the data ecosystem, ensuring robust, scalable pipelines for experimentation and ML (including streaming tools like Kafka and Flink for low-latency online inference).
Thought Leadership: Act as a mentor and thought leader, helping to define best practices in machine learning engineering, scalable ML service operations, and agentic (AI-native) AI.
Requirements
Professional Experience: 5+ years of professional software engineering, DevOps, or data engineering experience, with at least 2 years dedicated to building and maintaining MLOps infrastructure.
Programming Mastery: Strong proficiency in Python, including deep familiarity with software engineering best practices (unit testing, modular design, version control via Git).
Orchestration & Containerization: Hands-on experience with containerization (Docker) and container orchestration platforms, specifically Kubernetes (EKS, GKE, or native clusters), plus experience with related tools such as FastAPI.
MLOps and Datastore Tooling: Proven familiarity with specialized ML lifecycle and data processing tools and platforms such as MLflow, Kubeflow, SparkML, Synapse ML, SQL, Spark/PySpark, dbt, and Airflow.
Cloud Foundations: Practical experience operating within a major cloud ecosystem (e.g., AWS, GCP, Databricks), with a clear grasp of cloud networking, security, and storage tiers.
Communication & Leadership: Strong communication and project leadership skills, with the ability to influence cross-functional teams.
Educational Background: Bachelor’s or Master’s degree in Computer Science, Data Science, Software Engineering, or a closely related quantitative field.
Tech Stack
Airflow
AWS
Cloud
Docker
Google Cloud Platform
Kafka
Kubernetes
Microservices
PySpark
Python
Spark
SQL
Benefits
Competitive pay and benefits.
Medical, dental, vision, life, and disability insurance plans (100% paid for US employees), with supplemental medical and dental plans for Canadian employees.
401(k) plan with company matching in the US, and RRSP with DPSP for Canadian employees.
Employee Assistance Program (EAP) for mental wellness.
Flexible PTO and 12 company-wide days off throughout the year.
Learning & Development programs.
Equipment, tools, and reimbursement support for a productive remote environment.
Free Life360 Platinum Membership for your preferred circle.