Design, build, and operate scalable ML platform components including training infrastructure, feature stores, model registries, inference services, and end‑to‑end workflow orchestration.
Develop cloud‑native, distributed systems and CI/CD pipelines that ensure reliable, reproducible, and continuously delivered ML model deployments.
Implement and mature MLOps capabilities such as experiment tracking, data and model versioning, model evaluation, monitoring, and automated retraining.
Establish best practices for model lifecycle management, testing, and deployment across development, staging, and production environments.
Integrate observability into ML systems, enabling deep visibility into performance, drift, data quality, and inference reliability.
Build and optimize cloud-based ML infrastructure on Azure, AWS, and/or GCP using Kubernetes, container orchestration, and infrastructure‑as‑code tools.
Develop scalable batch and real‑time data pipelines that power feature generation, training workflows, and high‑performance model serving.
Ensure security, compliance, and cost-effectiveness across ML environments in partnership with platform, architecture, and governance teams.
Collaborate with data scientists and applied ML teams to translate modeling needs into robust, reusable, and self-service platform capabilities.
Work with security, compliance, and architecture partners to uphold responsible AI, governance, and data protection standards.
Drive developer productivity by promoting self‑service tooling, reusable components, documentation, and engineering best practices.
Contribute to Agile delivery processes while championing automation, engineering excellence, and continuous improvement.
Requirements
Strong software engineering background with experience building distributed systems or platform services
Hands-on experience with machine learning workflows, MLOps tooling, and productionizing ML solutions
Proficiency in Python and familiarity with ML libraries, frameworks, and backend development patterns
Experience with cloud platforms and ML services, including Azure Machine Learning, Amazon SageMaker, and/or Google Vertex AI
Exposure to cloud storage and data services such as Microsoft Fabric/OneLake, Amazon S3, and Google Cloud Storage (GCS)
Experience with cloud-native scanning and security tools such as Microsoft Defender for Cloud (formerly Azure Defender), Microsoft Purview, AWS Security Hub, Amazon Inspector, Google Cloud Security Command Center, or equivalent services
Strong understanding of Kubernetes, Docker, CI/CD, and infrastructure-as-code tools such as Terraform
Understanding of system design, API architecture, and scalable data/ML infrastructure
Strong communication and cross-functional collaboration skills
4+ years of experience in ML engineering, platform engineering, or an equivalent field (preferred)
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Docker
Google Cloud Platform
Kubernetes
Python
Terraform
Benefits
Joining our team isn’t just a job; it’s an opportunity:
One that takes your skills and pushes them to the next level.
One that encourages you to challenge the status quo.
One where you can shape the future of protection while supporting causes that mean the most to you.