Design, build, and maintain highly scalable, robust, and secure machine learning infrastructure and platforms across the entire organization.
Define and drive the long-term MLOps vision, roadmap, and best practices in alignment with broader business and engineering goals.
Establish and optimize automated CI/CD/CT pipelines for machine learning models, ensuring seamless transitions from research to production.
Oversee the deployment of complex models (including LLMs and deep learning models), optimizing for latency, throughput, and cost-efficiency.
Implement enterprise-grade monitoring, alerting, and logging for model performance, data drift, concept drift, and system health.
Ensure robust AI governance and security compliance.
Partner closely with Data Scientists, Data Engineers, Software Engineers, and Product Managers to bridge the gap between model development and software engineering, developing standardized workflows that accelerate the path to production.
Mentor Data Scientists in MLOps best practices, foster a culture of engineering excellence, and lead technical design reviews.
Requirements
Considerable experience in software engineering, DevOps, or data engineering, with dedicated experience in MLOps, ML infrastructure, or deploying ML models at scale.
Deep, hands-on expertise with AWS and its managed ML/AI services (SageMaker, Bedrock).
Advanced proficiency with Kubernetes, Docker, and ML-specific orchestration tools like MLflow.
Strong software development skills in Python, alongside proficiency in languages like C++ or Java for high-performance systems.
Mastery of automation tools (GitHub Actions, GitLab CI, Jenkins, Octopus Deploy) and IaC frameworks (Terraform, Pulumi, Ansible).
Strong understanding of the underlying mechanics of popular ML and deep learning frameworks (PyTorch, TensorFlow, scikit-learn) to effectively troubleshoot and optimize deployments.
Demonstrated ability to lead complex, multi-quarter technical initiatives from conception to successful production rollout, including stakeholder management.