Independently design and implement CI/CD pipelines for machine learning models.
Automate the process of model training, validation, testing, and deployment using relevant tools.
Develop and maintain automated testing frameworks for machine learning models.
Develop and manage various model deployment strategies (e.g., A/B testing, canary deployments).
Build and maintain scalable and reliable infrastructure for model serving (e.g., using Kubernetes, serverless functions).
Implement and manage model versioning and rollback mechanisms.
Optimize model serving for latency, throughput, and resource utilization.
Implement and manage comprehensive monitoring and logging systems to track model performance and identify issues (e.g., model drift, data drift).
Set up and manage alerting systems to notify the team of performance degradation.
Contribute to the development and implementation of model governance policies and procedures.
Ensure compliance with security and privacy requirements.
Collaborate effectively with data scientists to understand model requirements and dependencies.
Work with software engineers to integrate machine learning models into applications and services.
Develop and maintain APIs and interfaces for model access.
Requirements
Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
Experience: 2+ years of experience in DevOps, software engineering, or a related role with a focus on machine learning deployment and operations. Proven experience with cloud platforms and containerization technologies. Experience in the financial services industry is a plus.
Technical Skills:
Strong proficiency in DevOps practices and tools (e.g., Jenkins, GitLab CI, Docker, Kubernetes).
Solid knowledge of cloud computing platforms (e.g., AWS, Azure, GCP) and their machine learning services.
Knowledge of model management tools such as MLflow or Kubeflow.
Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
Strong experience with scripting and automation (e.g., Python, Bash).
Solid understanding of software engineering principles and best practices.
Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation).
Soft Skills:
Strong problem-solving and analytical skills with the ability to work independently.
Ability to work in a fast-paced and dynamic environment.
Automation mindset and a drive to improve efficiency.
Excellent collaboration and communication skills.
Customer-centric approach to solution development.
Strong team player.
Disciplined work ethic.
Good command of the English language, both spoken and written.