KData AI is seeking a highly skilled MLOps Engineer with deep expertise in the Databricks ecosystem to join its data team for a critical 6-month initiative. The role focuses on automating, scaling, and managing the end-to-end lifecycle of machine learning models, bridging the gap between Data Science and Data Engineering.
Responsibilities:
- Design, build, and maintain robust CI/CD and MLOps pipelines for machine learning model training, evaluation, deployment, and batch/real-time scoring using Databricks Jobs and Workflows
- Implement and manage experiment tracking, model registration, versioning, and environment promotion policies using MLflow and Unity Catalog
- Optimize Databricks clusters and computational workloads for ML training and inference to ensure both cost-efficiency and high performance
- Collaborate with data engineers to build and maintain scalable feature pipelines using Databricks Feature Store and Delta Lake
- Establish proactive monitoring frameworks to track model performance, data drift, concept drift, and system health in production environments
- Partner closely with Data Scientists to transition proof-of-concept (PoC) code into scalable, production-ready ML products
Requirements:
- 6+ years of professional experience in Software Engineering, Data Engineering, or DevOps, with at least 3 years dedicated to MLOps
- Hands-on experience architecting ML workflows within Databricks (including MLflow, Unity Catalog, Delta Lake, and Databricks Repos)
- Advanced proficiency in Python and SQL; strong PySpark skills are highly desired
- Proven experience building automated deployment pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or Azure DevOps
- Familiarity with major cloud environments (AWS, Azure, or GCP) and cloud data infrastructure
- Bachelor's degree in Computer Science, Data Science, Engineering, or equivalent practical experience
- Active Databricks certifications (e.g., Databricks Certified Machine Learning Professional)
- Experience with Infrastructure as Code (IaC) tools like Terraform
- Familiarity with containerization (Docker, Kubernetes)
- Exposure to LLMOps or serving GenAI models on Databricks