IMO Health combines expertise in software development, artificial intelligence, and clinical knowledge to create AI-driven solutions. The company is seeking a Staff AI / MLOps Engineer to own the end-to-end machine learning lifecycle for production AI systems, with a focus on operational excellence and architectural rigor.
Responsibilities:
- Own the full ML lifecycle, including data ingestion, training, validation, deployment, monitoring, retraining, and retirement
- Transition AI/ML prototypes into scalable, production-ready systems with CI/CD pipelines, automation, and observability
- Lead system design and architecture discussions, providing guidance on ML systems, MLOps, and AI infrastructure
- Develop and maintain AI-driven applications and inference services, optimizing for performance, scalability, reliability, and cost
- Integrate LLMs, generative AI, and NLP solutions into IMO Health products, focusing on unstructured clinical data
- Implement monitoring, alerting, logging, and dashboards to ensure model quality, detect drift, and maintain operational SLAs
- Build, maintain, and optimize CI/CD pipelines, automation scripts, and Infrastructure-as-Code for production ML systems
- Apply containerization and orchestration (Docker, Kubernetes) and cloud infrastructure best practices to manage production environments
- Mentor and guide engineers, enforce technical standards, and drive down technical debt
- Conduct root cause analysis of production defects and implement durable fixes
- Advocate for non-functional requirements (availability, scalability, reliability, maintainability) and design systems accordingly
- Collaborate cross-functionally with Product, Data Science, Architecture, and Engineering teams to align AI solutions with business goals
Requirements:
- 8+ years of professional experience in software engineering, AI/ML engineering, or related roles, building and operating production-grade systems
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
- Strong foundation in computer science fundamentals (data structures, algorithms, design patterns, operating systems, networking)
- Expert-level coding skills in Python or Java, with a strong emphasis on production-quality software engineering practices
- Hands-on experience owning ML systems in production, including deployment, monitoring, retraining, and optimization
- Experience designing and operating CI/CD pipelines, automation, and observability for ML systems
- Deep experience with cloud platforms (AWS or Azure), containerization, and Infrastructure-as-Code
- Experience with MLOps tools and workflows (e.g., MLflow, SageMaker, Kubeflow)
- Experience integrating and deploying LLMs, generative AI, and agentic systems in production environments
- Working knowledge of NLP concepts (tokenization, embeddings, classification, sequence modeling); healthcare exposure is a plus
- Experience with Elasticsearch and vector databases for embedding-based search and retrieval
- Proven ability to translate business needs into scalable, reliable technical solutions, balancing technical debt and delivery velocity
- Strong system design skills for high-performance, distributed, and scalable systems
- Excellent communication and collaboration skills across cross-functional, distributed teams
- Self-starter who can operate autonomously and own complex systems end to end
Preferred Qualifications:
- Experience with clinical or healthcare AI applications
- Familiarity with Hugging Face, PyTorch, TensorFlow, or other modern ML frameworks
- AWS Associate-level certification (Machine Learning Engineer or Solutions Architect)