Design and build scalable ML/AI infrastructure, including feature stores, model serving, data streaming, evaluation frameworks, and observability systems
Build and maintain data pipelines for structured and unstructured data (claims, EHR, transactions, logs)
Ensure data quality, lineage, and reliability across the platform
Ensure compliance and security for data handling, including adherence to healthcare and financial data standards
Empower teams to access data and turn it into actionable insights with agentic analytics
Care coordination (clinical reasoning, workflow automation)
Establish and own best practices across MLOps and LLMOps, including:
Model lifecycle management (training, versioning, deployment, monitoring)
LLM evaluation, prompt/version control, and experimentation frameworks
CI/CD for ML systems and reproducible pipelines
Develop systems for LLM orchestration and agent frameworks (tool use, memory, retrieval, multi-step reasoning)
Understand the drivers of agent performance and implement improvements, e.g. model selection, memory, context windows, prompt engineering, agent orchestration, and fine-tuning
Partner closely with forward-deployed Product, Data Science, and GTM teams to translate ambiguous problems into production-ready AI systems
Own end-to-end delivery, from experimentation to deployment and iteration
Contribute to defining Nitra’s agentic AI product strategy
Establish best practices for model evaluation, monitoring, and safety
Improve system reliability, latency, and cost efficiency at scale
Mentor engineers and help raise the bar for ML across the team
Requirements
4+ years of experience in machine learning and data engineering
Strong background in ML frameworks for reinforcement learning
Hands-on experience with multi-agent systems, evaluation, and observability
Proven experience deploying ML systems into production at scale (think: $billions in volume)
Hands-on experience with MLOps practices, including:
Model versioning, monitoring, and retraining pipelines
Experiment tracking and reproducibility
Experience with LLMOps tooling and workflows, including: