Operationalize GenAI and LLM applications, leveraging RAG (retrieval-augmented generation), vector search, prompt engineering, agentic AI, and MCP (Model Context Protocol).
Lead the design, development, and deployment of production-grade LLM and ML pipelines, including data transformation, feature engineering, training, tuning, and serving.
Architect scalable data and AI workflows on Snowflake, Databricks, and Azure ML, integrating AI models with modern data lakehouse platforms.
Build and maintain API-based AI services (FastAPI, Flask), enabling secure, performant, and reliable model access at scale.
Define and implement CI/CD pipelines for GenAI and ML services, using GitHub Actions/Azure DevOps, MLflow, and containerization and orchestration tools (Docker, Kubernetes).
Develop and enforce MLOps/LLMOps best practices, including experiment tracking, model versioning, observability, and governance.
Mentor ML engineers and data scientists on engineering rigor, scalable design, and production-readiness.
Partner with cross-functional teams to integrate AI services into products, ensuring security, compliance, and resilience in regulated healthcare environments.
Troubleshoot production AI systems, analyzing inference latency, drift, and performance issues, and implementing preventive solutions.
Document and communicate architecture patterns, operational standards, and AI development frameworks across the organization.
Requirements
Bachelor's degree in Computer Science, Engineering, Machine Learning, or a related field; equivalent work experience is acceptable.
8+ years of experience in AI/ML engineering roles, with proven success in architecting and scaling production LLM/GenAI and ML systems.
Experience deploying LLM and GenAI solutions including RAG, vector database integration, and agentic/tool-augmented LLM systems (LangChain, MCP, or similar frameworks).
Experience with Snowflake or Databricks, using one or both as core platforms for data processing or AI/ML workloads.
Proven track record in MLOps/LLMOps, including CI/CD pipeline automation, model serving, monitoring, and governance, using modern AI infrastructure tools such as Docker, Kubernetes, Azure ML, MLflow, and Terraform.
Proficiency in Python and SQL, with experience processing high-volume datasets using big-data tools such as Spark or equivalent distributed systems.
Ability to collaborate in cross-disciplinary teams (engineering, product, compliance, security) and deliver impact in regulated industries.
Tech Stack
Azure
Distributed Systems
Docker
Flask
Kubernetes
Python
Spark
SQL
Terraform
Benefits
Flexible Vacation Policy
80 hours of Paid Sick, Safe, and Caregiver Leave annually