Mindlance is a company focused on AI initiatives, and it is seeking a Senior Machine Learning Engineer to own the production lifecycle of its AI projects. The role involves building automated infrastructure for AI applications and ensuring scalability and observability in a multi-cloud environment.
Responsibilities:
- Build and maintain automated CI/CD and CT (Continuous Training) pipelines across AWS (SageMaker/Bedrock) and Azure (AI Studio)
- Design and operate the infrastructure for Retrieval-Augmented Generation (RAG), including vector database management (OpenSearch, Pinecone, or Azure AI Search) and semantic index optimization
- Build the engineering 'pipes' to securely ingest and move data from legacy systems (Mainframes, SQL Server, on-prem DBs) into cloud-native MLOps workflows
- Implement systemized frameworks for LLM evaluation (LLM-as-a-judge, ROUGE, METEOR) and traditional ML validation to ensure performance before deployment
- Deploy real-time monitoring for model drift, hallucination detection, latency, and token consumption to manage both quality and cost
- Manage all AI resources using Terraform or CloudFormation, ensuring the cloud posture is reproducible, secure, and follows a 'Privacy by Design' mandate
- Partner with teams using platforms like Palantir, Databricks, or Snowflake to ensure a high-fidelity data flow between analytical ontologies and production models
- Work directly with central IT and Security to navigate IAM roles, VPC peering, and firewall configurations, clearing the path for rapid transformation
- Optimize model serving endpoints for high-throughput and low-latency, utilizing containerization (Docker/Kubernetes) and serverless architectures where appropriate
- Establish rigorous version control for prompts (PromptOps), model weights, and data snapshots to ensure full auditability and rollback capability
- Support the data science lifecycle by automating feature stores, feature engineering pipelines, and the transition of experimental notebooks into hardened production microservices
- Implement automated scanning and guardrails (e.g., Bedrock Guardrails or Azure Content Safety) to prevent prompt injection and data leakage
Requirements:
- Bachelor's degree in Computer Science or a related field required
- 6+ years of engineering experience, with a minimum of 3 years strictly focused on MLOps or LLMOps in a production environment
- Deep, hands-on proficiency in both AWS and Azure ecosystems
- Expert-level proficiency in Python, SQL, and PySpark
- Extensive experience with containerization (Docker, Kubernetes) and orchestration tools (Airflow, Kubeflow, or Step Functions)
- A strong understanding of statistical validation and model evaluation metrics
- The ability to move at the speed of a startup while maintaining collaborative relationships within a large-scale enterprise IT landscape
- Master's degree in a quantitative discipline highly desirable
- Professional experience with evaluation and observability frameworks like LangSmith, Arize Phoenix, or WhyLabs