Bayforce is seeking a Senior AI/ML Engineer with strong hands-on experience in GenAI agent development and modern AI engineering. This role focuses on building production-grade agents, MCP integrations, and enterprise knowledge systems using their cloud and data stack.
Responsibilities:
- Design, build, and maintain production-grade APIs, microservices, and internal SDKs that support AI models, LLM agents, retrieval pipelines, and tool integrations
- Develop and refine GenAI and agentic systems, including MCP tools, secure system integrations, and scalable agent orchestration infrastructure
- Create and maintain knowledge bases, retrieval pipelines, and vector search systems to power RAG and agent workflows across internal platforms
- Implement comprehensive observability for LLM agents, retrieval pipelines, and AI microservices, covering monitoring, logging, tracing, drift detection, hallucination tracking, latency, cost, and quality metrics
- Build automated evaluation and regression testing pipelines for agents, prompts, tools, and model updates to ensure reliability and continuous improvement
- Develop frameworks for prompt versioning, experiment tracking, reproducibility, and model governance, ensuring consistent and auditable AI development practices
- Establish and maintain MLOps and LLMOps pipelines, including model training, deployment, and CI/CD
- Optimize model serving and inference infrastructure for performance and cost efficiency (batching, caching, quantization, GPU/CPU autoscaling)
- Collaborate with data scientists and cloud teams to productionize models, ensure reproducibility, and support scalable AI systems
- Work across Azure, Snowflake, and Databricks to support production AI systems, data pipelines, and model deployments
Requirements:
- 1+ years of experience building and deploying GenAI agents (including open-source LLM agents) in production environments
- Strong knowledge of MCP, tool integrations, and agent orchestration
- Ability to design and maintain knowledge bases, vector search, and retrieval systems
- Cloud expertise in Azure (preferred) or AWS
- Strong SQL skills and experience with Snowflake
- Experience working with Databricks for data and ML workflows
- Solid background in traditional ML (classification, clustering, etc.)
- Ability to build evals, guardrails, and safety layers for agents at scale
- Experience with MLOps and LLM observability
- Experience building CI/CD pipelines (GitHub, DevOps)