CareSource is seeking an AI Engineer III to lead the development of the "Engine Room," focusing on the reliability and scalability of their AI infrastructure. The role involves designing CI/CD pipelines, managing Azure AI Foundry environments, and ensuring secure deployment of AI solutions.

Responsibilities:

Architect and maintain the LLMOps/GenAIOps toolchain, including model registries, prompt version control, and reproducible training pipelines
Implement and manage the Azure AI Foundry environment, configuring model routers, quota management, and private endpoints for secure inferencing
Develop comprehensive observability dashboards to track model latency, token costs, hallucination rates, and drift
Automate "Policy-as-API" controls within the orchestration layer to enforce governance guardrails (e.g., PII filtering) at runtime
Collaborate with the Platform SRE team to ensure high availability and disaster recovery for mission-critical clinical agents
Manage the "Model Registry," ensuring all deployed models have associated version history, performance metrics, and rollback targets
Configure and maintain "Vector Databases" and RAG pipelines, optimizing retrieval performance and index freshness
Implement "Prompt Filtering" and content moderation gateways to prevent jailbreaks and enforce safety standards at the infrastructure level
Develop "Blue/Green" or "Canary" deployment strategies for AI agents to safely test new model versions in production
Manage the "API Gateway" for all AI services, ensuring authentication, rate limiting, and usage logging are enforced
Optimize "GPU/CPU Orchestration" to control compute costs while maintaining performance SLAs for high-volume inference
Build automated "Drift Detection" alerts that trigger retraining or human review when model performance degrades below a set threshold
Perform any other job-related duties as requested

Requirements:

Bachelor's degree in Computer Science, Engineering, or related technical field required
Equivalent years of relevant work experience may be accepted in lieu of required education
Five (5) years of IT engineering experience, with at least three (3) years specialized in DevOps, MLOps, or Cloud Infrastructure required
Experience with Azure AI Services (Azure OpenAI, AI Search, Azure ML) and container orchestration (Kubernetes/AKS) required
Experience building and maintaining CI/CD pipelines for machine learning models or complex software applications required
Mastery of Python and scripting languages for automation and infrastructure-as-code (Terraform, Bicep, ARM templates)
Deep understanding of LLMOps principles: prompt versioning, model registry management, and evaluation pipelines (e.g., MLflow, Prompt Flow)
Proficiency in Azure Networking and Security, including Private Endpoints, VNet integration, and API Management (APIM) configuration
Knowledge of Vector Databases and RAG (Retrieval-Augmented Generation) infrastructure requirements
Strong observability skills, utilizing tools like Azure Monitor or Application Insights to track token usage, latency, and drift
Microsoft Certified: Azure AI Engineer Associate or Azure DevOps Engineer Expert preferred
CKA (Certified Kubernetes Administrator) preferred
