Tek Leaders Inc is seeking an AI Observability Engineer to design and implement observability for AI agents and data pipelines. The role involves monitoring agent behavior, building evaluation frameworks, and implementing quality metrics to ensure reliable AI performance.
Responsibilities:
- Design and implement end‑to‑end observability for AI agents, models, MCP (Model Context Protocol) servers, and data pipelines
- Instrument agents for traces, metrics, and logs covering prompts, tool calls, responses, latency, errors, and cost (see the instrumentation sketch after this list)
- Monitor agent behavior, reliability, and performance across single‑ and multi‑agent systems
- Build and operate an offline and continuous evaluation framework for agentic systems (see the evaluation‑gate sketch after this list)
- Define offline “golden” test suites, regression sets, and scenario‑based evaluations
- Implement continuous, in‑production evaluations to detect quality and safety drift with alerts and thresholds
- Implement AI quality and safety metrics (hallucination rate, grounding accuracy, tool success rate, confidence scores)
- Detect and alert on model drift, data drift, and concept drift impacting agent outcomes (see the drift‑detection sketch after this list)
- Implement Human‑in‑the‑Loop (HITL) review workflows for approval‑gated agent actions (see the HITL sketch after this list)
- Enforce and log approvals for sensitive or high‑risk tool actions
- Define HITL triggers using confidence thresholds, escalation policies, and reviewer queues
- Feed human feedback back into prompt updates, retrieval tuning, and agent policy improvements
- Instrument MCP servers for request/response observability and correlate MCP telemetry with agent traces
- Integrate observability and evaluation checks into CI/CD pipelines to enable safe rollout, canarying, and rollback
- Build dashboards and alerts for agent health, quality, safety, and usage trends
- Ensure security, privacy, and compliance observability, including PII detection and audit logging (see the PII‑redaction sketch after this list)
- Optimize observability cost and performance across logs, metrics, traces, and evaluation runs
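The sketches that follow illustrate several of the responsibilities above. Each is a minimal example under stated assumptions, not a prescribed implementation. This first one shows agent and MCP instrumentation with the OpenTelemetry Python SDK: one parent span per agent turn, with child spans for the model call and an MCP tool call carrying prompt, token, cost, latency, and error attributes. The span and attribute names (llm.prompt, llm.cost_usd, mcp.tool_name) and the call_model / call_mcp_tool stubs are illustrative assumptions.

```python
# Sketch: tracing one agent turn with prompt, tool-call, latency, error, and
# cost attributes. Requires `pip install opentelemetry-sdk`; span/attribute
# names and the call_model / call_mcp_tool stubs are assumptions.
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-observability-sketch")


def call_model(prompt: str) -> dict:
    """Placeholder for the real LLM call."""
    return {"text": "stub response", "input_tokens": 42, "output_tokens": 17}


def call_mcp_tool(name: str, arguments: dict) -> dict:
    """Placeholder for a real MCP server request."""
    return {"ok": True, "result": "stub tool output"}


def run_agent_turn(prompt: str) -> str:
    # One parent span per agent turn; the model call and the MCP tool call are
    # child spans, so MCP telemetry shares the trace id and stays correlated
    # with the agent trace.
    with tracer.start_as_current_span("agent.turn") as turn:
        turn.set_attribute("llm.prompt", prompt)
        start = time.monotonic()

        with tracer.start_as_current_span("llm.call") as llm_span:
            response = call_model(prompt)
            llm_span.set_attribute("llm.output_tokens", response["output_tokens"])
            # Illustrative per-token cost model only.
            llm_span.set_attribute("llm.cost_usd", response["output_tokens"] * 1e-5)

        with tracer.start_as_current_span("mcp.tool_call") as tool_span:
            tool_span.set_attribute("mcp.tool_name", "search_documents")
            try:
                result = call_mcp_tool("search_documents", {"query": prompt})
                tool_span.set_attribute("mcp.success", result["ok"])
            except Exception as exc:  # failed tool calls become span events
                tool_span.record_exception(exc)
                raise

        turn.set_attribute("agent.latency_ms", (time.monotonic() - start) * 1000)
        return response["text"]


if __name__ == "__main__":
    print(run_agent_turn("Summarize yesterday's pipeline failures."))
```

In production the ConsoleSpanExporter would be swapped for an OTLP exporter pointed at the chosen backend; the span structure stays the same.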
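Next, a sketch of an offline “golden” evaluation that doubles as a CI/CD quality gate: it replays a fixed set of prompts against the agent, checks each answer for an expected grounded fact, and exits non‑zero when the pass rate drops below a threshold so the pipeline can block promotion or trigger rollback. The golden cases, the substring‑based grounding check, the run_agent stub, and the 0.9 threshold are placeholder assumptions.

```python
# Sketch: offline "golden" evaluation used as a CI/CD gate. Golden cases,
# the run_agent stub, and the 0.9 pass-rate threshold are assumptions.
import sys

GOLDEN_CASES = [
    {"prompt": "What is the SLA for the ingest pipeline?", "must_contain": "99.9"},
    {"prompt": "Which tool restarts a failed ingest job?", "must_contain": "restart_ingest"},
]
PASS_RATE_THRESHOLD = 0.9


def run_agent(prompt: str) -> str:
    """Placeholder for the real agent under test."""
    return "The ingest SLA is 99.9%; use restart_ingest to recover failed jobs."


def evaluate() -> float:
    passed = 0
    for case in GOLDEN_CASES:
        answer = run_agent(case["prompt"])
        # Crude grounding check: the answer must contain the expected fact.
        if case["must_contain"] in answer:
            passed += 1
    return passed / len(GOLDEN_CASES)


if __name__ == "__main__":
    pass_rate = evaluate()
    print(f"golden-suite pass rate: {pass_rate:.2f}")
    # A non-zero exit fails the pipeline stage, blocking rollout.
    sys.exit(0 if pass_rate >= PASS_RATE_THRESHOLD else 1)
```

A CI job would run this script against a staging or canary deployment and treat a non‑zero exit as a failed quality gate; the same checks, run continuously on sampled production traffic, cover the in‑production evaluation side.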
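A sketch of one way to detect drift in an agent quality signal, using the Population Stability Index (PSI) between a baseline window and a recent window of confidence scores. The binning scheme, the sample data, and the 0.2 alert threshold (a common rule of thumb) are assumptions; in practice the inputs would come from the telemetry store and the alert would route to the paging system.

```python
# Sketch: Population Stability Index (PSI) drift check on a numeric agent
# metric (here, confidence scores). Sample data and the 0.2 alert threshold
# are assumptions.
import math


def psi(baseline: list[float], recent: list[float], bins: int = 10) -> float:
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def distribution(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) / division by zero for empty bins.
        return [(c / len(values)) or 1e-6 for c in counts]

    expected = distribution(baseline)
    actual = distribution(recent)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


if __name__ == "__main__":
    baseline_confidence = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83, 0.80, 0.84]
    recent_confidence = [0.60, 0.58, 0.65, 0.62, 0.59, 0.61, 0.63, 0.60]
    score = psi(baseline_confidence, recent_confidence)
    print(f"PSI = {score:.3f}")
    if score > 0.2:  # assumed alert threshold
        print("ALERT: confidence distribution has drifted from baseline")
```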
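A sketch of a Human‑in‑the‑Loop gate in front of tool execution: actions that use a high‑risk tool or fall below a confidence threshold are parked on a reviewer queue and logged for approval, while everything else is auto‑approved with an audit entry. The risk‑tier list, the 0.75 threshold, and the in‑memory queue and logger are illustrative assumptions; a real deployment would persist both the queue and the audit trail.

```python
# Sketch: HITL approval gate with confidence threshold, reviewer queue, and
# audit logging. Risk tiers, threshold, and in-memory storage are assumptions.
import json
import logging
import queue
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

HIGH_RISK_TOOLS = {"delete_records", "send_customer_email", "change_iam_policy"}
CONFIDENCE_THRESHOLD = 0.75
reviewer_queue: "queue.Queue[dict]" = queue.Queue()


@dataclass
class ProposedAction:
    tool: str
    arguments: dict
    confidence: float


def execute_tool(action: ProposedAction) -> None:
    """Placeholder for the real tool call."""
    audit_log.info("EXECUTED %s", json.dumps({"tool": action.tool, "args": action.arguments}))


def gate(action: ProposedAction) -> str:
    needs_review = action.tool in HIGH_RISK_TOOLS or action.confidence < CONFIDENCE_THRESHOLD
    if needs_review:
        # Escalate: park the action for a human reviewer and log the decision.
        reviewer_queue.put({"tool": action.tool, "args": action.arguments,
                            "confidence": action.confidence})
        audit_log.info("PENDING_APPROVAL %s (confidence=%.2f)", action.tool, action.confidence)
        return "pending_approval"
    execute_tool(action)
    return "auto_approved"


if __name__ == "__main__":
    print(gate(ProposedAction("search_documents", {"query": "status"}, confidence=0.92)))
    print(gate(ProposedAction("delete_records", {"table": "orders"}, confidence=0.95)))
```

Reviewer decisions drained from the queue are themselves useful feedback for prompt updates, retrieval tuning, and agent policy changes.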
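Finally, a sketch of basic PII redaction applied to prompts and responses before they are written to logs or traces. The two regex patterns (email address, US‑style SSN) are deliberately narrow examples, not a complete PII detector; a production setup would pair redaction like this with a dedicated detection service and audit logging.

```python
# Sketch: redact obvious PII from text before it reaches logs or traces.
# The patterns are narrow examples only, not a complete PII detector.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com (SSN 123-45-6789) about the refund."))
    # -> Contact [EMAIL] (SSN [SSN]) about the refund.
```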
Requirements:
- Hands‑on experience delivering the responsibilities above, including agent instrumentation, evaluation frameworks, drift detection, HITL review workflows, and CI/CD‑integrated rollout checks
- Experience implementing AI observability using AWS cloud services and open‑source tooling
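Given the AWS emphasis in this requirement, a sketch of publishing agent quality metrics to Amazon CloudWatch with boto3 so dashboards and alarms can be layered on top. The namespace, metric names, dimension, and values are assumptions, and running it requires AWS credentials with cloudwatch:PutMetricData permission.

```python
# Sketch: publish agent quality metrics to Amazon CloudWatch via boto3.
# Namespace, metric names, dimensions, and values are assumptions; requires
# AWS credentials and `pip install boto3`.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_data(
    Namespace="AgentObservability",  # assumed custom namespace
    MetricData=[
        {
            "MetricName": "ToolSuccessRate",
            "Dimensions": [{"Name": "AgentName", "Value": "support-agent"}],
            "Value": 0.97,
            "Unit": "None",
        },
        {
            "MetricName": "HallucinationRate",
            "Dimensions": [{"Name": "AgentName", "Value": "support-agent"}],
            "Value": 0.02,
            "Unit": "None",
        },
    ],
)
```

CloudWatch alarms on these metrics (or equivalent rules in an open‑source stack such as Prometheus and Grafana) would then drive the alerting described in the responsibilities.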