Function Health is dedicated to empowering individuals to live healthier lives through innovative technology. The company is seeking a Staff AI Engineer to design and implement stateful multi-agent systems, integrating LLMs and multimodal models into production workflows while ensuring reliability and performance.
Responsibilities:
- Architect and build stateful, graph-based agent workflows with tool use, planning, and memory
- Integrate LLMs and multimodal models via structured I/O (JSON Schema, Pydantic validators) and function/tool calling
- Build high-reliability APIs and streaming services for real-time inference, speech, and vision
- Own production readiness: tracing, logging, metrics, rate limiting, circuit breakers, and SLOs
- Stand up eval pipelines: offline golden sets, LLM-as-judge with human rubrics, online A/B, and regression tests in CI
- Implement retrieval and memory: hybrid search, vector and graph retrieval, semantic caches, and long-horizon context
- Optimize cost/latency: model routing, prompt and tool selection, quantization, and KV cache/prefill strategies
- Lead cloud-native deployments on Kubernetes with GPU autoscaling, canary/shadow releases, and feature flags
- Partner cross-functionally to translate research into robust production systems and iterate quickly behind evaluation gates
- Mentor engineers through code reviews, design docs, and architecture decisions
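To illustrate the structured-I/O pattern named above (JSON Schema plus validation before tool execution), here is a minimal, stdlib-only sketch. The tool name, fields, and helper are hypothetical and not tied to any specific provider's function-calling API; a production system would typically use Pydantic or a full JSON Schema validator instead of this hand-rolled check.

```python
import json

# Hypothetical tool definition in JSON Schema form (illustrative only).
TOOL_SCHEMA = {
    "name": "search_records",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

def validate_tool_call(raw: str, schema: dict) -> dict:
    """Parse a model-emitted tool call and check it against the schema.

    Raises ValueError on malformed JSON, missing required fields, or
    type mismatches -- the structured-I/O gate before tool execution.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc

    params = schema["parameters"]
    for field in params.get("required", []):
        if field not in args:
            raise ValueError(f"missing required field: {field}")

    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    for field, spec in params["properties"].items():
        if field in args and not isinstance(args[field], type_map[spec["type"]]):
            raise ValueError(f"field {field!r} should be {spec['type']}")
    return args

# A well-formed call passes validation and is safe to dispatch.
call = validate_tool_call('{"query": "cholesterol", "limit": 5}', TOOL_SCHEMA)
```

The point of the gate is that the agent never executes a tool on unvalidated model output; malformed calls fail fast and can be retried or surfaced in traces.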
Requirements:
- 2.5+ years building agentic AI systems; 6+ years as a full-stack or ML engineer building production backends or ML systems in Python, Go, or a similar language
- Fluency with agentic orchestration (e.g., LangGraph, PydanticAI, DSPy, LlamaIndex) and tool/function calling
- Experience integrating frontier LLMs and multimodal models via managed APIs or self-hosted serving
- Deep understanding of model serving and inference optimization (vLLM/Triton/TGI/SGLang, batching, KV cache reuse)
- Strong with API design and backend frameworks (FastAPI, Flask) and event-driven architectures
- Data systems expertise with PostgreSQL and Redis, including caching, token streaming, and throughput tuning
- Retrieval and memory: vector databases (pgvector, Pinecone, Weaviate, Milvus), hybrid search, and graph/knowledge storage
- Production evals: LLM-as-judge, human-in-the-loop, rubric design, and CI-integrated regression tests
- Observability and SRE: OpenTelemetry traces, metrics, structured logs, SLOs, dashboards, and on-call triage
- Cloud-native delivery: Kubernetes, Terraform, Docker, GPU scheduling/autoscaling on AWS or GCP
- CI/CD proficiency with GitHub Actions and test automation for prompts, tools, and agents
- Clear, concise communication and high ownership in fast-paced environments
- Real-time multimodal systems: streaming ASR, low-latency TTS, WebRTC, and vision pipelines
- Post-training/fine-tuning: DPO/ORPO, RLHF, preference data generation, and safety alignment
- RAG expertise beyond basics: Graph RAG, multi-hop retrieval, rerankers, query planning, and freshness policies
- Safety and governance: policy-as-code, red-teaming, PII handling, audit logs, and role-based tool authorization
- Regulated data experience (HIPAA, SOC 2, GDPR) and data residency controls
- Personalization at inference time, long-term memory agents, session state, and episodic memory stores
- Experience with consumer-scale AI apps, high-traffic systems, or on-device/edge acceleration (WebGPU)
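As a concrete sketch of the CI-integrated eval regression tests mentioned above: a minimal offline golden-set gate, assuming exact-match scoring for simplicity (a real pipeline would use rubric-based or LLM-as-judge scoring). All names, data, and the threshold here are hypothetical.

```python
# Offline golden set: fixed inputs with expected outputs (illustrative data).
GOLDEN_SET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_model(prompt: str) -> str:
    # Stand-in for a real model or agent call.
    return {"2+2": "4", "capital of France": "Paris"}[prompt]

def regression_score(golden: list, model) -> float:
    """Fraction of golden cases the model answers exactly right."""
    hits = sum(model(case["input"]) == case["expected"] for case in golden)
    return hits / len(golden)

# CI gate: fail the build if accuracy regresses below the threshold.
PASS_THRESHOLD = 0.9
score = regression_score(GOLDEN_SET, run_model)
assert score >= PASS_THRESHOLD, f"eval regression: {score:.2f} < {PASS_THRESHOLD}"
```

Run as a normal test in GitHub Actions, this makes prompt or tool changes fail CI when they regress the golden set, which is what "iterate quickly behind evaluation gates" amounts to in practice.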