Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security. The Staff AI Systems Engineer will support the development of Night Shift, an AI research assistant, and will be responsible for building the system architecture and AI evaluation framework to improve lead accuracy for law enforcement officers.
Responsibilities:
- Immerse yourself in the current system design and agent/tooling landscape
- Understand the core customer use cases and data flows
- Support the team by shipping a few quick wins (e.g., refining tool APIs, prompt engineering, fixing bugs)
- Stand up the foundational eval and observability scaffolding (datasets, metrics, KPIs, reporting)
- Propose a technical architecture and implementation plan for an agent evaluation framework
- Deliver the MVP evaluation harness to produce initial metrics, enable debugging, and support regression testing (a minimal sketch follows this list)
- Take on a system feature and demonstrate measurable improvement against your MVP evaluation suite
- Productionize the evaluation and observability platform and make it the source of truth for quality and safety (e.g. online/offline tracing, alerting, dashboards, evaluations, and a PR-gated regression suite)
- Own the roadmap for evolving the agent evaluation platform
- Lead deeper R&D threads (e.g., lightweight fine-tuned projection layers, specialized embeddings, multimodal understanding) that can improve system performance on core metrics
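For illustration, here is a minimal sketch of what the MVP evaluation harness could look like: an offline runner that replays a golden dataset through the agent, scores each output, and fails CI when accuracy regresses below a pinned baseline. Everything here (the `run_eval` helpers, `golden.jsonl`, the 0.90 baseline) is a hypothetical placeholder, not an existing Flock Safety API.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str      # input the agent receives
    expected: str   # reference answer used for scoring

@dataclass
class EvalResult:
    case: EvalCase
    output: str
    score: float    # 1.0 = pass, 0.0 = fail

def load_dataset(path: str) -> list[EvalCase]:
    """Golden dataset: one JSON object per line, e.g. {"query": ..., "expected": ...}."""
    with open(path) as f:
        return [EvalCase(**json.loads(line)) for line in f]

def run_eval(agent: Callable[[str], str],
             scorer: Callable[[str, str], float],
             cases: list[EvalCase]) -> list[EvalResult]:
    """Replay every case through the agent and score its output."""
    results = []
    for case in cases:
        output = agent(case.query)
        results.append(EvalResult(case, output, scorer(output, case.expected)))
    return results

def accuracy(results: list[EvalResult]) -> float:
    return sum(r.score for r in results) / max(len(results), 1)

if __name__ == "__main__":
    # Stand-ins for the real agent and scorer.
    agent = lambda q: q.upper()
    scorer = lambda out, exp: float(exp.lower() in out.lower())

    acc = accuracy(run_eval(agent, scorer, load_dataset("golden.jsonl")))
    print(f"accuracy={acc:.3f}")

    BASELINE = 0.90  # pinned baseline; a PR-gated CI job fails the build on regression
    assert acc >= BASELINE, f"regression: {acc:.3f} < {BASELINE}"
```

The same runner can back the PR-gated regression suite mentioned above: pin the baseline in version control and fail the build whenever a tracked metric drops below it.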
Requirements:
- Familiarity with Agentic Systems: Hands-on experience with LLM agents, including:
  - LLM API use (e.g. LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)
  - Agent Design: tool use (e.g. via MCP), retrieval, memory, grounding/attribution for claims, and guardrails
  - Architectural patterns: planning and hand-off for multi-agent systems, context management
  - RAG: vector/hybrid search (e.g. pgvector, turbopuffer, rerankers)
- ML Platform expertise: 5+ years building and shipping ML systems to production; experience in the following areas:
  - Backend Python and JS familiarity required; TypeScript/Golang familiarity welcome
  - Web services (e.g. Express/FastAPI, REST, SSE, JWTs)
  - Cloud Infrastructure (e.g. AWS, Terraform, VPC, Networking)
  - Backend databases/stores (e.g. Postgres, Redis)
  - Observability (e.g. Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse)
  - Durable execution (e.g. Temporal, Hatchet)
  - OLAP (e.g. ClickHouse, BigQuery)
  - ML Inference (e.g. PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains (text/image/video)
  - Compute orchestration (e.g. Kubernetes, Prefect, Ray)
- Experience with LLM Evaluations at scale: You've built offline/online eval harnesses and are familiar with the methodologies and metrics to measure:
  - Search, retrieval, and recommendation performance (see the retrieval-metrics sketch after this list)
  - Agentic task success, trajectory quality, and preference learning (e.g. SFT, DPO, RLHF, LLM-as-judge; see the judge sketch after this list)
  - Safety & robustness (security, compliance, red-teaming, regression testing)
  - Cost, performance, and latency trade-offs
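As a concrete reference for the search/retrieval metrics item above, here is a self-contained sketch of two standard measures, recall@k and mean reciprocal rank (MRR), computed over ranked result lists; the per-query data is fabricated for illustration.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)

def mrr(ranked: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0.0 if none retrieved)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Toy example: per-query ranked IDs from the retriever vs. labeled relevant IDs.
queries = [
    (["d3", "d1", "d9"], {"d1"}),
    (["d7", "d2", "d5"], {"d5", "d8"}),
]
print(sum(recall_at_k(r, rel, k=3) for r, rel in queries) / len(queries))  # mean recall@3
print(sum(mrr(r, rel) for r, rel in queries) / len(queries))               # mean MRR
```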
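And for the LLM-as-judge item, a hedged sketch of the pattern using the OpenAI Python SDK (any provider with structured output works); the rubric, judge model, and JSON shape are illustrative assumptions rather than a prescribed setup.

```python
import json
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI research assistant's answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Return JSON: {{"score": <integer 1-5>, "reason": "<one sentence>"}}.
Score 5 = fully correct and grounded; 1 = wrong or unsupported."""

def judge(question: str, reference: str, candidate: str) -> dict:
    """Ask a stronger model to grade a candidate answer against a reference."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # judge model is an illustrative choice
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

# Example call (fabricated data):
# judge("Which plate was seen at 5th & Main?", "ABC-1234 at 14:02", "ABC-1234 around 2pm")
# -> {"score": 5, "reason": "..."}
```

In practice, judge scores would themselves be validated against human labels before any release is gated on them.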