Senior ML Evaluation Engineer – Autonomous Vehicles
District of Columbia, United States of America
Full Time
5 days ago
$184,000 - $356,500 USD
Visa Sponsor
Key skills
Python, PyTorch, Spark, C++, C, ML, LLM, Agentic, JAX, Prototyping
About this role
Role Overview
Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
Define evaluation-of-evaluation methodology: how do we know our learned evaluators are correct?
Build golden-set frameworks and calibration loops for learned metrics
Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives
Requirements
PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.