Airbnb was born in 2007 and has grown to over 5 million hosts worldwide.

The Senior Staff Machine Learning Engineer will lead the technical direction for ML evaluation, focusing on the data flywheel powering CSxAI products, and will work closely with cross-functional teams to improve machine learning models and systems.
Responsibilities:
- Set technical direction and lead execution for ML evaluation and the end-to-end data flywheel powering CSxAI products
- Define how we measure quality, turn feedback into learning signals, and continuously improve models and products safely and efficiently
- Partner closely with product, engineering, design, and operations to build evaluation systems that are trusted, scalable, and actionable
- Work with large-scale structured and unstructured data; explore, experiment, build, and continuously improve machine learning models and pipelines for Airbnb product, business, and operational use cases
- Collaborate with cross-functional partners, including product managers, operations, and data scientists, to identify opportunities for business impact; understand, refine, and prioritize requirements for machine learning and drive engineering decisions
- Develop, productionize, and operate machine learning models and pipelines at scale, hands-on, for both batch and real-time use cases
- Leverage third-party and in-house machine learning tools and infrastructure to build reusable, differentiated, high-performing ML systems that enable fast model development, low-latency serving, and easy upkeep of model quality
- Define evaluation strategy and success metrics for GenAI systems, aligning offline evaluation with online business and customer experience outcomes
- Build and scale evaluation frameworks (golden sets, synthetic data, automated regressions, rubric-based grading, LLM-as-judge where appropriate) with strong controls for bias, drift, and reliability
- Design the data flywheel: instrumentation, feedback collection, data quality checks, labeling strategy, dataset versioning, and governance to support continuous improvement
- Lead cross-functional quality initiatives across product, ops, and engineering, driving clarity on what 'good' looks like and how teams act on evaluation results
- Develop and productionize pipelines for dataset creation, model monitoring, evaluation-at-scale, and continuous testing (pre-deploy and post-deploy)
- Drive technical decisions and architecture for evaluation and data infrastructure, balancing speed, rigor, cost, and safety
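To make the evaluation-framework responsibilities above concrete, here is a minimal sketch of rubric-based grading with a regression gate over a golden set. All names (`Example`, `judge`, `pass_rate`, `regression_gate`) and the sample data are hypothetical; a production system would replace the keyword-stub judge with a calibrated judge-model call and add controls for bias, drift, and reliability.

```python
# Minimal sketch: rubric-based evaluation over a golden set with a
# pre-deploy regression gate. Illustrative only; names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Example:
    prompt: str
    response: str                      # candidate model's answer
    must_mention: list = field(default_factory=list)  # rubric facts

def judge(example: Example) -> bool:
    """Stub judge: pass iff every required fact appears in the response.
    A real system would use a judge model scored against this rubric."""
    text = example.response.lower()
    return all(fact.lower() in text for fact in example.must_mention)

def pass_rate(golden_set: list) -> float:
    """Fraction of golden-set examples the candidate passes."""
    return sum(judge(ex) for ex in golden_set) / len(golden_set)

def regression_gate(candidate_rate: float, baseline_rate: float,
                    tolerance: float = 0.02) -> bool:
    """Block a deploy that regresses quality by more than `tolerance`."""
    return candidate_rate >= baseline_rate - tolerance

golden = [
    Example("What is the refund window?",
            "Refunds are issued within 24 hours.", ["24 hours"]),
    Example("How do I contact support?",
            "Use the Help Center in the app.", ["Help Center"]),
]
rate = pass_rate(golden)
```

The gate-versus-baseline pattern is what turns offline evaluation into an actionable pre-deploy check rather than a dashboard number.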
Requirements:
- PhD in Computer Science, Mathematics, Statistics, or related technical field (or equivalent practical experience)
- 10+ years building, testing, and shipping ML/AI systems end-to-end, including 2+ years with GenAI/LLM systems in production
- 5+ years leading large, ambiguous technical initiatives as a senior IC, influencing roadmap and engineering/science direction across teams
- Deep expertise in evaluation methodology (offline/online alignment, metric design, human-in-the-loop evaluation, A/B testing, power analysis, regression testing)
- Hands-on experience with GenAI systems, including orchestration, retrieval, tool calling, and memory
- Experience building data pipelines and quality systems (labeling workflows, dataset curation, versioning, monitoring, and governance)
- Solid ML fundamentals and best practices (model selection, training/serving, monitoring, reliability, and model lifecycle management)
- Experience applying ML/AI to customer support workflows (e.g., agent assist, classification/routing, resolution recommendation, QA)
- Experience building robust evaluation platforms for agent behavior validation, safety/guardrails, and continuous improvement
- Proven ability to take evaluation and data flywheel work from incubation to production, iterating quickly while maintaining scientific rigor
- Strong curiosity and ability to absorb new techniques (e.g., judge models, preference optimization, synthetic data generation) and apply them pragmatically
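As an illustration of the power-analysis expertise listed above, a back-of-envelope sample-size calculation for an online A/B test on a pass-rate metric might look like the following. The formula is the standard normal approximation for a two-proportion test; the function name and the example rates are illustrative, not taken from this posting.

```python
# Sketch: samples per arm needed to detect a pass-rate lift in an A/B test,
# using the two-proportion z-test normal approximation (stdlib only).
from math import ceil
from statistics import NormalDist

def samples_per_arm(p_baseline: float, p_treatment: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate n per arm to detect p_baseline -> p_treatment."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)            # desired statistical power
    var = (p_baseline * (1 - p_baseline)
           + p_treatment * (1 - p_treatment))
    effect = abs(p_treatment - p_baseline)
    return ceil((z_alpha + z_beta) ** 2 * var / effect ** 2)
```

For example, detecting a lift from a 50% to a 60% pass rate at the default settings needs a few hundred samples per arm, while smaller lifts or higher power drive the requirement up sharply.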