ETL, Python, SQL, AI, ML, NLP, GenAI, LLM, RAG, Agentic, MLOps, CI/CD, A/B Testing, Remote Work
About this role
Role Overview
Partner with Product and Engineering to identify high-impact opportunities, frame ambiguous problems, define success metrics, and choose pragmatic approaches (heuristics, statistics, ML, or GenAI).
Lead rigorous experimentation across teams: hypothesis design, metric/guardrail definition, power analysis, A/B testing (or quasi-experiments), and clear readouts that drive decisions.
Build and iterate on ML/AI capabilities that ship to production (e.g., classification, information extraction, ranking/recommendations, and GenAI components such as RAG or the agent harnesses behind our core agentic journeys), optimizing for value added, latency, and cost.
Establish best-in-class evaluation practices for both ML and LLM features: golden datasets, offline/online evaluation plans, regression suites, and monitoring that catches quality drift early.
Enable engineers to build safely and effectively with AI by coaching on prompt patterns, tool/function calling, structured outputs, guardrails, and debugging/evaluation workflows.
Design and support agentic workflows where they add real product value, with clear constraints, observability, and fallbacks.
Support the end-to-end lifecycle of deployed models and AI systems: data requirements, training/fine-tuning where relevant, validation, deployment, monitoring, incident response, and continuous improvement.
Raise org-wide leverage by creating reusable assets (evaluation harnesses, shared datasets, templates, documentation) and running enablement workshops.
Communicate insights and tradeoffs clearly to technical and non-technical stakeholders, turning analyses into decisions and measurable impact.
Champion responsible, privacy-aware AI: appropriate data handling, bias/fairness considerations where applicable, and human-in-the-loop workflows when needed.
Requirements
3+ years (or equivalent) delivering data science work that shipped to production and/or materially influenced product direction
Experience collaborating cross-functionally and communicating clearly with diverse stakeholders; ability to influence without authority
Strong Python and SQL skills, with the ability to write maintainable, production-quality code (testing, reviews, documentation)
Demonstrated mentorship/enablement—helping other engineers and teams adopt best practices and ship faster with higher quality
At least 3 of:
Strong applied statistics and experimentation skills (A/B testing, causal thinking, metric design, interpretation under uncertainty)
Proven ability to evaluate and improve models in real conditions: dataset design, error analysis, offline metrics, online measurement, monitoring, and iteration
Hands-on experience building with LLMs in product contexts, including some of: RAG/grounding, tool/function calling, structured outputs, prompt iteration, quality/cost/latency tradeoffs
Practical approach to LLM evaluation: golden sets, regression testing, human review loops, and monitoring for quality drift
Experience with modern MLOps/LLMOps practices (experiment tracking, ETL pipelines, versioning, CI/CD for ML, observability)
NLP and information extraction/classification on noisy social/content data
Experience developing and evaluating large-scale retrieval, recommendation, and search systems
Tech Stack
ETL
Python
SQL
Benefits
Competitive Salary
Remote Work Options with Hybrid Flexibility and Home Office Set-Up Stipend
Coworking Office Subscription for Collaborative Spaces
Health, Dental, and Life Insurance Coverage*
Open Vacation Policy and Flexible Holiday Schedule to Suit Your Needs
Paid Parental Leave to Support Quality Time with Your Loved Ones
Career Development, including Internal and External Training Opportunities