Foxglove is a company that builds observability and data infrastructure for robotics and autonomous systems. They are seeking an Applied ML Engineer to design, deploy, and scale ML systems for their data platform, focusing on production ML workloads and optimizing inference pipelines.
Responsibilities:
- Deploy and operate inference infrastructure for production ML workloads, including model serving, scaling, and cost optimization
- Build and maintain vector database integrations and embedding applications to support semantic search over multimodal (image, video, point cloud, and timeseries) robotics data
- Design and implement evaluation and training infrastructure to help us iterate quickly on model performance
- Own cloud architecture decisions and tooling that affect inference latency, throughput, cost, and reliability at scale
- Collaborate with product engineers to ship application-driven ML features, not prototype experiments, for developers working at the cutting edge of robotics and physical AI
- Identify the right off-the-shelf solutions, adapt them for production, and know when to build vs. buy
Requirements:
- Strong hands-on experience in production ML infrastructure: cloud inference, model serving optimization frameworks (e.g., TorchServe, vLLM, Triton), and cost management
- Experience with the technologies used in building retrieval systems, including vector databases (e.g., Pinecone, Lance, turbopuffer, pgvector) and text-image embedding models
- Solid engineering fundamentals: distributed systems, cloud infrastructure (AWS/GCP), and production reliability
- A bias toward application and product impact over research; you're excited by shipping things that work, not writing papers
- Proven ability to operate independently, make good tradeoffs, and move fast in a high-ownership environment
- Excellent communication skills; you can explain ML tradeoffs to non-ML engineers
- Familiarity with fine-tuning and domain adaptation techniques for LLMs or embedding models (e.g., SFT, PEFT)
- Experience with data mining or hybrid search workflows, especially as applied in robotics, autonomous vehicles, or physical AI
- Experience building ML tooling, data management, and evaluation frameworks from scratch