Seer is a research-driven AI company focused on building scalable intelligent systems capable of robust operation in dynamic environments. The company is hiring Senior and Staff-level Data Infrastructure Machine Learning Engineers to scale the systems powering its ML training data platform. The role focuses on building and optimizing high-throughput data infrastructure and large-scale indexing and retrieval systems.
Responsibilities:
- Architect, build, and operate distributed data infrastructure capable of processing and managing billions of video and multimodal data samples
- Design systems with strong guarantees around reliability, latency, scalability, and cost efficiency
- Optimize cloud object storage, metadata systems, databases, and large-scale distributed storage architectures
- Build efficient indexing and retrieval systems to support rapid dataset querying, filtering, and iteration
- Improve data access patterns and retrieval performance for research and production ML workflows
- Design scalable metadata and search infrastructure for multimodal datasets
- Develop monitoring, alerting, failure recovery, and performance optimization frameworks for large-scale data pipelines
- Build tooling to identify bottlenecks and improve operational visibility across distributed systems
- Optimize workload balancing and throughput across distributed compute and storage infrastructure
- Build systems for artifact management, dataset versioning, lineage tracking, and reproducibility across training workflows
- Ensure traceability and consistency across evolving datasets and training runs
- Develop lightweight internal tooling enabling engineers and researchers to explore and analyze data at scale
- Integrate and scale vision-language model (VLM) inference within distributed data pipelines
- Support automated enrichment, filtering, metadata generation, and preprocessing workflows
- Collaborate closely with ML systems and research teams to improve data quality and training velocity
Requirements:
- 5+ years of experience in data infrastructure, distributed systems, ML infrastructure, or related fields
- Strong experience building and operating large-scale distributed data pipelines
- Deep understanding of distributed systems architecture, databases and metadata systems, indexing and retrieval strategies, and cloud storage architectures
- Experience optimizing throughput, workload balancing, and cost-performance tradeoffs in cloud environments
- Hands-on experience with distributed processing frameworks such as Ray or Spark
- Strong experience with observability, monitoring, and production reliability
- Strong software engineering fundamentals with the ability to own systems end-to-end
- Experience managing large multimodal datasets
- Familiarity with ML training workflows and data lifecycle management
- Experience running large-scale ML inference workloads in distributed or cloud environments
- Familiarity with vision-language models (VLMs)
- Experience working with real-world sensor data such as video, telemetry, or time-series streams
- Familiarity with data warehouse technologies such as Snowflake, BigQuery, or Redshift
- Experience with data versioning and lineage systems such as DVC, Delta Lake, or similar tooling