Tubi is a free streaming service that entertains over 100 million monthly active users. As a Staff Software Engineer on the ML Infrastructure team, you will collaborate closely with the Machine Learning and Product teams to build world-class machine learning inference platforms that power essential services like personalized recommendations and search.
Responsibilities:
- Design and build scalable, high-throughput, low-latency distributed systems using Scala
- Build reusable components and services that serve various ML applications like Personalization, Search, Ads and Exploration
- Partner closely with ML engineers to understand their challenges and limitations and develop scalable solutions to address them. Proactively recommend solutions to keep our ML Inference stack state of the art
- Take a data-driven approach to identifying and optimizing the latency, cost, and efficiency of our infra. Lead large-scale, cross-functional refactorings when necessary
- Mentor other engineers on the team on system design, effective incident management, interviewing, leveraging LLMs for work, etc
- Collaborate with ML, Product, and cross-functional engineering teams to define the long-term vision and architecture for ML Infrastructure at Tubi
Requirements:
- 8+ years of experience designing and building scalable, distributed systems in any modern backend language (e.g., Scala, Java, Python, Go, C++); experience with Scala or another JVM-based language is a plus
- Strong experience with AWS or an equivalent cloud platform
- Experience building online microservices at scale with low-latency serving
- Experience with both SQL (e.g. Postgres) and NoSQL databases (e.g. Cassandra), message brokers (e.g. Kafka), and caches (e.g. Redis)
- Experience with containerization technologies, such as Docker or Kubernetes
- Experience leading the response and resolution efforts for multiple major, large-scale incidents
- Familiarity with machine learning infrastructure such as inference engines (e.g. TorchServe, Triton, vLLM), vector stores (e.g. LanceDB, FAISS), feature stores (e.g. Feast), ElastiCache, model training orchestration, etc
- Understanding of ML model training pipelines and model internals. Experience with Recommender Systems, Search, Autocomplete and Ads ML is a plus
- Previous experience with Akka, Erlang, Elixir or Go
- Proficiency in data-driven analysis of complex A/B testing results