Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. The Cloud Inference Engineer will focus on building end-to-end distributed LLM inference deployments, making inference fast and scalable while providing a platform for enterprises and developers.
Responsibilities:
- Build and ship an LLM-focused inference platform using best-in-class inference techniques (disaggregated inference, multi-node deployment of large models, high-performance networking, distributed KV-cache management, high-throughput batch processing, etc.)
- Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more
- Collaborate with our kernel and GenAI teams to achieve state-of-the-art application performance by integrating kernel and serving optimizations with cluster-level optimizations
- Build Helm charts, Kubernetes operators, and more to create simple, effective, maintainable deployments
Requirements:
- 5+ years of experience working in backend engineering
- Experience with Kubernetes and operating your own services
- Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
- Experience in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and strong alignment with our core company values
- Experience with high-performance computing and networking
- Experience working on high-scale ML inference infrastructure (traditional ML or GenAI)
- Familiarity with Go (Golang)