Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. The Cloud Inference Engineer will focus on building end-to-end distributed LLM inference deployments, making inference fast and scalable while providing a platform for enterprises and developers.
Responsibilities:
- Build and ship an LLM-focused inference platform using best-in-class inference techniques (disaggregated inference, multi-node deployment of large models, high-performance networking, distributed KV-cache management, high-throughput batch processing, etc.)
- Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more
- Collaborate with our kernel and GenAI teams to achieve state-of-the-art application performance by integrating kernel and serving optimizations with cluster-level optimizations
- Build Helm charts, Kubernetes operators, and more to create simple, effective, maintainable deployments
Requirements:
- 5+ years of experience working in backend engineering
- Experience with Kubernetes and operating your own services
- Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
- Experience in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and strong alignment with our core company values
- Experience with high-performance computing and networking
- Experience working on high-scale ML inference infrastructure (traditional ML or GenAI)
- Familiarity with Go (Golang)