Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. As a GenAI Systems Engineer, you'll architect robust frameworks and optimize advanced inference processes to enhance the Modular platform for AI model deployment.
Responsibilities:
- Leverage a broad understanding of available libraries and concurrency techniques to inform high-impact architecture decisions
- Identify and implement architecture-level optimizations in complex distributed systems
- Architect and implement building blocks and APIs to accelerate the development of advanced distributed optimizations
- Lead cross-functional projects spanning multiple teams and multiple layers of a deep tech stack
- Build beautiful abstractions to seamlessly weave async RESTful layers with intensive data processing layers
- Collaborate with the cloud inference team to maximize flexibility in scalable cluster deployments
- Develop extensible customization interfaces to support open source community models and features
- Develop detailed and intuitive metrics, logging, and profiling tools
Requirements:
- Expert-level Python programming with deep understanding of asyncio and event loops
- 5+ years of systems programming experience with focus on performance and concurrency
- Hands-on experience with robust low-latency applications running production workloads
- Extensive experience with software architecture, interface design, and collaboration
- Deep understanding of the fundamentals of profiling, benchmarking, and performance optimization
- Creativity and curiosity for learning and solving complex distributed systems problems
- Experience working inside high-performance ML inference systems (e.g., vLLM, SGLang)
- Experience with Kubernetes, containers, microservices, and cloud-native architectures
- Experience with graph-based (e.g., dataflow, actors) programming models and runtimes
- Experience with distributed runtimes such as Ray, Open MPI, Dask, or Spark