Cartesia is a pioneering company building the next generation of AI with interactive intelligence. It is seeking an Inference Engineer to design and build low-latency, scalable, and reliable model inference stacks for its foundation models, working closely with the research and product teams.
Responsibilities:
- Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models, including Transformers, state space models (SSMs), and hybrid models
- Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner
- Design and build robust inference infrastructure and monitoring for our products
- Have significant autonomy to shape our products and directly influence how cutting-edge AI is applied across devices and applications