Datadog is the leading observability and security platform for the AI era, providing businesses with unified visibility across the technology stack. As a Staff Engineer, you will lead the development of new features within Datadog’s LLM Observability product, shaping product direction and driving experimentation to solve complex problems in AI systems.
Responsibilities:
- Drive design and implementation of LLM observability features
- Ideate, prototype, and scale new product features to provide insights and drive improvements for generative AI systems
- Work cross-functionally with other engineering teams, product, UX, and applied science to iterate quickly and find product-market fit
- Develop and extend tools for tracing, evaluating, and debugging LLMs
- Influence architecture decisions and mentor engineers to build resilient, high-performance systems
- Stay close to customer pain points and use those insights to guide product and engineering priorities
- Stay current with industry trends and advancements in machine learning and observability, driving innovation within the team
Requirements:
- You have a BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent experience
- Deep understanding of distributed systems and scalable backend architectures
- Hands-on experience building and shipping LLM-powered or GenAI applications
- Understanding of model internals, inference pipelines, evaluation techniques, and prompt engineering
- Ability to thrive in ambiguous, fast-changing spaces, combined with a product-oriented mindset
- You're excited to shape the next generation of AI observability tools from the ground up
- You communicate clearly, think rigorously, and take pride in clean, maintainable code
- Experience with observability tools/platforms