Clarity AI is a global tech company focused on leveraging AI and machine learning technologies for societal impact. The Senior GenAI Platform Staff Engineer will be responsible for designing and developing a core platform for deploying and managing large language models and agentic systems, ensuring high reliability and performance while bridging the gap between experimentation and production at scale.
Responsibilities:
- GenAI Platform Engineering: Designing and developing the core platform that enables the efficient deployment, scaling, and management of LLMs and multi-agent systems
- Infrastructure for Agents: Building specialized infrastructure to support long-running agentic workflows, including state management, tool-calling interfaces, and complex reasoning loops
- High-Scale Productionization & Model Serving: Scaling LLM inference to handle global demand while optimizing for latency, throughput, and cost, and implementing standard batch and online serving with controlled rollback
- Build & Delivery: Establishing the 'Golden Path' for model deployment: a self-service route that moves code, data, and models to production safely and reproducibly, including automated evaluation frameworks, safety guardrails, and CI/CD/CT pipelines
- Strategic Vision & Product Management: Continuously monitoring the AI ecosystem and proactively evolving our platform to maintain a competitive edge. This includes adopting best practices in Platform Product Management and driving the adoption of golden-path solutions
- End-to-End Observability: Implementing deep observability for LLMs, going beyond system health to provide unified visibility into health, impact, and root cause across data, ML, and GenAI (including model hallucinations, token usage, and RAG performance)
- Collaborative Foundation: Providing the tools and abstractions that allow Data Scientists and stakeholders to move from a 'tuned model' to a 'production service' with zero friction
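To make the agent-infrastructure responsibility concrete, the loop described above (state management, tool-calling interfaces, and bounded reasoning loops) can be sketched minimally in Python. This is an illustrative sketch only: the tool registry, the `stub_model` decision function, and all names are hypothetical stand-ins, not part of any Clarity AI system.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry; a real platform would expose typed, sandboxed tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

@dataclass
class AgentState:
    """Minimal persisted state for a long-running agentic workflow."""
    history: list[dict] = field(default_factory=list)

def stub_model(state: AgentState) -> dict:
    """Stand-in for an LLM call: decides whether to call a tool or finish."""
    if not any(m["role"] == "tool" for m in state.history):
        return {"action": "tool", "tool": "calculator", "input": "6 * 7"}
    return {"action": "finish", "output": state.history[-1]["content"]}

def run_agent(state: AgentState, max_steps: int = 5) -> str:
    """Reasoning loop: call the model, dispatch tools, persist each step."""
    for _ in range(max_steps):
        decision = stub_model(state)
        if decision["action"] == "finish":
            return decision["output"]
        result = TOOLS[decision["tool"]](decision["input"])
        state.history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")
```

Because state lives in a serializable object rather than in the loop's local variables, the workflow can be checkpointed and resumed, which is what makes long-running agent runs tractable on shared infrastructure.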
Requirements:
- Deep, hands-on experience deploying Large Language Models and complex agentic architectures at scale
- Proven experience in implementing Prompt Lifecycle Management (versioning, testing, and deploying prompts as code), an LLM Abstraction Layer (provider-agnostic access), and systems for Cost & Usage Control (visibility and limits on GenAI spend per use case)
- Expert-level experience building automated evaluation pipelines and frameworks (e.g., Ragas, DeepEval, G-Eval) and implementing LLM-as-a-judge patterns to validate model quality, grounding, and safety in CI/CD
- A proven track record of building platforms or shared infrastructure
- Deep understanding of MLOps concepts like Model Registry (versioning, state management, and lineage) and Model Monitoring & Drift Detection
- 3+ years of experience in MLOps or high-scale Software Engineering with a focus on AI production environments
- Expert-level Python and deep experience with container orchestration (Kubernetes, Docker) and cloud infrastructure (AWS/GCP)
- Proficiency with orchestration libraries (e.g., LangChain, LlamaIndex, CrewAI), vector databases (e.g., Pinecone, Weaviate), and inference engines (e.g., vLLM, TGI)
- The ability to learn and implement new technologies in a field that changes weekly
- Strong fundamentals in API design, microservices, and 'GitOps' methodologies, including the implementation of automated security and compliance by default
- Excellent communication skills (minimum C1 level), with the ability to articulate technical vision to both engineers and leadership
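The LLM-as-a-judge pattern named in the requirements can be sketched as a CI evaluation gate. Everything here is an illustrative assumption: `stub_judge` is a toy heuristic standing in for a real judge-model call, and the threshold and case schema are invented for the example.

```python
import statistics

def stub_judge(question: str, answer: str, context: str) -> float:
    """Stand-in for a judge-LLM call; returns a groundedness score in [0, 1].
    A real judge would prompt a model to rate whether the answer is
    supported by the retrieved context; here we use a toy token overlap."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)

def evaluate(cases: list[dict], threshold: float = 0.7) -> bool:
    """Run the judge over an eval set; the CI build passes only if the
    mean groundedness score clears the threshold."""
    scores = [stub_judge(c["question"], c["answer"], c["context"])
              for c in cases]
    return statistics.mean(scores) >= threshold

cases = [
    {"question": "capital of France?",
     "answer": "paris",
     "context": "paris is the capital of france"},
]
```

Wiring a gate like this into the deployment pipeline is what turns model quality, grounding, and safety checks from a manual review step into an automated release criterion.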