Python, AI, LLM, Large Language Models, Llama, RAG, LangChain, LlamaIndex, MLOps, Caching, Communication
About this role
Role Overview
Design and implement LLM-based systems for production use cases.
Participate in architectural decisions around model selection, prompt design, data flows, and system constraints.
Optimize LLM usage for latency, cost, and reliability, using techniques such as prompting strategies, caching, and system-level optimizations.
Establish and own the evaluation framework for AI systems, from defining metrics to implementing automated evaluation pipelines.
Architect, build, and rigorously evaluate RAG pipelines, with a strong focus on retrieval and generation metrics (a minimal sketch follows this list).
Implement guardrails and constraints to improve the reliability and safety of model behavior, and to mitigate known failure modes.
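For illustration only, here is a minimal sketch of the kind of RAG pipeline with response caching and a simple retrieval metric that this role involves. The names `embed_text` and `generate` are hypothetical stand-ins for an embedding model and an LLM client, not any particular library's API.

```python
"""Illustrative sketch only: a minimal RAG pipeline with an exact-match
response cache. embed_text and generate are hypothetical stand-ins for a
real embedding model and LLM client."""

import hashlib
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class RagPipeline:
    def __init__(self, embed_text: Callable[[str], list[float]],
                 generate: Callable[[str], str]):
        self.embed_text = embed_text           # hypothetical embedding model
        self.generate = generate               # hypothetical LLM call
        self.docs: list[tuple[str, list[float]]] = []
        self.cache: dict[str, str] = {}        # exact-match response cache

    def index(self, documents: list[str]) -> None:
        """Embed and store documents for retrieval."""
        for doc in documents:
            self.docs.append((doc, self.embed_text(doc)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Return the k documents most similar to the query."""
        q = self.embed_text(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

    def answer(self, query: str) -> str:
        """Generate an answer grounded in retrieved context, with caching."""
        key = hashlib.sha256(query.encode()).hexdigest()
        if key in self.cache:                  # skip repeated LLM calls
            return self.cache[key]
        context = "\n".join(self.retrieve(query))
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
        response = self.generate(prompt)
        self.cache[key] = response
        return response


def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Simple retrieval metric: fraction of relevant documents retrieved."""
    return len(relevant.intersection(retrieved)) / len(relevant) if relevant else 0.0
```

In a production system the cache key would typically also encode the model and prompt-template version, and retrieval would use an approximate-nearest-neighbor index rather than a linear scan; the sketch keeps both simple for readability.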
Requirements
Strong Python skills, with production-grade engineering standards.
Proven experience building and operating LLM systems in production, including performance and latency trade-offs.
Deep understanding of LLM failure modes (hallucinations, drift, prompt sensitivity) coupled with a rigorous, metric-driven approach to evaluating and mitigating them (see the sketch after this list).
Ability to work autonomously and make sound engineering trade-offs under real-world constraints.
Professional-level English communication skills.
Hands-on experience deploying, serving, and optimizing open-source large language models (e.g., Llama 3, Mistral, Mixtral) in production (Nice to Have).
Familiarity with LLM application frameworks and orchestration layers (e.g., LangChain, LlamaIndex, or custom implementations) (Nice to Have).
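As one illustration of the metric-driven evaluation mentioned above, a crude grounding check can flag answers whose sentences are not supported by the retrieved context. The token-overlap heuristic and the 0.6 threshold below are assumptions for the sketch, not a recommended metric; real pipelines usually combine several signals (e.g., LLM-as-judge, citation checks).

```python
"""Illustrative sketch only: a crude automated grounding check, one possible
building block of a hallucination-focused evaluation pipeline."""

import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer sentences whose words mostly appear in the context."""
    context_tokens = tokenize(context)
    sentences = [s for s in re.split(r"[.!?]\s*", answer) if s.strip()]
    if not sentences:
        return 0.0
    grounded = 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not tokens:
            continue
        overlap = len(tokens & context_tokens) / len(tokens)
        if overlap >= 0.6:        # threshold chosen arbitrarily for the sketch
            grounded += 1
    return grounded / len(sentences)


def evaluate(samples: list[dict]) -> float:
    """Average grounding score over {"answer": ..., "context": ...} records."""
    scores = [grounding_score(s["answer"], s["context"]) for s in samples]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    demo = [{"answer": "The cache reduces latency.",
             "context": "A response cache reduces latency and cost."}]
    print(f"grounding: {evaluate(demo):.2f}")
```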