Responsibilities
Design Agentic Workflows: Design and implement LLM-based systems that go beyond response generation, enabling structured tool usage, workflow orchestration, and secure interaction with internal services via MCP (Model Context Protocol).
Build and Optimize RAG & CAG: Develop high-performance Retrieval-Augmented Generation and Context-Augmented Generation pipelines to ensure accurate, relevant, and low-latency responses. Continuously improve context management, ranking strategies, and grounding mechanisms to support complex, multi-step interactions.
Voice Channel Mastery: Develop and optimize real-time Speech-to-Speech (S2S) pipelines, focusing on streaming architectures, latency reduction (including Time to First Word, TTFW), and maintaining a natural conversational flow.
Evaluation, Quality & Alignment: Build and maintain an automated QA module, including LLM-as-a-judge patterns, to measure accuracy, safety, latency, and resolution quality at scale. Translate evaluation insights into systematic model and prompt improvements.
Model Strategy & Hybrid Integration: Integrate and operate both commercial foundation models (e.g., OpenAI, Anthropic, Google) and open-source alternatives (e.g., Qwen, Kimi, DeepSeek, Moonshot, GLM), selecting and optimizing models based on performance, latency, cost, and use-case requirements.
Requirements
Strong Python and/or Java Engineering Skills: Advanced-level development experience, including asynchronous programming (e.g., FastAPI, asyncio) and building high-performance, production-grade services. Experience with streaming architectures is a strong advantage.
LLM Application & Multi-Agent Orchestration Experience: Hands-on experience building LLM-powered systems, including multi-step workflows, stateful agents, and tool invocation. Familiarity with orchestration frameworks such as LangChain, LlamaIndex, or LangGraph, particularly in building stateful, multi-turn agents.
Advanced Retrieval & Context Management: Deep understanding of vector databases (e.g., Weaviate, Qdrant, pgvector, Elasticsearch), semantic search, embedding strategies, and re-ranking techniques. Experience designing and optimizing RAG pipelines.
Real-Time & Low-Latency Systems: Experience in designing systems that operate under latency constraints, including streaming APIs, event-driven architectures, and performance optimization. Understanding of trade-offs between quality, cost, and response time.
Evaluation-Driven Development: Experience in implementing evaluation frameworks for LLM-based systems, including automated QA pipelines and LLM-as-a-judge patterns.
Familiarity with API Design: Knowledge of RESTful API design and OAuth2.
Tech Stack
Elasticsearch
Java
Python
Benefits
Access to local and international training, development, and growth opportunities, including e-learning platforms covering both technical and soft skills;
Modern technologies, product responsibility;
Flexible work schedule;
Hybrid work option;
Medical services package from one of two private providers;
25 vacation days per year;
Substitute days off for public holidays that occur on the weekend;
Meal tickets;
Internal referral program;
Team events, networking events organized to promote a passionate, creative and diverse culture;
Summerfest and Winterfest parties;
Of course, coffee, soft drinks and fresh fruits are on us in the office.