Owning the design and implementation of AI-driven customer care systems and autonomous multi-agent orchestration workflows.
Designing, developing, and scaling state-of-the-art cyclic graph agent networks and multi-agent systems using frameworks like LangGraph, CrewAI, or AutoGen.
Optimizing LLM and agent execution with advanced techniques such as quantization, pruning, batching, token streaming, and semantic caching to ensure ultra-low latency.
Owning the alignment of solution dependencies and service contracts with other teams.
Designing, developing, and scaling real-time Retrieval-Augmented Generation (RAG) pipelines integrating state-of-the-art open-source LLMs (Llama 3, Mistral, Falcon, or similar).
Implementing scalable, high-performance vector search (Qdrant, Weaviate, Milvus) for robust knowledge retrieval and semantic search.
Understanding techniques such as quantization, pruning, distillation, batching, and caching for optimizing LLM inference and minimizing response times.
Developing and exposing secure, performant APIs via FastAPI, gRPC, or similar, containerized (Docker), orchestrated (Kubernetes), and fully integrated into automated CI/CD pipelines.
Embedding comprehensive monitoring and evaluation (e.g. MRR, Recall@k, NDCG, Faithfulness, latency metrics) and implementing automated regression testing for continuous improvement.
Championing and enforcing best practices for data security, compliance (GDPR; Saudi PDPL is a plus), and responsible AI, including PII redaction and end-to-end encryption.
Demonstrating mastery of foundational software engineering: writing clean, maintainable, and testable code; designing robust, modular, and scalable systems; leveraging version control; and implementing comprehensive continuous integration, automated testing, and deployment practices.
Leading rigorous design and code reviews, mentoring engineers, and fostering an innovative engineering culture grounded in clean architecture, SOLID principles, and proactive best practices to ensure system reliability, security, and agility.
Requirements
Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related field
5+ years delivering production AI/NLP systems, including 2+ years as a technical lead or senior staff engineer
Proven experience owning real-time conversational AI/RAG platforms at massive scale, serving thousands of concurrent users
Expert proficiency in Java or Python with strong software engineering fundamentals and system-design capabilities
Deep knowledge of and hands-on experience with frameworks and technologies such as PyTorch, Scikit-learn, Hugging Face, LangChain, LlamaIndex, Spring AI (optional), vector databases (Pinecone, Weaviate, Milvus), and embedding models
Strong knowledge of Agentic AI design and tools, e.g. LangGraph, CrewAI, tool calling, and reasoning/thinking models
Strong knowledge of context engineering and of designing memory for RAG/chat systems (long-term, short-term, summarized, ...)
Strong expertise in low-latency inference optimization and GPU resource management
Solid experience building large-scale data ingestion and processing pipelines (Spark, Flink, Kafka, RabbitMQ)
Clear communicator capable of translating complex technical concepts into strategic business value
Expertise in red-teaming practices and machine learning security research, including developing and reinforcing robust defenses against adversarial threats
Proficiency in Arabic and English
Tech Stack
Docker
gRPC
Java
Kafka
Kubernetes
Python
PyTorch
RabbitMQ
Scikit-learn
Spark
Benefits
Competitive salary and bonus
Unifonic share scheme (we are all owners!)
30 days of holiday after your first anniversary
Your birthday off!
Spend up to 25 days per year working from anywhere in the world!