Architecting, implementing, testing, and operating end-to-end RAG workflows: ingesting and normalizing documents from diverse sources
Generating and managing embeddings; indexing and querying vector databases
Retrieving relevant passages, applying reranking or fusion strategies, and feeding prompts to LLMs
Building scalable, low-latency services and APIs (Python preferred; other languages acceptable) and ensuring production-grade reliability (monitoring, tracing, alerting)
Integrating with vector databases and embedding pipelines and optimizing for latency, throughput, and cost
Designing and implementing ML Ops workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans
Developing robust data pipelines and governance around ingestion, provenance, quality checks, and access controls
Collaborating with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implement evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)
Implementing monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)
Ensuring security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)
Requirements
5+ years of professional software engineering experience designing and delivering production systems
Strong programming skills (Python required; Node.js a plus)
Deep understanding of retrieval-augmented generation and production-scale NLP systems, with practical experience building RAG pipelines
Hands-on experience with ML workflow tooling and MLOps concepts (model serving, versioning, experiments, feature stores, reproducibility)
Proficiency with cloud infrastructure and modern software practices (AWS/GCP/Azure; Docker; Kubernetes; CI/CD)
Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
Familiarity with data governance, privacy, and security best practices