Backend for LLMs – Architect and implement scalable, low-latency APIs and services that wrap, orchestrate, and optimize LLMs for healthcare use cases.
Data & Retrieval Pipelines – Build ingestion, preprocessing, and retrieval-augmented generation (RAG) pipelines to ground LLMs in clinical and revenue-cycle data.
LLMOps & Observability – Design systems for model monitoring, evaluation, cost tracking, and guardrails to ensure reliability and responsible use.
Performance & Optimization – Engineer solutions for caching, batching, load balancing, and scaling LLM workloads across cloud and containerized environments.
Security & Compliance – Implement HIPAA-ready infrastructure, data governance, and auditability for LLM-powered applications.
Cross-Functional Collaboration – Partner with product, ML engineers, and healthcare experts to translate business workflows into robust backend systems.
Technical Leadership – Drive end-to-end delivery of LLM backend projects, establish engineering best practices, and mentor peers in LLM system design.

5+ years of backend or full-stack software engineering experience, with 3+ years working on ML/LLM-enabled applications.
Strong coding skills in Python (and ideally one statically typed language such as Go, Java, or TypeScript).
Experience with LLM frameworks and provider APIs (Hugging Face, LangChain, LlamaIndex, OpenAI and Anthropic APIs, etc.).
Deep knowledge of distributed systems, service-oriented architecture, and API design at scale.
Cloud-native expertise: AWS/GCP/Azure, Kubernetes, Docker, Terraform, etc.
Familiarity with MLOps/LLMOps practices: CI/CD for models, evaluation harnesses, monitoring, and reproducibility.
Excellent system design skills and the ability to align technical architecture with product goals.

Staff Software Engineer, Applied AI

Key skills