Cloudera empowers people to transform complex data into clear, actionable insights. The company is seeking a Staff Software Engineer to lead the architecture and delivery of its cloud-native AI platform: running and managing open-source models with Kubernetes-native patterns and integrating AI capabilities for enterprise use.
Responsibilities:
- Design and implement elegant, scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use
- Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless-style scaling for AI workloads
- Build internal tooling, SDKs, and "AI Gateways" that enhance team agility and simplify the integration of Foundation Models (Llama, GPT) into product features
- Architect robust Retrieval-Augmented Generation (RAG) pipelines and prompt management services that integrate seamlessly with vector databases and enterprise data sources
- Partner with UI engineers, UX designers, and Product Management to ensure the AI platform is not just powerful, but highly usable for internal developers
- Ensure AI workloads are secure, multi-tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes
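The RAG responsibility above centers on one core step: retrieving the most relevant context for a query and injecting it into the prompt before it reaches the model. A minimal, dependency-free sketch of that step (the toy vectors and documents are hypothetical stand-ins for a real embedding model and vector database):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, top_k=2):
    """Rank (vector, text) pairs by similarity to the query vector."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

def build_prompt(question, contexts):
    """Inject retrieved context into a prompt template for the LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"

# Toy corpus: in production these vectors come from an embedding model
# and live in a vector database behind the platform's data sources.
corpus = [
    ([0.9, 0.1, 0.0], "Cloudera supports hybrid data platforms."),
    ([0.1, 0.8, 0.1], "Kubernetes schedules containerized workloads."),
    ([0.0, 0.2, 0.9], "LoRA adapts large models cheaply."),
]
contexts = retrieve([0.85, 0.15, 0.0], corpus, top_k=1)
prompt = build_prompt("What does Cloudera support?", contexts)
```

In a production pipeline the same shape holds; only the pieces are swapped: the embedding call, an ANN index instead of a linear scan, and prompt templates managed by a dedicated service.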
Requirements:
- Bachelor's degree with 6+ years of software engineering experience (or equivalent Master's/PhD tenure), including at least 2 years focused on AI/ML systems
- Expert proficiency in Python (for the AI ecosystem) and strong competence in a systems language like Go or Rust/C++ (for high-performance serving layers)
- Deep understanding of LLM deployment challenges and runtimes (e.g., vLLM, ONNX, TorchServe, Triton)
- Familiarity with quantization techniques (AWQ, GPTQ) to optimize model size/speed
- Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on containerized infrastructure (Docker/Kubernetes)
- Ability to navigate the rapidly changing AI landscape, filtering hype from practical engineering solutions, and driving technical alignment across teams
Nice to have:
- Model Fine-Tuning: Experience with efficient fine-tuning techniques (PEFT, LoRA/QLoRA) on custom datasets
- GPU Optimization: Familiarity with CUDA programming or profiling GPU performance (Nsight Systems)
- Open Source: Contributions to open-source AI projects (HuggingFace transformers, vLLM, etc.)
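To ground the quantization requirement above, here is a minimal pure-Python sketch of weight-only, per-group symmetric quantization, the basic idea underlying int4 schemes like AWQ and GPTQ (the group size and sample weights are illustrative; real implementations operate on tensors and use calibration data to pick scales):

```python
def quantize_group(weights, bits=4):
    """Symmetric per-group quantization: w ~ q * scale, q in [-8, 7] for int4."""
    qmax = 2 ** (bits - 1) - 1                # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from int codes plus the group scale."""
    return [qi * scale for qi in q]

# One group of 8 illustrative weights: store 8 small ints + 1 float scale
# instead of 8 full-precision floats.
weights = [0.12, -0.55, 0.30, 0.91, -0.07, 0.44, -0.88, 0.02]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The round-trip error per weight is bounded by half the group scale, which is why smaller groups (at the cost of more stored scales) recover accuracy: a trade-off candidates familiar with AWQ/GPTQ should be able to reason about.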