Cloudera empowers people to transform complex data into clear, actionable insights. The company is seeking a Staff Software Engineer to lead the architecture and delivery of its cloud-native AI platform: running and managing open-source models with Kubernetes-native patterns and integrating AI capabilities for enterprise use.
Responsibilities:
- Design and implement elegant, scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use
- Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless-style scaling for AI workloads
- Build internal tooling, SDKs, and "AI Gateways" that enhance team agility and simplify the integration of Foundation Models (Llama, GPT) into product features
- Architect robust Retrieval-Augmented Generation (RAG) pipelines and prompt management services that integrate seamlessly with vector databases and enterprise data sources
- Partner with UI engineers, UX designers, and Product Management to ensure the AI platform is not just powerful, but highly usable for internal developers
- Ensure AI workloads are secure, multi-tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes
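The RAG responsibility above centers on one core step: retrieving the most relevant context for a query and injecting it into the prompt before it reaches the model. A minimal, dependency-free sketch of that step (the toy vectors and documents are hypothetical stand-ins for a real embedding model and vector database):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, top_k=2):
    """Rank (vector, text) pairs by similarity to the query vector."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

def build_prompt(question, contexts):
    """Inject retrieved context into a prompt template for the LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"

# Toy corpus: in production these vectors come from an embedding model
# and live in a vector database behind the platform's data sources.
corpus = [
    ([0.9, 0.1, 0.0], "Cloudera supports hybrid data platforms."),
    ([0.1, 0.8, 0.1], "Kubernetes schedules containerized workloads."),
    ([0.0, 0.2, 0.9], "LoRA adapts large models cheaply."),
]
contexts = retrieve([0.85, 0.15, 0.0], corpus, top_k=1)
prompt = build_prompt("What does Cloudera support?", contexts)
```

In a production pipeline the same shape holds; only the pieces are swapped: the embedding call, an ANN index instead of a linear scan, and prompt templates managed by a dedicated service.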
Requirements:
- Bachelor's degree with 6+ years of software engineering experience (or equivalent Master's/PhD tenure), including at least 2 years focused on AI/ML systems
- Expert proficiency in Python (for the AI ecosystem) and strong competence in a systems language like Go or Rust/C++ (for high-performance serving layers)
- Deep understanding of LLM deployment challenges and runtimes (e.g., vLLM, ONNX, TorchServe, Triton)
- Familiarity with quantization techniques (AWQ, GPTQ) to optimize model size/speed
- Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on containerized infrastructure (Docker/Kubernetes)
- Ability to navigate the rapidly changing AI landscape, filtering hype from practical engineering solutions, and driving technical alignment across teams
Nice to have:
- Model Fine-Tuning: Experience with efficient fine-tuning techniques (PEFT, LoRA/QLoRA) on custom datasets
- GPU Optimization: Familiarity with CUDA programming or profiling GPU performance (Nsight Systems)
- Open Source: Contributions to open-source AI projects (HuggingFace transformers, vLLM, etc.)
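To ground the quantization requirement above, here is a minimal pure-Python sketch of weight-only, per-group symmetric quantization, the basic idea underlying int4 schemes like AWQ and GPTQ (the group size and sample weights are illustrative; real implementations operate on tensors and use calibration data to pick scales):

```python
def quantize_group(weights, bits=4):
    """Symmetric per-group quantization: w ~ q * scale, q in [-8, 7] for int4."""
    qmax = 2 ** (bits - 1) - 1                # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from int codes plus the group scale."""
    return [qi * scale for qi in q]

# One group of 8 illustrative weights: store 8 small ints + 1 float scale
# instead of 8 full-precision floats.
weights = [0.12, -0.55, 0.30, 0.91, -0.07, 0.44, -0.88, 0.02]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The round-trip error per weight is bounded by half the group scale, which is why smaller groups (at the cost of more stored scales) recover accuracy: a trade-off candidates familiar with AWQ/GPTQ should be able to reason about.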