Cloudera empowers people to transform complex data into actionable insights. The company is seeking a Staff Full Stack Software Engineer to lead the architecture and delivery of AI-powered workflows, collaborating across teams to deliver high-impact features at scale.
Responsibilities:
- Own the architecture: Design, evolve, and document the end-to-end AI workflow stack (prompting, retrieval, tools/function-calling, agents, orchestration, evaluation, observability, and safety) with clear interfaces, SLAs, and versioning
- Ship production systems: Build reliable, low-latency services that integrate foundation models (hosted and self-hosted) and traditional microservices
- Own end-to-end delivery of features, from the user-facing UI to the backend services
- Implement robust testing frameworks (unit, regression, and end-to-end tests) to guarantee deterministic, predictable behavior from our AI-powered data platform, and establish safety guardrails and human-in-the-loop processes to maintain accuracy and ensure ethical, responsible, and non-toxic outputs
- Optimize for cost & performance: Instrument, analyze, and optimize unit economics (token usage, caching, batching, distillation) and performance (p95 latency, throughput, autoscaling)
- Drive data excellence: Shape data contracts, feedback loops, labeling strategies, and feature stores to continuously improve model and workflow quality
- Mentor and multiply: Provide technical leadership across teams, unblock complex projects, raise code/design standards, and mentor senior engineers
- Partner across functions: Translate product intent into technical plans, influence roadmaps with data-driven insights, and communicate trade-offs to executives and stakeholders
Requirements:
- Bachelor's degree in Computer Science or equivalent, and 6+ years of experience
- Expertise in at least one primary language and its ecosystem (e.g., Python, Go, or Java; Rust preferred) and in cloud-native architectures (containers, service mesh, queues, eventing)
- Proven experience in integrating AI/ML models into user interfaces. This is more than just calling an API; you should have experience building features like AI-powered assistants, natural language interfaces (e.g., text-to-SQL), proactive suggestions, or intelligent data visualization
- Familiarity with the AI/ML ecosystem: You understand the fundamentals of LLMs, vector databases, RAG, and prompt engineering. Familiarity with tools such as MLflow, LangChain, or Hugging Face is a significant advantage
- Security & privacy mindset: Familiarity with data governance, PII handling, tenant isolation, and compliance considerations
- Platform thinking: Experience designing reusable AI workflow primitives, SDKs, or internal platforms used by multiple product teams
- Model ops: Experience with model lifecycle management, feature/embedding stores, prompt/version management, and offline/online eval systems
- Search & data infra: Experience with vector databases (e.g., Pinecone, Weaviate, pgvector), retrieval strategies, and indexing pipelines
- Observability: Experience building robust tracing/metrics/logging for AI systems; familiarity with quality dashboards and prompt diff tooling
- Cost strategy: Experience with model selection, distillation, caching layers, router policies, and autoscaling to manage spend
- Experience managing machine learning workloads on container orchestration platforms such as Kubernetes, including provisioning GPU resources, managing distributed training jobs, and deploying models at scale