Cloudera, a leading company in data management and cloud innovation, is seeking a Staff Backend Engineer to join its Anywhere Cloud team. The role involves architecting and improving scalable backend systems, driving performance and reliability across Kubernetes-based services, and mentoring engineers while managing project priorities.
Responsibilities:
- Architect, build, and improve scalable backend systems and APIs
- Drive performance, reliability, and security across Kubernetes-based backend services
- Implement robust testing frameworks, including unit, regression, and end-to-end tests, to guarantee deterministic and predictable behavior from our AI-powered data platform
- Establish safety guardrails and human-in-the-loop processes to maintain accuracy and ensure the production of ethical, responsible, and non-toxic outputs
- Optimize for cost & performance: Instrument, analyze, and optimize unit economics (token usage, caching, batching, distillation) and performance (p95 latency, throughput, autoscaling)
- Drive data excellence: Shape data contracts, feedback loops, labeling strategies, and feature stores to continuously improve model and workflow quality
- Mentor and multiply: Provide technical leadership across teams, unblock complex projects, raise code/design standards, and mentor senior engineers
- Partner across functions: Translate product intent into technical plans, influence roadmaps with data-driven insights, and communicate trade-offs to executives and stakeholders
Requirements:
- 6+ years of software engineering experience building large-scale distributed production systems
- Expertise in at least one primary language and its ecosystem (Go preferred; e.g., Rust), plus cloud-native architectures (containers, service mesh, queues, eventing)
- Proven expertise in advanced Kubernetes design and operation, including optimizing performance (e.g., node affinity, resource limits, horizontal pod autoscaling), service mesh implementation, and custom resource definition (CRD) development
- Experience designing reusable AI workflow primitives, SDKs, or internal platforms used by multiple product teams
- Experience building robust tracing/metrics/logging for AI systems; familiarity with quality dashboards and prompt-diff tooling
- Experience with managing machine learning workloads on container orchestration platforms like Kubernetes, including setting up GPU resources, managing distributed training jobs, and deploying models at scale
- Familiarity with the AI/ML ecosystem: You understand the fundamentals of LLMs, vector databases, RAG, and prompt engineering
- Familiarity with tools such as MLflow, LangChain, or Hugging Face is a significant advantage
- Security & privacy mindset: Familiarity with data governance, PII handling, tenant isolation, and compliance considerations