Atlan is a pioneering company focused on transforming data chaos into clarity through its active metadata platform. The Staff+ Engineer will design and build platform services that power app lifecycle, distribution, and runtime, while ensuring enterprise-grade reliability and scalability.
Responsibilities:
- Design, build, and own platform services that power app lifecycle, distribution, and runtime at Atlan
- Architect systems that support enterprise-grade reliability, scalability, and multi-tenancy
- Lead complex initiatives end-to-end — from design and RFCs to rollout and long-term ownership
- Build and evolve cloud-native infrastructure on Kubernetes across AWS/GCP
- Solve hard distributed systems problems: noisy-neighbor isolation, tenant-aware scaling, fault tolerance, and observability
- Partner closely with teams across App Runtime, Distribution, Builder Experience, Product, and Infra to make aligned platform decisions
- Act as a technical leader and multiplier — reviewing designs, mentoring engineers, and setting platform best practices
- Write production-grade code, leveraging AI-assisted development tools (Claude, Cursor) as a force multiplier
- Contribute to architectural documentation, RFCs, and long-term platform strategy
Requirements:
- 8+ years of experience building and operating large-scale backend or platform systems
- Deep expertise in Kubernetes, containerization, and orchestration in production environments
- Strong hands-on experience with cloud infrastructure (AWS, GCP, or Azure), including networking, compute, storage, and security
- Proven experience running systems at scale: high-traffic, distributed, or mission-critical platforms
- Excellent understanding of platform engineering principles: reliability, automation, self-service, and developer experience
- Experience designing systems with high availability, fault tolerance, and deep observability
- Demonstrated AI-native mindset: you design platforms assuming non-deterministic workloads; you think in terms of automation, leverage, and feedback loops; and you've supported or built infrastructure for AI/ML or AI-powered systems
- Hands-on experience using AI in your own development workflow to increase speed, quality, and leverage
- Ability to operate at Staff+ level: owning ambiguous problem spaces, driving architecture, and influencing across teams