Camunda is the leader in enterprise agentic automation, orchestrating complex business processes across agents, people, and systems. As a Senior Cloud Infrastructure Engineer, you will design, maintain, and improve our Kubernetes-based infrastructure and multi-cloud platform, ensuring operational excellence and supporting a world-class SaaS experience for customers.
Responsibilities:
- Architect & Maintain Our Platform: Design, build, and maintain our Kubernetes-based infrastructure and multi-cloud platform, focusing on availability, scalability, and fault tolerance. You will help expand Camunda SaaS capabilities through upcoming projects such as:
  - Making our service available as a multi-region offering
  - Expanding the availability of our service to new regions and cloud providers
- Champion Observability: Implement and enhance our monitoring tools to provide clear visibility into the health and performance of our entire stack, for both SREs and developers. You will play a key part in evolving Camunda's monitoring and observability practice to support a multi-cloud, multi-region product
- Collaborate & Innovate: Work closely with cross-functional teams (development, product, etc.) to define, improve, and efficiently ship new features. Bring your experience to bear on how we can innovate and automate our processes further. You will be directly involved in developing new capabilities for Camunda SaaS
- Be a Trusted Resource: Provide third-level support for critical incidents and participate in our on-call rotation, ensuring rapid response and resolution. You will directly assist our customers and partners, helping deliver a world-class SaaS experience
- Drive Automation: Identify opportunities to automate manual tasks and improve operational efficiency across the platform. You will help Camunda:
  - Continue to scale operations with automation
  - Evolve its operational strategy to elevate Camunda as a world-class SaaS provider
Requirements:
- 5+ years of experience in Site Reliability Engineering (SRE) or a similar role, with a strong focus on cloud infrastructure
- Deep understanding and practical experience with Kubernetes and containerization technologies (Docker, etc.)
- Proficiency in at least one programming or scripting language (such as Python or Go) for automation and tooling development
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, Datadog, New Relic – or similar)
- Experience working in a multi-cloud environment (AWS, Azure, GCP)
- Familiarity with Infrastructure as Code (IaC) tools like Terraform or CloudFormation