Qlik is a leading company in data transformation and analytics, serving over 40,000 global customers. The Site Reliability Engineer will play a critical role in ensuring the reliability, security, and scalability of Qlik and Talend Cloud services, working with cloud-native technologies and collaborating with various engineering teams.
Responsibilities:
- Solve real scale challenges – Work on reliability and performance across a global cloud platform handling millions of transactions
- Engineer, not just operate – Build tooling, automation, alerts, and scalable infrastructure patterns that prevent problems before they happen
- Collaborate with highly skilled teams – Partner with Global SRE, Architecture, Platform, and Domain Engineering teams to influence how infrastructure is designed from the ground up
- Work with modern cloud-native technologies – Get hands-on with Kubernetes, IaC, observability tooling, autoscaling, secret management, and CI/CD
- Shape best practices – Help define and champion cloud optimization and reliability standards across the organization
- Grow your technical influence – Act as a go-to resource for reliability, incident management, cloud engineering, and production operations
- Continuously evolve – Stay close to emerging tools and practices, contributing to ongoing improvements in our cloud environment
- Increase reliability and availability by implementing resilient infrastructure patterns and performance optimizations
- Reduce incidents and recovery time through better observability, automation, and proactive engineering
- Strengthen scalability by designing infrastructure that adapts seamlessly to growth
- Improve cloud efficiency by driving optimization best practices across AWS and Azure environments
- Resolve complex system challenges across infrastructure, networking, applications, and distributed systems
- Participate in the on-call rotation, including first-line incident response, to maintain the availability and performance of our cloud infrastructure, and provide regular updates on project status and activities
- Elevate engineering standards by mentoring peers and embedding reliability-first thinking into development workflows
Requirements:
- Cloud engineering skills across AWS and/or Azure, including hands-on experience supporting production systems running on Kubernetes at scale
- Infrastructure as Code and microservices experience, using tools such as Terraform, Crossplane or Ansible, with a strong understanding of operating distributed systems in live environments
- Automation and engineering mindset, with proficiency in Python, Go or Bash, plus experience building and improving CI/CD pipelines and autoscaling strategies
- Observability and incident management depth, including Prometheus, Grafana, OpenTelemetry, distributed tracing, and SIEM tooling, with the ability to turn insights into reliability improvements
- Security and networking knowledge, including secret management (e.g., Vault, AWS SSM) and familiarity with infrastructure security and compliance best practices
- Cloud-native tooling experience, including Helm (managing and creating charts) and exposure to modern database and ecosystem technologies such as MongoDB
- Strong analytical thinking, with the ability to troubleshoot complex issues across infrastructure, networking, and application layers
- Curiosity and a collaborative mindset: a passion for learning and for sharing ideas and insights, plus comfort with an on-call support rotation (prior on-call experience is welcome)