Hewlett Packard Enterprise is a global edge-to-cloud company that helps organizations connect, protect, analyze, and act on their data. They are seeking a Site Reliability Engineer to enhance their production environment for rapid scaling and outstanding performance, focusing on maintaining system uptime and reliability while collaborating with software developers.

Responsibilities:

Apply your passion for infrastructure as code and continuous deployment to build scalable and highly reliable systems
Define and own KPIs around system availability, quality and scale
Partner with our developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems
Ensure system availability and business continuity by implementing redundant servers/services
Manage after-hours infrastructure updates and maintenance
Proactively research and propose the use of new concepts, processes, technologies, and tools
Partner with software developers to create Mist standards for Microservices (APIs, schemas, serialization, data stores and best practices)
Run secure and scalable applications for highly available, multi-region AWS and GCP deployments
Ship code several times per week
Be part of our on-call rotation
Own disaster recovery and business continuity plans

Requirements:

An extensive background in developing and operating large-scale cloud-based distributed applications
Direct experience developing/running applications on AWS or Google Cloud
Laser focus and the ability to design infrastructure solutions for scalability, reliability, high availability, performance, security, software maintainability, and operational excellence
The ability to 'fix the plane while in flight' (not just support greenfield solutions)
The ability to prioritize existing technical and infrastructure debt, and the experience to build and execute a plan to pay it off
Experience delivering web-scale infrastructure for a global market at high release velocity
A deep understanding of distributed system design and dependency management
Solid experience with at least two of the following languages: Go, Java, Python
10+ years of industry experience managing infrastructure
5 years of Kubernetes administration in a large-scale SaaS environment
5 years of maintaining production systems on AWS or GCP
3 years of implementing, managing, and monitoring metrics specific to SaaS applications
3 years of using infrastructure as code software (e.g. Terraform, AWS CloudFormation, Google Cloud Deployment Manager)
5 years of experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.)
Experience with Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis, ZooKeeper, Nginx, Airflow
Experience working with or contributing directly to open source projects
Understanding of, and experience in, leading or managing technology products
An understanding of machine learning techniques and tools, and the ability to translate business requirements into data models and implement them in scalable, production-ready systems
Experience with failure-based testing
Experience working in a test-driven development environment

Site Reliability Engineer (DevOps) - Netherlands
