Hopper is on a mission to become the leading travel platform globally, and they are seeking a Senior Site Reliability Engineer to join their Cloud FinOps team. The role involves managing infrastructure in Google Cloud, automating processes, and ensuring systems remain optimized for cost efficiency and reliability.
Responsibilities:
- Work on projects that will drive higher cost efficiency, such as:
  - Reducing our network egress costs by removing unnecessary headers
  - Ensuring that our warehouse data is in use and selecting the most efficient storage for it, for example cold storage for buckets with infrequent retrieval
  - Ensuring that autoscaling for both databases and compute is well optimized
- Improve the current cost attribution so that all teams have clear visibility into their costs
- You will also support incident response and join the on-call rotation for platform incidents; each engineering team has its own on-call rotation. (The team is spread across America and Europe, so you can sleep at night!) You will also help engineers with questions and problems they run into with our infrastructure, and approve PRs that require Platform supervision
- You will be part of a small and highly efficient team of SREs
Requirements:
- Strong background in SRE, DevOps, Software Engineering, or Systems Engineering
- Troubleshooting skills
- System design with good analytical capabilities
- Good communication skills
- Knowledge of major cloud providers, preferably Google Cloud
- SQL knowledge
- Containers, Kubernetes, and related tooling like Kustomize and Helm
- Service Mesh, preferably with Istio
- Networking knowledge: DNS, TLS, certificates, ingresses, etc.
- Observability (log collection, metrics, APM, etc.), preferably with Datadog
- Security knowledge: IAM, RBAC, network security, etc.
- Knowledge of authentication and authorization technologies
- CI/CD
- Database technologies
- Competence in scripting with Bash and Python, or other scripting languages