Senior Site Reliability Engineer at BillingPlatform
BillingPlatform
Remote
Serbia
Contract
3 hours ago
Visa Sponsorship
Key skills
AWS
Cloud
Grafana
Kubernetes
Oracle
Prometheus
Python
Terraform
Go
Bash
Pulumi
Datadog
Jaeger
Istio
Linkerd
Service Mesh
SaaS
CI/CD
Communication
Collaboration
About this role
Role Overview
Own and improve on-call processes, incident response playbooks, and post-mortem culture
Define, track, and manage SLOs, SLIs, and error budgets for critical services
Lead blameless post-mortems and drive systematic reliability improvements
Respond to production incidents and coordinate cross-functional resolution
Design, build, and maintain scalable AWS infrastructure using IaC (Terraform, Pulumi)
Manage Kubernetes clusters and containerized workloads in production
Build and maintain CI/CD pipelines to improve deployment speed and reliability
Evaluate and implement tooling to enhance developer productivity and system stability
Implement monitoring, alerting, and distributed tracing (Prometheus, Grafana, Datadog, Jaeger)
Identify and resolve performance bottlenecks across services, networks, and databases
Build dashboards and runbooks for self-service operational insights
Partner with engineering teams to embed reliability practices (load testing, capacity planning, chaos engineering)
Conduct architecture reviews with a focus on reliability and operability
Requirements
5+ years of experience in SRE, DevOps, or infrastructure engineering
Deep expertise with AWS and cloud-native architectures
Strong experience with Kubernetes and container orchestration at scale
Hands-on experience with infrastructure-as-code tools (Terraform or Pulumi)
Proficiency in Python, Go, or Bash
Experience with observability tools (Prometheus, Grafana, Datadog, or similar)
Strong understanding of SLOs, SLIs, and error budgets
Experience with service mesh technologies (Istio, Linkerd)
Familiarity with chaos engineering tools (Chaos Monkey, Gremlin, LitmusChaos)
Background in Oracle database reliability and administration
Contributions to open-source infrastructure projects
Experience in a high-growth SaaS or product-led environment
Excellent English communication skills (written and spoken)
Tech Stack
AWS
Cloud
Grafana
Kubernetes
Oracle
Prometheus
Python
Terraform
Go
Benefits
A high-impact role at a growing SaaS company that values personal growth, accountability, and teamwork
A culture of open collaboration and problem-solving
100% remote
Competitive pay