Site Reliability Engineer (SRE) – Site Reliability Engineering
Gauge
Brazil
Full Time
2 hours ago
No H1B
Key skills
Azure
Cloud
Grafana
Kubernetes
Prometheus
Python
Terraform
Bash
PowerShell
Analytics
Databricks
Datadog
New Relic
Dynatrace
CI/CD
Mentoring
Communication
Cloud Security
About this role
Role Overview
Implement and evolve observability and monitoring strategies using Dynatrace
Configure metrics, dashboards, alerts, and distributed tracing
Manage and respond to incidents, conducting root cause analyses
Define and track reliability indicators such as SLIs, SLOs, and SLAs
Automate operational routines and provisioning processes
Drive continuous improvement of performance, availability, and resilience
Collaborate with development teams to raise reliability standards
Implement high-availability, scalability, and disaster recovery practices
Support continuous integration and delivery pipelines
Monitor capacity and optimize costs in Azure environments
Requirements
Previous experience in SRE, DevOps, or cloud operations
Strong knowledge of Microsoft Azure
Solid experience with observability tools (Dynatrace, Datadog, New Relic, Prometheus/Grafana, or similar)
Experience with Dynatrace is a plus
Experience with containers and orchestration (Kubernetes)
Knowledge of automation and scripting (Python, Bash, or PowerShell)
Experience with CI/CD pipelines
Experience with infrastructure as code
Analytical skills for troubleshooting distributed systems
Strong experience in workload monitoring
Reliability-oriented mindset focused on failure prevention
Strong ability to diagnose and resolve complex problems
Clear communication with technical teams and stakeholders
Proactive identification of operational risks
Organized and disciplined incident management
Experience with data environments or analytics platforms (plus)
Knowledge of cloud security and compliance (plus)
Experience with Terraform, Bicep, or ARM (plus)
Experience in high-availability and mission-critical environments (plus)
Azure or observability certifications (plus)
Experience operating and monitoring workloads on Databricks (jobs, clusters, performance, and cost) (plus)
Tech Stack
Azure
Cloud
Grafana
Kubernetes
Prometheus
Python
Terraform
Benefits
Meal allowance / Food voucher
Health insurance
Dental insurance
Day off
Gympass / Totalpass
Childcare allowance
Pet assistance
Fuel allowance
Home office allowance
Educational reimbursement
Free online health platform
E-learning – Stefanini Academy with a variety of courses
Mentoring – Mentorship platform (opportunity to network, develop skills, and share experiences)
Discounts for undergraduate, postgraduate, language, and other courses
Perks and discounts at partner establishments