SRE, Pleno at Grupo PRIMO | JobVerse
SRE, Pleno (Mid-Level)
Grupo PRIMO
Brazil
Full Time
5 hours ago
Key skills
AWS
Azure
Cloud
Docker
Google Cloud Platform
Grafana
Jenkins
Kubernetes
Prometheus
Python
Terraform
Go
Bash
GCP
Google Cloud
GitHub Actions
GitLab CI
Pulumi
CloudFormation
Datadog
New Relic
GitHub
GitLab
CI/CD
Communication
About this role
Role Overview
Define and implement SLIs and SLOs for critical services (latency, availability, error rate)
Establish company-wide observability standards (structured logs, distributed traces, metrics – RED/USE)
Configure dashboards and alerts in Datadog (SLO tracking, burn rate, anomaly detection)
Create and maintain runbooks for troubleshooting and incident response
Participate in blameless postmortems and ensure implementation of improvements
Enable engineering teams to adopt reliability standards (office hours, pairing, documentation)
Map and monitor costs by product, team, and environment
Identify and eliminate waste (idle resources, old snapshots, unused volumes)
Implement optimization automations (automatic shutdown, rightsizing, orphaned resource cleanup)
Configure cost anomaly alerts and budget tracking
Collaborate with teams to validate and execute optimizations
Conduct weekly office hours
Document standards, runbooks, and processes clearly and in an easy-to-use form
Pair with developers to implement standards
Collect feedback and propose continuous improvements
Present results in monthly reviews and all-hands
Requirements
Observability: structured logs, distributed traces, metrics (golden signals)
Platforms: Datadog, New Relic, Grafana/Prometheus, ELK or similar
Cloud: Strong experience in AWS, GCP or Azure
Automation: Python, Bash or Go
IaC: Terraform, CloudFormation, Pulumi or similar
CI/CD: Knowledge of pipelines (GitHub Actions, GitLab CI, Jenkins)
Containers: Docker and Kubernetes (deployments, services, ingress)
Plus
Advanced Datadog (APM, SLO Tracking, Cloud Cost Management)
Practical experience with SLO/error budgets in production
FinOps (tagging, budgets, anomaly detection, cost optimization)
DORA metrics and DevEx practices
Incident management, on-call and structured postmortems
Behavioral
End-to-end ownership and accountability
Consistent presence and proactive communication
Pragmatism and focus on incremental deliveries
Clear communication for technical and executive audiences
Enablement mindset
Continuous learning and autonomy
Tech Stack
AWS
Azure
Cloud
Docker
Google Cloud Platform
Grafana
Jenkins
Kubernetes
Prometheus
Python
Terraform
Go
Benefits
Semiannual Variable Bonus
Meal Allowance and Food Voucher, provided on the iFood flexible card
SulAmérica Health Plan
SulAmérica Dental Plan
TotalPass
Life Insurance
Commuter Allowance
Childcare Assistance
Access to Grupo Primo platforms