About this role

Block MB is a fast-growing AI infrastructure company that is building a cutting-edge vector database platform used for AI search, recommendation systems, and large-scale data discovery. They are looking for a Site Reliability Engineer to join their Cloud Operations team and help ensure their cloud platform remains reliable, scalable, and secure as usage continues to grow.

Responsibilities:

Operating and maintaining production cloud infrastructure at scale
Managing Kubernetes clusters, networking, and deployment pipelines
Improving monitoring, logging, and alerting systems
Leading incident response and root cause analysis
Automating operational tasks to reduce manual toil
Improving security, reliability, and performance of production systems
Working closely with platform and infrastructure teams
Participating in on-call rotations

Requirements:

5+ years of experience in DevOps / SRE / Infrastructure roles
Strong hands-on Kubernetes production experience
Solid knowledge of Linux systems and networking
Experience with AWS, GCP, or Azure
Experience with monitoring, alerting, and incident management
Familiarity with infrastructure-as-code and automation tools such as Terraform
Familiarity with observability tools: Prometheus / Grafana / Loki / OpenTelemetry
Scripting with Python, Bash, or Go
Experience working in SaaS or cloud infrastructure environments

Senior SRE Engineer - Cloud Operations
