System Automation Corporation is hiring a Senior Site Reliability Engineer to join its platform team and help evolve the infrastructure, observability, and security posture of its Azure-based SaaS platform. The role involves collaborating with product engineers and technical leadership to build secure, scalable, and maintainable systems while fostering a strong DevOps culture.

Responsibilities:

Design and evolve Azure platform infrastructure with a focus on scalability, reliability, and growth readiness
Participate in capacity planning to support growth, peak demand, and seasonal usage patterns
Work with development teams to implement infrastructure as code (e.g., Bicep)
Troubleshoot production infrastructure issues and lead incident response efforts, including coordination, escalation, and real-time remediation across teams
Conduct post-incident reviews (postmortems) focused on root cause analysis, corrective actions, and long-term reliability improvements rather than blame
Monitor and operate production systems using Azure Monitor, Application Insights, Sentinel, and related observability tooling
Improve system reliability and performance through alerting, error monitoring, SLOs/SLAs, and analysis of performance and capacity trends
Collaborate with the security analyst to define and implement security controls across Azure resources and pipelines
Manage secrets, certificates, and identity integrations
Automate security posture checks in CI/CD pipelines
Maintain policy-as-code using Azure Blueprints or Defender for Cloud
Act as a key team member in the authorization and enablement of access to secured resources
Support SOC 2 Type II compliance through tooling, automation, and audit readiness
Respond to evidence requests and generate reports from observability and security systems
Contribute to the documentation of platform controls and best practices
Support, maintain, and own CI/CD pipelines (GitHub Actions, Azure DevOps, or equivalent)
Optimize build, test, and release flows, partnering with engineers to diagnose failures and improve deployment reliability
Define and maintain consistent environment standards across development, staging, and production to ensure deployment safety, reliability, and compliance
Partner with engineering teams to improve deployment promotion strategies, rollback mechanisms, and release safety practices

Requirements:

5+ years of experience in Site Reliability, DevOps, or Cloud Infrastructure
Strong experience with Microsoft Azure, including identity, networking, and monitoring, with demonstrated experience using and optimizing Azure platform-as-a-service technologies and an understanding of their consumption limits
Hands-on experience in DevOps and SRE
Familiarity with SOC 2 or other compliance frameworks (HIPAA, FedRAMP a plus) as well as how these are implemented and maintained in Azure
Proficient with scripting or automation (e.g., PowerShell, Bash, Python)
Strong collaboration and documentation habits
Ability to quickly identify and create necessary Azure resources/scripts in support of ongoing operations needs
Experience optimizing infrastructure for cost management
Experience with Terraform or Bicep and GitHub Actions
Exposure to low-code or microservice platforms
Demonstrated experience using AI tools to optimize work output
Azure certifications (e.g., AZ-104, AZ-400, AZ-500) a plus
