Delinea is a pioneer in securing human and machine identities through intelligent, centralized authorization. The Senior Site Reliability Engineer will be responsible for ensuring the availability, performance, and reliability of critical production SaaS applications hosted in Azure, while maintaining compliance within a FedRAMP High authorized environment.
Responsibilities:
- Serve as the primary point of contact for several critical production SaaS applications hosted in Azure, ensuring their availability, performance, and reliability
- Maintain and support infrastructure within a FedRAMP High authorized environment, ensuring continuous compliance with NIST 800-53 controls and participating in audit readiness activities
- Configure, monitor, troubleshoot, and resolve complex cloud infrastructure and application issues across multiple environments
- Ensure critical SLAs are met, including participation in an on-call rotation for weekends and emergencies
- Develop and maintain automation solutions for monitoring, alert mitigation, telemetry, log analysis, and incident response
- Contribute to security documentation including system security plans, standard operating procedures, and runbooks
- Apply observability best practices to proactively detect and mitigate issues using logging, metrics, tracing, and alerting tools
- Partner with engineering, security, and product teams to drive reliability improvements and ensure services are built with SRE principles from the ground up
- Lead and contribute to post-incident reviews, identify root causes, and implement preventive actions
Requirements:
- 8+ years of relevant experience in Site Reliability Engineering, DevOps, or Cloud Administration
- Strong background in integrating, upgrading, securing, and supporting software systems across heterogeneous environments
- Proven hands-on experience as a Cloud Administrator with Azure, including microservices on AKS (Azure Kubernetes Service), cloud concepts, and cloud security
- Scripting and programming experience in PowerShell and Python, plus fluency with structured data formats such as XML, JSON, and YAML
- Infrastructure-as-code expertise with Terraform and Azure DevOps pipelines
- Knowledge of redundancy, backup, and disaster recovery strategies in cloud environments
- Hands-on expertise with monitoring and observability tools such as Datadog, Azure Application Insights, and Log Analytics
- Strong understanding of networking fundamentals, including firewalls, VLANs, NAT, NACLs, load balancing, VPN tunnels, DNS, DHCP, and packet filtering
- Direct experience operating in FedRAMP environments, with working knowledge of NIST 800-53 controls, ConMon requirements, and boundary protection
- Hands-on experience with CI/CD automation (e.g., Azure DevOps pipeline creation, maintenance, and troubleshooting)
- Experience with on-call and alerting tools such as Jira Service Management or PagerDuty
- Advanced troubleshooting skills using Kibana Discover and Datadog APM to analyze logs, interpret stack traces, and diagnose problematic services
- Exposure to large-scale, geo-redundant architectures, and a strong grasp of cloud performance tuning and cost optimization