System One is a leader in delivering outsourced services and workforce solutions across North America. The Senior DevOps Engineer will be responsible for mission-critical support of applications and their infrastructure, ensuring proper monitoring, scalability, and resiliency across environments from test to production.
Responsibilities:
- Design, build, and maintain Azure landing zones and platform services (e.g., VNet, Private Endpoints, Key Vault, Azure Firewall/NSGs, Application Gateway/WAF)
- Implement Infrastructure as Code (IaC) with Terraform and/or Bicep; enforce GitOps workflows (branching, PRs, policy checks)
- Create reusable modules, pipelines, and golden patterns for app teams; champion automation-first approaches
- Define and measure SLIs/SLOs, error budgets, and reliability roadmaps for critical services
- Implement and tune observability (logs, metrics, traces) using Azure Monitor, Log Analytics, Application Insights, and Prometheus/Grafana where applicable
- Conduct capacity planning, resiliency testing (chaos, failover, DR), and performance tuning across services
- Build secure, robust CI/CD pipelines (GitHub Actions / Azure DevOps Pipelines) with automated testing, scans, and approvals
- Standardize deployment strategies (blue/green, canary, rolling) for containerized and PaaS workloads
- Manage container platforms (AKS: node pools, cluster autoscaling, HPA/VPA, ingress, network policies) and registries (ACR)
- Implement guardrails using Azure Policy, RBAC, PIM, and Blueprints (or equivalent) to enforce least privilege and compliance (e.g., SOC 2, ISO 27001, HIPAA as relevant)
- Manage secrets and certificates (Key Vault) and integrate security testing (SAST/DAST/Container scanning) into pipelines
- Support vulnerability remediation and patching SLAs
- Own incident response, including rotational shifts and on-call; lead triage, root cause analysis (RCA), and post-incident reviews
- Optimize cost (FinOps), tagging standards, budgets, and proactive spending alerts
- Maintain runbooks, knowledge base articles, and automation for routine operations
- Act as a technical mentor; review designs/PRs; contribute to architecture decisions
- Partner with app teams to onboard workloads, define nonfunctional requirements, and drive platform adoption
- Manage and/or participate in the deployment and release of development, test, and production software builds
- Manage the operations and monitoring of applications and infrastructure from dev to production
- Develop code and escalate break/fix issues as they occur
- Automate infrastructure and application provisioning tasks
- Ensure all environments meet scale and resiliency requirements
- Support customer-facing and internal applications
- Monitor submitted tickets; assign, escalate, and communicate as required
- Participate in services and software systems design
- Participate in rotating on-call support duties
Requirements:
- 8 to 10 years of hands-on experience with Azure-based infrastructure and services in production
- Bachelor's degree in computer science (or related) or equivalent work experience required
- 3+ years of experience supporting enterprise-level applications and their infrastructure
- Strong understanding of web applications and their related infrastructure technologies (load balancers, DNS, IIS or Apache/WebSphere, authentication and authorization, database connections, etc.)
- Experience implementing or managing Application Monitoring, Splunk, or Dynatrace
- Working knowledge of automation technologies
- Familiarity with Agile/Scrum methodologies
- Deep expertise in several of: AKS, App Services, Functions, APIM, Azure SQL/MI, Cosmos DB, Storage, Event Hub/Service Bus, Redis, VNet/Peering, Private Link, Application Gateway/WAF, Front Door
- Strong IaC with Terraform (preferred) and/or Bicep; Git-based workflows; GitHub or Azure DevOps
- Proven SRE background: SLI/SLO design, error budgets, incident management, RCA, capacity and performance engineering
- CI/CD design and operations (GitHub Actions / Azure DevOps Pipelines); artifact/versioning strategies; release governance
- Observability with Azure: Monitor, Log Analytics, Application Insights, and alerting/automations (Action Groups, Logic Apps, Functions)
- Solid networking fundamentals (DNS, TLS, routing, firewalls, load balancing), identity (AAD/Entra ID), and secrets management (Key Vault)
- Scripting proficiency in PowerShell and/or Python; Linux fundamentals
- Understanding of security best practices: RBAC, PIM, Azure Policy, managed identities
- Good written, verbal, interpersonal, and presentation skills; ability to communicate with both technical and non-technical employees; process-oriented mindset
- Demonstrates a customer-driven approach and good relationship management skills
- Ability to work autonomously and under deadlines
- Ability to multi-task, be highly organized, and work independently
- Ability to identify areas of improvement and come up with creative solutions
- Working knowledge of mainframe architecture is a plus
- At least 3 years of coding/scripting in Java, JavaScript, and HTML