Architect and evolve scalable CI/CD pipelines using GitHub Actions and Azure DevOps to support multi-service deployments.
Own and operate production infrastructure in Azure Kubernetes Service (AKS), including capacity planning, secrets management, and rollout strategies.
Implement and manage Infrastructure as Code with Terraform, promoting reuse, modularity, and collaboration across teams.
Lead observability efforts using Datadog, setting up actionable SLOs/SLIs, monitors, APM instrumentation, and dashboards for different audiences (developers, SREs, leadership).
Design secure and scalable messaging patterns using Azure Service Bus and Event Hubs, ensuring ordering, fault tolerance, and performance.
Define and enforce DevSecOps best practices across environments, including RBAC, audit logging, access controls, and automated policy enforcement.
Mentor other DevOps and engineering team members; drive incident postmortems, root cause analysis, and process improvements.
Collaborate closely with application developers, QA, product managers, and architecture teams to drive operational excellence across all stages of software delivery.
Requirements
6+ years of DevOps/SRE experience, including at least 2 years in a senior or lead capacity.
Expert-level knowledge of Git and GitHub workflows, including automation, hooks, access control, and GitOps practices.
Deep understanding of Kubernetes primitives (pods, jobs, services, config maps, volumes) and advanced concepts like init and sidecar containers, traffic draining, and fault recovery.
Strong experience building resilient CI/CD pipelines with conditional logic, approvals, and rollback mechanisms.
Proficient in managing Terraform state and modules in collaborative, multi-team environments.
Demonstrated expertise in securing and operating Azure services, including Service Bus, Event Hubs, Logic Apps, and Cognitive Search.
Hands-on experience with Datadog APM, monitors, SLOs, and logging integrations.
Track record of designing for high availability, scalability, and observability in production cloud systems.