NTT DATA North America is a leading global innovator of business and technology services. They are seeking an SRE Engineer / Site Reliability Engineer Specialist to manage observability and drive incident response, ensuring reliability and performance improvements across their platforms.

Responsibilities:

Own and manage observability using New Relic (APM, infrastructure monitoring, dashboards, alerting)
Define and implement SLIs/SLOs and alerting strategies
Drive incident response, root-cause analysis (RCA), and post-mortems
Administer GitHub Enterprise (repos, branch protections, access control)
Design and maintain GitHub Actions CI/CD pipelines for Java/.NET applications
Support engineering teams with build, deployment, and pipeline reliability improvements
Contribute to code quality and security practices in CI/CD pipelines
Troubleshoot issues across application, infrastructure, and CI/CD layers
Drive continuous reliability and performance improvements
Leverage or support adoption of AI/automation in SRE workflows (alerting, incident triage, or productivity tools like Copilot)

Requirements:

8+ years of experience with SRE platforms
5+ years of hands-on experience with New Relic (or similar APM tools)
5+ years of experience with, and a strong understanding of, SRE practices (SLI/SLO, alerting, incident management)
5+ years of experience with GitHub Enterprise and GitHub Actions
5+ years of CI/CD pipeline experience for Java or .NET applications
5+ years of strong troubleshooting and root-cause analysis experience
5+ years of experience supporting production workloads (application + infrastructure)
Basic exposure to AI-enabled tools (e.g., GitHub Copilot, observability insights, or automation tools)
Advanced New Relic capabilities (synthetics, dashboards, query/SQL monitoring)
Experience with JFrog Artifactory and/or Xray
SonarQube integration for static code analysis
GitHub Advanced Security (CodeQL, Dependabot, secret scanning)
Experience supporting Angular or multi-stack pipelines
Exposure to network monitoring
Hands-on use of AI/ML for SRE or DevOps automation (alert noise reduction, anomaly detection)
GitHub Copilot usage or governance
Exposure to ServiceNow ITOM (event management, CMDB, discovery)
Experience with Databricks CI/CD pipelines
Familiarity with AIOps concepts (predictive alerting, intelligent incident response)
