Federal Express Corporation is seeking a Senior Site Reliability Engineering Analyst to enhance system reliability and performance. This role will lead initiatives to improve observability, automation, and operational best practices while collaborating with engineering teams.

Responsibilities:

Lead reliability and performance improvements, including capacity planning, failover strategies, and MTTA/MTTR reduction
Develop technical solutions for complex system issues and resilience gaps
Assess reliability risks and recommend enhancements to ensure service continuity
Refine and promote best practices for reliability, maintainability, and scalability
Mentor team members and provide technical guidance
Recommend engineering improvements that drive consistency and long-term stability
Improve monitoring, alerting, and observability to strengthen system awareness
Support incident response and root cause analysis (RCA) activities to ensure effective resolution
Document incident learnings and share knowledge across teams supporting Agile Release Train(s)
Partner with development, operations, and architecture teams to integrate reliability into system design and delivery
Reduce operational toil through automation and process optimization
Enhance engineering workflows, CI/CD pipelines, and readiness practices
Perform additional responsibilities as required to support organizational goals

Requirements:

Strong written and verbal communication skills
Ability to analyze complex technical problems and implement effective solutions
Solid understanding of distributed systems, cloud environments, and modern application architectures
Hands-on experience with observability platforms (Dynatrace required)
Experience with monitoring, incident management, and RCA practices
Ability to lead initiatives independently and collaborate across teams
Demonstrated focus on reliability, resiliency, automation, and continuous improvement
Development experience (e.g., Python, Java, scripting for automation)
Cloud expertise (e.g., Azure, GCP) including deployment, architecture, and operations
Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent
Five (5) or more years of equivalent work experience in an information technology or engineering environment
Experience with AI/ML-powered monitoring, automation, or incident prediction
Familiarity with SRE-aligned frameworks such as SLIs/SLOs, error budgets, and reliability patterns

