Microsoft is a leading technology company empowering individuals and organizations globally. The company is seeking an Infrastructure/Site Reliability Engineer to improve and maintain Azure's networking and infrastructure, ensuring high reliability and performance for mission-critical workloads.
Responsibilities:
- Design, build, and deliver software that improves the usability, reliability, scalability, performance, security, and availability of infrastructure built on Azure Networking services, working independently with a strong sense of ownership and drive for your areas of responsibility
- Own and troubleshoot end-to-end hardware and software issues across the L2/L3 networking stack, device OS, telemetry, and infrastructure dependencies
- Innovate on the Software-Defined Networking (SDN) platform to provide consistent connectivity for a heterogeneous mix of workloads
- Proactively identify and resolve production issues, performing root cause analysis and implementing long-term fixes to reduce operational toil
- Collaborate with multiple partner teams and gain broad exposure to core networking technologies end-to-end
- Develop, set up, and execute tests before feature releases
- Create monitoring and diagnostic tools to ensure quality of service
- Participate in on-call rotations, incident response, and postmortem analysis to ensure continuous learning and resilience improvement
Requirements:
- Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Prior experience as an infrastructure, systems, or site reliability engineer in large-scale or high-availability environments
- Proficiency in cloud infrastructure concepts (compute, networking, storage, security) and in automating cluster or data center operations
- Experience working with large-scale infrastructure, data centers, or cloud platforms and a good understanding of L2/L3 networking
- Solid understanding of datacenter networking and cloud environments
- Experience with monitoring/logging platforms and strong problem-solving and software troubleshooting skills
- Understanding of system performance, incident response, and troubleshooting in production environments
- Strong experience with databases and query languages such as SQL and KQL