Microsoft is seeking a Product Manager II for its Azure High-Performance Computing/Artificial Intelligence team to drive networking for the largest AI training supercomputers. In this role, you will collaborate with various teams to ensure maximum operational uptime and AI workload throughput for supercomputers, while also developing your skills in AI supercomputing and business acumen.
Responsibilities:
- Drive, track, and publish success criteria for backend networking of ultra-large-scale AI supercomputers
- Your primary objective, shared with colleagues and partner teams, is to drive maximum operational uptime and AI workload throughput of some of the largest supercomputers on the planet
- Identify leading and/or unique points of failure affecting your primary goal and associated KPIs, and drive remediations and roadmap changes to address those issues
- Work across and build trust among a V-team of supercomputing product groups, datacenter site operators, quality control specialists, vendors, business leaders, and customers to achieve your objectives
Requirements:
- Bachelor's Degree AND 2+ years of experience in product/service/project/program management or software development OR equivalent experience
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
- Bachelor's Degree AND 5 years of experience in product/service/project/program management or software development OR equivalent experience
- 5+ years of experience operating production supercomputers
- 5+ years of experience improving product metrics for a product, feature, or experience in a market (e.g., growing customer base, expanding customer usage, avoiding customer churn)
- Familiarity with RoCE v2, InfiniBand, UCX, MPI, NCCL, RCCL, and distributed memory compute workloads
- Ability to work overlapping hours with East Coast teams (EST)