NVIDIA is a leading technology company known for its innovation in computer graphics and AI. They are seeking a highly skilled Senior Staff Software Engineer to enhance their IT Compute platform architecture and optimize performance across on-prem and cloud services.
Responsibilities:
- Lead initiatives to transform IT Compute platform architecture to build new service offerings across On-Prem & Cloud
- Define and implement metrics to measure the efficiency of compute platforms & services, and use them to drive improvements
- Collect and review system data for capacity planning, analyze capacity data and develop plans for appropriately sized enterprise-wide systems, and coordinate with management personnel in implementing changes
- Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring
- Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers to develop compelling IT products and services that meet customer needs
Requirements:
- Bachelor's degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience
- 12+ years of proven experience in compute platform engineering with a focus on automation
- Proven experience in designing and deploying virtualization architectures
- In-depth knowledge of hardware technologies, including SR-IOV, DPU, and GPU, with a track record of implementing these in virtualized and containerized environments
- Proven experience evaluating existing application architectures and identifying opportunities for containerization to improve scalability, reliability, and efficiency
- Strong analytical skills with the ability to define and track key performance metrics
- Experience developing tools for data analysis and performance profiling, and developing with Terraform and configuration management tools
- Proficiency in programming languages such as Go and/or Python
- Experience running large environments consisting of bare metal, large-scale virtualized environments with tens of thousands of VMs, and cloud infrastructure
- Deep understanding of other infrastructure components such as Storage, DNS, LDAP, and Security Tools
- Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud Platform
- Solid understanding of microservices architecture, infrastructure as code (IaC) and configuration management tools