NVIDIA has been transforming computer graphics and accelerated computing for over 25 years, with a growing focus on the potential of AI. The company is seeking a Senior Software Engineer for its security team to secure infrastructure and AI cluster systems and to collaborate with other teams on improving security practices.
Responsibilities:
- Develop and lead incident response and disaster recovery plans for pre-production clusters
- Work with infrastructure, networking, storage, OS, firmware, and application teams to harden systems
- Train peers and users on practical security best practices
- Build and enforce security controls, systems, and policies for the cluster infrastructure that supports new NVIDIA hardware
- Identify, assess, and reduce cybersecurity risks; report major risks clearly to leadership
- Investigate security incidents and drive root-cause analysis
- Ensure systems meet IT, legal, regulatory, and information security standards
- Improve security governance, documentation, and audit readiness
Requirements:
- Experience in programming secure computing environments, with strong proficiency in C/C++
- Experience with Linux kernel hardening (SELinux/AppArmor) and observability (eBPF)
- Experience securing large-scale Linux infrastructure
- Proven understanding of security risks and how to reduce them
- Demonstrated understanding of incident response and breach handling
- Clear written and verbal communication with technical and leadership audiences
- Experience with compute and networking systems security architectures
- Experience in securing AI agents using sandboxing technologies and AI-based threat detection (e.g. Mythos)
- BS in Computer Science, Engineering, Cybersecurity, or equivalent experience
- 8+ years of relevant experience
- Experience developing secure software in Rust, with a focus on memory safety
- Experience with modern authentication and identity frameworks such as OAuth 2.1, OIDC, Kerberos, FIDO2/WebAuthn
- Experience with Microsoft Active Directory and Entra ID, including cross-realm trusts, identity federation, and provisioning (SCIM 2.0)
- Experience managing centralized Linux identity (FreeIPA/RHEL IdM/SSSD), including PKI lifecycle management and Host-Based Access Control
- Experience hardening HPC schedulers and storage, such as Slurm, parallel filesystems like Lustre, and NFS
- Experience securing containerized workloads (Docker, Enroot, Kubernetes)
- Knowledge of high-speed fabric security, such as InfiniBand partition keys (P_Keys) and management keys (M_Keys)
- Experience with network security and segmentation: Zero Trust, ZTNA, VRFs, VLANs, and performance-optimized firewalls
- Experience with advanced vulnerability management and supply chain risk mitigation (CVSS 4.0, SBOM)