NVIDIA AI is transforming how the world uses AI, cloud, and accelerated computing. We are seeking a Senior Software Engineer for our Attestation Services team to design and manage secure cloud services that ensure the integrity of NVIDIA platforms.
Responsibilities:
- Your main focus will be building and managing our core attestation cloud services. Day-to-day work includes designing APIs and integrations, improving reliability, and working with NVIDIA teams to turn hardware trust mechanisms and standards into production-ready solutions. You will play a significant role in shaping how customers verify that NVIDIA platforms are secure and ready for their workloads.
- Crafting and evolving attestation cloud services, APIs, and SDK/CLI integration points that confirm the integrity of NVIDIA platforms across data center, AI, networking, and partner environments
- Improving reliability and operational maturity through SLOs/SLIs, alerting, runbooks, incident response, and safe rollout practices
- Designing resilient service behavior that handles dependency failures, caching challenges, regional issues, customer-side resilience needs, and graceful degradation
- Architecting trust-material distribution for certificate status, revocation updates, signed metadata, RIM artifacts, trust bundles, offline verification, and customer-managed resilience models
- Implementing appraisal, policy, and verification workflows that evaluate attestation evidence against endorsements, reference values, certificate status, and security requirements
- Providing supportive technical leadership and mentoring on distributed system development, production debugging, reliability tradeoffs, observability, and operational simplicity
Requirements:
- BS or MS in Computer Science, Information Security, or a related field, or equivalent experience
- 12+ years of proven experience designing and building large-scale distributed systems or cloud services, including at least 3 years in security, attestation, or trusted computing
- Experience owning services in production, including monitoring, alerting, incident response, root-cause analysis, and long-term operational improvements
- Experience developing and maintaining REST and/or gRPC APIs, microservices, control planes, background tasks, caches, queues, data stores, and customer-facing integrations
- Proficiency in Go or Java (our primary service languages); experience with C++ or Rust is a plus
- Hands-on experience with cloud-native platforms and tooling, including Kubernetes, containers, CI/CD, infrastructure as code, observability tools, and at least one major cloud provider (AWS, GCP, Azure, or OCI)
- Knowledge of distributed systems practices such as retries, timeouts, circuit breakers, backpressure, rate limiting, caching, consistency, idempotency, failover, and dependency isolation
- Understanding of security fundamentals including PKI, TLS/mTLS, certificate lifecycle, signing, secrets management, authentication and authorization, and secure service-to-service communication
- Interest in security-critical systems, trust infrastructure, secure service development, and learning more about attestation or confidential computing
- Hands-on experience operating customer-facing cloud services with formal SLOs, error budgets, safe deployment systems, production-readiness reviews, resilience testing, or chaos/game-day practices
- Experience with attestation, confidential computing, trusted execution environments, TPMs, DICE, SPDM, IETF RATS, EAT, CoRIM, HSMs, hardware roots of trust, secure boot, GPU/accelerator attestation, or related open-source and standards efforts
- Experience in services related to identity, PKI, certificate status and revocation, signing, secrets management, trust-material distribution, or control-plane reliability
- Experience designing services for multi-region, multi-cloud, sovereign, GovCloud, edge, disconnected, air-gapped, or customer-hosted environments