NVIDIA is a leading technology company focused on building the future through innovative AI solutions. They are seeking a Senior Generative AI Software Engineer to own and evolve their Cosmos open-source and internal research codebases, craft core infrastructure for foundation model research, and ensure code quality and maintainability across the team.
Responsibilities:
- Own and evolve the Cosmos open-source and internal research codebases, crafting core infrastructure that supports our foundation model research and deployment
- Refactor and modularize large research-driven code into clean, testable, maintainable libraries for use across teams
- Integrate and adapt off-the-shelf models into our pipelines as preprocessors, postprocessors, or evaluation components
- Build model-serving endpoints (e.g., with Gradio or FastAPI) to enable researchers and internal users to experiment with models interactively
- Design, implement, and maintain evaluation pipelines, providing high-quality tooling to the broader team to measure model quality and track improvements
- Improve configuration hygiene and reproducibility using systems like Hydra, and ensure smooth overrides, templates, and environment switching
- Lead the packaging and release of Python modules using modern tooling (uv, just, pydantic) for both OSS and internal consumption
- Set the standard for code health, test coverage, and release readiness across the team. Write documentation and automation to scale good practices
Requirements:
- Expert-level proficiency in Python, with a strong foundation in modular design, abstraction boundaries, and collaborative codebase evolution
- Fluency with PyTorch, including the ability to run, debug, and patch inference-time model behavior in research-level codebases. Comfort modifying pre/post-processors, model wrappers, and checkpoint logic
- Proven experience in refactoring large codebases—cleaning up legacy implementations, eliminating anti-patterns, and paying down tech debt to improve long-term maintainability
- Strong grasp of configuration systems, especially Hydra, with an emphasis on reproducibility, override logic, and environment scoping
- Familiarity with Python packaging and project tooling such as uv, just, and pydantic, including experience managing environment consistency and shipping libraries as artifacts
- Strong instincts around code health: API design, directory structure, writing unit and integration tests, exception hygiene, docstrings, and dependency isolation
- Comfortable deploying models internally via Gradio or similar frameworks to enable interactive evaluation and feedback from researchers or downstream users
- BS or MS (or equivalent experience) in Computer Science, Software Engineering, or a related technical field, plus 10+ years of industry experience
- Proficiency with model configuration, especially Hydra: comfortable crafting hierarchical config systems with reusable templates, environment scoping, and overrides for evaluation, inference, or release
- Prior work cleaning up sophisticated generative model codebases—adding tests, improving wrappers, and instrumenting code for observability and debugging
- Demonstrated success raising engineering quality in a research setting: taking exploratory code and evolving it into a robust, production-friendly module
- Track record of mentoring teammates on software engineering best practices and proactively identifying long-term structural risks in fast-moving teams
- Passion for building ML tooling that is not only functional, but also elegant, intuitive, and maintainable by others