Necessary Ventures is the leading developer of Embodied AI technology, aiming to create autonomy that propels the world forward. They are seeking a Staff ML Performance Engineer to optimize large-scale ML jobs, improving training efficiency and model performance for automated driving systems.
Responsibilities:
- Profile ML workloads to identify their bottlenecks, e.g. using NVIDIA Nsight Systems
- Design and implement efficiency improvements to maximize Model FLOPs Utilization (MFU) and throughput, e.g. parallelism, model compilation, mixed precision
- Design and implement observability tools to identify bottlenecks and drive performance improvements, e.g. to track MFU, throughput, and latency
- Design and implement benchmarking tools, e.g. to track efficiency gains or regressions
- Collaborate closely with Research teams to integrate training efficiency improvements and create a culture of performance optimization