Sauron is a company focused on residential security through innovations in autonomous robots and self-driving cars. They are seeking an AI Inference Engineer to lead the development of low-latency inference engines and optimize performance for perception systems on edge devices.
Responsibilities:
- Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures
- Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream
- Collaborate closely with embedded engineers to integrate perception software with Yocto-based platforms, ensuring the hardware and software work together reliably
- Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance
- Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption
- Productionize proven prototypes by moving them from JetPack to Yocto
- Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency