About this role

Sauron is a company focused on residential security through innovations in autonomous robots and self-driving cars. The company is seeking an AI Inference Engineer to lead the development of low-latency inference engines and to optimize the performance of perception systems running on edge devices.

Responsibilities:

Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures
Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream
Collaborate closely with embedded engineers to integrate perception software with Yocto-based platforms, ensuring tight hardware-software integration
Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance
Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption
Productionize proven prototypes by porting them from JetPack to Yocto
Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency
