Package and containerize AI models (Python/Docker) from research into clean, versioned services with well-defined APIs
Own the engineering side of the tech transfer process: inference specs, environment setup, model mocking, and integration scaffolding
Collaborate closely with research scientists on quantization, TensorRT compilation, and hitting latency budgets (e.g., sub-200 ms real-time response targets)
Maintain and operate model services in production, debugging stability and performance issues under load
Contribute to the dual-track delivery model: keeping the engineering platform moving even while research is still iterating