Package and containerize AI models (Python/Docker) from research into clean, versioned services with well-defined APIs
Own the engineering side of the tech transfer process: inference specs, environment setup, model mocking, and integration scaffolding
Collaborate closely with research scientists on quantization, TensorRT compilation, and hitting latency budgets (e.g., sub-200 ms real-time response targets)
Maintain and operate model services in production, debugging stability and performance issues under load
Contribute to the dual-track delivery model: keeping the engineering platform moving even while research is still iterating