Sciforium is an AI infrastructure company that develops advanced AI models and operates a proprietary serving platform. The role involves architecting and leading development of the next-generation model serving platform, as well as mentoring engineers and influencing engineering direction.
Responsibilities:
- Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution
- Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems
- Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes
- Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference
- Build Python APIs and services that expose model capabilities to downstream applications
- Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance
- Drive performance profiling, benchmarking, and observability across the inference stack
- Ensure high reliability and maintainability through testing, monitoring, and engineering best practices
- Troubleshoot and resolve complex issues across GPU, runtime, and service layers