People In AI is a high-growth digital media and AI company that uses generative technologies to create and distribute content globally. They are seeking a Senior AI Engineer to design and deploy systems for AI-generated media, focusing on classification, tagging, and exploratory R&D in voice and image intelligence.
Responsibilities:
- Design, train, and deploy classification models across the media pipeline (e.g. style detection, quality scoring, moderation, semantic categorization)
- Build automated tagging and organization systems to enable structured, searchable media libraries
- Develop data pipelines including annotation tooling, dataset curation, and active learning workflows
- Lead R&D initiatives in AI voice and audio (e.g. TTS, voice cloning, audio synthesis), from evaluation to productionization
- Prototype and implement image intelligence capabilities (e.g. pose estimation, visual similarity, style transfer, avatar consistency)
- Create evaluation frameworks to track model performance, quality, and drift over time
- Optimize inference systems for latency and cost (batching, quantization, caching, serving strategies)
- Deploy models into production environments with robust APIs and GPU-backed infrastructure
Requirements:
- Experience building and deploying ML models in production
- Strong hands-on experience with classification, tagging, or content understanding systems
- Solid background in computer vision (e.g. CNNs, Vision Transformers, CLIP)
- Experience training models end-to-end: dataset creation, experimentation, tuning, and debugging
- Exposure to voice/audio AI (e.g. TTS, voice cloning, speech synthesis), whether in production or in side projects
- Proficiency in Python with PyTorch or TensorFlow
- Experience with data labeling pipelines, annotation workflows, or active learning systems
- Understanding of production model serving (APIs, latency constraints, monitoring, drift detection)
- Familiarity with embeddings, vector search, or semantic retrieval systems
- Bonus: experience with diffusion models, GANs, generative audio, or inference optimization tools like ONNX/TensorRT