Fluidstack is building the infrastructure for abundant intelligence, partnering with top AI labs and enterprises to deliver compute at scale. The company is seeking a Product Manager to own the AI platform roadmap, focusing on managed inference and agent platforms while balancing customer needs with operational realities.
Responsibilities:
- Own the product strategy and roadmap for managed inference services, including model deployment, autoscaling, multi-LoRA serving, and inference optimization
- Define requirements for agent platform capabilities: structured outputs, function calling, memory primitives, tool integration, and multi-step reasoning workflows
- Drive decisions on which inference optimizations to prioritize: speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration
- Partner with ML infrastructure engineers to design APIs, SDKs, and deployment workflows that support model fine-tuning, version management, and A/B testing
- Work with datacenter teams to optimize GPU allocation strategies, balancing dedicated vs. serverless deployments, cold start latency, and cost-per-token economics (see the back-of-envelope cost sketch after this list)
- Analyze competitive offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training-to-inference integration), and Modal (serverless architecture)
- Define pricing models that align with customer usage patterns (tokens, requests, GPU-hours) while maintaining healthy unit economics
- Conduct customer research to understand inference workload requirements: latency SLAs, throughput targets, model size constraints, and integration needs
- Translate customer feedback into feature specifications—including support for new model architectures, framework integrations (vLLM, TensorRT-LLM, TGI), and observability tooling
- Build go-to-market materials: reference architectures, performance benchmarks, cost calculators, and migration guides for customers moving from self-hosted or competing platforms
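
To ground the cost-per-token economics mentioned above, here is a minimal back-of-envelope sketch of how a dedicated GPU's hourly rate translates into a per-token cost. Every figure in the snippet (hourly rate, throughput, utilization) is an illustrative assumption, not Fluidstack pricing.

```python
# Back-of-envelope cost-per-token model. All figures below are
# illustrative assumptions, not Fluidstack pricing.

GPU_HOUR_USD = 2.50        # assumed hourly rate for one dedicated GPU
THROUGHPUT_TOK_S = 2_000   # assumed sustained output tokens/sec across the batch
UTILIZATION = 0.60         # assumed fraction of each hour spent serving traffic


def dedicated_cost_per_million_tokens(gpu_hour_usd: float,
                                      tok_per_s: float,
                                      utilization: float) -> float:
    """Cost per 1M output tokens on a dedicated GPU.

    Idle time still bills, so effective throughput is scaled by utilization.
    """
    tokens_per_hour = tok_per_s * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000


cost = dedicated_cost_per_million_tokens(GPU_HOUR_USD, THROUGHPUT_TOK_S, UTILIZATION)
print(f"dedicated: ${cost:.3f} per 1M tokens")
# 2.50 / (2000 * 3600 * 0.60) * 1e6 ≈ $0.579 per 1M tokens.
# A serverless tier bills per token instead, absorbing idle time and cold
# starts, so the breakeven depends on the customer's traffic shape.
```

The takeaway for pricing design: dedicated capacity bills for idle time, so its effective cost per token rises as utilization falls, while serverless pricing shifts that utilization risk to the provider.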
Requirements:
- 5+ years product management experience with at least 3 years focused on AI/ML infrastructure, inference platforms, or developer tools
- Strong technical understanding of transformer architectures, inference optimization techniques, and production ML systems
- Experience building products for technical users deploying LLMs in production (ML engineers, research scientists, AI application developers)
- Track record of shipping features that improved inference latency, throughput, or cost efficiency—backed by quantitative metrics
- Deep familiarity with the inference ecosystem: serving frameworks (vLLM, TensorRT-LLM, TGI), model formats (GGUF, SafeTensors), and API standards such as OpenAI-compatible endpoints (a minimal example follows this list)
- Understanding of GPU memory constraints, batching strategies, and the tradeoffs between latency-optimized vs. throughput-optimized serving
- Ability to translate complex technical concepts (speculative decoding, PagedAttention, Multi-LoRA) into clear customer value propositions
- Experience conducting competitive analysis in the inference market, including pricing elasticity, feature differentiation, and customer acquisition patterns
- Comfortable working with engineering teams to debug performance bottlenecks, analyze profiling data, and prioritize kernel-level optimizations
- Experience with agent frameworks (LangChain, LlamaIndex, AutoGPT), compound AI patterns, or model fine-tuning workflows
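
On the OpenAI-compatible endpoints mentioned above, here is a minimal sketch of what that compatibility looks like from the customer side, using the official openai Python client pointed at a self-hosted server. The base_url, API key, and model id are placeholder assumptions; vLLM's OpenAI-compatible server is one example of a backend that exposes this API shape.

```python
# Minimal sketch: calling an OpenAI-compatible inference endpoint.
# The base_url, api_key, and model id are placeholders, not a real deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g. a vLLM OpenAI-compatible server
    api_key="EMPTY",                      # self-hosted servers often ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The point of this compatibility standard is migration cost: a customer moving from a hosted API to a managed inference platform changes a base URL and model id rather than rewriting application code.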