Fluidstack is building the infrastructure for abundant intelligence, partnering with top AI labs and enterprises to deliver compute at scale. The company is seeking a Product Manager to own the AI platform roadmap, focusing on managed inference and agent platforms while balancing customer needs with operational realities.
Responsibilities:
- Own the product strategy and roadmap for managed inference services, including model deployment, autoscaling, multi-LoRA serving, and inference optimization
- Define requirements for agent platform capabilities: structured outputs, function calling, memory primitives, tool integration, and multi-step reasoning workflows
- Drive decisions on which inference optimizations to prioritize: speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration
- Partner with ML infrastructure engineers to design APIs, SDKs, and deployment workflows that support model fine-tuning, version management, and A/B testing
- Work with datacenter teams to optimize GPU allocation strategies, balancing dedicated vs. serverless deployments, cold start latency, and cost-per-token economics (see the back-of-envelope cost sketch after this list)
- Analyze competitive offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training-to-inference integration), and Modal (serverless architecture)
- Define pricing models that align with customer usage patterns (tokens, requests, GPU-hours) while maintaining healthy unit economics
- Conduct customer research to understand inference workload requirements: latency SLAs, throughput targets, model size constraints, and integration needs
- Translate customer feedback into feature specifications—including support for new model architectures, framework integrations (vLLM, TensorRT-LLM, TGI), and observability tooling
- Build go-to-market materials: reference architectures, performance benchmarks, cost calculators, and migration guides for customers moving from self-hosted or competing platforms
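
To ground the cost-per-token economics mentioned above, here is a minimal back-of-envelope sketch of how a dedicated GPU's hourly rate translates into a per-token cost. Every figure in the snippet (hourly rate, throughput, utilization) is an illustrative assumption, not Fluidstack pricing.

```python
# Back-of-envelope cost-per-token model. All figures below are
# illustrative assumptions, not Fluidstack pricing.

GPU_HOUR_USD = 2.50        # assumed hourly rate for one dedicated GPU
THROUGHPUT_TOK_S = 2_000   # assumed sustained output tokens/sec across the batch
UTILIZATION = 0.60         # assumed fraction of each hour spent serving traffic


def dedicated_cost_per_million_tokens(gpu_hour_usd: float,
                                      tok_per_s: float,
                                      utilization: float) -> float:
    """Cost per 1M output tokens on a dedicated GPU.

    Idle time still bills, so effective throughput is scaled by utilization.
    """
    tokens_per_hour = tok_per_s * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000


cost = dedicated_cost_per_million_tokens(GPU_HOUR_USD, THROUGHPUT_TOK_S, UTILIZATION)
print(f"dedicated: ${cost:.3f} per 1M tokens")
# 2.50 / (2000 * 3600 * 0.60) * 1e6 ≈ $0.579 per 1M tokens.
# A serverless tier bills per token instead, absorbing idle time and cold
# starts, so the breakeven depends on the customer's traffic shape.
```

The takeaway for pricing design: dedicated capacity bills for idle time, so its effective cost per token rises as utilization falls, while serverless pricing shifts that utilization risk to the provider.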
Requirements:
- 5+ years product management experience with at least 3 years focused on AI/ML infrastructure, inference platforms, or developer tools
- Strong technical understanding of transformer architectures, inference optimization techniques, and production ML systems
- Experience building products for technical users deploying LLMs in production (ML engineers, research scientists, AI application developers)
- Track record of shipping features that improved inference latency, throughput, or cost efficiency—backed by quantitative metrics
- Deep familiarity with the inference ecosystem: serving frameworks (vLLM, TensorRT-LLM, TGI), model formats (GGUF, SafeTensors), and API standards such as OpenAI-compatible endpoints (a minimal example follows this list)
- Understanding of GPU memory constraints, batching strategies, and the tradeoffs between latency-optimized vs. throughput-optimized serving
- Ability to translate complex technical concepts (speculative decoding, PagedAttention, Multi-LoRA) into clear customer value propositions
- Experience conducting competitive analysis in the inference market, including pricing elasticity, feature differentiation, and customer acquisition patterns
- Comfortable working with engineering teams to debug performance bottlenecks, analyze profiling data, and prioritize kernel-level optimizations
- Experience with agent frameworks (LangChain, LlamaIndex, AutoGPT), compound AI patterns, or model fine-tuning workflows
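
On the OpenAI-compatible endpoints mentioned above, here is a minimal sketch of what that compatibility looks like from the customer side, using the official openai Python client pointed at a self-hosted server. The base_url, API key, and model id are placeholder assumptions; vLLM's OpenAI-compatible server is one example of a backend that exposes this API shape.

```python
# Minimal sketch: calling an OpenAI-compatible inference endpoint.
# The base_url, api_key, and model id are placeholders, not a real deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g. a vLLM OpenAI-compatible server
    api_key="EMPTY",                      # self-hosted servers often ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The point of this compatibility standard is migration cost: a customer moving from a hosted API to a managed inference platform changes a base URL and model id rather than rewriting application code.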