Stitch Fix is the leading online personal styling service that helps people discover styles that fit perfectly. The Manager of Data & AI Platform Engineering will lead engineers working on core data, machine learning, and generative AI platforms, contributing to the technical execution of AI-powered, data-driven experiences across the company.
Responsibilities:
- Lead, in a player-coach capacity, the execution of Stitch Fix’s next-generation Data, ML, and GenAI platforms: building a unified, secure, and scalable architecture for semantic search, retrieval-based intelligence, multi-model orchestration, and agent automation, while operationalizing GenAI through safe, performant, production-ready systems that power real-world client and employee experiences
- Contribute to modernizing data and ML foundations to support unified signals, adaptive models, faster experimentation, and scalable AI/ML workloads
- Provide foundational APIs, SDKs, frameworks, and self-service tools that make it easy for data scientists, ML engineers, analysts, and application teams to build and deploy AI solutions quickly, safely, and at scale
- Partner with Data Science, Engineering, and Product teams to translate Data/ML/GenAI platform capabilities into production-grade features and intelligent experiences that deliver measurable business value
- Drive responsible AI and data adoption by creating reusable templates, documentation, and enablement programs, and by partnering closely with technology and business teams to identify and prioritize high-impact opportunities for personalization, automation, and intelligence
- Contribute to improving governance practices, including data contracts, lineage, metric definitions, access policies, and responsible AI guardrails, to support trust, safety, and compliance
- Ensure operational excellence through platform reliability, performance, observability, cost efficiency, and simplification of legacy systems
- Lead and develop high-performing engineering teams, fostering a culture of clarity, excellence, and trust
- Balance speed of innovation with platform stability, ensuring engineering efforts are tightly aligned to business priorities and long-term client value
Requirements:
- 5+ years of experience in software, data, ML, or platform engineering; 1+ years of experience leading engineers (individual contributors) is a plus
- Demonstrated success contributing to large-scale data platforms, ML platforms, or AI/GenAI platforms in cloud environments
- Experience delivering platform modernization, unification, and multi-year architectural transformation
- Strong software engineering foundation, with experience designing and building large-scale distributed systems and resilient, high-quality APIs and services using modern programming languages and cloud-native architectures
- Track record operating and evolving modern data infrastructure, including some of the following: distributed compute and storage technologies (Spark, Trino, Iceberg), real-time processing frameworks (Kafka/Flink), metadata / catalog systems, and Kubernetes-based orchestration
- Expertise across the ML lifecycle: feature engineering, training pipelines, model deployment and serving, monitoring, validation, fine-tuning, and MLOps best practices
- Proven capability in building self-service platform abstractions and tooling that enable teams to develop, experiment with, and deploy data and ML products efficiently
- Experience with modern GenAI architectures: semantic retrieval, knowledge-grounded indexing, LLM orchestration, agent workflows, and evaluation frameworks
- Familiarity with modern ML frameworks like PyTorch and Ray is a plus
- Strategic thinker able to align platform investments with business priorities and emerging AI opportunities
- Potential to be a strong people leader, with a track record of helping build inclusive, high-performing engineering teams
- Excellent communicator who can influence both technical and business stakeholders across domains