Attentive is an AI marketing platform focused on redefining brand and customer connections through personalized messaging. The Engineering Leader for the ML Platform will build and lead a team responsible for developing foundational systems that facilitate the training, deployment, and maintenance of production-grade ML models at scale.
Responsibilities:
- Build and lead the ML Platform team (hiring, coaching, execution bar)
- Own the end-to-end ML platform roadmap and delivery: training + evaluation infrastructure, feature management, and standardized ML workflows
- Ship a clear “golden path” for ML development: CI/CD, champion/challenger rollouts, experimentation, model registry, and automated re-training
- Enable massively scalable deployments (batch and near-real-time), including rollout patterns (shadow/canary), robust contracts, and operational readiness (SLOs, runbooks, on-call)
- Lead ML observability and debugging across the stack (data quality, drift, performance, latency, cost), leveraging Ray + Anyscale
- Partner across ML Eng, Data Science, Analytics Eng, and Infrastructure to increase velocity and set a world-class standard for machine learning integrations
- Drive cost and capacity efficiency for distributed compute (scheduling, resource governance, spend visibility)
Requirements:
- 6+ years in software/data/ML-infra engineering, including 2+ years people management; experience building shared platforms adopted by multiple ML teams
- Strong distributed systems + cloud fundamentals; comfortable owning reliability (SLOs, incidents, on-call maturity)
- Production ML platform experience across the lifecycle: feature pipelines, training/eval, deployment, and monitoring
- Hands-on experience with distributed computing frameworks (Ray, Spark, Dask), including Ray and/or Anyscale for distributed workloads and observability
- Familiar with orchestration (Metaflow, Airflow, Dagster, etc.), data transformation pipelines (dbt), containerization (Docker/Kubernetes), and modern CI/CD practices
- Experience with ML observability and lifecycle management (MLflow)
- Clear technical leadership: roadmap setting, stakeholder alignment, and ability to get into the weeds when needed
- Snowflake-native experience for modeling-ready data and feature management (data modeling, backfills, point-in-time correctness, governance/lineage)