Kargo creates powerful moments of connection between brands and consumers to build businesses. The company is seeking a Staff Machine Learning Engineer to own the design, deployment, and ongoing health of the machine learning systems that drive revenue, bridging data science and production engineering.
Responsibilities:
- Production ML models are reliable, monitored, and continuously improving — Active models have monitoring and alerting in place for drift and degradation; performance metrics are tracked and optimization is ongoing rather than reactive
- CI/CD pipelines accelerate model deployment — Model versioning, updates, and deployment are automated end-to-end; the time from model validation to production is measurably shorter than at hire
- At least one high-impact ML system is shipped and driving measurable business outcomes — A new or significantly improved model (bid optimization, CTR prediction, or audience targeting) is live in production and tied to a documented revenue or efficiency impact
- ML infrastructure is scalable and cost-efficient — Data pipelines, feature stores, and cloud tooling (AWS, Snowflake, Databricks) are optimized for both performance and cost; infrastructure decisions are made with operational sustainability in mind
- Cross-functional delivery is smooth and low-friction — Data science, engineering, and product teams are working from shared standards; integrations don't require heroic coordination and knowledge is documented rather than siloed
Requirements:
- 6+ years of experience building and deploying ML models in production environments — has owned the full lifecycle from training through inference, monitoring, and iteration
- Strong proficiency in Python and SQL for model development, data manipulation, and pipeline work; experience with Spark for large-scale distributed data processing
- Hands-on experience with AWS (S3, EC2, Lambda, SageMaker) and cloud-native ML workflows; comfortable provisioning and managing cloud infrastructure programmatically
- Familiarity with the MLOps stack (Databricks, feature stores, Kubernetes, Kubeflow, MLflow) and how these tools fit together in a production ML system
- Experience building both offline and online training and inference pipelines for real-time systems with latency and throughput constraints
- Strong understanding of CI/CD practices applied to ML — model versioning, automated testing, deployment pipelines, and rollback strategies
- Ad tech or digital advertising experience — familiarity with auction dynamics, bid optimization, CTR/viewability prediction, or audience targeting models
- Experience with Go for performance-sensitive production services