Capital One is an industry leader in using machine learning to create real-time, personalized customer experiences. They are seeking a Sr. Distinguished Machine Learning Engineer to define and drive the technical strategy for their Personalization Platform, collaborate with cross-functional teams, and develop robust ML infrastructure to enhance customer interactions.
Responsibilities:
- Define and drive technical strategy and roadmap for our Personalization Platform that powers real-time, personalized product experiences and multi-channel targeted user messaging across all Capital One products and services
- Partner cross-functionally with Product, Data Science, Cloud Infrastructure, and Machine Learning Platform teams to align on and co-develop the advanced recommendation systems and algorithms serving our Capital One users
- Develop and maintain a flexible, scalable rules engine to enable business-driven personalization logic, allowing dynamic configuration of user segmentation, targeting rules, and real-time decisioning while integrating seamlessly with ML-driven recommendations
- Design, build, and maintain robust ML infrastructure and pipelines to support end-to-end workflows including feature extraction, model training, testing, guardrails, model evaluation, deployment, and both real-time and batch inference, ensuring high performance, scalability, and reliability
- Architect low-latency, event-driven systems for enabling real-time dynamic personalization and decisioning based on streaming data, user behavior, and contextual signals
- Drive the evolution of MLOps practices by building automated, metrics-backed deployment workflows, integration validation and testing systems, and scalable monitoring and observability
- Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, throughput) of large-scale production AI systems
- Leverage a broad stack of open-source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, vector databases, NeMo Guardrails, PyTorch, and more
- Provide organizational technical leadership by influencing architecture, engineering standards, and cross-team strategies, mentoring engineers, and driving organization-wide platform innovation
Requirements:
- Bachelor's degree
- At least 10 years of experience designing and building data-intensive solutions using distributed computing
- At least 7 years of experience programming in C, C++, Python, or Scala
- At least 4 years of experience with the full ML development lifecycle using modern technology in a business-critical setting
- 8+ years of experience deploying scalable, responsible AI solutions on major cloud platforms (AWS, GCP, Azure)
- Master's or PhD in Computer Science or a relevant technical field
- 5+ years of proven expertise in designing, implementing, and scaling personalization platforms and recommendation systems serving one or more areas of Feed Personalization/Ads Ranking/Targeted Marketing Messaging
- 5+ years of strong proficiency in Python, Java, C++, or Golang
- Hands-on experience with ML frameworks (PyTorch, TensorFlow) and orchestration tools (Databricks, Airflow, Kubeflow)
- 5+ years of experience developing and applying state-of-the-art techniques for optimizing training and inference systems to improve hardware utilization, latency, throughput, and cost
- 5+ years of deep expertise in cloud-native engineering, containerization (Docker, Kubernetes), and automated CI/CD deployment
- Passion for staying on top of the latest AI research and AI systems, and for judiciously applying novel techniques in production
- Excellent communication and presentation skills, with the ability to articulate complex AI concepts to peers
- Proven leadership in driving platform strategy, fostering cross-functional collaboration, and influencing technical direction across the company