Twilio is shaping the future of communications through innovative solutions for businesses. As a Machine Learning Engineer, you will drive the development of cutting-edge products, collaborating with cross-functional teams to build scalable ML systems that enhance customer experiences.
Responsibilities:
- Partner with product, UX, and technical stakeholders to analyze business problems, clarify requirements, define scope, and translate them into measurable ML problem statements
- Design, implement, and maintain scalable, enterprise-grade ML solutions in production
- Build reproducible ML workflows for data preparation, training, evaluation, and inference using modern orchestration and MLOps tooling
- Implement monitoring and evaluation frameworks to continuously improve data quality, model performance, latency, and cost through feedback loops
- Partner cross-functionally with Product, Data Science/ML, Engineering, and Security to deliver resilient, scalable, and compliant ML-powered services
- Demonstrate end-to-end systems understanding and articulate the 'why' behind model and system design choices
- Own operational excellence: SLAs, on-call, incident response, customer feedback triage, and blameless post-mortems
- Drive engineering excellence via AI-assisted SDLC, code reviews, automated testing, MLOps best practices, knowledge-sharing, and mentoring
- Actively adopt AI-assisted practices to improve implementation and collaboration efficiency
Requirements:
- Strong foundation in ML/AI (statistics, probability, optimization) with the ability to apply these concepts to real-world problems
- 5+ years of experience building, deploying, and operating data and ML systems in production
- Proficiency in Python, Java, and SQL, plus strong software engineering fundamentals (system design, testing, version control, code reviews)
- Hands-on experience with workflow orchestration and data pipelines (e.g., Airflow, Kubeflow) and cloud data platforms/storage (e.g., SageMaker Feature Store, Snowflake, DynamoDB, OpenSearch)
- Experience with the ML lifecycle and MLOps tooling (e.g., MLflow, Metaflow, SageMaker; LLM/agent frameworks such as LangChain/LangGraph; model evaluation/observability tools such as Galileo or similar)
- Working knowledge of containerization and cloud infrastructure, including Docker and Kubernetes, GitOps/CI/CD tools (e.g., Argo CD), and at least one major cloud platform (AWS, GCP, or Azure)
- Understanding of data modeling and scalable systems, including distributed computing and streaming frameworks (e.g., Spark/EMR, Flink, Kafka Streams); familiarity with GPU-based development is a plus
- Demonstrated ability to ramp up quickly and operate effectively in new application/business domains
- Strong written and verbal communication skills: able to document and present designs and decisions, and comfortable giving/receiving feedback in an Agile environment
- Familiarity with ML problem areas and techniques, including recommendation systems (e.g., graph-based approaches, two-tower models), time-series modeling (classical and deep learning), representation learning (e.g., embeddings), anomaly detection, and causal inference
- Practical experience with LLMs and generative AI workflows, including foundation model fine-tuning, RAG, and vector databases
- Evidence of technical leadership and impact, such as contributions to open-source data/ML projects or published technical presentations, blog posts, papers, or research
- Domain experience in communications, marketing automation, or customer engagement analytics is a plus
- Familiarity with AI-assisted development tools (e.g., Claude, GitHub Copilot/Codex, Cursor)
- Advanced degree (M.S. or Ph.D.) in a relevant field preferred