Design, Build, and Maintain Scalable ML Infrastructure: Lead the design, development, and maintenance of scalable machine learning infrastructure on AWS, using services such as Amazon SageMaker for efficient model training and deployment.
Collaborate with Product Teams: Work closely with product teams to develop MVPs for AI-driven features, ensuring quick iterations and market testing to refine solutions effectively.
Develop Monitoring & Alerting Frameworks: Create and enhance monitoring and alerting systems for machine learning models to ensure high performance, reliability, and minimal downtime.
Support Cross-Departmental AI Utilization: Enable departments across the organization to leverage AI/ML models, including cutting-edge Generative AI solutions, for their own use cases.
Provide Production Support: Offer expertise in debugging and resolving issues related to machine learning models in production, participating in on-call rotations for operational troubleshooting and incident resolution.
Scale ML Architecture: Design and scale machine learning architecture to support rapid user growth, leveraging deep knowledge of AWS and ML best practices to ensure robustness and efficiency.
Mentor and Elevate Team Skills: Conduct code reviews, mentor team members, and elevate overall team capabilities through knowledge sharing and collaboration.
Stay Ahead of the Curve: Keep up with the latest advancements in machine learning technologies and AWS services, and drive the adoption of cutting-edge solutions to maintain a competitive edge.
Requirements
Bachelor's degree in Computer Science, Computer Engineering, Machine Learning, Statistics, Physics, or a relevant technical field, or equivalent practical experience.
At least 6 years of experience in machine learning engineering, with demonstrated success deploying scalable ML models in production environments.