Amazon Web Services (AWS) is building AWS Neuron, a software development kit for accelerating deep learning and GenAI workloads. The role involves architecting and implementing business-critical features, mentoring a team of engineers, and optimizing machine learning models for AWS's custom hardware accelerators.
Responsibilities:
- Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators
- Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment
- Build infrastructure to systematically analyze and onboard multiple models with diverse architectures
- Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models
- Analyze and optimize system-level performance across multiple generations of Neuron hardware
- Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
- Implement optimizations such as fusion, sharding, tiling, and scheduling
- Conduct comprehensive testing, including unit and end-to-end model testing, with continuous deployment and releases through automated pipelines
- Work directly with customers to enable and optimize their ML models on AWS accelerators
- Collaborate across teams to develop innovative optimization techniques