Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. This role is for a machine learning engineer in the Distributed Training team for AWS Neuron, responsible for development, enablement, and performance tuning of various ML model families.
Responsibilities:
- You will help lead efforts to build distributed training support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks
- You will help tune these models for the highest performance and maximum efficiency on custom AWS Trainium and Inferentia silicon and on the Trn1, Inf1, and Inf2 servers