Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. This role is for a machine learning engineer on the Distributed Training team for AWS Neuron, responsible for the development, enablement, and performance tuning of various ML model families.

Responsibilities:

You will help lead the effort to build distributed training support into PyTorch and TensorFlow, using XLA and the Neuron compiler and runtime stacks.
You will help tune these models for the highest performance and efficiency on the custom AWS Trainium and Inferentia silicon and on the Trn1 and Inf1/Inf2 servers.

Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodal

Key skills

About this role

Responsibilities: