PyTorch Distributed Training with Amazon SageMaker

Presented by Shashank Prasanna

Approx. duration: 2 hours

Abstract: Reducing the time to train your PyTorch models is crucial for improving your productivity and shortening your time-to-solution. In this workshop, you will learn how to efficiently scale your training workloads to multiple instances, with Amazon SageMaker doing the heavy lifting for you. You don’t have to manage compute, storage, or networking infrastructure: simply bring your PyTorch code and distribute training across a large number of CPUs and GPUs. The AWS PyTorch team will also discuss their latest PyTorch feature contributions.
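For orientation ahead of the workshop, the snippet below is a minimal sketch of how a distributed PyTorch training job can be launched with the SageMaker Python SDK. The entry point script, IAM role, instance settings, S3 path, and hyperparameters are placeholders, not values from this workshop; substitute your own.

    # Minimal sketch: launching a multi-node PyTorch training job with the
    # SageMaker Python SDK. All names and values below are placeholders.
    import sagemaker
    from sagemaker.pytorch import PyTorch

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

    estimator = PyTorch(
        entry_point="train.py",           # your existing PyTorch training script
        source_dir="src",                 # directory containing train.py and its dependencies
        role=role,
        framework_version="1.12",
        py_version="py38",
        instance_count=2,                 # scale out across multiple instances
        instance_type="ml.p3.16xlarge",   # multi-GPU instances
        distribution={"pytorchddp": {"enabled": True}},  # run the script under PyTorch DDP
        sagemaker_session=session,
        hyperparameters={"epochs": 10, "batch-size": 256},
    )

    # SageMaker provisions the cluster, runs train.py on every node,
    # and tears the infrastructure down when training finishes.
    estimator.fit({"training": "s3://your-bucket/path/to/dataset"})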

Learning objectives:

  • Learn how to get started with PyTorch on AWS
  • Learn about the most recent PyTorch and AWS libraries for deep learning
  • Learn best practices to reduce training time for your deep learning models

Agenda

Topic                                        Duration (90 mins)
Setup and getting started                    20 mins
Problem overview and dataset preparation     20 mins
Distributed Training with SageMaker          40 mins
Wrap Up                                      10 mins