PyTorch Distributed Training with Amazon SageMaker

Presented by Shashank Prasanna

Approx. duration: 2 hours

Abstract: Reducing the time-to-train of your PyTorch models is crucial to improving your productivity and reducing your time-to-solution. In this workshop, you will learn how to efficiently scale your training workloads to multiple instances, with Amazon SageMaker doing the heavy lifting for you. You don’t have to manage compute, storage, or networking infrastructure: simply bring your PyTorch code and distribute training across a large number of CPUs and GPUs. The AWS PyTorch team will also discuss their latest PyTorch feature contributions.
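As a rough illustration of the workflow described above, the sketch below shows the kind of distribution configuration the SageMaker Python SDK accepts for its PyTorch estimator. The entry-point script name, IAM role, instance settings, and S3 path are all hypothetical placeholders, not part of this workshop's materials.

```python
# Hedged sketch: configuring a distributed SageMaker PyTorch training job.
# Only the config dict below is executed here; the estimator usage is shown
# in comments because it requires the SageMaker SDK and AWS credentials.

# Distribution config enabling SageMaker's distributed data-parallel library.
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

# With the SageMaker Python SDK installed, a job could then be launched as:
#
#   from sagemaker.pytorch import PyTorch
#   estimator = PyTorch(
#       entry_point="train.py",       # your existing PyTorch training script
#       role=role,                    # an IAM role with SageMaker permissions
#       instance_count=2,             # scale out by raising this number
#       instance_type="ml.p3.16xlarge",
#       framework_version="1.12",     # example version, not workshop-specific
#       py_version="py38",
#       distribution=distribution,
#   )
#   estimator.fit("s3://my-bucket/training-data")   # hypothetical S3 path
```

The same training script can be reused unchanged; only `instance_count` and the `distribution` argument control how far the job scales out.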

Learning objectives:

  • Learn how to get started with PyTorch on AWS
  • Learn about the most recent PyTorch and AWS libraries for deep learning
  • Learn best practices to reduce training time for your deep learning models


Topics                                       Duration (90 mins)
Setup and getting started                    20 mins
Problem overview and dataset preparation     20 mins
Distributed Training with SageMaker          40 mins
Wrap Up                                      10 mins