2. Distributed Training with PyTorch and Amazon SageMaker

In this section you’ll learn how to prepare your PyTorch training scripts and run distributed training jobs with Amazon SageMaker.

You will cover:

  1. Preparing your dataset for Amazon SageMaker by uploading it to Amazon S3
  2. Writing your PyTorch training script for distributed training
  3. Writing your Amazon SageMaker SDK functions to run distributed training
  4. Running distributed training jobs on a specified number of CPU instances
  5. Deploying your trained models to endpoints using SageMaker and evaluating them
  6. BONUS: Running high-performance, large-scale training on GPUs
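
To give a feel for steps 3 and 4, the sketch below assembles the kind of keyword arguments you would pass to the SageMaker Python SDK's `sagemaker.pytorch.PyTorch` estimator. This is a minimal illustration only: the S3 path and script name are placeholders, no AWS call is made, and the exact `framework_version` and `distribution` settings you need may differ from the ones assumed here.

```python
# Sketch of a SageMaker distributed-training configuration. Nothing here
# talks to AWS; we only build the argument dictionary so its shape is
# visible. "train.py" and the instance settings are illustrative values.

def estimator_config(instance_count, instance_type):
    """Assemble keyword arguments for sagemaker.pytorch.PyTorch
    (sketch only; launching a real job also needs an IAM role and
    an S3 input channel prepared in steps 1 and 2)."""
    return {
        "entry_point": "train.py",         # your distributed training script
        "framework_version": "1.13",       # example PyTorch container version
        "py_version": "py39",
        "instance_count": instance_count,  # number of training instances
        "instance_type": instance_type,    # e.g. a CPU instance type
        # SageMaker launches the script on every host; torch.distributed
        # reads rank/world-size from the environment the container sets up.
        "distribution": {"torch_distributed": {"enabled": True}},
    }

cfg = estimator_config(instance_count=2, instance_type="ml.c5.xlarge")
print(cfg["instance_count"])
```

In a real job you would unpack this dictionary into the estimator (`PyTorch(**cfg, role=..., ...)`) and call `fit()` with your S3 data location; raising `instance_count` is what scales the job across machines.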