2. Distributed Training with PyTorch and Amazon SageMaker
In this section, you’ll learn how to prepare your PyTorch training scripts and run distributed training jobs with Amazon SageMaker.
You will cover the following steps; an illustrative code sketch for each appears after the list:
Preparing your dataset for Amazon SageMaker by uploading it to Amazon S3
Writing your PyTorch training script for distributed training
Writing your Amazon SageMaker SDK functions to run distributed training
Running distributed training jobs on a specified number of CPU instances
Deploying your trained models to Amazon SageMaker endpoints and evaluating them
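To make the dataset available to training jobs, you first upload it to Amazon S3. A minimal sketch using the SageMaker Python SDK follows; the local path and key prefix are placeholder assumptions, not the workshop's exact values.

```python
import sagemaker

session = sagemaker.Session()

# Uploads the local ./data directory to the session's default bucket
# under the given prefix and returns the resulting S3 URI.
train_input = session.upload_data(
    path="./data",                          # hypothetical local dataset path
    key_prefix="pytorch-distributed/data",  # hypothetical S3 prefix
)
print(train_input)  # e.g. s3://<default-bucket>/pytorch-distributed/data
```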
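Inside the training script, each instance joins a process group and wraps the model in DistributedDataParallel. The sketch below shows one common pattern for SageMaker's PyTorch containers, deriving rank and world size from the SM_HOSTS and SM_CURRENT_HOST environment variables that SageMaker sets on each instance; the model and data are placeholders, not the workshop's actual script.

```python
import json
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # SageMaker sets SM_HOSTS / SM_CURRENT_HOST on every training instance;
    # the defaults here let the script also run standalone for local testing.
    hosts = json.loads(os.environ.get("SM_HOSTS", '["algo-1"]'))
    current_host = os.environ.get("SM_CURRENT_HOST", "algo-1")

    os.environ["MASTER_ADDR"] = hosts[0]   # first host coordinates the group
    os.environ["MASTER_PORT"] = "29500"

    world_size = len(hosts)
    rank = hosts.index(current_host)

    # "gloo" is the CPU backend; GPU training would use "nccl" instead.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(10, 2))          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Placeholder batch; a real script would use a DataLoader with a
    # DistributedSampler so each rank trains on a different data shard.
    inputs = torch.randn(32, 10)
    targets = torch.randint(0, 2, (32,))

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()                        # DDP all-reduces gradients here
    optimizer.step()

    if rank == 0:
        # SM_MODEL_DIR (/opt/ml/model) is where SageMaker collects artifacts.
        torch.save(model.module.state_dict(),
                   os.path.join(os.environ.get("SM_MODEL_DIR", "."), "model.pt"))

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```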
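With the script in place, a PyTorch estimator from the SageMaker Python SDK launches the job across multiple instances. All values below, including the instance type, instance count, framework versions, and the train.py filename, are illustrative assumptions rather than the workshop's exact configuration.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # the script sketched above
    role="<your-sagemaker-execution-role>",  # placeholder IAM role ARN
    framework_version="1.13",                # assumed framework version
    py_version="py39",                       # assumed Python version
    instance_type="ml.c5.xlarge",            # CPU instances for this section
    instance_count=2,                        # number of training instances
)

# Starts the distributed training job; train_input is the S3 URI
# returned by the upload step above.
estimator.fit({"training": train_input})
```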
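Once training completes, the same estimator can deploy the trained model to a real-time endpoint for evaluation. The hosting instance type and the test payload below are illustrative.

```python
import numpy as np

# Deploys the trained model behind a managed HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",  # assumed hosting instance type
)

# Sends a placeholder input matching the sketch model's 10 features.
print(predictor.predict(np.random.randn(1, 10).astype("float32")))

# Endpoints accrue cost while running; see "Clean up resources".
predictor.delete_endpoint()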
BONUS:
Running high-performance, large-scale training on GPUs with the SageMaker data parallel library (see the sketch below)
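The SageMaker data parallel library is enabled through the estimator's distribution parameter and requires supported GPU instance types such as ml.p3.16xlarge. A minimal sketch, assuming the same hypothetical entry point and placeholder values as above:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="<your-sagemaker-execution-role>",
    framework_version="1.13",
    py_version="py39",
    instance_type="ml.p3.16xlarge",  # one of the supported GPU instance types
    instance_count=2,
    # Turns on the SageMaker distributed data parallel launcher.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit({"training": train_input})
```

Inside the training script, the library integrates with PyTorch DDP; in recent versions it is typically used as a process-group backend in place of "gloo" or "nccl".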