Best GPUs on AWS for Deep Learning
Here are 5 GPU instance recommendations on AWS that should serve the majority of deep learning use cases. For a complete deep dive into choosing the right GPU for deep learning on AWS, read my blog post.
Instance: p4d.24xlarge
When to use it: When you need all the performance you can get. Use it for distributed training on large models and datasets.
What you get: 8 x NVIDIA A100 GPUs with 40 GB of GPU memory each. Based on the latest NVIDIA Ampere architecture. Includes 3rd-generation NVLink for fast multi-GPU training.
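For reference, here is a minimal sketch of what single-node distributed training across the 8 GPUs could look like with PyTorch DistributedDataParallel; the model, data, and hyperparameters are placeholders, not a real workload.

```python
# Minimal single-node data-parallel training sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=8 train.py  (8 = GPUs on p4d.24xlarge)
# The model, data, and hyperparameters are placeholders, not a real workload.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # NCCL rides NVLink between the A100s

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):  # placeholder training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across the 8 GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```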
Instance: p3.2xlarge
When to use it: When you want the highest-performance single GPU and you're fine with 16 GB of GPU memory.
What you get: 1 x NVIDIA V100 GPU with 16 GB of GPU memory. Based on the older NVIDIA Volta architecture. The best-performing single GPU is still the NVIDIA A100 on the P4 instance, but P4 only comes with 8 x NVIDIA A100 GPUs. This GPU has a slight performance edge over the NVIDIA A10G on the G5 instance discussed next, but the G5 is far more cost-effective and has more GPU memory.
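To make the most of the V100's Tensor Cores within its 16 GB, mixed-precision training is a common approach; below is a minimal sketch using PyTorch automatic mixed precision, with a placeholder model and data.

```python
# Mixed-precision training sketch for a single V100 (p3.2xlarge).
# FP16 autocast uses the V100's Tensor Cores and shrinks activation memory,
# which helps within the 16 GB limit. Model and data are placeholders.
import torch

device = torch.device("cuda")
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10)
).to(device)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid FP16 underflow

for step in range(100):  # placeholder training loop
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    with torch.cuda.amp.autocast():  # run the forward pass in FP16 where safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```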
Instance: g5.xlarge
When to use it: When you want high performance and more GPU memory at lower cost than the P3 instance.
What you get: 1 x NVIDIA A10G GPU with 24 GB of GPU memory, based on the latest Ampere architecture. The NVIDIA A10G can be seen as a lower-powered cousin of the A100 on the p4d.24xlarge, so it's easy to migrate and scale when you need more compute. Consider larger sizes with g5.(2/4/8/16)xlarge for the same single GPU with more vCPUs and higher system memory if you have more pre- or post-processing steps.
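If you script your instance launches, a g5.xlarge can be started with a few lines of boto3; the AMI ID, key pair, security group, and region below are placeholders you would replace with your own (a Deep Learning AMI is a common choice for the image).

```python
# Sketch of launching a g5.xlarge with boto3. All identifiers below are
# placeholders: substitute your own AMI, key pair, security group, and region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: use a Deep Learning AMI
    InstanceType="g5.xlarge",
    KeyName="my-key-pair",  # placeholder key pair name
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```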
Instance: p3.(8/16)xlarge
When to use it: Cost-effective multi-GPU model development and training.
What you get: p3.8xlarge has 4 x NVIDIA V100 GPUs and p3.16xlarge has 8 x NVIDIA V100 GPUs, each with 16 GB of GPU memory, based on the older NVIDIA Volta architecture. For larger models and datasets, or faster performance, consider P4 instances.
Instance: g4dn.xlarge
When to use it: Lower performance than the other options, at lower cost, for model development and training. Cost-effective model inference deployment.
What you get: 1 x NVIDIA T4 GPU with 16 GB of GPU memory. Based on the previous-generation NVIDIA Turing architecture. Consider g4dn.(2/4/8/16)xlarge for more vCPUs and higher system memory if you have more pre- or post-processing.
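As a sketch of cost-effective inference on the T4, you can run the model in half precision to use its Tensor Cores; the model below is a placeholder standing in for your trained network.

```python
# Inference sketch for a single T4 (g4dn.xlarge). Half precision (FP16)
# uses the T4's Tensor Cores and fits comfortably in 16 GB; the model
# here is a placeholder standing in for your trained network.
import torch

device = torch.device("cuda")
model = torch.nn.Linear(1024, 10).to(device).half().eval()  # placeholder model

batch = torch.randn(128, 1024, device=device, dtype=torch.float16)
with torch.inference_mode():  # disables autograd bookkeeping for inference
    logits = model(batch)
print(logits.shape)  # torch.Size([128, 10])
```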