Best GPUs on AWS for Deep Learning
Here are 5 GPU instance recommendations on AWS that should serve majority of deep learning use-cases. For a complete deep dive into choosing the right GPU for deep learning on AWS, read my blog post:
Instance: p4d.24xlarge
When to use it: When you need all the performance you can get. Use it for distributed training on large models and datasets.
What you get: 8 x NVIDIA A100 GPUs with 40 GB GPU memory per GPU. Based on the latest NVIDIA Ampere architecture. Includes 3rd generation NVLink for fast multi-GPU training.
Instance: p3.2xlarge
When to use it: When you want the highest performance Single GPU and you’re fine with 16 GB of GPU memory.
What you get: 1 x NVIDIA V100 GPU with 16 GB of GPU memory. Based on the older NVIDIA Volta architecture. The best performing single-GPU is still the NVIDIA A100 on P4 instance, but you can only get 8 x NVIDIA A100 GPUs on P4. This GPU has a slight performance edge over NVIDIA A10G on G5 instance discussed next, but G5 is far more cost-effective and has more GPU memory.
Instance: g5.xlarge
When to use it: When you want high-performance, more GPU memory at lower cost than P3 instance
What you get: 1 x NVIDIA A10G GPU with 24 GB of GPU memory, based on the latest Ampere architecture. NVIDIA A10G can be seen as a lower powered cousin of the A100 on the p4d.24xlarge so it’s easy to migrate and scale when you need more compute. Consider larger sizes withg5.(2/4/8/16)xlarge for the same single-GPU with more vCPUs and higher system memory if you have more pre or post processing steps.
Instance: p3.(8/16)xlarge
When to use it: Cost-effective multi-GPU model development and training.
What you get: p3.8xlarge has 4 x NVIDIA V100 GPUs and p3.16xlarge has 8 x NVIDIA V100 GPUs with 16 GB of GPU memory on each GPU, based on the older NVIDIA Volta architecture. For larger models, datasets and faster performance consider P4 instances.
Instance: g4dn.xlarge
When to use it: Lower performance than other options at lower cost for model development and training. Cost effective model inference deployment.
What you get: 1 x NVIDIA T4 GPU with 16 GB of GPU memory. Based on the previous generation NVIDIA Turing architecture. Consider g4dn.(2/4/8/16)xlarge for more vCPUs and higher system memory if you have more pre or post processing.
## Related blog posts