Best GPUs on AWS for Deep Learning
Here are 5 GPU instance recommendations on AWS that should serve the majority of deep learning use cases. For a complete deep dive into choosing the right GPU for deep learning on AWS, read my blog post.
Instance: p4d.24xlarge
When to use it: When you need all the performance you can get. Use it for distributed training on large models and datasets.
What you get: 8 x NVIDIA A100 GPUs with 40 GB of GPU memory each. Based on the latest NVIDIA Ampere architecture. Includes 3rd-generation NVLink for fast multi-GPU training.
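For reference, here is a minimal sketch of what single-node distributed training across the 8 GPUs could look like with PyTorch DistributedDataParallel; the model, data, and hyperparameters are placeholders, not a real workload.

```python
# Minimal single-node data-parallel training sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=8 train.py  (8 = GPUs on p4d.24xlarge)
# The model, data, and hyperparameters are placeholders, not a real workload.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # NCCL rides NVLink between the A100s

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):  # placeholder training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across the 8 GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```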
Instance: p3.2xlarge
When to use it: When you want the highest-performance single GPU and you're fine with 16 GB of GPU memory.
What you get: 1 x NVIDIA V100 GPU with 16 GB of GPU memory. Based on the older NVIDIA Volta architecture. The best-performing single GPU is still the NVIDIA A100 on the P4 instance, but P4 only comes with 8 x NVIDIA A100 GPUs. This GPU has a slight performance edge over the NVIDIA A10G on the G5 instance discussed next, but the G5 is far more cost-effective and has more GPU memory.
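To make the most of the V100's Tensor Cores within its 16 GB, mixed-precision training is a common approach; below is a minimal sketch using PyTorch automatic mixed precision, with a placeholder model and data.

```python
# Mixed-precision training sketch for a single V100 (p3.2xlarge).
# FP16 autocast uses the V100's Tensor Cores and shrinks activation memory,
# which helps within the 16 GB limit. Model and data are placeholders.
import torch

device = torch.device("cuda")
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10)
).to(device)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid FP16 underflow

for step in range(100):  # placeholder training loop
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    with torch.cuda.amp.autocast():  # run the forward pass in FP16 where safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```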
Instance: g5.xlarge
When to use it: When you want high performance and more GPU memory at lower cost than the P3 instance.
What you get: 1 x NVIDIA A10G GPU with 24 GB of GPU memory, based on the latest Ampere architecture. The NVIDIA A10G can be seen as a lower-powered cousin of the A100 on the p4d.24xlarge, so it's easy to migrate and scale when you need more compute. Consider larger sizes with g5.(2/4/8/16)xlarge for the same single GPU with more vCPUs and higher system memory if you have more pre- or post-processing steps.
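If you script your instance launches, a g5.xlarge can be started with a few lines of boto3; the AMI ID, key pair, security group, and region below are placeholders you would replace with your own (a Deep Learning AMI is a common choice for the image).

```python
# Sketch of launching a g5.xlarge with boto3. All identifiers below are
# placeholders: substitute your own AMI, key pair, security group, and region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: use a Deep Learning AMI
    InstanceType="g5.xlarge",
    KeyName="my-key-pair",  # placeholder key pair name
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```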
Instance: p3.(8/16)xlarge
When to use it: Cost-effective multi-GPU model development and training.
What you get: p3.8xlarge has 4 x NVIDIA V100 GPUs and p3.16xlarge has 8 x NVIDIA V100 GPUs, each with 16 GB of GPU memory, based on the older NVIDIA Volta architecture. For larger models and datasets, or faster performance, consider P4 instances.
Instance: g4dn.xlarge
When to use it: Lower performance than the other options, at lower cost, for model development and training. Cost-effective model inference deployment.
What you get: 1 x NVIDIA T4 GPU with 16 GB of GPU memory. Based on the previous-generation NVIDIA Turing architecture. Consider g4dn.(2/4/8/16)xlarge for more vCPUs and higher system memory if you have more pre- or post-processing.
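As a sketch of cost-effective inference on the T4, you can run the model in half precision to use its Tensor Cores; the model below is a placeholder standing in for your trained network.

```python
# Inference sketch for a single T4 (g4dn.xlarge). Half precision (FP16)
# uses the T4's Tensor Cores and fits comfortably in 16 GB; the model
# here is a placeholder standing in for your trained network.
import torch

device = torch.device("cuda")
model = torch.nn.Linear(1024, 10).to(device).half().eval()  # placeholder model

batch = torch.randn(128, 1024, device=device, dtype=torch.float16)
with torch.inference_mode():  # disables autograd bookkeeping for inference
    logits = model(batch)
print(logits.shape)  # torch.Size([128, 10])
```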