Choosing the right GPU for deep learning on AWS

How to choose the right Amazon EC2 GPU instance for deep learning training and inference, from the best-performing to the most cost-effective, and everything in between

On AWS, you can launch GPU instances with different GPU memory sizes (8 GB, 16 GB, 24 GB, 32 GB, 40 GB), NVIDIA GPU generations (Ampere, Turing, Volta, Maxwell, Kepler), different capabilities (FP64, FP32, FP16, INT8, sparsity, Tensor Cores, NVLink), different numbers of GPUs per instance (1, 2, 4, 8, 16), and different CPUs (Intel, AMD, Graviton2). You can also select instances with different vCPU counts (CPU thread counts), system memory, and network bandwidth, and add a range of storage options (object storage, network file systems, block storage, etc.). In summary, you have options.

My goal with this blog post is to help you choose the right GPU instance on AWS for your deep learning projects. I’ll discuss the key features and benefits of various EC2 GPU instances, and the workloads best suited to each instance type and size. If you’re new to AWS, GPUs, or deep learning, my hope is that you’ll find the information you need to make the right choice for your projects.

Topics covered in this blog post:

  1. Key recommendations for the busy data scientist/ML practitioner
  2. Why you should choose the right GPU instance, not just the right GPU
  3. Deep dive on GPU instance types: P4, P3, G5 (G5g), G4, P2 and G3
  4. Other machine learning accelerators and instances on AWS
  5. Cost optimization tips when using GPU instances for ML
  6. What software and frameworks to use on AWS?
  7. Which GPUs to consider for HPC use-cases?
  8. A complete and unapologetically detailed spreadsheet of all AWS GPU instances and their features