Personal blog
Choosing the right GPU for deep learning on AWS
Saturday, July 25, 2020 in Personal blog
Illustration by author. Just a decade ago, if you wanted access to a GPU to accelerate your data processing or scientific simulation code, you’d have to either get hold of a PC gamer or contact your friendly neighborhood supercomputing center. Today, …
A quick guide to managing machine learning experiments
Tuesday, July 14, 2020 in Personal blog
Illustration by author. Inspired by xkcd.com. The word “experiment” means different things to different people. For scientists (and hopefully for rigorous data scientists), an experiment is an empirical procedure to determine if an outcome agrees or …
A quick guide to using Spot instances with Amazon SageMaker
Saturday, April 25, 2020 in Personal blog
Amazon SageMaker will automatically back up and sync checkpoints to Amazon S3 so you can resume training easily. One of the simplest ways to lower your machine learning training costs is to use Amazon EC2 Spot instances. Spot instances allow you to …
How to debug machine learning models to catch issues early and often
Friday, April 03, 2020 in Personal blog
Eek! There’s a bug in my neural network! If you work in software development, you know that bugs are a fact of life. They’ll be there when you start your project, and they’ll be there when you ship your product to customers. Over the last couple of …
A quick guide to distributed training with TensorFlow and Horovod on Amazon SageMaker
Saturday, March 14, 2020 in Personal blog
Distribute training on multiple GPUs using Horovod and Amazon SageMaker for faster training and increased productivity. In deep learning, more is better. More data, more layers, and more compute power usually lead to higher accuracy and better …
Amazon SageMaker Operators for Kubernetes—examples for distributed training, hyperparameter tuning and model hosting
Wednesday, March 04, 2020 in Personal blog
Use Amazon SageMaker Operators for Kubernetes to run training jobs, model tuning jobs, and batch transform jobs, and to set up inference endpoints on Amazon SageMaker using Kubernetes config files and kubectl. At re:Invent 2019, AWS announced Amazon …
Kubernetes and Amazon SageMaker for machine learning — best of both worlds
Tuesday, March 03, 2020 in Personal blog
If you’re part of a team that trains and deploys machine learning models frequently, you probably have a cluster set up to help orchestrate and manage your machine learning workloads. Chances that …
Stop duplicating deep learning training datasets with Amazon EBS multi-attach
Saturday, February 22, 2020 in Personal blog
(Edit 02/26/20: The AWS support team has put out a warning discouraging the use of standard file systems such as xfs with EBS multi-attach. xfs is not a cluster-aware file system and may lead to data loss in a multi-access cluster setup. In my …
Run RAPIDS experiments at scale using Amazon SageMaker
Monday, February 10, 2020 in Personal blog
If you worked with machine learning in the 2000s, chances are that your tools, frameworks and go-to algorithms looked very different from what they do today. Deep neural networks have now become synonymous with machine learning and non-neural …