Machine Learning Training and Preemptible GPUs
Training ML workloads is a great fit for Preemptible VMs with GPUs. Kubernetes Engine and Compute Engine's managed instance groups allow you to create dynamically scalable clusters of Preemptible VMs with GPUs for your large compute jobs. To help deal with Preemptible VM terminations, TensorFlow's checkpointing feature can be used to save and restore work progress. An example and walk-through are provided here.
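To make the save-and-restore idea concrete, here is a minimal, framework-agnostic sketch in Python. It is not TensorFlow's checkpointing API. It only illustrates the pattern that API supports: save progress periodically, and resume from the last saved state after a termination. The file name, the toy `weight` update, and the `stop_at` parameter (which simulates a preemption) are all assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical location; on a real Preemptible VM this should live on
# storage that outlives the instance.
CHECKPOINT_PATH = "train_state.json"


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write state atomically so a termination mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename within one filesystem


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}


def train(total_steps, checkpoint_every=10, path=CHECKPOINT_PATH, stop_at=None):
    """Toy training loop; stop_at simulates a preemption at that step."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            return state  # simulated preemption: unsaved progress is lost
        state["weight"] += 0.5  # stand-in for one optimizer update
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

With a checkpoint every 10 steps, a run interrupted at step 25 resumes from step 20, so at most one checkpoint interval of work is repeated. Choosing the interval is a trade-off between checkpoint overhead and the amount of work lost per termination.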
Getting Started
To get started with Preemptible GPUs in Google Compute Engine, append --preemptible to your instance create command in gcloud, set scheduling.preemptible to true in the REST API, or set Preemptibility to "On" in the Google Cloud Platform Console, and then attach a GPU as usual. You can use your regular GPU quota to launch Preemptible GPUs or, alternatively, you can request a special Preemptible GPUs quota that only applies to GPUs attached to Preemptible VMs. Check out our documentation to learn more. To learn how to use Preemptible GPUs with Google Kubernetes Engine, head over to our Kubernetes Engine GPU documentation.
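For the REST API route, the sketch below builds the relevant part of an instance creation request body in Python. The helper name and the example zone, machine type, and GPU type are illustrative assumptions, and the body is deliberately partial: a real request also needs disks and network interfaces. The `scheduling` and `guestAccelerators` field names follow the Compute Engine instance resource as we understand it; check them against the current API reference before relying on them.

```python
import json


def preemptible_gpu_instance_body(name, zone, machine_type, gpu_type, gpu_count=1):
    """Build a partial request body for a Preemptible VM with GPUs attached.

    Hypothetical helper for illustration; disks and networkInterfaces,
    which a real request requires, are omitted.
    """
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "scheduling": {
            "preemptible": True,  # the setting described in the text
            "automaticRestart": False,  # Preemptible VMs do not auto-restart
            "onHostMaintenance": "TERMINATE",  # GPU VMs cannot live-migrate
        },
        "guestAccelerators": [
            {
                "acceleratorType": f"zones/{zone}/acceleratorTypes/{gpu_type}",
                "acceleratorCount": gpu_count,
            }
        ],
    }


if __name__ == "__main__":
    # Example values are assumptions, not recommendations.
    body = preemptible_gpu_instance_body(
        "training-worker-1", "us-central1-a", "n1-standard-8", "nvidia-tesla-k80"
    )
    print(json.dumps(body, indent=2))
```

Keeping the body in a small function makes it easy to generate many identical workers for a large job, varying only the instance name.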
For a certain class of workloads, Google Cloud GPUs provide exceptional compute performance. Now, with new low Preemptible GPU pricing, we invite you to see for yourself how easy it is to get the performance you need, at the low, predictable price that you want.