This document gives an overview of distributed deep learning on Spark. It opens with a brief introduction to machine learning and deep learning, then explains why deep learning calls for distributed systems because of its computational intensity. Spark is presented as a framework on which distributed deep learning systems can be built. Two such systems are described: SparkNet, developed at UC Berkeley, and CaffeOnSpark, developed at Yahoo. Both implement distributed stochastic gradient descent using a parameter server approach. The document concludes with demonstrations of Caffe and CaffeOnSpark.
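
To make the parameter server idea concrete, here is a minimal single-process sketch of synchronous distributed stochastic gradient descent. It is not code from SparkNet or CaffeOnSpark. The names `ParameterServer`, `worker_gradient`, `make_shards`, and `train`, the toy linear model, and all hyperparameters are assumptions chosen for illustration. The workers run one after another here, whereas on a cluster each would run in parallel on its own data partition.

```python
import random


def make_shards(num_workers, samples_per_worker, seed=0):
    """Generate synthetic data y = 3x + 1 and split it into one shard per worker."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(samples_per_worker)]
        shards.append([(x, 3.0 * x + 1.0) for x in xs])
    return shards


def worker_gradient(params, batch):
    """Worker step: gradient of the mean squared error on a local mini-batch."""
    w, b = params
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return gw / n, gb / n


class ParameterServer:
    """Hypothetical in-memory server that holds the shared model parameters."""

    def __init__(self, lr):
        self.params = (0.0, 0.0)  # (weight, bias)
        self.lr = lr

    def pull(self):
        """Workers fetch the current parameters before computing gradients."""
        return self.params

    def push(self, grads):
        """Average the workers' gradients and apply one SGD update."""
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w, b = self.params
        self.params = (w - self.lr * gw, b - self.lr * gb)


def train(num_workers=4, steps=500, batch_size=8, lr=0.1, seed=0):
    rng = random.Random(seed)
    shards = make_shards(num_workers, 64, seed)
    server = ParameterServer(lr)
    for _ in range(steps):
        params = server.pull()
        # Each worker samples a mini-batch from its own shard only.
        grads = [
            worker_gradient(params, rng.sample(shard, batch_size))
            for shard in shards
        ]
        server.push(grads)
    return server.params


if __name__ == "__main__":
    print(train())
```

In this sketch every step waits for all workers before updating, which is the synchronous variant. Asynchronous variants let the server apply each gradient as it arrives, trading some consistency for throughput.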