This document discusses Apache Spark, an open-source framework for distributed data processing. It describes how Spark provides a unified platform for batch processing, streaming, SQL queries, machine learning and graph processing. It also demonstrates how these capabilities can be combined in a single Spark application, so data does not have to be moved between separate systems. As an example, it presents a pipeline that runs SQL queries, machine learning clustering and stream processing over Twitter data.