The document provides an introduction to Apache Spark and its ecosystem, including Spark Core, Spark SQL, Spark Streaming, and MLlib. It also covers Resilient Distributed Datasets (RDDs), DataFrames, query optimization in Spark SQL with the Catalyst optimizer, and running Spark on a Hadoop YARN cluster. The document is intended as a hands-on introduction to Spark and related tools in the Hortonworks Data Platform (HDP) sandbox environment.