This document provides an overview of Spark and its key components. Spark is a fast and general engine for large-scale data processing. It uses Resilient Distributed Datasets (RDDs) that allow data to be partitioned across clusters and cached in memory for fast performance. Spark is up to 100x faster than Hadoop for iterative jobs and provides a unified framework for batch processing, streaming, SQL, and machine learning workloads.