Apache Spark is an open-source cluster computing framework for large-scale data processing. It supports batch processing, streaming analytics, machine learning, interactive queries, and graph processing. Spark Core provides distributed task dispatching and scheduling. A Spark application runs as a driver program that connects to a cluster manager, which allocates executors on worker nodes to run the driver's tasks. Spark's central abstraction is the Resilient Distributed Dataset (RDD): an immutable, partitioned collection of records that can be processed in parallel and recomputed from its lineage if a partition is lost. Common RDD transformations include map, flatMap, groupByKey, and reduceByKey, while common actions include reduce, collect, and count.
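To illustrate what the transformations named above compute, here is a minimal word-count sketch. It mimics the semantics of flatMap, map, and reduceByKey in plain Python (no Spark installation required); the sample input lines are invented for the example, and the real PySpark calls (`rdd.flatMap(...).map(...).reduceByKey(...)`) follow the same shape.

```python
from functools import reduce
from itertools import chain, groupby

# Sample input (hypothetical data, standing in for an RDD of text lines).
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: each line expands into many words.
words = list(chain.from_iterable(line.split() for line in lines))

# map: each word becomes a (word, 1) key-value pair.
pairs = [(w, 1) for w in words]

# reduceByKey: group pairs by key, then reduce each group's values with +.
counts = {
    key: reduce(lambda a, b: a + b, (v for _, v in group))
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0])
}

print(counts)  # e.g. {'big': 2, 'data': 2, 'makes': 1, ...}
```

In actual Spark, reduceByKey additionally combines values within each partition before shuffling, which is why it is usually preferred over groupByKey followed by a reduce.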