Apache Spark is a fast and general engine for large-scale data processing. It provides a unified API for batch, interactive, and streaming data processing using in-memory primitives. A benchmark showed Spark was able to sort 100TB of data 3 times faster than Hadoop using 10 times fewer machines by keeping data in memory between jobs.