Apache Spark is a unified analytics engine for large-scale, distributed data processing. And Spark MLlib (Machine Learning library) is a scalable Spark implementation of some common machine learning (ML) functionality, as well associated tests and data generators.