Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It allows for the reliable, scalable, and distributed processing of large data sets across clusters of commodity hardware. Hadoop features the Hadoop Distributed File System for storage, and MapReduce for distributed computing. Many large companies such as Google, Yahoo, Facebook and Amazon use Hadoop for applications like log analysis, machine learning and data mining.