Big data comes from many sources like social media, e-commerce sites, and stock markets. Hadoop is an open-source framework that allows processing and storing large amounts of data across clusters of computers. It uses HDFS for storage and MapReduce for processing. HDFS stores data across cluster nodes and is fault tolerant. MapReduce analyzes data through parallel map and reduce functions. Sqoop imports and exports data between Hadoop and relational databases.