- Big data refers to datasets too large or complex for traditional tools to store and process; Hadoop is an open-source framework designed to handle big data across clusters of commodity hardware. Hadoop's processing model, MapReduce, applies a map function to partitioned input data in parallel, then reduces the intermediate results into a final answer for a given query.
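To make the map/shuffle/reduce flow concrete, here is a toy word-count sketch in plain Python. This is only a conceptual illustration of the three phases, not Hadoop's actual Java API; the input lines are hypothetical records standing in for an HDFS input split.

```python
from collections import defaultdict

# Hypothetical input records, as an input split would deliver them line by line.
lines = ["hadoop stores big data", "mapreduce processes big data"]

# Map phase: each record emits (key, value) pairs -- here (word, 1) per word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: the framework groups all values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: each key's values are aggregated -- here summed into a count.
counts = {word: sum(values) for word, values in groups.items()}
print(counts["data"])  # → 2
```

In real Hadoop the map and reduce functions run as separate tasks on different nodes, and the shuffle moves intermediate pairs across the network, but the data flow is the same.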
- In classic Hadoop (1.x), jobs run under five main daemons: the NameNode, DataNode, and Secondary NameNode manage storage, while the JobTracker and TaskTracker manage MapReduce execution. (In Hadoop 2 and later, YARN's ResourceManager and NodeManager take over the JobTracker's and TaskTracker's roles.)
- HDFS is Hadoop's distributed file system, which stores very large datasets across a cluster by splitting each file into fixed-size blocks. Each block is replicated on multiple DataNodes (three copies by default) for fault tolerance, and clients get high-throughput streaming access to file data.
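The block-splitting and replication idea can be sketched as follows. This is a toy simulation, not HDFS's actual placement policy (which is rack-aware); the tiny block size and DataNode names are made up for illustration, while the replication factor of 3 matches the HDFS default.

```python
BLOCK_SIZE = 4           # bytes, tiny for illustration (HDFS default is 128 MB)
REPLICATION = 3          # HDFS default replication factor
datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names

data = b"hello hdfs blocks"

# Split the file into fixed-size blocks, as the client does on write.
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Place REPLICATION copies of each block on distinct DataNodes (simple
# round-robin here; real HDFS placement is rack-aware).
placement = {
    idx: [datanodes[(idx + r) % len(datanodes)] for r in range(REPLICATION)]
    for idx in range(len(blocks))
}

print(len(blocks))    # → 5  (17 bytes split into 4+4+4+4+1)
print(placement[0])   # → ['dn1', 'dn2', 'dn3']
```

The NameNode keeps exactly this kind of block-to-DataNode mapping in memory as metadata; the DataNodes hold the block contents themselves.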