The Hadoop Ecosystem: So Much Free Stuff!
One possible layer diagram for Hadoop
Components in the diagram: Hive, Pig, Giraph, Spark, Storm, Flink, MapReduce, HBase, Cassandra, MongoDB, Zookeeper, YARN, HDFS
Higher levels: interactivity
Lower levels: storage and scheduling
HDFS: distributed file system as foundation
Scalable storage
Fault tolerance
YARN: flexible scheduling and resource management
MapReduce: simplified programming model
Map = apply()
Reduce = summarize()
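As a rough illustration of the two phases, the sketch below counts words in plain Python. It is a single-process stand-in, not Hadoop's API; the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: apply a function to one record, emitting (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key (a framework would do this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: summarize all values that share a key."""
    return key, sum(values)

def word_count(lines):
    # Run map over every input line, then reduce each group of values.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big ideas", "data everywhere"])
print(counts)
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both phases could in principle be spread across many machines.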
Google used MapReduce for indexing web sites
Higher-level programming models
Pig = dataflow scripting
Hive = SQL-like queries
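The difference between the two styles can be sketched in Python. This is only an analogy: the explicit step-by-step pipeline stands in for a dataflow script, and the standard-library `sqlite3` module stands in for a SQL-like engine. Neither Pig Latin nor HiveQL is used here, and the `visits` data is invented for the example.

```python
import sqlite3

# Hypothetical input: (username, page) visit records.
visits = [("alice", "home"), ("bob", "home"), ("alice", "cart")]

# Dataflow style: spell out each step of the pipeline explicitly.
grouped = {}
for username, page in visits:          # load records and group them by user
    grouped.setdefault(username, []).append(page)
dataflow_result = sorted((u, len(pages)) for u, pages in grouped.items())

# SQL-like style: describe the result and let the engine decide how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (username TEXT, page TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", visits)
sql_result = conn.execute(
    "SELECT username, COUNT(*) FROM visits GROUP BY username ORDER BY username"
).fetchall()
conn.close()

print(dataflow_result)
print(sql_result)
```

Both approaches produce the same per-user counts; they differ in whether the author writes the steps or the desired result.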
Hive created at Facebook
Specialized models for graph processing
Giraph used by Facebook to analyze social graphs
Real-time and in-memory processing
In-memory processing can be up to 100x faster for some tasks
NoSQL for non-files
Key-values
Sparse tables
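To make the two data models concrete, here is a minimal in-memory sketch in Python. It does not use HBase, Cassandra, or MongoDB; the `SparseTable` class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Key-value model: one opaque value per key.
kv_store = {}
kv_store["user:42"] = "Ada"

class SparseTable:
    """Toy sparse table: each row stores only the columns it actually has."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Missing cells take no storage; they are simply absent from the dict.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))

table = SparseTable()
table.put("user:42", "name", "Ada")
table.put("user:42", "email", "ada@example.com")
table.put("user:7", "name", "Lin")   # this row has no "email" column at all

print(table.columns("user:42"))
print(table.get("user:7", "email"))
```

Rows in the same table can carry different columns, which is the sense in which the table is sparse.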
HBase used for a messaging platform
Zookeeper for management
Synchronization
Configuration
High-availability
Created by Yahoo to wrangle services named after animals
All these tools are open-source
Large community for support
Download separately or as part of a pre-built image