Chapter 2 - Big Data Ecosystem
INTRO
• With the advances in technology and the rapid evolution of
computing, it is becoming very tedious to process and
manage huge amounts of information without the use of
supercomputers.
• There are some tools and techniques that are available for data
management like Google BigTable, Data Stream Management
System (DSMS), NoSQL amongst others.
• However, there is an urgent need for companies to deploy special
tools and technologies that can store, access, and analyse
large amounts of data in near-real time. Big Data cannot be
stored on a single machine; several machines are therefore
required.
• Common tools that are used to manipulate Big Data are Hadoop,
MapReduce, and BigTable.
CLUSTER COMPUTING
• Cluster computing is attracting the attention of researchers
including system developers, network engineers, academics
and software designers.
• A computer cluster is defined as a single logical unit that
consists of several computers linked through a fast
local area network (LAN). The components of a cluster,
commonly termed nodes, each run their own instance of
an operating system.
• A node usually comprises a CPU, memory, and disk
storage (Buyya et al., 2003).
• It is observed that clusters, as a computing platform, are not
restricted to scientific and engineering applications. Many
business applications also use computer clusters.
Computer clusters are needed for Big Data.
HADOOP
• Hadoop was developed by Apache. It is an open-source software
framework for processing and querying vast amounts of data
on large clusters of commodity hardware.
• Hadoop is written in Java and can process huge volumes
of structured and unstructured data (Khan et al., 2014).
• It is an open-source implementation of Google's MapReduce
and is based on a simple programming model called
MapReduce.
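The programming model named above can be sketched in plain Python: a map step that emits key-value pairs, a shuffle step that groups values by key, and a reduce step that aggregates each group. This is a minimal single-machine illustration of the model (the classic word-count example), not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate (here, sum) the values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big clusters", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] → 3, counts["data"] → 2
```

In a real Hadoop cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the logical structure, however, is exactly the three steps above.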
• It provides reliability through replication (Usha and Jenil,
2014).
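The idea behind replication can be illustrated with a toy placement policy: each data block is copied to several distinct nodes, so the loss of any one node still leaves other copies available. The round-robin policy below is a hypothetical sketch for illustration only, not HDFS's actual placement algorithm:

```python
def place_replicas(block_id, nodes, replication_factor=3):
    # Assign `replication_factor` distinct nodes to a block,
    # spreading consecutive blocks across the cluster round-robin.
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

nodes = ["node1", "node2", "node3", "node4"]
placement = {block: place_replicas(block, nodes) for block in range(4)}
# Every block lives on 3 distinct nodes; if one node fails,
# each of its blocks still has 2 surviving copies elsewhere.
```

HDFS uses a default replication factor of 3 in the same spirit, though its real policy also accounts for rack topology.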
• The Apache Hadoop ecosystem is composed of the Hadoop
kernel, MapReduce, HDFS and several other components such as
Apache Hive, HBase and ZooKeeper (Bhosale and Gadekar,
2014).
CHARACTERISTICS OF HADOOP
The characteristics of Hadoop are described as follows: