Big Data Analytics

This document provides information on Apache Hadoop, including its core components and features. It discusses how Hadoop uses MapReduce for distributed processing of large datasets across clusters of computers. It also describes HDFS for providing high-throughput access to application data in a distributed file system, and how data is stored in blocks and replicated across multiple nodes in HDFS.

Uploaded by iasccoe354
Copyright
© All Rights Reserved
UNIT - III

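 Characteristics of Big Data (3 Vs of Big Data)
 The 3 Vs of Big Data are Volume, Velocity and Variety:
 Volume: the sheer amount of data that must be stored and processed.
 Velocity: the speed at which data is generated and must be processed.
 Variety: the range of data types and formats (structured, semi-structured and unstructured).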
Apache Hadoop

 The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

 The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

 It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Cont..

 The project includes these modules:
 Hadoop Common: The common utilities that support the other Hadoop modules.
 Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
 Hadoop YARN: A framework for job scheduling and cluster resource management.
 Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Cont..

Other Hadoop-related projects at Apache include:
 Ambari
 Avro
 Cassandra
 HBase
 Hive
 Pig
 Chukwa
 Mahout
 Spark
 Tez
 ZooKeeper
MapReduce

 MapReduce is a data processing model. Its biggest advantage is the easy scaling of data processing over many computing nodes.

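 The primitives for data processing in MapReduce are called mappers and reducers: a mapper turns each input record into intermediate key/value pairs, and a reducer combines all the values that share a key into the final output.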
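The mapper/reducer split can be illustrated with the classic word-count example. The following is a minimal single-machine sketch in Python, not Hadoop's Java MapReduce API: `mapper`, `shuffle`, `reducer` and `run_job` are hypothetical names, and `shuffle` stands in for the grouping of intermediate pairs by key that the framework performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all intermediate values by key (done by the framework in Hadoop).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine every value that shares a key into one output record.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Map every record, group by key, then reduce each group (keys in sorted order).
    intermediate = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big clusters", "data moves to compute"]
    print(run_job(docs))
    # {'big': 2, 'clusters': 1, 'compute': 1, 'data': 2, 'moves': 1, 'needs': 1, 'to': 1}
```

Each `mapper` call sees only one record and each `reducer` call sees only one key, so in a real cluster both phases can be spread over many nodes; that independence is what makes the model scale.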
Hadoop Distributed File System (HDFS)

 HDFS stands for Hadoop Distributed File System. It is a file system designed for large-scale distributed data processing under MapReduce.

 HDFS is the primary distributed storage used by Hadoop applications.

 An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.
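The division of labour between the NameNode (metadata) and the DataNodes (data) can be modelled with two small classes. This is a toy sketch under simplifying assumptions: the class and function names are hypothetical, everything lives in one process instead of communicating over the network, and the reader simply takes the first listed replica, whereas a real HDFS client asks the NameNode for block locations and then reads each block directly from a (preferably nearby) DataNode.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    # Stores the actual block contents, keyed by block id.
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    # Holds metadata only: the ordered block list of each file,
    # and the DataNodes that hold a replica of each block.
    file_to_blocks: Dict[str, List[str]] = field(default_factory=dict)
    block_locations: Dict[str, List[str]] = field(default_factory=dict)


def read_file(path: str, namenode: NameNode, datanodes: Dict[str, DataNode]) -> bytes:
    parts = []
    # 1. Ask the NameNode which blocks make up the file.
    for block_id in namenode.file_to_blocks[path]:
        # 2. Pick a DataNode holding a replica (here simply the first one listed).
        location = namenode.block_locations[block_id][0]
        # 3. Read the block bytes from that DataNode.
        parts.append(datanodes[location].blocks[block_id])
    return b"".join(parts)


if __name__ == "__main__":
    datanodes = {
        "dn1": DataNode("dn1", {"blk_1": b"hello "}),
        "dn2": DataNode("dn2", {"blk_1": b"hello ", "blk_2": b"hdfs"}),
    }
    namenode = NameNode(
        file_to_blocks={"/demo.txt": ["blk_1", "blk_2"]},
        block_locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]},
    )
    print(read_file("/demo.txt", namenode, datanodes))  # b'hello hdfs'
```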
Cont..

 Data is stored in blocks that are distributed and replicated over many nodes, so HDFS can store a very large data set (e.g. 100 TB) as a single file. Block sizes typically range from 64 MB to 1 GB (the default is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later).
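The arithmetic behind storing one huge file as many blocks is simple. The sketch below assumes a 128 MB block size and a replication factor of 3 (the usual HDFS defaults) and uses binary units (1 MiB = 1024 * 1024 bytes); the helper names are hypothetical.

```python
from typing import List

MiB = 1024 ** 2
TiB = 1024 ** 4


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size in bytes of each block a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def block_count(file_size: int, block_size: int) -> int:
    # Ceiling division without floating point: a partial last block still counts.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size: int, replication: int) -> int:
    # Each replica stores the same bytes; a partly filled last block
    # only uses as much disk as the data it actually holds.
    return file_size * replication


if __name__ == "__main__":
    block = 128 * MiB
    print([b // MiB for b in split_into_blocks(300 * MiB, block)])  # [128, 128, 44]
    print(block_count(100 * TiB, block))                            # 819200
    print(raw_storage(100 * TiB, replication=3) // TiB)             # 300
```

So a 100 TiB file becomes 819,200 blocks of 128 MiB, and with three replicas it occupies about 300 TiB of raw disk across the cluster.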

 Hadoop provides a set of command-line utilities that work similarly to the basic Linux file commands.
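These utilities are exposed through the `hdfs dfs` command (for example `hdfs dfs -ls /user`), with options named after the Linux commands they mimic. The Python sketch below only builds the argument lists; the mapping table and the `hdfs_argv`/`run_hdfs` helpers are hypothetical, and `run_hdfs` needs a Hadoop installation with `hdfs` on the `PATH`. Copying between the local disk and HDFS uses `-put` and `-get`, which have no single Linux counterpart, so they are left out of the table.

```python
import subprocess
from typing import List

# Linux file command -> equivalent option of the HDFS file system shell.
LINUX_TO_HDFS = {
    "ls": "-ls",
    "mkdir": "-mkdir",
    "cat": "-cat",
    "cp": "-cp",
    "mv": "-mv",
    "rm": "-rm",
    "du": "-du",
    "chmod": "-chmod",
    "chown": "-chown",
}


def hdfs_argv(linux_cmd: str, *args: str) -> List[str]:
    """Build the `hdfs dfs` argument list that mirrors a Linux file command."""
    try:
        option = LINUX_TO_HDFS[linux_cmd]
    except KeyError:
        raise ValueError(f"no HDFS equivalent mapped for {linux_cmd!r}") from None
    return ["hdfs", "dfs", option, *args]


def run_hdfs(linux_cmd: str, *args: str) -> str:
    # Runs the real command; raises CalledProcessError if it fails.
    result = subprocess.run(
        hdfs_argv(linux_cmd, *args), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print(hdfs_argv("ls", "/user"))              # ['hdfs', 'dfs', '-ls', '/user']
    print(hdfs_argv("mkdir", "-p", "/user/demo"))
```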
List of supported file systems
 HDFS
 FTP
 Amazon S3 file system
 Windows Azure Storage Blobs (WASB) file system
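Hadoop reaches each of these stores through one file system abstraction and picks the implementation from the scheme of the path's URI (for example `hdfs://`, `ftp://`, `s3a://` for the S3A connector, and `wasb://` or `wasbs://` for Azure Blob storage). The Python sketch below only models that dispatch step; `SCHEMES` and `file_system_for` are hypothetical names, and the host `namenode:8020` and bucket names in the example are placeholders.

```python
from urllib.parse import urlparse

# URI scheme -> file system from the list above.
SCHEMES = {
    "hdfs": "HDFS",
    "ftp": "FTP",
    "s3a": "Amazon S3",
    "wasb": "Windows Azure Storage Blobs (WASB)",
    "wasbs": "Windows Azure Storage Blobs (WASB)",  # TLS variant
}


def file_system_for(uri: str) -> str:
    """Return the file system that would serve this URI, based on its scheme."""
    scheme = urlparse(uri).scheme  # urlparse lower-cases the scheme
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported or missing URI scheme in {uri!r}")
    return SCHEMES[scheme]


if __name__ == "__main__":
    for uri in ["hdfs://namenode:8020/user/data", "s3a://my-bucket/logs/"]:
        print(uri, "->", file_system_for(uri))
```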
Features in HDFS

 File permissions and authentication
 Rack awareness
 Safemode
 Upgrade and rollback
 Secondary NameNode
 Checkpoint node
 Backup node
 fsck
 fetchdt
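Rack awareness is what drives HDFS's default replica placement. For a replication factor of 3, the HDFS architecture guide describes putting one replica on the writer's node (when the writer runs on a DataNode), one on a node in a different rack, and the third on a different node in that same remote rack. The Python sketch below follows that rule under stated assumptions: `place_replicas` is a hypothetical name, the writer is assumed to be a DataNode, and it picks the first eligible node in sorted order, whereas the real policy chooses randomly among eligible nodes and also skips overloaded or nearly full ones.

```python
from typing import Dict, List


def place_replicas(writer: str, topology: Dict[str, str]) -> List[str]:
    """Pick 3 DataNodes for a new block; topology maps DataNode name -> rack id."""
    local_rack = topology[writer]
    # Replica 1: the writer's own node.
    first = writer
    # Replica 2: a node on a different (remote) rack.
    remote_nodes = sorted(n for n, rack in topology.items() if rack != local_rack)
    if not remote_nodes:
        raise ValueError("need at least two racks")
    second = remote_nodes[0]
    # Replica 3: a different node on the same remote rack as replica 2.
    same_remote = [n for n in remote_nodes if topology[n] == topology[second] and n != second]
    if not same_remote:
        raise ValueError("the remote rack needs at least two nodes")
    third = same_remote[0]
    return [first, second, third]


if __name__ == "__main__":
    topo = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
    print(place_replicas("dn1", topo))  # ['dn1', 'dn3', 'dn4']
```

Keeping copies on two racks lets a block survive the loss of a whole rack, while putting two of the three copies on the same rack keeps cross-rack write traffic down.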
Current level of security in Hadoop
 The Hadoop version described here has only a very basic, rudimentary implementation of security: an advisory access control mechanism. (Later releases added optional Kerberos-based authentication, known as secure mode.)
 Hadoop does not strongly authenticate the client; it simply asks the underlying Unix system for the user name by executing the `whoami` command.
 Anyone who has the block location details can communicate directly with a DataNode (without going through the NameNode) and ask for blocks. (This was tried out at the recent Cloudera Hadoop Hackathon.)
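Why such a check is only advisory can be illustrated in a few lines of Python. This is an analogy, not Hadoop code: `claimed_user` and `may_read` are hypothetical names. Like the `whoami` lookup, `getpass.getuser()` reports a name taken from the client's own environment (it checks the `LOGNAME`, `USER`, `LNAME` and `USERNAME` variables first), so a client can claim any identity it likes and a name-only permission check will accept it.

```python
import getpass
import os


def claimed_user() -> str:
    # The user name comes from the client's own environment,
    # which the client fully controls.
    return getpass.getuser()


def may_read(file_owner: str, user: str) -> bool:
    # Advisory check: compares names but never verifies who is really asking.
    return user == file_owner


if __name__ == "__main__":
    os.environ["LOGNAME"] = "hdfs"  # any client can claim any name
    print(may_read("hdfs", claimed_user()))  # True
```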
Hadoop Configuration
NameNode Configuration
DataNode Configuration
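The NameNode and DataNode settings normally live in `hdfs-site.xml`, which uses Hadoop's `<configuration>`/`<property>` layout. As a hedged sketch, the Python below renders such a file from a dictionary. The property names (`dfs.replication`, `dfs.blocksize`, `dfs.namenode.name.dir`, `dfs.datanode.data.dir`) are the Hadoop 2.x and later names, the `/data/hadoop/...` paths are hypothetical placeholders, and the helper `hadoop_config_xml` is illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration>/<property> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example settings; the paths must be adapted to a real cluster.
HDFS_SITE = {
    "dfs.replication": "3",                                    # copies per block
    "dfs.blocksize": "134217728",                              # 128 MiB in bytes
    "dfs.namenode.name.dir": "file:///data/hadoop/namenode",   # NameNode metadata
    "dfs.datanode.data.dir": "file:///data/hadoop/datanode",   # DataNode block storage
}

if __name__ == "__main__":
    print(hadoop_config_xml(HDFS_SITE))
```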
