This document provides an overview of Apache Hadoop, including its core components and features. It discusses how Hadoop uses MapReduce for distributed processing of large datasets across clusters of computers, and it describes HDFS, Hadoop's distributed file system, which provides high-throughput access to application data by storing data in blocks replicated across multiple nodes.
The 3Vs of Big Data are Volume, Velocity, and Variety.

Apache Hadoop
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
The project includes these modules:
Hadoop Common: the common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
Hadoop YARN: a framework for job scheduling and cluster resource management.
Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
Hadoop's biggest advantage is the easy scaling of data processing over many computing nodes.

The primitives used to achieve data processing with MapReduce are called mappers and reducers; a sketch follows.
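As an illustration (not taken from this document), here is a minimal sketch of the canonical MapReduce word-count job in Java. The class name and input/output paths are placeholders; the mapper emits (word, 1) pairs, the reducer sums them, and the driver submits the job to the cluster (YARN in Hadoop 2+).

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for every input line, emit (word, 1) for each token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the 1s emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configure and submit the job; input/output paths come from args.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```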
Hadoop File System (HDFS)

HDFS stands for Hadoop Distributed File System.
HDFS is a filesystem designed for large-scale distributed data processing under MapReduce.
HDFS is the primary distributed storage used by Hadoop applications.
An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
Data is stored in blocks that are distributed and replicated over many nodes, which allows HDFS to store a very large data set (e.g., 100 TB) as a single file. Block sizes typically range from 64 MB to 1 GB; a sketch of setting these per file follows.
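As a hedged illustration, the Java FileSystem API lets a client choose the replication factor and block size per file at create time. The path and sizes below are placeholder values, not taken from this document.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockDemo {
  public static void main(String[] args) throws Exception {
    // Picks up cluster settings from core-site.xml / hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/data/example.bin");   // hypothetical path
    short replication = 3;                       // copies of each block
    long blockSize = 128L * 1024 * 1024;         // 128 MB blocks

    try (FSDataOutputStream out =
             fs.create(file, true /* overwrite */, 4096 /* buffer */, replication, blockSize)) {
      out.writeBytes("hello hdfs");
    }
  }
}
```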
Hadoop provides a set of command-line utilities that work similarly to the basic Linux file commands (a sketch of the matching file system API calls follows the list below).

List of supported file systems:
HDFS
FTP
Amazon S3 file system
Windows Azure Storage Blobs (WASB) file system
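As an assumed illustration, the same operations exposed by the shell utilities are also available programmatically through the Java FileSystem API; the paths below are hypothetical, and the comments note the equivalent shell commands.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsOps {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    fs.mkdirs(new Path("/user/demo"));                           // hadoop fs -mkdir /user/demo
    fs.copyFromLocalFile(new Path("data.txt"),
                         new Path("/user/demo/data.txt"));       // hadoop fs -put data.txt /user/demo
    for (FileStatus s : fs.listStatus(new Path("/user/demo"))) { // hadoop fs -ls /user/demo
      System.out.println(s.getPath() + "\t" + s.getLen());
    }
  }
}
```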
Features in HDFS:
File permissions and authentication
Rack awareness
Safemode
Upgrade and rollback
Secondary NameNode
Checkpoint node
Backup node
Fsck
Fetchdt

Current level of security in Hadoop

The current version of Hadoop has a very basic, rudimentary implementation of security: an advisory access control mechanism.
Hadoop does not strongly authenticate the client; it simply asks the underlying Unix system for the user's identity by executing the `who am i` command (illustrated below).
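As a rough illustration (my assumption, not from this document), the effect of this scheme is visible through Hadoop's UserGroupInformation API: under simple authentication the identity is derived from the client's local OS login, so the cluster trusts whatever the client asserts.

```java
import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmI {
  public static void main(String[] args) throws Exception {
    // Under simple (non-Kerberos) authentication, this identity comes from
    // the client's local OS user; the cluster does not verify it.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    System.out.println("Hadoop will act as user: " + ugi.getUserName());
  }
}
```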
Anyone who has the block location details can communicate directly with a DataNode (without needing to communicate with the NameNode) and ask for blocks. (This was demonstrated experimentally at a recent Cloudera Hadoop Hackathon.)

Hadoop Configuration

NameNode configuration and DataNode configuration; a hedged sketch of typical settings follows.
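The document does not spell out the configuration values, so the following is a hedged sketch of typical NameNode/DataNode-related properties using Hadoop's Configuration API; all values shown are illustrative placeholders. In a real deployment these properties usually live in core-site.xml and hdfs-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;

public class ConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Client/NameNode: where clients locate the NameNode (placeholder host/port).
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
    // NameNode: local directory for file system metadata (placeholder path).
    conf.set("dfs.namenode.name.dir", "/var/hadoop/name");
    // DataNode: local directory for block storage (placeholder path).
    conf.set("dfs.datanode.data.dir", "/var/hadoop/data");
    // Default replication factor for new files.
    conf.set("dfs.replication", "3");

    System.out.println("NameNode URI: " + conf.get("fs.defaultFS"));
  }
}
```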