bigdata (1) (1)

The document provides an overview of Big Data architecture and Hadoop architecture, detailing the layers involved in Big Data systems, including data source, ingestion, storage, processing, analytics, presentation, security, and orchestration. It explains key components of Hadoop architecture, such as HDFS, MapReduce, and YARN, emphasizing their roles in distributed data processing and storage. The document also highlights important technologies and tools used in Big Data analytics and Hadoop systems.

Uploaded by

kameswarisreevalli2801

1. Study of Big Data Analytics and Hadoop Architecture

(i) know the concept of big data architecture

(ii) know the concept of Hadoop architecture

Big Data architecture:

Big Data architecture refers to the design and structure used to store, process, and analyze large volumes of data. These architectures are built to handle a variety of data types (structured, semi-structured, unstructured), as well as the large scale and speed of modern data flows. The core components of Big Data architecture typically include the following layers:

1. Data Source Layer

This layer refers to the origin of the data, which could come from a variety of sources:

● External data sources: Social media, IoT devices, third-party services, etc.
● Internal data sources: Databases, data warehouses, etc.
● Data streams: Real-time data from sensors, logs, etc.

2. Data Ingestion Layer

Data ingestion is the process of collecting and transporting data from various sources to the storage layer. The two main types of ingestion are:

● Batch processing: Data is collected over a fixed period (e.g., every hour, daily).
● Real-time/streaming processing: Data is collected in real time or near real time.

Tools used for data ingestion include:

● Apache Kafka: A distributed streaming platform.
● Apache Flume: A service for collecting and moving large amounts of log data.
● AWS Kinesis: A platform for real-time streaming data on AWS.

3. Data Storage Layer

This is where all the data is stored. Big Data storage should support both structured and unstructured data. It needs to be scalable, reliable, and highly available. Some common types of data storage in Big Data systems include:

● HDFS (Hadoop Distributed File System): A scalable, distributed file system.
● NoSQL databases: MongoDB, Cassandra, HBase for non-relational data.
● Data Lakes: A central repository for storing raw data in its native format (e.g., AWS S3, Azure Blob Storage).

4. Data Processing Layer

This layer processes the stored data and transforms it into valuable insights. It can be divided into two major approaches:

● Batch Processing: Processing data in large, scheduled intervals (e.g., Hadoop MapReduce).
● Stream Processing: Processing data in real time as it flows in (e.g., Apache Flink, Apache Storm, Spark Streaming).

Some key processing tools:

● Apache Spark: A fast and general-purpose cluster-computing system.
● Apache Hadoop: A framework for distributed storage and processing.
● Flink and Storm: Used for real-time data stream processing.

5. Data Analytics Layer

Once data is processed, it is often analyzed to extract insights. The analytics layer provides tools for complex analysis, including:

● Machine Learning (ML): Building predictive models and patterns using algorithms.
● Data Mining: Discovering hidden patterns and trends in data.
● Business Intelligence (BI): Tools like Tableau, Power BI for reporting and visualization.

Popular tools used for analytics:

● Apache Hive: A data warehouse built on top of Hadoop for querying and analyzing large datasets.
● Apache Impala: A high-performance SQL engine for big data.
● Python libraries (Pandas, scikit-learn): For data manipulation and machine learning.

6. Data Presentation Layer

This layer presents the insights derived from the analytics layer. It often involves dashboards, reports, and visualizations. Users, stakeholders, or systems will interact with this layer to make data-driven decisions. Tools include:

● BI tools: Tableau, Power BI, QlikView.
● Custom web interfaces: To display reports, graphs, and analysis.

7. Security and Governance Layer

Given the large volumes and sensitivity of data, security and governance are critical. This layer ensures data privacy, access control, and regulatory compliance.

● Authentication/Authorization: Ensuring only authorized users can access specific data.
● Data Encryption: To protect sensitive data at rest and in transit.
● Data Lineage: Tracking the origin and movement of data to ensure trustworthiness.
● Compliance: Adhering to regulations such as GDPR, HIPAA, etc.

8. Orchestration and Management Layer

Big Data systems require complex management for coordination, scheduling, and monitoring.

● Apache Airflow: An open-source platform to programmatically author, schedule, and monitor workflows.
● Kubernetes: For managing containerized applications and ensuring scalability and reliability.

Key Technologies in Big Data Architecture:

● Hadoop Ecosystem: For storage and processing (HDFS, YARN, MapReduce, Pig, Hive, etc.).
● Apache Kafka: For real-time streaming.
● Apache Spark: For fast in-memory data processing.
● NoSQL Databases: MongoDB, Cassandra, HBase.
● Cloud Platforms: AWS, Azure, Google Cloud provide tools for storage, processing, and management.

Example of Big Data Architecture

+---------------------+
|    Data Sources     |
+---------------------+
           |
           v
+---------------------+      +--------------------+
|   Data Ingestion    | ---> |    Data Storage    |
|  (Batch/Streaming)  |      |   (HDFS, NoSQL,    |
+---------------------+      |    Data Lakes)     |
           |                 +--------------------+
           v                           |
+---------------------+                v
|   Data Processing   |      +---------------------+
|   (Batch/Stream)    | ---> |   Data Analytics    |
+---------------------+      | (ML, BI, Analysis)  |
           |                 +---------------------+
           v                           |
+---------------------+                v
|  Data Presentation  | <--> +-----------------+
|(Dashboards, Reports,|      |   Security &    |
|   Visualization)    |      |   Governance    |
+---------------------+      +-----------------+

This high-level overview demonstrates the flow of data through the architecture from collection to processing and presentation.

(ii) know the concept of Hadoop architecture

Hadoop Architecture Overview

Hadoop is an open-source framework for processing and storing large datasets in a distributed computing environment. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Understanding Hadoop architecture is essential for working with Hadoop-based systems. Below is a detailed overview of the Hadoop architecture, its components, and how they work together.

Key Components of Hadoop Architecture

The architecture of Hadoop primarily revolves around three main components:

1. Hadoop Distributed File System (HDFS)
2. MapReduce
3. YARN (Yet Another Resource Negotiator)

These components work together to provide a distributed system that can store and process large volumes of data.

1. Hadoop Distributed File System (HDFS)

HDFS is the storage layer of Hadoop. It is designed to store vast amounts of data across multiple machines in a distributed environment.

● Block-based storage: HDFS stores data in blocks (128 MB by default in Hadoop 2.x and later; clusters that hold very large files often raise this to 256 MB). Each file is divided into blocks, which are then distributed across multiple nodes.
● Fault tolerance: HDFS ensures fault tolerance by replicating blocks. The default replication factor is 3 (each block is stored three times across the cluster). If one node fails, the data can still be accessed from another replica.
● NameNode: The NameNode is the master node in HDFS that manages the metadata (such as block locations) for the files. It does not store the data itself but keeps track of where the blocks are stored across the cluster.
● DataNode: DataNodes are the worker nodes that store the actual data in the form of blocks. Each DataNode is responsible for serving the blocks on request and performing block-level operations (like block creation, deletion, and replication).
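
The block and replication arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name hdfs_footprint and the 1000 MB example file are assumptions made for this sketch, and the defaults (128 MB blocks, replication factor 3) are the ones quoted in the text.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one file.

    The last block of a file only occupies as many bytes as it holds,
    so raw storage is file size times replication, not blocks times block size.
    """
    # Integer ceiling division: how many blocks are needed to hold the file.
    blocks = -(-file_size_bytes // block_size_bytes)
    replicas = blocks * replication
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes

# Hypothetical 1000 MB file with the default settings.
blocks, replicas, raw_bytes = hdfs_footprint(1000 * MB)
print(blocks, replicas, raw_bytes // MB)  # prints: 8 24 3000
```

With the defaults, a 1000 MB file is split into 8 blocks (seven full blocks and one partial block), stored as 24 block replicas that together occupy about 3000 MB of raw disk space.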

HDFS Architecture Diagram:

+-------------------+      +-------------------+
|      Client       |      |      Client       |
+-------------------+      +-------------------+
          |                          |
  +---------------+          +---------------+
  |   NameNode    |          |   NameNode    |
  +---------------+          +---------------+
          |                          |
+-------------------+      +-------------------+
|     DataNode      |      |     DataNode      |
+-------------------+      +-------------------+

2. MapReduce

MapReduce is the processing layer of Hadoop. It is a programming model used for processing large datasets in parallel across a distributed cluster.

● Map phase: In the Map phase, the input data is divided into chunks (called splits), and each chunk is processed by a mapper. The mapper processes the data and generates a set of intermediate key-value pairs.
● Shuffle and Sort: After the Map phase, the intermediate key-value pairs are shuffled and sorted. The system groups the data by key and prepares it for the Reduce phase.
● Reduce phase: In the Reduce phase, the system applies the reduce function to the sorted intermediate data, aggregating or transforming the data in some way. The results are written to the output files.

MapReduce Architecture Diagram:

+-------------+
|    Input    | ----> [Map] ----> [Shuffle & Sort] ----> [Reduce] ----> Output
+-------------+

● JobTracker: The JobTracker is the master daemon in the classic MapReduce framework (MRv1, used in Hadoop 1.x). It is responsible for scheduling and monitoring jobs, dividing the work into tasks, and allocating tasks to TaskTrackers.
● TaskTracker: TaskTrackers are worker daemons that run on the cluster nodes and execute tasks assigned by the JobTracker. Each TaskTracker handles both Map and Reduce tasks.

In Hadoop 2.x and later, these two daemons no longer exist: their duties are taken over by the YARN ResourceManager, NodeManager, and per-job ApplicationMaster.

3. YARN (Yet Another Resource Negotiator)

YARN is the resource management layer of Hadoop, responsible for managing resources across the cluster and scheduling the execution of tasks.

● ResourceManager (RM): The ResourceManager is the master daemon in YARN, which manages the allocation of resources (memory, CPU) to the various applications running on the cluster. It makes sure that resources are allocated based on job requirements and cluster availability.
● NodeManager (NM): The NodeManager runs on each node in the cluster. It is responsible for managing resources on the individual node and monitoring the status of the node.
● ApplicationMaster (AM): The ApplicationMaster is a per-application entity that manages the lifecycle of a job. It negotiates resources with the ResourceManager and monitors the progress of its application (MapReduce job or Spark job).

YARN Architecture Diagram :

Hadoop Ecosystem Components

Apart from the core components (HDFS, MapReduce, and YARN), Hadoop has a rich ecosystem that includes several tools and frameworks for different use cases. Some of the key components include:

● Hive: A data warehouse system that facilitates querying and managing large datasets in HDFS using SQL-like queries.
● Pig: A platform for analyzing large datasets, providing a high-level language called Pig Latin for processing and transforming data.
● HBase: A NoSQL database for real-time read/write access to large datasets stored in HDFS.
● Sqoop: A tool for transferring data between Hadoop and relational databases.
● Flume: A service for collecting and aggregating log data and other types of streaming data.
● Oozie: A workflow scheduler for managing Hadoop jobs.
● Zookeeper: A service for coordinating distributed applications in the Hadoop ecosystem.
● Mahout: A machine learning library for scalable machine learning algorithms.

Hadoop Architecture Diagram (Complete)

                          +------------------+
                          |   Client Node    |
                          +------------------+
                                   |
                +------------------+------------------+
                |                                     |
+--------------------------------+   +----------------------------------+
|      HDFS (Storage Layer)      |   |     YARN (Resource Manager)      |
+--------------------------------+   +----------------------------------+
|       NameNode (Master)        |   |     ResourceManager (Master)     |
|       DataNode (Worker)        |   |       NodeManager (Worker)       |
+--------------------------------+   +----------------------------------+
                |                                     |
       +------------------+              +------------------------+
       | MapReduce Layer  |              |   ApplicationMaster    |
       +------------------+              +------------------------+

Key Characteristics of Hadoop Architecture

1. Scalability: Hadoop is designed to scale horizontally. As your data grows, you can add more nodes to the cluster.
2. Fault Tolerance: Through replication and data distribution, Hadoop ensures that the data is not lost even when individual nodes fail.
3. Cost Efficiency: Hadoop runs on commodity hardware, meaning you can build large-scale clusters with low-cost machines.
4. Data Locality: Hadoop tries to move computation to where the data is stored to minimize network congestion and speed up processing.

2. Loading Dataset into HDFS for Spark Analysis: Installation of Hadoop and Cluster Management

(i) Installing Hadoop single node cluster in Ubuntu environment

(ii) Knowing the difference between single node clusters and multi-node clusters

(iii) Accessing WEB-UI and the port number

(iv) Installing and accessing the environments such as Hive and Sqoop

Installing Hadoop Single Node Cluster in Ubuntu Environment

Prerequisites:

● A fresh Ubuntu system or a virtual machine running Ubuntu.
● Java should be installed (Hadoop 3.3.x, used below, runs on Java 8 or Java 11).
● A user with sudo privileges.

Step-by-Step Installation:

1. Install Java (JDK):

Hadoop requires Java to be installed. Install Java 8 or a compatible version.

sudo apt update
sudo apt install openjdk-8-jdk

Verify the Java installation:

java -version

2. Install Hadoop:

Download, extract, and relocate the Hadoop binaries as described in steps 3 to 5.

3. Download a stable version of the Hadoop binaries from the official Apache website using wget:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

4. Extract the downloaded tar file:

tar -xzvf hadoop-3.3.1.tar.gz

5. Move it to the /opt directory:

sudo mv hadoop-3.3.1 /opt/hadoop

6. Set Environment Variables:

Add Hadoop-related environment variables to the .bashrc file:

nano ~/.bashrc

Add the following lines at the end of the file:

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

After saving and closing, apply the changes:

source ~/.bashrc

7. Configure Hadoop:

In the Hadoop configuration directory, you'll need to edit several XML files to set up the cluster. Each setting is a property element that wraps a name and a value.

o core-site.xml:

Edit the core configuration to set the HDFS URI.

nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following configuration:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

o hdfs-site.xml:

Configure HDFS directories and replication:

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/hdfs/datanode</value>
  </property>
</configuration>

o mapred-site.xml:

Set up the MapReduce framework:

nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

o yarn-site.xml:

Configure YARN settings:

nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

8. Format HDFS:

Before starting Hadoop for the first time, format the HDFS NameNode. Do this only once on a new cluster, because formatting erases the existing HDFS metadata:

hdfs namenode -format

9. Start Hadoop Daemons:

Start the Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager):

start-dfs.sh
start-yarn.sh

10. Verify the Installation:
o Check that the HDFS daemons (NameNode and DataNode) are running by listing the Java processes:
o jps
o The same jps output should also list the ResourceManager and NodeManager.
o You can also check the Hadoop Web UI to view the status of your cluster.
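
Hadoop configuration files keep every setting inside a property element that holds a name and a value. As a rough sketch of that structure, the following Python snippet builds and reads such a file with the standard library. The helper names build_hadoop_config and read_hadoop_config are made up for this illustration and are not part of Hadoop; the property shown is the fs.defaultFS value from the core-site.xml step.

```python
import xml.etree.ElementTree as ET

def build_hadoop_config(properties):
    """Return a <configuration> document with one <property> per dict entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9 and later
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")

def read_hadoop_config(xml_text):
    """Parse a <configuration> document back into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

# Same setting as in the core-site.xml step.
core_site = build_hadoop_config({"fs.defaultFS": "hdfs://localhost:9000"})
print(core_site)
print(read_hadoop_config(core_site))
```

Reading a file back this way is a quick check that every name/value pair is wrapped in its own property element, which Hadoop requires for the setting to be picked up.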

(ii) Differences Between Single-Node and Multi-Node Clusters

Single-Node Cluster:

● A single-node cluster is a Hadoop setup where all the Hadoop services (NameNode, DataNode, ResourceManager, and NodeManager) run on one machine (localhost).
● It is simpler to set up and useful for development and testing purposes.
● It offers limited scalability and no truly distributed computation, unlike a multi-node cluster.

Multi-Node Cluster:

● A multi-node cluster involves multiple machines, where one node acts as the master (NameNode, ResourceManager) and the others as workers (DataNode, NodeManager).
● It offers the true power of distributed computing and storage, enabling scalability and fault tolerance.
● It requires more complex configuration, network setup, and hardware resources.
● It is used in production environments where large-scale data processing is required.

(iii) Accessing WEB-UI and the Port Number

Hadoop provides a Web UI to monitor the cluster's health and performance. The following are the key ports:

● NameNode Web UI: http://localhost:9870 - For monitoring HDFS status. This is the default in Hadoop 3.x, the version installed in this exercise; Hadoop 2.x used port 50070.
● ResourceManager Web UI: http://localhost:8088 - For monitoring the YARN resource manager.
● JobHistory Server: http://localhost:19888 - For tracking MapReduce job history.

Make sure these ports are open and accessible.
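
One way to confirm that the Web UI ports are reachable is a plain TCP connection test. The sketch below uses only the Python standard library; the function names and the WEB_UI_PORTS table are assumptions made for this example, and the NameNode port assumes the Hadoop 3.x default of 9870 (a Hadoop 2.x cluster would use 50070 instead).

```python
import socket

# Assumed default ports for a local Hadoop 3.x installation.
WEB_UI_PORTS = {
    "NameNode": 9870,
    "ResourceManager": 8088,
    "JobHistory Server": 19888,
}

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_web_uis(host="localhost", ports=WEB_UI_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}

if __name__ == "__main__":
    for name, reachable in check_web_uis().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

A port that accepts connections only shows that something is listening there; opening the URL in a browser is still the way to confirm that it is the expected Hadoop page.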

(iv) Installing and Accessing Environments such as Hive and Sqoop

Hive Installation:

1. Install Hive:

You can download the latest stable version of Apache Hive from the Apache website or install it via apt if a package is available in your repositories.

sudo apt-get install hive

2. Configure Hive:

Hive requires a metastore (typically MySQL or Derby). You can configure it by editing hive-site.xml:

nano $HIVE_HOME/conf/hive-site.xml

3. Access Hive:

After installation and configuration, start Hive:

hive

This opens the Hive CLI where you can execute Hive queries.

Sqoop Installation:

1. Install Sqoop:

Download and install Sqoop, which is used for transferring data between relational databases and Hadoop.

sudo apt-get install sqoop

2. Configure Sqoop:

Set up database connection configurations in Sqoop by editing the sqoop-site.xml file.

3. Access Sqoop:

To use Sqoop to import or export data, you can run commands like:

sqoop import --connect jdbc:mysql://localhost/database --table tablename --username user --password pass
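
When an import like this is launched from a script, it helps to assemble the argument list in code instead of concatenating strings. The following is a minimal sketch under stated assumptions: the helper name sqoop_import_args is hypothetical, the connection values are the placeholders from the command above, and it uses Sqoop's -P (prompt for the password) or --password-file options so that the password does not appear on the command line.

```python
import shlex

def sqoop_import_args(jdbc_url, table, username, password_file=None, target_dir=None):
    """Build the argument list for a `sqoop import` call without running it."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--username", username]
    if password_file:
        # Read the password from a protected file.
        args += ["--password-file", password_file]
    else:
        # Ask for the password interactively.
        args.append("-P")
    if target_dir:
        # Optional HDFS directory that receives the imported data.
        args += ["--target-dir", target_dir]
    return args

args = sqoop_import_args("jdbc:mysql://localhost/database", "tablename", "user")
print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be passed to subprocess.run on a machine where Sqoop is installed; the sketch itself only prints the command.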

3. File Management Tasks & Basic Linux Commands

(i) Creating a directory in HDFS

(ii) Moving forth and back to directories

(iii) Listing directory contents

(iv) Uploading and downloading a file in HDFS

(v) Checking the contents of the file

(vi) Copying and moving files

(vii) Copying and moving files between local to HDFS environment

(viii) Removing files and paths

(ix) Displaying few lines of a file

(x) Display the aggregate length of a file

(xi) Checking the permissions of a file

(xii) Zipping and unzipping the files with & without permission pasting it to a location

(xiii) Copy, Paste commands

Here's a breakdown of file management tasks and basic Linux commands, particularly focused on HDFS (Hadoop Distributed File System) operations:

(i) Creating a directory in HDFS:

To create a directory in HDFS, you can use the hadoop fs -mkdir command.

hadoop fs -mkdir /path/to/your/directory

This will create a directory at the specified path in HDFS. Add the -p option to create any missing parent directories as well.

(ii) Moving forth and back to directories:

You can navigate directories in the Linux file system using the cd command.

● To move to a directory:
● cd /path/to/directory
● To move back to the previous directory:
● cd -
● To move up one directory level:
● cd ..

HDFS has no equivalent of cd: the hadoop fs shell does not keep a current working directory, and there is no hadoop fs -cd command. Use hadoop fs -ls to list the contents of a directory and pass the full path to every command. Relative paths are resolved against your HDFS home directory (/user/<username>).

(iii) Listing directory contents:

To list the contents of a directory, whether in HDFS or local, you use the ls command.

● In HDFS:
● hadoop fs -ls /path/to/directory
● In Local File System:
● ls /path/to/directory

(iv) Uploading and downloading a file in HDFS:

To upload a file to HDFS:

hadoop fs -put /local/path/to/file /hdfs/path/to/directory

To download a file from HDFS:

hadoop fs -get /hdfs/path/to/file /local/path/to/directory
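
The same upload and download commands can be driven from a script. The sketch below is a thin, hypothetical wrapper (the names hadoop_fs, upload, and download are assumptions for this example) that builds the hadoop fs argument list and only executes it when dry_run is set to False, so it can be tried on a machine without Hadoop.

```python
import subprocess

def hadoop_fs(*args, dry_run=True):
    """Build a `hadoop fs` command; run it only when dry_run is False."""
    cmd = ["hadoop", "fs", *args]
    if dry_run:
        return cmd
    # Requires the hadoop executable on PATH; raises CalledProcessError on failure.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def upload(local_path, hdfs_dir, dry_run=True):
    """Copy a local file into an HDFS directory (hadoop fs -put)."""
    return hadoop_fs("-put", local_path, hdfs_dir, dry_run=dry_run)

def download(hdfs_path, local_dir, dry_run=True):
    """Copy an HDFS file into a local directory (hadoop fs -get)."""
    return hadoop_fs("-get", hdfs_path, local_dir, dry_run=dry_run)

print(upload("/local/path/to/file", "/hdfs/path/to/directory"))
print(download("/hdfs/path/to/file", "/local/path/to/directory"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.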

(v) Checking the contents of the file:

You can check the contents of a file using the cat command.

● In HDFS:
● hadoop fs -cat /path/to/file
● In Local File System:
● cat /path/to/file

(vi) Copying and moving files:

● Copying files:
o To copy a file within HDFS:
o hadoop fs -cp /hdfs/source/path /hdfs/destination/path
o To copy a file from local to HDFS:
o hadoop fs -copyFromLocal /local/source/path /hdfs/destination/path
o To copy a file from HDFS to local:
o hadoop fs -copyToLocal /hdfs/source/path /local/destination/path

● Moving files:
o To move a file within HDFS:
o hadoop fs -mv /hdfs/source/path /hdfs/destination/path
o To move a file from local to HDFS:
o hadoop fs -moveFromLocal /local/source/path /hdfs/destination/path
o To move a file from HDFS to local: the -moveToLocal option appears in the shell help but only prints "Not implemented yet" in current Hadoop releases, so copy the file and then delete the HDFS original:
o hadoop fs -copyToLocal /hdfs/source/path /local/destination/path
o hadoop fs -rm /hdfs/source/path

(vii) Copying and moving files between local and HDFS environment:

● Copying a file from local to HDFS:
● hadoop fs -copyFromLocal /local/path/to/file /hdfs/path/to/destination
● Copying a file from HDFS to local:
● hadoop fs -copyToLocal /hdfs/path/to/file /local/path/to/destination
● Moving a file from local to HDFS:
● hadoop fs -moveFromLocal /local/path/to/file /hdfs/path/to/destination
● Moving a file from HDFS to local (-moveToLocal is not implemented in current Hadoop releases, so copy the file and then remove the HDFS original):
● hadoop fs -copyToLocal /hdfs/path/to/file /local/path/to/destination
● hadoop fs -rm /hdfs/path/to/file

(viii) Removing files and paths:

To remove files and directories, use the -rm command, adding the -r option for directories.

● Remove a file in HDFS:
● hadoop fs -rm /hdfs/path/to/file
● Remove a directory in HDFS:
● hadoop fs -rm -r /hdfs/path/to/directory
● Remove a file locally:
● rm /local/path/to/file
● Remove a directory locally:
● rm -r /local/path/to/directory

(ix) Displaying few lines of a file:

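To display the first few lines of a file:

● In HDFS (the -head option prints the first kilobyte of the file; to get a fixed number of lines, pipe -cat through the local head command):
● hadoop fs -head /path/to/file
● hadoop fs -cat /path/to/file | head -n 10
● In Local File System:
● head /path/to/file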

(x) Display the aggregate length of a file:

You can get the file size using the -du (disk usage) command.

● In HDFS:
● hadoop fs -du -s /path/to/file
● In Local File System:
● du -sh /path/to/file

This will display the total size of the file. The HDFS command reports the size in bytes unless you add the -h option for human-readable units.

(xi) Checking the permissions of a file:

You can check the permissions of a file using the -ls command, which will show the file permissions.

● In HDFS:
● hadoop fs -ls /path/to/file
● In Local File System:
● ls -l /path/to/file

This will display the permissions, owner, and group of the file or directory.
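
The permission column printed by both commands is a ten-character string such as -rw-r--r--. As an illustration of how to read it, the sketch below converts that string into the octal form used by chmod and splits one listing line into fields. The function names and the sample line (owner hduser, group supergroup, path /data/sample.txt) are invented for this example; the field order follows the hadoop fs -ls output (permissions, replication, owner, group, size, date, time, path).

```python
def mode_to_octal(mode):
    """Convert a mode string such as '-rw-r--r--' to octal digits such as '644'.

    Only the read, write, and execute bits are considered; setuid, setgid,
    and sticky bits are ignored in this sketch.
    """
    perms = mode[1:10]  # skip the leading file-type character
    if len(perms) != 9:
        raise ValueError(f"unexpected mode string: {mode!r}")
    digits = []
    for start in range(0, 9, 3):  # owner, group, others
        r, w, x = perms[start:start + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0)
        value += 1 if x in ("x", "s", "t") else 0
        digits.append(str(value))
    return "".join(digits)

def parse_hdfs_ls_line(line):
    """Split one `hadoop fs -ls` output line into named fields."""
    mode, replication, owner, group, size, date, time, path = line.split(None, 7)
    return {"mode": mode, "octal": mode_to_octal(mode), "owner": owner,
            "group": group, "size": int(size), "path": path}

# Hypothetical listing line, for illustration only.
sample = "-rw-r--r--   3 hduser supergroup    1048576 2024-01-01 10:00 /data/sample.txt"
print(parse_hdfs_ls_line(sample))
print(mode_to_octal("drwxr-xr-x"))  # prints: 755
```

The first character of the string is the file type (d for a directory, - for a regular file), followed by three read/write/execute triplets for the owner, the group, and everyone else.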

(xii) Zipping and unzipping files, with and without preserving permissions, and placing them at a location:

You can zip and unzip files using the zip and unzip commands.

● Zipping a file:
● zip filename.zip /path/to/file
● Unzipping a file:
● unzip filename.zip -d /path/to/extract

To maintain permissions while transferring a file, use the -p option of cp, or use rsync in archive mode (-a), which preserves permissions along with timestamps and ownership.

Example with rsync:

rsync -av /path/to/source /path/to/destination

(xiii) Copy, Paste Commands:

● Copy Command (for the local file system):
● cp /source/path /destination/path

For HDFS:

hadoop fs -cp /source/hdfs/path /destination/hdfs/path

● Paste Command (to paste a file after copying it): This is generally done by using cp or mv as mentioned above. There's no specific "paste" command, but the operation is performed through these commands when moving or copying data.

4. Map-reducing

(i) Definition of Map-reduce

(ii) Its stages and terminologies

(iii) Word-count program to understand map-reduce (Mapper phase, Reducer phase, Driver code)

(i) Definition of Map-Reduce:

Map-Reduce is a programming model and processing technique used to process and generate large datasets. It allows the parallel processing of data by dividing it into small chunks and distributing it across multiple nodes in a cluster. The main concept involves two key operations: Map and Reduce.

● Map: The map function processes input data and produces a set of intermediate key-value pairs.
● Reduce: The reduce function takes the intermediate key-value pairs, processes them, and merges them to produce the final result.

Map-Reduce is widely used in distributed systems like Hadoop for large-scale data processing tasks.

(ii) Stages and Terminologies in Map-Reduce:

The Map-Reduce process is split into two main stages, the Map stage and the Reduce stage, but
several other intermediate processes and terminologies come into play.

1. Map Stage:
o The input data is divided into chunks called input splits (usually portions of files or groups of records).
o The Mapper function processes each chunk and outputs intermediate key-value
pairs.
o The intermediate output is then handed to the shuffle phase, which sorts and groups it by key.
2. Shuffle and Sort:
o After the map phase, the intermediate key-value pairs are shuffled and sorted to
ensure that all values corresponding to the same key are grouped together. This
step happens automatically in Map-Reduce frameworks like Hadoop.
3. Reduce Stage:
o The Reducer function processes each group of intermediate key-value pairs and
merges them to produce a final output. It can aggregate, summarize, or process
data in any other way required by the user.
4. Output:
o After the reduce phase, the final output is written to a file or a database.
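
The four stages above can be sketched in a single process. This is only an illustration of the data flow, not how Hadoop implements it: the function run_map_reduce, the "store,amount" record format, and the sample data are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict

def run_map_reduce(input_splits, map_fn, reduce_fn):
    """Run the map, shuffle/sort, and reduce stages in a single process."""
    # Map stage: each split yields intermediate (key, value) pairs.
    intermediate = []
    for split in input_splits:
        intermediate.extend(map_fn(split))

    # Shuffle and sort: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: one call per key, visited in sorted key order.
    return [(key, reduce_fn(key, groups[key])) for key in sorted(groups)]

# Hypothetical records of the form "store,amount".
def sales_map(split):
    for record in split:
        store, amount = record.split(",")
        yield store, int(amount)

def sales_reduce(store, amounts):
    return sum(amounts)

splits = [["north,10", "south,5"], ["north,7", "east,3"]]
totals = run_map_reduce(splits, sales_map, sales_reduce)
print(totals)
```

Here the output stage is just a print call; in a real cluster each reducer would write its own part of the result to storage.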

Key Terminologies in Map-Reduce:

● Mapper: The function or process that reads input data, processes it, and outputs key-value
pairs.
● Reducer: The function that processes the grouped key-value pairs from the mapper and
performs the final aggregation or computation.
● Key-Value Pair: The fundamental unit of data in Map-Reduce, where each record is
represented as a key paired with a value.
● Shuffle: The process of redistributing the data across reducers based on keys, ensuring
that all values for the same key are sent to the same reducer.
● Input Split: The unit of work or chunk of data that is sent to a mapper.
● Output: The final result after processing in the reduce phase, usually saved to disk or a
storage system.
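
To make the Input Split and Shuffle terms concrete, here is a minimal sketch, assuming two reducers and a hash-based rule for choosing a reducer. The helper names (split_input, partition, shuffle) are hypothetical, and zlib.crc32 is used only as a stable stand-in hash; it is not the hash function Hadoop uses.

```python
import zlib

def split_input(lines, lines_per_split):
    """Cut the input into fixed-size input splits, one per mapper."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]

def partition(key, num_reducers):
    """Pick a reducer for a key; the same key always maps to the same reducer."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Route every (key, value) pair to the bucket of its reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

lines = ["hello world", "hello data", "world"]
splits = split_input(lines, 2)
pairs = [(word, 1) for split in splits for line in split for word in line.split()]
buckets = shuffle(pairs, 2)
for reducer_id, bucket in enumerate(buckets):
    print(reducer_id, bucket)
```

Because the reducer is chosen from the key alone, both occurrences of "hello" land in the same bucket regardless of which split they came from.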

(iii) Word-Count Program to Understand Map-Reduce:

Here is a simple example of a Word-Count program to demonstrate the Map-Reduce process. We
will break it into three main parts (mapper, reducer, and driver code), with the framework's shuffle
and sort step running between the mapper and the reducer.

1. Mapper Phase:

The mapper reads input text and emits key-value pairs, where the key is a word, and the value is 1
(representing a single occurrence of the word).

Mapper code (in Python; any suitable language would work):

import sys

# Mapper function: reads lines from standard input and emits "word<TAB>1"
def mapper():
    for line in sys.stdin:
        words = line.split()
        for word in words:
            # Emit word with value 1
            print(f"{word}\t1")

if __name__ == "__main__":
    mapper()

In this code:

● The input is a line of text.
● The line is split into words.
● For each word, a key-value pair is emitted, where the key is the word, and the value is 1.

2. Shuffle and Sort:

After the map phase, the framework automatically groups and sorts the emitted key-value pairs.
For instance, all instances of the word "hello" will be grouped together so that they can be passed
to the same reducer.
Example of shuffled data (sorted by key):

data 1
hello 1
hello 1
world 1
world 1
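
The grouping step can be imitated locally by sorting the mapper output and then grouping adjacent pairs. This is a sketch under the assumption that the pairs fit in memory; the variable names and the sample list are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output in arrival order (not yet sorted).
mapper_output = [("hello", 1), ("world", 1), ("hello", 1), ("data", 1), ("world", 1)]

# Sort by key so that equal keys become adjacent.
sorted_pairs = sorted(mapper_output, key=itemgetter(0))

# groupby only merges adjacent items, which is why the sort must come first.
grouped = [(key, [value for _, value in group])
           for key, group in groupby(sorted_pairs, key=itemgetter(0))]
print(grouped)
```

Sorting first matters for the reducer as well: a reducer that reads one line at a time can detect the end of a key's group only if all lines for that key arrive consecutively.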

3. Reducer Phase:

The reducer processes the grouped key-value pairs. It aggregates the values by summing them to
get the total count for each word.

Reducer code:

import sys

# Reducer function: expects input sorted by key, one "word<TAB>count" per line
def reducer():
    current_word = None
    current_count = 0
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        word, count = line.split('\t', 1)
        count = int(count)

        if word == current_word:
            current_count += count
        else:
            # A new word starts: print the total for the previous one
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word = word
            current_count = count

    # Output the last word
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

if __name__ == "__main__":
    reducer()

In this code:

● The reducer receives grouped key-value pairs.
● It aggregates the count of each word and prints the final result.

4. Driver Code:

The driver code sets up the map and reduce operations and coordinates the execution of the map
and reduce phases in the framework. In Hadoop, this would be handled by a job configuration, but
for simplicity, this can be managed manually in a basic script.

Example Driver Code (in a Hadoop or basic setup):

# Pseudocode to explain the execution

# 1. The input text is passed to the Mapper.
# 2. Mapper emits key-value pairs.
# 3. Intermediate data is shuffled and sorted by keys.
# 4. The Reducer takes the sorted data, aggregates it, and outputs the result.

# In Hadoop, you would configure a Job with Mapper and Reducer.
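
For the "basic script" case, the four steps in the pseudocode could be wired together as in the sketch below. It is a local stand-in, not a Hadoop job: the functions map_lines, reduce_lines, and run_job are hypothetical names, the sample input is made up, and sorted() plays the role of the framework's shuffle and sort.

```python
def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_lines(sorted_lines):
    """Reducer: sum the counts of adjacent lines that share a word."""
    current_word, current_count = None, 0
    for line in sorted_lines:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"

def run_job(input_lines):
    """Driver: mapper -> sort -> reducer, in the order the framework would run them."""
    intermediate = sorted(map_lines(input_lines))
    return list(reduce_lines(intermediate))

result = run_job(["hello world", "hello data world"])
print("\n".join(result))
```

The driver itself contains no counting logic; it only decides which mapper and reducer run and in what order, which is the same role a Hadoop job configuration plays.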

Final Output:

After the map and reduce phases, the output would look like this:

data 1
hello 2
world 2

This shows the word count for each word in the input text.

In a distributed setup like Hadoop:

● The mapper would be executed on different nodes processing chunks of data in parallel.
● The reducer would then aggregate the results from all the mappers.
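
The two points above can be approximated on one machine, with threads standing in for cluster nodes. This is a rough sketch only: count_split and parallel_word_count are hypothetical names, and each worker here pre-aggregates its own split into partial counts (similar in spirit to a combiner) instead of emitting one pair per word.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_split(lines):
    """Work done by one 'node': count the words in its own split."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def parallel_word_count(splits, workers=2):
    """Process the splits concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_split, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Aggregation step: add up the per-split counts
    return dict(total)

splits = [["hello world"], ["hello data world"]]
counts = parallel_word_count(splits)
print(sorted(counts.items()))
```

Threads in one process do not give the fault tolerance or data locality of a real cluster; the sketch only shows that splits can be processed independently and merged afterwards.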

This basic example gives you a good understanding of how Map-Reduce works to process large
data sets by distributing the work and aggregating results efficiently.
