Cloudera Tutorial to Data Science with Hadoop
Glynn Durham, Senior Instructor, Cloudera
[email protected]
Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
Terms
I will cover:
  YARN
  HBase
  Impala
  Oozie
  data products
Hadoop
Hadoop is:
  a platform for big data
  several Apache Software Foundation (ASF) projects
  free open source software
Major parts:
  Hadoop Core
  Hadoop ecosystem
Hadoop Core
HDFS Writes
HDFS Reads
* but you can always write a new file, and/or delete, move, and rename files and directories
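Assuming the footnote above refers to files stored in HDFS, the operations it lists correspond to standard `hdfs dfs` shell subcommands. The sketch below only builds the command lines and does not run them; the helper name `hdfs_commands`, the file name, and the directory are hypothetical.

```python
def hdfs_commands(file_name, hdfs_dir):
    """Build `hdfs dfs` command lines for the operations the footnote lists.

    file_name is a bare file name in the current local directory and
    hdfs_dir is an HDFS directory; both are hypothetical examples.
    Nothing is executed here.
    """
    target = f"{hdfs_dir}/{file_name}"
    renamed = f"{hdfs_dir}/renamed-{file_name}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],   # create a directory
        ["hdfs", "dfs", "-put", file_name, target],  # write a new file
        ["hdfs", "dfs", "-mv", target, renamed],     # move or rename it
        ["hdfs", "dfs", "-rm", renamed],             # delete it
    ]


if __name__ == "__main__":
    for argv in hdfs_commands("events.log", "/user/demo/raw"):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run` on a machine with a configured Hadoop client; that step is left out so the sketch stays runnable anywhere.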
MapReduce Chains
MapReduce at Scale
MapReduce in Hadoop
MapReduce cannot:
  run in real time: MapReduce jobs are batch jobs
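To make the batch nature concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It is an illustration only, not Hadoop's API; all function names are hypothetical. The point to notice is that the reduce step cannot produce a final count for any word until every map output has been grouped.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group values by key; consumes all map output first."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order over the complete input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big batch jobs"]))
```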
Hadoop Ecosystem
The Hadoop Ecosystem consists of other projects that round
Cloudera exclusive offering, but Apache licensed, so it's free and open source
Machine Learning
1. Identify internal and external data for potential use (general data wrangling tools).
2. Help build ingestion pipelines to obtain data for use (Flume, Sqoop, other).
3. Examine, clean, and anonymize ingested data (Hive, Impala, Pig, Hadoop Streaming).
4.
5. Explore data sets to gain understanding of problems, trends, reality (Impala, Hive, Pig, statistical programming).
6.
7. Contribute to data products: products in the organization that are built in large part from the data itself (Mahout, Sqoop export, general file export).
8.
9.
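Step 3 names Hadoop Streaming as one way to clean and anonymize ingested data. A Streaming mapper is just a program that reads records from standard input and writes records to standard output, so the core logic can be sketched in a few lines of Python. The record layout (a user ID followed by tab-separated fields), the salt value, and the function names are assumptions made for illustration.

```python
import hashlib

SALT = "example-salt"  # hypothetical; a real job would load a secret value


def anonymize_line(line, salt=SALT):
    """Clean one tab-separated record and hash its first field.

    Assumed layout: user_id <TAB> remaining fields. Returns None for
    records that are blank or have an empty user ID.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if not fields[0]:
        return None
    digest = hashlib.sha256((salt + fields[0]).encode("utf-8")).hexdigest()
    return "\t".join([digest[:16]] + fields[1:])


def run_mapper(in_stream, out_stream):
    """Read records from in_stream and write the cleaned ones to out_stream."""
    for line in in_stream:
        cleaned = anonymize_line(line)
        if cleaned is not None:
            out_stream.write(cleaned + "\n")
```

A script used as the Streaming mapper would end by calling `run_mapper(sys.stdin, sys.stdout)` after `import sys`. Truncating the hash to 16 hex characters is an arbitrary choice here, and salted hashing on its own may not be sufficient anonymization for a given data set.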
Thank you!
Questions?
Contributions?
Glynn Durham, Senior Instructor, Cloudera
[email protected]