Big Data For Dummies
Submitted by: SHRUTI MITTAL, SHILPI RAI
12/07/13 Big Data For Dummies
• It should be subject oriented.
• It should be organized so that related events are linked together.
• The information should be nonvolatile so that it cannot be inadvertently changed.
• Information in the warehouse should include all the applicable operational sources.
• The information should be stored in a way that has consistent definitions and the most up-to-date values.
Data warehouses have traditionally supported structured data and have been closely tied to the operational and transactional systems of the enterprise.
A data warehouse is a system of record for business intelligence, much like a customer relationship management (CRM) system or an accounting system.
These systems are highly structured and optimized for specific purposes, and systems of record tend to be highly centralized.
The company uses a data warehouse to track its transactions and operational data. However, the data warehouse does not keep track of web traffic, so the company used web analytics solutions to capture customer interactions. To keep as much of this data as possible in order to understand the changes and nuances of the business, the information management team decided that, rather than building a customized data warehouse to store this data, it would leverage a Hadoop distributed computing approach based on commodity servers. Now the company can store the data across a vast array of servers running Hadoop and MapReduce. By leveraging tools such as Flume and Sqoop, the team is able to move data into and out of Hadoop and push it into a relational model so that it can be queried with familiar SQL tools. The company is now able to change its business offerings quickly; the data remains in the Hadoop framework environment and is updated in near real time. Other data elements are cleansed and then moved into the data warehouse so that the historical information about customers and partners can be compared with the new data.
The existing warehouse provides the context for the business, while the Hadoop environment tracks what is happening on a minute-to-minute basis. Combining the system-of-record approach of the data warehouse with the dynamic big data system gives the company a tremendous opportunity to continue to evolve its business based on analyzing the massive amount of data generated by its web environments.
This depicts an approach to hybridizing traditional and big data warehousing.
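The hybrid pattern described above ends with Hadoop data pushed into a relational model and compared with historical warehouse data using familiar SQL. A minimal sketch of that final comparison, assuming Python's built-in sqlite3 module stands in for both the warehouse and the SQL layer; the tables customer_history and web_events and all sample rows are hypothetical, and the Flume/Sqoop transfer itself is not shown.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical warehouse and web-event tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_history (customer_id INTEGER PRIMARY KEY,
                                       name TEXT, lifetime_revenue REAL);
        CREATE TABLE web_events (customer_id INTEGER, page TEXT, ts TEXT);
        """
    )
    # Historical context, as it might sit in the data warehouse.
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        [(1, "Acme", 12000.0), (2, "Globex", 450.0)],
    )
    # Cleansed web events, as they might arrive from the Hadoop side.
    conn.executemany(
        "INSERT INTO web_events VALUES (?, ?, ?)",
        [
            (1, "/pricing", "2013-07-12T09:00"),
            (1, "/checkout", "2013-07-12T09:05"),
            (2, "/pricing", "2013-07-12T09:10"),
        ],
    )
    return conn


def recent_activity_with_history(conn: sqlite3.Connection):
    """Join new web activity with historical revenue per customer."""
    return conn.execute(
        """
        SELECT h.name, h.lifetime_revenue, COUNT(e.page) AS page_views
        FROM customer_history AS h
        JOIN web_events AS e ON e.customer_id = h.customer_id
        GROUP BY h.customer_id
        ORDER BY page_views DESC
        """
    ).fetchall()


print(recent_activity_with_history(build_demo_db()))
```

Keeping the join in SQL mirrors the point in the text: once the data is in a relational model, analysts can use the tools they already know.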
First, it is important to recognize that the data warehouse as it is designed today will not change in the short term.
The warehouse might include information about a particular company's product line, its customers, its suppliers, and the details of a year's worth of transactions.
The information managed in the data warehouse or a departmental data mart has been carefully constructed so that metadata is accurate.
Combining the data warehouse with big data can be relatively easy in some respects. For example, many big data sources include their own well-designed metadata. Therefore, when conducting analysis between the warehouse and the big data source, the information management organization is working with two data sets with carefully designed metadata models that have to be rationalized. Before an analyst can combine the historical transactional data with the less structured big data, however, work has to be done. In the process of analysis, it is just as important to eliminate unnecessary data as it is to identify data relevant to the business context. In this way, when the big data is combined with traditional, historical data from the warehouse, the results will be accurate and meaningful.
Data integration is a critical element of managing big data, and it is equally important when creating a hybrid analysis with the data warehouse.
The process of extracting data and transforming it in a hybrid environment is very similar to how this process is executed within a traditional data warehouse.
The data is extracted from traditional source systems such as CRM or ERP systems. It is critical that elements from these various systems be correctly matched.
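Matching elements across CRM and ERP extracts usually means agreeing on a shared key. A minimal sketch, assuming each system exports records with an id, a company name, and a postcode; the normalization rule (lowercasing, stripping punctuation and a few legal suffixes) is a hypothetical example, not a standard matching algorithm.

```python
import re
from typing import Dict, List, Tuple


def normalize_key(name: str, postcode: str) -> Tuple[str, str]:
    """Hypothetical match key: cleaned company name plus compact postcode."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())          # drop punctuation
    cleaned = re.sub(r"\b(inc|ltd|llc|corp)\b", "", cleaned)    # drop legal suffixes
    return " ".join(cleaned.split()), postcode.replace(" ", "").upper()


def match_records(crm: List[Dict], erp: List[Dict]) -> List[Tuple[str, str]]:
    """Return (crm_id, erp_id) pairs whose normalized keys agree."""
    erp_index = {normalize_key(r["name"], r["postcode"]): r["id"] for r in erp}
    pairs = []
    for r in crm:
        key = normalize_key(r["name"], r["postcode"])
        if key in erp_index:
            pairs.append((r["id"], erp_index[key]))
    return pairs


crm = [{"id": "C1", "name": "Acme, Inc.", "postcode": "sw1a 1aa"},
       {"id": "C2", "name": "Globex", "postcode": "10001"}]
erp = [{"id": "E9", "name": "ACME Inc", "postcode": "SW1A1AA"}]
print(match_records(crm, erp))
```

Real matching often needs fuzzier comparison; an exact match on a normalized key is simply the easiest rule to audit.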
The extracted files must be transformed to match the business rules and processes of the subject area that the data warehouse is designed to analyze. The sources have to be transformed so that they are helpful in analyzing the relationship between the historical data and the more dynamic, real-time data. Loading information in the big data model will be different from what we would expect in a traditional data warehousing model. In the data warehouse, after data has been codified, it is never changed.
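A transform step that conforms source records to warehouse rules, and then treats the codified result as unchangeable, might look like the following. This is a sketch under stated assumptions: the status-code mapping, the cents-to-dollars rule, and the OrderFact record are hypothetical, and a frozen dataclass is used only to illustrate "never changed after codification".

```python
from dataclasses import dataclass

# Hypothetical business rule: each source system uses its own status codes,
# which must be mapped to the warehouse's conformed vocabulary.
STATUS_MAP = {"A": "ACTIVE", "act": "ACTIVE", "C": "CLOSED", "closed": "CLOSED"}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned once created
class OrderFact:
    order_id: str
    status: str
    amount_usd: float


def transform(raw: dict) -> OrderFact:
    """Conform one extracted record to the (hypothetical) warehouse rules."""
    status = STATUS_MAP.get(raw["status"])
    if status is None:
        raise ValueError(f"unmapped status code: {raw['status']!r}")
    # Hypothetical rule: amounts arrive in cents and are stored in dollars.
    return OrderFact(raw["id"], status, round(raw["amount_cents"] / 100, 2))


fact = transform({"id": "O1", "status": "act", "amount_cents": 1999})
print(fact)
```

Rejecting unmapped codes, rather than passing them through, keeps bad values out of data that will never be edited later.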
When creating a hybrid of the traditional data warehouse and the big data environment, the distributed nature of the big data environment can dramatically change the capability of organizations to analyze huge volumes of data in context with the business.
• Requirements for common data definitions
• Requirements to extract and transform key data sources
• The need to conform to required business processes and rules
The distributed computing model of big data will be essential to making the hybrid model operational. Big data analysis will be the primary focus of the effort, while the traditional data warehouse will be used to add historical and transactional business context.
Basic Analytics
Basic analytics can be used to explore your data if you're not sure what you have but you think something is of value. This might include simple visualizations or simple statistics. Basic analysis is often used when you have large amounts of disparate data. Here are some examples:
• Slicing and dicing: Slicing and dicing refers to breaking down your data into smaller sets of data that are easier to explore.
• Basic monitoring: You might also want to monitor large volumes of data in real time. This would produce a huge …
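Breaking data down into smaller sets can be sketched as two steps: fix some dimensions to take a slice, then total a measure across other dimensions. A minimal sketch, assuming rows are plain dicts; the sales records and the helper names slice_rows and break_down are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slice_rows(rows: Iterable[dict], **fixed) -> List[dict]:
    """Keep only the rows whose dimensions match every fixed value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def break_down(rows: Iterable[dict], dims: Tuple[str, ...],
               measure: str) -> Dict[tuple, float]:
    """Total a measure for each combination of the chosen dimensions."""
    totals: Dict[tuple, float] = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


# Hypothetical sample data.
sales = [
    {"region": "EU", "product": "A", "month": "Jun", "revenue": 100.0},
    {"region": "EU", "product": "B", "month": "Jun", "revenue": 50.0},
    {"region": "US", "product": "A", "month": "Jun", "revenue": 70.0},
    {"region": "EU", "product": "A", "month": "Jul", "revenue": 30.0},
]

eu = slice_rows(sales, region="EU")
print(break_down(eu, ("product",), "revenue"))
```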
Advanced Analytics
It provides algorithms for complex analysis of either structured or unstructured data.
It includes sophisticated statistical models, machine learning, neural networks, text analytics, and other advanced data mining techniques. Among its many use cases, advanced analytics can be deployed to find patterns in data and to support prediction, forecasting, and complex event processing. Businesses realize that better insights can provide a superior competitive position, so companies are pushing toward using advanced analytics as part of their decision-making process.
Here are a few examples of advanced analytics for big data:
• Predictive modelling: A predictive model is a statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data (together or individually) to determine future outcomes.
• Text analytics: Unstructured data is such a big part of big data that text analytics, the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information, is an essential capability.
• Other statistical and data-mining algorithms: These may include advanced forecasting, optimization, cluster analysis for segmentation or even micro-segmentation, or affinity analysis.
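To make "transforming unstructured text into structured information" concrete, here is a deliberately small sketch: it pulls an order number out of a free-text note with a regular expression and counts the most frequent content words. The stopword list, the order-number pattern, and the output fields are hypothetical; production text analytics would use proper tokenization and language models.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "my", "to", "of", "it", "i"}


def to_structured(note: str) -> dict:
    """Turn a free-text note into a structured record (hypothetical fields)."""
    order = re.search(r"\border\s+#?(\d+)", note, re.IGNORECASE)
    words = re.findall(r"[a-z']+", note.lower())
    terms = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {
        "order_id": order.group(1) if order else None,
        "top_terms": [t for t, _ in terms.most_common(3)],
    }


print(to_structured("Order #4521 arrived late and the box was damaged. Late delivery again!"))
```

The resulting dict can be loaded into a table and analyzed alongside structured data, which is the point the text makes.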
Data Mining
Typical algorithms used in data mining include the following:
Operationalized Analytics
When you operationalize analytics, you make them part of a business process. For example, statisticians at an insurance company might build a model that predicts the likelihood of a claim being fraudulent. The model, along with some decision rules, could be included in the company's claims-processing system to flag claims with a high probability of fraud. These claims would be sent to an investigation unit for further review.
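The insurance example combines a predictive score with decision rules inside the claims flow. A minimal sketch, assuming a logistic model whose coefficients are hypothetical hand-picked numbers (a real model would be fitted by the statisticians), plus one hypothetical business rule that can send a claim for review even when the score is low.

```python
import math
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    amount: float
    days_since_policy_start: int
    prior_claims: int


# Hypothetical coefficients standing in for a fitted model.
BIAS = -4.0
W_AMOUNT_PER_1K = 0.08
W_EARLY_CLAIM = 1.5
W_PRIOR_CLAIMS = 0.6
THRESHOLD = 0.5  # hypothetical cut-off for "high probability of fraud"


def fraud_probability(c: Claim) -> float:
    """Logistic score: estimated probability that the claim is fraudulent."""
    early = 1.0 if c.days_since_policy_start < 30 else 0.0
    z = (BIAS + W_AMOUNT_PER_1K * c.amount / 1000
         + W_EARLY_CLAIM * early + W_PRIOR_CLAIMS * c.prior_claims)
    return 1.0 / (1.0 + math.exp(-z))


def route_claim(c: Claim) -> str:
    """Model score plus decision rules, as run inside claims processing."""
    if fraud_probability(c) >= THRESHOLD:
        return "investigation_unit"
    if c.prior_claims >= 3:  # hypothetical rule that overrides a low score
        return "investigation_unit"
    return "standard_processing"


print(route_claim(Claim("X1", 40000, 10, 2)))
```

Keeping the rules separate from the score lets the business adjust thresholds without retraining the model.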
Monetizing Analytics
Analytics can be used to optimize your business, make better decisions, and drive bottom- and top-line revenue. However, big data analytics can also be used to derive revenue above and beyond the insights it provides for your own department or company. For example, credit card providers take the data they assemble to offer value-added analytics products, and financial institutions do likewise. Telecommunications companies are beginning to sell location-based insights to retailers. The idea is that various sources of data, such as billing data, location data, text-messaging data, or web-browsing data, can be used together or separately to make inferences about customer behavior patterns that retailers would find useful.
Analytical Algorithms
When you're considering big data analytics, you need to be aware that when you expand beyond the desktop, the algorithms you use often need to be refactored: their internal code is changed without affecting their external behavior. Vendors are starting to offer a new breed of analytics designed to be placed close to the big data sources to analyze data in place rather than first having to store it and then analyze it. This approach of running analytics closer to the data sources minimizes the amount of stored data by retaining only the high-value data. It also enables you to analyze the data sooner, looking for key events, which is critical for real-time decision making.
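Analyzing data in place and keeping only the high-value part can be pictured as a filter that runs where events are produced. A minimal sketch, assuming events arrive as an iterable of dicts and that "high-value" can be expressed as a predicate; the sensor readings and the 90-degree threshold are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def analyze_at_source(events: Iterable[dict],
                      is_key_event: Callable[[dict], bool]) -> Iterator[dict]:
    """Inspect each event as it arrives and yield only the high-value ones."""
    for event in events:
        if is_key_event(event):
            yield event  # retained: forwarded for storage or immediate action
        # all other events are dropped here instead of being stored first


# Hypothetical readings and threshold.
readings = [
    {"sensor": "s1", "temp": 21.0},
    {"sensor": "s1", "temp": 95.5},
    {"sensor": "s2", "temp": 22.3},
]
kept = list(analyze_at_source(readings, lambda e: e["temp"] > 90))
print(kept)
```

Because the function is a generator, it processes one event at a time and never holds the full stream in memory.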
Infrastructure Support
• Integrate technologies: The infrastructure needs to integrate new big data technologies with traditional technologies to be able to process all kinds of big data and make it consumable by traditional analytics.
• Store large amounts of disparate data: An enterprise-hardened Hadoop system may be needed that can process, store, and manage large amounts of data at rest, whether it is structured, semi-structured, or unstructured.
• Process data in motion: A stream-computing capability may be needed to process data in motion that is continuously generated by sensors, smart devices, video, audio, and logs to support real-time decision making.
• Warehouse data: You may need a solution optimized for operational or deep analytical workloads to store and manage the growing amounts of trusted data.
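Stream computing processes data in motion without waiting for it to land in storage. One common building block is a sliding window; the sketch below keeps a running average over the last few sensor readings and emits an alert when it exceeds a limit. The window size, limit, and readings are hypothetical, and this stands in for what a stream-computing engine would do at much larger scale.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_window_alerts(readings: Iterable[float], window: int,
                          limit: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, moving average) whenever the window average exceeds limit."""
    buf: deque = deque(maxlen=window)
    total = 0.0
    for i, x in enumerate(readings):
        if len(buf) == window:
            total -= buf[0]  # oldest value is evicted by the append below
        buf.append(x)
        total += x
        if len(buf) == window and total / window > limit:
            yield i, total / window


print(list(sliding_window_alerts([1, 1, 1, 10, 10, 10], window=3, limit=5)))
```

Maintaining a running total makes each update constant-time, which matters when readings arrive continuously.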
NASA
NASA is using predictive models to analyze safety data on aircraft. It wants to understand whether introducing a new technology into an aircraft will have a dramatic impact on safety.
… integrating across the platform, including embedding/bundling its analytics.
SAS provides multiple approaches to analyze big data via its high-performance analytics infrastructure and its statistical software.
Oracle offers a range of tools to complement its big data platform, called Oracle Exadata.
Pentaho provides open source business analytics via a community edition and an enterprise edition. Pentaho supports the leading Hadoop-based distributions and supports native capabilities …
Thank You