
Data Warehouse and Data Mining - Unit 1

Uploaded by zrimreaper
Copyright © All Rights Reserved

CHAPTER 1
INTRODUCTION TO DATA WAREHOUSING

CHAPTER OUTLINE

• Lifecycle of data, types of data
• Data warehouse and data warehousing
• Differences between operational database and data warehouse
• A multidimensional data model
• OLAP operations in multidimensional data model
• Conceptual modeling of data warehouse
• Architecture of data warehouse
• Data warehouse implementation
• Data marts
• Components of data warehouse
• Need for data warehousing
• Trends in data warehousing

LIFECYCLE OF DATA

The data life cycle provides a high-level overview of the stages involved in the successful management and preservation of data for use and reuse. Multiple versions of a data life cycle exist, with differences attributable to variation in practices across domains or communities. The data life cycle is often described as a cycle because the lessons learned and insights gleaned from one data project typically inform the next. In this way, the final step of the process feeds back into the first.

No two data projects are identical; each brings its own challenges, opportunities, and potential solutions that impact its trajectory. Nearly all data projects, however, follow the same basic life cycle from start to finish. This life cycle can be split into eight common stages, or steps: Generation, Collection, Processing, Storage, Management, Analysis, Visualization, and Interpretation.

Figure 1.1: 8 steps in the data life cycle (1. Generation, 2. Collection, 3. Processing, 4. Storage, 5. Management, 6. Analysis, 7. Visualization, 8. Interpretation, arranged in a cycle).

1. Generation
For the data life cycle to begin, data must first be generated. Otherwise, the following steps can't be initiated. Data generation occurs regardless of whether you're aware of it, especially in our increasingly online world. Some of this data is generated by your organization, some by your customers, and some by third parties you may or may not be aware of. Every sale, purchase, hire, communication and interaction generates data. Given the proper attention, this data can often lead to powerful insights that allow you to better serve your customers and become more effective in your role.
2. Collection
Not all of the data that's generated every day is collected or used. It's up to your data team to identify what information should be captured and the best means for doing so, and what data is unnecessary or irrelevant to the project at hand. We can collect data in a variety of ways, including:
• Forms: Web forms, client or customer intake forms, vendor forms, and human resources applications are some of the most common ways businesses generate data.
• Surveys: Surveys can be an effective way to gather vast amounts of information from a large number of respondents.
• Interviews: Interviews and focus groups conducted with customers, users, or job applicants offer opportunities to gather qualitative and subjective data that may be difficult to capture through other means.
• Direct Observation: Observing how a customer interacts with your website, application, or product can be an effective way to gather data that may not be offered through the methods above.
It's important to note that many organizations take a broad approach to data collection, capturing as much data as possible from each interaction and storing it for potential use.
3. Processing
Once data has been collected, it must be processed. Data processing can refer to various activities, including:
• Data wrangling, in which a data set is cleaned and transformed from its raw form into something more accessible and usable. This is also known as data cleaning or data remediation.
• Data compression, in which data is transformed into a format that can be stored more efficiently.
• Data encryption, in which data is translated into another form of code to protect it and address privacy concerns.
Even the simple act of taking a printed form and digitizing it can be considered a form of data processing.
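The first two processing activities can be sketched in Python with only the standard library. This is an illustrative example under assumed inputs: the raw records and the `wrangle` helper are hypothetical. Encryption is deliberately left out, since in practice it should rely on a vetted cryptography library rather than hand-written code.

```python
import json
import zlib

# Hypothetical raw records as they might arrive from a web form (illustrative only).
raw_records = [
    {"name": "  Aarav ", "city": "pokhara", "age": "21"},
    {"name": "Umesh", "city": " KAWASOTI", "age": ""},
    {"name": "Aadesh", "city": "Dhangadi ", "age": "34"},
]


def wrangle(record):
    """Data wrangling: trim stray spaces, normalise case, convert age to int."""
    age = record["age"].strip()
    return {
        "name": record["name"].strip(),
        "city": record["city"].strip().title(),
        "age": int(age) if age else None,  # a blank answer becomes None
    }


clean_records = [wrangle(r) for r in raw_records]

# Data compression: serialise the cleaned records and compress them losslessly.
payload = json.dumps(clean_records).encode("utf-8")
compressed = zlib.compress(payload)

# Lossless: decompressing restores exactly the original data.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
print(restored[1])  # {'name': 'Umesh', 'city': 'Kawasoti', 'age': None}
```

On tiny inputs like this one, the compressed bytes may not be much smaller than the original; compression pays off on large, repetitive data sets.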
4. Storage
After data has been collected and processed, it must be stored for future use. This is most commonly achieved through the creation of databases or datasets. These datasets may then be stored in the cloud, on servers, or using another form of physical storage like a hard drive, CD, cassette, or floppy disk.
When determining how best to store data for your organization, it's important to build in a certain level of redundancy to ensure that a copy of your data will be protected and accessible, even if the original source becomes corrupted or compromised.
5. Management
Data management, also called database management, involves organizing, storing, and retrieving data as necessary over the life of a data project. While referred to here as a step, it's an ongoing process that takes place from the beginning through the end of a project. Data management includes everything from storage and encryption to implementing access logs and change logs that track who has accessed data and what changes they may have made.
6. Analysis
Data analysis refers to processes that attempt to glean meaningful insights from raw data. Analysts and data scientists use different tools and strategies to conduct these analyses. Some of the more commonly used methods include statistical modeling, algorithms, artificial intelligence, data mining, and machine learning. Exactly who performs an analysis depends on the specific challenge being addressed, as well as the size of your organization's data team. Business analysts, data analysts, and data scientists can all play a role.
7. Visualization
Data visualization refers to the process of creating graphical representations of your information, typically through the use of one or more visualization tools. Visualizing data makes it easier to quickly communicate your analysis to a wider audience both inside and outside your organization. The form your visualization takes depends on the data you're working with, as well as the story you want to communicate. While technically not a required step for all data projects, data visualization has become an increasingly important part of the data life cycle.
8. Interpretation
Finally, the interpretation phase of the data life cycle provides the opportunity to make sense of your analysis and visualization. Beyond simply presenting the data, this is when you investigate it through the lens of your expertise and understanding. Your interpretation may include not only a description or explanation of what the data shows but, more importantly, what the implications may be.

TYPES OF DATA

Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things. We classify data into three categories, namely Record Data, Graph-based Data and Ordered Data.
1. Record Data
   a) Data Matrix
   b) Document Data
   c) Transaction Data
2. Graph-based Data
   a) Linked Web Pages
   b) Benzene Molecular Structures
3. Ordered Data
   a) Sequential Data
   b) Genetic Sequence Data
   c) Temporal Data
   d) Spatial Data

Record Data
Data that consists of a collection of records, each of which consists of a fixed set of attributes, is called record data. In the most basic form of record data, there is no explicit relationship among records or data fields, and every record (object) has the same set of attributes. Record data is usually stored either in flat files or in relational databases.
Example:
Tid Refund Marital status Taxable income Cheat
1 Yes Single 200000 No
2 No Married 400000 Yes
3 No Single 340000 No
4 Yes Single 150000 No
5 No Married 340000 Yes
6 No Single 550000 No
7 Yes Divorced 540000 Yes
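To make the idea concrete, here is a minimal Python sketch that reads the table above as if it were stored in a flat CSV file. Writing the column names without spaces is an assumption made for convenience.

```python
import csv
import io

# The record-data table above saved as a flat file (CSV); the header row
# lists the fixed set of attributes shared by every record.
FLAT_FILE = """Tid,Refund,MaritalStatus,TaxableIncome,Cheat
1,Yes,Single,200000,No
2,No,Married,400000,Yes
3,No,Single,340000,No
4,Yes,Single,150000,No
5,No,Married,340000,Yes
6,No,Single,550000,No
7,Yes,Divorced,540000,Yes
"""

records = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# There is no explicit relationship between records, so each one can be
# filtered or aggregated independently of the others.
cheat_ids = [int(r["Tid"]) for r in records if r["Cheat"] == "Yes"]
mean_income = sum(int(r["TaxableIncome"]) for r in records) / len(records)

print(cheat_ids)    # [2, 5, 7]
print(mean_income)  # 360000.0
```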
There are a few variations of record data, each with some characteristic properties.
Transaction or Market Basket Data

It is a special type of record data, in which each record contains a set of items. Consider shopping in a supermarket or a grocery store: for any particular customer, a record contains the set of items purchased by that customer in one visit to the supermarket or grocery store. This type of data is called market basket data. Transaction data is a collection of sets of items, but it can be viewed as a set of records whose fields are asymmetric attributes. Most often, the attributes are binary, indicating whether or not an item was purchased.

Tid Items
1 Pencil, Paper
2 Pencil, Book, Rubber, Ink
3 Paper, Book, Rubber, Ruler
4 Pencil, Paper, Book, Rubber
5 Pencil, Paper, Book, Ruler
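The record view of transaction data, with one binary, asymmetric attribute per item, can be sketched in Python as follows. The variable names are illustrative, and the items are sorted only to give the columns a stable order.

```python
# Market basket data from the table above: each transaction is a set of items.
transactions = {
    1: {"Pencil", "Paper"},
    2: {"Pencil", "Book", "Rubber", "Ink"},
    3: {"Paper", "Book", "Rubber", "Ruler"},
    4: {"Pencil", "Paper", "Book", "Rubber"},
    5: {"Pencil", "Paper", "Book", "Ruler"},
}

# One binary attribute per distinct item (sorted for a stable column order).
items = sorted(set().union(*transactions.values()))

# Record view: 1 = item purchased, 0 = not purchased.
binary_rows = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

# The attributes are asymmetric: only the 1s matter, e.g. when counting
# how many transactions contain each item (its support count).
support = {item: sum(row[i] for row in binary_rows.values()) for i, item in enumerate(items)}

print(items)           # ['Book', 'Ink', 'Paper', 'Pencil', 'Rubber', 'Ruler']
print(binary_rows[1])  # [0, 0, 1, 1, 0, 0]
```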

The Data Matrix

If the data objects in a collection of data all have the same fixed set of numeric attributes, then the data objects can be thought of as points (vectors) in a multidimensional space, where each dimension represents a distinct attribute describing the object. A set of such data objects can be interpreted as an m x n matrix, where there are m rows, one for each object, and n columns, one for each attribute. Standard matrix operations can be applied to transform and manipulate the data. Therefore, the data matrix is the standard data format for most statistical data. An example is shown below:
Point Attribute1 Attribute2
X1 1 2
X2 3 5
X3 2 0
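A short Python sketch can show "standard matrix operations" on the three points X1, X2 and X3 from the example. The choice of operations (column means, mean-centring, Euclidean distance) is illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# Data matrix with m = 3 objects (rows) and n = 2 numeric attributes (columns).
X = [
    [1.0, 2.0],  # X1
    [3.0, 5.0],  # X2
    [2.0, 0.0],  # X3
]
m, n = len(X), len(X[0])

# Column means: one value per attribute.
means = [sum(row[j] for row in X) / m for j in range(n)]

# A standard matrix transformation: mean-centring every column.
centered = [[row[j] - means[j] for j in range(n)] for row in X]

# Because each row is a point in n-dimensional space, distances are defined.
d_x1_x2 = math.dist(X[0], X[1])  # sqrt((3-1)^2 + (5-2)^2) = sqrt(13)

print(m, n)      # 3 2
print(means[0])  # 2.0
```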

The Sparse Data Matrix (Document-Data Matrix)

A sparse data matrix (sometimes also called a document-data matrix) is a special case of a data matrix in which the attributes are of the same type and are asymmetric; i.e., only non-zero values are important.
Document    Team Coach Play Ball Score Game Win Lost Timeout Season
Document 1     3     0    5    0     2    6   0    2       0      2
Document 2     0     7    0    2     1    0   0    3       0      0
Document 3     0     1    0    0     1    3   2    0       4      0
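Because only non-zero counts matter, a document-data matrix is usually stored sparsely. The sketch below, a minimal illustration using plain dictionaries rather than any particular sparse-matrix library, keeps only the non-zero cells of the table above.

```python
TERMS = ["team", "coach", "play", "ball", "score",
         "game", "win", "lost", "timeout", "season"]

# Dense document-term matrix: one row per document, one column per term.
dense = {
    "Document 1": [3, 0, 5, 0, 2, 6, 0, 2, 0, 2],
    "Document 2": [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    "Document 3": [0, 1, 0, 0, 1, 3, 2, 0, 4, 0],
}

# Sparse representation: keep only the non-zero counts, since zeros carry
# no information for asymmetric attributes.
sparse = {
    doc: {term: count for term, count in zip(TERMS, counts) if count != 0}
    for doc, counts in dense.items()
}

stored = sum(len(row) for row in sparse.values())
total = len(dense) * len(TERMS)
print(sparse["Document 2"])             # {'coach': 7, 'ball': 2, 'score': 1, 'lost': 3}
print(f"{stored} of {total} cells stored")  # 15 of 30 cells stored
```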

Graph-Based Data

In computing, a graph database (GDB) is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Graph-based data can be further divided into two types:
Data with Relationships Among Objects (Linked Web Pages Data)
Linked data is an approach to publishing and sharing data on the web. The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when we have some of it, we can find other, related data. Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext, where links are relationship anchors in hypertext documents written in HTML, in the web of data links connect arbitrary things described by the Resource Description Framework (RDF).
The data objects are mapped to nodes of the graph, while the relationships among objects are captured by the links between objects and by link properties, such as direction and weight. Consider Web pages on the World Wide Web, which contain both text and links to other pages. In order to process search queries, Web search engines collect and process Web pages to extract their contents.
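Linked web pages map naturally onto a directed graph. In the minimal sketch below, the page names are hypothetical, the graph is an adjacency list, a breadth-first traversal mimics the order in which a crawler might discover pages, and the in-degree is one simple link property. Real search engines use far more elaborate crawling and ranking.

```python
from collections import deque

# Hypothetical pages: each node is a page, each directed edge a hyperlink.
links = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["tv.html", "pc.html"],
    "tv.html": ["products.html"],
    "pc.html": [],
}


def crawl(start):
    """Breadth-first traversal: the order in which linked pages are discovered."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# In-degree (number of incoming links) of every page.
in_degree = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_degree[target] += 1

print(crawl("home.html"))  # ['home.html', 'about.html', 'products.html', 'tv.html', 'pc.html']
```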

Data Warehouse and Data Mining

Introduction to Data Warehousing
Data Warehousing (DW) is a process for collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting. It is a blend of technologies and components which aids the strategic use of data. It is electronic storage of a large amount of information by a business, designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users in a timely manner to make a difference.

Introduction to Data Mining
Data mining is defined as a process used to extract usable data from a larger set of any raw data. It implies analyzing data patterns in large batches of data using one or more software tools. Data mining has applications in multiple fields, like science and research.
Data with Objects That Are Graphs (Benzene (C6H6) Molecular Structures)

If objects have structure, that is, the objects contain subobjects that have relationships, then such objects are frequently represented as graphs. For example, the structure of chemical compounds can be represented by a graph, where the nodes are atoms and the links between nodes are chemical bonds.

Figure 1.3: Graphical representation of benzene molecular structure
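The benzene molecule (C6H6) can be written down as such a graph. This Python sketch is a simplification: it records only which atoms are bonded, not bond order (the alternating single/double, aromatic bonds), which could be added as an edge attribute.

```python
from collections import Counter

# Atoms are nodes; bonds are undirected edges (stored as frozensets).
carbons = [f"C{i}" for i in range(1, 7)]
bonds = set()
for i, c in enumerate(carbons):
    bonds.add(frozenset((c, carbons[(i + 1) % 6])))  # ring bond to the next carbon
    bonds.add(frozenset((c, f"H{i + 1}")))            # each carbon carries one hydrogen

# Degree of a node = number of bonds touching that atom.
degree = Counter(atom for bond in bonds for atom in bond)

print(len(bonds))    # 12
print(degree["C1"])  # 3
print(degree["H1"])  # 1
```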

Ordered Data

Ordered data set records are kept in a physical sequence based on a user-specified key, without the necessity of utilizing a set. Ordered data sets can be either disjoint or embedded, but are normally embedded. For some types of data, the attributes have relationships that involve order in time or space. Ordered data can be segregated into four types:

Sequential Data

Whenever the points in a dataset are dependent on the other points in the dataset, the data is said to be sequential data. A common example of this is a time series, such as a stock price or sensor data, where each point represents an observation at a certain point in time.
Sequential data is any kind of data where the order matters. So we can regard a time series as a kind of sequential data, because the order matters. A time series is a sequence taken at successive, equally spaced points in time, and it is not the only case of sequential data. Consider a retail transaction data set that also stores the time at which each transaction took place:
Time  Customer  Items Purchased
T1    Aarav     Bag, Book
T2    Umesh     Bag, Pen
T2    Aarav     Pen, Copy
T3    Aadesh    Bag, Copy
T4    Aadesh    Doll
T5    Aarav     Bag, Doll

Customer  Time and Items Purchased
Aarav     (T1: Bag, Book) (T2: Pen, Copy) (T5: Bag, Doll)
Umesh     (T2: Bag, Pen)
Aadesh    (T3: Bag, Copy) (T4: Doll)
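The step from the first table (transactions in time order) to the second (one sequence per customer) can be sketched in Python. This assumes the time labels sort correctly as strings, which holds here only because T1 to T5 are single-digit.

```python
from collections import defaultdict

# Time-stamped transactions from the first table (time, customer, items).
transactions = [
    ("T1", "Aarav", ["Bag", "Book"]),
    ("T2", "Umesh", ["Bag", "Pen"]),
    ("T2", "Aarav", ["Pen", "Copy"]),
    ("T3", "Aadesh", ["Bag", "Copy"]),
    ("T4", "Aadesh", ["Doll"]),
    ("T5", "Aarav", ["Bag", "Doll"]),
]

# Group by customer while preserving time order: one sequence per customer.
# sorted() is stable, so ties (both T2 rows) keep their original order.
sequences = defaultdict(list)
for time, customer, items in sorted(transactions, key=lambda t: t[0]):
    sequences[customer].append((time, items))

for customer, seq in sequences.items():
    print(customer, seq)
```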

Genetic Sequence Data

Organisms are built, and their functions are determined, by their genetic code. This code is contained in DNA molecules, which are found in human, animal and plant cells, as well as in microorganisms like bacteria and viruses. DNA has four components, or building blocks, called C (cytosine), G (guanine), A (adenine) and T (thymine). Laboratories can determine the genetic sequence of a particular organism using sequencing technologies. The data generated through this process is called genetic sequence data (GSD), which is represented by listing the nucleotides in order (e.g., an RNA sequence might look like AGAAAUGAAAUGGCUCCUGUCAA; RNA uses U (uracil) in place of T).
Genetic sequence data consists of a data set that is a sequence of individual entities, such as a sequence of words or letters. It is quite similar to sequential data, except that there are no time stamps; instead, there are positions in an ordered sequence. For example, the genetic information of plants and animals can be represented in the form of sequences of nucleotides that are known as genes.

GGTTCCGCCTTCAGCCCCGCGCC
CGCAGGGCCCGCCCCGCGCCGTC
GAGAAGGGCCCGCCTGGCGGGCG
GGGGGAGGCGGGGCCGCCCGAGC
CCAACCGAGTCCGACCAGGTGCC
CCCTCTGCTCGGCCTAGACCTGA
GCTCATTAGGCGGCAGCGGACAG
GCCAAGTAGAACACGCGAAGCGC
TGGGCTGCCTGCTGCGACCAGGG
Figure 1.4: Genomic sequence data
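Since sequence data is indexed by position rather than by time, typical operations count symbols or scan windows along the string. The Python sketch below applies a few such operations to the first line of Figure 1.4. The choice of GC content and the motif "GCC" are illustrative only.

```python
from collections import Counter

# First line of the sequence shown in Figure 1.4.
seq = "GGTTCCGCCTTCAGCCCCGCGCC"

# Nucleotide frequencies and GC content (share of G and C bases).
counts = Counter(seq)
gc_content = (counts["G"] + counts["C"]) / len(seq)

# Entities are addressed by position (0-based index), not by time stamp.
first_a = seq.index("A")

# Overlapping occurrences of a short motif, found by sliding a window.
motif = "GCC"
positions = [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]

print(len(seq), dict(counts))
print(round(gc_content, 3), first_a, positions)
```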
Time Series Data (Temporal Data)

Time series data, also referred to as time-stamped data, is a sequence of data points indexed in time order. Time-stamped data is data collected at different points in time. These data points typically consist of successive measurements made from the same source over a time interval and are used to track change over time. Time series data is a special type of sequential data in which each record is a time series, i.e., a series of measurements taken over time. For example, a financial data set might contain objects that are time series of the daily prices of various stocks.
Fiscal Year   NEPSE Index (Mid-July)
2053/54       176.3
2054/55       163.3
2055/56       216.9
2056/57       360.7
2057/58       348.4
2058/59       227.5
2059/60       204.86
2060/61       22.04
2061/62       286.67
2062/63       386.86
2063/64       683.95
2064/65       963.36
2065/66       749.1
2066/67       477.73
2067/68       362.85
2068/69       389.74
2069/70       518.33
2070/71       1036.1
2071/72       961.2
2072/73       1718.2
2073/74       1582.67
2074/75       1200.09
2075/76*      1102.64
*NEPSE index is of Falgun 21, 2072

Figure 1.5: NEPSE index (Mid-July) over 23 years
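Because the points of a time series are ordered, each value can be compared with the one before it. This short Python sketch computes year-over-year percentage changes for a slice of the NEPSE table above. Picking these five years and this metric is purely for illustration.

```python
# A slice of the NEPSE table above: (fiscal year, index value at mid-July).
nepse = [
    ("2066/67", 477.73),
    ("2067/68", 362.85),
    ("2068/69", 389.74),
    ("2069/70", 518.33),
    ("2070/71", 1036.1),
]

# Year-over-year percentage change between consecutive observations.
changes = [
    (year, (value - prev) / prev * 100)
    for (_, prev), (year, value) in zip(nepse, nepse[1:])
]

for year, pct in changes:
    print(f"{year}: {pct:+.2f}%")
```

The output shows the index falling by about 24% in 2067/68 and nearly doubling (about +99.9%) in 2070/71.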



Spatial Data
Spatial data, also known as geospatial data, is information about a physical object that can be represented by numerical values in a geographic coordinate system. Generally speaking, spatial data represents the location, size and shape of an object on planet Earth, such as a building, lake, mountain or township. Some objects have spatial attributes, such as positions or areas, as well as other types of attributes. An example of spatial data is weather data (precipitation, temperature, pressure) that is collected for a variety of geographical locations.

Figure 1.6: Spatial data of Total Precipitable Water (TPW) in the atmosphere over the globe.
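Spatial attributes make location-based queries possible. The Python sketch below attaches weather readings to latitude/longitude points and finds the reading nearest to a query point using the haversine great-circle distance. The stations, coordinates and temperatures are hypothetical, and the spherical-Earth radius of 6371 km is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical weather readings, each tied to a location (spatial attributes).
readings = [
    {"station": "A", "lat": 28.21, "lon": 83.99, "temp_c": 24.0},
    {"station": "B", "lat": 27.70, "lon": 84.43, "temp_c": 29.5},
    {"station": "C", "lat": 28.70, "lon": 80.59, "temp_c": 31.0},
]


def nearest(lat, lon):
    """Return the reading whose station is closest to the given point."""
    return min(readings, key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))


print(nearest(28.2, 84.0)["station"])  # A
```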

DATA WAREHOUSE AND DATA WAREHOUSING

As the volume of data increases day by day, the traditional ways and methods used to manage and manipulate data are becoming obsolete. To overcome this problem, we need a more effective and advanced data storage system: the data warehouse. A data warehouse, in general terms, is a repository of historical information collected from multiple sources, stored under a unified schema, and usually residing at a single site. A data warehouse stores the historical data of an organization so that the organization can analyze its past performance (over days, weeks, months or years) and plan for the future.
A data warehouse may contain multiple databases. Within each database, data is organized into tables and columns. Within each column, you can define a description of the data, such as integer, data field, or string. Tables can be organized inside of schemas, which you can think of as folders. When data is ingested, it is stored in various tables described by the schema. Query tools use the schema to determine which data tables to access and analyze.
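The table/column/schema organization described above can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here, not a data warehouse product, and the table names `sales` and `staging.raw_sales` are hypothetical. SQLite has no folder-like schemas within one database, so an attached database plays that role in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table whose columns carry a declared type (the column "description").
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, city TEXT, product TEXT, units INTEGER)"
)

# Closest SQLite analogue of a schema-as-folder: a second database
# attached under the name "staging".
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.execute("CREATE TABLE staging.raw_sales (line TEXT)")

# A query tool can read the catalogue to discover tables and column types.
main_tables = [r[0] for r in conn.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'")]
staging_tables = [r[0] for r in conn.execute(
    "SELECT name FROM staging.sqlite_master WHERE type = 'table'")]
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(sales)")]

print(main_tables, staging_tables)
print(columns)
```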
Figure 1.7: Data warehouse (data sources such as operational databases and flat files feed the warehouse, which holds metadata, summary data and raw data used by analysts for analysis and data mining)

A data warehouse (DW) is a digital storage system that connects and harmonizes large amounts of data from many different sources. Its purpose is to feed business intelligence (BI), reporting, and analytics, and to support regulatory requirements, so that companies can turn their data into insight and make smart, data-driven decisions. Data warehouses store current and historical data in one place and act as the single source of truth for an organization.
Data warehousing is the process of constructing and using data warehouses. It is the process of extracting operational data, transforming it into informational data and loading it into a central data store (the warehouse).
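The extract, transform and load steps in this definition can be illustrated with a small Python sketch. Everything here is assumed for illustration: the two sources, their column names, the DD/MM/YYYY date format of the flat file, and the use of an in-memory SQLite table as the "central data store".

```python
import csv
import io
import sqlite3

# Hypothetical source 1: rows exported from an operational (OLTP) database.
branch_db_rows = [("2024-01-05", "TV", 2), ("2024-01-06", "PC", 1)]

# Hypothetical source 2: a flat file with other column names, lower-case
# product codes and an assumed DD/MM/YYYY date format.
flat_file = "Date,Item,Qty\n07/01/2024,tv,3\n08/01/2024,ssd,5\n"


def extract():
    """Extract: read raw records from every source into one common shape."""
    for date, product, units in branch_db_rows:
        yield {"date": date, "product": product, "units": units, "source": "branch_db"}
    for row in csv.DictReader(io.StringIO(flat_file)):
        yield {"date": row["Date"], "product": row["Item"], "units": row["Qty"], "source": "flat_file"}


def transform(rec):
    """Transform: harmonise date format, product codes and numeric types."""
    date = rec["date"]
    if "/" in date:  # DD/MM/YYYY -> YYYY-MM-DD
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return (date, rec["product"].upper(), int(rec["units"]), rec["source"])


# Load: write the harmonised rows into the central store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, product TEXT, units INTEGER, source TEXT)"
)
warehouse.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)", (transform(r) for r in extract())
)

totals = dict(warehouse.execute("SELECT product, SUM(units) FROM sales_fact GROUP BY product"))
print(totals)
```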

Benefits of Data Warehousing

A well-designed data warehouse is the foundation for any successful BI or analytics program. Its main job is to power the reports, dashboards, and analytical tools that have become indispensable to businesses today. A data warehouse provides the information for your data-driven decisions and helps you make the right call on everything from new product development to inventory levels.
A data warehouse has many benefits; some of the major ones are listed below:
• Better business analytics: With data warehousing, decision-makers have access to data from multiple sources and no longer have to make decisions based on incomplete information.
• Faster queries: Data warehouses are built specifically for fast data retrieval and analysis. With a data warehouse, we can very rapidly query large amounts of consolidated data with little to no support from IT.
• Improved data quality: Before data is loaded into the data warehouse, data cleansing cases are created by the system and entered in a work list for further processing, ensuring that data is transformed into a consistent format to support analytics (and decisions) based on high-quality, accurate data.
• Historical insight: By storing rich historical data, a data warehouse lets decision-makers learn from past trends and challenges, make predictions, and drive continuous business improvement.

Features of Data Warehouse

The key features of a data warehouse are discussed below:
• Subject Oriented: A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. A data warehouse targets the modeling and analysis of data for decision-makers. Therefore, data warehouses typically provide a concise and straightforward view around a particular subject, such as customer, product, or sales, instead of the organization's global ongoing operations. This is done by excluding data that are not useful concerning the subject and including all data needed by the users to understand the subject.
• Integrated: A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and online transaction records. It requires performing data cleaning and integration during data warehousing to ensure consistency in naming conventions, attribute types, etc., among different data sources.
• Time Variant: Historical information is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most current file is kept.
• Non-volatile: Non-volatile means the previous data is not erased when new data is added. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.
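The time-variant and non-volatile features can be contrasted with an operational record in a short Python sketch. The customer key, the dates and the snapshot-based loading scheme are hypothetical. Real warehouses typically implement this with dated fact rows or slowly changing dimensions rather than Python lists.

```python
from datetime import date

# Hypothetical operational (OLTP) table: updates overwrite the current value.
operational = {"customer_42": {"city": "Pokhara"}}

# Warehouse: an append-only list of (snapshot_date, key, attributes) rows.
warehouse = []


def load_snapshot(snapshot_date):
    """Non-volatile load: append copies of the current rows, never delete old ones."""
    for key, attrs in operational.items():
        warehouse.append((snapshot_date, key, dict(attrs)))


def as_of(key, when):
    """Time-variant query: the latest snapshot taken on or before `when`."""
    rows = [row for row in warehouse if row[1] == key and row[0] <= when]
    return max(rows, key=lambda row: row[0])[2] if rows else None


load_snapshot(date(2024, 1, 31))
operational["customer_42"]["city"] = "Kawasoti"  # the OLTP system overwrites the city
load_snapshot(date(2024, 2, 29))

print(operational["customer_42"])               # {'city': 'Kawasoti'}
print(as_of("customer_42", date(2024, 2, 15)))  # {'city': 'Pokhara'}
```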
Figure 1.8: OLTP database versus non-volatile data warehouse. Operational system applications read, add, change and delete data in the OLTP database; data is loaded from it into the data warehouse, which decision support systems (OLAP) only read.

Differences Between Operational Database and Data Warehouse

The operational database is the source of information for the data warehouse. It includes detailed information used to run the day-to-day operations of the business. The data frequently changes as updates are made, and reflects the current value of the last transactions. Operational Database Management Systems, also called OLTP (Online Transaction Processing) databases, are used to manage dynamic data in real time.
Data warehouse systems serve users or knowledge workers for the purpose of data analysis and decision-making. Such systems can organize and present information in specific formats to accommodate the diverse needs of various users. These systems are called Online Analytical Processing (OLAP) systems.
Some major differences between data warehouses and operational database systems are tabulated below:
Operational Database | Data Warehouse
Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP).
Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data.
Data within operational systems are mainly updated regularly according to need. | Data are non-volatile: new data may be added regularly, but once added they are rarely changed.
It is designed for real-time business dealings and processes. | It is designed for analysis of business measures by subject area, categories, and attributes.
It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
It is optimized for validation of incoming information during transactions and uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation.
It supports thousands of concurrent clients. | It supports a few concurrent clients relative to OLTP.
Operational systems are widely process-oriented. | Data warehousing systems are widely subject-oriented.
Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data.
A small amount of data is accessed. | A large amount of data is accessed.
Relational databases are created for Online Transaction Processing (OLTP). | A data warehouse is designed for Online Analytical Processing (OLAP).
It has a normalized schema. | It has a de-normalized schema.
The E-R model is used for design. | The star or snowflake model is used for design.
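The contrast between the two workloads can be sketched with one in-memory SQLite table in Python. The `orders` table and its rows are hypothetical. A real deployment would keep OLTP and OLAP on separate systems, but the shape of the work differs in the same way: a short keyed update versus a read-only scan and aggregate.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, product TEXT, units INTEGER)"
)
db.executemany(
    "INSERT INTO orders (city, product, units) VALUES (?, ?, ?)",
    [("Pokhara", "TV", 3), ("Pokhara", "TV", 2), ("Kawasoti", "PC", 4), ("Kawasoti", "TV", 1)],
)

# OLTP-style work: a short transaction that changes one row located by its key.
with db:  # commits on success, rolls back on error
    db.execute("UPDATE orders SET units = units + 1 WHERE order_id = ?", (2,))

# OLAP-style work: a read-only query that scans and aggregates many rows.
summary = db.execute(
    "SELECT city, product, SUM(units) FROM orders "
    "GROUP BY city, product ORDER BY city, product"
).fetchall()
print(summary)  # [('Kawasoti', 'PC', 4), ('Kawasoti', 'TV', 1), ('Pokhara', 'TV', 6)]
```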

A MULTIDIMENSIONAL DATA MODEL

A multidimensional data model in a data warehouse is a model that represents data in the form of data cubes. It allows data to be modeled and viewed in multiple dimensions, and it is defined by dimensions and facts. A multidimensional data model is generally organized around a central theme and represented by a fact table. It is typically used in organizations for drawing out analytical results and generating reports, which can be used as the main source for imperative decision-making processes. This model is typically applied to systems that operate with OLAP (Online Analytical Processing) techniques.
The multidimensional data model is a significant improvement in various areas of data science, such as data warehouse systems and data management techniques. Multidimensional models have been found to be competent relational systems, which can serve as a key input for generating analytical outcomes for business decision-making processes.
Now, suppose we want to view the sales data with a third dimension, for example, according to time, product and location. Time is considered for four quarters, i.e., Q1, Q2, Q3 and Q4; four products are considered, i.e., Television (TV), Personal Computer (PC), Access Point (AP) and Solid-State Drive (SSD); and the location is considered for the cities Pokhara, Kawasoti, Dhangadi and Mahendranagar. These 3D data are shown in the table below, represented as a series of 2D tables.

Table 1.1: 3D view of sales data according to time, product and location (? = value illegible)

Location = "Pokhara"
Time   TV     PC     AP    SSD
Q1     ?      ?      88    623
Q2     ?      890    64    698
Q3     ?      ?      58    788
Q4     1129   ?      63    870

Location = "Kawasoti"
Time   TV     PC     AP    SSD
Q1     1087   968    38    872
Q2     1130   1024   41    925
Q3     1034   1048   45    1002
Q4     1142   1081   54    984

Location = "Dhangadi"
Time   TV     PC     AP    SSD
Q1     818    746    43    591
Q2     894    769    52    682
Q3     940    ?      58    728
Q4     978    864    59    784

Location = "Mahendranagar"
Time   TV     PC     AP    SSD
Q1     605    825    14    400
Q2     680    952    31    512
Q3     812    1023   30    501
Q4     927    1038   38    580

Figure 1.9: Multidimensional data model (3D data cube of sales data) with dimensions time (quarters), product (types) and location
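A data cube can be sketched in Python as a mapping from (location, time, product) to units sold. This minimal illustration uses only the Kawasoti and Mahendranagar values from Table 1.1, the two locations whose values are fully legible. The helper names `slice_location` and `total_by_product` are hypothetical.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
PRODUCTS = ["TV", "PC", "AP", "SSD"]

# Units sold per quarter (rows) and product (columns), from Table 1.1.
table = {
    "Kawasoti": [[1087, 968, 38, 872], [1130, 1024, 41, 925],
                 [1034, 1048, 45, 1002], [1142, 1081, 54, 984]],
    "Mahendranagar": [[605, 825, 14, 400], [680, 952, 31, 512],
                      [812, 1023, 30, 501], [927, 1038, 38, 580]],
}

# The cube: one cell per (location, time, product) combination.
cube = {
    (loc, q, p): rows[qi][pi]
    for loc, rows in table.items()
    for qi, q in enumerate(QUARTERS)
    for pi, p in enumerate(PRODUCTS)
}


def slice_location(loc):
    """Fix the location dimension, leaving a 2D time x product table."""
    return {(q, p): v for (l, q, p), v in cube.items() if l == loc}


def total_by_product(loc):
    """Sum over the time dimension: total units per product at one location."""
    return {p: sum(cube[(loc, q, p)] for q in QUARTERS) for p in PRODUCTS}


print(total_by_product("Mahendranagar"))  # {'TV': 3024, 'PC': 3838, 'AP': 113, 'SSD': 1993}
```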
Working Mechanism of Multidimensi onal Data Model

Like any other system, lhe Multidimensional Data Model also works based on the predetermined
steps, in order to keep the pattern, the same throughout the industry and for ena bling the reusability
of the already designed or created database systems. For creating a Multidimensional Data Model,
every project should go all the way through the below phases,
• Congregating the requirements from the client
Similar to the other software applications, a Data Model also requires the precise
requiremen t from the c1ient. Most of the time, the client might not know what could be
accomplished with the selected technology. It is the software professional's duty to
provide clarity on to what extent a requirement can be achieved with the selected
technology, and elaborately collect the complete requirement.
• Categorizing the various modules of the system
After the process of collecting the entire requirement, the next step is to identify and
categorize each of the requirements under the module where they belong. Modularity
helps in better management, and also makes it trouble-free to implement, one at a time.
• Spotting the various dimensions based on which the system needs to be designed
Once the separation of various requirements and moving them to the matching
modules are completed, the next step is to identify the main factors, from the user's
point of view. These factors can be termed as the dimensions, based on which the
multidimensional data model can be created.
• Drafting the real-time dimensions and the corresponding properties
As a part of next step, in the process of the Multi-Dimensional Data Model, the dimensions
identified in the previous step can be further used for recognizing the related properties.
These properties are termed as the 'attributes' in the database systems.
• Discovering the facts from the already listed dimensions and their properties
From the initial requirement gathering, the dimensions can be a mix of dimensions and
facts. It is a significant step to distinguish and segregate the facts from the dimensions.
These facts play a great role in the structure of the Multi-Dimensional Data Models.
• Constructing the schema to place the data, with respect to the information gathered in the above steps
Based on the information collected so far (the detailed requirements, the dimensions, the facts, and their respective attributes), a schema can be constructed. There are many types of schemas, from which the most suitable one can be chosen. A few of the commonly used schema types are the Star Schema, the Galaxy Schema, and the Snowflake Schema.
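The steps above end with a schema built from the identified dimensions, their attributes, and the facts. As a rough sketch, assuming the design output is captured in plain Python dictionaries (the table and attribute names below are hypothetical, loosely following the sales example used later in this chapter), the schema-construction step could generate star-schema DDL and check it against an in-memory SQLite database:

```python
import sqlite3

# Hypothetical output of the design steps: each dimension with its attributes,
# plus the facts (measures) separated out in the "discovering the facts" step.
DIMENSIONS = {
    "time": ["day", "month", "quarter", "year"],
    "product": ["product_name", "brand", "type"],
    "location": ["street", "city", "province", "country"],
}
FACTS = ["rupees_sold", "units_sold"]


def star_schema_ddl(fact_name, dimensions, facts):
    """Build CREATE TABLE statements for one fact table and its dimension tables."""
    statements = []
    for dim, attrs in dimensions.items():
        cols = ", ".join(f"{a} TEXT" for a in attrs)
        statements.append(f"CREATE TABLE {dim} ({dim}_key INTEGER PRIMARY KEY, {cols});")
    # The fact table gets one foreign key per dimension plus the measures.
    keys = ", ".join(f"{dim}_key INTEGER REFERENCES {dim}({dim}_key)" for dim in dimensions)
    measures = ", ".join(f"{m} REAL" for m in facts)
    statements.append(f"CREATE TABLE {fact_name} ({keys}, {measures});")
    return statements


ddl = star_schema_ddl("sales", DIMENSIONS, FACTS)
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['location', 'product', 'sales', 'time']
```

Keeping the design output as data rather than hand-written SQL makes it easy to regenerate the schema when a dimension or fact is reclassified during the earlier steps.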

Advantages and Disadvantages of Multidimensional Data Model

Below are the advantages and disadvantages:


Advantages
• Multi-Dimensional Data Models are workable on complex systems and applications,
unlike the simple one-dimensional database systems.
• The modularity of this type of database is an advantage for projects with limited maintenance staff.
• Overall, the organizational capacity and structural definition of multidimensional data models help keep the data in the database clean and reliable.
• The clearly defined placement of data makes it straightforward to divide the work, for example when one team builds the database, another team works on it, and some other team handles the maintenance. It also serves as a self-learning system as and when required.
• As the system is fresh and free of junk, the efficiency of the data and the performance of the database system are improved.
Disadvantages
• As the multidimensional data model handles complex systems, these types of databases are typically complex in nature.
• Being complex systems, these databases also hold huge amounts of data. This makes the system highly risky when there is a security breach.
• When the system crashes due to operations on the multidimensional data model, the performance of the system is affected greatly.
• Though the end product of a multidimensional data model is advantageous, the path to achieving it is intricate most of the time.

OLAP OPERATIONS IN MULTIDIMENSIONAL DATA MODEL

An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It allows managers and analysts to gain insight into information through fast, consistent, and interactive access to it. Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data. Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Consider the OLAP operations that are to be performed on multidimensional data. The figures below show data cubes for the sales of a shop. The cube contains the dimensions location, time, and item, where location is aggregated with regard to city values, time is aggregated with respect to quarters, and item is aggregated with respect to item types.

Roll-Up

The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Roll-up is like zooming out on the data cube. The figure shows the result of a roll-up operation performed on the dimension location. The hierarchy for location is defined as the order street < city < province (or state) < country. The roll-up operation aggregates the data by ascending the location hierarchy from the level of city to the level of province. When a roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For example, consider a sales data cube having two dimensions, location and time. Roll-up may be performed by removing the time dimension, resulting in an aggregation of the total sales by location, rather than by location and by time. The following diagram illustrates how roll-up works when the sales data cube is rolled up from cities to provinces.

Figure 1.10: Illustration of roll-up operation on sales data (roll-up on location from cities to provinces, e.g., Gandaki and Sudurpashchim; axes: Time (quarters) Q1 to Q4 and Product (types) TV, PC, AP, SSD)


Roll-up is performed by climbing up the concept hierarchy for the dimension location. The concept hierarchy for location is "street < city < province < country". On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of province.
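A minimal Python sketch of roll-up by climbing a concept hierarchy. It uses the Q1 city-by-product values that are legible in the slice and pivot figures of this section, and it assumes the city-to-province mapping shown (Pokhara and Kawasoti in Gandaki, Dhangadi and Mahendranagar in Sudurpashchim). With that mapping, the Sudurpashchim totals for TV, PC, and AP (1423, 1571, 57) agree with the Q1 row legible in Figure 1.10. The function name roll_up is illustrative, not a library API.

```python
from collections import defaultdict

# Q1 sales by city and product type, as read from the sliced cube (Figure 1.12).
Q1_SALES = {
    "Pokhara":       {"TV": 854,  "PC": 882, "AP": 89, "SSD": 623},
    "Kawasoti":      {"TV": 1087, "PC": 968, "AP": 38, "SSD": 872},
    "Dhangadi":      {"TV": 818,  "PC": 746, "AP": 43, "SSD": 591},
    "Mahendranagar": {"TV": 605,  "PC": 825, "AP": 14, "SSD": 400},
}

# Assumed concept hierarchy for location: city < province.
CITY_TO_PROVINCE = {
    "Pokhara": "Gandaki",
    "Kawasoti": "Gandaki",
    "Dhangadi": "Sudurpashchim",
    "Mahendranagar": "Sudurpashchim",
}


def roll_up(cube, hierarchy):
    """Sum each child's measures into its parent level of the hierarchy."""
    result = defaultdict(lambda: defaultdict(int))
    for city, by_product in cube.items():
        parent = hierarchy[city]
        for product, amount in by_product.items():
            result[parent][product] += amount
    return {parent: dict(products) for parent, products in result.items()}


provinces = roll_up(Q1_SALES, CITY_TO_PROVINCE)
print(provinces["Sudurpashchim"])  # {'TV': 1423, 'PC': 1571, 'AP': 57, 'SSD': 991}
```

Because the measure here is additive, summing the cities is a valid roll-up; non-additive measures (such as averages) would need a different aggregation.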
Drill-Down
The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-down is like zooming in on the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed by either stepping down a concept hierarchy for a dimension or adding additional dimensions. The following diagram illustrates how drill-down works when drilling down on time from quarters to months:

Figure 1.11: Illustration of drill-down operation on sales data (drill-down on time from quarters Q1 to Q4 to months Jan to Dec; Product (types) TV, PC, AP, SSD)
Drill-down is performed by stepping down the concept hierarchy for the dimension time. The concept hierarchy for time is "day < month < quarter < year". On drilling down, the time dimension is descended from the level of quarter to the level of month.
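Drill-down only works if the warehouse keeps data at the finer level. In this sketch the monthly TV sales are hypothetical, chosen so that they add up to the Q1 and Q2 TV totals (605 and 680) shown for Mahendranagar in the figures; quarter_view and drill_down are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical monthly TV sales for one city (the figures only show quarterly totals legibly).
MONTHLY_TV = {
    "Jan": 200, "Feb": 190, "Mar": 215,
    "Apr": 230, "May": 210, "Jun": 240,
}

# Concept hierarchy for time: month < quarter.
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}


def quarter_view(monthly):
    """The coarser view: totals per quarter."""
    totals = defaultdict(int)
    for month, amount in monthly.items():
        totals[MONTH_TO_QUARTER[month]] += amount
    return dict(totals)


def drill_down(monthly, quarter):
    """Step down the hierarchy: show the months that make up one quarter."""
    return {m: v for m, v in monthly.items() if MONTH_TO_QUARTER[m] == quarter}


print(quarter_view(MONTHLY_TV))      # {'Q1': 605, 'Q2': 680}
print(drill_down(MONTHLY_TV, "Q1"))  # {'Jan': 200, 'Feb': 190, 'Mar': 215}
```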

Slice

The slice operation performs a selection on one particular dimension of a given cube and provides a new sub-cube. Consider the following diagram, which shows how slice works when the cube is sliced for the first quarter, i.e., Q1.

Figure 1.12: Illustration of slice operation on sales data (slice for time = "Q1"; the result shows Location (cities) Pokhara, Kawasoti, Dhangadi, and Mahendranagar against Product (types) TV, PC, AP, SSD)

Here, slice is performed for the dimension "time" using the criterion time = "Q1". It forms a new sub-cube that keeps only the Q1 data, leaving the location and item dimensions.
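A minimal sketch of the cube and the slice operation in plain Python. The cube holds only the cells legible in this chapter's figures (all four cities for Q1, and Mahendranagar for Q2), and slice_cube is an illustrative name, not a library function.

```python
# Sales cube stored as {(city, quarter, product): value}. Only cells that are
# legible in this chapter's figures are included.
PRODUCTS = ["TV", "PC", "AP", "SSD"]
ROWS = {
    ("Pokhara", "Q1"): [854, 882, 89, 623],
    ("Kawasoti", "Q1"): [1087, 968, 38, 872],
    ("Dhangadi", "Q1"): [818, 746, 43, 591],
    ("Mahendranagar", "Q1"): [605, 825, 14, 400],
    ("Mahendranagar", "Q2"): [680, 952, 31, 512],
}
CUBE = {
    (city, quarter, product): value
    for (city, quarter), values in ROWS.items()
    for product, value in zip(PRODUCTS, values)
}


def slice_cube(cube, quarter):
    """Fix time = quarter; the result is a 2-D (city, product) sub-cube."""
    return {
        (city, product): value
        for (city, q, product), value in cube.items()
        if q == quarter
    }


q1 = slice_cube(CUBE, "Q1")
print(len(q1))                 # 16 cells: 4 cities x 4 products
print(q1[("Kawasoti", "TV")])  # 1087
```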
Dice

The dice operation selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram, which shows the dice operation.

Figure 1.13: Illustration of dice operation on sales data (dice for (location = "Mahendranagar" or "Dhangadi") and (time = "Q1" or "Q2") and (item = "TV" or "PC"))

The dice operation on the cube is based on the following selection criteria, which involve three dimensions:
• (location = "Mahendranagar" or "Dhangadi") and
• (time = "Q1" or "Q2") and
• (item = "TV" or "PC").
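The same kind of dictionary-based cube can be diced by restricting three dimensions at once. This sketch uses only cells legible in the figures, so Dhangadi's Q2 values, which are not readable there, are absent from the result; dice is an illustrative helper name.

```python
PRODUCTS = ["TV", "PC", "AP", "SSD"]
# Cells legible in this chapter's figures: {(city, quarter): values in PRODUCTS order}.
ROWS = {
    ("Pokhara", "Q1"): [854, 882, 89, 623],
    ("Kawasoti", "Q1"): [1087, 968, 38, 872],
    ("Dhangadi", "Q1"): [818, 746, 43, 591],
    ("Mahendranagar", "Q1"): [605, 825, 14, 400],
    ("Mahendranagar", "Q2"): [680, 952, 31, 512],
}
CUBE = {
    (city, quarter, product): value
    for (city, quarter), values in ROWS.items()
    for product, value in zip(PRODUCTS, values)
}


def dice(cube, locations, quarters, products):
    """Keep only the cells whose location, quarter, and product all match the criteria."""
    return {
        (city, quarter, product): value
        for (city, quarter, product), value in cube.items()
        if city in locations and quarter in quarters and product in products
    }


result = dice(CUBE, {"Mahendranagar", "Dhangadi"}, {"Q1", "Q2"}, {"TV", "PC"})
for key in sorted(result):
    print(key, result[key])
```

Unlike slice, which fixes a single value of one dimension, dice keeps a set of values on each restricted dimension, so the result is still a (smaller) 3-D cube.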
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data. Consider the following diagram, which shows the pivot operation between the location and product dimensions.

Figure 1.14: Illustration of pivot operation on sales data (before: Location (cities) Pokhara, Kawasoti, Dhangadi, Mahendranagar as rows and Product (types) TV, PC, AP, SSD as columns; after rotation: products as rows and cities as columns)
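For a two-dimensional view, pivoting amounts to swapping the row and column axes, i.e., transposing the grid. A minimal sketch using the Q1 values shown in Figure 1.14 (the helper names pivot and show are illustrative):

```python
PRODUCTS = ["TV", "PC", "AP", "SSD"]
CITIES = ["Pokhara", "Kawasoti", "Dhangadi", "Mahendranagar"]
# Q1 view before the pivot: one row per city, one column per product (Figure 1.14).
GRID = [
    [854, 882, 89, 623],
    [1087, 968, 38, 872],
    [818, 746, 43, 591],
    [605, 825, 14, 400],
]


def pivot(grid):
    """Rotate the view: rows become columns (a transpose of the 2-D grid)."""
    return [list(column) for column in zip(*grid)]


def show(row_labels, col_labels, grid):
    """Print a labelled grid as plain text."""
    print(" " * 14 + "".join(f"{label:>14}" for label in col_labels))
    for label, row in zip(row_labels, grid):
        print(f"{label:<14}" + "".join(f"{value:>14}" for value in row))


rotated = pivot(GRID)
show(PRODUCTS, CITIES, rotated)  # products as rows, cities as columns
```

No values change in a pivot; only the presentation does, which is why applying it twice returns the original view.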

CONCEPTUAL MODELING OF DATA WAREHOUSE

The conceptual data model is a structured business view of the data required to support business processes, record business events, and track related performance measures. This model focuses on identifying the data used in the business, but not its processing flow or physical characteristics. It is a concise description of the user's data requirements without taking implementation details into account. Conventional databases are generally designed at the conceptual level using some variation of the well-known entity-relationship (ER) model, although the Unified Modeling Language (UML) is being increasingly used. Conceptual schemas can be easily translated to the relational model by applying a set of mapping rules. Providing extensions to the ER and UML models is not really a solution to the problem, since ultimately they represent a reflection and visualization of the underlying relational technology concepts and, in addition, reveal their own problems. Therefore, conceptual data warehouse modeling requires a model that clearly stands on top of the logical level.

A data warehouse conceptual data model is nothing but the highest-level relationship between the different entities (in other words, different tables) in the data model.

Features of Data Warehouse Conceptual Data Model

Following are the features of the data warehouse conceptual data model:
• It is the initial or high-level relationship between the different entities in the data model.
• The conceptual model includes the important entities and the relationships among them.
• In the data warehouse conceptual data model, we do not specify any attributes for the entities.
• We also do not define any primary keys yet.
Figure 1.15 is an example of a conceptual data model.
Figure 1.15: Example of a conceptual data model (entities Patient, Date, and Hospital related to a central Fact)
From the above figure (see Figure 1.15), you can see that the data warehouse conceptual model describes only the high-level relationships between the entities.
Schemas for Multidimensional Data Models

A schema is a logical description of the entire database. In a data warehouse, it includes the names and descriptions of records. It covers all data items and also the different aggregates associated with the data. Just as a database has a schema, a schema must be maintained for a data warehouse as well. There are different schemas, based on the setup and the data that are maintained in a data warehouse.
Fact tables and dimension tables form the basis of any schema in a data warehouse, and they are important to understand. A fact table holds the data corresponding to a business process. Every row represents an event that can be associated with that process, and the table stores quantitative information for analysis. A dimension table stores data about how the data in the fact table is analyzed. Dimension tables help the fact table in gathering the different dimensions along which the measures are taken.
The most populllr d.,t,, model for o Jato worchouse is a multidimensional model, which can exist in
the fom, of a ~h1r ~rhc111,1, a s11owflnkc sd1e111a, or 11 fi1ct constellation schema.

Star Schema

The most common modeling paradigm of a data warehouse is the star schema. A star schema is represented by one large fact table and many dimension tables. The schema diagram looks like a star, with a central fact table from which points radiate to the surrounding dimension tables. The fact data is organized in the fact table, and the dimensional data is organized in the dimension tables. The fact table is in 3NF form and the dimension tables are in denormalized form. Every dimension in a star schema should be represented by only one dimension table. Each dimension table should be joined to the fact table. The fact table should have a key and a measure.

A star schema for sales data is shown in the figure below. Sales are considered along three dimensions: product, time, and location. The schema contains a central fact table for sales that contains keys to each of the three dimensions, along with two measures: rupees_sold and units_sold. To minimize the size of the fact table, dimension identifiers (e.g., time_key and product_key) are system-generated identifiers.
Figure 1.6: Star schema of sales data warehouse (a central sales fact table with the measures rupees_sold and units_sold, linked to the time, product, and location dimension tables; the location table includes location_key and street)
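The star schema in Figure 1.6 can be sketched in SQLite through Python's built-in sqlite3 module. The key and measure names come from the text; the other attribute names (month, quarter, product_name, type, city, province) and all rows are hypothetical. The query shows the characteristic star join, where every dimension is one join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, month TEXT, quarter TEXT);
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, product_name TEXT, type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, province TEXT);
-- Central fact table: a foreign key to each dimension plus the two measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time(time_key),
    product_key  INTEGER REFERENCES product(product_key),
    location_key INTEGER REFERENCES location(location_key),
    rupees_sold  REAL,
    units_sold   INTEGER
);
""")
conn.executemany("INSERT INTO time VALUES (?, ?, ?)", [(1, "Jan", "Q1"), (2, "Apr", "Q2")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Model A", "TV"), (2, "Model B", "PC")])
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?)",
                 [(1, "Street 1", "Pokhara", "Gandaki"),
                  (2, "Street 2", "Dhangadi", "Sudurpashchim")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 500000.0, 10), (1, 2, 1, 800000.0, 8),
                  (2, 1, 2, 300000.0, 6), (1, 1, 2, 250000.0, 5)])

# Star join: each dimension is one join away from the fact table.
rows = conn.execute("""
    SELECT l.province, t.quarter, SUM(s.units_sold)
    FROM sales s
    JOIN time t     ON s.time_key = t.time_key
    JOIN location l ON s.location_key = l.location_key
    GROUP BY l.province, t.quarter
    ORDER BY l.province, t.quarter
""").fetchall()
print(rows)  # [('Gandaki', 'Q1', 18), ('Sudurpashchim', 'Q1', 5), ('Sudurpashchim', 'Q2', 6)]
```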

Snowflake Schema

The snowflake schema can be considered a variant of the star schema. However, it is a more complex data model than the star schema. In a snowflake schema, there is a single, large, central fact table and one or more tables for each dimension. In order to eliminate redundancy, the dimension tables split data into additional tables. This normalization often results in more complex queries and reduced query performance. The advantage of the snowflake schema is that it uses less disk space. Dimensions are also easy to implement when they are added to this schema. The same set of attributes can be published by different sources.
A snowflake schema for sales data is shown in the figure below. Sales are considered along three dimensions: product, time, and location. The fact table is identical to that of the star schema. The main difference between the two schemas is in the definition of the dimension tables. The single dimension table for location in the star schema can be normalized into two new tables: location and city. The city key in the new location table links to the city dimension, as shown in the figure below.

Figure 1.7: Snowflake schema of sales data warehouse (the sales fact table with units_sold, linked to the time, product, and location dimension tables; the location table links to a separate city table)
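A minimal SQLite sketch of the normalization described above, run from Python's built-in sqlite3 module. The fact table is reduced to location_key and units_sold for brevity, and all rows, street names, and the city table's columns are hypothetical. The point is the extra join from location to city that a snowflake schema requires when querying by province.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City details are stored once in their own table (the normalized part).
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Fact table reduced to one dimension key and one measure for brevity.
CREATE TABLE sales (location_key INTEGER REFERENCES location(location_key),
                    units_sold INTEGER);
""")
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)",
                 [(1, "Pokhara", "Gandaki", "Nepal"),
                  (2, "Dhangadi", "Sudurpashchim", "Nepal")])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                 [(1, "Street A", 1), (2, "Street B", 1), (3, "Street C", 2)])
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 7), (3, 4)])

# Querying by province needs an extra join (location -> city) compared with a star schema.
rows = conn.execute("""
    SELECT c.province, SUM(s.units_sold)
    FROM sales s
    JOIN location l ON s.location_key = l.location_key
    JOIN city c     ON l.city_key = c.city_key
    GROUP BY c.province
    ORDER BY c.province
""").fetchall()
print(rows)  # [('Gandaki', 17), ('Sudurpashchim', 4)]
```

The province and country values are stored once per city rather than once per street, which is the space saving the text refers to; the price is the additional join.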

Fact Constellation

Sophisticated applications may require multiple fact tables to share dimension tables. This kind of
schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact
constellation.
A fact constellation schema is shown in the figure below. This schema specifies two fact tables, sales and shipping. The sales table definition is identical to that of the star schema. The shipping table has five dimensions, or keys: product_key, time_key, shipper_key, from_location, and to_location, and two measures: rupees_cost and units_shipped. A fact constellation schema allows dimension tables to be shared between fact tables. For example, the dimension tables for time, product, and location are shared between the sales and shipping fact tables.

Figure 1.8: Fact constellation schema of sales and shipping data warehouse (sales and shipping fact tables sharing the time, product, and location dimension tables; the shipping fact table also links to a shipper dimension table)
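A minimal sketch of two fact tables sharing dimension tables, again using Python's sqlite3 module. For brevity it keeps only the shared time and location dimensions (product and shipper are omitted) and uses hypothetical rows. Because the two fact tables have different grains, the query summarizes each one separately over the shared time dimension before combining them (often called drill-across); joining the two fact tables directly would multiply rows and inflate the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (time_key INTEGER REFERENCES time(time_key),
                    location_key INTEGER REFERENCES location(location_key),
                    units_sold INTEGER);
-- Second fact table sharing the same time and location dimension tables.
CREATE TABLE shipping (time_key INTEGER REFERENCES time(time_key),
                       from_location INTEGER REFERENCES location(location_key),
                       to_location INTEGER REFERENCES location(location_key),
                       units_shipped INTEGER);
""")
conn.executemany("INSERT INTO time VALUES (?, ?)", [(1, "Q1"), (2, "Q2")])
conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "Pokhara"), (2, "Dhangadi")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, 1, 10), (1, 2, 4), (2, 1, 6)])
conn.executemany("INSERT INTO shipping VALUES (?, ?, ?, ?)", [(1, 1, 2, 5), (2, 2, 1, 3)])

# Drill-across: aggregate each fact table separately per shared time member, then combine.
rows = conn.execute("""
    SELECT t.quarter,
           (SELECT SUM(units_sold) FROM sales s WHERE s.time_key = t.time_key),
           (SELECT SUM(units_shipped) FROM shipping sh WHERE sh.time_key = t.time_key)
    FROM time t
    ORDER BY t.quarter
""").fetchall()
print(rows)  # [('Q1', 14, 5), ('Q2', 6, 3)]
```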

ARCHITECTURE OF DATA WAREHOUSE


Data warehouse architecture is complex, as a data warehouse is an information system that contains historical and cumulative data from multiple sources. There are three approaches for constructing data warehouse layers: single tier, two tier, and three tier.

The structure of a single-tier data warehouse architecture centers on producing a dense set of data and reducing the volume of data deposited. Although it is beneficial for eliminating redundancies, this type of architecture is not suitable for businesses with complex data requirements and numerous data streams. This is where multi-tier data warehouse architectures come in, as they deal with more complex data streams.

In comparison, a two-tier data warehouse architecture separates the tangible data sources from the warehouse itself. Unlike a single-tier architecture, the two-tier architecture uses a system and a database server. It is most commonly used in small organizations where a server is used as a data mart. Although it is more efficient at data storage and organization, the two-tier architecture is not scalable. Moreover, it only supports a nominal number of users.

Three-tier data warehouse architecture is the most widely used data warehouse architecture, as it produces a well-organized data flow from raw information to valuable insights. It consists of the top, middle, and bottom tiers.
The top tier is the front-end client that presents results through reporting, analysis, and data mining tools. The middle tier consists of the analytics engine that is used to access and analyze the data. The bottom tier of the architecture is the database server, where data is loaded and stored.
• Bottom Tier: The database of the data warehouse serves as the bottom tier. It is usually a relational database system. Data is cleansed, transformed, and loaded into this layer using back-end tools.

• Middle Tier: The middle tier in a data warehouse is an OLAP server, which is implemented using either the ROLAP, MOLAP, or HOLAP model. For a user, this application tier presents an abstracted view of the database. This layer also acts as a mediator between the end user and the database.
• Top Tier: The top tier is a front-end client layer. It consists of the tools and APIs that you use to connect to the data warehouse and get data out of it. These could be query tools, reporting tools, managed query tools, analysis tools, and data mining tools.
The data warehouse diagram below illustrates the 3-tier architecture of a data warehouse.
Figure 1.19: Three-Tier data warehouse architecture (Top tier: Data Mining, OLAP, Analysis, and Reporting tools; Middle tier: OLAP Server (ROLAP or MOLAP or HOLAP); Bottom tier: Data Warehouse and Data Marts, loaded through ETL (Extract, Transform, Load and Refresh) from heterogeneous data sources such as operational systems, ERP, CRM, and flat files)
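To make the division of work between the tiers concrete, here is a minimal Python sketch under strong simplifying assumptions: the bottom tier is a single relational table in SQLite holding the Q1 TV sales legible in the pivot figure, the middle tier is a hypothetical rolap_query helper that translates a cube request into SQL (the ROLAP approach, where OLAP operations are mapped onto relational queries), and the top tier is a plain-text report. A real OLAP server does far more, such as caching and precomputing aggregates.

```python
import sqlite3

# Bottom tier (assumed): a relational table with the Q1 TV sales from the figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_flat "
             "(province TEXT, city TEXT, quarter TEXT, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_flat VALUES (?, ?, ?, ?, ?)",
    [
        ("Gandaki", "Pokhara", "Q1", "TV", 854),
        ("Gandaki", "Kawasoti", "Q1", "TV", 1087),
        ("Sudurpashchim", "Dhangadi", "Q1", "TV", 818),
        ("Sudurpashchim", "Mahendranagar", "Q1", "TV", 605),
    ],
)

LEVELS = {"province", "city", "quarter", "product"}  # dimension levels the middle tier exposes


def rolap_query(conn, group_by, filters=None):
    """Middle tier (hypothetical ROLAP helper): translate a cube request into SQL."""
    filters = filters or {}
    if not group_by or not set(group_by) | set(filters) <= LEVELS:
        raise ValueError("unknown dimension level")
    # Column names come only from LEVELS; filter values are passed as parameters.
    sql = f"SELECT {', '.join(group_by)}, SUM(sales) FROM sales_flat"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += f" GROUP BY {', '.join(group_by)} ORDER BY {', '.join(group_by)}"
    return conn.execute(sql, list(filters.values())).fetchall()


# Top tier: a simple report built from middle-tier results.
by_province = rolap_query(conn, ["province"])
by_city = rolap_query(conn, ["city"], {"province": "Gandaki"})
for name, total in by_province:
    print(f"{name:<15}{total:>6}")
```

Here the province-level request is a roll-up and the city-level request filtered to Gandaki is a drill-down, both expressed as ordinary GROUP BY queries on the relational bottom tier.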

DATA WAREHOUSE IMPLEMENTATION

Data warehouse implementation is a series of activities that are essential to create a fully functioning data warehouse, after classifying, analyzing, and designing the data warehouse with respect to the requirements provided by the client. The process of establishing and implementing a data warehouse system in an organization is known as data warehouse implementation. Data warehousing is one of the most important components of the business intelligence process for an organization. The implementation process requires a series of steps that need to be followed carefully. The steps are as follows:
1. Requirements analysis and capacity planning
The first process in data warehousing involves defining enterprise needs, defining architectures, carrying out capacity planning, and selecting the hardware and software tools. This step involves consulting senior management as well as the different stakeholders.
2. Hardware integration
Once the hardware and software have been selected, they need to be put together by integrating the servers, the storage methods, and the user software tools.
3. Modeling
Modeling is a significant stage that involves designing the warehouse schema and views. This may involve using a modeling tool if the data warehouse is sophisticated.
4. Physical modeling
For the data warehouse to perform efficiently, physical modeling is needed. This includes designing the physical data warehouse organization, data placement, and data partitioning, deciding on access techniques, and indexing.
5. Sources
The information for the data warehouse is likely to come from several data sources. This step involves identifying and connecting the sources using gateways, ODBC drivers, or other wrappers.
6. ETL
The data from the source systems will need to go through an ETL phase. Designing and implementing the ETL phase may involve selecting a suitable ETL tool vendor and purchasing and implementing the tools. This may include customizing the tools to suit the needs of the enterprise.
7. Populate the data warehouse
Once the ETL tools have been agreed upon, the tools will need to be tested, perhaps using a staging area. Once everything is working adequately, the ETL tools may be used to populate the warehouse given the schema and view definitions.
8. User applications
For the data warehouse to be helpful, there must be end-user applications. This step involves designing and implementing the applications required by the end users.
9. Roll-out the warehouse and applications
Once the data warehouse has been populated and the end-user applications have been tested, the warehouse system and the applications may be rolled out for the user community to use.
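Steps 5 to 7 above (sources, ETL, and populating the warehouse through a staging area) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a real ETL tool: the CSV export, its column names, and the cleansing rules (trim and title-case city names, derive the quarter from the date, drop rows with a missing amount) are all hypothetical.

```python
import csv
import io
import sqlite3

# Extract source (hypothetical): a CSV export from an operational system.
RAW = """date,city,product,amount
2023-01-15,pokhara,TV,854
2023-02-03, Dhangadi ,PC,746
2023-02-10,Dhangadi,PC,
2023-03-20,POKHARA,TV,100
"""


def extract(text):
    """Read the raw export into a list of dicts, one per source row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Apply simple, assumed cleansing rules and derive the quarter."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # drop rows whose measure is missing
        month = int(row["date"][5:7])
        clean.append((
            row["city"].strip().title(),  # consistent city spelling
            f"Q{(month - 1) // 3 + 1}",   # month -> quarter
            row["product"].strip(),
            int(row["amount"]),
        ))
    return clean


def load(conn, rows):
    """Load the cleaned rows into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_sales "
        "(city TEXT, quarter TEXT, product TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
summary = conn.execute(
    "SELECT city, quarter, SUM(amount) FROM staging_sales "
    "GROUP BY city, quarter ORDER BY city"
).fetchall()
print(summary)  # [('Dhangadi', 'Q1', 746), ('Pokhara', 'Q1', 954)]
```

Without the transform step, "pokhara" and "POKHARA" would be counted as two different cities, which is the kind of inconsistency the ETL phase exists to remove.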

DATA MARTS

A data mart is a subset of a data warehouse oriented to a specific business line. Data marts contain repositories of summarized data collected for analysis on a specific section or unit within an organization, e.g., marketing, sales, HR, or finance. A data mart is often controlled by a single department in an organization. A data mart usually draws data from only a few sources compared to a data warehouse. Data marts are small in size and are more flexible compared to a data warehouse.
Figure 1.20: Data mart (Manufacturing, Finance, Sales, and Marketing data marts drawn from a central data warehouse)


Why do we need Data Mart?
• A data mart helps to enhance users' response time due to the reduction in the volume of data.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement when compared to a corporate data warehouse. At the same time, the cost of implementing a data mart is certainly lower compared with implementing a full data warehouse.
• Compared to a data warehouse, a data mart is agile. In case of a change in the model, a data mart can be rebuilt quicker due to its smaller size.
• A data mart is defined by a single Subject Matter Expert (SME). On the contrary, a data warehouse is defined by interdisciplinary SMEs from a variety of domains. Hence, a data mart is more open to change compared to a data warehouse.
• Data is partitioned and allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
Types of Data Mart

There are three main types of data mart:
• Dependent: Dependent data marts are created by drawing data from a central data warehouse, which is itself fed from operational sources, external sources, or both.
• Independent: An independent data mart is created without the use of a central data warehouse.
• Hybrid: This type of data mart can take data from data warehouses or operational systems.
Dependent Data Mart

A dependent data mart allows sourcing an organization's data from a single data warehouse. It is one of the data mart types that offer the benefit of centralization. If you need to develop one or more physical data marts, then you need to configure them as dependent data marts. A dependent data mart in a data warehouse can be built in two different ways: either a user can access both the data mart and the data warehouse, depending on need, or access is limited only to the data mart. The second approach is not optimal, as it produces what is sometimes referred to as a data junkyard. In the data junkyard, all data begins with a common source, but it is scrapped and mostly junked.

Figure 1.21: Dependent data mart (operational sources feed an enterprise data warehouse, which feeds the dependent departmental data marts)
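In relational terms, a dependent data mart can be as simple as a summarized, single-subject table derived from the warehouse. A minimal SQLite sketch through Python's sqlite3 module, assuming a hypothetical warehouse_facts table with made-up amounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse table covering several subject areas.
CREATE TABLE warehouse_facts (subject TEXT, quarter TEXT, amount INTEGER);
INSERT INTO warehouse_facts VALUES
    ('sales', 'Q1', 700), ('sales', 'Q1', 500),
    ('sales', 'Q2', 1350),
    ('payroll', 'Q1', 900), ('budget', 'Q1', 5000);

-- Dependent data mart: a summarized, single-subject subset drawn only from the warehouse.
CREATE TABLE sales_mart AS
    SELECT quarter, SUM(amount) AS total_sales
    FROM warehouse_facts
    WHERE subject = 'sales'
    GROUP BY quarter;
""")
mart = conn.execute("SELECT quarter, total_sales FROM sales_mart ORDER BY quarter").fetchall()
print(mart)  # [('Q1', 1200), ('Q2', 1350)]
```

Because the mart is derived only from the warehouse, it inherits the warehouse's cleansed, consistent data; an independent mart would instead load its own rows straight from operational sources.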

Independent Data Mart

An independent data mart is created without the use of a central data warehouse. This kind of data mart is an ideal option for smaller groups within an organization.
An independent data mart has no relationship with the enterprise data warehouse or with any other data mart. In an independent data mart, the data is input separately, and its analyses are also performed autonomously.
Implementation of independent data marts is antithetical to the motivation for building a data warehouse, which is to have a consistent, centralized store of enterprise data that can be analyzed by multiple users with different interests who want widely varying information.

Figure 1.22: Independent data mart (operational sources feed the independent data marts directly)
Hybrid Data Mart
A hybrid data mart combines input from the data warehouse with input from other sources. This could be helpful when you want ad-hoc integration, for example after a new group or product is added to the organization. It is the data mart type best suited for multiple-database environments and fast implementation turnaround for any organization. It also requires the least data cleansing effort. A hybrid data mart also supports large storage structures, and it is flexible and well suited for smaller data-centric applications.
Figure 1.23: Hybrid data mart (operational sources and departmental data marts)

Advantages and Disadvantages of a Data Mart
Advantages
• Data marts contain a subset of organization-wide data. This data is valuable to a specific group of people in an organization.
• A data mart is a cost-effective alternative to a data warehouse, which can be costly to build.
• A data mart allows faster access to data.
• A data mart is easy to use, as it is specifically designed for the needs of its users. Thus, a data mart can accelerate business processes.
• Data marts need less implementation time compared to data warehouse systems. A data mart is faster to implement, as you only need to concentrate on a subset of the data.
• It contains historical data, which enables the analyst to determine data trends.
Disadvantages
• Many times, enterprises create too many disparate and unrelated data marts without much benefit. These can become difficult to maintain.
• A data mart cannot provide company-wide data analysis, as its data set is limited.
Difference Between Data Warehouse and Data Mart
Data warehouses are built to serve as the central store of data for the entire business, whereas a data mart fulfills the requests of a specific division or business function. The major differences between a data warehouse and a data mart are tabulated below:
Data Warehouse | Data Mart
A data warehouse is a vast repository of information collected from various organizations or departments within a corporation. | A data mart is only a subtype of a data warehouse. It is architected to meet the requirements of a specific user group.
It may hold multiple subject areas. | It holds only one subject area, for example, Finance or Sales.
A data warehouse is a top-down model. | It is a bottom-up model.
It holds very detailed information. | It may hold more summarized data.
It works to integrate all data sources. | It concentrates on integrating data from a given subject area or set of source systems.
In data warehousing, the fact constellation schema is used. | In a data mart, the star schema and snowflake schema are used.
It is a centralized system. | It is a decentralized system.
Data warehousing is data-oriented. | A data mart is project-oriented.
A data warehouse has a long life. | A data mart has a shorter life than a warehouse.
A data warehouse is vast in size. | A data mart is smaller than a warehouse.

METADATA

Metadata is data about the data, or documentation about the information, that is required by the users. In data warehousing, metadata is one of the essential aspects. Several examples of metadata are listed below:
• A library catalog may be considered metadata. The directory metadata consists of
several predefined components representing specific attributes of a resource, and each
item can have one or more values. These components could be the name of the author,
the name of the document, the publisher's name, the publication date, and the methods
to which it belongs.
• The table of contents and the index in a book may be treated as metadata for the book.
• Suppose we say that a data item about a person is 70. This must be defined by noting that it is the person's weight and the unit is kilograms. Therefore, (weight, kilograms) is the metadata about the data value 70.
• A webpage may include metadata specifying what language it is written in, what tools
were used to create it, and where to go for more on the subject, allowing browsers to
automatically improve the experience of users.
• A digital image may include metadata that describes how large the picture is, the color
depth, the image resolution, when the image was created, and other data.
• A text document's metadata may contain information about how long the document is,
who the author is, when the document was written, and a short summary of the
document.
• Another example of metadata is data about the tables and figures in a report like this book. A table (which is a record) has a name (e.g., the table title), and the column names of the table may be treated as metadata. The figures also have titles or names.
Key features of metadata are described as follows:
• The location and descriptions of warehouse systems and components.
• Names, definitions, structures, and content of the data warehouse and end-user views.
• Identification of authoritative data sources.
• Integration and transformation rules used to populate data.
• Integration and transformation rules used to deliver information to end-user analytical tools.
• Subscription information for information delivery to analysis subscribers.
• Metrics used to analyze warehouse usage and performance.
• Security authorizations, access control lists, etc.
Metadata is used for building, maintaining, managing, and using data warehouses. Metadata allows users to understand the content of the warehouse and find data.
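As a small illustration of technical metadata, the sketch below models one metadata-repository entry for a warehouse column and uses it to interpret the bare value 70 from the weight example above. The ColumnMetadata class, its fields, and the source-system name patient_registration are hypothetical, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical metadata-repository entry describing one warehouse column."""
    name: str
    unit: str
    source: str          # authoritative source system (hypothetical name below)
    transformation: str  # rule applied while populating the warehouse


REPOSITORY = {
    "weight": ColumnMetadata(
        name="weight",
        unit="kilograms",
        source="patient_registration",
        transformation="cast to integer",
    ),
}


def describe(column, value):
    """Use the repository to turn a bare value into something a user can interpret."""
    meta = REPOSITORY[column]
    return f"{meta.name} = {value} {meta.unit} (from {meta.source})"


print(describe("weight", 70))  # weight = 70 kilograms (from patient_registration)
```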

COMPONENTS OF DATA WAREHOUSE


A typical data warehouse has four main components: a central database, ETL (extract, transform, and load) tools, metadata, and access tools. All of these components are engineered for speed so that we can get results quickly and analyze data on the fly.

Figure 1.24: Components of data warehouse (ETL, central database, metadata, and access tools)


Figure 1.24 shows the essential elements of a typical warehouse. We see the ETL component on the left. The data staging element serves as the next building block. In the middle, we see the data storage component that handles the data warehouse's data. This element not only stores and manages the data; it also keeps track of data using the metadata repository. The information delivery component shown on the right consists of all the different ways of making the information from the data warehouse available to the users. The four major components of a data warehouse are listed below:
1. Central database
house. Traditionally, these have beell
A database serves as the foundation of your data ware
the cloud. But because of Big Datar tbt
stan dard relational databases running on premise or in
tion in the cost of RAM, in-memory
need for true, real-time performance, and a drastic reduc
databases are rapidly gaining in popularity.
CHAml I IQ
2. Data integration
Data is pulled from source systems and modified to align the information for rapid analytical consumption using a variety of data integration approaches such as ETL (extract, transform, and load) and ELT, as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services.
3. Metadata
Metadata is data about your data. It specifies the source, usage, values, and other features of the data sets in your data warehouse. There is business metadata, which adds context to your data, and technical metadata, which describes how to access data, including where it resides and how it is structured.
4. Data warehouse access tools
Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.
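To see how the four components fit together, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the central database. It assumes two hypothetical source systems (`branch_a`, `branch_b`) that record sales in different formats; the `extract`, `transform`, and `load` functions, the `metadata` table, and the closing SQL query (playing the role of an access tool) are illustrative names only, not a real product's API.

```python
import sqlite3

# Hypothetical source systems: two branches recording sales in different formats.
branch_a = [("2024-01-05", "Widget", "12.50"), ("2024-01-06", "Gadget", "7.00")]
branch_b = [{"date": "06/01/2024", "item": "widget", "amount": 3.5}]  # assumed DD/MM/YYYY


def extract():
    """Extract: yield raw records from every source, tagged with their origin."""
    for row in branch_a:
        yield "branch_a", row
    for rec in branch_b:
        yield "branch_b", rec


def transform(source, raw):
    """Transform: map each source format onto one consistent (date, product, amount) shape."""
    if source == "branch_a":
        date, product, amount = raw
    else:
        day, month, year = raw["date"].split("/")
        date = f"{year}-{month}-{day}"
        product, amount = raw["item"], raw["amount"]
    return date, product.strip().title(), float(amount)


def load(conn):
    """Load: write cleaned rows into the central database and record simple metadata."""
    conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
    conn.execute("CREATE TABLE metadata (table_name TEXT, source TEXT, row_count INTEGER)")
    counts = {}
    for source, raw in extract():
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", transform(source, raw))
        counts[source] = counts.get(source, 0) + 1
    conn.executemany("INSERT INTO metadata VALUES ('sales', ?, ?)", counts.items())
    conn.commit()


conn = sqlite3.connect(":memory:")  # central database (in-memory for the sketch)
load(conn)

# Access tool: an ad hoc query over the integrated data.
for product, total in conn.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"):
    print(product, total)
```

Because both branches are transformed into one format before loading, the query sees a single "Widget" product (12.5 + 3.5 = 16.0) rather than two spellings from two systems.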

NEED FOR DATA WAREHOUSING

A well-designed data warehouse is the foundation for any successful BI or analytics program. Its main job is to power the reports, dashboards, and analytical tools that have become indispensable to businesses today. A data warehouse provides the information for your data-driven decisions and helps you make the right call on everything from new product development to inventory levels.
There are many benefits of a data warehouse. Here are just a few:
• Better business analytics
With data warehousing, decision-makers have access to data from multiple sources and no longer have to make decisions based on incomplete information.
• Faster queries
Data warehouses are built specifically for fast data retrieval and analysis. With a data warehouse, we can very rapidly query large amounts of consolidated data with little to no support from IT.
• Improved data quality
Before being loaded into the data warehouse, data cleansing cases are created by the system and entered in a worklist for further processing, ensuring data is transformed into a consistent format to support analytics, and decisions, based on high-quality, accurate data.
• Historical insight
By storing rich historical data, a data warehouse lets decision-makers learn from past trends and challenges, make predictions, and drive continuous business improvement.
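The "improved data quality" benefit can be sketched in a few lines. The example below assumes hypothetical customer records and a hand-written country lookup table; records that can be normalized are loaded in a consistent format, while records with problems are placed on a worklist for further processing, as described above. The validation rules (10-digit phones, known country names) are illustrative assumptions, not a standard.

```python
import re

# Hypothetical raw customer records pulled from different source systems.
raw_records = [
    {"id": 1, "name": "  alice SMITH ", "phone": "(555) 123-4567", "country": "usa"},
    {"id": 2, "name": "Bob Jones", "phone": "555.987.6543", "country": "U.S.A."},
    {"id": 3, "name": "", "phone": "12", "country": "Canada"},
]

# Assumed lookup table mapping source spellings to one standard code.
COUNTRY_CODES = {"usa": "US", "u.s.a.": "US", "united states": "US", "canada": "CA"}


def cleanse(record):
    """Return (clean_record, problems); problems list why the record needs review."""
    problems = []
    name = " ".join(record["name"].split()).title()   # trim spaces, fix capitalization
    if not name:
        problems.append("missing name")
    digits = re.sub(r"\D", "", record["phone"])        # keep digits only
    if len(digits) != 10:
        problems.append("invalid phone")
    country = COUNTRY_CODES.get(record["country"].strip().lower())
    if country is None:
        problems.append("unknown country")
    clean = {"id": record["id"], "name": name, "phone": digits, "country": country}
    return clean, problems


loaded, worklist = [], []
for rec in raw_records:
    clean, problems = cleanse(rec)
    if problems:
        worklist.append((rec["id"], problems))   # held back for further processing
    else:
        loaded.append(clean)                     # consistent format, ready to load

print(loaded)
print(worklist)  # [(3, ['missing name', 'invalid phone'])]
```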

TRENDS IN DATA WAREHOUSING

Data warehouses have come a long way since their earliest iterations back in the 1980s. They're now faster, more powerful, and in the cloud. But what hasn't changed is their goal: to unlock the full value of an organization's data. The latest developments are only making this easier with automation, empowerment, and openness. Trends in data warehousing are listed below:
• Continued Growth in Data Warehousing
• Data Warehousing Has Become Mainstream
• Industries Using Data Warehouse
• Vendor Solutions and Products
• Status of Data Warehouse Market
• Significant Trends
• Web-Enabled Data Warehouse

Continued Growth in Data Warehousing

Data warehousing is no longer a purely novel idea for study and experimentation. It has become mainstream. True, the data warehouse is not in every doctor's office yet, but neither is it confined to only high-end businesses. More than half of all U.S. companies have made a commitment to data warehousing. About 90% of multinational companies have data warehouses or are planning to implement data warehouses in the coming months.

Even during the first few years of data warehousing in the late 1990s, many vendors flooded the market with numerous products. Vendor solutions and products run the gamut of data warehousing: data modeling, data acquisition, data quality, data analysis, metadata, and so on. A buyer's guide published by the Data Warehousing Institute at that time featured a long list of leading products. The market is huge and continues to grow in revenue dollars.
Data Warehousing Has Become Mainstream
In the early stages, four significant factors drove many companies to move into data warehousing:
• Fierce competition
• Government deregulation
• Need to revamp internal processes
• Imperative for customized marketing

Telecommunications, banking, and retail were the first industries to adopt data warehousing. That was largely because of government deregulation in telecommunications and banking. Retail businesses moved into data warehousing because of fiercer competition. Utility companies joined the group as that sector was deregulated. The next wave of businesses to get into data warehousing consisted of companies in financial services, health care, insurance, manufacturing, pharmaceuticals, transportation, and distribution.

Industries Using Data Warehouse

Although earlier data warehouses concentrated on keeping summary data for high-level analysis, we now see larger and larger data warehouses being built by different businesses. Now companies have the ability to capture, cleanse, maintain, and use the vast amounts of data generated by their business transactions. The quantities of data kept in data warehouses continue to swell to the terabyte range. Data warehouses storing several terabytes of data are not uncommon in retail and telecommunications.
Vendor Solutions and Products

As an information technology professional, you are familiar with database vendors and database
products. In the same way, you are familiar with most of the operating systems and their vendors.
How many leading database vendors are there? How many leading vendors of operating systems
are there? A handful? The number of database and operating system vendors pales in comparison
with data warehousing products and vendors. There are hundreds of data warehousing vendors and
thousands of data warehousing products and solutions.
In the beginning, the market was filled with confusion and vendor hype. Every vendor, small or big,
that had any product remotely connected to data warehousing jumped on the bandwagon. Data
warehousing meant what each vendor defined it to be. Each company positioned its own products as
the proper set of data warehousing tools. Data warehousing was a new concept for many of the
businesses that adopted it. These businesses were at the mercy of the marketing hype of the vendors.

Status of Data Warehouse Market

With so many vendors and products, how can we classify the vendors and products, and thereby
make sense of the market? It is best to separate the market broadly into two distinct groups. The first
group consists of data warehouse vendors and products catering to the needs of corporate data
warehouses in which all enterprise data is integrated and transformed. This segment has been
referred to as the market for strategic data warehouses. This segment accounts for about a quarter of
the total market. The second segment is looser and more dispersed, consisting of departmental data
marts, fragmented database marketing systems, and a wide range of decision support systems.
Specific vendors and products dominate each segment.
Figure 1.25: Status of the data warehousing market (from the beginning stages, a state of flux, to the current, more mature and stable market; labels include vendor consolidations, product sophistication, vendor/product specialization, administrative tools, new technologies such as OLAP, web-enabled solutions, open source BI/DW, and infrastructure tools)



Significant Trends


Some experts feel that, until now, technology has been driving data warehousing. These experts declare that we are now beginning to see important progress in software. In the next few years, data warehousing is expected to make big strides in software, especially for optimizing queries, indexing very large tables, enhancing SQL, improving data compression, and expanding dimensional modeling.
1. Real-Time Data Warehousing
2. Multiple Data Types
• Adding Unstructured Data
• Searching Unstructured Data
• Spatial Data
3. Data Visualization
• Major Visualization Trends
• Visualization Types
• Advanced Visualization Techniques
o Chart Manipulation
o Drill Down
o Advanced Interaction
4. Web-Enabled Data Warehouse
1. Real-Time Data Warehousing
Business intelligence systems and the supporting data warehouses have been used mainly for strategic decision making. The data warehouse was kept separate from operational systems. Recently, industry momentum has been swinging towards using business intelligence for tactical decision making in day-to-day business operations. Data warehousing is progressing rapidly to the point that real-time data warehousing is the focus of senior executives.
Traditional data warehousing is passive, providing historical trends, whereas real-time data warehousing is dynamic, providing the most up-to-date view of the business in real time. A real-time data warehouse gets refreshed continuously, with almost zero latency.
Real-time information delivery increases productivity tremendously by sharing information with more people. Companies are, therefore, coming under a lot of pressure to provide information, in real time, to everyone connected to critical business processes. However, extraction, transformation, and integration of data for real-time data warehousing pose several challenges.
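One common way to approximate continuous refresh is a "trickle feed": instead of reloading everything nightly, the warehouse repeatedly copies only the rows that changed since the last refresh. The sketch below assumes an operational `orders` table with an `updated_at` counter (a hypothetical schema) and uses two in-memory SQLite databases to keep the operational system and the warehouse separate. It is a simplified polling approach, not the only way to build a real-time warehouse.

```python
import sqlite3

# Operational (source) system and warehouse kept as separate databases.
ops = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")
for db in (ops, dw):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")

watermark = 0  # highest updated_at value already copied into the warehouse


def refresh():
    """Copy only rows changed since the last refresh (a micro-batch trickle feed)."""
    global watermark
    changed = ops.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,)).fetchall()
    # INSERT OR REPLACE applies both new orders and updates to existing ones.
    dw.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", changed)
    dw.commit()
    if changed:
        watermark = changed[-1][2]
    return len(changed)


ops.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 20.0, 1), (2, 35.0, 2)])
print(refresh())  # 2: both new rows copied
ops.execute("UPDATE orders SET amount = 40.0, updated_at = 3 WHERE id = 2")
ops.execute("INSERT INTO orders VALUES (3, 15.0, 4)")
print(refresh())  # 2: only the updated row and the new row are copied
print(dw.execute("SELECT SUM(amount) FROM orders").fetchone()[0])  # 75.0
```

Running `refresh()` on a short interval keeps latency low, but polling on a timestamp is only an approximation: it relies on `updated_at` always increasing, and production systems often use change data capture from the source database's log instead. This illustrates the extraction and integration challenges mentioned above.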
2. Multiple Data Types
What are the types of data we call unstructured data? Figure 1.26 shows the different types of data that need to be integrated in the data warehouse to support more effective decision making.

Figure 1.26: Multiple data types in a data warehouse (video, audio, and other data types feeding the data warehouse repository)

Adding Unstructured Data: Some vendors are addressing the inclusion of unstructured data, especially text and images, by treating such multimedia data as just another data type. These are defined as part of the relational data and stored as binary large objects (BLOBs) up to 2 GB in size. User-defined functions (UDFs) are used to define these as user-defined types (UDTs).
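A small sketch of the same idea, using SQLite through Python's `sqlite3` module as a stand-in: binary content is stored in a `BLOB` column next to ordinary relational columns, and a Python function registered with `Connection.create_function` acts as a user-defined function inside SQL. Note the assumptions: SQLite has no user-defined types and its BLOB size limits differ from the 2 GB figure above, and the `product_media` table and `is_png` function are hypothetical. The "image" is placeholder bytes, not a real picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_media (product_id INTEGER, kind TEXT, content BLOB)")

png_header = b"\x89PNG\r\n\x1a\n"            # the 8-byte signature that starts every PNG file
fake_image = png_header + b"\x00" * 100       # placeholder bytes, not a real image
manual_text = "Assembly manual for widget".encode("utf-8")
conn.executemany("INSERT INTO product_media VALUES (?, ?, ?)",
                 [(1, "image", fake_image), (1, "text", manual_text)])


def is_png(blob):
    """User-defined function: 1 if the BLOB starts with the PNG signature, else 0."""
    return int(blob is not None and bytes(blob).startswith(png_header))


# Register the Python function so SQL queries can call it like a built-in function.
conn.create_function("is_png", 1, is_png)

rows = conn.execute(
    "SELECT kind, length(content), is_png(content) FROM product_media ORDER BY kind"
).fetchall()
print(rows)  # [('image', 108, 1), ('text', 26, 0)]
```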
Searching Unstructured Data: For free-form text data, retrieval engines pre-index the textual documents to allow searches by words, character strings, phrases, wild cards, proximity operators, and Boolean operators. Some engines are powerful enough to substitute corresponding words and search. A search with the word mouse will also retrieve documents containing the word mice. Searching audio and video data directly is still in the research stage. Usually, these are described with free-form text and then searched using the textual search methods that are currently available.
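Pre-indexing usually means building an inverted index: a map from each word to the documents that contain it, so that a query looks up words instead of scanning every document. The sketch below shows word search, a Boolean AND operator, and word substitution (mouse/mice). The `VARIANTS` table is a hypothetical, hand-written stand-in; real retrieval engines use linguistic stemmers or dictionaries, and also support phrases, wild cards, and proximity operators, which are omitted here.

```python
import re
from collections import defaultdict

documents = {
    1: "The wireless mouse stopped working after the update.",
    2: "Customers reported mice in the warehouse storage area.",
    3: "Keyboard and monitor shipped on time.",
}

# Hypothetical table of corresponding word forms (real engines use stemmers/dictionaries).
VARIANTS = {"mice": "mouse"}


def normalize(word):
    word = word.lower()
    return VARIANTS.get(word, word)


def tokens(text):
    return [normalize(w) for w in re.findall(r"[a-zA-Z]+", text)]


# Pre-index: map each normalized word to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in tokens(text):
        index[word].add(doc_id)


def search_all(*words):
    """Boolean AND search: ids of documents containing every query word."""
    sets = [index.get(normalize(w), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


print(search_all("mouse"))            # [1, 2]
print(search_all("mouse", "update"))  # [1]
```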
Spatial Data: Adding spatial data will greatly enhance the value of your data warehouse. Address, street block, city quadrant, county, state, and zone are examples of spatial data. Vendors have begun to address the need to include spatial data. Some database vendors are providing spatial extenders to their products using SQL extensions to bring spatial and business data together.
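Even without a spatial extender, the basic idea of combining spatial and business data in one query can be sketched in plain SQL. The example assumes hypothetical `stores` and `sales` tables in SQLite, with approximate latitude/longitude values, and filters by a simple bounding box. Vendor spatial extenders go much further, offering geometry types, distance functions, and spatial indexes, none of which are shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (store_id INTEGER, city TEXT, lat REAL, lon REAL)")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO stores VALUES (?, ?, ?, ?)", [
    (1, "Boston", 42.36, -71.06),       # approximate coordinates, for illustration
    (2, "Providence", 41.82, -71.41),
    (3, "Denver", 39.74, -104.99),
])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 500.0), (2, 300.0), (3, 700.0), (1, 100.0)])

# Business question with a spatial condition: total sales for stores inside a bounding box.
query = """
SELECT s.city, SUM(f.amount)
FROM stores AS s JOIN sales AS f ON f.store_id = s.store_id
WHERE s.lat BETWEEN ? AND ? AND s.lon BETWEEN ? AND ?
GROUP BY s.city
ORDER BY s.city
"""
result = conn.execute(query, (41.0, 43.0, -72.0, -70.0)).fetchall()
print(result)  # [('Boston', 600.0), ('Providence', 300.0)]
```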
3. Data Visualization
Visualization of data in the result sets boosts the process of analysis for the user, especially
when the user is looking for trends over time. Data visualization helps the user to interpret
query results quickly and easily.

Figure 1.27: Data visualization trends (from basic charting of simple numeric series and printed reports on small data sets toward advanced interaction, drill down, multiple linked charts, scientific chart types, enterprise charting systems, presentation graphics, and real-time data feeds over large, complex structures such as multidimensional data series and unstructured text data)


Major Visualization Trends: In the last few years, three major trends have shaped the direction of data visualization software.
More Chart Types: Most data visualizations are in the form of some standard chart type. The numerical results are converted into a pie chart, a scatter plot, or another chart type. Now the list of chart types supported by data visualization software has grown much longer.
Interactive Visualization: Visualizations are no longer static. Dynamic chart types are themselves user interfaces. Your users can review a result chart, manipulate it, and then see newer views online.
Visualization of Complex and Large Result Sets: Users can view a simple series of numeric result points as a rudimentary pie or bar chart. But newer visualization software can visualize thousands of result points and complex data structures.
Visualization Types: Visualization software now supports a large array of chart types. Gone are the days of simple line graphs. The current needs of users vary enormously. Business users demand pie and bar charts. Technical and scientific users need scatter plots and constellation graphs. Analysts looking at spatial data need maps and other three-dimensional representations.
Advanced Visualization Techniques: The most remarkable advance in visualization techniques is the transition from static charts to dynamic interactive presentations.
• Chart Manipulation: A user can rotate a chart or dynamically change the chart type to get a clearer view of the results. With complex visualization types such as constellation and scatter plots, a user can select data points with a mouse and then move the points around to clarify the view.
• Drill Down: The visualization first presents the results at the summary level. The user can then drill down the visualization to display further visualizations at subsequent levels of detail.
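Behind a drill-down chart there is usually a simple aggregation step: totals are computed at the summary level first, then recomputed one level lower for whichever member the user selects. The sketch below assumes hypothetical sales facts and a region → store → month hierarchy; the `aggregate` helper is illustrative, and a charting tool would draw one visualization from each result.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, store, month, amount)
sales = [
    ("East", "Store 1", "Jan", 120.0), ("East", "Store 1", "Feb", 80.0),
    ("East", "Store 2", "Jan", 50.0), ("West", "Store 3", "Jan", 200.0),
]
HIERARCHY = ["region", "store", "month"]  # summary level first, most detail last


def aggregate(rows, level, filters=None):
    """Total amount per member of one hierarchy level, optionally within a parent member."""
    filters = filters or {}
    depth = HIERARCHY.index(level)
    totals = defaultdict(float)
    for row in rows:
        if all(row[HIERARCHY.index(k)] == v for k, v in filters.items()):
            totals[row[depth]] += row[3]
    return dict(totals)


print(aggregate(sales, "region"))                       # {'East': 250.0, 'West': 200.0}
print(aggregate(sales, "store", {"region": "East"}))    # {'Store 1': 200.0, 'Store 2': 50.0}
print(aggregate(sales, "month", {"store": "Store 1"}))  # {'Jan': 120.0, 'Feb': 80.0}
```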
4. Web-Enabled Data Warehouse
Web-enabling the data warehouse means using the Web for information delivery and integrating the clickstream data from the corporate Web site for analysis.
Clickstream data tracks how people proceeded through your company's Web site, what triggers purchases, what attracts people, and what makes them come back. Clickstream data enables analysis of several key measures, including:
• Customer demand
• Effectiveness of marketing promotions
• Effectiveness of affiliate relationship among products
• Demographic data collection
• Customer buying patterns
• Feedback on Web site design
A clickstream Webhouse may be the single most important tool for identifying, prioritizing, and retaining e-commerce customers. The Webhouse can produce the following useful information:
• Site statistics
• Visitor conversions
• Ad metrics
• Referring partner links
• Site navigation resulting in orders
• Site navigation not resulting in orders
• Pages that are session killers
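Two of the measures listed above, visitor conversions and session-killer pages, can be derived from raw clickstream events with a few lines of code. The sketch assumes a hypothetical event format `(session_id, sequence_no, page)` and assumes that reaching an `order_confirmed` page marks a purchase. It treats the last page of each session that ended without an order as a candidate session killer.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (session_id, sequence_no, page)
clicks = [
    ("s1", 1, "home"), ("s1", 2, "product"), ("s1", 3, "checkout"), ("s1", 4, "order_confirmed"),
    ("s2", 1, "home"), ("s2", 2, "shipping_costs"),
    ("s3", 1, "product"), ("s3", 2, "shipping_costs"),
    ("s4", 1, "home"),
]
ORDER_PAGE = "order_confirmed"  # assumed page that marks a completed purchase

# Rebuild each visitor session as an ordered list of pages.
sessions = defaultdict(list)
for session_id, seq, page in sorted(clicks):
    sessions[session_id].append(page)

converted = [s for s, pages in sessions.items() if ORDER_PAGE in pages]
conversion_rate = len(converted) / len(sessions)

# Session killers: last page viewed in sessions that ended without an order.
exit_pages = Counter(pages[-1] for pages in sessions.values() if ORDER_PAGE not in pages)

print(f"conversion rate: {conversion_rate:.0%}")  # conversion rate: 25%
print(exit_pages.most_common(1))                  # [('shipping_costs', 2)]
```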


Figure 1.28: Web-enabled data warehouse (the general public, customers, business partners, and employees send requests and receive results through the Web and extranets; the warehouse repository and the Webhouse repository of clickstream data provide a simplified view)

Exercises
1. Define data. Describe the life cycle of data with a suitable diagram.
2. List the types of data. Describe them with suitable examples.
3. What is a data warehouse? How does it differ from a database? Explain.
4. Differentiate between an operational database and a data warehouse.
5. Define the multidimensional data model. Explain its uses.
6. Describe OLAP operations in the multidimensional data model.
7. What is data warehousing? Describe the architecture of a data warehouse.
8. What do you mean by conceptual modeling of a data warehouse? Explain.
9. How do you implement a data warehouse? Explain. Describe the components of a data warehouse.
10. What is a data mart? How does it differ from a data warehouse? Explain.
11. Describe the need for data warehousing. Describe trends in data warehousing.
12. Define metadata. How does it differ from a database? Explain.
13. Define real-time data warehousing with a suitable example.
14. What are the stages of data warehousing?
15. What are the steps to build a data warehouse?
16. What is the difference between metadata and a data dictionary?
17. What is the very basic difference between a data warehouse and operational databases?
18. Explain the data warehouse architecture. Differentiate between distributed and virtual data warehouses.
19. Explain the structure of a data warehouse and how a data warehouse helps in better analysis of a business.
20. Differentiate between data marts and data cubes.
