04 Handout 1

IT2003

04 Handout 1 *Property of STI

Data Analytics

A. Data Warehouse

 A data warehouse is a database designed to enable and
support business intelligence (BI) activities, especially
analytics.
 intended to perform queries and analysis
 optimized for data retrieval, not for transaction
processing
 centralizes and consolidates large amounts of data
from multiple sources
 allows organizations to derive valuable business
insights from their data to improve decision-making
 can be considered an organization’s “single source of
truth”

Characteristics of Data Warehouses (DW)

 Subject-Oriented – The DW can analyze data about a
particular subject or functional area.
 Subjects can be products, customers, departments,
regions, etc.
 The functional area can be sales, marketing,
finance, distribution, etc.
 Focuses on the data rather than on the processes that
modify the data
 Integrated – The DW creates consistency among
different data types from different sources.
Example: A student’s level in the database might be
defined as “freshman”, “sophomore”, “junior”, or “senior”
in the accounting department, and “FR”, “SO”, “JR”, “SR”
in the computer information systems department.
 Data in the DW must conform to a common format that is
acceptable throughout the organization.
 Time-variant – Data in the DW represents the flow of data
through time. It can be organized weekly, monthly,
annually, etc. Example: When data for previous weekly
sales is uploaded to the data warehouse, the weekly,
monthly, yearly, and other time-dependent aggregates for
subjects such as products, customers, stores, etc. are also
updated.
 Non-volatile – Once data is in a data warehouse, it is
stable and does not change.

Components of Data Warehouse (DW)

 Data Warehouse Database – This is a databank that
stores all enterprise data and makes it manageable for
reporting.
 typically implemented on relational database
management system (RDBMS) technology and queried with SQL
 Extraction, Transformation, and Loading Tools (ETL) – These
tools are used for performing all the conversions,
summarizations, and all the changes needed to transform
data into a unified format in the data warehouse. These
include:
 In case of missing data, populating them with defaults
 Calculating summaries and derived data
 Eliminating unwanted data in operational databases
from loading into the data warehouse
 Converting to common data names and definitions
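The four ETL duties listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the row layout (student_id, level, units, is_test), the default values, and the LEVEL_NAMES mapping are hypothetical and simply reuse the student-level example from the Integrated characteristic.

```python
from collections import defaultdict

# Hypothetical mapping that unifies two departments' codes for student level.
LEVEL_NAMES = {"FR": "freshman", "SO": "sophomore", "JR": "junior", "SR": "senior"}


def transform(rows):
    """Clean raw operational rows before loading them into the warehouse."""
    cleaned = []
    for row in rows:
        if row.get("is_test"):  # eliminate unwanted data
            continue
        level = row.get("level") or "unknown"  # default for missing data
        # Convert department-specific codes to one common definition.
        level = LEVEL_NAMES.get(level.upper(), level.lower())
        units = row["units"] if row.get("units") is not None else 0
        cleaned.append({"student_id": row["student_id"], "level": level, "units": units})
    return cleaned


def summarize(cleaned):
    """Calculate a derived summary: total units per student level."""
    totals = defaultdict(int)
    for row in cleaned:
        totals[row["level"]] += row["units"]
    return dict(totals)


raw = [
    {"student_id": 1, "level": "FR", "units": 18},
    {"student_id": 2, "level": "freshman", "units": None},
    {"student_id": 3, "level": "SR", "units": 15},
    {"student_id": 4, "level": None, "units": 12},
    {"student_id": 99, "level": "JR", "units": 3, "is_test": True},
]
cleaned = transform(raw)
print(summarize(cleaned))  # {'freshman': 18, 'senior': 15, 'unknown': 12}
```

Real ETL tools apply the same kinds of rules declaratively and at much larger scale; the point here is only the order of operations: filter, fill defaults, unify names, then summarize.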
 Metadata – is data about data that describes the data
warehouse. It provides the source, transformation,
integration, storage, usage, relationships, and history of
each data element.
Example: A line in a sales department database
contains: SNY-JP-0010-15000
This is meaningless data until we consult the metadata, which
tells us the following:
 Brand Model: Sony
 Country Manufactured: Japan
 Product ID: 0010
 Price: ₱15,000
Metadata can be classified into two (2) categories:
1. Technical Metadata – contains information about the
warehouse, which is used by data warehouse designers
and administrators.
2. Business Metadata – contains details that give end-users
an easy way to understand the information stored in the
data warehouse.
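The SNY-JP-0010-15000 example above can be sketched as a lookup against metadata. The field order and the BRANDS and COUNTRIES tables below are assumptions made for illustration; the handout only gives the decoded meaning of this one record.

```python
# Hypothetical metadata: the order of fields plus lookup tables for coded values.
FIELDS = ["brand", "country", "product_id", "price"]
BRANDS = {"SNY": "Sony"}
COUNTRIES = {"JP": "Japan"}


def decode(record):
    """Turn a coded sales line into labeled values by consulting the metadata."""
    parts = record.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    raw = dict(zip(FIELDS, parts))
    return {
        "brand": BRANDS.get(raw["brand"], raw["brand"]),
        "country": COUNTRIES.get(raw["country"], raw["country"]),
        "product_id": raw["product_id"],  # kept as text so leading zeros survive
        "price": int(raw["price"]),
    }


decoded = decode("SNY-JP-0010-15000")
print(decoded)
# {'brand': 'Sony', 'country': 'Japan', 'product_id': '0010', 'price': 15000}
```

Without the FIELDS list and the lookup tables, the same string cannot be interpreted, which is the sense in which metadata is "data about data".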
 Data Warehouse Access Tools – Corporate users generally
cannot work with databases directly. They use the
assistance of the following tools:
 Query and reporting tools – help users produce corporate
reports for analysis that can be in the form of
spreadsheets, calculations, or interactive visuals.
 Application development tools – used to develop custom
reports when built-in graphical and analytical tools do not
satisfy the analytical needs of an organization.
 Data mining – a process of discovering meaningful
new correlations, patterns, and trends by mining a
large amount of data. Data mining tools are used to
make this process automatic.
 OLAP tools – allow users to analyze the data using
elaborate and complex multi-dimensional views.
 Data Marts – a small, single-subject data warehouse
subset that provides decision support for a particular
user group.

Figure 1. Data Warehouse Architecture

Benefits of a Data Warehouse

1. Allows business users to quickly access critical data
from several sources all in one place. This saves
users the time of retrieving data from multiple sources.
2. Provides consistent information on various cross-functional
activities. It also supports ad hoc reporting and queries.
3. Helps to integrate many sources of data to reduce
stress on the production system.
4. Helps to reduce total turnaround time for analysis and
reporting.
5. Restructuring and integration make it easier for the user
to use for reporting and analysis.
6. Stores a large amount of historical data. This helps users
to analyze different time periods and trends to make
future predictions.

Star Schema
 A star schema is a data-modeling technique used to
map multi-dimensional decision support data into a
relational database.
 It has two (2) common components:
 Fact table – holds the data that will be included in
reports and used as the basis of business decisions. It
contains the measurements (facts) and the foreign keys
to the dimension tables.
 Dimension table – holds attributes that qualify and
provide more information about facts. It contains the
dimensions of a fact and is joined to the fact table
via a foreign key.
Example:

B. Online Analytical Processing (OLAP)

 Online Analytical Processing (OLAP)
 a software tool that is used for data analysis and
reporting purposes for business decisions
 used by business analysts, managers, and
executives
Example: In Netflix, OLAP was used for movie
recommendations based on watch history.
 Online Transaction Processing (OLTP)
 an operational system that manages the
day-to-day transactions of an organization
 used by the Database Administrator (DBA) and
Database Professionals
Example: In ATM centers, OLTP is used for money
withdrawals, transfers, deposits, and inquiries.

OLAP vs. OLTP

OLAP                                 OLTP
Provides historical data for         Manages day-to-day
reporting and planning               operations
Uses complex queries for             Uses standard queries such as
retrieving a large amount of data    inserting, deleting, and updating data

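To make the OLAP and OLTP contrast concrete, here is a minimal sketch on a tiny star schema using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_store) and the sample rows are invented for illustration; they do not come from the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: attributes that qualify the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);
    -- Fact table: a measurement plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
    INSERT INTO fact_sales  VALUES (1, 1, 500), (2, 1, 300), (1, 2, 700);
""")

# OLTP-style work: a short transaction that records one new sale.
with conn:
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (2, 2, 200))

# OLAP-style work: aggregate many fact rows, grouped by a dimension attribute.
sales_by_region = conn.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(sales_by_region)  # [('North', 800), ('South', 900)]
```

The single-row insert touches one record and must be fast and safe; the grouped query reads the whole fact table and joins it to a dimension, which is the workload a warehouse is optimized for.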
Characteristics of OLAP

 Multi-dimensional data analysis techniques – Data is
processed and viewed as part of a multi-dimensional
structure.
 Advanced Database support – To deliver efficient
decision support, OLAP tools must have the
following:
 Access to many kinds of DBMSs, flat files, and
internal and external data sources
 Rapid and consistent query response times
 Support for very large databases because the data
warehouse could easily and quickly grow to multiple
terabytes in size
 Easy-to-use end-user interfaces – permit the user to
navigate the data in a way that simplifies and accelerates
decision making or data analysis with easy-to-use
graphical interfaces

Types of OLAP

 Relational OLAP (ROLAP)

 Works directly with relational databases
 Fact and dimension tables are stored as relations.
 Multi-dimensional OLAP (MOLAP)
 extends OLAP functionality to multi-dimensional
database management systems (MDBMS)
 best suited to manage, store, and analyze multi-
dimensional data
ROLAP vs. MOLAP

Characteristic   ROLAP                       MOLAP
Schema           Uses star schema            Uses data cubes
Speed            Good with small data sets   Faster for large data sets
Access           Unlimited dimensions        Limited to predefined dimensions
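The trade-off in the table can be sketched roughly as follows. The sample rows and both function names are hypothetical, and real ROLAP and MOLAP engines are far more elaborate; the sketch only shows the difference between scanning stored relations at query time and reading a cell from a pre-aggregated cube.

```python
# Hypothetical fact rows: (region, product, units_sold)
ROWS = [
    ("North", "Laptop", 5),
    ("North", "Phone", 3),
    ("South", "Laptop", 7),
    ("South", "Phone", 2),
]


def rolap_total(rows, region=None, item=None):
    """ROLAP style: scan the stored relation at query time; any filter works."""
    return sum(
        units
        for r, p, units in rows
        if (region is None or r == region) and (item is None or p == item)
    )


def build_cube(rows):
    """MOLAP style: pre-aggregate every cell over the predefined dimensions.

    None in a key position means "all values" of that dimension.
    """
    cube = {}
    for r, p, units in rows:
        for key in ((r, p), (r, None), (None, p), (None, None)):
            cube[key] = cube.get(key, 0) + units
    return cube


cube = build_cube(ROWS)
print(rolap_total(ROWS, region="North"))  # 8, computed by scanning the rows
print(cube[("North", None)])              # 8, read from a precomputed cell
print(cube[(None, "Laptop")])             # 12
print(cube[(None, None)])                 # 17
```

The cube answers instantly but only for the dimensions it was built with, while the scan is slower on large data yet accepts any combination of filters.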


Figure 2. Multi-dimensional data/data cubes

OLAP operations

SQL has been enhanced with analytic functions that support
OLAP-type processing. These include:

 ROLLUP operator – an extension of the GROUP BY
clause that is used to create subtotals and grand
totals for a set of columns
 CUBE operator – like ROLLUP, but it generates subtotals for
all the combinations of grouping columns specified in
the GROUP BY clause
 PIVOT operator – allows you to write a cross-tabulation,
which means you can aggregate your results and
rotate rows into columns

Example:
Assume that we have a table named Enrolled_Students (see
Table 1) having columns Campus, NumberOfStudents, and Program.

Campus           NumberOfStudents   Program
Ortigas-Cainta   400                BSIT
Ortigas-Cainta   200                BSCS
Cubao            600                BSIT
Cubao            300                BSCS

Table 1. Enrolled_Students
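To see what the three operators compute, here is a rough emulation in plain Python over the rows of Table 1. The helper names (aggregate, rollup, cube, pivot) are made up for this sketch, and None plays the role of the NULL that SQL shows in subtotal rows. It is not how a database engine implements them.

```python
from itertools import combinations

# Rows of Table 1 as (Campus, Program, NumberOfStudents).
ROWS = [
    ("Ortigas-Cainta", "BSIT", 400),
    ("Ortigas-Cainta", "BSCS", 200),
    ("Cubao", "BSIT", 600),
    ("Cubao", "BSCS", 300),
]
COLUMNS = ("Campus", "Program")


def aggregate(rows, keep):
    """Sum NumberOfStudents grouped by the column indexes in `keep`."""
    totals = {}
    for row in rows:
        # Columns that are not kept are rolled up and shown as None.
        key = tuple(row[i] if i in keep else None for i in range(len(COLUMNS)))
        totals[key] = totals.get(key, 0) + row[-1]
    return totals


def rollup(rows):
    """Like GROUP BY ROLLUP (Campus, Program): (Campus, Program), (Campus), ()."""
    result = {}
    for n in range(len(COLUMNS), -1, -1):
        result.update(aggregate(rows, set(range(n))))
    return result


def cube(rows):
    """Like GROUP BY CUBE (Campus, Program): every subset of the columns."""
    result = {}
    for n in range(len(COLUMNS), -1, -1):
        for keep in combinations(range(len(COLUMNS)), n):
            result.update(aggregate(rows, set(keep)))
    return result


def pivot(rows):
    """Like PIVOT on Program: one column per program, summed over all campuses."""
    return {program: total for (_, program), total in aggregate(rows, {1}).items()}


print(len(rollup(ROWS)), rollup(ROWS)[(None, None)])  # 7 1500
print(len(cube(ROWS)), cube(ROWS)[(None, "BSIT")])    # 9 1000
print(pivot(ROWS))                                    # {'BSIT': 1000, 'BSCS': 500}
```

ROLLUP yields 7 rows (4 detail rows, 2 campus subtotals, 1 grand total), while CUBE adds the 2 per-program subtotals for 9 rows in all.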

Using the ROLLUP operator, we will display the total
number of students enrolled in specific campuses and
the grand total of students enrolled in all campuses.
SELECT Program, Campus,
SUM(NumberOfStudents) AS 'TotalStudents'
FROM Enrolled_Students
GROUP BY ROLLUP (Campus, Program)

Output:

Explanation:
 The ROLLUP operator creates an additional row that
represents the subtotal for each campus. The last
row represents the grand total of all values in
the NumberOfStudents column.
(Note: To make the output more readable, you can use the COALESCE()
function to substitute the appropriate value representing subtotal and
grand total to the NULL values.)
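The substitution described in the note can be sketched in Python, with None standing in for SQL NULL. The coalesce helper and the three sample rows are illustrative only; in the database this is done by the built-in COALESCE() function.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE()."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical ROLLUP result rows: (Campus, Program, TotalStudents).
rollup_rows = [
    ("Cubao", "BSIT", 600),
    ("Cubao", None, 900),   # subtotal row for one campus
    (None, None, 1500),     # grand total row
]

labeled = [
    (coalesce(campus, "All Campus"), coalesce(program, "All Program"), total)
    for campus, program, total in rollup_rows
]
for row in labeled:
    print(row)
```

After labeling, the subtotal row reads ('Cubao', 'All Program', 900) and the grand total row reads ('All Campus', 'All Program', 1500).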
Using the CUBE operator, we will display all possible
combinations of columns in the Enrolled_Students table
(see Table 1).
-- COALESCE replaces the NULLs of subtotal rows with a label
SELECT COALESCE(Program, 'All Program') AS 'Program',
COALESCE(Campus, 'All Campus') AS 'Campus',
SUM(NumberOfStudents) AS 'TotalStudents'
FROM Enrolled_Students
GROUP BY CUBE (Program, Campus)

Output:

Explanation:
 We use the COALESCE function to specify the text
returned in place of NULL values in a specific column.
 It has similar output to ROLLUP, but it returns two (2)
additional rows below the grand total. This is because the
ROLLUP operator generates aggregated results for the
selected columns like Campus in a hierarchical way, while
the CUBE operator generates an aggregated result that
contains all the possible combinations for the selected
columns.

Using the PIVOT operator, we will turn the unique values/rows
in the Program column into multiple columns.
-- The first column is a fixed label; [BSIT] and [BSCS] become columns
SELECT 'Total students in all campus:' AS 'Program:', [BSIT], [BSCS]
FROM
(
SELECT NumberOfStudents, Program FROM Enrolled_Students
) AS SourceTable
PIVOT
(
SUM(NumberOfStudents)
FOR Program IN ([BSIT], [BSCS])
) AS PivotTable

Output:

Explanation:
 The first query specifies the columns for the cross-
tabulation results. We want to display the first column
as the identifier of the remaining columns (second and
third columns).
 As for the source table, we specify the returning data
that will be used for the pivot statement.
 In the pivot statement, we used the SUM() function to get
the total number of students that are enrolled.
 We need to specify which rows/values to include from the
Program column, as they will become the column headings
in our pivot table.

C. Data Mining
 Data mining refers to analyzing massive amounts of data
in a data warehouse or other sources to uncover hidden
trends, patterns, and relationships. This analysis helps
explain the past and predict the future.
Data Mining Implementation Process
1. Business Understanding: In this step, the goals of
the business are set, and the important
factors that will help in achieving the goals are
discovered.
2. Data Understanding: This step collects all the
data and loads it into the tool (if a tool is used).
3. Data Preparation: This step involves selecting the
appropriate data, cleaning it, constructing attributes from
the data, and integrating data from multiple databases.
4. Modeling: In this step, a data mining technique such
as a decision tree is selected, a test design for evaluating
the selected model is generated, models are built from the
dataset, and the built model is assessed with experts to
discuss the result.
5. Evaluation: This step will determine the degree to which
the resulting model meets the business requirements.
The model is reviewed for any mistakes or steps that
should be repeated.
6. Deployment: In this step, a deployment plan is made. The
strategy to monitor and maintain the data mining model
results to check for its usefulness is formed. Final reports
are also made, and a review of the whole process is
done to check for any mistakes and to see if any step
should be repeated.

Data Mining Techniques

1. Classification: used to retrieve important and relevant
information about data and metadata.
2. Clustering: used to identify data that are similar to each
other. This process helps to understand the
differences and similarities between the data.
3. Regression: used to identify and analyze the
relationship between variables.
4. Association Rules: used to help find the association
between two or more items. It discovers hidden patterns
in the data set.
5. Outlier detection: used to observe data items in the
dataset that do not match an expected pattern or
expected behavior.
6. Sequential Patterns: used to discover or identify similar
patterns or trends in transaction data for a certain
period.
7. Prediction: uses a combination of other data mining
techniques such as trends, sequential patterns, clustering,
classification, etc. It analyzes past events or instances
in the right sequence to predict a future event.

Benefits of Data Mining
 Helps with the decision-making process
 Helps companies to get knowledge-based information
 Facilitates automated prediction of trends and
behaviors as well as the automated discovery of
hidden patterns
 Speeds up analysis, making it easy for users to
analyze a huge amount of data in less time
Figure 3. Extracting knowledge from data
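Three of the data mining techniques listed earlier (regression, outlier detection, and association rules) can be sketched with the Python standard library alone. The function names, the threshold of 2 standard deviations, and the sample data are all assumptions chosen for illustration; real data mining tools use more robust methods.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def outliers(values, threshold=2.0):
    """Outlier detection: values more than `threshold` std deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]


def rule_stats(baskets, antecedent, consequent):
    """Association rules: support and confidence of 'antecedent -> consequent'."""
    has_a = [b for b in baskets if antecedent in b]
    has_both = [b for b in has_a if consequent in b]
    support = len(has_both) / len(baskets)
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    return support, confidence


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))      # slope 2.0, intercept 1.0
print(outliers([10, 11, 9, 10, 12, 10, 50]))     # [50]
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(rule_stats(baskets, "bread", "milk"))      # support 0.5, confidence about 0.67
```

Here support is the share of all baskets that contain both items, and confidence is the share of baskets with the first item that also contain the second.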

