What Is a Data Warehouse?

A data warehouse architecture defines the overall data communication, processing, and presentation architecture for end users within an enterprise. Common architectures include the basic data warehouse, the warehouse with a staging area, and the warehouse with a staging area and data marts. This document also explains data scales, types of data collection methods, the steps of data processing, data marts, data lakes, and knowledge discovery in databases with its advantages and disadvantages.

What is a data warehouse? Draw a figure and explain its architecture.

A data warehouse architecture defines the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise. Each data warehouse is different, but all are characterized by standard vital components.

Three common architectures are:

o Data Warehouse Architecture: Basic
o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

The main components of these architectures are:

o Operational System: An operational system is the system used in data warehousing to process the day-to-day transactions of an organization.
o Flat Files: A flat file system is a system of files in which transactional data is stored, and every file in the system must have a different name.
o Metadata: A set of data that defines and gives information about other data.
o Staging Area: The data warehouse staging area is a temporary location where records from the source systems are copied.

Properties of Data Warehouse Architectures

1. Separation: Analytical and transactional processing should be kept apart as much as possible.
2. Scalability: Hardware and software architectures should be easy to upgrade as the volume of data that has to be managed and processed, and the number of user requirements that have to be met, progressively increase.
3. Extensibility: The architecture should be able to accommodate new operations and technologies without redesigning the whole system.
4. Security: Monitoring access is necessary because of the strategic data stored in the data warehouse.
5. Administerability: Data warehouse management should not be complicated.

(Figure: Types of Data Warehouse Architectures)
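
To make the flow concrete, here is a minimal Python sketch of the "with staging area and data marts" architecture, using an in-memory SQLite database. The table names, columns, and the single cleaning rule are illustrative assumptions, not a prescribed design.

# Minimal sketch: operational source -> staging area -> warehouse -> data mart.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 1. Operational system: day-to-day transactional records.
cur.execute("CREATE TABLE ops_orders (order_id INTEGER, region TEXT, amount REAL)")
cur.executemany("INSERT INTO ops_orders VALUES (?, ?, ?)",
                [(1, "EU", 120.0), (2, "US", 80.0), (3, None, 45.0)])

# 2. Staging area: raw copy of the source, cleaned before loading.
cur.execute("CREATE TABLE stg_orders AS SELECT * FROM ops_orders")
cur.execute("DELETE FROM stg_orders WHERE region IS NULL")  # simple quality rule

# 3. Warehouse: integrated, query-oriented store.
cur.execute("CREATE TABLE dw_sales AS SELECT order_id, region, amount FROM stg_orders")

# 4. Data mart: department-specific subset (here, the EU sales team).
cur.execute("CREATE TABLE mart_sales_eu AS SELECT * FROM dw_sales WHERE region = 'EU'")

print(cur.execute("SELECT * FROM mart_sales_eu").fetchall())  # [(1, 'EU', 120.0)]
con.close()

The staging table lets records be cleaned before they reach the warehouse, and each data mart is simply a department-specific slice of the warehouse.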


Explain data scales and the types of data scales with examples.

In statistics and data analysis, the data scale refers to the level of measurement used to quantify data points. Essentially, it tells us which comparisons and calculations we can meaningfully make based on the data's values. There are four main types of data scales, each with its own characteristics and limitations:

1. Nominal Scale
- Characteristics: Categorizes data into distinct groups without any inherent order or ranking. Imagine sorting books by genre: each genre (fantasy, history, etc.) is distinct, but there is no order between them.
- Examples: Eye color (blue, green, brown), blood type (A, B, AB, O), job titles (doctor, teacher, engineer).
- Operations allowed: Counting and identifying frequencies within each category.

2. Ordinal Scale
- Characteristics: Data points are ranked or ordered, but the intervals between ranks are not necessarily equal. Think of movie ratings (1-5 stars): while we know 4 stars is "better" than 2 stars, the difference in quality might not be the same between all levels.
- Examples: Customer satisfaction ratings (poor, average, good, excellent), socioeconomic status (low, middle, high), degree of injury (minor, moderate, severe).
- Operations allowed: Ranking, identifying median and mode, comparing relative order.

3. Interval Scale
- Characteristics: Data points are ordered with equal intervals between them, but there is no true zero point. Consider temperature in Celsius: the difference between 20°C and 30°C is the same as between 0°C and 10°C, but a temperature of 0°C does not mean "no heat" at all.
- Examples: Temperature (Celsius, Fahrenheit), calendar years, IQ scores.
- Operations allowed: All operations of ordinal scales plus calculations like addition, subtraction, and finding the mean and standard deviation.

4. Ratio Scale
- Characteristics: Data points are ordered with equal intervals and have a true zero point, meaning the absence of the measured quantity. Imagine money: a balance of $0 truly means no money, and the difference between $10 and $20 is the same as between $20 and $30.
- Examples: Age, time, distance, salary, height, and weight.
- Operations allowed: All operations of interval scales plus multiplication and division, so ratios ("twice as much") are meaningful.
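
The following short Python sketch (using pandas, with made-up values) illustrates which operations are meaningful on each scale.

# Illustrative sketch of the four measurement scales (data is made up).
import pandas as pd

nominal = pd.Series(["A", "B", "O", "A", "AB"])            # blood types
ordinal = pd.Series(["poor", "good", "excellent", "good"])  # satisfaction ratings
interval = pd.Series([20.0, 30.0, 0.0, 10.0])               # temperature in Celsius
ratio = pd.Series([10.0, 20.0, 30.0])                       # account balance in $

# Nominal: only counting / frequencies are meaningful.
print(nominal.value_counts().to_dict())

# Ordinal: ranking is meaningful, but differences between ranks are not.
order = pd.CategoricalDtype(["poor", "average", "good", "excellent"], ordered=True)
print(ordinal.astype(order).sort_values().tolist())

# Interval: differences and means are meaningful, ratios are not
# (0 degrees C does not mean "no heat").
print(interval.mean(), interval.max() - interval.min())

# Ratio: a true zero makes ratios meaningful ($20 is twice $10).
print(ratio.iloc[1] / ratio.iloc[0])
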
What Are the Different Data Collection Methods?

Primary and secondary methods of data collection are two approaches used to gather
information for research or analysis purposes. Let's explore each data collection method
in detail:

1. Primary Data Collection:

Primary data collection involves the collection of original data directly from the source or
through direct interaction with the respondents.

a. Surveys and Questionnaires: Researchers design structured questionnaires or surveys to collect data from individuals or groups.
b. Interviews: Interviews involve direct interaction between the researcher and the
respondent.
c. Observations: Researchers observe and record behaviors, actions, or events in their
natural setting.
d. Experiments: Experimental studies involve the manipulation of variables to observe their
impact on the outcome.
e. Focus Groups: Focus groups bring together a small group of individuals who discuss
specific topics in a moderated setting.

2. Secondary Data Collection:

Secondary data collection involves using existing data collected by someone else for a
purpose different from the original intent.

a. Published Sources: Researchers refer to books, academic journals, magazines, newspapers, government reports, and other published materials that contain relevant data.

b. Online Databases: Numerous online databases provide access to a wide range of secondary data, such as research articles, statistical information, economic data, and social surveys.

c. Government and Institutional Records: Government agencies, research institutions, and organizations often maintain databases or records that can be used for research purposes.

d. Publicly Available Data: Data shared by individuals, organizations, or communities on
public platforms, websites, or social media can be accessed and utilized for research.

e. Past Research Studies: Previous research studies and their findings can serve as
valuable secondary data sources.

Explain the steps of data processing?

Data processing involves transforming raw data into valuable information, and it usually follows these key steps:

1. Data Collection: This first step gathers data from various sources like sensors, databases, websites, surveys, or experiments. The chosen method depends on your specific data needs and goals.

2. Data Preparation: Here, you make the raw data usable for analysis. This often involves:
- Cleaning: Removing errors, inconsistencies, and missing values.
- Transformation: Formatting data into a consistent structure, converting units, and handling outliers.
- Integration: Combining data from multiple sources if needed.

3. Data Input: The prepared data is then loaded into a chosen platform for analysis, like a data warehouse, spreadsheet, or statistical software.

4. Data Processing: This is where you analyze and manipulate the data to extract insights. This can involve:
- Descriptive statistics: Summarizing the data through measures like mean, median, and standard deviation.
- Data visualization: Creating charts, graphs, and other visual representations to understand patterns and trends.
- Modeling: Building statistical or machine learning models to predict future outcomes or relationships within the data.

5. Data Output: The extracted insights are presented in a clear and concise way, often through reports, dashboards, or visualizations.

6. Data Storage: Finally, the processed data is saved securely for future use, analysis, or reference.
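
A minimal pandas sketch of these six steps is shown below; the file name "sensor_readings.csv" and its columns are hypothetical.

# Minimal sketch of the six data processing steps with pandas.
import pandas as pd

# 1. Collection: gather raw data from a source (hypothetical file/columns).
raw = pd.read_csv("sensor_readings.csv")          # e.g. columns: timestamp, temp_f

# 2. Preparation: clean, transform, integrate.
clean = raw.dropna().drop_duplicates()            # remove missing values / duplicates
clean["temp_c"] = (clean["temp_f"] - 32) * 5 / 9  # convert units

# 3. Input: load the prepared data into the analysis platform
# (here it simply stays in a DataFrame).

# 4. Processing: descriptive statistics on the cleaned data.
summary = clean["temp_c"].agg(["mean", "median", "std"])

# 5. Output: present the insight in a readable form.
print(summary)

# 6. Storage: save the processed data for future use.
clean.to_csv("sensor_readings_clean.csv", index=False)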

Explain Data Mart in Detail?

A data mart is a subject-oriented, integrated, time-variant,
non-volatile collection of data in support of decision-making
processes for a specific department or business unit within
an organization.
1. Subject-oriented: Data marts are built around specific topics or areas of
interest, such as marketing, sales, finance, or human resources.
2. Integrated: Data marts integrate data from various sources, both internal
and external, into a single, consistent format.
3. Time-variant: Data marts typically track data over time, allowing users to
analyze trends and patterns.
4. Non-volatile: Unlike operational databases that are constantly being
updated, data marts are relatively static.

5. Decision-making support: Ultimately, the purpose of a data mart is to support decision-making processes within a specific department or business unit.
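
As an illustration, the following pandas sketch derives a small, subject-oriented sales mart from a made-up warehouse table; the columns and values are assumptions for the example only.

# Sketch: deriving a subject-oriented sales data mart from warehouse data.
import pandas as pd

warehouse = pd.DataFrame({
    "date":   ["2024-01-05", "2024-01-05", "2024-02-10"],
    "dept":   ["sales", "hr", "sales"],
    "region": ["EU", "EU", "US"],
    "amount": [120.0, 0.0, 80.0],
})

# Subject-oriented: keep only the sales subject area.
sales_mart = warehouse[warehouse["dept"] == "sales"].copy()

# Time-variant: keep the date so trends can be analysed per month.
sales_mart["month"] = pd.to_datetime(sales_mart["date"]).dt.to_period("M")

# Decision-making support: e.g. monthly revenue per region for the sales department.
print(sales_mart.groupby(["month", "region"])["amount"].sum())
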
Data Lake Explained in Detail?
A data lake is essentially a giant container that can hold a massive
amount of data in its raw, native format. Imagine it like a digital
warehouse, but instead of neatly organizing everything into shelves
and categories, it just throws everything in together.
Advantages:
- Scalability: Data lakes can scale up easily to accommodate whatever amount of data you throw at them.
- Flexibility: You can store any type of data in a data lake, regardless of its structure or format. This makes them ideal for organizations that deal with a lot of diverse data.
- Accessibility: Data lakes are designed to be easily accessible to data analysts and scientists.
- Cost-effectiveness: Compared to data warehouses, data lakes are typically more cost-effective, especially for storing large amounts of data.

Challenges:
- Complexity: Managing a data lake can be complex, especially as it grows in size.
- Data quality: Because data lakes store everything, it is easy for low-quality or irrelevant data to creep in.
- Security: Ensuring the security of all that data in a data lake is crucial.
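
The following Python sketch illustrates the "store raw now, apply structure later" (schema-on-read) idea behind a data lake; a local folder stands in for cloud object storage, and the file names and contents are made up.

# Sketch of schema-on-read: dump raw files first, structure them when reading.
import json, pathlib
import pandas as pd

lake = pathlib.Path("lake")
lake.mkdir(exist_ok=True)

# Ingest: store data in its native format, with no upfront schema.
(lake / "clicks.json").write_text(json.dumps([{"user": "u1", "page": "/home"}]))
(lake / "sales.csv").write_text("order_id,amount\n1,120.0\n")

# Analyse: the schema is applied only when the data is read.
clicks = pd.read_json(lake / "clicks.json")
sales = pd.read_csv(lake / "sales.csv")
print(clicks.head(), sales["amount"].sum())
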
Explain KDD in detail with advantages and disadvantages, with a diagram and example.

In the context of computer science, "data mining" can be referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. Data mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases.

KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful, previously unknown, and potentially valuable information from large datasets. Its main steps are:

Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the collection. It includes (1) cleaning in the case of missing values, (2) cleaning noisy data, where noise is a random or variance error, and (3) cleaning with data discrepancy detection and data transformation tools.

Data Integration: Data integration is defined as combining heterogeneous data from multiple sources into a common source (data warehouse). It is carried out using data migration tools, data synchronization tools, and the ETL (Extract, Transform, Load) process.

Data Selection: Data selection is defined as the process where the data relevant to the analysis is decided upon and retrieved from the data collection. Techniques such as neural networks, decision trees, naive Bayes, clustering, and regression can be used for this.

Data Transformation: Data transformation is defined as the process of transforming data into the appropriate form required by the mining procedure. It is a two-step process:
1. Data mapping: Assigning elements from the source base to the destination to capture transformations.
2. Code generation: Creation of the actual transformation program.

Data Mining: Data mining is defined as the application of techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model, using classification or characterization.

Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting patterns that represent knowledge, based on given interestingness measures.

Knowledge Representation: This involves presenting the results in a way that is meaningful and can be used to make decisions.
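
The sketch below walks through the KDD steps on a tiny made-up dataset using pandas and scikit-learn; the data, column names, and the choice of a decision tree are illustrative assumptions, not part of the KDD definition.

# Hedged sketch of the KDD steps on a small, made-up dataset.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Raw data from two hypothetical sources.
src_a = pd.DataFrame({"id": [1, 2, 3, 4], "age": [25, None, 40, 35]})
src_b = pd.DataFrame({"id": [1, 2, 3, 4], "income": [30, 45, 80, 60],
                      "bought": [0, 0, 1, 1]})

# 1. Data cleaning: remove records with missing values.
src_a = src_a.dropna()

# 2. Data integration: combine the sources into one collection.
data = src_a.merge(src_b, on="id")

# 3. Data selection: keep only the attributes relevant to the task.
selected = data[["age", "income", "bought"]]

# 4. Data transformation: put the data in the form the miner needs.
X = selected[["age", "income"]]
y = selected["bought"]

# 5. Data mining: extract a pattern (here, a classification model).
model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# 6. Pattern evaluation: judge how accurate/interesting the pattern is.
print("training accuracy:", accuracy_score(y, model.predict(X)))

# 7. Knowledge representation: present the result for decision-making.
print("predicted purchase for age=30, income=70:",
      model.predict(pd.DataFrame({"age": [30], "income": [70]}))[0])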

Advantages of KDD

1. Improved decision-making
2. Increased efficiency
3. Better customer service
4. Fraud detection
5. Predictive modeling

Disadvantages of KDD

1. Privacy concerns
2. Complexity
3. Data quality
4. High cost
