Module 7 Introduction To Data Mining

Data mining is a process that uses statistical and machine learning techniques to uncover hidden patterns in large datasets. It involves developing models from sample data, known as training data, in order to discover patterns in new data. There are two main methodologies for data mining: CRISP-DM, which consists of six phases for conducting data mining analysis, and SEMMA, which focuses on a core set of tasks. Data mining relies on data warehouses, which are large databases designed to analyze patterns in historical data from multiple sources. Data warehouses differ from operational databases in that they focus on analytical querying rather than transaction processing.

INTRODUCTION TO DATA MINING
Learning Objectives
At the end of the module, the student should be able to:
1. Define data mining and some common approaches used in data
mining;
2. Distinguish among a database, a data warehouse, and a
data mart;
3. Differentiate Online Analytical Processing (OLAP) and Online
Transactional Processing (OLTP);
4. Describe the data mining methodologies.
Data Mining
Data mining is a field of business analytics focused on better
understanding characteristics and patterns among variables in
large databases using a variety of statistical and analytical
tools (Evans, 2017).
Data mining includes a wide variety of statistical procedures
for exploring data, including regression analysis (Evans, 2017).
Data mining attempts to discover patterns, trends, and
relationships among data, especially nonobvious and
unexpected patterns (Albright & Winston, 2020).
Data Mining (Jaggia et al, 2021)
Data mining describes the process of applying a set of
analytical techniques necessary for the development of
machine learning and artificial intelligence.
The goal of data mining is to uncover hidden patterns
and relationships in data, which allows us to gain insights
and derive relevant information to help make decisions
(Jaggia et al., 2021).
Data Mining (Albright and Winston, 2020)
The place to start is with a data warehouse. A data warehouse is a huge
database that is designed specifically to study patterns in data. It should:
1. Combine data from multiple sources to discover as many relationships as
possible;
2. Contain accurate and consistent data;
3. Be structured to enable quick and accurate responses to a variety of
queries; and
4. Allow follow-up responses to specific relevant questions.
A data warehouse represents a type of database that is specifically
structured to enable data mining.
Data Mining (Jaggia et al, 2021, page 318)
Data mining is recognized as a building block of machine learning and
artificial intelligence.
Data Mining Process (Jaggia et al, 2021, page 319)
There is a growing need for the establishment of standards in this field.
Two commonly adopted are CRISP-DM and SEMMA methodologies
Data Mining Process (Jaggia et al, 2021, page 319)
What is CRISP-DM Methodology?
When conducting data mining
analysis, practitioners generally adopt
either CRISP-DM methodology or
SEMMA methodology.
CRISP-DM stands for Cross-Industry
Standard Process for Data Mining and
consists of six major phases. It was
developed in the 1990s by SPSS,
Teradata, Daimler AG, NCR, and OHRA.
Data Mining Process (Jaggia et al, 2021, page 319)

Some practitioners prefer the SEMMA methodology. Developed by
the SAS Institute, this methodology focuses on a core set of tasks
(Sample, Explore, Modify, Model, and Assess) and provides a
step-by-step process for analyzing data.
Database vs Data Warehouse
• A database is an organized collection of information stored in a way
that makes logical sense and that facilitates easier search, retrieval,
manipulation, and analysis of data.
• Perhaps the most common way of classifying databases is SQL vs.
NoSQL (also known as relational vs. non-relational).
• A data warehouse is a system that aggregates and stores information
from a variety of disparate sources within an organization.
• The goal of a data warehouse is explicitly business-oriented; it is
designed to facilitate decision-making by allowing end-users to
consolidate and analyze information from different sources.
https://www.xplenty.com/blog/data-warehouse-vs-database-what-are-the-key-differences/
Database vs. Data Warehouse
• A database and a data warehouse both store data. A database stores
real-time information about one particular part of your business, while
a data warehouse stores historical data about your business; it does
not store current information, nor is it updated in real time.
• A database's main job is to process the daily transactions that your
company makes, e.g., recording which items have sold. A data
warehouse, in contrast, is a system that pulls together data from many
different sources within an organization for reporting and analysis.
The reports created from complex queries within a data warehouse are
used to make business decisions.
Data warehouse and Data mart (Albright and Winston, 2020)
• A data warehouse is a huge database designed specifically to study
patterns in data.
• A data mart is a scaled-down version, or part, of a data warehouse,
structured specifically for one part of an organization, such as sales.
Database vs. Data Warehouse
• A data warehouse is a relational or multidimensional database that is
designed for query and analysis.
• A database, however, is focused on the day-to-day operations of the
company, while a data warehouse is used to analyze historical data
and extract insights from it.
• Data warehouses are not optimized for transaction processing, which
is the domain of OLTP systems; rather, a data warehouse is designed
for analytical processing, the main focus of OLAP.
• The distinction between Online Analytical Processing (OLAP) and
Online Transactional Processing (OLTP) is the key difference between
a data warehouse and a database.
Database vs Data Warehouse
• Databases. This type of processing responds immediately to
user requests, and so is used to process the day-to-day
operations of a business in real time. For example, if a user
wants to reserve a hotel room using an online booking form,
the process is executed with OLTP.
• Data warehouses. This type of processing gives analysts the
power to look at your data from different points of view. For
example, even though your database records sales data for
every minute of every day, you may just want to know the
total amount sold each day.
Database vs. Data Warehouse
• Databases use OnLine Transactional Processing (OLTP) to
delete, insert, replace, and update large numbers of short
online transactions quickly.
• Data warehouses use OnLine Analytical Processing (OLAP)
to analyze massive volumes of data rapidly.
Database vs. Data Warehouse Comparison Chart
Parameter          | Database                               | Data Warehouse
Use                | Recording data                         | Analyzing data
Processing Methods | OnLine Transactional Processing (OLTP) | OnLine Analytical Processing (OLAP)
Concurrent Users   | Thousands                              | Limited number
Use Cases          | Small transactions                     | Complex analysis
Downtime           | Always available                       | Some scheduled downtime
Optimization       | For CRUD (create, read, update, and delete) operations | For complex analysis
Data Type/Timeline | Real-time detailed data                | Summarized historical data
https://www.xplenty.com/blog/data-warehouse-vs-database-what-are-the-key-differences/
OLTP vs. OLAP
• OLTP and OLAP: The two terms look similar but refer to
different kinds of systems.
• Online transaction processing (OLTP) captures,
stores, and processes data from transactions in real time.
• Online analytical processing (OLAP) uses complex queries
to analyze aggregated historical data from OLTP systems.
• Examples of OLTP applications are ATM centers, online
banking, online booking, sending text messages, etc.

https://www.guru99.com/oltp-vs-olap.html
OLTP vs. OLAP
• Examples of the use of OLAP are as follows:
• Spotify analyzes the songs its users play to build a
personalized homepage of songs and playlists.
• Netflix movie recommendation system.

Source: Difference between OLAP and OLTP in DBMS - GeeksforGeeks


KEY DIFFERENCE between OLTP and OLAP:
• Online Analytical Processing (OLAP) is a category of software tools
that analyze data stored in a database whereas Online transaction
processing (OLTP) supports transaction-oriented applications in a
3-tier architecture.
• OLAP is characterized by a large volume of data while OLTP is
characterized by large numbers of short online transactions.
• In OLAP, a data warehouse is created specifically to
integrate different data sources into a consolidated
database, whereas OLTP uses a traditional DBMS.
Benefits of using OLAP services
• OLAP creates a single platform for all types of business
analytical needs, including planning, budgeting,
forecasting, and analysis.
• The main benefit of OLAP is the consistency of
information and calculations.
• Easily apply security restrictions on users and objects
to comply with regulations and protect sensitive data.

https://www.xplenty.com/blog/snowflake-schemas-vs-star-schemas-what-are-they-and-how-are-they-different/
Benefits of OLTP method
• It administers daily transactions of an organization.
• OLTP widens the customer base of an organization by
simplifying individual processes.
• OLTP systems are optimized for transactional throughput
rather than data analysis, so they can handle the many
simultaneous short transactions that OLAP systems, which
consolidate large volumes of data from different sources,
are not designed to process.
https://www.xplenty.com/blog/snowflake-schemas-vs-star-schemas-what-are-they-and-how-are-they-different/
OLTP vs. OLAP Comparison Chart
Online Analytical Processing (OLAP)                                      | Online Transactional Processing (OLTP)
Consists of historical data from various databases                       | Consists only of current operational data
Subject oriented; used for data mining, analytics, decision making, etc. | Application oriented; used for business tasks
The data is used in planning, problem solving, and decision making       | The data is used to perform day-to-day fundamental operations
Provides a multi-dimensional view of different business tasks            | Reveals a snapshot of present business tasks
A large amount of data is stored, typically in TB or PB                  | The size of the data is relatively small as historical data is archived, e.g., MB or GB
Generally managed by CEOs, managing directors, and general managers      | Managed by clerks, managers, and encoders
Mostly read and only rarely write operations                             | Both read and write operations
Methods of Data mining (Albright and Winston, 2020)
Once a data warehouse is in place, analysts can begin to mine the data with a collection of
methodologies:
• Classification analysis
• Prediction
• Cluster analysis
• Market basket analysis
• Forecasting
Numerous software packages are available that perform various data mining
procedures.
Supervised and Unsupervised Data Mining Techniques
(Albright and Winston, 2020)
• In supervised data mining techniques, there is a dependent variable
that the method is trying to predict.

Source: Jaggia et al, 2021, p. 320


Supervised and Unsupervised Data Mining Techniques
(Albright and Winston, 2020)
• In unsupervised data mining techniques, there is no dependent
variable. Instead, these techniques search for patterns and
structure among all of the variables.
• Clustering or segmentation is the most common unsupervised
method.
• Another popular unsupervised method is market basket
analysis (also called association analysis), where patterns of
customer purchases are examined to see which items
customers tend to purchase together, in the same “market
basket.”
Supervised and Unsupervised Data Mining Techniques
(Jaggia et al, 2021)
Classification Methods (Albright and Winston, 2020)

• One of the most important problems studied in data mining is the
classification problem.
• This is basically the same problem attacked by regression
analysis, but now the dependent variable is categorical.
• Each of the classification methods has the same
objective: to use data from the explanatory variables to
classify each record (person, company, or whatever) into
one of the known categories.
Classification Methods (Albright and Winston, 2020)

• It attempts to find variables that are related to a categorical
(often binary) variable.
• For example, classification analysis would attempt to find
explanatory variables that would help predict whether a credit card
holder will pay their balances in a reasonable amount of time or not.
Classification Methods (Albright and Winston, 2020)
• Data partitioning plays an important role in classification.
• The data set is partitioned into two or even three distinct subsets before
algorithms are applied.
• The first subset, usually with about 70% to 80% of the records, is called
the training set. The algorithm is trained with data in the training set.
• The second subset, called the testing set, usually contains the rest of
the data. The model from the training set is tested on the testing set.
• Some software packages might also let you specify a third subset, often
called a prediction set, where the values of the dependent variables
are unknown. Then you can use the model to classify these unknown
values.
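The partitioning step above can be sketched in a few lines of Python (an illustrative sketch; the 70/30 split, the fixed seed, and the `partition` helper are assumptions, not from the textbooks):

```python
import random

def partition(records, train_frac=0.7, seed=42):
    """Split records into a training set and a testing set."""
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = records[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]   # (training set, testing set)

records = list(range(100))
train, test = partition(records, train_frac=0.7)
print(len(train), len(test))   # 70 30
```

The algorithm is then trained on `train`, and the resulting model is evaluated on `test`, which it never saw during training.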
A. Logistic Regression (Albright and Winston, 2020)
• Logistic regression is a popular method for classifying individuals,
given the values of a set of explanatory variables.
• It estimates the probability that an individual is in a particular
category.
• It uses a nonlinear function of the explanatory variables for
classification.
• It is essentially regression with a binary (0-1) dependent variable.
• For the two-category problem, the binary variable indicates
whether an observation is in category 0 or category 1.
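As a minimal sketch of the idea, the following fits a one-variable logistic regression by gradient ascent on a toy two-category dataset (the data, learning rate, and `fit_logistic` helper are hypothetical, for illustration only):

```python
import math

def sigmoid(z):
    """Nonlinear function mapping any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Estimate P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)   # current estimated probability
            b0 += lr * (y - p)         # log-likelihood gradient w.r.t. b0
            b1 += lr * (y - p) * x     # log-likelihood gradient w.r.t. b1
    return b0, b1

# Toy data: larger x values tend to belong to category 1
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]
b0, b1 = fit_logistic(xs, ys)
classify = lambda x: 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0
print([classify(x) for x in xs])  # data is separable, so [0, 0, 0, 1, 1, 1]
```

The estimated probability, not just the 0/1 label, is often the useful output, e.g., ranking credit card holders by their probability of paying on time.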
B. Discriminant Analysis (Albright and Winston, 2020)

• Most software packages include another classification procedure
called discriminant analysis.
• This is a classical technique developed many decades ago
that is still in use.
• It is somewhat similar to logistic regression and has the
same basic goals.
• However, it is not as prominent in data mining discussions
as logistic regression.
C. Neural Network (Albright and Winston, 2020)
• The neural network (or neural net) methodology is an attempt
to model the complex behavior of the human brain.
• It sends inputs (the values of explanatory variables) through a
complex nonlinear network to produce one or more outputs
(the values of the dependent variable).
• It can be used to predict a categorical dependent variable or a
numeric dependent variable.
C. Neural Network (Albright and Winston, 2020)
• The biggest advantage of neural nets is that they often
provide more accurate predictions than any other
methodology, especially when relationships are highly
nonlinear.
• However, neural nets do not provide easily interpretable
equations where you can see the contributions of the
individual explanatory variables.
Neural Networks (Albright and Winston, 2020)
• Each neural net has an associated network diagram, like the
one described below.

❑This figure assumes two inputs and one output.


❑The network also includes a “hidden layer” in the middle
with two hidden nodes.
© 2020 Cengage. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-
protected website or school-approved learning management system for classroom use.
Neural Networks (Albright and Winston, 2020)

❑Scaled values of the inputs enter the network at the left; they are weighted
by the W values and summed, and these sums are sent to the hidden nodes.
❑At the hidden nodes, the sums are “squished” by an S-shaped logistic-type
function.
❑These squished values are then weighted and summed, and the sum is sent to
the output node, where it is squished again and rescaled.
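The forward pass described above can be sketched directly (the weights are hypothetical, and this sketch omits the bias terms and output rescaling that real packages include):

```python
import math

def squish(z):
    """The S-shaped logistic-type function applied at hidden and output nodes."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w_hidden, w_out):
    """Forward pass for the diagram: 2 inputs -> 2 hidden nodes -> 1 output."""
    # Weighted sums of the inputs are sent to each hidden node, then squished
    h = [squish(w[0] * x1 + w[1] * x2) for w in w_hidden]
    # The squished hidden values are weighted, summed, and squished again
    return squish(w_out[0] * h[0] + w_out[1] * h[1])

# Hypothetical weights, one pair per hidden node plus one pair for the output
w_hidden = [(0.5, -0.3), (-0.2, 0.8)]
w_out = (1.0, -1.0)
y = forward(0.4, 0.7, w_hidden, w_out)
print(round(y, 3))
```

Training a neural net consists of adjusting these W values so that the outputs match the observed dependent variable.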

D. Classification Trees (Albright and Winston, 2020)

• Classification trees are also capable of discovering nonlinear
relationships, but they are more intuitive.
• This method, which has many variations, has existed for decades and
has been implemented in a variety of software packages.

D. Classification Trees (Albright and Winston, 2020)
An example of a classification tree:
This classification tree leads directly to the following rules:
1. If a person makes less than 4 mall trips,
a. If the person lives in the West, classify as a trier.
b. If the person doesn’t live in the West, classify as a non-trier.
2. If the person makes 4 or 5 mall trips,
a. If the person doesn’t live in the East, classify as a trier.
b. If the person lives in the East, classify as a non trier.
3. If the person makes at least 6 mall trips, classify as a trier.

The ability of classification trees to provide such simple rules, plus
fairly accurate classifications, has made this a very popular
classification technique.
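The three rules above translate directly into code, which is exactly why classification trees are considered so interpretable (a sketch; the function name and record format are assumptions):

```python
def classify_shopper(mall_trips, region):
    """Apply the classification-tree rules above to one shopper record."""
    if mall_trips < 4:
        # Rule 1: fewer than 4 mall trips -> region decides
        return "trier" if region == "West" else "non-trier"
    elif mall_trips <= 5:
        # Rule 2: 4 or 5 mall trips -> East is the exception
        return "non-trier" if region == "East" else "trier"
    else:
        # Rule 3: at least 6 mall trips -> always a trier
        return "trier"

print(classify_shopper(2, "West"))   # trier
print(classify_shopper(5, "East"))   # non-trier
print(classify_shopper(7, "South"))  # trier
```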

Clustering Methods (Albright and Winston, 2020)
• Probably the most common unsupervised method is clustering,
known in marketing circles as segmentation.
• It tries to group entities (customers, companies, cities, etc.)
into similar clusters, based on the values of their variables.
• There are no fixed groups like the triers and nontriers in
classification.
• Instead, the purpose of clustering is to discover the number
of groups and their characteristics, based entirely on the
data.
Clustering Methods (Albright and Winston, 2020)
• Clustering or segmentation tries to attach cases to categories (or
clusters), with high similarity within categories and high dissimilarity
across categories.
• The key to all clustering methods is the development of a
dissimilarity measure. Once a dissimilarity measure is developed, a
clustering algorithm attempts to find clusters of rows where rows
within a cluster are similar and rows in different clusters are
dissimilar.
Clustering Methods (Albright and Winston, 2020)
• A popular application of cluster analysis is called customer or
market segmentation, where companies analyze a large amount of
customer-related demographic and behavioral data and group
customers into different market segments.
• Two common clustering techniques are hierarchical clustering and
K-means clustering.
Clustering Methods (Albright and Winston, 2020)
• For example, a credit card company might group customers into
those who pay off their account balance every month versus those
who carry a monthly balance, and within these two customer
segments, group them further according to their spending habits.
• The company would likely target each of the customer segments
with different promotion and advertising campaigns or design
different financial products for each group.
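A bare-bones version of K-means clustering, one of the two techniques mentioned above, can be sketched as follows (the toy points and starting centers are hypothetical; real packages choose initial centers automatically):

```python
def kmeans(points, centers, iters=10):
    """Basic K-means: assign each point to its nearest center, then recenter."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Dissimilarity measure: squared Euclidean distance to each center
            i = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # Move each center to the mean of the points assigned to it
        centers = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers, clusters

# Two obvious customer groups, e.g., (monthly spend, balance carried)
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Here the number of clusters (two) was chosen in advance; in practice analysts try several values and inspect the resulting segments.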
Common Clustering Methods (Jaggia et al, 2021)

Agglomerative clustering (AGglomerative NESting) is referred to as AGNES, while divisive clustering (DIvisive ANAlysis) is referred to as DIANA.
Common Clustering Methods (Jaggia et al, 2021)
Association Rule Analysis (Jaggia et al, 2021)
• Another widely used unsupervised data mining technique, it is also
referred to as affinity analysis or market basket analysis.
• It is essentially a “what goes with what” study designed to identify
events that tend to occur together.
• For example, retail companies seek to identify products that
consumers tend to purchase together. This type of information is
useful for retail store managers in displaying their products on the
shelf or when promotional campaigns are developed.
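A first step in a market basket analysis is simply counting which items co-occur; full association rule analysis then computes measures such as support and confidence from these counts. A sketch (the baskets are hypothetical):

```python
from itertools import combinations
from collections import Counter

def pair_counts(baskets):
    """Count how often each pair of items appears in the same market basket."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "butter")
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
counts = pair_counts(baskets)
print(counts[("bread", "butter")])  # bought together in 3 of the 4 baskets -> 3
```

A store manager might respond to the bread-butter pattern by shelving the items together or bundling them in a promotion.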
Forecasting Methods (Jaggia et al, 2021)
Quantitative Forecasting Methods (Jaggia et al, 2021)
SIMPLE SMOOTHING TECHNIQUES
1. Moving Average Technique
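A sketch of the moving average technique: the forecast for the next period is the mean of the most recent m observations (the sales series and m = 3 are hypothetical):

```python
def moving_average(series, m=3):
    """m-period moving averages: each value is the mean of the previous
    m observations, and the last one is the forecast for the next period."""
    return [sum(series[i - m:i]) / m for i in range(m, len(series) + 1)]

sales = [20, 24, 22, 26, 25, 27]
print(moving_average(sales, m=3))  # [22.0, 24.0, 24.33..., 26.0]
```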
Quantitative Forecasting Methods (Jaggia et al, 2021)
SIMPLE SMOOTHING TECHNIQUES
2. Simple Exponential Smoothing Technique
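Simple exponential smoothing maintains a level L_t = alpha * y_t + (1 - alpha) * L_{t-1}, and the latest level is the forecast for the next period. A sketch (the series and alpha value are hypothetical):

```python
def simple_exp_smoothing(series, alpha=0.2):
    """Update L_t = alpha * y_t + (1 - alpha) * L_{t-1} and return the
    final level, which serves as the forecast for the next period."""
    level = series[0]              # initialize the level at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

sales = [20, 24, 22, 26, 25, 27]
print(simple_exp_smoothing(sales, alpha=0.5))  # 25.75
```

Larger alpha values weight recent observations more heavily; alpha = 1 reduces the forecast to the most recent observation.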
Quantitative Forecasting Methods (Jaggia et al, 2021)
LINEAR REGRESSION MODELS FOR TREND AND SEASONALITY
1. The Linear Trend Model
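The linear trend model fits y_t = b0 + b1 * t by least squares and extrapolates the line. A sketch (the series is hypothetical; it grows by exactly 3 units per period so the fit is exact):

```python
def linear_trend(series):
    """Least-squares fit of y_t = b0 + b1 * t, with t = 1, 2, ..., n."""
    n = len(series)
    ts = range(1, n + 1)
    t_bar = sum(ts) / n
    y_bar = sum(series) / n
    # Standard least-squares slope and intercept formulas
    b1 = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, series))
          / sum((t - t_bar) ** 2 for t in ts))
    b0 = y_bar - b1 * t_bar
    return b0, b1

b0, b1 = linear_trend([13, 16, 19, 22, 25])
print(b0, b1)                # 10.0 3.0
forecast_t6 = b0 + b1 * 6    # extrapolate the trend one period ahead
print(forecast_t6)           # 28.0
```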
Quantitative Forecasting Methods (Jaggia et al, 2021)
LINEAR REGRESSION MODELS FOR TREND AND SEASONALITY
2. The Linear Trend Model with Seasonality
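With seasonality, dummy variables shift the trend line for each quarter, with one quarter serving as the baseline. The sketch below only evaluates a fitted model; the coefficients are hypothetical, not estimated from data:

```python
def trend_season_forecast(t, quarter, b0, b1, seasonal):
    """y_t = b0 + b1*t + seasonal adjustment. Quarter 1 is the baseline,
    so its dummy-variable coefficients are all zero."""
    return b0 + b1 * t + seasonal.get(quarter, 0.0)

# Hypothetical fitted coefficients for illustration
b0, b1 = 100.0, 2.5
seasonal = {2: 8.0, 3: -5.0, 4: 12.0}  # dummy-variable coefficients for Q2-Q4
print(trend_season_forecast(9, 1, b0, b1, seasonal))   # 122.5
print(trend_season_forecast(12, 4, b0, b1, seasonal))  # 142.0
```

In practice the coefficients are estimated by regressing the series on t and the three quarterly dummy variables.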
Quantitative Forecasting Methods (Jaggia et al, 2021)
NONLINEAR REGRESSION MODELS FOR TREND AND SEASONALITY
3. The Exponential Trend Model
Quantitative Forecasting Methods (Jaggia et al, 2021)
NONLINEAR REGRESSION MODELS FOR TREND AND SEASONALITY
4. The Polynomial Trend Model
Quantitative Forecasting Methods (Jaggia et al, 2021)
NONLINEAR REGRESSION MODELS WITH SEASONALITY
1. The Exponential Trend Model with SEASONAL DUMMY VARIABLES
Quantitative Forecasting Methods (Jaggia et al, 2021)
NONLINEAR REGRESSION MODELS WITH SEASONALITY
2. The Quadratic Trend Model with SEASONAL DUMMY VARIABLES
Reference
• Albright, C., & Winston, W. (2020). Business Analytics: Data Analysis
and Decision Making (5th ed.). Cengage Learning.
• Jaggia, S., Kelly, A., Lertwachara, K., & Chen, L. (2021). Business
Analytics: Communicating with Numbers. McGraw-Hill Education.
