Unit 1 Data Mining

The document provides an overview of data mining, defining data and information, and categorizing data into structured and unstructured types. It details the data mining process, its history, tasks, architecture, and techniques, emphasizing its importance in extracting valuable insights from large datasets for decision-making. Additionally, it distinguishes between data mining and knowledge discovery in databases (KDD), as well as comparing data mining with OLAP.

Uploaded by

soumyachandu


Unit 1: Data Mining

What is Data?
Data is distinct pieces of information, usually formatted in a special way. Data
can be measured, collected, reported, and analyzed, whereupon it is often
visualized using graphs, images, or other analysis tools. Raw data (“unprocessed
data”) may be a collection of numbers or characters before it has been “cleaned”
and corrected by researchers.
What is Information?
Information is data that has been processed, organized, or structured in a way
that makes it meaningful, valuable, and useful.
Categories of Data
Data can be categorized into two main types:
Structured Data: This type of data is organized into a specific format, making
it easy to search, analyze, and process. Structured data is typically found in
relational databases and includes information such as numbers, dates, and categories.
Unstructured Data: Unstructured data does not conform to a specific structure
or format. It may include text documents, images, videos, and other data
that is not easily organized or analyzed without additional processing.
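The distinction can be illustrated with a minimal sketch. The customer records and the review text below are made-up values, not data from these notes: the structured records can be queried directly by field name, while the free text needs extra processing (here, tokenizing and counting words) before it can be analyzed.

```python
import re
from collections import Counter

# Structured data: every record has the same named fields (hypothetical values).
customers = [
    {"id": 1, "name": "Asha", "city": "Pune", "spend": 1200.0},
    {"id": 2, "name": "Ravi", "city": "Delhi", "spend": 450.0},
    {"id": 3, "name": "Meena", "city": "Pune", "spend": 800.0},
]

# A query on structured data is a simple filter on a known field.
pune_customers = [c["name"] for c in customers if c["city"] == "Pune"]

# Unstructured data: free text with no fixed fields (made-up example).
review = "Great phone, great battery. The camera is average but the battery lasts long."

# Additional processing (tokenizing and counting) is needed before analysis.
words = re.findall(r"[a-z]+", review.lower())
word_counts = Counter(words)

print(pune_customers)
print(word_counts.most_common(3))
```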
What is Data Mining?
Definition: Data mining is the process of extracting useful patterns,
relationships, and insights from large datasets using statistical techniques,
machine learning algorithms, and database systems. It plays a crucial role in
modern industries by helping organizations uncover hidden trends that drive
data-driven decision-making.
Purpose: The primary purpose of data mining is to extract valuable knowledge
and information from large volumes of data that might be hidden or not readily
apparent. It involves using advanced statistical and machine learning techniques
to identify patterns and trends.
Functions: Data mining algorithms and techniques are applied to the data to
identify associations, clusters, classifications, and anomalies. It helps in
understanding customer behavior, predicting trends, detecting fraud, and making
data-driven business decisions.

Usage: Data mining is widely used in areas such as marketing analysis, customer
segmentation, recommendation systems, fraud detection, healthcare research, and
financial forecasting.
Goals of Data Mining:
• The goal of data mining is to extract useful information from large datasets
and use it to make predictions or inform decision-making.
• Data mining is important because it allows organizations to uncover
insights and trends in their data that would be difficult or impossible to
discover manually.
• This can help organizations make better decisions, improve their
operations, and gain a competitive advantage.
Data Mining History and Origins
One early and influential figure behind the ideas that data mining draws on was
Dr. Herbert Simon, a Nobel laureate in economics who is widely regarded as one of
the founding fathers of artificial intelligence. In the 1950s and 1960s, Simon and
his colleagues built some of the first programs for automated reasoning and problem
solving. Over the same period, researchers in statistics and computer science
developed techniques for extracting useful information and insights from data,
including clustering, classification, and decision trees.
In the 1980s and 1990s, the field of data mining continued to evolve, and new
algorithms and techniques were developed to address the challenges of working
with large and complex data sets. The development of data mining software and
platforms, such as SAS, SPSS, and RapidMiner, made it easier for organizations
to apply data mining techniques to their data.
In recent years, the availability of large data sets and the growth of cloud
computing and big data technologies have made data mining even more powerful
and widely used. Today, data mining is a crucial tool for many organizations and
industries and is used to extract valuable insights and information from data sets
in a wide range of domains.
Tasks of Data Mining
1. Classification: Categorizing data into predefined classes.
2. Clustering: Grouping similar data points together.
3. Regression: Predicting numerical values based on data relationships.
4. Association Rule Mining: Discovering interesting relationships between
variables.

5. Anomaly Detection: Identifying unusual patterns in data.
6. Text Mining: Extracting insights from unstructured text data.
7. Prediction and Forecasting: Predicting future trends based on historical data.
8. Pattern Mining: Identifying recurring patterns in sequential data.
9. Feature Selection and Dimensionality Reduction: Identifying relevant features
and reducing dataset complexity.

Architecture of Data Mining

Data mining architecture typically consists of several components:

1. Data Sources: These are the repositories of data where the raw information
resides. Sources can include databases, data warehouses, websites, and more.
2. Data Cleaning and Integration: This stage involves preprocessing the data to
ensure its quality and compatibility for mining. It includes tasks like removing
noise, handling missing values, and integrating data from different sources.

3. Data Selection and Transformation: Here, relevant data subsets are selected
for analysis based on the mining goals. The selected data may also undergo
transformation to better suit the mining algorithms.
4. Data Mining Engine: This is the core component where various data mining
algorithms are applied to the prepared data to discover patterns, trends, and
insights.
5. Pattern Evaluation: Once patterns are discovered, they need to be evaluated
for their relevance, validity, and usefulness. This step often involves statistical
techniques and domain expertise.
6. Knowledge Presentation: Finally, the discovered knowledge is presented to
users in a comprehensible format, such as reports, visualizations, or dashboards,
to aid in decision making.
Throughout this process, feedback loops may exist where insights gained from
the data mining results inform subsequent data selection, cleaning, or mining
steps, creating a continuous improvement cycle.
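The flow through these components can be sketched as a chain of small functions. This is only an illustrative toy under stated assumptions: the sensor readings, the function names, the "1.5 times the average" mining rule, and the relevance threshold are all hypothetical, chosen to make each stage visible.

```python
from statistics import mean

# Data sources: hypothetical raw readings; None marks a missing value.
source_a = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": None}]
source_b = [{"id": 3, "temp": 22.0}, {"id": 4, "temp": 95.0}]


def clean_and_integrate(*sources):
    """Merge all sources and drop records with missing values."""
    merged = [row for src in sources for row in src]
    return [row for row in merged if row["temp"] is not None]


def select_and_transform(rows):
    """Keep only the field needed for mining, as a plain list of numbers."""
    return [row["temp"] for row in rows]


def mine(values):
    """Toy mining engine: flag values far above the average."""
    avg = mean(values)
    return [v for v in values if v > 1.5 * avg]


def evaluate(patterns, min_value=50.0):
    """Keep only patterns that pass a (hypothetical) relevance threshold."""
    return [p for p in patterns if p >= min_value]


def present(patterns):
    """Turn the surviving patterns into a short report string."""
    return f"{len(patterns)} unusual reading(s): {patterns}"


rows = clean_and_integrate(source_a, source_b)
report = present(evaluate(mine(select_and_transform(rows))))
print(report)
```

A feedback loop would correspond to changing the inputs or thresholds of an earlier stage after reading the report and running the chain again.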
Data Mining Process
The data mining process can be described in five steps.
• Step 1: Collection – First, data is collected, organized, and loaded into a data
warehouse. The data is stored and managed either in the cloud or on in-house servers.
• Step 2: Understanding – Data scientists and business analysts examine the
properties of the data and analyze it in depth in the context of a particular
problem statement defined by the company. This is done using querying,
visualization, and reporting.
• Step 3: Preparation – Once the available data sources are confirmed, the
data is cleaned, constructed, and formatted into the required form. In this step,
additional data can also be explored in greater depth, informed by the
insights uncovered in the previous stage.
• Step 4: Modeling – Modeling techniques are selected and applied to the prepared
dataset. A data model is like a diagram that reflects and describes the
relationships between different types of information stored in the database.
• Step 5: Evaluation – The model results are evaluated in the context of the
business objectives. In this phase, new business requirements may be raised because
of new patterns discovered in the model results or other factors.
Classification of Data Mining Systems
Classification Based on the Mined Databases
A data mining system can be classified based on the types of databases that are
mined. Database systems can themselves be segmented by distinct
principles, such as data models and types of data, which in turn helps in
classifying a data mining system.
For example, classifying by data model gives relational, transactional,
object-relational, or data warehouse mining systems.
Classification Based on the Type of Knowledge Mined
A data mining system categorized by the kind of knowledge mined may have
the following functionalities:
1. Characterization
2. Discrimination
3. Association and Correlation Analysis

4. Classification
5. Prediction
6. Outlier Analysis
7. Evolution Analysis
Classification Based on the Techniques Utilized
A data mining system can also be classified based on the techniques it
incorporates.
These techniques can be characterized by the degree of user interaction
involved or by the methods of analysis employed.
Classification Based on the Applications Adapted
Data mining systems can also be classified by the applications they are adapted
to, such as:
1. Finance
2. Telecommunications
3. DNA
4. Stock Markets
5. E-mail
What is KDD (Knowledge Discovery in Databases)?
KDD is a computer science field specializing in extracting previously unknown
and interesting information from raw data. KDD is the whole process of trying to
make sense of data by developing appropriate methods or techniques. The
KDD process includes the following steps:
Data Cleaning
Data cleaning is defined as the removal of noisy and irrelevant data from the collection.
• Cleaning in the case of missing values.
• Cleaning noisy data, where noise is a random error or variance in a measured value.
• Cleaning with data discrepancy detection and data transformation tools.
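Two of these cleaning tasks can be sketched in a few lines. The readings are hypothetical, and the choices made here (filling missing values with the mean, smoothing noise by equal-size bin means with a bin size of 3) are only one common option each, not the only way to clean data.

```python
from statistics import mean

# Hypothetical sensor readings; None marks a missing value.
readings = [4.0, 8.0, None, 15.0, 21.0, None, 24.0, 25.0, 28.0]

# 1. Missing values: replace each None with the mean of the known values.
known = [r for r in readings if r is not None]
fill = mean(known)
filled = [fill if r is None else r for r in readings]


# 2. Noisy data: sort, split into equal-size bins, and replace each value
#    by the mean of its bin (smoothing by bin means).
def smooth_by_bin_means(values, bin_size):
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


smoothed = smooth_by_bin_means(filled, bin_size=3)
print(filled)
print(smoothed)
```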
Data Integration
Data integration is defined as combining heterogeneous data from multiple sources
into a common source (a data warehouse). Data integration is carried out using data
migration tools, data synchronization tools, and the ETL (Extract, Transform, Load)
process.
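The ETL idea can be sketched with in-memory lists standing in for real systems. Everything named below (the two source systems, their field names, and the list used as a "warehouse") is a hypothetical stand-in; a real pipeline would read from and write to actual databases.

```python
# Hypothetical extracts from two source systems with different field names.
crm_rows = [
    {"cust_id": 1, "full_name": "Asha Rao"},
    {"cust_id": 2, "full_name": "Ravi Kumar"},
]
billing_rows = [
    {"customer": 1, "amount_inr": "1200"},
    {"customer": 1, "amount_inr": "300"},
    {"customer": 2, "amount_inr": "450"},
]


def extract():
    """Extract: read rows from each source (here, in-memory lists)."""
    return crm_rows, billing_rows


def transform(crm, billing):
    """Transform: unify keys and types, then total the spend per customer."""
    totals = {}
    for row in billing:
        key = row["customer"]
        totals[key] = totals.get(key, 0.0) + float(row["amount_inr"])
    return [
        {
            "customer_id": c["cust_id"],
            "name": c["full_name"],
            "total_spend": totals.get(c["cust_id"], 0.0),
        }
        for c in crm
    ]


def load(rows, warehouse):
    """Load: append the unified rows to the target store."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(*extract()), [])
print(warehouse)
```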
Data Selection
Data selection is defined as the process where data relevant to the analysis is
identified and retrieved from the data collection. Methods such as neural
networks, decision trees, Naive Bayes, clustering, and regression can be used for this.
Data Transformation
Data transformation is defined as the process of transforming data into
the form required by the mining procedure. Data transformation is a two-step
process:
• Data Mapping: Assigning elements from the source to the destination to
capture transformations.
• Code Generation: Creating the actual transformation program.
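The two steps can be sketched in miniature as a declarative mapping plus a function built from it. The field names and converters are hypothetical, and building a closure is only a stand-in for the "code generation" step; real tools may emit SQL or scripts instead.

```python
# Data mapping: destination field -> (source field, converter).
# All field names here are hypothetical.
mapping = {
    "customer_id": ("CustID", int),
    "signup_year": ("SignupDate", lambda s: int(s[:4])),
    "city": ("City", str.title),
}


def build_transformer(mapping):
    """Code generation in miniature: turn the mapping into a callable."""
    def transform(source_row):
        return {
            dest: convert(source_row[src])
            for dest, (src, convert) in mapping.items()
        }
    return transform


transform = build_transformer(mapping)
row = transform({"CustID": "42", "SignupDate": "2021-07-15", "City": "new delhi"})
print(row)
```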

Data Mining
Data mining is defined as the application of techniques to extract potentially
useful patterns. It transforms task-relevant data into patterns and decides the
purpose of the model using classification or characterization.
Pattern Evaluation
Pattern evaluation is defined as identifying the truly interesting patterns
representing knowledge, based on given measures. It finds an interestingness score
for each pattern and uses summarization and visualization to make the data
understandable to the user.
Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used
to make decisions.

Difference between KDD and Data Mining
Definition
• KDD: The process of identifying valid, novel, potentially useful, and
ultimately understandable patterns and relationships in data.
• Data Mining: The process of extracting useful and valuable information or
patterns from large data sets.
Objective
• KDD: To find useful knowledge from data.
• Data Mining: To extract useful information from data.
Techniques Used
• KDD: Data cleaning, data integration, data selection, data transformation, data
mining, pattern evaluation, and knowledge representation and visualization.
• Data Mining: Association rules, classification, clustering, regression, decision
trees, neural networks, and dimensionality reduction.
Output
• KDD: Structured information, such as rules and models, that can be used to make
decisions or predictions.
• Data Mining: Patterns, associations, or insights that can be used to improve
decision-making or understanding.
Focus
• KDD: The discovery of useful knowledge, rather than simply finding patterns in
data.
• Data Mining: The discovery of patterns or relationships in data.
Role of Domain Expertise
• KDD: Domain expertise is important, as it helps in defining the goals of the
process, choosing appropriate data, and interpreting the results.
• Data Mining: Domain expertise is less critical to the mining step itself,
because the algorithms look for patterns without relying on prior knowledge,
although it still matters when the results are interpreted.

What is the difference between DBMS and Data mining?

What is OLAP?

OLAP stands for Online Analytical Processing. It is a computing method that
allows users to extract useful information and query data in order to analyze it
from different angles. For example, OLAP business intelligence queries usually
aid in financial reporting, budgeting, sales forecasting, trend analysis, and
other purposes. It enables the user to analyze information from different
database systems simultaneously. OLAP data is stored in multidimensional
databases.
OLAP and data mining look similar since both operate on data to gain knowledge,
but the major difference is how they operate on it. OLAP tools provide
multidimensional data analysis and a summary of the data, while data mining
searches the data for patterns that were not specified in advance.
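The multidimensional summary that OLAP provides can be sketched with plain tuples. The sales facts, the three dimensions, and the helper names `roll_up` and `slice_` are hypothetical; a real OLAP server would answer the same questions from a multidimensional database rather than a Python list.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
sales = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Phone", "Q1", 80),
    ("South", "Laptop", "Q1", 60),
    ("North", "Laptop", "Q2", 120),
    ("South", "Phone", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, by):
    """Summarize the amount over the chosen dimensions (a roll-up)."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in idx)
        totals[key] += fact[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value (a slice)."""
    i = DIMENSIONS.index(dimension)
    return [f for f in facts if f[i] == value]


by_region = roll_up(sales, by=("region",))
q1_by_product = roll_up(slice_(sales, "quarter", "Q1"), by=("product",))
print(by_region)
print(q1_by_product)
```

Both results are answers to questions the user asked explicitly, which is what makes this style of analysis query-driven.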
Key features of OLAP
➢ It supports complex calculations
➢ Time intelligence
➢ It has a multidimensional view of data
➢ Business-focused calculations
➢ Flexible and self-service reporting

Applications of OLAP
➢ Database Marketing
➢ Marketing and sales analysis

Data Mining vs. OLAP

• Data Mining: A field of computer science that deals with extracting trends and
patterns from huge sets of data.
• OLAP: A technology for immediate access to data with the help of
multidimensional structures.

• Data Mining: It works on detailed transaction-level data.
• OLAP: It works on summarized data.

• Data Mining: It is discovery-driven.
• OLAP: It is query-driven.

• Data Mining: It is used for predicting future data.
• OLAP: It is used for analyzing past data.

• Data Mining: It has a huge number of dimensions.
• OLAP: It has a limited number of dimensions.

• Data Mining: Bottom-up approach.
• OLAP: Top-down approach.

• Data Mining: It is an emerging field.
• OLAP: It is widely used.

Data Mining as a Whole Process

The whole process of data mining consists of three main phases:
• Data Pre-processing – Data cleaning, integration, selection, and transformation
take place.
• Data Extraction – The actual data mining takes place.
• Data Evaluation and Presentation – The results are analyzed and presented.

What are Data Mining Techniques?
Data mining techniques are algorithms and methods used to extract information
and insights from data sets.
1. Regression
Regression is a data mining technique that is used to model the relationship
between a dependent variable and one or more independent variables. In
regression analysis, the goal is to fit a mathematical model to the data that can be
used to make predictions or forecasts about the dependent variable based on the
values of the independent variables.
There are many different types of regression models, including linear regression,
logistic regression, and non-linear regression. In general, regression models are
used to answer questions such as:
• What is the relationship between the dependent and independent variables?
• How well does the model fit the data?
• How accurate are the predictions or forecasts made by the model?
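These three questions can be made concrete with a minimal sketch of simple linear regression fitted by ordinary least squares. The data points (advertising spend against sales) are hypothetical, and a single independent variable is assumed to keep the arithmetic visible.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: advertising spend (x) and sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 6.2, 7.9, 10.1]

slope, intercept = fit_line(xs, ys)       # the relationship between x and y
fit = r_squared(xs, ys, slope, intercept)  # how well the model fits
forecast = slope * 6.0 + intercept         # a prediction for an unseen x
print(slope, intercept, fit, forecast)
```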
2. Classification
Classification is a data mining technique that is used to predict the class or
category of an item or instance based on its characteristics or attributes. There are
many different types of classification models, including decision trees, k-nearest
neighbours, and support vector machines. In general, classification models are
used to answer questions such as:
• What is the relationship between the classes and the attributes?
• How well does the model fit the data?

• How accurate are the predictions made by the model?
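As one illustration, here is a minimal sketch of a k-nearest neighbours classifier, one of the model types named above. The training points (height and weight mapped to a size class) are hypothetical, and k = 3 with Euclidean distance is an assumed setting. Accuracy is estimated by leave-one-out testing on the same small set.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (height_cm, weight_kg) -> size class.
train = [
    ((150.0, 50.0), "small"), ((155.0, 54.0), "small"), ((158.0, 57.0), "small"),
    ((175.0, 75.0), "large"), ((180.0, 82.0), "large"), ((178.0, 79.0), "large"),
]

print(knn_predict(train, (156.0, 55.0)))
print(knn_predict(train, (177.0, 80.0)))

# Leave-one-out accuracy: predict each point from all the others.
hits = sum(
    knn_predict(train[:i] + train[i + 1:], point) == label
    for i, (point, label) in enumerate(train)
)
accuracy = hits / len(train)
print(accuracy)
```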
3. Clustering
Clustering is a data mining technique that is used to group items or instances in a
data set into clusters or groups based on their similarity or proximity. In clustering
analysis, the goal is to identify and explore the natural structure or organization
of the data, and to uncover hidden patterns and relationships.
There are many different types of clustering algorithms, including k-means
clustering, hierarchical clustering, and density-based clustering. In general,
clustering is used to answer questions such as:
• What is the natural structure or organization of the data?
• What are the main clusters or groups in the data?
• How similar or dissimilar are the items in the data set?
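A minimal sketch of k-means clustering shows how the groups emerge. The six 2-D points and the two starting centroids are hypothetical. In practice the starting centroids are usually chosen at random, and the result can depend on that choice.

```python
from math import dist
from statistics import mean


def k_means(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) from given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```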
4. Association rule mining
Association rule mining is a data mining technique that is used to identify and
explore relationships between items or attributes in a data set. In association rule
mining, the goal is to identify patterns and rules that describe the co-occurrence
or occurrence of items or attributes in the data set and to evaluate the strength and
significance of these patterns and rules.
There are many different algorithms and methods for association rule mining,
including the Apriori algorithm and the FP-growth algorithm. In general,
association rule mining is used to answer questions such as:
• What are the main patterns and rules in the data?
• How strong and significant are these patterns and rules?
• What are the implications of these patterns and rules for the data set and
the domain?
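The strength of a rule is usually measured with support, confidence, and lift. The sketch below computes them for rules between pairs of items. The transactions and both thresholds are hypothetical, and the search covers only item pairs, so it borrows the pruning idea of Apriori without being a full implementation of it.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(min_support=0.4, min_confidence=0.7):
    """Yield (antecedent, consequent, support, confidence, lift) for item pairs."""
    items = sorted(set().union(*transactions))
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # Apriori-style pruning: skip infrequent pairs
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            if confidence >= min_confidence:
                yield x, y, pair_support, confidence, confidence / support({y})


found = list(rules())
for x, y, s, c, lift in found:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if they were independent.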
5. Dimensionality Reduction
Dimensionality reduction is a data mining technique that is used to reduce the
number of dimensions or features in a data set while retaining as much
information and structure as possible. There are many different methods for
dimensionality reduction, including principal component analysis (PCA),
independent component analysis (ICA), and singular value decomposition
(SVD). In general, dimensionality reduction is used to answer questions such as:
• What are the main dimensions or features in the data set?

• How much information and structure can be retained in a lower-
dimensional space?
• How can the data be visualized and analyzed in a lower-dimensional space?
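As a rough sketch of PCA, the code below finds only the first principal component, using power iteration on the covariance matrix. The data are hypothetical 2-D points lying close to the line y = x, and the starting vector of all ones is an assumption that works here because it is not orthogonal to the dominant direction.

```python
from math import sqrt
from statistics import mean


def first_principal_component(rows, iterations=100):
    """Return (direction, explained_variance_ratio, column_means)."""
    n, d = len(rows), len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance, means


# Hypothetical 2-D data lying close to the line y = x.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, ratio, means = first_principal_component(rows)

# Project each centered row onto the component: 2 features become 1.
projected = [
    sum((x - m) * v for x, m, v in zip(row, means, direction)) for row in rows
]
print(direction, ratio)
print(projected)
```

The explained variance ratio answers the question of how much structure survives in the lower-dimensional space. A value close to 1 means a single dimension captures nearly all of the variation.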
6. Anomaly Detection: Anomaly detection identifies outliers or anomalies in
data that deviate from normal patterns. It is used for detecting fraud, network
intrusions, and equipment failures. Techniques include statistical methods,
clustering-based approaches, and machine learning algorithms such as isolation
forests and one-class SVM.
7. Sequential Pattern Mining: Sequential pattern mining discovers patterns that
occur sequentially or temporally in data. It is used in applications such as
analyzing customer behavior over time or identifying patterns in sequences of
events. Examples include the PrefixSpan algorithm and the GSP (Generalized
Sequential Pattern) algorithm.
8. Text Mining: Text mining techniques extract useful information from
unstructured text data. This includes tasks such as sentiment analysis, topic
modeling, named entity recognition, and document classification. Techniques
such as natural language processing (NLP) and machine learning algorithms are
commonly used in text mining.
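For anomaly detection (technique 6 above), the simplest statistical method is a z-score test, sketched here. The transaction amounts are hypothetical and the threshold of 2.0 is an assumed cut-off. On small samples a single extreme value inflates the standard deviation, so robust alternatives are often preferred in practice.

```python
from statistics import mean, stdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction amounts with one suspicious entry.
amounts = [52.0, 48.0, 50.0, 51.0, 49.0, 50.0, 47.0, 53.0, 500.0, 50.0]
outliers = z_score_outliers(amounts)
print(outliers)
```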
Benefits of Data Mining
Improved decision-making: Data mining can provide valuable insights that can
help organizations make better decisions by identifying patterns and trends in
large data sets.
Increased efficiency: Data mining can automate repetitive and time-consuming
tasks, such as data cleaning and preparation, which can help organizations save
time and resources.
Enhanced competitiveness: Data mining can help organizations gain a
competitive edge by uncovering new business opportunities and identifying areas
for improvement.
Improved customer service: Data mining can help organizations better
understand their customers and tailor their products and services to meet their
needs.
Fraud detection: Data mining can be used to identify fraudulent activities by
detecting unusual patterns and anomalies in data.
Predictive modeling: Data mining can be used to build models that can predict
future events and trends, which can be used to make proactive decisions.

New product development: Data mining can be used to identify new product
opportunities by analyzing customer purchase patterns and preferences.
Risk management: Data mining can be used to identify potential risks by
analyzing data on customer behavior, market conditions, and other factors.
Challenges and Issues in Data Mining
1] Data Quality
The quality of data used in data mining is one of the most significant challenges.
The accuracy, completeness, and consistency of the data affect the accuracy of
the results obtained. The data may contain errors, omissions, duplications, or
inconsistencies, which may lead to inaccurate results.
To address this challenge, data mining practitioners must apply data cleaning
and data preprocessing techniques to improve the quality of the data.
2] Data Complexity

Data complexity refers to the vast amounts of data generated by various sources,
such as sensors, social media, and the internet of things (IoT). The complexity of
the data may make it challenging to process, analyze, and understand. In addition,
the data may be in different formats, making it challenging to integrate into a
single dataset.
To address this challenge, data mining practitioners use advanced techniques such
as clustering, classification, and association rule mining.
3] Data Privacy and Security

Data privacy and security is another significant challenge in data mining. As more
data is collected, stored, and analyzed, the risk of data breaches and cyber-attacks
increases. The data may contain personal, sensitive, or confidential information
that must be protected. Moreover, data privacy regulations such as GDPR, CCPA,
and HIPAA impose strict rules on how data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data
anonymization and data encryption techniques to protect the privacy and security
of the data. Data anonymization involves removing personally identifiable
information (PII) from the data, while data encryption involves using algorithms
to encode the data to make it unreadable to unauthorized users.
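A minimal sketch of the anonymization side, assuming hypothetical field names and a placeholder salt: direct identifiers are dropped and the customer ID is replaced with a salted hash. Strictly speaking this is pseudonymization, which is weaker than full anonymization, and it does not cover encryption.

```python
import hashlib

PII_FIELDS = {"name", "email"}        # hypothetical list of fields to drop
SALT = b"replace-with-a-secret-salt"  # placeholder; keep the real value secret


def pseudonymize(record):
    """Drop direct identifiers and replace the ID with a salted SHA-256 token."""
    raw_id = str(record["customer_id"]).encode()
    token = hashlib.sha256(SALT + raw_id).hexdigest()[:16]
    cleaned = {
        k: v for k, v in record.items()
        if k not in PII_FIELDS and k != "customer_id"
    }
    cleaned["customer_token"] = token
    return cleaned


record = {"customer_id": 42, "name": "Asha Rao", "email": "asha@example.com", "spend": 1200.0}
safe = pseudonymize(record)
print(safe)
```

Because the same ID always maps to the same token, records for one customer can still be linked for mining without exposing who the customer is.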

4] Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As
the size of the dataset increases, the time and computational resources required to
perform data mining operations also increase.
To address this challenge, data mining practitioners use distributed computing
frameworks such as Hadoop and Spark.
5] Interpretability

Data mining algorithms can produce complex models that are difficult to
interpret. This is because the algorithms use a combination of statistical and
mathematical techniques to identify patterns and relationships in the data.
To address this challenge, data mining practitioners use visualization techniques
to represent the data and the models visually.
Data Mining Applications
Data mining is used by a wide range of organizations and individuals across many
different industries and domains. Some examples of who uses data mining
include:
Businesses and Enterprises – Many businesses and enterprises use data mining
to extract useful insights and information from their data, in order to make better
decisions, improve their operations, and gain a competitive advantage. For
example, a retail company might use data mining to identify customer trends and
preferences or to predict demand for its products.
Government Agencies and Organizations – Government agencies and
organizations use data mining to analyze data related to their operations and the
population they serve, in order to make better decisions and improve their
services. For example, a health department might use data mining to identify
patterns and trends in public health data or to predict the spread of infectious
diseases.
Academic and Research Institutions – Academic and research institutions use
data mining to analyze data from their research projects and experiments, in order
to identify patterns, relationships, and trends in the data. For example, a university
might use data mining to analyze data from a clinical trial or to explore the
relationships between different variables in a social science study.
Individuals – Many individuals use data mining to analyze their own data, in
order to better understand and manage their personal information and activities.
For example, a person might use data mining to analyze their financial data and
identify patterns in their spending or to analyze their social media data and
understand their online behavior and interactions.
