0% found this document useful (0 votes)

28 views7 pages

Lec 1

This document provides an introduction and overview of data mining. It discusses what data mining is, the process of extracting useful information from large datasets, and some key advantages and challenges. Specifically, it explains that data mining involves cleaning, integrating, transforming, mining, evaluating, and presenting data to extract patterns that can be used for applications like fraud detection, market analysis, and science exploration. It also provides examples of how data mining can benefit industries like marketing, finance, manufacturing, and government.

Uploaded by

marshal 04

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

28 views7 pages

Lec 1

Uploaded by

marshal 04

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

DATA MINING

&
WAREHOUSING
LECTURE 1: Introduction to Data Mining

DATA MINING
Introduction to Data Mining
There is huge amount of data available in Information Industry. This data is of no use until
converted into useful information. Analyzing this huge amount of data and extracting useful
information from it is necessary.

The extraction of information is not the only process we need to perform; it also involves other
processes such as Data Cleaning, Data Integration, Data Transformation, Data Mining, Pattern
Evaluation and Data Presentation. Once all these processes are over, we are now position to
use this information in many applications such as Fraud Detection, Market Analysis,
Production Control, Science Exploration etc.

History of Data Mining

In 1960s statisticians used the terms “Data Fishing” or “Data Dredging”. That was to refer to
what they considered the bad practice of analyzing data. The term “Data Mining” appeared
around 1990 in the database community.

What is Data Mining?

Data Mining is defined as extracting information from huge sets of data. In other words, we
can say that data mining is the procedure of mining knowledge from data. This information can
be used for any of the following applications:

• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Advantages of Data Mining

1. Marketing / Retail

Data mining helps marketing companies build models based on historical data to predict who
will respond to the new marketing campaigns such as direct mail, online marketing
campaign…etc. Through the results, marketers will have an appropriate approach to selling
profitable products to targeted customers.

Data mining brings a lot of benefits to retail companies in the same way as marketing. Through
market basket analysis, a store can have an appropriate production arrangement in a way that
customers can buy frequent buying products together with pleasant. In addition, it also helps
the retail companies offer certain discounts for particular products that will attract more
customers.

2. Finance / Banking

Data mining gives financial institutions information about loan information and credit
reporting. By building a model from historical customer’s data, the bank, and financial
institution can determine good and bad loans. In addition, data mining helps banks detect
fraudulent credit card transactions to protect credit card’s owner.

3. Manufacturing

By applying data mining in operational engineering data, manufacturers can detect faulty
equipment and determine optimal control parameters. For example, semiconductor
manufacturers have a challenge that even the conditions of manufacturing environments at
different wafer production plants are similar, the quality of wafer are a lot the same and some
for unknown reasons even has defects. Data mining has been applying to determine the ranges
of control parameters that lead to the production of the golden wafer. Then those optimal
control parameters are used to manufacture wafers with desired quality.

4. Governments

Data mining helps government agency by digging and analyzing records of the financial
transaction to build patterns that can detect money laundering or criminal activities.

Challenges of data mining

1. Privacy Issues

The concerns about the personal privacy have been increasing enormously recently especially
when the internet is booming with social networks, e-commerce, forums, blogs…. Because of
privacy issues, people are afraid of their personal information is collected and used in an
unethical way that potentially causing them a lot of troubles. Businesses collect information
about their customers in many ways for understanding their purchasing behaviors trends.
However businesses don’t last forever, some days they may be acquired by other or gone. At
this time, the personal information they own probably is sold to other or leak.

2. Security issues

Security is a big issue. Businesses own information about their employees and customers
including social security number, birthday, payroll and etc. However how properly this
information is taken care is still in questions. There have been a lot of cases that hackers
accessed and stole big data of customers from the big corporation such as Ford Motor Credit
Company, Sony… with so much personal and financial information available, the credit card
stolen and identity theft become a big problem.

3. Misuse of information/inaccurate information

Information is collected through data mining intended for the ethical purposes can be misused.
This information may be exploited by unethical people or businesses to take benefits of
vulnerable people or discriminate against a group of people.

In addition, data mining technique is not perfectly accurate. Therefore, if inaccurate

information is used for decision-making, it will cause serious consequence.

Deriving business value from data mining

The real value of data mining comes from being able to unearth hidden gems in the form of
patterns and relationships in data, which can be used to make predictions that can have a
significant impact on businesses.

For example, if a company determines that a particular marketing campaign resulted in

extremely high sales of a particular model of a product in certain parts of the country but not
in others, it can refocus the campaign in the future to get the maximum returns.

The benefits of the technology can vary depending on the type of business and its goals. For
example, sales and marketing managers in retail might mine customer information in different
ways to improve conversion rates than those in the airline or financial services industries.

Regardless of the industry, data mining that’s applied to sales patterns and client behavior in
the past can be used to create models that predict future sales and behavior.
There’s also the potential for data mining to help eliminate activities that can harm businesses.
For example, you can use data mining to enhance product safety, or detect fraudulent activity
in insurance and financial services transactions.

The applications of data mining

Data mining can be applied to a variety of applications in virtually every industry.

• Retailers can deploy data mining to better identify which products people are likely to
purchase based on their past buying habits, or which goods are likely to sell at certain times
of the year. This can help merchandisers plan inventories and store layouts.
• Banks and other financial services providers can mine data related to their clients’
accounts, transactions, and channel preferences to better meet their needs. They can also
gather then analyzed data from their websites and social media interactions to help increase
the loyalty of existing customers and attract new ones.
• Manufacturing companies can use data mining to look for patterns in the production
process, so they can precisely identify bottlenecks and flawed methods and find ways to
increase efficiencies. They can also apply knowledge from data mining to the design of
products, and make tweaks based on feedback from customer experiences.
• Educational institutions can benefit from data mining such as analyzing data sets to
predict the future learning behaviors and performance of students, and then using this
knowledge to make improvements in teaching methods or curricula.
• Health care providers can mine and analyze data to determine better ways of delivering
care to patients and cutting costs. With the help of data mining, they can predict how many
patients they will need to care for and what type of services those patients will need. In the
life sciences, mining can be used to glean insights from massive biological data, to help
develop new medicines and other treatments.
• In multiple industries, including health care and retail, you can use data mining to detect
fraud and other abuses - much more quickly than with traditional methods for identifying
such activities.

The key components of data mining

The process of data mining includes several distinct components that address different needs:

• Preprocessing. Before you can apply data mining algorithms, you need to build a target
data set. One common source for data is a data mart or warehouse. You need to perform
preprocessing to be able to analyze the data sets.
• Data cleansing and preparation. The target data set must be cleaned and otherwise
prepared, to remove “noise,” address missing values, filter outlying data points (for
anomaly detection) to remove errors or do further exploration, create segmentation
rules, and perform other functions related to data preparation.
• Association rule learning (also known as market basket analysis). These tools search
for relationships among variables in a data set, such as determining which products in
a store are often purchased together.
• Clustering. This feature of data mining is used to discover groups and structures in
data sets that are in some way similar to each other, without using known structures in
the data.
• Classification. Tools that perform classification generalize known structures to apply
to new data points, such as when an email application tries to classify a message as
legitimate mail or spam.
• Regression. This data mining technique tis used to predict a range of numeric values,
such as sales, housing values, temperatures, or prices when given a particular data set.
• Summarization. This technique provides a compact representation of a data set,
including visualization and report generation.

Data Mining Tasks

The data mining tasks can be classified generally into two types based on what a specific task
tries to achieve. Those two categories are descriptive tasks and predictive tasks. The
descriptive data mining tasks characterize the general properties of data whereas predictive
data mining tasks perform inference on the available data set to predict how a new data set will
behave.

There are a number of data mining tasks such as classification, prediction, time-series
analysis, association, clustering, summarization etc. All these tasks are either predictive data
mining tasks or descriptive data mining tasks. A data mining system can execute one or more
of the above specified tasks as part of data mining.

a) Classification
Classification derives a model to determine the class of an object based on its attributes. A
collection of records will be available, each record with a set of attributes. One of the attributes
will be class attribute and the goal of classification task is assigning a class attribute to new set
of records as accurately as possible.

Classification can be used in direct marketing that is to reduce marketing costs by targeting a
set of customers who are likely to buy a new product. Using the available data, it is possible to
know which customers purchased similar products and who did not purchase in the past. Hence,
{purchase, don’t purchase} decision forms the class attribute in this case. Once the class
attribute is assigned, demographic and lifestyle information of customers who purchased
similar products can be collected and promotion mails can be sent to them directly.

b) Prediction

Prediction task predicts the possible values of missing or future data. Prediction involves
developing a model based on the available data and this model is used in predicting future
values of a new data set of interest. For example, a model can predict the income of an
employee based on education, experience and other demographic factors like place of stay,
gender etc. Also prediction analysis is used in different areas including medical diagnosis, fraud
detection etc.

c) Time - Series Analysis

Time series is a sequence of events where the next event is determined by one or more of the
preceding events. Time series reflects the process being measured and there are certain
components that affect the behavior of a process. Time series analysis includes methods to
analyze time-series data in order to extract useful patterns, trends, rules and statistics. Stock
market prediction is an important application of time- series analysis.

d) Association

Association discovers the association or connection among a set of items. Association identifies
the relationships between objects. Association analysis is used for commodity management,
advertising, catalog design, direct marketing etc. A retailer can identify the products that
normally customers purchase together or even find the customers who respond to the promotion
of same kind of products. If a retailer finds that beer and nappy are bought together mostly, he
can put nappies on sale to promote the sale of beer.

e) Clustering

Clustering is used to identify data objects that are similar to one another. The similarity can be
decided based on a number of factors like purchase behavior, responsiveness to certain actions,
geographical locations and so on. For example, an insurance company can cluster its customers
based on age, residence, income etc. This group information will be helpful to understand the
customers better and hence provide better customized services.
f) Summarization

Summarization is the generalization of data. A set of relevant data is summarized which result
in a smaller set that gives aggregated information of the data. For example, the shopping done
by a customer can be summarized into total products, total spending, offers used, etc. Such
high level summarized information can be useful for sales or customer relationship team for
detailed customer and purchase behavior analysis. Data can be summarized in different
abstraction levels and from different angles.

Handwritten Text Recognition: Software Requirements Specification
No ratings yet
Handwritten Text Recognition: Software Requirements Specification
10 pages
Technincal Report
No ratings yet
Technincal Report
10 pages
Unit 3
No ratings yet
Unit 3
22 pages
What Is Data Mining
No ratings yet
What Is Data Mining
8 pages
Data Mining Notes
No ratings yet
Data Mining Notes
46 pages
IT in Society On Data Mining
No ratings yet
IT in Society On Data Mining
22 pages
Notes DATA MINING MBA III
No ratings yet
Notes DATA MINING MBA III
8 pages
IT in Society - Data Mining
No ratings yet
IT in Society - Data Mining
22 pages
Data Mining Tutorial
No ratings yet
Data Mining Tutorial
30 pages
Data Mining1
No ratings yet
Data Mining1
37 pages
Data Mining Unit 1(Msc Ds 3 Sem)
No ratings yet
Data Mining Unit 1(Msc Ds 3 Sem)
119 pages
Unit 2 (DWDM)
No ratings yet
Unit 2 (DWDM)
40 pages
Data mining M1
No ratings yet
Data mining M1
64 pages
DataMiningFinal
No ratings yet
DataMiningFinal
38 pages
Data Mining
No ratings yet
Data Mining
395 pages
Data-Mining-OVERVIEW (1)
No ratings yet
Data-Mining-OVERVIEW (1)
8 pages
Data Mining: by Doug Alexander
No ratings yet
Data Mining: by Doug Alexander
6 pages
Unit 1
No ratings yet
Unit 1
27 pages
VO_MCA_S4_Data Mining Unit 1
No ratings yet
VO_MCA_S4_Data Mining Unit 1
18 pages
L_1 Data Mining
No ratings yet
L_1 Data Mining
17 pages
DMI UNIT 1_186_N3
No ratings yet
DMI UNIT 1_186_N3
12 pages
data_mining
No ratings yet
data_mining
22 pages
Data Mining Tutorial - Javatpoint
No ratings yet
Data Mining Tutorial - Javatpoint
12 pages
Data Mining Techniques Unit-1
No ratings yet
Data Mining Techniques Unit-1
122 pages
Data Mining
No ratings yet
Data Mining
89 pages
Data-Mining by Harshit Khattar
No ratings yet
Data-Mining by Harshit Khattar
11 pages
data mining
No ratings yet
data mining
23 pages
Introduction to Data Mining_125604
No ratings yet
Introduction to Data Mining_125604
7 pages
Data Mining First draft
No ratings yet
Data Mining First draft
84 pages
Final Document
No ratings yet
Final Document
25 pages
Data Mining L1,2
No ratings yet
Data Mining L1,2
26 pages
Data Rich, Information Poor
No ratings yet
Data Rich, Information Poor
5 pages
Data Mining
No ratings yet
Data Mining
24 pages
Data mining
No ratings yet
Data mining
8 pages
Data Mining-Introduction
No ratings yet
Data Mining-Introduction
8 pages
Data Mining and Data Warehousing Unit 3 Part 1
No ratings yet
Data Mining and Data Warehousing Unit 3 Part 1
13 pages
BIDW Lecture 2
No ratings yet
BIDW Lecture 2
33 pages
Lecture 1 - Introduction
No ratings yet
Lecture 1 - Introduction
46 pages
Data Mining in Search Engine Analytics
No ratings yet
Data Mining in Search Engine Analytics
7 pages
Centre For Management Studies: Online Submission of Assignment-02
No ratings yet
Centre For Management Studies: Online Submission of Assignment-02
10 pages
DWDM 2
No ratings yet
DWDM 2
15 pages
Question 2
No ratings yet
Question 2
1 page
Data Mining Unit 1
No ratings yet
Data Mining Unit 1
71 pages
Module 1 Introduction To Data Mining
No ratings yet
Module 1 Introduction To Data Mining
4 pages
Lps Week 16 Iatb
No ratings yet
Lps Week 16 Iatb
5 pages
KM Notes Unit-3
No ratings yet
KM Notes Unit-3
20 pages
A Techinical Paper: Tupimakadia1@yahoo - Co.in Yamu - 4u1985@yahoo - Co.in
No ratings yet
A Techinical Paper: Tupimakadia1@yahoo - Co.in Yamu - 4u1985@yahoo - Co.in
14 pages
In What Way Data Mining Is Important
No ratings yet
In What Way Data Mining Is Important
7 pages
Data Mining
No ratings yet
Data Mining
18 pages
Unit 1 Data Warehouse and Data Mining
No ratings yet
Unit 1 Data Warehouse and Data Mining
13 pages
Motivation of Data Mining
No ratings yet
Motivation of Data Mining
4 pages
Data Mining Cognate
No ratings yet
Data Mining Cognate
23 pages
Lesson 3 Data Mining activity
No ratings yet
Lesson 3 Data Mining activity
7 pages
data mining
No ratings yet
data mining
17 pages
Data Mining: Concepts and Techniques
No ratings yet
Data Mining: Concepts and Techniques
46 pages
DM-Unit_1
No ratings yet
DM-Unit_1
13 pages
Data Mining
No ratings yet
Data Mining
2 pages
UNIT 1 DSML (1)
No ratings yet
UNIT 1 DSML (1)
11 pages
Data Mining
No ratings yet
Data Mining
6 pages
Dataming T PDF
No ratings yet
Dataming T PDF
48 pages
Data-Driven Business Strategies: Understanding and Harnessing the Power of Big Data
From Everand
Data-Driven Business Strategies: Understanding and Harnessing the Power of Big Data
Steven Vollmer
No ratings yet
DABI - Final Assignment - Arif - Shayekh
No ratings yet
DABI - Final Assignment - Arif - Shayekh
12 pages
1.data Mining Functionalities
No ratings yet
1.data Mining Functionalities
14 pages
1 - Introduction To Data Science
No ratings yet
1 - Introduction To Data Science
6 pages
Number Plate Recogination Using Machine Learning
No ratings yet
Number Plate Recogination Using Machine Learning
11 pages
Advanced Data Analytics Using Python - Unit II
No ratings yet
Advanced Data Analytics Using Python - Unit II
57 pages
Unit I DATA MINING AAGAC
No ratings yet
Unit I DATA MINING AAGAC
27 pages
Final Report
No ratings yet
Final Report
76 pages
Detection of Fake Online Reviews Using Semi-Supervised and Supervised Learning
No ratings yet
Detection of Fake Online Reviews Using Semi-Supervised and Supervised Learning
10 pages
Capstone Review 2
No ratings yet
Capstone Review 2
6 pages
Arc Hydro - Identifying and Managing Sinks
No ratings yet
Arc Hydro - Identifying and Managing Sinks
35 pages
Basic Tweet Preprocessing in Python: 1. Hashtag Extraction Using Regex
No ratings yet
Basic Tweet Preprocessing in Python: 1. Hashtag Extraction Using Regex
2 pages
Hybrid Page Layout Analysis Via Tab-Stop Detection: Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
No ratings yet
Hybrid Page Layout Analysis Via Tab-Stop Detection: Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
5 pages
Mini Project Report
No ratings yet
Mini Project Report
22 pages
Facereminder Mobile Application For Prosopagnosia Patients: Ethodology
No ratings yet
Facereminder Mobile Application For Prosopagnosia Patients: Ethodology
4 pages
Sklearn - An Introduction Guide To Machine Learning - AlgoTrading101 Blog
No ratings yet
Sklearn - An Introduction Guide To Machine Learning - AlgoTrading101 Blog
26 pages
Data Mining Doubt Clearing Session Questions
No ratings yet
Data Mining Doubt Clearing Session Questions
12 pages
Machine Learning - It3190E: Hanoi University of Science and Technology School of Information and Communication Technology
No ratings yet
Machine Learning - It3190E: Hanoi University of Science and Technology School of Information and Communication Technology
14 pages
006 Practical List of DM-2023
No ratings yet
006 Practical List of DM-2023
1 page
Social Media Cyberbullying Detection Using Machine Learning in Bengali Language IJERTV10IS050083
No ratings yet
Social Media Cyberbullying Detection Using Machine Learning in Bengali Language IJERTV10IS050083
4 pages
Notes - Unit01 - Data Science and Big Data Analytics
No ratings yet
Notes - Unit01 - Data Science and Big Data Analytics
7 pages
Data Science Is A Multidisciplinary Field That Uses Scientific Methods
No ratings yet
Data Science Is A Multidisciplinary Field That Uses Scientific Methods
2 pages
Prediction and Sentiment Analysis of Stock Using Machine Learning
No ratings yet
Prediction and Sentiment Analysis of Stock Using Machine Learning
10 pages
Data Binning
No ratings yet
Data Binning
9 pages
Saransh Bairagi Minor-2
No ratings yet
Saransh Bairagi Minor-2
16 pages
Dimensionality Reduction of High Dimensional Data: Summer Internship Project Summary
No ratings yet
Dimensionality Reduction of High Dimensional Data: Summer Internship Project Summary
20 pages
Classification and Analysis of Fake Product Review Using Ai
No ratings yet
Classification and Analysis of Fake Product Review Using Ai
9 pages
Chapter 3 - For Class
No ratings yet
Chapter 3 - For Class
52 pages
Sunglass Synergy: Ai-Powered Style Tailoring
No ratings yet
Sunglass Synergy: Ai-Powered Style Tailoring
5 pages
Pre-Processing Example - 1
No ratings yet
Pre-Processing Example - 1
6 pages

Lec 1

Uploaded by

Lec 1

Uploaded by

DATA MINING

History of Data Mining

What is Data Mining?

Advantages of Data Mining

Challenges of data mining

3. Misuse of information/inaccurate information

In addition, data mining technique is not perfectly accurate. Therefore, if inaccurate

Deriving business value from data mining

For example, if a company determines that a particular marketing campaign resulted in

The applications of data mining

Data mining can be applied to a variety of applications in virtually every industry.

The key components of data mining

Data Mining Tasks

c) Time - Series Analysis

You might also like