Course title: Introduction to Data Mining and Data Warehousing
CHAPTER ONE
WHAT IS DATA MINING?
Data mining is the process of searching and analyzing a large batch of raw data in order to
identify patterns and extract useful information.
Companies use data mining software to learn more about their customers. It can help them to
develop more effective marketing strategies, increase sales, and decrease costs. Data mining
relies on effective data collection, warehousing, and computer processing.
Data mining is the process of extracting knowledge or insights from large amounts of
data using various statistical and computational techniques. The data can be structured,
semi-structured or unstructured, and can be stored in various forms such as databases,
data warehouses, and data lakes.
The primary goal of data mining is to discover hidden patterns and relationships in the
data that can be used to make informed decisions or predictions. This involves
exploring the data using various techniques such as clustering, classification, regression
analysis, association rule mining, and anomaly detection.
Data mining has a wide range of applications across various industries, including
marketing, finance, healthcare, and telecommunications. For example, in marketing,
data mining can be used to identify customer segments and target marketing campaigns,
while in healthcare, it can be used to identify risk factors for diseases and develop
personalized treatment plans.
• Data mining is the process of analyzing a large batch of information to discern
  trends and patterns.
• Data mining can be used by corporations for everything from learning about
  what customers are interested in or want to buy to fraud detection and spam
  filtering.
• Data mining programs break down patterns and connections in data based on
  what information users request or provide.
• Social media companies use data mining techniques to commodify their users
  in order to generate profit.
• This use of data mining has come under criticism lately, as users are often
  unaware of the data mining happening with their personal information,
  especially when it is used to influence preferences.
Data Warehousing vs. Data Mining

1. Definition
   Data warehousing: A data warehouse is a database system designed for analytical
   work rather than transactional work.
   Data mining: Data mining is the process of analyzing data to find patterns.
2. Process
   Data warehousing: Data is stored periodically.
   Data mining: Data is analyzed regularly.
3. Purpose
   Data warehousing: The process of extracting and storing data to allow easier
   reporting.
   Data mining: The use of pattern recognition logic to identify patterns.
4. Managing Authorities
   Data warehousing: Carried out solely by engineers.
   Data mining: Carried out by business users with the help of engineers.
5. Data Handling
   Data warehousing: The process of pooling all relevant data together.
   Data mining: Considered a process of extracting data from large data sets.
6. Functionality
   Data warehousing: Data warehouses are subject-oriented, integrated, time-varying,
   and non-volatile.
   Data mining: AI, statistics, databases, and machine learning systems are all used
   in data mining technologies.
7. Task
   Data warehousing: The process of extracting and storing data in order to make
   reporting more efficient.
   Data mining: Pattern recognition logic is used to find patterns.
8. Uses
   Data warehousing: It extracts data and stores it in an orderly format, making
   reporting easier and faster.
   Data mining: It employs pattern recognition tools to aid in the identification of
   access patterns.
9. Examples
   Data warehousing: A data warehouse adds value when it is connected with
   operational business systems such as CRM (Customer Relationship Management)
   systems.
   Data mining: Data mining aids in the creation of suggestive patterns of key
   parameters, for example customer purchasing behavior, items, and sales. As a
   result, businesses can make the required adjustments to their operations and
   production.
Statistics vs. Data Mining

1. Data used
   Data mining: Numeric or non-numeric data.
   Statistics: Numeric data.
2. Process
   Data mining: Inductive process (generation of new hypotheses from data).
   Statistics: Deductive process (does not involve making any predictions).
3. Data cleaning
   Data mining: Data cleaning is done as part of data mining.
   Statistics: Clean data is used to apply statistical methods.
4. Approach
   Data mining: Explores and gathers data first, then builds models to detect
   patterns and form theories.
   Statistics: Provides theories to test using statistical methods.
5. Data set size
   Data mining: Suitable for large data sets.
   Statistics: Suitable for smaller data sets.
6. Automation
   Data mining: Needs less user interaction to validate the model and is therefore
   easy to automate.
   Statistics: Needs user interaction to validate the model and is therefore
   difficult to automate.
7. Nature
   Data mining: An algorithm that learns from data without using any programmed
   rules.
   Statistics: ... relationship in data in the form of ...
8. Skills required
   Data mining: Classification, clustering, neural networks, association, estimation,
   sequence-based analysis.
   Statistics: Descriptive statistics, inferential statistics.
9. Applications
   Data mining: Financial data analysis, retail industry, telecommunication industry.
   Statistics: Demography, actuarial science, biostatistics, quality control.
Advantages and Disadvantages of Data Mining
Advantages
• It helps gather reliable information
• Helps businesses make operational adjustments
• Helps to make informed decisions
• It helps detect risks and fraud
• Helps to understand behaviours, trends and discover hidden patterns
• Helps to analyse very large quantities of data quickly
Disadvantages
• Data mining tools are complex and require training to use
• Data mining techniques are not infallible
• Rising privacy concerns
• Data mining requires large databases
• Expensive
Pros of Data Mining
• It drives profitability and efficiency
• It can be applied to any type of data and business problem
• It can reveal hidden information and trends
Cons of Data Mining
• Complexity
• Results and benefits are not guaranteed
• It can be expensive
Applications of Data Mining
Nowadays, large quantities of data are being accumulated. The amount of data collected
is said to almost double every year. Data mining techniques are used to extract
knowledge from this massive volume of data. Data mining is used almost everywhere
that a large amount of data is stored and processed. For example, banks typically
use data mining to find prospective customers who could be interested in credit
cards, personal loans, or insurance. Since banks have the transaction details and
detailed profiles of their customers, they analyze this data to find patterns
that help them predict which customers could be interested in personal loans, etc.
Basically, the motive behind mining data, whether commercial or scientific, is the same –
the need to find useful information in data to enable better decision-making or a better
understanding of the world around us.
“Extraction of interesting information or patterns from data in large databases is known
as data mining.”
According to William J. Frawley, “Data mining or KDD (Knowledge Discovery in
Databases), as it is also known, is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data.”
Technically, data mining is the computational process of analyzing data from different
perspectives, dimensions, and angles and categorizing or summarizing it into meaningful
information. Data mining can be applied to any type of data, e.g. data warehouses,
transactional databases, relational databases, multimedia databases, spatial databases,
time-series databases, and the World Wide Web.
Data mining provides competitive advantages in the knowledge economy. It does this by
providing the maximum knowledge needed to rapidly make valuable business decisions
despite the enormous amounts of available data.
There are many measurable benefits that have been achieved in different application
areas from data mining. So, let’s discuss different applications of Data Mining:
Scientific Analysis: Scientific simulations generate large volumes of data every day. This
includes data collected from nuclear laboratories, data about human psychology, etc. Data
mining techniques are capable of analyzing these data. We can now capture and store
new data faster than we can analyze the data already accumulated.
Examples of scientific analysis:
• Sequence analysis in bioinformatics
• Classification of astronomical objects
• Medical decision support
Intrusion Detection: A network intrusion refers to any unauthorized activity on a
digital network. Network intrusions often involve stealing valuable network resources.
Data mining techniques play a vital role in detecting intrusions, network attacks,
and anomalies. These techniques help in selecting and refining useful and relevant
information from large data sets, and in classifying the data relevant to an
intrusion detection system. An intrusion detection system raises alarms when network
traffic indicates an intrusion into the system. For example:
• Detecting security violations
• Misuse detection
• Anomaly detection
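As a rough illustration of the anomaly detection idea mentioned above, the following sketch flags values that lie far from the mean of a series. It is a minimal z-score approach chosen for illustration, not the method of any particular intrusion detection system; the function name `zscore_anomalies`, the threshold of 2.0, and the sample traffic counts are all assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical requests-per-minute counts for one host (illustrative data only)
requests_per_minute = [52, 48, 50, 47, 53, 49, 51, 400, 50, 48]
suspicious = zscore_anomalies(requests_per_minute)
print(suspicious)  # index of the minute with the unusual spike
```

A real system would likely use more robust statistics and many features per connection; the point here is only that "anomalous" is defined relative to the bulk of the observed data.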
Business Transactions: Every transaction in the business industry is typically recorded
permanently. Such transactions are usually time-related and can be inter-business deals
or intra-business operations. Using this data effectively and in a timely manner for
competitive decision-making is one of the most important problems to solve for
businesses that struggle to survive in a highly competitive world. Data mining helps to
analyze these business transactions, identify marketing approaches, and support
decision-making. Examples:
• Direct mail targeting
• Stock trading
• Customer segmentation
• Churn prediction (one of the most popular Big Data use cases in business)
Market Basket Analysis: Market basket analysis is a technique for carefully studying
the purchases made by customers in a supermarket. It identifies patterns of items
that customers frequently purchase together. This analysis can help companies promote
deals, offers, and sales, and data mining techniques help to carry it out. Examples:
• Data mining concepts are used in sales and marketing to provide better customer
  service, improve cross-selling opportunities, and increase direct mail response rates.
• Customer retention, in the form of pattern identification and prediction of likely
  defections, is possible with data mining.
• Risk assessment and fraud detection also use data mining to identify
  inappropriate or unusual behavior.
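To make the idea of frequently purchased item patterns concrete, the sketch below counts how often pairs of items appear together and derives two standard measures: support (the fraction of baskets containing both items) and confidence (the fraction of baskets containing the first item that also contain the second). The function name `pair_rules`, the 0.5 support cutoff, and the sample baskets are assumptions made for illustration; production systems typically use algorithms such as Apriori or FP-Growth that handle larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # One rule per direction: a -> b and b -> a
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical supermarket baskets (illustrative data only)
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0, which is the kind of pattern a retailer might use when planning offers.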
Education: For analyzing the education sector, data mining uses the Educational Data
Mining (EDM) method. This method generates patterns that can be used by both learners
and educators. Using EDM, we can perform educational tasks such as:
• Predicting student admission to higher education
• Student profiling
• Predicting student performance
• Evaluating teachers' teaching performance
• Curriculum development
• Predicting student placement opportunities
Research: Data mining techniques can perform prediction, classification, clustering,
association, and grouping of data in research settings. The rules generated by data
mining are then used to obtain results. In most technical research in data mining,
we create a training model and a testing model. The train/test approach is a strategy
for measuring how accurately the proposed model performs. It is called train/test
because we split the data set into two sets: a training data set and a testing data
set. The training data set is used to build the model, whereas the testing data set
is used to evaluate it. Examples:
• Classification of uncertain data
• Information-based clustering
• Decision support systems
• Web mining
• Domain-driven data mining
• IoT (Internet of Things) and cybersecurity
• Smart farming with IoT (Internet of Things)
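The train/test strategy described in the Research paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `train_test_split` and `accuracy`, the 25% test fraction, and the toy "hours studied" data are all hypothetical, and the "model" is just a threshold learned from the training set.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into (training set, testing set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set)


# Hypothetical labelled data: (hours studied, passed?)
data = [(h, h >= 5) for h in range(1, 21)]
train, test = train_test_split(data)

# Toy "training": use the smallest number of hours among passing training examples
threshold = min(h for h, passed in train if passed)


def model(hours):
    return hours >= threshold


score = accuracy(model, test)
print(len(train), len(test), score)
```

The key design point is that the testing data is never used while building the model, so the score estimates how the model behaves on data it has not seen.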
Healthcare and Insurance: A pharmaceutical company can examine its recent sales force
activity and outcomes to improve its targeting of high-value physicians and to
determine which marketing activities will have the greatest effect in the coming
months. In the insurance sector, data mining can help to predict which customers
will buy new policies, identify behavior patterns of risky customers, and identify
fraudulent behavior. Examples:
• Claims analysis, i.e. which medical procedures are claimed together.
• Identifying successful medical therapies for different illnesses.
• Characterizing patient behavior to predict office visits.
Transportation: A diversified transportation company with a large direct sales force can
apply data mining to identify the best prospects for its services. A large consumer
merchandise organization can apply data mining to improve its sales process to
retailers.
• Determine the distribution schedules among outlets.
• Analyze loading patterns.
Financial/Banking Sector: A credit card company can leverage its vast warehouse of
customer transaction data to identify customers most likely to be interested in a new
credit product.
• Credit card fraud detection.
• Identify ‘loyal’ customers.
• Extraction of information related to customers.
• Determine credit card spending by customer groups.
Data Mining vs Statistics
1. Approach
   Data mining: Explorative. Dig out the data first, discover novel patterns, and
   then make theories.
   Statistics: Confirmative. Provide a theory first and then test it using various
   statistical tools.
2. Data cleaning
   Data mining: Involves data cleaning.
   Statistics: Statistical methods are applied to clean data.
3. Data set size
   Data mining: Usually involves working with large datasets.
   Statistics: Usually involves working with small datasets.
4. Heuristics
   Data mining: Makes generous use of heuristic thinking.
   Statistics: There is no scope for heuristic thinking.
5. Process
   Data mining: Inductive process.
   Statistics: Deductive (does not involve making any predictions).
6. Data types
   Data mining: Numeric and non-numeric data.
   Statistics: Numeric data.
7. Data collection
   Data mining: Less concerned about data collection.
   Statistics: More concerned about data collection.
8. Popular methods
   Data mining: Estimation, classification, neural networks, clustering, association,
   and visualization.
   Statistics: Inferential and descriptive statistics.
Challenges of Data Mining
Data mining, the process of extracting knowledge from data, has become increasingly
important as the amount of data generated by individuals, organizations, and machines
has grown exponentially. However, data mining is not without its challenges.
1] Data Quality
The quality of data used in data mining is one of the most significant challenges. The
accuracy, completeness, and consistency of the data affect the accuracy of the results
obtained. The data may contain errors, omissions, duplications, or inconsistencies, which
may lead to inaccurate results. Moreover, the data may be incomplete, meaning that some
attributes or values are missing, making it challenging to obtain a complete understanding
of the data.
Data quality issues can arise due to a variety of reasons, including data entry errors, data
storage issues, data integration problems, and data transmission errors. To address these
challenges, data mining practitioners must apply data cleaning and data preprocessing
techniques to improve the quality of the data. Data cleaning involves detecting and
correcting errors, while data preprocessing involves transforming the data to make it
suitable for data mining.
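The cleaning steps described above (removing duplicates, handling missing values, and normalizing inconsistent entries) can be sketched as follows. This is only an illustration under stated assumptions: the function name `clean_records`, the field names `id`, `name`, and `age`, and the choice to fill missing ages with the median are all hypothetical, and real projects choose cleaning rules based on the data at hand.

```python
from statistics import median


def clean_records(records):
    """Drop duplicates and rows without an id, fill missing ages, tidy names."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.get("id"), rec.get("name"), rec.get("age"))
        if rec.get("id") is None or key in seen:
            continue  # skip rows with no identifier and exact duplicates
        seen.add(key)
        unique.append(dict(rec))  # copy so the input is not modified

    known_ages = [r["age"] for r in unique if r.get("age") is not None]
    fill_age = median(known_ages) if known_ages else None
    for rec in unique:
        if rec.get("age") is None:
            rec["age"] = fill_age  # impute missing value with the median
        rec["name"] = rec["name"].strip().title()  # normalize spacing and case
    return unique


# Hypothetical raw customer records (illustrative data only)
raw = [
    {"id": 1, "name": " alice ", "age": 30},
    {"id": 1, "name": " alice ", "age": 30},   # duplicate row
    {"id": 2, "name": "BOB", "age": None},     # missing age
    {"id": None, "name": "carol", "age": 41},  # missing identifier
    {"id": 3, "name": "dave", "age": 50},
]
cleaned = clean_records(raw)
print(cleaned)
```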
2] Data Complexity
Data complexity refers to the vast amounts of data generated by various sources, such as
sensors, social media, and the internet of things (IoT). The complexity of the data may
make it challenging to process, analyze, and understand. In addition, the data may be in
different formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as
clustering, classification, and association rule mining. These techniques help to identify
patterns and relationships in the data, which can then be used to gain insights and make
predictions.
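As an example of one of the techniques named above, the following sketch applies a very small k-means clustering to one-dimensional data. It is a simplified illustration, not a production implementation: the function name `kmeans_1d`, the fixed starting centers, the iteration count, and the sample spending figures are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers using k-means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend per customer (illustrative data only)
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers, clusters)
```

With this toy data the algorithm separates low spenders from high spenders, which is the kind of grouping that customer segmentation relies on.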
3] Data Privacy and Security
Data privacy and security is another significant challenge in data mining. As more data is
collected, stored, and analyzed, the risk of data breaches and cyber-attacks increases. The
data may contain personal, sensitive, or confidential information that must be protected.
Moreover, data privacy regulations such as GDPR, CCPA, and HIPAA impose strict
rules on how data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data anonymization and
data encryption techniques to protect the privacy and security of the data. Data
anonymization involves removing personally identifiable information (PII) from the data,
while data encryption involves using algorithms to encode the data to make it unreadable
to unauthorized users.
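A minimal sketch of removing PII from a record before analysis is shown below. The field names in `PII_FIELDS`, the `customer_id` key, and the function name `anonymize` are hypothetical. Note the assumption made here: replacing an identifier with a salted hash is strictly pseudonymization rather than full anonymization, since records can still be linked to each other, and it is shown only to illustrate the idea. Encryption is not shown; in practice it would rely on a vetted cryptography library rather than hand-written code.

```python
import hashlib

# Hypothetical names of fields that directly identify a person
PII_FIELDS = {"name", "email", "phone"}


def anonymize(record, salt):
    """Drop direct identifiers and replace the customer id with a salted hash."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    if "customer_id" in cleaned:
        raw = (salt + str(cleaned["customer_id"])).encode("utf-8")
        # Keep a shortened hex digest as a stable pseudonym
        cleaned["customer_id"] = hashlib.sha256(raw).hexdigest()[:16]
    return cleaned


# Hypothetical transaction record (illustrative data only)
record = {
    "customer_id": 42,
    "name": "Alice Example",
    "email": "alice@example.com",
    "amount": 9.5,
}
safe = anonymize(record, salt="demo-salt")
print(safe)
```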
4] Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As the size
of the dataset increases, the time and computational resources required to perform data
mining operations also increase. Moreover, the algorithms must be able to handle
streaming data, which is generated continuously and must be processed in real-time.
To address this challenge, data mining practitioners use distributed computing
frameworks such as Hadoop and Spark. These frameworks distribute the data and
processing across multiple nodes, making it possible to process large datasets quickly and
efficiently.
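The idea of splitting data across workers and combining partial results can be illustrated on a single machine. The sketch below is not Hadoop or Spark code; it only mimics the split, process, and combine pattern using Python's standard thread pool. The function names, chunk size, worker count, and sample events are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_chunk(chunk):
    """'Map' step: count the events inside one chunk."""
    return Counter(chunk)


def distributed_count(items, chunk_size=3, workers=2):
    """Count events by processing chunks in parallel and merging the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunked(items, chunk_size)):
            total.update(partial)  # 'reduce' step: merge partial counts
    return total


# Hypothetical event log (illustrative data only)
events = ["login", "purchase", "login", "view", "view", "login", "purchase"]
totals = distributed_count(events)
print(totals)
```

In a real distributed framework the chunks would live on different machines, but the result is the same as counting everything in one place, which is what makes the work safe to divide.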
5] Interpretability
Data mining algorithms can produce complex models that are difficult to interpret. This is
because the algorithms use a combination of statistical and mathematical techniques to
identify patterns and relationships in the data. Moreover, the models may not be intuitive,
making it challenging to understand how the model arrived at a particular conclusion.
To address this challenge, data mining practitioners use visualization techniques to
represent the data and the models visually. Visualization makes it easier to understand the
patterns and relationships in the data and to identify the most important variables.
6] Ethics
Data mining raises ethical concerns related to the collection, use, and dissemination of
data. The data may be used to discriminate against certain groups, violate privacy rights,
or perpetuate existing biases. Moreover, data mining algorithms may not be transparent,
making it challenging to detect biases or discrimination.

More Related Content

PPTX
Data mining
PPTX
Introduction to dm and dw
PPTX
Introduction to Data Mining and Data Warehousing
PPT
dwdm unit 1.ppt
PPTX
Data Mining & Applications
PDF
Data mining excel.pdf
PPTX
Data warehousing and Data mining
PPT
Data Warehouse and Data Mining
Data mining
Introduction to dm and dw
Introduction to Data Mining and Data Warehousing
dwdm unit 1.ppt
Data Mining & Applications
Data mining excel.pdf
Data warehousing and Data mining
Data Warehouse and Data Mining

Similar to Chapter 1 Handoutfffffffffffffffffffffffffffffffffffff.pdf (20)

PPTX
Data mining and Data Warehousing in Databases.pptx
PPT
Data Mining
DOCX
Seminar Report Vaibhav
PPT
Data Mining Xuequn Shang NorthWestern Polytechnical University
PPTX
DATA MINING seminar prjzkpwnshzghBwkwodoxjz
PPTX
Business Intelligence and Analytics Unit-2 part-A .pptx
PPT
Unit 1_data mining and warehousing subject
PDF
Lect 1 introduction
PPTX
Topic(1)-Intro data mining master ALEX.pptx
PDF
Overview of Data Mining
PPT
Data mining & data warehousing
PPT
Data Mining and Data Warehousing
PPTX
Lect 1 introduction
PPT
DMML1_overview.ppt
DOCX
notes_dmdw_chap1.docx
PPTX
DWDM 3rd EDITION TEXT BOOK SLIDES24.pptx
PPTX
dataminingintroductionpptpptpptptro.pptx
PPTX
Data mining
PPT
3RD B.TECH-DATAMINING-INTRODUCTION-UNIT1 .ppt
Data mining and Data Warehousing in Databases.pptx
Data Mining
Seminar Report Vaibhav
Data Mining Xuequn Shang NorthWestern Polytechnical University
DATA MINING seminar prjzkpwnshzghBwkwodoxjz
Business Intelligence and Analytics Unit-2 part-A .pptx
Unit 1_data mining and warehousing subject
Lect 1 introduction
Topic(1)-Intro data mining master ALEX.pptx
Overview of Data Mining
Data mining & data warehousing
Data Mining and Data Warehousing
Lect 1 introduction
DMML1_overview.ppt
notes_dmdw_chap1.docx
DWDM 3rd EDITION TEXT BOOK SLIDES24.pptx
dataminingintroductionpptpptpptptro.pptx
Data mining
3RD B.TECH-DATAMINING-INTRODUCTION-UNIT1 .ppt
Ad

Recently uploaded (20)

PPTX
ENCOR_Chapter_10 - OSPFv3 Attribution.pptx
PDF
APNIC Update, presented at PHNOG 2025 by Shane Hermoso
PPTX
SEO Trends in 2025 | B3AITS - Bow & 3 Arrows IT Solutions
PDF
Triggering QUIC, presented by Geoff Huston at IETF 123
PPTX
Parallel & Concurrent ...
PDF
Decoding a Decade: 10 Years of Applied CTI Discipline
PDF
Generative AI Foundations: AI Skills for the Future of Work
PPTX
Crypto Recovery California Services.pptx
PDF
“Google Algorithm Updates in 2025 Guide”
PDF
KIPER4D situs Exclusive Game dari server Star Gaming Asia
PDF
5g is Reshaping the Competitive Landscape
PDF
LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1
PPTX
Unlocking Hope : How Crypto Recovery Services Can Reclaim Your Lost Funds
PDF
Glotv Iptv Overview Channels, Pricing, and Setup Guide (1).pdf
PDF
WebRTC in SignalWire - troubleshooting media negotiation
PDF
Testing WebRTC applications at scale.pdf
PDF
LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1
PPT
256065457-Anaesthesia-in-Liver-Disease-Patient.ppt
PPTX
LESSON-2-Roles-of-ICT-in-Teaching-for-learning_123922 (1).pptx
PPTX
QR Codes Qr codecodecodecodecocodedecodecode
ENCOR_Chapter_10 - OSPFv3 Attribution.pptx
APNIC Update, presented at PHNOG 2025 by Shane Hermoso
SEO Trends in 2025 | B3AITS - Bow & 3 Arrows IT Solutions
Triggering QUIC, presented by Geoff Huston at IETF 123
Parallel & Concurrent ...
Decoding a Decade: 10 Years of Applied CTI Discipline
Generative AI Foundations: AI Skills for the Future of Work
Crypto Recovery California Services.pptx
“Google Algorithm Updates in 2025 Guide”
KIPER4D situs Exclusive Game dari server Star Gaming Asia
5g is Reshaping the Competitive Landscape
LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1
Unlocking Hope : How Crypto Recovery Services Can Reclaim Your Lost Funds
Glotv Iptv Overview Channels, Pricing, and Setup Guide (1).pdf
WebRTC in SignalWire - troubleshooting media negotiation
Testing WebRTC applications at scale.pdf
LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1
256065457-Anaesthesia-in-Liver-Disease-Patient.ppt
LESSON-2-Roles-of-ICT-in-Teaching-for-learning_123922 (1).pptx
QR Codes Qr codecodecodecodecocodedecodecode
Ad

Chapter 1 Handoutfffffffffffffffffffffffffffffffffffff.pdf

  • 1. Course title: Introduction to Data Mining and Data Warehousing CHAPTER ONE WHAT IS DATA MINING? Data mining is the process of searching and analyzing a large batch of raw data in order to identify patterns and extract useful information. Companies use data mining software to learn more about their customers. It can help them to develop more effective marketing strategies, increase sales, and decrease costs. Data mining relies on effective data collection, warehousing, and computer processing. Data mining is the process of extracting knowledge or insights from large amounts of data using various statistical and computational techniques. The data can be structured, semi-structured or unstructured, and can be stored in various forms such as databases, data warehouses, and data lakes. The primary goal of data mining is to discover hidden patterns and relationships in the data that can be used to make informed decisions or predictions. This involves exploring the data using various techniques such as clustering, classification, regression analysis, association rule mining, and anomaly detection. Data mining has a wide range of applications across various industries, including marketing, finance, healthcare, and telecommunications. For example, in marketing, data mining can be used to identify customer segments and target marketing campaigns, while in healthcare, it can be used to identify risk factors for diseases and develop personalized treatment plans.  Data mining is the process of analyzing a large batch of information to discern trends and patterns.  Data mining can be used by corporations for everything from learning about what customers are interested in or want to buy to fraud detection and spam filtering.
  • 2. Course title: Introduction to Data Mining and Data Warehousing  Data mining programs break down patterns and connections in data based on what information users request or provide.  Social media companies use data mining techniques to commodify their users in order to generate profit.  This use of data mining has come under criticism lately as users are often unaware of the data mining happening with their personal information, especially when it is used to influence preferences. Data warehouse and data mining  S. No. Basis of Comparison Data Warehousing Data Mining 1. Definition A data warehouse is a database system that is designed for analytical analysis instead of transactional work. Data mining is the process of analyzing data patterns. 2. Process Data is stored periodically. Data is analyzed regularly. 3. Purpose Data warehousing is the process of extracting and storing data to allow easier reporting. Data mining is the use of pattern recognition logic to identify patterns.
  • 3. Course title: Introduction to Data Mining and Data Warehousing  S. No. Basis of Comparison Data Warehousing Data Mining 4. Managing Authorities Data warehousing is solely carried out by engineers. Data mining is carried out by business users with the help of engineers. 5. Data Handling Data warehousing is the process of pooling all relevant data together. Data mining is considered as a process of extracting data from large data sets. 6. Functionality Subject-oriented, integrated, time-varying and non-volatile constitute data warehouses. AI, statistics, databases, and machine learning systems are all used in data mining technologies. 7. Task Data warehousing is the process of extracting and storing data in order to make reporting more efficient. Pattern recognition logic is used in data mining to find patterns. 8. Uses It extracts data and stores it in an orderly format, making This procedure employs pattern recognition tools to aid in the identification of access
  • 4. Course title: Introduction to Data Mining and Data Warehousing  S. No. Basis of Comparison Data Warehousing Data Mining reporting easier and faster. patterns. 9. Examples When a data warehouse is connected with operational business systems like CRM (Customer Relationship Management) systems, it adds value. Data mining aids in the creation of suggestive patterns of key parameters. Customer purchasing behavior, items, and sales are examples. As a result, businesses will be able to make the required adjustments to their operations and production.
  • 5. Course title: Introduction to Data Mining and Data Warehousing Statistics Data Mining Data utilized is Numeric or Non numeric. Data utilized is Numeric. Inductive Process (Generation of modern hypothesis from data) Deductive Process (Does not include making any forecasts) Data Cleaning is drained data mining. Clean data is utilized to apply statistical strategy. Investigate and assemble data to begin with, builds show to distinguish patterns and make theories. It gives speculations to test utilizing statistical. Reasonable for expansive data sets Suitable for littler data sets Needs less client interaction to approve model thus, simple to automate. Needs client interaction to approve show consequently, troublesome to automate. It’s an calculation which learns from data without utilizing any programming rule. ationship in data within the shape of Skills required for data mining are Classification, Clustering, Neural Skills required for Statistics are Descriptive Statistical, Inferential
  • 6. Course title: Introduction to Data Mining and Data Warehousing Statistics Data Mining network, Association, Estimation, Sequence based analysis Statistical Applications are Financial Data Analysis, Retail Industry, Telecommunication Industry, Applications are Demography, Actuarial ScienceBiostatistics, Quality Control
  • 7. Course title: Introduction to Data Mining and Data Warehousing Advantages and Disadvantages of Data Mining Advantages Disadvantages It helps gather reliable information Data Mining tools are complex and require training to use Helps businesses make operational adjustments Data mining techniques are not infallible Helps to make informed decisions Rising privacy concerns It helps detect risks and fraud Data mining requires large databases Helps to understand behaviours, trends and discover hidden patterns Expensive Helps to analyse very large quantities of data quickly Pros of Data Mining  It drives profitability and efficiency  It can be applied to any type of data and business problem  It can reveal hidden information and trends Cons of Data Mining  Complexity  Results and benefits are not guaranteed  It can be expensive 
  • 8. Course title: Introduction to Data Mining and Data Warehousing Applications of Data Mining Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every year. An extracting data or seeking knowledge from this massive data, data mining techniques are used. Data mining is used in almost all places where a large amount of data is stored and processed. For example, banks typically use ‘data mining’ to find out their prospective customers who could be interested in credit cards, personal loans, or insurance as well. Since banks have the transaction details and detailed profiles of their customers, they analyze all this data and try to find out patterns that help them predict that certain customers could be interested in personal loans, etc. Basically, the motive behind mining data, whether commercial or scientific, is the same – the need to find useful information in data to enable better decision-making or a better understanding of the world around us. “Extraction of interesting information or patterns from data in large databases is known as data mining.” According to William J.Frawley “Data mining or KDD(Knowledge Discovery in Databases) as it is also known, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.” Technically, data mining is the computational process of analyzing data from different perspectives, dimensions, angles and categorizing/summarizing it into meaningful information. Data Mining can be applied to any type of data e.g. Data Warehouses, Transactional Databases, Relational Databases, Multimedia Databases, Spatial Databases, Time-series Databases, World Wide Web. Data mining provides competitive advantages in the knowledge economy. It does this by providing the maximum knowledge needed to rapidly make valuable business decisions despite the enormous amounts of available data. 
There are many measurable benefits that have been achieved in different application areas from data mining. So, let’s discuss different applications of Data Mining:
  • 9. Scientific Analysis: Scientific simulations generate large volumes of data every day, including data collected from nuclear laboratories, data about human psychology, and so on. Data mining techniques are capable of analyzing these data. We can now capture and store new data faster than we can analyze the data already accumulated.
Examples of scientific analysis:
 Sequence analysis in bioinformatics
 Classification of astronomical objects
 Medical decision support
Intrusion Detection: A network intrusion is any unauthorized activity on a digital network, and it often involves stealing valuable network resources. Data mining techniques play a vital role in detecting intrusions, network attacks, and anomalies. They help select and refine useful, relevant information from large data sets and classify the data relevant to an intrusion detection system, which raises alarms when network traffic indicates a foreign invasion of the system.
For example:
 Detect security violations
 Misuse Detection
  • 10.  Anomaly Detection
Business Transactions: Every transaction in the business world is recorded in perpetuity. Such transactions are usually time-related and can be inter-business deals or intra-business operations. Using this data effectively and in time for competitive decision-making is the most important problem for businesses struggling to survive in a highly competitive world. Data mining helps analyze these business transactions to identify marketing approaches and support decision-making.
Example:
 Direct mail targeting
 Stock trading
 Customer segmentation
 Churn prediction (churn prediction is one of the most popular Big Data use cases in business)
Market Basket Analysis: Market basket analysis is the careful study of the purchases made by customers in a supermarket. It identifies the patterns of items that customers frequently purchase together. Companies can use this analysis to promote deals, offers, and sales, and data mining techniques help carry out the analysis.
  • 11. Example:
 Data mining concepts are used in sales and marketing to provide better customer service, improve cross-selling opportunities, and increase direct mail response rates.
 Customer retention: data mining makes it possible to identify patterns and predict likely defections.
 Risk assessment and fraud detection also use data mining to identify inappropriate or unusual behavior.
Education: To analyze the education sector, data mining uses the Educational Data Mining (EDM) method. This method generates patterns that can be used by both learners and educators. Using EDM, we can perform educational tasks such as:
 Predicting student admission in higher education
 Student profiling
 Predicting student performance
 Evaluating teachers' teaching performance
 Curriculum development
 Predicting student placement opportunities
Research: In research, data mining techniques can perform prediction, classification, clustering, association, and grouping of data, and the rules they generate are used to derive results. In most technical research in data mining, we create a training model and a testing model. The train/test approach is a strategy for measuring how accurately the proposed model performs. It is called train/test because we split the data set into two sets: a training data set, used to build the model, and a testing data set, used to evaluate it.
Example:
 Classification of uncertain data
 Information-based clustering
 Decision support system
 Web mining
 Domain-driven data mining
  • 12.  IoT (Internet of Things) and cybersecurity
 Smart farming with IoT (Internet of Things)
Healthcare and Insurance: A pharmaceutical company can examine its recent sales force activity and the resulting outcomes to improve its targeting of high-value physicians and to work out which marketing activities will have the greatest effect in the coming months. In the insurance sector, data mining can help predict which customers will buy new policies, identify the behavior patterns of risky customers, and identify fraudulent behavior.
 Claims analysis, i.e. which medical procedures are claimed together.
 Identify successful medical therapies for different illnesses.
 Characterize patient behavior to predict office visits.
Transportation: A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. A large consumer merchandise organization can apply data mining to improve its sales process to retailers.
 Determine the distribution schedules among outlets.
 Analyze loading patterns.
Financial/Banking Sector: A credit card company can leverage its vast warehouse of customer transaction data to identify the customers most likely to be interested in a new credit product.
 Credit card fraud detection.
 Identify ‘loyal’ customers.
 Extraction of information related to customers.
 Determine credit card spending by customer groups.
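The market basket idea described earlier (finding items that are frequently purchased together) is also what the claims analysis bullet above relies on. The following is a minimal Python sketch of how support and confidence for item pairs might be counted. The basket data, the function name `pair_rules`, and both thresholds are assumptions made for illustration; production systems usually rely on a dedicated algorithm such as Apriori rather than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support(A, B)     = fraction of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))            # ignore repeated items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical supermarket baskets, invented for this example only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data, the rule "butter -> bread" has the highest confidence, because every basket that contains butter also contains bread. The same counting applies to medical claims if each claim is treated as a basket of procedures.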
  • 13. Data Mining vs Statistics
Data Mining | Statistics
Explorative: dig out the data first, discover novel patterns, and then make theories. | Confirmative: provide theory first and then test it using various statistical tools.
Involves data cleaning | Statistical methods applied on clean data
Usually involves working with large datasets. | Usually involves working with small datasets.
Makes generous use of heuristic thinking | There is no scope for heuristic thinking.
Inductive process | Deductive (does not involve making any predictions)
Numeric and non-numeric data | Numeric data
Less concerned about data collection. | More concerned about data collection.
Some of the popular data mining methods include Estimation, Classification, Neural Networks, Clustering, Association, and Visualization. | Some of the popular statistical methods include Inferential and Descriptive Statistics.
  • 14. Challenges of Data Mining
Data mining, the process of extracting knowledge from data, has become increasingly important as the amount of data generated by individuals, organizations, and machines has grown exponentially. However, data mining is not without its challenges.
1] Data Quality
The quality of data used in data mining is one of the most significant challenges. The accuracy, completeness, and consistency of the data affect the accuracy of the results obtained. The data may contain errors, omissions, duplications, or inconsistencies, which may lead to inaccurate results. Moreover, the data may be incomplete, meaning that some attributes or values are missing, making it challenging to obtain a complete understanding of the data.
Data quality issues can arise for a variety of reasons, including data entry errors, data storage issues, data integration problems, and data transmission errors. To address these challenges, data mining practitioners must apply data cleaning and data preprocessing techniques to improve the quality of the data. Data cleaning involves detecting and correcting errors, while data preprocessing involves transforming the data to make it suitable for data mining.
2] Data Complexity
Data complexity refers to the vast amounts of data generated by various sources, such as sensors, social media, and the internet of things (IoT). The complexity of the data may make it challenging to process, analyze, and understand. In addition, the data may be in different formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as clustering, classification, and association rule mining. These techniques help to identify patterns and relationships in the data, which can then be used to gain insights and make predictions.
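The cleaning step described under Data Quality (detecting and correcting errors, duplicates, and missing values) can be illustrated with a small Python sketch. The record layout, the field names, the 0 to 120 age range, and the choice to fill missing ages with the median are all assumptions made for this example; a real project would choose rules that suit its own data.

```python
from statistics import median


def clean_records(records):
    """Normalize names, drop duplicates, and fill missing ages with the median."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()      # fix inconsistent spacing and case
        age = rec.get("age")
        if age is not None and not (0 <= age <= 120):
            age = None                          # treat implausible values as missing
        key = (name, rec["city"])
        if key in seen:                         # skip duplicated entries
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": rec["city"], "age": age})

    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known_ages) if known_ages else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill                     # simple median imputation
    return cleaned


# Hypothetical raw customer records showing typical quality problems.
raw = [
    {"name": " alice ", "city": "Lagos", "age": 34},
    {"name": "Alice", "city": "Lagos", "age": 34},    # duplicate of the first row
    {"name": "bob", "city": "Abuja", "age": None},    # missing value
    {"name": "Carol", "city": "Kano", "age": 420},    # data entry error
    {"name": "Dan", "city": "Lagos", "age": 28},
]

for row in clean_records(raw):
    print(row)
```

Median imputation is only one option. Dropping incomplete rows or flagging them for manual review may be more appropriate when missing values are not random.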
3] Data Privacy and Security
Data privacy and security is another significant challenge in data mining.
  • 15. As more data is collected, stored, and analyzed, the risk of data breaches and cyber-attacks increases. The data may contain personal, sensitive, or confidential information that must be protected. Moreover, data privacy regulations such as GDPR, CCPA, and HIPAA impose strict rules on how data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data anonymization and data encryption techniques to protect the privacy and security of the data. Data anonymization involves removing personally identifiable information (PII) from the data, while data encryption involves using algorithms to encode the data to make it unreadable to unauthorized users.
4] Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As the size of the dataset increases, the time and computational resources required to perform data mining operations also increase. Moreover, the algorithms must be able to handle streaming data, which is generated continuously and must be processed in real time.
To address this challenge, data mining practitioners use distributed computing frameworks such as Hadoop and Spark. These frameworks distribute the data and processing across multiple nodes, making it possible to process large datasets quickly and efficiently.
5] Interpretability
Data mining algorithms can produce complex models that are difficult to interpret. This is because the algorithms use a combination of statistical and mathematical techniques to identify patterns and relationships in the data. Moreover, the models may not be intuitive, making it challenging to understand how the model arrived at a particular conclusion.
To address this challenge, data mining practitioners use visualization techniques to represent the data and the models visually. Visualization makes it easier to understand the patterns and relationships in the data and to identify the most important variables.
6] Ethics
Data mining raises ethical concerns related to the collection, use, and dissemination of data.
  • 16. The data may be used to discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining algorithms may not be transparent, making it challenging to detect biases or discrimination.
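One way to reduce the privacy risks discussed above is to strip personally identifiable information before mining, as described under Data Privacy and Security. The Python sketch below is a simplified illustration. The field names, the `PII_FIELDS` set, the example key, and the use of a keyed hash (HMAC-SHA256) to replace the customer ID are all assumptions of this example. Strictly speaking, a keyed hash gives pseudonymization rather than full anonymization, because anyone holding the key can recompute the token for a known ID.

```python
import hashlib
import hmac

# Fields treated as PII in this example (an assumption, not a standard list).
PII_FIELDS = {"name", "email", "phone"}


def pseudonymize(record, secret_key, id_field="customer_id"):
    """Drop PII fields and replace the identifier with a keyed hash."""
    out = {k: v for k, v in record.items() if k not in PII_FIELDS}
    digest = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    out[id_field] = digest[:16]    # shortened token, for readability only
    return out


# Hypothetical key and record, invented for illustration.
key = b"example-key-keep-this-secret"
record = {
    "customer_id": 1001,
    "name": "Alice Example",
    "email": "alice@example.com",
    "phone": "555-0100",
    "amount": 120.5,
    "category": "groceries",
}

safe = pseudonymize(record, key)
print(safe)
```

Because the same ID and key always produce the same token, records belonging to one customer can still be linked for analysis without exposing the original identifier.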