Data Mining

Data mining involves extracting patterns from large amounts of data. It is used to discover hidden knowledge and make organizational decisions. The document discusses what data mining is, the types of data and functionalities involved, potential applications like market analysis and fraud detection, and challenges like privacy and developing domain-specific methods. Data mining draws from multiple disciplines and involves cleaning, transforming, mining, and evaluating data to discover useful patterns.

Uploaded by

GOURAV GHOSH
Data Mining

What is Data Mining/KDD


Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful)
patterns or knowledge from huge amounts of data
What is Data Mining
• By definition, data mining is the process of extracting
previously unknown information from large
databases and using it to make organisational
decisions.
– Is concerned with the discovery of hidden
knowledge
– Usually works on large volumes of data
– Is useful in making critical organisational
decisions, particularly those of a strategic nature
Data Mining
Data mining has been referred to by a number of names:
–Data Fishing, Data Dredging (1960…):
• Used by statisticians (as a pejorative term)
–Knowledge Discovery in Databases (1989…):
• Used by AI, Machine Learning Community
–Business Intelligence (1990…):
• Business management term
–Also data archaeology, information harvesting, information
discovery, knowledge extraction, data/pattern analysis, etc.
Data Mining: On What Kinds Of Data?
Relational database
Data warehouse
Transactional database
Advanced database and information repository
– Object-relational database
– Spatial and temporal data
– Time-series data
– Stream data
– Multimedia database
– Text databases & WWW
Data Mining Functionalities
Concept description
– Generalize, summarize, and contrast data
characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
– E.g., nappies & beer bought together
Classification and Prediction
– Construct models that describe and distinguish
classes or concepts for future prediction
– Predict some unknown or missing numerical
values
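The association functionality above can be made concrete with the two usual rule measures, support and confidence. The sketch below is illustrative only: the baskets are invented, and the function names `support`, `confidence` and `lift` are hypothetical helpers, not part of the original slides or of any library.

```python
# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"nappies", "beer", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Rule under test: {nappies} -> {beer}
print(support({"nappies", "beer"}, transactions))
print(confidence({"nappies"}, {"beer"}, transactions))
print(lift({"nappies"}, {"beer"}, transactions))
```

A lift above 1 suggests the items occur together more often than independence would predict, and a lift below 1 suggests the opposite. None of these measures establishes causality on its own.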
Data Mining Functionalities
Cluster analysis
– Class label is unknown: Group data to form new classes,
e.g., cluster houses to find distribution patterns
Outlier analysis
– Outlier: a data object that does not comply with the
general behavior of the data
– Not merely noise or an exception to discard: outliers are useful in
fraud detection and rare event analysis
Other pattern-directed or statistical analyses
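As a rough illustration of cluster analysis, where no class labels are given, the sketch below groups houses by location with plain k-means. The coordinates, the starting centroids and the helper names `assign` and `kmeans` are all assumptions made for this example. The slides do not name a specific clustering algorithm.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical house locations (x, y), invented for illustration.
houses = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(houses, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

Fixed starting centroids keep the run deterministic. Real data would normally need several random restarts and a considered choice of the number of clusters.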
Data Mining is Multidisciplinary
Data mining draws on:
– Statistics
– Pattern Recognition
– Neurocomputing
– Machine Learning
– AI
– Databases
– KDD
Why we Need Data Mining
Data explosion problem
– Automated data collection tools and mature
database technology lead to huge amounts of
data accumulated
We are drowning in data, but starving for
knowledge!
Solution: Data warehousing and data mining
– Data warehousing and on-line analytical
processing
– Mining interesting knowledge (rules, regularities,
patterns, constraints) from data in large databases
Potential Applications
Data analysis and decision support
– Market analysis and management
– Risk analysis and management
– Fraud detection and detection of unusual patterns
Other applications
– Text mining (email, documents) and Web mining
– Stream data mining
– DNA and bio-data analysis
Stages of KDD
Stages, from raw data to knowledge:
– Databases
– Cleaning & Integration
– Data Warehouse
– Selection & Transformation
– Data Mining
– Evaluation & Presentation
– Knowledge
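The KDD stages can be read as a pipeline in which each stage feeds the next. The sketch below is a toy version under stated assumptions: the records, the field names, the duplicate-dropping rule and the `min_count` threshold are all invented, and a simple frequency count stands in for the mining step.

```python
from collections import Counter

# Two hypothetical source "databases"; records and field names are invented.
store_a = [{"customer": "c1", "item": "beer"}, {"customer": "c2", "item": None}]
store_b = [
    {"customer": "c1", "item": "nappies"},
    {"customer": "c3", "item": "beer"},
    {"customer": "c3", "item": "beer"},
]

def clean_and_integrate(*sources):
    """Merge sources, dropping records with missing values and exact duplicates."""
    seen, warehouse = set(), []
    for source in sources:
        for record in source:
            key = (record["customer"], record["item"])
            if None in key or key in seen:
                continue
            seen.add(key)
            warehouse.append(record)
    return warehouse

def select_and_transform(warehouse):
    """Keep only the attribute relevant to the task: the item purchased."""
    return [record["item"] for record in warehouse]

def mine(items):
    """Stand-in mining step: count how often each item occurs."""
    return Counter(items)

def evaluate(patterns, min_count=2):
    """Keep only patterns that meet an assumed interestingness threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

warehouse = clean_and_integrate(store_a, store_b)
knowledge = evaluate(mine(select_and_transform(warehouse)))
print(knowledge)
```

Treating exact duplicates as errors is itself a modelling assumption. In transactional data a repeated record may be a genuine repeat purchase, so the cleaning rules need to be agreed with whoever owns the data.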
Issues and Challenges of Data Mining
Data mining methodology
– Mining different kinds of knowledge from diverse
data types, e.g., bio, stream, Web
– Performance: efficiency, effectiveness, and
scalability
– Pattern evaluation: the interestingness problem
– Incorporation of background knowledge
– Handling noise and incomplete data
– Parallel, distributed and incremental mining
methods
– Integration of the discovered knowledge with
existing knowledge: knowledge fusion
Issues and Challenges of Data Mining
User interaction
– Data mining query languages and ad-hoc mining
– Expression and visualization of resultant
knowledge
– Interactive mining of knowledge at multiple levels
of abstraction
Applications and social impacts
– Domain-specific data mining & invisible data
mining
– Protection of data security, integrity, and privacy
Market Analysis And Management
Where does the data come from?
– Credit card transactions, loyalty cards, discount
coupons, customer complaint calls, etc
Target marketing
– Find clusters of “model” customers who share the
same characteristics
– Determine customer purchasing patterns over
time
Cross-market analysis
– Associations/correlations between product sales,
and prediction based on such associations
Market Analysis And Management (cont…)
Customer profiling
– What types of customers buy what products
(clustering or classification)
Customer requirement analysis
– Identifying the best products for different customers
– Predict what factors will attract new customers
Provision of summary information
– Multidimensional summary reports
– Statistical summary information (central
tendency and variation of the data)
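Summary information of this kind can be produced with Python's standard `statistics` module. The sketch below groups spend by one dimension (region) and reports central tendency and variation for each group. The figures and the `summarise` helper are invented for illustration.

```python
import statistics

# Hypothetical spend per customer, grouped by region; values are invented.
spend = {
    "north": [20.0, 30.0, 40.0],
    "south": [10.0, 10.0, 40.0],
}

def summarise(values):
    """Central tendency (mean, median) and variation (sample standard deviation)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

report = {region: summarise(values) for region, values in spend.items()}
for region, summary in report.items():
    print(region, summary)
```

Reporting the median alongside the mean is a deliberate choice here: the two differ noticeably for the "south" group, which hints at a skewed distribution that the mean alone would hide.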
Corporate Analysis & Risk Management
Finance planning and asset evaluation
– Cash flow analysis and prediction
– Contingent claim analysis to evaluate assets
– Cross-sectional and time series analysis (financial-ratio,
trend analysis, etc.)
Resource planning
– Summarize and compare the resources and spending
Competition
– Monitor competitors and market directions
– Group customers into classes and apply a class-based pricing
procedure
– Set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
Applications: Health care, retail, credit card service,
telecommunications
– Auto insurance: ring of collisions
– Money laundering: suspicious monetary transactions
– Medical insurance
• Professional patients, ring of doctors, and ring of references
• Unnecessary or correlated screening tests
– Telecommunications: phone-call fraud
• Phone call model: destination of the call, duration, time of day or week.
Analyze patterns that deviate from an expected norm
– Retail industry
• Analysts estimate that 38% of retail shrink is due to dishonest employees
– Anti-terrorism
Approaches: Clustering, model construction, outlier analysis, etc.
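One way to "analyze patterns that deviate from an expected norm" is a robust outlier score on a single attribute of the phone call model, such as call duration. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD). The durations are invented, `flag_outliers` is a hypothetical helper, and the 3.5 cutoff is a commonly used convention rather than anything stated in the slides.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]

# Hypothetical call durations in minutes for one subscriber; values are invented.
durations = [3, 4, 5, 4, 3, 5, 4, 90]
suspicious = flag_outliers(durations)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme call would otherwise inflate the norm it is being compared against. A flagged call is only a candidate for review, not proof of fraud.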