Data Mining
[Figure: diagram relating Data Mining to Machine Learning, AI, Databases, and KDD]
Why We Need Data Mining
Data explosion problem
– Automated data collection tools and mature
database technology have led to huge amounts of
accumulated data
We are drowning in data, but starving for
knowledge!
Solution: Data warehousing and data mining
– Data warehousing and on-line analytical
processing
– Mining interesting knowledge (rules, regularities,
patterns, constraints) from data in large databases
Potential Applications
Data analysis and decision support
– Market analysis and management
– Risk analysis and management
– Fraud detection and detection of unusual patterns
Other applications
– Text mining (email, documents) and Web mining
– Stream data mining
– DNA and bio-data analysis
Stages of KDD
[Figure: the KDD process, from raw data to knowledge]
Databases → Cleaning & Integration → Data Warehouse → Selection & Transformation → Data Mining → Evaluation & Presentation → Knowledge
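The stages above can be sketched as a chain of small functions. This is a minimal illustration, not a real KDD system: the records, the function names, and the "above-average spender" pattern are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical raw records from two source databases; None marks a missing value.
store_a = [{"customer": "c1", "amount": 20.0}, {"customer": "c2", "amount": None}]
store_b = [{"customer": "c3", "amount": 35.0}, {"customer": "c1", "amount": 5.0}]

def clean_and_integrate(*sources):
    """Cleaning & integration: merge the sources and drop records with missing amounts."""
    return [r for src in sources for r in src if r["amount"] is not None]

def select_and_transform(warehouse):
    """Selection & transformation: reduce the records to total spend per customer."""
    totals = {}
    for r in warehouse:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining: keep customers who spend more than the average (a toy pattern)."""
    avg = mean(totals.values())
    return {c: t for c, t in totals.items() if t > avg}

def evaluate_and_present(patterns):
    """Evaluation & presentation: rank the patterns and format them for a reader."""
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{c}: {t:.2f}" for c, t in ranked]

warehouse = clean_and_integrate(store_a, store_b)
totals = select_and_transform(warehouse)
knowledge = evaluate_and_present(mine(totals))
print(knowledge)
```

Each function takes the output of the previous stage, which mirrors the left-to-right flow of the process; in practice the stages are iterative and a poor evaluation sends the analyst back to an earlier step.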
Issues and Challenges of Data Mining
Data mining methodology
– Mining different kinds of knowledge from diverse
data types, e.g., bio, stream, Web
– Performance: efficiency, effectiveness, and
scalability
– Pattern evaluation: the interestingness problem
– Incorporation of background knowledge
– Handling noise and incomplete data
– Parallel, distributed and incremental mining
methods
– Integration of the discovered knowledge with
existing knowledge: knowledge fusion
Issues and Challenges of Data Mining (cont.)
User interaction
– Data mining query languages and ad-hoc mining
– Expression and visualization of resultant
knowledge
– Interactive mining of knowledge at multiple levels
of abstraction
Applications and social impacts
– Domain-specific data mining & invisible data
mining
– Protection of data security, integrity, and privacy
Market Analysis And Management
Where does the data come from?
– Credit card transactions, loyalty cards, discount
coupons, customer complaint calls, etc.
Target marketing
– Find clusters of “model” customers who share the
same characteristics
– Determine customer purchasing patterns over
time
Cross-market analysis
– Associations/correlations between product sales,
and prediction based on such associations
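As an illustration of associations between product sales, the sketch below computes support, confidence, and lift for every ordered pair of items. The baskets and the function name `pair_rules` are hypothetical, and restricting the rules to item pairs is a simplification chosen for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of purchased items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for all co-occurring item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        frozenset(p) for b in baskets for p in combinations(sorted(b), 2)
    )
    rules = {}
    for pair, count in pair_counts.items():
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            support = count / n                         # P(lhs and rhs)
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

rules = pair_rules(baskets)
for (lhs, rhs), (s, c, l) in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests the two products sell together more often than independent purchases would predict, which is the kind of association a retailer could use for prediction or shelf placement.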
Market Analysis And Management (cont.)
Customer profiling
– What types of customers buy what products
(clustering or classification)
Customer requirement analysis
– Identify the best products for different customers
– Predict which factors will attract new customers
Provision of summary information
– Multidimensional summary reports
– Statistical summary information (central
tendency and variation of the data)
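A multidimensional summary can be sketched as a group-by over one or more dimensions, reporting central tendency (mean, median) and variation (standard deviation) per group. The sales records, the dimension names, and the `summarize` helper are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical sales records: (region, product, amount).
sales = [
    ("north", "tea", 10.0),
    ("north", "tea", 14.0),
    ("north", "coffee", 20.0),
    ("south", "tea", 8.0),
    ("south", "coffee", 30.0),
    ("south", "coffee", 34.0),
]

def summarize(records, dims):
    """Group amounts by the chosen dimensions and summarize each group."""
    groups = defaultdict(list)
    for region, product, amount in records:
        row = {"region": region, "product": product}
        groups[tuple(row[d] for d in dims)].append(amount)
    return {
        key: {
            "n": len(values),
            "mean": mean(values),      # central tendency
            "median": median(values),  # central tendency, robust to extremes
            "stdev": pstdev(values),   # variation (population standard deviation)
        }
        for key, values in groups.items()
    }

by_region = summarize(sales, ["region"])
by_region_product = summarize(sales, ["region", "product"])
print(by_region)
print(by_region_product)
```

Changing the `dims` list moves between a coarse report (by region) and a finer one (by region and product), which is the basic idea behind rolling up and drilling down in a summary report.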
Corporate Analysis & Risk Management
Financial planning and asset evaluation
– Cash flow analysis and prediction
– Contingent claim analysis to evaluate assets
– Cross-sectional and time series analysis (financial-ratio,
trend analysis, etc.)
Resource planning
– Summarize and compare the resources and spending
Competition
– Monitor competitors and market directions
– Group customers into classes and set up a class-based pricing
procedure
– Set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
Applications: Health care, retail, credit card service,
telecommunications
– Auto insurance: ring of collisions
– Money laundering: suspicious monetary transactions
– Medical insurance
• Professional patients, ring of doctors, and ring of references
• Unnecessary or correlated screening tests
– Telecommunications: phone-call fraud
• Phone call model: destination of the call, duration, time of day or week.
Analyze patterns that deviate from an expected norm
– Retail industry
• Analysts estimate that 38% of retail shrink is due to dishonest employees
– Anti-terrorism
Approaches: Clustering, model construction, outlier analysis, etc.
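One simple form of outlier analysis for the phone-call example is to flag calls whose duration deviates strongly from the typical value. The sketch below uses a modified z-score based on the median and the median absolute deviation (MAD), which is one common choice among many; the call records, the field names, and the 3.5 cutoff are assumptions for illustration.

```python
from statistics import median

# Hypothetical call records following the phone call model: destination, duration, time of day.
calls = [
    {"dest": "local", "minutes": 3, "hour": 10},
    {"dest": "local", "minutes": 4, "hour": 11},
    {"dest": "local", "minutes": 5, "hour": 14},
    {"dest": "local", "minutes": 4, "hour": 15},
    {"dest": "local", "minutes": 3, "hour": 16},
    {"dest": "local", "minutes": 5, "hour": 18},
    {"dest": "local", "minutes": 4, "hour": 19},
    {"dest": "overseas", "minutes": 60, "hour": 3},
]

def flag_unusual(calls, threshold=3.5):
    """Return calls whose duration has a modified z-score above the threshold."""
    durations = [c["minutes"] for c in calls]
    med = median(durations)
    mad = median([abs(d - med) for d in durations])
    if mad == 0:
        return []  # no spread in the data, so the score is undefined
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data.
    return [c for c in calls if 0.6745 * abs(c["minutes"] - med) / mad > threshold]

flagged = flag_unusual(calls)
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme call would otherwise inflate the very norm it is being compared against. A real system would also model destination and time of day rather than duration alone.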