Data Mining and Its Applications
• Databases to be mined
– Relational, transactional, object-oriented, object-relational, active,
spatial, time-series, text, multi-media, heterogeneous, legacy, WWW,
etc.
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering,
trend, deviation and outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning,
statistics, visualization, neural network, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining, stock market
analysis, Web mining, Weblog analysis, etc.
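Example (association): of the knowledge types listed above, association is commonly scored with two measures, support and confidence. The sketch below is a minimal illustration in Python, not a full mining algorithm. The transactions and the function names support and confidence are hypothetical and do not come from the slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        # The antecedent never occurs, so the rule has no evidence.
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


# Rule "bread -> milk": how often do bread buyers also buy milk?
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule is usually reported only if both measures exceed user-chosen thresholds; the thresholds themselves are an application decision.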
Data Mining Models and Tasks
Data Mining Tasks
• Prediction Tasks
– Use some variables to predict unknown or future values of other
variables
• Description Tasks
– Find human-interpretable patterns that describe the data.
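Example (prediction vs. description): the two task types above can be contrasted on the same small data set. This is a minimal Python sketch under stated assumptions: the records, the field meanings, and the helper names fit_line and describe_by_group are hypothetical. A least-squares line stands in for a prediction task, and a per-group average stands in for a description task.

```python
from statistics import mean

# Hypothetical customer records: (monthly_calls, monthly_bill, plan).
records = [
    (10, 15.0, "basic"),
    (20, 25.0, "basic"),
    (30, 35.0, "premium"),
    (40, 45.0, "premium"),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def describe_by_group(records):
    """Average bill per plan: a human-readable summary of the data."""
    groups = {}
    for _calls, bill, plan in records:
        groups.setdefault(plan, []).append(bill)
    return {plan: mean(bills) for plan, bills in groups.items()}


# Prediction: use one variable (calls) to estimate an unknown value (bill).
slope, intercept = fit_line([r[0] for r in records], [r[1] for r in records])
predicted_bill = slope * 50 + intercept

# Description: summarize the existing data in an interpretable form.
summary = describe_by_group(records)
```

The prediction output is a value for an unseen case, while the description output is a pattern a person can read directly.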
Data Exploration
Statistical Analysis, Querying and Reporting
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW
Data Mining: Confluence of Multiple Disciplines
• Database Technology
• Statistics
• Machine Learning
• Visualization
• Information Science
• Other Disciplines
Data Mining vs. Statistical Analysis
Statistical Analysis:
• Ill-suited for Nominal and Structured Data Types
• Completely data-driven: domain knowledge cannot be incorporated
• Interpretation of results is difficult and daunting
• Requires expert user guidance
Data Mining:
• Large data sets
• Efficiency of algorithms is important
• Scalability of algorithms is important
• Real-world data
• Lots of missing values
• Pre-existing data, not user-generated
• Data is not static and is prone to updates
• Efficient methods for data retrieval are available for use
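Example (missing values): real-world data with many missing values usually needs some handling before an algorithm can use it. The sketch below shows one simple option, mean imputation, in Python. It is an assumption chosen for illustration; the slides do not prescribe any particular method, and the function name impute_mean is hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from; return the data unchanged.
        return list(values)
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Hypothetical power-usage readings with one missing measurement.
readings = [1.0, None, 3.0]
cleaned = impute_mean(readings)
```

Mean imputation keeps every record but shrinks the variance of the column, so it is only a starting point.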
Data Mining vs. DBMS
Campaign Management
Data Mining in Practice
Application Areas
Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis
Why Now?
• Data is being produced
• Data is being warehoused
• The computing power is available
• The computing power is affordable
• The competitive pressures are strong
• Commercial products are available
Data Mining works with Warehouse Data
• Data Warehousing provides the Enterprise with a memory