MLDM2006S Lecture 01 Introduction
References:
1. E. Alpaydin, Introduction to Machine Learning, Chapter 1
2. Kantardzic, Data Mining: Concepts, Models, Methods and Algorithms, Chapter 1
3. Mitchell, Machine Learning, Chapter 1
4. Han and Kamber, Data Mining: Concepts and Techniques, Chapter 1
Main Textbooks
MLDM-Berlin Chen 2
Reference Textbooks
Goals
Machine Learning
• Budding industry
Machine Learning: When?
Machine Learning: Applications
• Association
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
• Reinforcement Learning
Associations
• Example: Basket Analysis
– From customer transactions to consumer behavior
• P(Y | X): probability that somebody who buys X also buys Y,
where X and Y are products/services
• Examples:
P(buying “chips” | buying “beer”) = ?
P(buying “Pattern Classification” | buying “Machine Learning”) = ?
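As a minimal sketch of how P(Y | X) could be estimated from transaction data, the fraction of baskets containing X that also contain Y can be counted directly. The function name `confidence` and the toy baskets below are assumptions made up for illustration; they are not from the lecture.

```python
def confidence(transactions, x, y):
    """Estimate P(y | x) as (#baskets with x and y) / (#baskets with x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return None  # x was never bought, so the estimate is undefined
    with_both = sum(1 for t in with_x if y in t)
    return with_both / len(with_x)


# Hypothetical toy baskets, for illustration only
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "soda"},
    {"beer"},
]

# 4 baskets contain beer, 2 of those also contain chips
print(confidence(baskets, "beer", "chips"))  # 0.5
```

On real data the estimate is only meaningful when X appears in enough baskets, which is why the sketch returns `None` rather than a number when X is absent.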
Classification (1/2)
• Also known as Pattern Recognition
• Example 1: Credit Scoring
– Differentiating between low-risk and high-risk customers from
their income and savings
– Discriminant:
IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
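The discriminant rule can be written as a small function. This is only a sketch: the threshold values `THETA1` and `THETA2` are hypothetical placeholders, since in a learning setting θ1 and θ2 would be fit to past customer data rather than chosen by hand.

```python
def credit_risk(income, savings, theta1, theta2):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds, for illustration only
THETA1 = 40_000  # income threshold (theta1)
THETA2 = 10_000  # savings threshold (theta2)

print(credit_risk(55_000, 12_000, THETA1, THETA2))  # low-risk
print(credit_risk(55_000, 5_000, THETA1, THETA2))   # high-risk
```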
Classification (2/2)
• Face Recognition
Training examples of a person
Test images
Regression (2/2)
Clustering
Reinforcement Learning
Other Possible Applications
• Business Management
• Production Control
• Scientific/Medical Research
• …
Some Issues in Machine Learning
What is Data Mining? (2/4)
What is Data Mining? (3/4)
Data Analysis
Data Understanding
Data Cleansing
Data Integration
Categories of Data Mining
Multi-Dimensional View of Data Mining
• Databases to be mined
– Relational, transactional, object-oriented, object-relational, active,
spatial, time-series, text, multi-media, heterogeneous, legacy,
WWW, etc.
• Knowledge to be mined
– Characterization, discrimination, association, classification,
clustering, trend, deviation and outlier analysis, etc.
– Granularity: mining at multiple levels of abstraction
• Techniques utilized
– Machine learning, statistics, visualization, neural networks,
database-oriented, data warehouse (OLAP), etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining,
stock market analysis, Web mining, Weblog analysis, etc.
Roots of Data Mining (1/2)
• Statistics, Mathematics
– Models
• Machine Learning
– Algorithms
• Control theory
– System identification
Roots of Data Mining (2/2)
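[Figure: block diagram of system identification. A mathematical model y* = f(u, t) predicts the system's behaviors; a summing junction (∑) forms the error y − y*, which is passed to the identification techniques. Linear? Nonlinear?]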
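The identification idea on this slide (fit a model y*, then inspect the error y − y*) can be sketched for the simplest case. The code below assumes a static linear model y* = a·u + b fitted by ordinary least squares and ignores the time argument t; the model form, the function name `fit_linear`, and the data are all assumptions for illustration.

```python
def fit_linear(us, ys):
    """Least-squares fit of y* = a*u + b to input/output pairs."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = sxy / sxx
    b = mean_y - a * mean_u
    return a, b


# Hypothetical measurements, generated from y = 2u + 1
us = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_linear(us, ys)

# Error signal y - y* between the observed output and the model output
residuals = [y - (a * u + b) for u, y in zip(us, ys)]
print(a, b, residuals)
```

Residuals that stay near zero suggest the linear model is adequate; a systematic pattern in them would be one hint that a nonlinear model is needed.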
Phases of Data Mining (1/7)
Phases of Data Mining (2/7)
Phases of Data Mining (3/7)
Phases of Data Mining (5/7)
Phases of Data Mining (6/7)
Phases of Data Mining (7/7)
Perform preprocessing
Large Data Sets (1/2)
Large Data Sets (2/2)
Data Warehouse (1/5)
• Definition
– A collection of integrated, subject-oriented databases designed to
support decision-support functions (DSF), where each unit of
data is relevant to some moment in time
• Modeled as a multidimensional database structure
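To make the multidimensional structure concrete, here is a minimal sketch of a data cube held in a dictionary, with a roll-up that sums a measure over the dimensions that are dropped. The dimension names (`month`, `product`, `region`), the sales figures, and the helper `roll_up` are hypothetical; a real warehouse would use a dedicated OLAP or database engine.

```python
from collections import defaultdict

# Hypothetical dimensions; "month" ties each fact to a moment in time
DIMENSIONS = ("month", "product", "region")

# Hypothetical fact table: (month, product, region) -> sales amount
facts = {
    ("2006-01", "beer", "north"): 120,
    ("2006-01", "beer", "south"): 80,
    ("2006-01", "chips", "north"): 50,
    ("2006-02", "beer", "north"): 100,
    ("2006-02", "chips", "south"): 70,
}


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, amount in facts.items():
        totals[tuple(key[i] for i in idx)] += amount
    return dict(totals)


print(roll_up(facts, ["month"]))    # {('2006-01',): 250, ('2006-02',): 170}
print(roll_up(facts, ["product"]))  # {('beer',): 300, ('chips',): 120}
```

Keeping different subsets of dimensions gives different views of the same facts, which is the kind of query a decision-support function would issue.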
Data Warehouse (3/5)
Data Warehouse (4/5)
• Applications
– Data mining
• Represents one of the major applications of a data warehouse
• Provides end users with the capability to extract hidden,
nontrivial (not obvious) information
– Acts as exploratory queries
A Typical Data Mining System
• Architecture
Confluence of Multiple Disciplines
• Machine Learning
• Artificial Intelligence
• Neural Networks
• Database
• Statistics
• Pattern Recognition
• Knowledge Acquisition
• Data Visualization / Knowledge Representation
Topic List and Tentative Schedule
Resources: Journals
Resources: Conferences