An Overview of Machine Learning
Machine Learning
1
Outline
Learning
Machine Learning
Supervised vs Unsupervised Learning
Other Learning Paradigms
Challenges
2
Learning is…
3
Learning
[Diagram: data → Learning → knowledge]
4
Learning
data
5
Machine Learning is…
8
Machine Learning is…
[Diagram labels: past, future]
10
Machine Learning is…
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
11
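The contrast between the two pipelines can be made concrete with a small sketch. In the first function a person writes the rule; in the second, examples and their desired outputs go in and a rule comes out. The spam scenario, the feature (number of links), and all names are hypothetical and chosen only for illustration.

```python
# Traditional programming: a human writes the rule (the "program").
def is_spam_handwritten(num_links):
    return num_links > 5  # threshold picked by hand (hypothetical)


# Machine learning: data plus desired outputs go in, a program comes out.
def learn_threshold(examples):
    """Return a classifier whose threshold makes the fewest training errors.

    examples: list of (num_links, is_spam) pairs.
    """
    candidates = sorted({x for x, _ in examples})

    def errors(t):
        return sum((x > t) != y for x, y in examples)

    best = min(candidates, key=errors)
    return lambda x: x > best


# Hypothetical labelled data: (number of links, is it spam?)
data = [(0, False), (1, False), (2, False), (7, True), (9, True), (12, True)]
learned = learn_threshold(data)  # the output of learning is itself a program
```

Calling `learned(10)` then behaves like a handwritten rule, except that its threshold was derived from the data.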
Magic?
Seeds = Algorithms
Nutrients = Data
Gardener = You
Plants = Programs
12
Learning: Why?
13
Learning: How?
18
Structure of a learning agent
19
Design of learning element
Key questions:
– What is the agent design that will implement
the desired performance?
– Improve the performance of what piece of the
agent system and how is that piece
represented?
– What data are available relevant to that
piece? (In particular, do we know the right
answers?)
– What knowledge is already available?
20
Lots of data
22
Application: Discovering DNA motifs
...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCGGTTGTGATG
TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTCTTCAGTAGTA
TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCCTTACTGTTAT
CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTTATGTCACAAGG
GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCTTGACTGAAT
AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTGAGTTGGAGTT
GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGTATTTCCACTA
GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATTACATCTTGAA
GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCGCGGAGATAAC
TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCGTCAGTGA...
23
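A very simple first step towards spotting recurring patterns in such a sequence is to count overlapping substrings of a fixed length (k-mers). This is only an illustrative sketch, not the motif-discovery method behind the slide; the function name and the choice k = 3 are assumptions.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# First 11 bases of the sequence shown on the slide.
fragment = "TTGGAACAACC"
counts = count_kmers(fragment, k=3)
most_common_kmer, occurrences = counts.most_common(1)[0]
```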
Application: User website behavior from
clickstream data (from P. Smyth, UCI)
128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,
128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.195, -, 3/22/00, 10:35:17, W3SVC, SRVR1, 128.200.39.181, 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,
128.195.36.101, -, 3/22/00, 16:18:58, W3SVC, SRVR1, 128.200.39.181, 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.101, -, 3/22/00, 16:18:59, W3SVC, SRVR1, 128.200.39.181, 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:54:37, W3SVC, SRVR1, 128.200.39.181, 140, 199, 875, 200, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:07, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:39, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:03, W3SVC, SRVR1, 128.200.39.181, 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:56:04, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:33, W3SVC, SRVR1, 128.200.39.181, 0, 262, 72, 304, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:56:52, W3SVC, SRVR1, 128.200.39.181, 19598, 382, 414, 200, 0, POST, /spt/main.html, -,
User 1: 2 3 2 2 3 3 3 1 1 1 3 1 3 3 3 3
User 2: 3 3 3 1 1 1
User 3: 7 7 7 7 7 7 7 7
User 4: 1 5 1 1 1 5 1 5 1 1 1 1 1 1
User 5: 5 1 1 5
… …
24
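Turning raw log lines into per-user sequences can be sketched as below. The field positions (client address first, requested path fourteenth) are inferred from the sample lines on the slide and are an assumption. The slide does not say how pages map to the numeric categories in the user rows, so the sketch keeps the raw paths instead.

```python
from collections import defaultdict


def sequences_by_client(log_lines):
    """Group requested paths by client address, preserving request order."""
    pages = defaultdict(list)
    for line in log_lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 14:
            continue  # skip malformed lines
        client, path = fields[0], fields[13]  # assumed field positions
        pages[client].append(path)
    return dict(pages)


# Three lines copied from the slide.
sample = [
    "128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,",
    "128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,",
    "128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,",
]
sequences = sequences_by_client(sample)
```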
Application: social network analysis
25
Application: spam filtering
26
Machine Learning
induction
optimization
27
Related Disciplines
Machine Learning draws on:
– decision theory
– game theory
– AI
– control theory
– information theory
– biological evolution
– probability & statistics
– philosophy
– optimization
– data mining
– statistical mechanics
– psychology
– computational complexity theory
– neurophysiology
28
History of Machine Learning
32
A Glimpse into the Future
Today's status:
– Second-generation algorithms:
– Deep neural nets, classifiers, regression, etc.
Future:
– Smart remote controls, phones, cars
– Smart Surveillance
– Data and communication networks, software
33
Machine learning problems
34
Data
examples
Data
35
Types of Learning
36
Supervised vs. unsupervised Learning
38
Supervised Learning
Problems:
– Classification
The domain of the target attribute is finite and
categorical.
A classifier must assign a class to an unseen
example.
– Regression
The target attribute takes continuous values, so its
domain is infinite.
The goal is to fit a model that expresses the target
attribute as a function of the input attributes.
– Time Series Analysis
Making predictions over time.
39
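The difference between classification and regression can be sketched with two tiny learners: a 1-nearest-neighbour classifier that returns a category, and an ordinary least-squares line that returns a number. Both are stand-ins chosen for brevity, and all data and names are hypothetical.

```python
def nearest_neighbor_classify(train, x):
    """train: list of (feature, class_label); return the label of the closest feature."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(points):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Classification: the target is one of a finite set of categories.
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
predicted_class = nearest_neighbor_classify(labeled, 7.2)  # unseen example

# Regression: the target is a continuous value.
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])  # points lie on y = 2x + 1
predicted_value = a * 4 + b
```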
Unsupervised Learning (Clustering)
Finding groups of objects in data such that the
objects in a group will be similar (or related) to
one another and different from (or unrelated to)
the objects in other groups
Intra-cluster distances are minimized.
Inter-cluster distances are maximized.
40
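One common way to find such groups is k-means (Lloyd's algorithm), sketched here on one-dimensional data. The slides do not name a specific algorithm, so this choice, the starting centers, and the data are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data; returns (centers, clusters)."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]           # hypothetical data
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])

# Largest distance from a point to its own center vs. distance between centers.
intra = max(abs(v - c) for c, cl in zip(centers, clusters) for v in cl)
inter = abs(centers[0] - centers[1])
```

On this data `intra` is much smaller than `inter`, which is the property described above.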
Unsupervised Learning (cont.)
41
Unsupervised Learning
Problems:
– Clustering
– Association Rules
– Pattern Mining
The term is used more generally than frequent
pattern mining or association mining.
– Outlier Detection
The process of finding data examples whose
behaviour is very different from what is expected
(outliers or anomalies).
42
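A minimal sketch of outlier detection, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The rule, the threshold of 2.0, and the readings are illustrative assumptions, not taken from the slides.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]  # hypothetical data
outliers = find_outliers(readings)
```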
Other Learning Paradigms
Imbalanced Learning
– A classification problem in which the target
attribute has a highly skewed distribution.
– The number of examples representing the
class of interest is much lower than that of the
other classes.
Multi-instance Learning
– Imposes restrictions on models: each
example consists of a bag of instances
instead of a single instance.
43
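Why imbalance matters can be seen in a small sketch: a predictor that always outputs the majority class scores high accuracy while never finding the class of interest. The 5/95 split and the function names are hypothetical.

```python
labels = [1] * 5 + [0] * 95           # 5 examples of the class of interest, 95 others
always_negative = [0] * len(labels)   # trivial majority-class predictor


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the positive examples that were actually found."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


acc = accuracy(labels, always_negative)  # looks good
rec = recall(labels, always_negative)    # finds none of the minority class
```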
Other Learning Paradigms
Multi-label Classification
– Each instance is associated not with a single
class but with a subset of the classes.
Semi-supervised Learning
– It is concerned with the design of models in
the presence of both labeled and unlabeled
data.
– Covers semi-supervised classification and
semi-supervised clustering.
– Relationship with Active Learning.
44
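For multi-label classification, one common way to represent "a subset of the classes" per instance is a binary indicator row with one column per class. This is a sketch under that assumption; the class names are hypothetical.

```python
def to_indicator(label_sets, all_labels):
    """Encode each instance's label subset as a 0/1 row, one column per class."""
    return [[int(label in s) for label in all_labels] for s in label_sets]


all_labels = ["politics", "sports", "technology"]           # hypothetical classes
label_sets = [{"sports"}, {"politics", "technology"}, set()]
matrix = to_indicator(label_sets, all_labels)
```

An instance may have several 1s in its row, or none, which is what separates this setting from ordinary single-class classification.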
Other Learning Paradigms
Subgroup Discovery
– It arises from the hybridization of
classification and association mining.
– It aims to extract interesting rules with
respect to a target attribute.
Transfer Learning
– Aims to extract the knowledge from one or
more source tasks and apply the knowledge
to a target task.
– The so-called data shift problem is closely
related.
45
Other Learning Paradigms
46
Issues in Machine Learning
47
Measuring Performance
Generalization accuracy
Solution correctness
Solution quality (length, efficiency)
Speed of performance
48
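Generalization accuracy is usually estimated on examples the model did not see during training. The sketch below holds out the last quarter of a small hypothetical data set, trains a nearest-centroid classifier on the rest, and compares training and test accuracy. The classifier, the split ratio, and the data are all assumptions made for illustration.

```python
def split(data, test_fraction=0.25):
    """Hold out the last part of the data for testing."""
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


def train_centroids(train):
    """Return the mean feature value of each class."""
    sums = {}
    for x, y in train:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(centroids, x):
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids, examples):
    return sum(predict(centroids, x) == y for x, y in examples) / len(examples)


data = [(1.0, "a"), (9.0, "b"), (1.4, "a"), (8.6, "b"),
        (0.7, "a"), (9.3, "b"), (1.1, "a"), (4.5, "b")]  # hypothetical
train, test = split(data)
centroids = train_centroids(train)
train_accuracy = accuracy(centroids, train)
test_accuracy = accuracy(centroids, test)  # estimate of generalization accuracy
```

Here the model is perfect on its training data but misclassifies one of the two held-out examples, which is why accuracy on the training set alone is a poor measure.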
Scaling issues in ML
Number of
– Inputs
– Outputs
– Batch vs realtime
– Training vs testing
49
Machine Learning versus Human Learning
50
Observations
ML has many practical applications and is
probably the most used method in AI.
ML is also an active research area
Role of cognitive science
• Computational model of cognition
Role of neuroscience
• Computational model of the brain
Neural networks
Brain vs mind; hardware vs software
Nearly all ML is still dependent on human
“guidance”
51
Natural Questions
52
Questions?
53
Thanks
54