An Overview of Machine Learning

This document provides an overview of machine learning. It defines machine learning as a branch of artificial intelligence that uses algorithms to learn from data and improve performance. The document outlines different types of learning paradigms such as supervised learning, unsupervised learning, and other approaches. It also discusses challenges in machine learning like dealing with large amounts of data and adversarial learning situations.


An Overview of Machine Learning

Dr. Ashish Khare
Associate Professor of Computer Science
University of Allahabad, Prayagraj
Email: [email protected]

1
Outline

• Learning
• Machine Learning
• Supervised vs Unsupervised Learning
• Other Learning Paradigms
• Challenges

2
Learning is….

… a computational process for improving performance based on experience

… the acquisition of knowledge or skills through study, experience, or being taught

3
Learning

[Diagram, built up over slides 4–7: data, combined with prior knowledge, feeds a Learning process that produces new knowledge]

Crucial open problem: weak intermediate forms of knowledge that support future generalizations

7
Machine Learning is…

Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.

8
Machine Learning is…

Machine learning is about predicting the future based on the past.
-- Hal Daume III

past: Training Data → model/predictor
future: Testing Data → model/predictor

10
Machine Learning is…

Traditional Programming:
  Data + Program → Computer → Output

Machine Learning:
  Data + Output → Computer → Program

11
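The contrast on the slide above can be sketched in code (a toy illustration of my own, not from the slides): in traditional programming the programmer hard-codes the rule, while in machine learning the "program" (here just a single weight) is recovered from data and desired outputs.

```python
# Toy illustration of "Traditional Programming" vs "Machine Learning".
# The function names and the y = 2x rule are assumptions for this sketch.

def traditional(x):
    # Traditional programming: the rule is written by hand
    return 2 * x

def learn_weight(xs, ys):
    # Machine learning: recover a rule y = w * x from (data, output) pairs
    # via least squares through the origin: w = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

data = [1.0, 2.0, 3.0, 4.0]
outputs = [traditional(x) for x in data]   # desired outputs: 2, 4, 6, 8
w = learn_weight(data, outputs)            # the "learned program"

print(w)        # 2.0
print(w * 5.0)  # 10.0: the learned rule generalizes to an unseen input
```

The learned weight reproduces the hand-written rule and applies to inputs that were never in the training data.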
Magic?

No, more like gardening

• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs

12
Learning: Why?

• The baby, assailed by eyes, ears, nose, skin, and entrails at once, feels it all as one great blooming, buzzing confusion …
– [William James, 1890]

Learning is essential for unknown environments, i.e., when the designer lacks omniscience

15
Learning: Why?

• Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer's. Rather little mechanism, and lots of blank sheets.
– [Alan Turing, 1950]
• Learning is useful as a system construction method, i.e., expose the system to reality rather than trying to write it down
16
Learning: How?

18
Structure of a learning agent

19
Design of learning element

• Key questions:
– What is the agent design that will implement the desired performance?
– Improve the performance of what piece of the agent system, and how is that piece represented?
– What data are available relevant to that piece? (In particular, do we know the right answers?)
– What knowledge is already available?

20
Lots of data

• Web: estimated Google index 45 billion pages
• Clickstream data: 10-100 TB/day
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1 TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider ~100 petabytes/day
• Biological data: 1-10 TB/day/sequencer
• TV: 2 TB/day/channel; YouTube 4 TB/day uploaded
• Digitized telephony: ~100 petabytes/day
21
Application: satellite image analysis

22
Application: Discovering DNA motifs

...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCGGTTGTGATG
TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTCTTCAGTAGTA
TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCCTTACTGTTAT
CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTTATGTCACAAGG
GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCTTGACTGAAT
AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTGAGTTGGAGTT
GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGTATTTCCACTA
GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATTACATCTTGAA
GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCGCGGAGATAAC
TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCGTCAGTGA...

23
Application: User website behavior from
clickstream data (from P. Smyth, UCI)
128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,
128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.195, -, 3/22/00, 10:35:17, W3SVC, SRVR1, 128.200.39.181, 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,
128.195.36.101, -, 3/22/00, 16:18:58, W3SVC, SRVR1, 128.200.39.181, 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.101, -, 3/22/00, 16:18:59, W3SVC, SRVR1, 128.200.39.181, 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:54:37, W3SVC, SRVR1, 128.200.39.181, 140, 199, 875, 200, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:07, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:39, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:03, W3SVC, SRVR1, 128.200.39.181, 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:56:04, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:33, W3SVC, SRVR1, 128.200.39.181, 0, 262, 72, 304, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:56:52, W3SVC, SRVR1, 128.200.39.181, 19598, 382, 414, 200, 0, POST, /spt/main.html, -,

User 1 2 3 2 2 3 3 3 1 1 1 3 1 3 3 3 3
User 2 3 3 3 1 1 1
User 3 7 7 7 7 7 7 7 7
User 4 1 5 1 1 1 5 1 5 1 1 1 1 1 1
User 5 5 1 1 5
… …
24
Application: social network analysis

HP Labs email data: 500 users, 20k connections, evolving over time

25
Application: spam filtering

• 200 billion spam messages sent per day
• Asymmetric cost of false positive/false negative
• Weak label: discarded without reading
• Strong label (“this is spam”) hard to come by
• Standard iid assumption violated: spammers alter spam generators to evade or subvert spam filters (“adversarial learning” task)

26
Machine Learning

data mining: machine learning applied to “databases”, i.e., collections of data

inference and/or estimation in statistics

pattern recognition in engineering

signal processing in electrical engineering

induction

optimization
27
Related Disciplines

[Diagram: Machine Learning at the center, linked to decision theory, game theory, AI, control theory, information theory, biological evolution, probability & statistics, philosophy, optimization, Data Mining, statistical mechanics, psychology, computational complexity theory, and neurophysiology]

28
History of Machine Learning

• 1960’s and 70’s: Models of human learning
– High-level symbolic descriptions of knowledge, e.g., logical expressions or graphs/networks, e.g., (Karpinski & Michalski, 1966), (Simon & Lea, 1974).
– Winston’s (1975) structural learning system learned logic-based structural descriptions from examples.
• Minsky paper, 1969
• 1970’s: Genetic algorithms
– Developed by Holland (1975)
• 1970’s - present: Knowledge-intensive learning
– A tabula rasa approach typically fares poorly. “To acquire new knowledge a system must already possess a great deal of initial knowledge.” Lenat’s CYC project is a good example.
29
History of Machine Learning (cont’d)

• 1970’s - present: Alternative modes of learning (besides examples)
– Learning from instruction, e.g., (Mostow, 1983), (Gordon & Subramanian, 1993)
– Learning by analogy, e.g., (Veloso, 1990)
– Learning from cases, e.g., (Aha, 1991)
– Discovery (Lenat, 1977)
– 1991: The first of a series of workshops on Multistrategy Learning (Michalski)
• 1970’s – present: Meta-learning
– Heuristics for focusing attention, e.g., (Gordon & Subramanian, 1996)
– Active selection of examples for learning, e.g., (Angluin, 1987), (Gasarch & Smith, 1988), (Gordon, 1991)
– Learning how to learn, e.g., (Schmidhuber, 1996)
30
History of Machine Learning (cont’d)

• 1980 – The first Machine Learning Workshop was held at Carnegie-Mellon University in Pittsburgh.
• 1980 – Three consecutive issues of the International Journal of Policy Analysis and Information Systems were specially devoted to machine learning.
• 1981 – Hinton, Jordan, Sejnowski, Rumelhart, McClelland at UCSD
– Back-propagation algorithm; the PDP book
• 1986 – The establishment of the Machine Learning journal.
• 1987 – The beginning of annual international conferences on machine learning (ICML); the Snowbird ML conference
• 1988 – The beginning of regular workshops on computational learning theory (COLT).
• 1990’s – Explosive growth in the field of data mining, which involves the application of machine learning techniques.
31
Bottom line from History

• 1960 – The Perceptron (Minsky & Papert)
• 1960 – Bellman’s “curse of dimensionality”
• 1980 – Bounds on statistical estimators (C. Stone)
• 1990 – Beginning of high-dimensional data (hundreds of variables)
• 2000 – High-dimensional data (thousands of variables)

32
A Glimpse into the Future

• Today’s status:
– Second-generation algorithms:
– Deep neural nets, classifiers, regression, etc.

• Future:
– Smart remote controls, phones, cars
– Smart surveillance
– Data and communication networks, software

33
Machine learning problems

What high-level machine learning problems have you seen or heard of before?

34
Data

[Figure: a dataset as a collection of examples]

35
Types of Learning

• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from a sequence of actions

36
Supervised vs. unsupervised Learning

• Supervised learning: classification is seen as supervised learning from examples.
– Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes. It is as if a “teacher” gives the classes (supervision).
– Test data are classified into these classes too.
• Unsupervised learning (clustering)
– Class labels of the data are unknown
– Given a set of data, the task is to establish the existence of classes or clusters in the data

37
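The supervised setting above can be sketched with a toy example of my own (the 1-nearest-neighbour rule and the labels are assumptions, not from the slides): a "teacher" supplies labeled training pairs, and unseen test points are assigned the class of their nearest labeled example.

```python
# Minimal supervised-classification sketch: 1-nearest-neighbour on 1-D data.
# The labels "small"/"large" play the role of the teacher's supervision.

def nearest_label(x, train):
    # train: list of (feature, label) pairs supplied by the "teacher"
    feature, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]

print(nearest_label(2.0, labeled))   # small
print(nearest_label(8.5, labeled))   # large
```

In the unsupervised setting, by contrast, the label column would be absent and the learner would have to discover the two groups on its own.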
Supervised learning process: two steps

• Learning (training): learn a model using the training data
• Testing: test the model using unseen test data to assess the model accuracy

Accuracy = (Number of correct classifications) / (Total number of test cases)
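The accuracy formula above can be written directly as a function (the spam/ham labels are my own toy example):

```python
# Accuracy = number of correct classifications / total number of test cases

def accuracy(predicted, actual):
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

print(accuracy(["spam", "ham", "spam", "ham"],
               ["spam", "ham", "ham", "ham"]))   # 0.75 (3 of 4 correct)
```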

38
Supervised Learning

• Problems:
– Classification
The domain of the target attribute is finite and categorical.
A classifier must assign a class to an unseen example.
– Regression
The target attribute is formed by infinite values.
To fit a model to learn the output target attribute as a function of input attributes.
– Time Series Analysis
Making predictions in time.
39
Unsupervised Learning (Clustering)
• Finding groups of objects in data such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]

40
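The grouping described above can be sketched with a minimal k-means loop in one dimension (an assumption on my part: the slide names no algorithm, and k-means is one standard way to shrink intra-cluster distances):

```python
# Minimal 1-D k-means sketch: alternate between assigning each point to its
# nearest centroid and moving each centroid to the mean of its cluster.

def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, centroids)]
    return centroids, clusters

data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(data, centroids=[1.0, 12.0])
print(centroids)   # [2.0, 11.0]
```

No labels are used anywhere: the two groups emerge from the data alone, which is exactly what distinguishes this from the supervised setting.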
Unsupervised Learning (cont.)

• UL is often used to discover underlying structure in data; it is very similar to an area also called cluster analysis

• Animals (brains) mainly learn by a form of unsupervised learning in their formative years.

• The main product of an unsupervised learning method is some kind of structuring of the data that has been fed in.

• This structuring may then become useful for:
– Prediction
– Mapping
– Visualisation

41
Unsupervised Learning

• Problems:
– Clustering
– Association Rules
– Pattern Mining
It is adopted as a more general term than frequent pattern mining or association mining.
– Outlier Detection
It is the process of finding data examples with behaviours that are very different from the expectation (outliers or anomalies).

42
Other Learning Paradigms

• Imbalanced Learning
– A classification problem where the data has an exceptional (highly skewed) distribution of the target attribute.
– The number of examples representing the class of interest is much lower than that of the other classes.
• Multi-instance Learning
– Imposes restrictions on models, in which each example consists of a bag of instances instead of a unique instance.

43
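One common remedy for the imbalance described above is to weight classes inversely to their frequency, so that errors on the rare class of interest cost more. This sketch and its labels are my own illustration (the formula matches the widely used "balanced" heuristic, e.g., in scikit-learn, but the slide does not prescribe it):

```python
# Weight each class by n / (k * count): rare classes get large weights.
from collections import Counter

def class_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

labels = ["ok"] * 9 + ["fraud"] * 1      # 9:1 imbalance
weights = class_weights(labels)
print(weights["fraud"])                  # 5.0: errors on the rare class cost ~9x more
```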
Other Learning Paradigms

• Multi-label Classification
– Each instance is associated not with a single class, but with a subset of them.
• Semi-supervised Learning
– It is concerned with the design of models in the presence of both labeled and unlabeled data.
– Semi-supervised classification and semi-supervised clustering.
– Relationship with Active Learning.

44
Other Learning Paradigms

• Subgroup Discovery
– It is formed as the result of the hybridization of classification and association mining.
– It aims to extract interesting rules with respect to a target attribute.
• Transfer Learning
– Aims to extract the knowledge from one or more source tasks and apply it to a target task.
– The so-called data shift problem is closely related.
45
Other Learning Paradigms

• Data Stream Learning
– When all data are not available at a specific moment, it is necessary to develop learning algorithms that treat the input as a continuous data stream.
– Each instance can be inspected only once and must then be discarded to make room for subsequent instances.

46
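The one-pass constraint above can be sketched with the simplest possible stream statistic, an incrementally updated mean (my own example; real stream learners update a model the same way, one instance at a time):

```python
# Each instance is seen once, folded into a running summary, and discarded;
# the raw stream is never stored.

def stream_mean(stream):
    count, mean = 0, 0.0
    for x in stream:                # single pass over the stream
        count += 1
        mean += (x - mean) / count  # incremental (online) mean update
    return mean

print(stream_mean(iter([2.0, 4.0, 6.0, 8.0])))   # 5.0
```

Memory use is constant regardless of how long the stream runs, which is the property data-stream algorithms are designed around.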
Issues in Machine Learning

• What algorithms can approximate functions well, and when?
– How does the number of training examples influence accuracy?
• Problem representation / feature extraction
• Intention/independent learning
• Integrating learning with systems
• What are the theoretical limits of learnability?
• Transfer learning
• Continuous learning

47
Measuring Performance

• Generalization accuracy
• Solution correctness
• Solution quality (length, efficiency)
• Speed of performance

48
Scaling issues in ML

• Number of
– Inputs
– Outputs
• Batch vs. real-time
• Training vs. testing

49
Machine Learning versus Human Learning

• Some ML behavior can challenge the performance of human experts (e.g., playing chess)
• Although ML sometimes matches human learning capabilities, it is not able to learn as well as humans or in the same way that humans do
• There is no claim that machine learning can be applied in a truly creative way
• Formal theories of ML systems exist but are often lacking (why a method succeeds or fails is not clear)
• ML success is often attributed to manipulation of symbols (rather than mere numeric information)

50
Observations
• ML has many practical applications and is probably the most used method in AI.
• ML is also an active research area.
• Role of cognitive science
– Computational model of cognition
• Role of neuroscience
– Computational model of the brain: neural networks
• Brain vs. mind; hardware vs. software
• Nearly all ML is still dependent on human “guidance”

51
Natural Questions

• How does ML affect information science?
• Natural vs. artificial learning – which is better?
• Is ML needed in all problems?
• What are the future directions of ML?

52
Questions ?

53
Thanks

54
