01 Introduction

The document provides an introduction to machine learning, including definitions, examples of applications, and the components of defining a learning task. It also briefly outlines some state-of-the-art machine learning applications, such as autonomous vehicles.


Introduction to Machine Learning
About the Instructor
• Assistant Professor, LGU
• PhD, Jiangsu University, China (2019)

Affiliations:
Lahore Garrison University
Collaborations: Princeton, UCL, University of Edinburgh, EPFL, ANU, KAUST

PhD Students: 2
MS Students: 22 (18 Graduated)

Publications: More than 30 (journal articles and conference proceedings)


What is this course about?
Introductory course in Machine Learning (ML) – Fundamental topics in
- Supervised learning
- Unsupervised learning

Course Objectives:
• To provide a thorough introduction to ML methods
• To build mathematical foundations of ML and provide an appreciation for its
applications
• To provide experience in the implementation and evaluation of ML
algorithms
• To develop research interest in the theory and application of ML
Learning Interface
Communication:
Slack: Course-related questions or discussions. We will try to respond to queries ASAP.
Office Hours: Posted on course page; distributed throughout the week

Email Policy:
Subject:
- ‘ML-URGENT-Assignment Clarification’
- ‘ML-NOT URGENT-Extend Assignment deadline’

Please do not email to verify whether we have received your LMS submission, or to report that a submission is late due to last-minute connectivity issues.
Grading Distribution
• Programming Assignments and Homeworks:
- 5 Programming Assignments
- 3 Homeworks
• Quizzes: 15% (Almost every week)
• Project: 10%
• Mid/Final Exam: 75%
Course Policies
• Homework Late Policy
- 10% deduction per day, for up to 3 days; no submissions accepted after 3 days (72 hours)

• Missed Quiz Policy
  - No make-up quizzes

• Plagiarism will be strictly dealt with as per university policies (take it seriously).

• Zero Tolerance for Plagiarism and Cheating

• Re-grading can be requested after grade reporting, within the following time limits:
- HW and Assignments: 2 days
- Final Exam: 3 days
Course Policies
Harassment Policy

Harassment of any kind is unacceptable, whether it be sexual harassment, online harassment, bullying, coercion, stalking, or verbal or physical abuse of any kind. Harassment is a very broad term; it includes both direct and indirect behaviour, it may be physical or psychological in nature, and it may be perpetrated online or offline, on campus or off campus. It may be a single offense, or it may comprise several incidents which together amount to sexual harassment. It may include overt requests for sexual favours but can also constitute verbal or written communication of a loaded nature. Further details of what may constitute harassment may be found in the LGU Sexual Harassment Policy, which is available as part of the university code of conduct.

Course Policies
Help Related to Equity and Belonging at SSE

SSE’s Council on Equity and Belonging is committed to devising ways to provide a safe, inclusive, and respectful
learning, living, and working environment for its students, faculty, and staff.
For help related to any such issue, please feel free to write to any member of the school council for help or
feedback.

Mental Health Support at LUMS

For matters relating to counselling, kindly email Student Affairs for more information.

You are welcome to write to me or speak to me if you find that your mental health is impacting your ability to
participate in the course. However, should you choose not to do so, please contact the Counselling Unit and
speak to a counsellor or speak to the OSA team and ask them to write to me so that any necessary
accommodations can be made.
Modules

1 – ML Overview
• Course overview, notation
• Supervised learning setup

Weeks: 1, 2

Components:
• Programming Assignment 1: Intro to Python, Setting up Environment
Modules
2 – Classification
• Classification
• KNN
• Evaluation metrics, curse of dimensionality
• Multi-class classification

Weeks: 3, 4

Components:
• Programming Assignment 2: KNN based (Using Images)
• Homework 1A
Modules
3 – Regression
• Linear regression
• Gradient descent
• Multivariate regression
• Polynomial regression
• Bias-variance trade-off, regularization

Weeks: 4, 5

Components:
• Programming Assignment 3: Regression
• Homework 1B
Modules
4 – Logistic Regression
• Logistic regression

Week: 6

Components:
• Programming Assignment 4: Logistic Regression
Modules
5 – Bayesian Framework
• Bayes' theorem
• Naïve Bayes classification

Weeks: 7, 8

Components:
• Programming Assignment 5: Naïve Bayes Classifier (may be merged with Assignment 4)
• Homework 2
Modules
6 – Perceptron, SVM, and Neural Networks
• Perceptron algorithm
• SVM
• Neural networks

Weeks: 9, 10, 11, 12

Components:
• Programming Assignment 6: Neural Networks
• Homework 3
Modules
7 – Clustering
• Unsupervised learning overview
• Clustering (k-means)

Weeks: 13, 14

Components:
• Homework 3
Modules
8 – Further Topics
• Feature engineering, dimensionality reduction
• Kernel methods and Gaussian processes
Suggested Reference Books

• (CB) Pattern Recognition and Machine Learning, Christopher M. Bishop
• (KM) Machine Learning: A Probabilistic Perspective, Kevin Murphy
• (TM) Machine Learning, Tom Mitchell
• (HTF) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, Tibshirani, Friedman
• (DM) Information Theory, Inference, and Learning Algorithms, David MacKay
• Lecture notes/slides will be shared.
What is Machine Learning?

"Learning is any process by which a system improves performance from experience."
- Herbert Simon

Definition by Tom Mitchell (1998):

Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.

A well-defined learning task is given by <P, T, E>.
Traditional Programming

Data + Program → Computer → Output

Machine Learning

Data + Output → Computer → Program
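To make the contrast concrete, here is a minimal Python sketch (a hypothetical spam-score rule, invented for illustration). In traditional programming we hand-code the rule; in machine learning, data plus desired outputs yield the rule:

    # Traditional programming: we write the program (the rule) ourselves.
    def spam_rule(num_links):
        return num_links > 5  # hand-coded threshold

    # Machine learning: data + desired outputs produce the "program"
    # (here, a learned threshold chosen to fit the labeled examples).
    def learn_threshold(xs, ys):
        candidates = sorted(set(xs))
        return max(candidates,
                   key=lambda t: sum((x > t) == y for x, y in zip(xs, ys)))

    xs = [1, 2, 8, 9, 3, 10]                       # links per email (toy data)
    ys = [False, False, True, True, False, True]   # labeled spam?
    t = learn_threshold(xs, ys)
    learned_rule = lambda x: x > t                 # the learned "program"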
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)

Learning isn't always useful:
• There is no need to "learn" to calculate payroll
Some more examples of tasks that are best solved by using a learning algorithm

• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a
nuclear power plant
• Prediction
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)

Defining the Learning Task

Improve on task T, with respect to performance metric P, based on experience E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
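As a small illustrative sketch (the record type and field values are hypothetical, not from any library), a well-defined learning task can be written down as the triple above; the experience E for the spam task is an assumed example, since the slide leaves it unspecified:

    from collections import namedtuple

    # Hypothetical record for a well-defined learning task <T, P, E>
    LearningTask = namedtuple("LearningTask", ["T", "P", "E"])

    spam_task = LearningTask(
        T="Categorize email messages as spam or legitimate",
        P="Percentage of email messages correctly classified",
        E="A corpus of emails hand-labeled spam/legitimate (assumed)",
    )
    print(spam_task.P)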

State-of-the-Art Applications of Machine Learning
Autonomous Cars

• Nevada made it legal for autonomous cars to drive on roads in June 2011
• As of 2013, four states (Nevada, Florida, California, and Michigan) have legalized autonomous cars

[Image: Penn's Autonomous Car]
Autonomous Car Sensors

Autonomous Car Technology

[Images: path planning; laser terrain mapping; learning from human drivers; adaptive vision; Sebastian Thrun with Stanley]

Images and movies taken from Sebastian Thrun's multimedia website.


Deep Learning in the Headlines

Deep Belief Net on Face Images

[Figure: feature hierarchy learned from pixels: edges → object parts (combinations of edges) → object models]

Based on materials by Andrew Ng
Learning of Object Parts

Slide credit: Andrew Ng


Training on Multiple Objects

• Trained on 4 classes (cars, faces, motorbikes, airplanes)
• Second layer: shared features and object-specific features
• Third layer: more specific features

Slide credit: Andrew Ng


Scene Labeling via Deep Learning

[Farabet et al., ICML 2012; PAMI 2013]


Inference from Deep Learned Models

Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure rows: input images; samples from feedforward inference (control); samples from full posterior inference]

Slide credit: Andrew Ng
Machine Learning in Automatic Speech Recognition

A Typical Speech Recognition System

ML is used to predict phone states from the sound spectrogram.

Deep learning has state-of-the-art results:

# Hidden Layers   | 1    | 2    | 4    | 8    | 10   | 12
Word Error Rate % | 16.0 | 12.8 | 11.4 | 10.9 | 11.0 | 11.1

Baseline GMM performance = 15.4%

[Zeiler et al., "On rectified linear units for speech recognition", ICASSP 2013]
Impact of Deep Learning in Speech Technology

Slide credit: Li Deng, MS Research
Types of Learning

• Supervised (inductive) learning
  – Given: training data + desired outputs (labels)
• Unsupervised learning
  – Given: training data (without desired outputs)
• Semi-supervised learning
  – Given: training data + a few desired outputs
• Reinforcement learning
  – Rewards from a sequence of actions

Based on slide by Pedro Domingos
Supervised Learning: Regression

• Given (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)
• Learn a function f(x) to predict y given x
  – y is real-valued == regression

[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970–2020]

Data from G. Witt, Journal of Statistics Education, Volume 21
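A minimal NumPy sketch of this setup, fitting f(x) = wx + b by least squares; the numbers below are illustrative stand-ins, not the actual sea-ice data:

    import numpy as np

    # Toy (year, extent) pairs standing in for the sea-ice data
    x = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
    y = np.array([7.8, 6.2, 6.3, 4.9, 4.0])   # 10^6 sq km (illustrative)

    # Least-squares fit of y ≈ w*x + b using the design matrix [x, 1]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(f"predicted extent in 2025: {w * 2025 + b:.2f}")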
Supervised Learning: Classification

• Given (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)
• Learn a function f(x) to predict y given x
  – y is categorical == classification

[Plot: Breast Cancer (Malignant / Benign): label (0 = benign, 1 = malignant) vs. tumor size]

Based on example by Andrew Ng
[The same plot repeated with a learned decision boundary on tumor size: predict benign below the threshold, predict malignant above it]

Based on example by Andrew Ng
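A minimal sketch of learning such a decision boundary: logistic regression on the single feature (tumor size), trained by gradient descent; the numbers are invented for illustration:

    import numpy as np

    x = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 4.0])   # tumor size (toy data)
    y = np.array([0, 0, 0, 1, 1, 1])               # 0 = benign, 1 = malignant

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(w * x + b)))   # predicted P(malignant)
        w -= lr * np.mean((p - y) * x)       # log-loss gradient w.r.t. w
        b -= lr * np.mean(p - y)             # log-loss gradient w.r.t. b
    print("predict malignant when tumor size >", -b / w)   # P = 0.5 boundary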
Supervised Learning

• x can be multi-dimensional
  – Each dimension corresponds to an attribute, e.g., age, tumor size, clump thickness, uniformity of cell size, uniformity of cell shape

Based on example by Andrew Ng
Unsupervised Learning

• Given x₁, x₂, ..., xₙ (without labels)
• Output hidden structure behind the x's
  – E.g., clustering
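A minimal k-means sketch (Lloyd's algorithm) on toy 2-D data; a library version would be sklearn.cluster.KMeans:

    import numpy as np

    # Toy data: two Gaussian blobs, no labels
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    k = 2
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(10):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    print(centers)   # approximately the two blob means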
Unsupervised Learning

Genomics application: group individuals by genetic similarity

[Image: heat map of genes × individuals]

Source: Daphne Koller
Unsupervised Learning

[Images: organizing computing clusters; social network analysis; market segmentation; astronomical data analysis]

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
Unsupervised Learning

• Independent component analysis: separate a combined signal into its original sources

Image credit: statsoft.com. Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html

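A minimal sketch of the idea using scikit-learn's FastICA on two synthetic mixtures (synthetic signals, not the audio demo linked above):

    import numpy as np
    from sklearn.decomposition import FastICA

    # Two independent sources, linearly mixed into two observed signals
    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # sine + square wave
    A = np.array([[1.0, 0.5], [0.5, 2.0]])             # mixing matrix
    X = S @ A.T                                        # observed mixtures

    # Recover the original sources (up to scale and permutation)
    recovered = FastICA(n_components=2, random_state=0).fit_transform(X)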

Reinforcement Learning

• Given a sequence of states and actions with (delayed) rewards, output a policy
  – A policy is a mapping from states → actions that tells you what to do in a given state
• Examples:
  – Credit assignment problem
  – Game playing
  – Robot in a maze
  – Balancing a pole on your hand
The Agent-Environment Interface

Agent and environment interact at discrete time steps t = 0, 1, 2, ...
• Agent observes state at step t: sₜ ∈ S
• produces action at step t: aₜ ∈ A(sₜ)
• gets resulting reward: rₜ₊₁ ∈ ℝ
• and resulting next state: sₜ₊₁

... sₜ → aₜ → rₜ₊₁, sₜ₊₁ → aₜ₊₁ → rₜ₊₂, sₜ₊₂ → aₜ₊₂ → rₜ₊₃, sₜ₊₃ → aₜ₊₃ → ...
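A minimal tabular Q-learning sketch of this loop, on a toy 5-state corridor invented for illustration (reward 1 for reaching the right end); the learned greedy policy is exactly the states → actions mapping described above:

    import random
    from collections import defaultdict

    N_STATES, GOAL = 5, 4
    def step(s, a):                    # toy environment; a: 0 = left, 1 = right
        s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
        return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

    Q = defaultdict(float)             # Q[(state, action)] value estimates
    alpha, gamma, eps = 0.5, 0.9, 0.1
    for _ in range(200):               # episodes
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = random.choice([0, 1]) if random.random() < eps \
                else max([0, 1], key=lambda b: Q[(s, b)])
            s2, r, done = step(s, a)   # observe r_{t+1} and s_{t+1}
            # move Q(s, a) toward r + gamma * max_b Q(s', b)
            target = r + gamma * max(Q[(s2, b)] for b in [0, 1])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    # the learned policy: best action per state
    print({s: max([0, 1], key=lambda b: Q[(s, b)]) for s in range(N_STATES)})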
Reinforcement Learning

https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning

• Learn a policy from user demonstrations

[Video: Stanford Autonomous Helicopter]

Framing a Learning Problem

Designing a Learning System

• Choose the training experience
• Choose exactly what is to be learned
  – i.e., the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target function from the experience

[Diagram: Environment/Experience → Training data → Learner → Knowledge → Performance Element, with Testing data feeding the Performance Element]
Training vs. Test Distribution

• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
  – We call this "i.i.d.", which stands for "independent and identically distributed"
• If examples are not independent, collective classification is required
• If the test distribution is different, transfer learning is required
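In practice, the i.i.d. assumption is what justifies evaluating on a held-out split of the same data pool; a minimal sketch with toy arrays (sklearn.model_selection.train_test_split does the same job):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # toy features
    y = rng.integers(0, 2, size=100)       # toy labels

    # Shuffle once (independent draws carry no order information),
    # then hold out 20% of the same pool as the test set.
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    X_train, y_train = X[idx[:cut]], y[idx[:cut]]
    X_test, y_test = X[idx[cut:]], y[idx[cut:]]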
ML in a Nutshell

• Tens of thousands of machine learning algorithms
  – Hundreds of new ones every year
• Every ML algorithm has three components:
  – Representation
  – Optimization
  – Evaluation
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks
Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.

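As a concrete sketch, the first three metrics computed directly from toy predictions:

    # Toy ground truth and predictions for a binary classifier
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    tp = sum(p == t == 1 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))

    accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    precision = tp / (tp + fp)   # of predicted positives, how many are real?
    recall = tp / (tp + fn)      # of real positives, how many were found?
    print(accuracy, precision, recall)   # 0.75 0.75 0.75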
ML in Practice

• Understand domain, prior knowledge, and goals
• Loop:
  – Data integration, selection, cleaning, pre-processing, etc.
  – Learn models
  – Interpret results
• Consolidate and deploy discovered knowledge
Lessons Learned about Learning

• Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
• Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.
• Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.
A Brief History of Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM

Slide credit: Ray Mooney


History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
What We’ll Cover in this Course

• Supervised learning
  – Decision tree induction
  – Linear regression
  – Logistic regression
  – Support vector machines & kernel methods
  – Model ensembles
  – Bayesian learning
  – Neural networks & deep learning
  – Learning theory
• Unsupervised learning
  – Clustering
  – Dimensionality reduction
• Reinforcement learning
  – Temporal difference learning
  – Q learning
• Evaluation
• Applications

Our focus will be on applying machine learning to real applications
