01 Introduction
Machine Learning
About the Instructor
• Assistant Professor, LGU
• PhD, Jiangsu University, China (2019)
Affiliations:
Lahore Garrison University
Collaborations: Princeton, UCL, University of Edinburgh, EPFL, ANU, KAUST
PhD Students: 2
MS Students: 22 (18 Graduated)
Course Objectives:
• To provide a thorough introduction to ML methods
• To build mathematical foundations of ML and provide an appreciation for its
applications
• To provide experience in the implementation and evaluation of ML
algorithms
• To develop research interest in the theory and application of ML
Learning Interface
Communication:
Slack: course-related questions and discussions. We will try to respond to queries as soon as possible.
Office Hours: Posted on course page; distributed throughout the week
Email Policy:
Subject:
- ‘ML-URGENT-Assignment Clarification’
- ‘ML-NOT URGENT-Extend Assignment deadline’
Please do not email us to verify whether we have received your LMS submission, or to report that a submission is late due to last-minute connectivity issues.
Grading Distribution
• Programming Assignments and Homeworks:
- 5 Programming Assignments
- 3 Homeworks
• Quizzes: 15% (Almost every week)
• Project: 10%
• Mid/Final Exam: 75%
Course Policies
• Homework Late Policy
- 10% penalty per day for up to 3 days; no submissions accepted after 3 days (72 hours)
• Plagiarism will be strictly dealt with as per university policies (take it seriously).
• Re-grading can be requested after grade reporting, within the following time limits:
- HW and Assignments: 2 days
- Final Exam: 3 days
Course Policies
Harassment Policy
Harassment of any kind is unacceptable, whether it be sexual harassment, online harassment, bullying, coercion, stalking, or verbal or physical abuse of any kind. Harassment is a very broad term: it includes both direct and indirect behaviour, it may be physical or psychological in nature, and it may be perpetrated online or offline, on campus or off campus. It may be a single offense, or it may comprise several incidents which together amount to sexual harassment. It may include overt requests for sexual favours, but can also constitute verbal or written communication of a loaded nature. Further details of what may constitute harassment may be found in the LGU Sexual Harassment Policy, which is available as part of the university code of conduct.
Course Policies
Help related to Equity and Belonging at SSE
SSE’s Council on Equity and Belonging is committed to devising ways to provide a safe, inclusive, and respectful
learning, living, and working environment for its students, faculty, and staff.
For help related to any such issue, please feel free to write to any member of the school council for help or
feedback.
For matters relating to counselling, kindly email the Student Affairs office for more information.
You are welcome to write to me or speak to me if you find that your mental health is impacting your ability to
participate in the course. However, should you choose not to do so, please contact the Counselling Unit and
speak to a counsellor or speak to the OSA team and ask them to write to me so that any necessary
accommodations can be made.
Modules
Weeks: 1,2
Components:
• Programming Assignment 1: Intro to Python, Setting up Environment
Modules
2 – Classification
Topics: Classification, KNN, Multi-class Classification, Evaluation Metrics, Curse of Dimensionality
Weeks: 3,4
Components:
• Programming Assignment 2: KNN-based (using images)
• Homework 1A
Modules
3 – Regression
Topics: Linear Regression, Gradient Descent, Multi-variate Regression, Polynomial Regression
Weeks: 4,5
Components:
• Programming Assignment 3: Regression
• Homework 1B
Modules
4 – Logistic Regression
Topics: Logistic Regression
Weeks: 6
Components:
• Programming Assignment 4: Logistic Regression
Modules
5 – Bayesian Framework
Topics: Bayes Theorem, Naive Bayes Classification
Weeks: 7,8
Components:
• Programming Assignment 5: Naïve Bayes Classifier (may be merged with Assignment 4)
• Homework 2
Modules
6 – Perceptron, SVM and Neural Networks
Topics: Perceptron Algorithm, SVM, Neural Networks
Weeks: 9,10,11,12
Components:
• Programming Assignment 6: Neural Networks
• Homework 3
Modules
7 – Clustering
Topics: Unsupervised Learning Overview, Clustering (k-means)
Weeks: 13,14
Components:
• Homework 3
Modules
8 – Further Topics
Topics: Feature Engineering, Dimensionality Reduction
Suggested Reference Books
[Diagram: Traditional programming: Data + Program → Computer → Output.
Machine learning: Data + Output → Computer → Program.]
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a
nuclear power plant
• Prediction: 7
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Autonomous Car Technology
[Image: autonomous car technology, including path planning; labels: Sebastian, Stanley]
Deep Belief Net on Face Images
[Figure: feature hierarchy learned from face images: pixels → edges → object parts (combinations of edges) → object models.]
Based on materials by Andrew Ng
Learning of Object Parts
[Figure: features learned by a network trained on 4 classes (cars, faces, motorbikes, airplanes). Second layer: shared features and object-specific features. Third layer: more specific features. Panels: input images; samples from feedforward inference (control); samples from full posterior inference.]
Slide credit: Andrew Ng
Machine Learning in
Automatic Speech Recognition
A Typical Speech Recognition System
Slide credit: Li Deng, MS Research
Types of Learning
Based on slide by Pedro Domingos
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is real-valued == regression
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020. Data from G. Witt, Journal of Statistics Education, Volume 21]
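The regression setup above can be sketched in a few lines: fit a line f(x) = w·x + b by least squares and use it to predict y for a new x. The year/extent values below are hypothetical illustrations, not the actual sea-ice measurements.

```python
# Minimal least-squares linear regression sketch (illustrative data only).
import numpy as np

years = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
extent = np.array([7.8, 6.2, 6.3, 4.9, 4.0])  # hypothetical extents, million sq km

# Fit f(x) = w*x + b by ordinary least squares.
w, b = np.polyfit(years, extent, deg=1)

def f(x):
    """Predict extent for year x using the fitted line."""
    return w * x + b
```

With a downward trend in the data, the fitted slope w comes out negative, so predictions decrease with the year.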
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
[Plot: breast cancer classification vs. tumor size; y = 1 (malignant) or 0 (benign)]
Based on example by Andrew Ng
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
[Plots: breast cancer (malignant / benign) vs. tumor size; a threshold on tumor size separates "predict benign" from "predict malignant"]
Other possible features: clump thickness, uniformity of cell size, uniformity of cell shape, age, tumor size, …
Based on example by Andrew Ng
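The threshold idea in the tumor-size plot can be sketched directly: scan candidate thresholds and keep the one with the best training accuracy. The sizes, labels, and helper name `fit_threshold` below are illustrative, not clinical values or a standard API.

```python
# Toy 1-D threshold classifier for the malignant/benign example.
def fit_threshold(sizes, labels):
    """Pick the size threshold that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(sizes):
        preds = [1 if s >= t else 0 for s in sizes]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

sizes  = [1.0, 1.5, 2.0, 3.5, 4.0, 5.0]   # tumor size (illustrative units)
labels = [0,   0,   0,   1,   1,   1]     # 0 = benign, 1 = malignant
t = fit_threshold(sizes, labels)
predict = lambda s: 1 if s >= t else 0     # predict benign below t, malignant above
```

Real systems use many features (clump thickness, cell-shape uniformity, age, ...) rather than a single threshold, but the idea of learning f(x) from labeled pairs is the same.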
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
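Clustering can be sketched with the classic k-means loop: assign each point to its nearest center, then move each center to the mean of its cluster. The 1-D points below are made up to show two well-separated groups.

```python
# Minimal k-means sketch on 1-D data (illustrative points).
import random

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)          # random initial centers
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to its cluster mean.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centers = kmeans(points, k=2)                   # two centers, near 1.0 and 10.0
```

No labels are given anywhere: the structure (two groups) is discovered from the x's alone, which is exactly the unsupervised setting above.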
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Heatmap: genes (rows) × individuals (columns)]
[Source: Daphne Koller]
Reinforcement Learning
[Diagram: agent–environment interaction over time: state s_t, action a_t, reward r_{t+1}, next state s_{t+1}, ...]
https://www.youtube.com/watch?v=4cgWya-wjgY
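The state/action/reward loop in the diagram can be sketched with tabular Q-learning on a toy 5-state chain (reward 1 at the rightmost state). The environment, constants, and episode counts are all illustrative choices, not part of any standard benchmark.

```python
# Tabular Q-learning sketch on a 5-state chain (toy environment).
import random

N_STATES = 5
ACTIONS = (0, 1)                      # 0 = left, 1 = right
alpha, gamma = 0.5, 0.9
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Toy environment: returns (next_state, reward, done)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

random.seed(0)
for _ in range(300):                  # episodes with a uniform-random behaviour policy
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)    # off-policy exploration
        s2, r, done = step(s, a)
        # Q-learning update: experience E improves performance P on task T.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

After enough episodes the greedy policy moves right in every non-terminal state, i.e. the agent has learned to reach the reward from delayed feedback alone.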
Inverse Reinforcement Learning
• Learn policy from user demonstrations
[Diagram: learning system architecture: environment/experience feeds the learner, which produces knowledge used by the performance element; evaluated on testing data]
Training vs. Test Distribution
• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent and identically distributed”
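The i.i.d. assumption above is what justifies the usual train/test split: if one sample is shuffled and cut in two, both parts follow the same distribution. A minimal sketch, with synthetic (x, y) pairs:

```python
# Sketch: draw one i.i.d. sample, then split it into train and test sets.
import random

random.seed(0)
data = [(x, 2 * x + random.gauss(0, 0.1)) for x in range(100)]  # synthetic pairs

random.shuffle(data)                 # shuffling preserves the i.i.d. assumption
split = int(0.8 * len(data))
train, test = data[:split], data[split:]   # 80/20 split, no shared examples
```

If instead the test set came from a different distribution (say, a different range of x), test performance would no longer estimate true generalization.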
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks
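An instance-based function from the list above can be made concrete with 1-nearest-neighbor: the "function" is just the stored training instances plus a distance rule. The points and labels below are illustrative.

```python
# Instance-based function: a minimal 1-nearest-neighbor classifier.
def nn_predict(train, x):
    """train: list of (feature_vector, label); returns label of nearest instance."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(train, key=lambda pair: dist(pair[0], x))
    return nearest[1]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((5.0, 5.0), "B")]
print(nn_predict(train, (0.2, 0.1)))  # -> A
print(nn_predict(train, (4.0, 6.0)))  # -> B
```

Unlike the numerical or symbolic representations, nothing is fitted here: all the "learning" is memorizing the instances, and all the work happens at prediction time.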
Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
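The first family above, gradient descent, can be sketched in a few lines: repeatedly step against the gradient of the objective. The objective f(w) = (w − 3)² and the learning rate are illustrative choices.

```python
# Gradient descent sketch: minimize f(w) = (w - 3)^2, with gradient f'(w) = 2*(w - 3).
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)     # step against the gradient
    return w

w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # converges near the minimum w = 3
```

Perceptron training and backpropagation are this same loop with grad(w) computed from misclassified examples or via the chain rule through a network.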
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
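The first three metrics in the list can be computed directly from label/prediction pairs; the toy labels below are illustrative.

```python
# Accuracy, precision, and recall from binary predictions (toy labels).
def metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return acc, precision, recall

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, p, r = metrics(y_true, y_pred)
```

Which metric to optimize depends on the task: precision matters when false positives are costly, recall when false negatives are (e.g. missed malignant tumors).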
ML in Practice
Lessons Learned about Learning
• Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
A Brief History of Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM