Lecture Notes 01

This lecture introduces the course on deep learning. It outlines the course objectives: learning how to write, debug, and train neural networks, and understanding key deep learning concepts and state-of-the-art research topics. It also reviews the syllabus, schedule, grading policy, and prerequisites, and then introduces deep learning itself: using deep neural networks as a data-driven approach to build intelligent algorithms that can make sense of data such as images and language.


Lecture 1: Introduction

Xuming He
SIST, ShanghaiTech
Fall, 2020



Outline
 Course logistics
  Overall objective
  Grading policy
  Pre-requisite / Syllabus
 Introduction to deep learning
 Machine learning review
 Artificial neurons



Course objectives
 Learning to use deep networks
  How to write neural networks from scratch, debug them, and train them
  Toolboxes commonly used in practice
 Understanding deep models
  Key concepts and principles
 State of the art
  Some new topics from the research field
  Focusing on vision-related problems


Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks by Prof He)
 Linear models
 Multiple layer networks
 Gradient descent and BP
 Part II: Convolutional neural networks
 Part III: Recurrent neural networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 4


Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks by Prof He)
 CNN basics
 Understanding CNN
 CNN in Vision
 Part III: Recurrent neural networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 5


Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks)
 Part III: Recurrent neural networks (3 weeks by Prof Xu)
 LSTM, GRU
 Attention modeling
 RNN in Vision/NLP
 Transformer and Graph Neural Networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 6


Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks)
 Part III: Recurrent neural networks (3 weeks)
 Part IV: Generative neural networks (2 weeks by Prof Xu)
 Variational Auto Encoder (VAE)
 Generative deep nets (GAN)
 Part V: Advanced Topics (2 weeks)
 Note: no lectures in the following weeks
 Nov 9 ~ Nov 16 (CVPR)

9/7/2020 Xuming He – CS 280 Deep Learning 7


Reference books and materials
 Deep learning:
  http://www.deeplearningbook.org/
  https://d2l.ai/
 Online deep learning courses:
  Stanford: CS230, CS231n
  CMU: 11-785
  MIT: 6.S191
 Additional reading materials on Piazza
  Survey papers, tutorials, etc.


Instructor and TAs
 Instructors: Prof Xuming He and Prof Lan Xu
  [email protected] ; [email protected]
  SIST 1A-304D ; 1C-203D
 TAs: Haozhe Wang, Qiuyue Wang, Guoxing Sun, Yannan He, Quan Meng, Yinwenqi Jiang
 Office hours: To be announced on Piazza
 We will use Piazza as the main communication platform



Grading policy
 4 problem sets: 10% x 4 = 40%
  Write-up problems + programming tasks
 Final course project: 40% (+10%)
  Proposal
  Final report (conference format)
  Presentation
  Bonus points for novel results: 10%
 10 quizzes (in class): 2% x 10 = 20%
 Late policy
  A total of 7 free late (calendar) days to use, but no more than 4 late days on any single assignment
  After that, 25% off per day late
  Does not apply to the final course project or quizzes
 Collaboration policy
  Project team: 3~5 students
  Grading according to each member's contribution



Administrative Stuff
 Plagiarism
  All assignments must be done individually
  You may not look at solutions from any other source
  You may not share solutions with any other students
  Plagiarism detection software will be used on all the programming assignments
  You may discuss with or help another student, but you cannot give them the exact solution
 Plagiarism punishment
  When one student copies from another, both students are responsible
  Zero points on the assignment or exam in question
  Repeated violations will result in an F grade for this course as well as further discipline at the school/university level

Pre-requisite
 Proficiency in Python
  All class assignments will be in Python (and use numpy)
  A Python tutorial is available on Piazza
 Calculus, Linear Algebra, Probability and Statistics
  Undergraduate course level
 Equivalent knowledge of Andrew Ng's CS229 (Machine Learning)
  Formulating cost functions
  Taking derivatives
  Performing optimization with gradient descent
 Will be evaluated in the next quiz (Wednesday)



Outline
 Course logistics
 Introduction to deep learning
  What & Why deep learning?
 Machine learning review
 Artificial neurons

Acknowledgement: Bhiksha Raj@CMU's course notes

Introduction
 Our goal: Build intelligent algorithms to make sense of data
 Example: Recognizing objects in images
  [image: red panda (Ailurus fulgens)]
 Example: Predicting what would happen next
  [video frames: Vondrick et al., CVPR 2016]



Introduction
 A broad range of real-world applications
 Speech recognition
  Input: sound wave → Output: transcript
 Language translation
  Input: text in language A (Eng) → Output: text in language B (Chs)
 Image classification
  Input: images → Output: image category (cat, dog, car, house, etc.)
 Autonomous driving
  Input: sensory inputs → Output: actions (straight, left, right, stop, etc.)
 Main challenge: such algorithms are difficult to design by hand
A data-driven approach
 Each task as a mapping function (or a model):
  input data → mapping function → expected output
  Input data: e.g., images
  Expected output: object or action names
 Building such mapping functions from data
  [image: photo of a red panda → mapping function → "red panda (Ailurus fulgens)"]
A data-driven approach
 Building a mapping function (model): y = f(x; θ)
  x: input data
  y: expected output
  θ: parameters to be estimated
 Learning the model from data
  Given a dataset D = {(x_i, y_i), i = 1, …, N}
  Find the 'best' parameter θ, such that f(x_i; θ) ≈ y_i
  And the model should generalize to unseen input data
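One standard way to make 'best' precise (the slide's formula did not survive extraction; this is the usual empirical-risk form):

    \theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \ell\big( f(x_i; \theta),\, y_i \big)

where \ell is a loss function measuring the disagreement between the prediction f(x_i; \theta) and the expected output y_i.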

What is deep learning?
 Using deep neural networks as the mapping function
 Model: Deep neural networks
  A family of parametric models
  Consisting of many 'simple' computational units
  Constructing a multi-layer representation of the input

[Image from Jeff Clune's Deep Learning Overview]
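To make "many simple units, multiple layers" concrete, here is a minimal numpy sketch of a two-layer network's forward pass (illustrative only; the layer sizes and the ReLU/sigmoid choices are assumptions, not from the slides):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters theta = (W1, b1, W2, b2): connection weights between units
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)   # layer 1: 4 inputs -> 16 hidden units
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)    # layer 2: 16 hidden units -> 1 output

def f(x):
    h = relu(W1 @ x + b1)        # first layer: a learned representation of the input
    return sigmoid(W2 @ h + b2)  # second layer: maps the representation to an output

print(f(np.ones(4)))  # e.g., a probability-like score in (0, 1)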

What is deep learning?
 Using deep neural networks as the mapping function
 Learning: Parameter estimation from data
  Parameters: connection weights between units
  Formulated as an optimization problem
  Efficient algorithms for handling large-scale models & datasets
Why deep networks?
 Inspiration from visual cortex

Why deep networks?
 A deep architecture can represent certain functions (exponentially) more compactly
 Learning a rich representation of the input data
Recent success with DL
 Some recent successes with neural networks
 Example: the ImageNet image classification challenge
  1,000 object classes, 1,431,167 images
  [image: "Steel drum" classification example; Russakovsky et al., arXiv 2014]
  [Slide credit: Fei-Fei Li, Justin Johnson & Serena Yeung, CS231n Lecture 1, 4/4/2017]
 A bit of hyperbole, but still..
Summary: Why deep learning?
 One of the major thrust areas recently in pattern recognition, prediction and data analysis
 Efficient representation of data and computation
  Other key factors: large datasets and hardware
 The state of the art in many problems
  Often exceeding previous benchmarks by large margins
  Achieves better performance than humans on certain "complex" tasks
 But also somewhat controversial …
  Lack of theoretical understanding
  Sometimes difficult to make it work in practice
Is it alchemy?



Questions to ask
 Understanding neural networks
  What is different from traditional ML methods?
  How does it work for specific problems?
  Why does it achieve such strong performance?
 Future development
  What are its limitations and weaknesses?
  After more than 10 years of progress, what is ongoing or next?
  The road to general-purpose AI?


Outline
 Course logistics
 Introduction to deep learning
 Machine learning review
  Math review
  Supervised learning
 Artificial neurons

Acknowledgement: Hugo Larochelle's, Mehryar Mohri@NYU's & Yingyu Liang@Princeton's course notes
Math review – Calculus
 Gradient
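The equations on this slide were images and did not survive extraction; the standard definition they cover: for f : R^n → R,

    \nabla f(x) = \Big( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \Big)^{\top}

the gradient collects all partial derivatives of f and points in the direction of steepest ascent at x.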



Math review – Calculus
 Local and global minima
 Necessary condition

 Sufficient condition
 Hessian is positive definite
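In symbols (a reconstruction of the standard conditions the slide lists): for a candidate point x^*,

    \text{necessary: } \nabla f(x^*) = 0, \qquad \text{sufficient: } \nabla f(x^*) = 0 \ \text{and} \ \nabla^2 f(x^*) \succ 0

i.e., if the gradient vanishes and the Hessian \nabla^2 f(x^*) is positive definite, then x^* is a strict local minimum.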



Math review – Probability
 Factorization
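The factorization identity in question (the standard chain rule of probability; the slide's equation was an image):

    p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})

which holds for any joint distribution and any ordering of the variables.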



Math review – Probability
 Common distributions
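The slide's figures were lost; two distributions such a review standardly includes (a hedged reconstruction, not necessarily the slide's exact selection):

    \text{Bernoulli: } p(x) = \mu^{x} (1 - \mu)^{1 - x}, \ x \in \{0, 1\}; \qquad \text{Gaussian: } p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big)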



Math review – Statistics
 Monte Carlo estimation

 Maximum likelihood

 Independent and identically distributed
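In symbols (standard forms; the slide's equations were images): given i.i.d. samples x_1, \ldots, x_N \sim p(x \mid \theta),

    \mathbb{E}_{p}[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) \quad \text{(Monte Carlo)}, \qquad \hat{\theta}_{\text{ML}} = \arg\max_{\theta} \sum_{i=1}^{N} \log p(x_i \mid \theta) \quad \text{(maximum likelihood)}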



ML tasks
 Classification: assign a category to each item (e.g., document classification)
 Regression: predict a real value for each item (e.g., prediction of stock values, economic variables)
 Ranking: order items according to some criterion (e.g., relevant web pages returned by a search engine)
 Clustering: partition data into 'homogeneous' regions (e.g., analysis of very large data sets)
 Dimensionality reduction: find a lower-dimensional manifold preserving some properties of the data


Standard learning scenarios
 Unsupervised learning: no labeled data
 Supervised learning: uses labeled data for prediction on unseen points
 Semi-supervised learning: uses labeled and unlabeled data for prediction on unseen points
 Reinforcement learning: uses rewards to learn action policies
 …



Supervised learning
 Task formulation
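A standard formulation of the task (reconstructed; the slide's notation was an image): training pairs are drawn i.i.d. from an unknown distribution P over inputs and outputs,

    D = \{ (x_i, y_i) \}_{i=1}^{N} \ \overset{\text{i.i.d.}}{\sim} \ P, \qquad \text{find } f : \mathcal{X} \to \mathcal{Y} \ \text{that predicts well on new } (x, y) \sim P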


Learning problem
 Problem setup
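The setup this slide makes precise is typically the following (hedged reconstruction): the expected risk and its empirical estimate on the training set,

    R(f) = \mathbb{E}_{(x, y) \sim P}\big[ \ell(f(x), y) \big], \qquad \hat{R}(f) = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x_i), y_i)

learning minimizes \hat{R}(f), while the real goal is a small R(f); the gap between the two is the generalization error.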

Learning as iterative optimization
 Gradient descent
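The update rule (standard form; the slide's equation was an image): starting from an initial \theta^0, repeat

    \theta^{t+1} = \theta^{t} - \eta \, \nabla_{\theta} \hat{R}(\theta^{t})

where \eta > 0 is the learning rate and \hat{R} is the training loss; each step moves downhill along the negative gradient.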



Learning as iterative optimization
 Stochastic gradient descent (SGD)
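A minimal numpy sketch of minibatch SGD on a model with parameter vector theta (illustrative: the least-squares loss, batch size, and learning rate below are my assumptions, not from the slides):

import numpy as np

def sgd(theta, grad_fn, X, Y, lr=0.1, batch_size=32, epochs=10, seed=0):
    """Minibatch SGD: each step uses a cheap gradient estimate from a random minibatch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            theta = theta - lr * grad_fn(theta, X[idx], Y[idx])  # noisy gradient step
    return theta

# Example: least-squares regression; gradient of mean((x.theta - y)^2) wrt theta
grad_fn = lambda th, Xb, Yb: 2 * Xb.T @ (Xb @ th - Yb) / len(Xb)
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 3)); true = np.array([1.0, -2.0, 0.5]); Y = X @ true
print(sgd(np.zeros(3), grad_fn, X, Y))  # approaches [1, -2, 0.5]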



Supervised learning pipeline
 Three steps
 Datasets & hyper-parameters
  Hyper-parameter: a parameter of a model that is not trained (specified before training)


Generalization
 Model selection for better generalization
  Capacity: the flexibility of a model
  Underfitting: the state of a model that could improve generalization with more training or capacity
  Overfitting: the state of a model that could improve generalization with less training or capacity
  Model selection: the process of choosing the best hyper-parameters on a validation set



Generalization
 Training/Validation curves



Questions
 Generalization
  Interaction between training set size / capacity / training time and training error / generalization error
 If capacity increases:
  Training error will ?
  Generalization error will ?
 If training time increases:
  Training error will ?
  Generalization error will ?
 If training set size increases:
  Generalization error will ?
  Gap between the training and generalization error will ?



Outline
 Course logistics
 Introduction to deep learning
 Machine learning review
 Artificial neurons
  Math model
  Perceptron algorithm

Acknowledgement: Hugo Larochelle's, Mehryar Mohri@NYU's & Yingyu Liang@Princeton's course notes
Artificial Neuron
 Biological inspiration

https://www.youtube.com/watch?v=m0rHZ_RDdyQ

Mathematical model of a neuron
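The slide's diagram was an image; the standard neuron model it depicts: inputs x_1, \ldots, x_d are combined with connection weights w_1, \ldots, w_d and a bias b, and passed through an activation function \varphi,

    a = \varphi\Big( \sum_{i=1}^{d} w_i x_i + b \Big) = \varphi( w^{\top} x + b )

the weighted sum w^{\top} x + b is called the pre-activation; \varphi supplies the nonlinearity.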

Activation functions
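The plots on this slide were lost; below is a numpy sketch of three activation functions such slides standardly cover (the selection is an assumption):

import numpy as np

def sigmoid(z):               # squashes to (0, 1); the historical default
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                  # squashes to (-1, 1), zero-centered
    return np.tanh(z)

def relu(z):                  # max(0, z); the common modern default
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))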



Capacity of single neuron
 Sigmoid activation function
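A single neuron with a sigmoid activation computes exactly a logistic-regression model (a standard fact, stated here because the slide's plot was lost):

    p(y = 1 \mid x) = \sigma( w^{\top} x + b ), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

its decision boundary p = 1/2 is the hyperplane w^{\top} x + b = 0, so the capacity of a single neuron is that of a linear classifier.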



What does a single neuron do?
 A neuron (perceptron) fires if its input is within a specific angle of its weight vector
  i.e., if the input pattern matches the weight pattern closely enough
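This follows from the geometry of the dot product (a standard identity, added in place of the missing figure):

    w^{\top} x = \lVert w \rVert \, \lVert x \rVert \cos\theta

for fixed norms, the pre-activation is largest when the angle \theta between input and weight vector is small, so the neuron fires (w^{\top} x + b > 0) exactly when x points in a direction close enough to w.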



Single neuron as a linear classifier
 Binary classification
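In symbols (standard form; the slide's equations were images): the neuron predicts

    \hat{y} = \operatorname{sign}( w^{\top} x + b ) \in \{ -1, +1 \}

so the two classes are separated by the hyperplane w^{\top} x + b = 0.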



How do we determine the weights?
 Learning problem



Linear classification
 Learning problem: simple approach

 Drawback: Sensitive to "outliers"
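The slide's formulas were lost; a plausible reading (an assumption on my part) is the classic least-squares approach to classification, fitting targets y_i \in \{-1, +1\} by

    \min_{w, b} \sum_{i=1}^{N} \big( w^{\top} x_i + b - y_i \big)^2

which penalizes points far on the correct side of the boundary as heavily as misclassified ones; a few outliers can therefore drag the boundary away from a perfectly good separator.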



1D Example
 Compare two predictors



Perceptron algorithm
 Learn a single neuron for binary classification

https://towardsdatascience.com/perceptron-explanation-implementation-and-a-visual-example-3c8e76b4e2d1



Perceptron algorithm
 Learn a single neuron for binary classification

 Task formulation
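A standard formulation (reconstructed; the slide's notation was an image): given training data (x_i, y_i) with labels y_i \in \{-1, +1\}, and absorbing the bias into w via a constant input feature, find w such that

    y_i \, ( w^{\top} x_i ) > 0 \quad \text{for all } i

i.e., every example lies strictly on the correct side of the hyperplane; assume for now that the data are linearly separable.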



Perceptron algorithm
 Algorithm outline
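The outline itself was an image; this is the standard perceptron algorithm it describes, as a runnable numpy sketch (function and variable names are my own):

import numpy as np

def perceptron(X, y, max_epochs=100):
    """Standard perceptron: X is (n, d), labels y in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # current example misclassified
                w += y[i] * X[i]             # move w toward (or away from) x_i
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # converged: every point is correct
            break
    return w, b

# Tiny separable example: the class is the sign of the first coordinate
X = np.array([[2.0, 1.0], [1.0, -1.0], [-2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # [ 1.  1. -1. -1.]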



Perceptron algorithm
 Intuition: correct the current mistake
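In symbols (the standard update; the slide's derivation was an image): on a mistake, i.e. when y_i (w^{\top} x_i) \le 0, set w \leftarrow w + y_i x_i. This corrects in the right direction because

    y_i \, ( w + y_i x_i )^{\top} x_i = y_i \, w^{\top} x_i + \lVert x_i \rVert^2 > y_i \, w^{\top} x_i

so the updated weights score the offending example strictly better, though other examples may temporarily get worse.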



Perceptron algorithm
 The Perceptron theorem
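The standard statement (Novikoff's perceptron convergence theorem; reconstructed here since the slide's text was an image): if \lVert x_i \rVert \le R for all i and some unit vector w^* separates the data with margin \gamma, i.e. y_i (w^{*\top} x_i) \ge \gamma > 0, then the perceptron makes at most

    M \le \frac{R^2}{\gamma^2}

mistakes, regardless of the order in which the examples are presented.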



Hyperplane Distance
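(This slide's content was an image; the standard fact it covers:) the distance from a point x to the hyperplane \{ z : w^{\top} z + b = 0 \} is

    d(x) = \frac{ \lvert w^{\top} x + b \rvert }{ \lVert w \rVert }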
Perceptron algorithm
 The Perceptron theorem: proof
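A sketch of the standard argument (hedged: the slides' equations were lost). Let w_t be the weight vector after t mistakes, with w_0 = 0. A mistake on (x_i, y_i) gives

    w_{t+1}^{\top} w^* = w_t^{\top} w^* + y_i \, x_i^{\top} w^* \ \ge\ w_t^{\top} w^* + \gamma

    \lVert w_{t+1} \rVert^2 = \lVert w_t \rVert^2 + 2 y_i \, w_t^{\top} x_i + \lVert x_i \rVert^2 \ \le\ \lVert w_t \rVert^2 + R^2

(the cross term is \le 0 precisely because the example was misclassified). After M mistakes, M \gamma \le w_M^{\top} w^* \le \lVert w_M \rVert \le \sqrt{M} R, hence M \le R^2 / \gamma^2.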



Perceptron algorithm
 What loss function is minimized?
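The standard answer (the slide's equation was an image): the perceptron update is stochastic gradient descent on the perceptron loss

    L(w) = \sum_{i=1}^{N} \max\big( 0, \; -y_i \, w^{\top} x_i \big)

which is zero on correctly classified points and grows linearly in how far a misclassified point sits on the wrong side; its (sub)gradient on a mistake is -y_i x_i, recovering the update w \leftarrow w + y_i x_i.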



Summary
 Introduction to deep learning
 Course logistics
 Review of basic math & ML
 Artificial neurons

 Next time
 Basic neural networks
 First Quiz on prerequisite

