0% found this document useful (0 votes)

81 views77 pages

Lecture Notes 01

This lecture introduces the course on deep learning. It outlines the course objectives of learning how to write, debug, and train neural networks and understand key deep learning concepts and state-of-the-art research topics. The lecture also reviews the syllabus, schedule, grading policy, and prerequisites. It provides an introduction to deep learning, including using deep neural networks as a data-driven approach to build intelligent algorithms that can make sense of data like images and language.

Uploaded by

zhao linger

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

81 views77 pages

Lecture Notes 01

Uploaded by

zhao linger

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 77

Lecture 1: Introduction

Xuming He
SIST, ShanghaiTech
Fall, 2020

9/7/2020 Xuming He – CS 280 Deep Learning 1

Outline
 Course logistics
 Overall objective

 Grading policy

 Pre-requisite / Syllabus

 Introduction to deep learning

 Machine learning review
 Artificial neurons

9/7/2020 Xuming He – CS 280 Deep Learning 2

Course objectives
 Learning to use deep networks
 How to write from scratch, debug and train neural networks

 Toolboxes commonly used in practice

 Understanding deep models

 Key concepts and principles

 State of the art

 Some new topics from research field

 Focusing on vision-related problems

9/7/2020 Xuming He – CS 280 Deep Learning 3

Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks by Prof He)
 Linear models
 Multiple layer networks
 Gradient descent and BP
 Part II: Convolutional neural networks
 Part III: Recurrent neural networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 4

Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks by Prof He)
 CNN basics
 Understanding CNN
 CNN in Vision
 Part III: Recurrent neural networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 5

Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks)
 Part III: Recurrent neural networks (3 weeks by Prof Xu)
 LSTM, GRU
 Attention modeling
 RNN in Vision/NLP
 Transformer and Graph Neural Networks
 Part IV: Generative neural networks
 Part V: Advanced Topics

9/7/2020 Xuming He – CS 280 Deep Learning 6

Syllabus & Schedule
 Piazza:
 piazza.com/shanghaitech.edu.cn/fall2020/cs280
 The schedule for the latter half of the semester may vary a bit
 Part I: Basic neural networks (1~1.5 weeks)
 Part II: Convolutional neural networks (4 weeks)
 Part III: Recurrent neural networks (3 weeks)
 Part IV: Generative neural networks (2 weeks by Prof Xu)
 Variational Auto Encoder (VAE)
 Generative deep nets (GAN)
 Part V: Advanced Topics (2 weeks)
 Note: no lectures in the following weeks
 Nov 9 ~ Nov 16 (CVPR)

9/7/2020 Xuming He – CS 280 Deep Learning 7

Reference books and materials
 Deep learning:
 http://www.deeplearningbook.org/

 https://d2l.ai/

 Online deep learning courses:

 Stanford: CS230, CS231n
 CMU: 11-785
 MIT: 6.S191

 Additional reading materials on Piazza

 Survey papers, tutorials, etc.

9/7/2020 Xuming He – CS 280 Deep Learning 8

Instructor and TAs
 Instructor: Prof Xuming He and Prof Lan Xu
 [email protected] ; [email protected]
 SIST 1A-304D ; 1C-203D

 TAs:
 Haozhe Wang, Qiuyue Wang, Guoxing Sun, Yannan He, Quan
Meng, Yinwenqi Jiang

 Office hours: To be announced on Piazza

 We will use Piazza as the main communication platform

9/7/2020 Xuming He – CS 280 Deep Learning 9

Grading policy
 4 Problem sets: 10% x 4 = 40%
 Write-up problem sets + Programming tasks
 Final course project: 40% (+10%)
 Proposal
 Final report (Conference format)
 Presentation
 Bonus points for novel results: 10%
 10 Quizzes (in class): 2% x 10 = 20%
 Late policy
 A total of 7 free late (calendar) days to use, but no more than 4 late days can be
used on any single assignment.
 After that, 25% off per day late
 Does not apply to Final course project/Quizzes
 Collaboration policy
 Project team: 3~5 students
 Grading according to each member’s contribution

9/7/2020 Xuming He – CS 280 Deep Learning 10

Administrative Stuff
 Plagiarism
 All assignments must be done individually
 You may not look at solutions from any other source
 You may not share solutions with any other students
 Plagiarism detection software will be used on all the programming
assignments
 You may discuss together or help another student but you cannot
give the exact solution

 Plagiarism punishment
 When one student copies from another student, both students
are responsible
 Zero point on the assignment or exam in question
 Repeated violation will result in an F grade for this course as well
as further discipline at the school/university level
Pre-requisite
 Proficiency in Python
 All class assignments will be in Python (and use numpy)
 A Python tutorial available on Piazza
 Calculus, Linear Algebra, Probability and Statistics
 Undergrad course level
 Equivalent knowledge of Andrew Ng’s CS229 (Machine
Learning)
 Formulating cost functions
 Taking derivatives
 Performing optimization with gradient descent
 Will be evaluated in next quiz (Wednesday)

9/7/2020 Xuming He – CS 280 Deep Learning 12

Outline
 Course logistics
 Introduction to deep learning
 What & Why deep learning?

 Machine Learning review

 Artificial neurons

Acknowledgement: Bhiksha Raj@CMU’s course notes

9/7/2020 Xuming He – CS 280 Deep Learning 13
Introduction
 Our goal: Build intelligent algorithms to make sense of data
 Example: Recognizing objects in images

red panda (Ailurus fulgens)

 Example: Predicting what would happen next

Vondrick et al. CVPR2016

9/7/2020 Xuming He – CS 280 Deep Learning 14

Introduction
 Our goal: Build intelligent algorithms to make sense of data
 Example: Recognizing objects in images

 Example: Predicting what would happen next

9/7/2020 Xuming He – CS 280 Deep Learning 15

Introduction
 A broad range of real-world applications
 Speech recognition
 Input: sound wave → Output: transcript
 Language translation
 Input: text in language A (Eng) → Output: text in language B (Chs)
 Image classification
 Input: images → Output: image category (cat, dog, car, house, etc.)
 Autonomous driving
 Input: sensory inputs → Output: actions (straight, left, right, stop, etc.)

 Main challenges: difficult to manually design the algorithms

9/7/2020 16
A data-driven approach
 Each task as a mapping function (or a model)
Mapping function
Input data Expected output

 input data: images

 expected output: object or action names

 Building such mapping functions from data

Mapping function

red panda (Ailurus fulgens)

9/7/2020 Xuming He – CS 280 Deep Learning 17
A data-driven approach
 Building a mapping function (model)

 x: input data
 y: expected output
 : parameters to be estimated

 Learning the model from data

 Given a dataset
 Find the ‘best’ parameter , such that

 And it can be generalized to unseen input data

9/7/2020 18
What is deep learning?
 Using deep neural networks as the mapping function

 Model: Deep neural networks

 A family of parametric models
 Consisting of many ‘simple’ computational units
 Constructing a multi-layer representation of input

Image from Jeff Clune’s Deep Learning Overview

9/7/2020 19
What is deep learning?
 Using deep neural networks as the mapping function

 Learning: Parameter estimation from data

 Parameters: connection weights between units
 Formulated as an optimization problem
 Efficient algorithms for handling large-scale models & datasets

9/7/2020 20
Why deep networks?
 Inspiration from visual cortex

9/7/2020 21
Why deep networks?
 A deep architecture can represent certain functions
(exponentially) more compactly
 Learning a rich representation of input data

9/7/2020 22
Recent success with DL
Recent
 Somesuccess with
recent neuralwith
success networks
neural networks
Steel drum
The Im ag e C lassification C halleng e:
1,000 ob ject classes
1,431,167 im ag es

Recent success with neural networks

Recent success with neural network
Russakovsky et al. arX iv, 2014

• Some recent successes with neural networks

Fei-Fei Li & Justin Jo h nso n & S erena Y eung Lecture 1 - 24 4/4/2017

– A bit of hyperbole, but still..

9/7/2020 23
Summary: Why deep learning?
 One of the major thrust areas recently in various pattern
recognition, prediction and data analysis
 Efficient representation of data and computation
 Other key factors: large datasets and hardware

 The state of the art in many problems

 Often exceeding previous benchmarks by large margins
 Achieve better performances than human for certain “complex”
tasks.

 But also somewhat controversial …

 Lack of theoretical understanding
 Sometimes difficult to make it work in practice

9/7/2020 24
Is it alchemy?

9/7/2020 Xuming He – CS 280 Deep Learning 25

Questions to ask
 Understanding neural networks
 What is different from traditional ML methods?

 How it works for specific problems?

 Why get great performance?

 Future development
 Its limitation and weakness?

 After more than 10 years, what is on-going or next?

 The road to general-purpose AI?

9/7/2020 Xuming He – CS 280 Deep Learning 26

Outline
 Course logistics
 Introduction to deep learning
 Machine learning review
 Math review

 Supervised learning

 Artificial neurons

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu

Liang@Princeton’s course notes
9/7/2020 Xuming He – CS 280 Deep Learning 27
Math review – Calculus
 Gradient

9/7/2020 Xuming He – CS 280 Deep Learning 28

Math review – Calculus
 Local and global minima
 Necessary condition

 Sufficient condition
 Hessian is positive definite

9/7/2020 Xuming He – CS 280 Deep Learning 29

Math review – Probability
 Factorization

9/7/2020 Xuming He – CS 280 Deep Learning 30

Math review – Probability
 Common distributions

9/7/2020 Xuming He – CS 280 Deep Learning 31

Math review – Statistics
 Monte Carlo estimation

 Maximum likelihood

 Independent and identically distributed

9/7/2020 Xuming He – CS 280 Deep Learning 32

ML tasks
 Classification: assign a category to each item (e.g.,
document classification)
 Regression: predict a real value for each item (e.g.,
prediction of stock values, economic variables)
 Ranking: order items according to some criterion (e.g.,
relevant web pages returned by a search engine)
 Clustering: partition data into ‘homogenous’ regions
(e.g., analysis of very large data sets)
 Dimensionality reduction: find lower-dimensional
manifold preserving some properties of the data

9/7/2020 Xuming He – CS 280 Deep Learning 33

Standard learning scenarios
 Unsupervised learning: no labeled data
 Supervised learning: uses labeled data for prediction on
unseen points
 Semi-supervised learning: uses labeled and unlabeled
data for prediction on unseen points
 Reinforcement learning: uses reward to learn prediction
on action policies.
 …

9/7/2020 Xuming He – CS 280 Deep Learning 34

Supervised learning
 Task formulation