
cs231n_2019_lecture01

The document is an introduction to the CS231n course on Convolutional Neural Networks for Visual Recognition, taught by Justin Johnson, Serena Yeung, and Fei-Fei Li. It covers the course agenda, a brief history of computer vision, and the evolution of techniques in visual recognition, emphasizing the significance of CNNs in the field. The lecture also highlights related courses at Stanford and various visual recognition problems such as object detection and image captioning.


CS231n: Convolutional Neural Networks for Visual Recognition

Justin Johnson, Serena Yeung, Fei-Fei Li

Lecture 1: Introduction

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 1 - 1 4/2/2019


Welcome to CS231n

Top row, left to right:
Image by Roger H Goun is licensed under CC BY 2.0
Image is CC0 1.0 public domain
Image is CC0 1.0 public domain
Image is CC0 1.0 public domain
Middle row, left to right:
Image by BGPHP Conference is licensed under CC BY 2.0; changes made
Image is CC0 1.0 public domain
Image by NASA is licensed under CC BY 2.0
Image is CC0 1.0 public domain
Bottom row, left to right:
Image is CC0 1.0 public domain
Image by Derek Keats is licensed under CC BY 2.0; changes made
Image is public domain
Image is licensed under CC-BY 2.0; changes made



[Figure: Computer Vision at the center of related fields: biology, psychology, neuroscience, cognitive sciences, physics (optics), computer science (graphics, algorithms, theory, systems, architecture), engineering, image processing, speech, NLP, robotics, information retrieval, machine learning, mathematics]



Related Courses @ Stanford
• CS131: Computer Vision: Foundations and Applications
– Fall 2018, Juan Carlos Niebles and Ranjay Krishna
– Undergraduate introductory class
• CS231a: Computer Vision, from 3D Reconstruction to Recognition
– Professor Silvio Savarese
– Core computer vision class for seniors, masters, and PhDs
– Image processing, cameras, 3D reconstruction, segmentation, object recognition,
scene understanding; not just deep learning
• CS224n: Natural Language Processing with Deep Learning
– Winter 2019, Chris Manning
• CS230: Deep Learning
– Spring 2019, Prof. Andrew Ng and Kian Katanforoosh
• CS231n: Convolutional Neural Networks for Visual Recognition
– This course, Justin Johnson & Serena Yeung & Fei-Fei Li
– Focusing on applications of deep learning to computer vision



Today’s agenda
• A brief history of computer vision

• CS231n overview



Evolution’s Big Bang

This image is licensed under CC-BY 2.5

This image is licensed under CC-BY 2.5 This image is licensed under CC-BY 3.0

543 million years, B.C.


Camera Obscura
Gemma Frisius, 1545 Encyclopedie, 18th Century

This work is in the public domain

Leonardo da Vinci,
16th Century AD
This work is in the public domain

This work is in the public domain



Hubel & Wiesel, 1959
Simple cells: response to light orientation
Complex cells: response to light orientation and movement
Hypercomplex cells: response to movement with an end point

[Figure: electrical signal recorded from the brain while a stimulus is shown; stimulus/response pairs, contrasting "No response" with "Response (end point)"]
Cat image by CNX OpenStax is licensed under CC BY 4.0; changes made



Block world
Larry Roberts, 1963

(a) Original picture (b) Differentiated picture (c) Feature points selected



David Marr, 1970s
[Figure panels: Input image, Edge image, 2½-D sketch, 3-D model]

This image is CC0 1.0 public domain
This image is CC0 1.0 public domain

Stages of Visual Representation, David Marr, 1970s
Input Image: perceived intensities
Primal Sketch: zero crossings, blobs, edges, bars, ends, virtual lines, groups, curves, boundaries
2½-D Sketch: local surface orientation and discontinuities in depth and in surface orientation
3-D Model Representation: 3-D models hierarchically organized in terms of surface and volumetric primitives


• Generalized Cylinder: Brooks & Binford, 1979
• Pictorial Structure: Fischler and Elschlager, 1973



Image is CC0 1.0 public domain

David Lowe, 1987


Normalized Cut (Shi & Malik, 1997)
Image is CC BY 3.0
Image is public domain
Image is CC-BY 2.0; changes made



Face Detection, Viola & Jones, 2001

Image is public domain



Image is public domain
Image is public domain

“SIFT” & Object Recognition, David Lowe, 1999



Image is CC0 1.0 public domain

Level 0 Level 1
Spatial Pyramid Matching, Lazebnik, Schmid & Ponce, 2006



Image is CC0 1.0 public domain

Deformable Part Model
Felzenszwalb, McAllester, Ramanan, 2009

Histogram of Gradients (HoG)
Dalal & Triggs, 2005
[Figure: histogram with axes "frequency" and "orientation"]



PASCAL Visual Object Challenge
(20 object categories)
[Everingham et al. 2006-2012]

Image is CC0 1.0 public domain

Train
Person

Airplane

Image is CC0 1.0 public domain


Image is CC0 1.0 public domain



www.image-net.org

22K categories and 15M images

• Animals • Plants • Structures • Person


• Bird • Tree • Artifact • Scenes
• Fish • Flower • Tools • Indoor
• Mammal • Food • Appliances • Geological Formations
• Invertebrate • Materials • Structures • Sport Activities

Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009



The Image Classification Challenge:
1,000 object classes
1,431,167 images

[Figure: example image labeled "Steel drum" with two sample outputs]
Output (✔): Scale, T-shirt, Steel drum, Drumstick, Mud turtle
Output: Scale, T-shirt, Giant panda, Drumstick, Mud turtle

Russakovsky et al. IJCV 2015


Today’s agenda
• A brief history of computer vision

• CS231n overview



CS231n focuses on one of the most fundamental
problems of visual recognition –
image classification



Image by US Army is licensed under CC BY 2.0 Image is CC0 1.0 public domain

Image by Kippelboy is licensed under CC BY-SA 3.0 Image by Christina C. is licensed under CC BY-SA 4.0



There are many visual recognition problems that are
related to image classification, such as
object detection, image captioning



• Object detection
• Action classification
• Image captioning
• …
This image is licensed under CC BY-NC-SA 2.0; changes made

Person on Bike

Person
Hammer
Person Bike

This image is licensed under CC BY-SA 3.0; changes made


This image is licensed under CC BY-SA 2.0; changes made



Convolutional Neural Networks (CNN) have
become an important tool for object recognition



[Figure: ImageNet challenge architectures by year]
Year 2010: NEC-UIUC [Lin CVPR 2011]. Pipeline: dense descriptor grid (HOG, LBP); coding (local coordinate, super-vector); pooling, SPM; linear SVM
Year 2012: SuperVision [Krizhevsky NIPS 2012]. Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Year 2014: GoogLeNet [Szegedy arxiv 2014]. Diagram legend: Convolution, Pooling, Softmax, Other
Year 2014: VGG [Simonyan arxiv 2014]. Layers: image; conv-64, conv-64, maxpool; conv-128, conv-128, maxpool; conv-256, conv-256, maxpool; conv-512, conv-512, maxpool; conv-512, conv-512, maxpool; fc-4096, fc-4096, fc-1000; softmax
Year 2015: MSRA [He ICCV 2015]
Lion image by Swissfrog is licensed under CC BY 3.0



Convolutional Neural Networks (CNN)
were not invented overnight



1998, LeCun et al.
[Figure: network diagram with labels Input, Image Maps, Convolutions, Subsampling, Fully Connected, Output]
# of transistors: 10^6
# of pixels used in training: 10^7

2012, Krizhevsky et al.
# of transistors: 10^9 (GPUs)
# of pixels used in training: 10^14
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Ingredients for Deep Learning
Algorithms

Data

Computation



[Figure: GigaFLOPs per Dollar over Time (1/2004 to 9/2017) for CPU and GPU. Labeled GPUs: GeForce 8800 GTX, GeForce GTX 580 (AlexNet), GTX 1080 Ti. Annotation: Deep Learning Explosion]



[Figure: GigaFLOPs per Dollar over Time (1/2004 to 9/2017) for CPU, GPU, and TPU. Labeled GPUs: GeForce 8800 GTX, GeForce GTX 580 (AlexNet), GTX 1080 Ti, TITAN V (Tensor Cores). Annotation: Deep Learning Explosion]



The quest for visual intelligence
goes far beyond object recognition…



Image is CC0 1.0 public domain

Wall Laptop

Glass Wire

Image is GFDL

Desk

Image is CC BY-SA 2.0 Image is CC BY-SA 4.0

Waving



Johnson et al., “Image Retrieval using Scene Graphs”, CVPR 2015
Figures copyright IEEE, 2015. Reproduced for educational purposes



PT = 500ms
Some kind of game or fight. Two groups of two men?
The man on the left is throwing something. Outdoors
seemed like because i have an impression of grass
and maybe lines on the grass? That would be why I
think perhaps a game, rough game though, more like
rugby than football because they pairs weren't in
pads and helmets, though I did get the impression of
similar clothing. maybe some trees? in the
background. (Subject: SM)

Fei-Fei, Iyer, Koch, Perona, JoV, 2007


Image is licensed under CC BY-SA 3.0; changes made



This image is copyright-free United States government work

Example credit: Andrej Karpathy



Computer Vision Technology
Can Better Our Lives

Outside border images, clockwise, starting from top left:
Image by Pop Culture Geek is licensed under CC BY 2.0; changes made
Image by the US Government is in the public domain
Image by the US Government is in the public domain
Image by Glogger is licensed under CC BY-SA 3.0; changes made
Image by Sylenius is licensed under CC BY 3.0; changes made
Image by US Government is in the public domain

Inside four images, clockwise, starting from top left:
Image is CC0 1.0 public domain
Image by Tucania is licensed under CC BY-SA 3.0; changes made
Image by Intuitive Surgical, Inc. is licensed under CC BY-SA 3.0; changes made
Image by Oyundari Zorigtbaatar is licensed under CC BY-SA 4.0



Who we are
Instructors

Teaching Assistants



How to Contact Us
• Course Website: http://cs231n.stanford.edu/
– Syllabus, lecture slides, links to assignment downloads, etc.
• Piazza: http://piazza.com/stanford/spring2019/cs231n
– Use this for most communication with course staff
– Ask questions about homework, grading, logistics, etc
– Use private questions if you want to post code
• Canvas
– For watching lecture videos



Optional Textbook

• Deep Learning by Goodfellow, Bengio, and Courville
• Free online



Our philosophy
• Thorough and Detailed.
– Understand how to write from scratch, debug and train
convolutional neural networks.
• Practical.
– Focus on practical techniques for training these
networks at scale and on GPUs (e.g., we will touch on
distributed optimization, differences between CPU and
GPU, etc.). We also look at state-of-the-art software tools.
• State of the art.
– Most materials are new, from the research world of the past
1-3 years. Very exciting stuff!



Our philosophy (cont’d)
• Fun.
– Some fun topics such as Image Captioning (using RNN)
– Also DeepDream, NeuralStyle, etc.



Pre-requisite
• Proficiency in Python, some high-level familiarity with
C/C++
– All class assignments will be in Python (and use numpy), but
some of the deep learning libraries we may look at later in
the class are written in C++.
– A Python tutorial available on course website
• College Calculus, Linear Algebra
• Equivalent knowledge of CS229 (Machine Learning)
– We will be formulating cost functions, taking derivatives and
performing optimization with gradient descent.
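
As a rough illustration of what "formulating cost functions, taking derivatives and performing optimization with gradient descent" can look like, here is a minimal sketch in plain Python. The model (a one-parameter line y = w * x), the toy data, the learning rate, and the function names are all assumptions made for this example; none of them come from the course materials.

```python
# Minimal gradient-descent sketch (illustrative assumptions throughout).

def cost(w, xs, ys):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """Analytic derivative dJ/dw of the cost above."""
    return 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, w0=0.0, lr=0.05, steps=200):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w, xs, ys)
    return w


# Toy data generated by y = 2x, so the minimizer is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_fit = gradient_descent(xs, ys)
print(w_fit, cost(w_fit, xs, ys))
```

A common sanity check is to compare `grad` against a finite-difference estimate such as `(cost(w + h, xs, ys) - cost(w - h, xs, ys)) / (2 * h)` for a small `h`.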



Grading Policy
• 3 Problem Sets: 15% x 3 = 45%
• Midterm Exam: 20%
• Course Project: 35%
– Project Proposal: 1%
– Milestone: 2%
– Poster: 2%
– Project Report: 30%
• Late policy
– 4 free late days – use up to 2 late days per assignment
– Afterwards, 25% off per day late
– No late days for project report
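
The weights above can be checked with a short Python sketch. The percentages are copied from the slide; the component names, the input format of `final_grade`, and the reading of "25% off per day late" as a flat deduction per day are assumptions made for illustration, not official course policy.

```python
# Weights from the Grading Policy slide, in percent of the final grade.
# Key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "problem_set_1": 15,
    "problem_set_2": 15,
    "problem_set_3": 15,
    "midterm": 20,
    "project_proposal": 1,
    "project_milestone": 2,
    "project_poster": 2,
    "project_report": 30,
}


def final_grade(scores):
    """scores maps each component name to the fraction earned, in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def late_multiplier(days_late_after_free):
    """Assumed reading of '25% off per day late': lose a quarter of the
    credit per late day beyond the free late days, floored at zero."""
    return max(0.0, 1.0 - 0.25 * days_late_after_free)


# Example: full marks everywhere except half credit on the midterm.
scores = {name: 1.0 for name in WEIGHTS}
scores["midterm"] = 0.5
print(final_grade(scores))
```

The project components (1 + 2 + 2 + 30) add up to the 35% listed for the course project, and all components together add up to 100%.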



Collaboration Policy
• We follow the Stanford Honor Code and the CS
Department Honor Code – read them!
• Rule 1: Don’t look at solutions or code that are not your
own; everything you submit should be your own work
• Rule 2: Don’t share your solution code with others;
however discussing ideas or general strategies is fine and
encouraged
• Rule 3: Indicate in your submissions anyone you worked
with
• Turning in something late / incomplete is better than
violating the honor code



Next Time: Image Classification

K-Nearest Neighbor Linear Classifier



References
• Hubel, David H., and Torsten N. Wiesel. "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex." The Journal of physiology 160.1 (1962): 106.
• Roberts, Lawrence Gilman. "Machine Perception of Three-dimensional Solids." Diss. Massachusetts Institute of Technology, 1963.
• Marr, David. "Vision." The MIT Press, 1982.
• Brooks, Rodney A., Russell Greiner, and Thomas O. Binford. "The ACRONYM model-based vision system." In Proceedings of the 6th International Joint Conference on Artificial Intelligence (1979): 105-113.
• Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on Computers 22.1 (1973): 67-92.
• Lowe, David G. "Three-dimensional object recognition from single two-dimensional images." Artificial Intelligence 31.3 (1987): 355-395.
• Shi, Jianbo, and Jitendra Malik. "Normalized cuts and image segmentation." Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.8 (2000): 888-905.
• Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.
• Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.
• Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.



• Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005.
• Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. "A discriminatively trained, multiscale, deformable part model." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
• Everingham, Mark, et al. "The pascal visual object classes (VOC) challenge." International Journal of Computer Vision 88.2 (2010): 303-338.
• Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
• Russakovsky, Olga, et al. "Imagenet Large Scale Visual Recognition Challenge." arXiv:1409.0575.
• Lin, Yuanqing, et al. "Large-scale image classification: fast feature extraction and SVM training." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
• Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
• Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
• He, Kaiming, et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition." arXiv preprint arXiv:1406.4729 (2014).
• LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
• Fei-Fei, Li, et al. "What do we perceive in a glance of a real-world scene?" Journal of vision 7.1 (2007): 10.
