[Figures, slides 7-23: diagrams with the labels "Car", "HOW?", "H", "V", "All", "HV", "HVV", "X", "v", "h", and "abstraction"]
• Deep learning is all about deep neural networks
• 1949: Hebbian learning (Donald Hebb, the father of neural networks)
• 1958: (single-layer) perceptron (Frank Rosenblatt)
• 1969: Marvin Minsky (with Seymour Papert) showed the limits of the single-layer perceptron
• 1986: multilayer perceptron with backpropagation (David Rumelhart, Geoffrey Hinton, and Ronald Williams)
• 2006: deep neural networks (Geoffrey Hinton and Ruslan Salakhutdinov)
• Weaknesses of kernel machines (SVM …):
• They do not scale well with sample size.
• They are based on matching local templates.
• The training data is referenced when evaluating test data.
• Local representation vs. distributed representation
• NN (neural network) -> kernel machine -> deep NN
Shallow learning vs. deep learning
• Shallow learning: feature extraction by domain experts (SIFT, SURF, ORB...); separate modules (feature extractor + trainable classifier)
• Deep learning: automatic feature extraction from data; unified model with end-to-end learning (trainable feature + trainable classifier)
• Core visual object recognition
Feedback
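[Figure: a recurrent network with input $x_t$ and hidden state $h_t$, drawn as a loop and unrolled in time as $(x_0, h_0), (x_1, h_1), (x_2, h_2), \dots, (x_t, h_t)$]
[http://karpathy.github.io/2015/05/21/rnn-effectiveness]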
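The unrolled diagram can be read as one function applied repeatedly to the sequence. The slide shows only the picture, so the sketch below assumes the common vanilla recurrence $h_t = \tanh(w_{xh} x_t + w_{hh} h_{t-1} + b_h)$ with scalar states; the function name and the weight values are illustrative, not taken from the deck.

```python
import math

def rnn_forward(xs, w_xh, w_hh, b_h, h0=0.0):
    """Run a scalar vanilla RNN over xs and return every hidden state.

    Assumed recurrence (not stated on the slide):
        h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b_h)
    """
    h = h0
    hidden_states = []
    for x in xs:
        # The same weights are reused at every time step;
        # only h carries information forward in time.
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        hidden_states.append(h)
    return hidden_states

# Hypothetical weights: a single input pulse followed by zeros.
hidden_states = rnn_forward([1.0, 0.0, 0.0], w_xh=0.5, w_hh=0.9, b_h=0.0)
```

With these values the hidden state stays non-zero after the input has gone back to zero, which is the "memory" the unrolled chain depicts.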
• A bidirectional neural network uses both past and future context for every point in the sequence
• Two hidden layers (forward and backward) share the same output layer
[Figure: visualization of the amount of input information used for prediction by different network structures]
[Schuster 97]
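A minimal sketch of the bidirectional idea, under the assumption of scalar tanh recurrences and a linear output layer (all weights below are hypothetical): one pass reads the sequence left to right, a second pass reads it right to left, and a shared output layer combines both hidden states at each position.

```python
import math

def run_rnn(xs, w_x, w_h):
    """Scalar tanh RNN; returns the hidden state at every position."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

def brnn_outputs(xs, fwd=(0.5, 0.8), bwd=(0.5, 0.8), w_out=(1.0, 1.0)):
    """Combine a forward and a backward pass in one shared output layer."""
    h_fwd = run_rnn(xs, *fwd)
    # Run over the reversed sequence, then flip back so that
    # h_bwd[t] summarizes the inputs from position t to the end.
    h_bwd = run_rnn(xs[::-1], *bwd)[::-1]
    return [w_out[0] * f + w_out[1] * b for f, b in zip(h_fwd, h_bwd)]

# The only non-zero input is at the last position.
ys = brnn_outputs([0.0, 0.0, 1.0])
```

The output at position 0 is already non-zero here, because the backward layer has seen the later input; a forward-only RNN would output exactly 0 at that position.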
RNN vs. LSTM
• An RNN forgets earlier inputs (vanishing gradient)
• An LSTM retains earlier data and recalls it when needed
LSTM step [http://colah.github.io/posts/2015-08-Understanding-LSTMs]
Inputs: $h_{t-1}$ (previous result), $x_t$ (current data), and the cell state $C_{t-1}$; outputs: $C_t$ and $h_t$.
• Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
• Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
• Candidate cell state: $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$
• Cell state update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
• Output gate: $O_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
• Hidden state: $h_t = O_t * \tanh(C_t)$
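The gate equations on the LSTM slides can be traced in a few lines of code. This is a sketch with scalar states, where each weight "matrix" applied to $[h_{t-1}, x_t]$ is reduced to a pair of numbers plus a bias; the parameter layout and the example values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps a gate name ("f", "i", "c", "o") to (w_h, w_x, b),
    a scalar stand-in for W . [h_{t-1}, x_t] + b.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(affine("f"))            # forget gate
    i_t = sigmoid(affine("i"))            # input gate
    c_tilde = math.tanh(affine("c"))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update
    o_t = sigmoid(affine("o"))            # output gate
    h_t = o_t * math.tanh(c_t)            # new hidden state
    return h_t, c_t

# Hypothetical parameters: forget gate held open, input gate held shut.
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "c": (0.0, 0.0, 0.0),
    "o": (0.0, 0.0, 0.0),
}
h_t, c_t = lstm_step(x_t=0.0, h_prev=0.0, c_prev=2.0, params=params)
```

With the forget gate near 1 and the input gate near 0, the cell state passes through almost unchanged, which is how the cell keeps earlier information.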
• The dropout operator is applied only to non-recurrent connections [Zaremba14]
• In the figure, dashed arrows mark connections where dropout is applied; solid arrows mark connections where it is not
• $h_t^l$: hidden state in layer $l$ at timestep $t$
[Table: frame-level speech recognition accuracy]
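The rule "dropout only on non-recurrent connections" can be sketched as follows. For brevity this uses a plain tanh recurrent layer instead of an LSTM, and inverted dropout (rescaling the surviving values); both choices, and all names and weights, are assumptions for illustration rather than the setup of [Zaremba14].

```python
import math
import random

def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def recurrent_layer_step(x_below, h_prev, w_in, w_rec, p, rng):
    """One time step of one recurrent layer (element-wise, for simplicity).

    x_below: activations coming up from the layer below (non-recurrent).
    h_prev:  this layer's own hidden state from the previous time step.
    """
    x_dropped = dropout(x_below, p, rng)   # non-recurrent connection: dropout applied
    # Recurrent connection: h_prev is used as-is, never dropped.
    return [math.tanh(w_in * x + w_rec * h) for x, h in zip(x_dropped, h_prev)]

rng = random.Random(0)
h_new = recurrent_layer_step([0.0, 0.0], [0.5, 0.5], w_in=1.0, w_rec=1.0, p=0.5, rng=rng)
masked = dropout([1.0] * 1000, 0.5, rng)
```

Because the recurrent path is untouched, information stored in the hidden state is not corrupted at every time step, only the input arriving from the layer below.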
[Figure: autoencoder diagram with input, hidden, and output layers; labels: encode, decode, weights W1, V1, W2, V2, and layers X1, X2, X3]
• Regress from the observation to itself (input X1 -> output X1)
• e.g. data compression (JPEG etc.)
[Lemme 10]
[Figure: class scores over cow, dog, cat, and bus for an image of a dog. Value rows shown: "0 1 0 0…", "0.05 0.7 0.5 0.01…", "0.9 0.1 10^-8…10^-4"; row labels shown: original target, output of ensemble, training result]
Softened outputs reveal the dark knowledge in the ensemble [Hinton 14]
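One common way to soften outputs is a softmax with a temperature above 1. The sketch below assumes that mechanism and uses made-up logits for the four classes on the slide; it is meant only to show how small probabilities for wrong classes become visible.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (cow, dog, cat, bus) on an image of a dog.
logits = [1.0, 10.0, 7.0, 0.0]
hard = softmax(logits, temperature=1.0)
soft = softmax(logits, temperature=5.0)
```

At temperature 1 nearly all probability mass sits on "dog"; at the higher temperature "cat" receives a clearly larger share than "cow" or "bus", exposing which wrong classes the model considers similar.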
• The output distribution of the top layer carries more information than the hard target.
• The model size of a DNN can grow to tens of GB.
[Figure: "Training a DNN" (input, target) and "Training a shallow network" (input, output)]
[Hinton 14]
One-hot vector representation:
dog: 0 1 0 0 0 0 0 0 0 0
cat: 0 0 1 0 0 0 0 0 0 0
• Word embedding: a function $W: \text{words} \to \mathbb{R}^n$ that maps words to high-dimensional vectors
Word embedding:
dog: 0.3 0.2 0.1 0.5 0.7
cat: 0.2 0.8 0.3 0.1 0.9
[Table: nearest neighbors of a few words] [Vinyals 14]
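The two representations can be compared directly. The sketch below reuses the "dog" and "cat" vectors from the slide; the remaining rows of the embedding table are filled with zeros as placeholders, and the helper names are illustrative.

```python
import math

VOCAB_SIZE = 10
EMBED_DIM = 5
word_index = {"dog": 1, "cat": 2}   # positions taken from the one-hot vectors above

def one_hot(word):
    vec = [0.0] * VOCAB_SIZE
    vec[word_index[word]] = 1.0
    return vec

# Embedding table W: one row per vocabulary entry (other rows are placeholders).
embedding = [[0.0] * EMBED_DIM for _ in range(VOCAB_SIZE)]
embedding[1] = [0.3, 0.2, 0.1, 0.5, 0.7]   # dog
embedding[2] = [0.2, 0.8, 0.3, 0.1, 0.9]   # cat

def embed(word):
    """Look up a word's row in the embedding table."""
    return embedding[word_index[word]]

def embed_via_one_hot(word):
    """Same result as embed(): a one-hot vector times the table selects one row."""
    oh = one_hot(word)
    return [sum(oh[r] * embedding[r][c] for r in range(VOCAB_SIZE))
            for c in range(EMBED_DIM)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

one_hot_similarity = cosine(one_hot("dog"), one_hot("cat"))
embedding_similarity = cosine(embed("dog"), embed("cat"))
```

Any two distinct one-hot vectors are orthogonal, so they carry no notion of similarity, while the dense vectors give "dog" and "cat" a positive cosine similarity. Nearest-neighbor tables are built by ranking words by such a similarity.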
• Dynamical-system model of a biological neural network (walking, biking, etc.)
• Ordinary differential equations model the effects on each neuron; training uses a genetic algorithm
Update equation:
$\tau_i \frac{dy_i}{dt} = -y_i + \sum_j w_{ji}\, \sigma(g_j (y_j - b_j)) + I_i$
• $\tau_i$: time constant of neuron $i$
• $g_i$: gain
• $b_i$: bias
• $w_{ji}$: weight of the connection from neuron $j$ to neuron $i$
• $I_i$: external input to neuron $i$
• $\sigma$: non-linear function ($\tanh$)
• $y_i$: activation of the postsynaptic neuron $i$ ($dy_i/dt$ is its rate of change)
[Figure: input nodes, hidden nodes, and output nodes (a subset of the hidden nodes)]
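The update equation above has the form usually called a continuous-time recurrent neural network, and it can be simulated with a simple forward-Euler step. The sketch assumes the grouping $\sigma(g_j (y_j - b_j))$ with $\sigma = \tanh$, which is one reading of the slide's garbled formula; the step size and the single-neuron example are illustrative.

```python
import math

def ctrnn_step(y, tau, w, g, b, inputs, dt=0.01):
    """Advance all neuron activations by one forward-Euler step.

    w[j][i] is the weight of the connection from neuron j to neuron i.
    """
    n = len(y)
    # Firing output of every presynaptic neuron j.
    out = [math.tanh(g[j] * (y[j] - b[j])) for j in range(n)]
    new_y = []
    for i in range(n):
        synaptic = sum(w[j][i] * out[j] for j in range(n))
        dy_dt = (-y[i] + synaptic + inputs[i]) / tau[i]
        new_y.append(y[i] + dt * dy_dt)
    return new_y

# One neuron, no recurrent weight, constant external input of 1.
y = [0.0]
for _ in range(2000):
    y = ctrnn_step(y, tau=[1.0], w=[[0.0]], g=[1.0], b=[0.0], inputs=[1.0])
```

In this degenerate case the equation reduces to $\tau \, dy/dt = -y + I$, so the activation relaxes toward the input value. A genetic algorithm would search over `tau`, `w`, `g`, and `b` rather than compute gradients.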
[Figure legend: Convolution, Pooling, Softmax, Other]
Data augmentation
November 13, 2015) submission deadline
• (pre-2015): (Google) 4.9%
• Beyond human-level performance
• Generate dense, free-form descriptions of images
• Infer region-word alignments using R-CNN + BRNN + MRF
• Image segmentation (graph cut + disjoint union)
[Karpathy 14] [Girshick 13]
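Image-sentence score:
$S_{kl} = \sum_{t \in g_l} \sum_{i \in g_k} \max(0, v_i^T S_t)$
• $g_k$: the regions of image $k$; $g_l$: the words of sentence $l$
• $S_t$ and $v_i$ with their additional Multiple Instance Learning
• $h \times 4096$ matrix ($h$ is 1000~1600)
• $t$-dimensional word dictionary
[Figure: result with BRNN vs. result with RNN]
[Karpathy 14]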
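The image-sentence score is a double sum of clipped dot products, which is short to write out. In this sketch the region vectors $v_i$ and word vectors $S_t$ are tiny hand-made lists; in the slides they come from R-CNN and BRNN embeddings, which are not reproduced here.

```python
def image_sentence_score(region_vecs, word_vecs):
    """S_kl: sum over all words t and regions i of max(0, v_i . S_t)."""
    score = 0.0
    for s_t in word_vecs:
        for v_i in region_vecs:
            dot = sum(a * b for a, b in zip(v_i, s_t))
            score += max(0.0, dot)   # negative similarities contribute nothing
    return score

# Hypothetical 2-D embeddings: two regions, two words.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [-1.0, 0.0]]
score = image_sentence_score(regions, words)
```

Only the first word matches a region here, so it alone contributes to the score; the second word points away from both regions and is clipped to zero.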
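Smoothing with an MRF [Karpathy 14]
$E(a_1, \dots, a_N) = \sum_{a_j = t} -\mathrm{similarity}(w_j, r_t) + \sum_{j=1}^{N-1} \beta\,[a_j = a_{j+1}]$
• $a_j = t$ assigns word $w_j$ to region $r_t$, giving (word, region) pairs
• Without smoothing, each word aligns independently to its best region
• Nearby words tend to align to the same region
• The argmin can be found with dynamic programming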
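Because the energy only couples neighboring words, the minimizing alignment can be found with a Viterbi-style dynamic program. The sketch below implements the energy exactly as written on the slide; under that sign convention a negative `beta` rewards consecutive words that share a region, which is an assumption about how the slide's formula is meant to be used. The similarity table is made up.

```python
def align_words(similarity, beta):
    """Minimize E = sum_j -similarity[j][a_j] + sum_j beta * [a_j == a_{j+1}].

    similarity[j][t] scores word j against region t.
    Returns (assignment, energy).
    """
    n_regions = len(similarity[0])
    # cost[t]: lowest energy of any alignment of the words so far ending in region t.
    cost = [-s for s in similarity[0]]
    back = []
    for row in similarity[1:]:
        new_cost, pointers = [], []
        for t in range(n_regions):
            def through(p):
                return cost[p] + (beta if p == t else 0.0)
            best_prev = min(range(n_regions), key=through)
            new_cost.append(through(best_prev) - row[t])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the cheapest final region.
    last = min(range(n_regions), key=lambda t: cost[t])
    energy = cost[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, energy

# Hypothetical scores for three words against two regions.
similarity = [[0.9, 0.1], [0.4, 0.5], [0.8, 0.2]]
independent, _ = align_words(similarity, beta=0.0)
smoothed, smoothed_energy = align_words(similarity, beta=-0.3)
```

With `beta=0.0` each word simply takes its best region, so the middle word switches regions; with the smoothing term the middle word is pulled onto the same region as its neighbors.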
• Generation methods for automatic captioning
1) Compose descriptions directly from recognized content
2) Retrieve relevant existing text given recognized content
• Compose descriptions given recognized content
Yao et al. (2010), Yang et al. (2011), Li et al. (2011), Kulkarni et al. (2011)
• Generation as retrieval
Farhadi et al. (2010), Ordonez et al. (2011), Gupta et al. (2012), Kuznetsova et al. (2012)
• Generation using pre-associated relevant text
Leong et al. (2010), Aker and Gaizauskas (2010), Feng and Lapata (2010a)
• Other (image annotation, video description, etc.)
Barnard et al. (2003), Pastra et al. (2003), Gupta et al. (2008), Gupta et al. (2009),
Feng and Lapata (2010b), del Pero et al. (2011), Krishnamoorthy et al. (2012),
Barbu et al. (2012), Das et al. (2013)
• The human body is divided into five parts (two arms, two legs, trunk)
• The movements of these individual parts are modeled by a network of 9 layers (BRNN, fusion, and fully connected layers)
[Yong 15]
• "Machine Learning to Deep Learning" by 곽동민
• http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
• Convolutional neural networks: LeCun
• Alex Krizhevsky: Hinton (Python, C++)
• https://code.google.com/p/cuda-convnet/
• Caffe: UC Berkeley (C++)
• http://caffe.berkeleyvision.org/

Computer vision lab seminar (deep learning), Yong Hoon


Editor's Notes

  • #10: An object's representation is expressed with hidden (latent) variables, which are the core of deep learning.
  • #110: Honglak Lee (이홍락): M.S. and Ph.D. from Stanford; assistant professor at the University of Michigan.