Deep Learning And Business Models (VNITC 2015-09-13)

Deep Learning and Business Models
Tran Quoc Hoan
The University of Tokyo
VNITC@2015-09-13

Today's agenda
Deep Learning boom
Essentials of Deep Learning
Deep Learning - the newest

What is Deep Learning?
Deep Learning is a recent trend in machine learning that enables
machines to uncover high-level abstractions in large amounts of data.
Application areas:
Image recognition
Speech recognition
Customer-centric management
Natural language processing
Drug discovery & toxicology

Deep Learning Boom
“If we knew what it was we were doing, it would not
be called research, would it?”
–Albert Einstein

Big News - ILSVRC 2012
ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge)
Error rate by team:
SuperVision: 15.315%
ISI: 26.172%
OXFORD_VGG: 26.979%
XRCE/INRIA: 27.058%
Univ. of Amsterdam: 29.576%
Image source: http://cs.stanford.edu/people/karpathy/cnnembed/

Big News - Google Brain
Self-taught learning from unlabelled YouTube videos on 16,000 CPU cores
AI grandmother cell (2012)
[Images: learned detectors, (Cat) and (Human)]

Data and Machine Learning
[Figure: performance versus amount of data, comparing most learning algorithms with the new AI method (Deep Learning)]
Image source: http://cs229.stanford.edu/materials/CS229-DeepLearning.pdf

New boom for the Giants & Startups
Google Brain project (2012)
Facebook AI Research Lab (Dec. 2013)
Project Adam (2014)
Enlitic
Ersatz Labs
MetaMind
Nervana Systems
Skymind
DeepMind
PFN (Japan)
One of 10 breakthrough technologies 2013 (MIT Technology Review)

Pioneers
Geoffrey Hinton (Toronto, Google)
Yoshua Bengio (Montreal)
Yann LeCun (New York, Facebook)
Andrew Ng (Stanford, Baidu)
Jeffrey Dean (Google)
Le Viet Quoc (Stanford, Google)
[Diagram labels: Theory + algorithms, Implementation]

Sub-summary
・More than 2,000 papers were published in 2014-2015, and deep learning is used in 47 Google services.

Essentials of Deep Learning

The basic calculation
[Diagram: inputs x1, x2, x3 and a constant +1, weighted by w1, w2, w3, w4, feed the activation function f, which outputs z]
Forward propagation: z = f(x1w1 + x2w2 + x3w3 + w4)
wi: weight parameters
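The forward calculation above can be sketched in a few lines of Python. This is only an illustration: the slide does not say which activation function f is used, so the sigmoid below, the function names and the example numbers are all assumptions.

```python
import math

def sigmoid(a):
    """Logistic activation; an assumed choice for f."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron(x, w, f=sigmoid):
    """Compute z = f(x1*w1 + x2*w2 + x3*w3 + w4).

    x holds the inputs; w holds one weight per input plus a final
    bias weight that multiplies the constant +1 input.
    """
    pre_activation = sum(xi * wi for xi, wi in zip(x, w)) + w[-1]
    return f(pre_activation)

x = [1.0, 2.0, 3.0]          # illustrative inputs x1, x2, x3
w = [0.5, -0.25, 0.1, 0.2]   # w1, w2, w3 and the bias weight w4
z = neuron(x, w)
print(z)
```

Here the weighted sum is 0.5, so z is the sigmoid of 0.5.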

Neural network (1/2)
[Diagram: input layer (x1, x2, x3, +1) → hidden layers (h1, h2, then z1, z2, each with a +1 bias unit and activation f) → output layer y, e.g. Prob(cat) = 99%]
Going deeper: deeper layers represent more comprehensive
combinations of small parts

Neural network (2/2)
[Diagram: input layer (x1, x2, x3, +1) → hidden layers with weights {w1_ij}, {w2_ij}, {w3_ij} → outputs y1, y2, …, yk]
Output: e.g. the probability of each of the k classes in a classification task
Supervised learning
• The loss function E measures how far the output labels are from the true labels.
• E is evaluated over the N input data.
Find the parameters {w} that minimize E → How?
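As a rough sketch of what E could look like for a k-class classification task: the slide does not give the formula, so the softmax output and the mean cross-entropy loss below are assumed choices, and the scores are made-up numbers.

```python
import math

def softmax(scores):
    """Turn k raw output scores into k class probabilities."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(batch_scores, true_classes):
    """Mean cross-entropy E over N samples (an assumed choice of loss)."""
    total = 0.0
    for scores, label in zip(batch_scores, true_classes):
        probs = softmax(scores)
        total += -math.log(probs[label])     # small when the true class gets high probability
    return total / len(batch_scores)

scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # N = 2 samples, k = 3 classes
good = cross_entropy_loss(scores, [0, 2])      # labels agree with the largest scores
bad = cross_entropy_loss(scores, [2, 0])       # labels disagree with the scores
print(good, bad)
```

E is smaller when the outputs agree with the true labels, which is what training tries to achieve by adjusting {w}.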

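Backward propagation
[Diagram: input layer (x1, x2, x3, +1), intermediate values r and s, weight w, output y, true label y*, loss E]
Propagate backward from the loss to update the parameters:
$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial y} \, \frac{\partial y}{\partial s} \, \frac{\partial s}{\partial w}$
• $\frac{\partial E}{\partial y}$: how E changes when y moves
• $\frac{\partial E}{\partial y} \frac{\partial y}{\partial s}$: how E changes when s moves
• $\frac{\partial E}{\partial w}$: how E changes when w moves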
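The chain rule above can be checked numerically on a one-weight example. The concrete model here is an assumption made for illustration (s = w·r, y = sigmoid(s), E = ½(y − y*)²); the slide only gives the general factorization.

```python
import math

def forward(w, r, y_true):
    s = w * r                                # pre-activation
    y = 1.0 / (1.0 + math.exp(-s))           # output (sigmoid assumed)
    E = 0.5 * (y - y_true) ** 2              # squared-error loss (assumed)
    return s, y, E

def backward(w, r, y_true):
    """dE/dw = dE/dy * dy/ds * ds/dw."""
    s, y, _ = forward(w, r, y_true)
    dE_dy = y - y_true
    dy_ds = y * (1.0 - y)                    # derivative of the sigmoid
    ds_dw = r
    return dE_dy * dy_ds * ds_dw

w, r, y_true = 0.8, 1.5, 1.0
analytic = backward(w, r, y_true)

# Finite-difference estimate of the same derivative, as a sanity check.
eps = 1e-6
numeric = (forward(w + eps, r, y_true)[2] - forward(w - eps, r, y_true)[2]) / (2 * eps)
print(analytic, numeric)
```

The two numbers should agree closely, which is the usual way to sanity-check a hand-written backward pass.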

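Vanishing gradient problem
[Diagram: deep network from inputs (x1, x2, x3, +1) through many layers to output y, compared with the true label y* by the loss E]
The error vanishes as it is back-propagated to the lower layers
・Neural network architectures had reached this limit before the
deep learning boom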
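A simplified way to see the effect: in a chain of sigmoid units, each layer multiplies the back-propagated error by weight × sigmoid'(·), and the sigmoid derivative is at most 0.25. The sketch below assumes one unit per layer, identical weights and identical pre-activations, which is far simpler than a real network.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_factor(num_layers, weight=1.0, pre_activation=0.0):
    """Factor by which the error is scaled after passing back through
    num_layers sigmoid layers (one unit per layer, assumed)."""
    s = sigmoid(pre_activation)
    local = abs(weight) * s * (1.0 - s)      # at most 0.25 * |weight|
    return local ** num_layers

for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))
```

With unit weights the factor shrinks by 4x per layer, so the lowest layers of a deep network receive almost no error signal.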

Features representation
・Deep learning reduces the time-consuming manual feature
selection process in classification tasks
Previous methods: Feature selection → Parameter learning → Answer
Deep Learning: (Feature selection + Parameter learning) → Answer

Convolutional Neural Network
The most popular kind of deep learning model
[ConvNet diagram from Torch Tutorial, with callouts "Subsampling" and "Emphasizes particular features"]
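A minimal sketch of the two operations named in the diagram, assuming the "emphasis" callout refers to the convolution step and that subsampling means 2x2 max pooling. The image and kernel values are made up; real CNNs learn the kernel values.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).
    As in most CNN libraries, this is cross-correlation (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool_2x2(fmap):
    """Subsampling: keep the strongest response in each 2x2 block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
edge_kernel = [[-1, 1], [-1, 1]]               # responds to a dark-to-bright vertical edge
features = conv2d_valid(image, edge_kernel)    # 4x4 feature map
pooled = max_pool_2x2(features)                # 2x2 map after subsampling
print(features)
print(pooled)
```

The feature map is non-zero only where the edge is, and pooling keeps that response while shrinking the map.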

How to train parameters
Trainable parameters
・Values in each kernel of the convolutional layers
・Weight values and biases in the fully connected layers
Stochastic Gradient Descent (SGD) (others: AdaGrad, Adam, …)
Parameter update: $W \leftarrow W - \beta \, \frac{\partial E}{\partial W}$
($\beta$: learning rate; $\frac{\partial E}{\partial W}$: obtained by backward propagation)

SGD - Mini batch
Batch learning: uses all samples in each update (prone to over-fitting)
Mini batch: updates the parameters using only some of the samples
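A small sketch of mini-batch SGD, using a toy linear model y = w·x + b instead of a neural network so that the gradients fit on one line. The data, learning rate and batch size are illustrative assumptions.

```python
import random

def minibatch_sgd(data, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y = w*x + b by mini-batch SGD on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                      # new random mini-batches each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of E = mean((w*x + b - y)^2) / 2 over this mini-batch only
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w                   # W <- W - beta * dE/dW
            b -= lr * grad_b
    return w, b

# Noise-free samples of y = 3x + 1 (made-up data)
points = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(20)]
w, b = minibatch_sgd(points)
print(w, b)
```

Setting batch_size to len(points) would give batch learning; batch_size=1 would give per-sample SGD.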

Training tricks
Data normalization
Reduce the learning rate after some iterations
Momentum SGD
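Two of these tricks, momentum and stepwise learning-rate decay, can be sketched on a one-parameter problem. The schedule (halve the rate every 50 steps), the momentum value and the objective (w − 3)² are assumptions chosen for illustration.

```python
def momentum_sgd(grad, w0, lr=0.1, momentum=0.9, decay=0.5, decay_every=50, steps=200):
    """Gradient descent with a velocity term and a stepwise learning-rate decay."""
    w, velocity = w0, 0.0
    for step in range(steps):
        if step > 0 and step % decay_every == 0:
            lr *= decay                        # reduce the learning rate after some iterations
        velocity = momentum * velocity - lr * grad(w)
        w += velocity                          # move along the accumulated direction
    return w

# Minimize (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = momentum_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

The velocity term smooths successive updates, and the decaying rate lets the parameter settle near the minimum.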

Auto-encoder (1/2)
[Diagram: deep network with output y]
How should these parameters be initialized?

Auto-encoder (2/2)
The ability to reconstruct the input makes a good
initialization for the parameters
x → Encode → h(x) → Decode → x’
Reconstruction error: E = ||x - x’||
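A toy sketch of the encode/decode loop and its reconstruction error. The setup is an assumption made to keep the example short: a linear auto-encoder with a single hidden unit and tied weights, trained on 2-D points lying on a line. It only shows the reconstruction training, not the later use of the weights as an initialization.

```python
import math

def encode(x, w):
    """h(x): project the input onto the weight vector (one hidden unit)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(h, w):
    """x': rebuild the input from the code, reusing (tying) the same weights."""
    return [wi * h for wi in w]

def reconstruction_error(x, w):
    x_rec = decode(encode(x, w), w)
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, x_rec)))

def train(data, w, lr=0.01, epochs=200):
    """Per-sample gradient descent on the squared error ||x - x'||^2."""
    w = list(w)
    for _ in range(epochs):
        for x in data:
            h = encode(x, w)
            err = [xi - wi * h for xi, wi in zip(x, w)]
            err_dot_w = sum(e * wi for e, wi in zip(err, w))
            # d/dw_k ||x - w*h||^2 = -2 * (err_k * h + (err . w) * x_k)
            grad = [-2.0 * (e * h + err_dot_w * xk) for e, xk in zip(err, x)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

data = [[t / 5.0, 2.0 * t / 5.0] for t in range(-5, 6)]   # points on the line x2 = 2*x1
w_init = [0.5, 0.1]
w_trained = train(data, w_init)
before = sum(reconstruction_error(x, w_init) for x in data)
after = sum(reconstruction_error(x, w_trained) for x in data)
print(before, after)
```

After training, the single hidden unit captures the direction of the data, so the inputs can be rebuilt almost exactly.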

Robust algorithms
Data augmentation: more noisy data → more robust model
Not all nodes in the network need to be active (mimicking the
human brain) → Dropout concept
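The dropout idea can be sketched as a random mask over a layer's activations. The "inverted" scaling used here (dividing the kept values by the keep probability) is one common variant and is an assumption; the slide does not specify which form is meant.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each node with probability drop_prob while training."""
    if not training or drop_prob == 0.0:
        return list(activations)               # all nodes are active at test time
    keep_prob = 1.0 - drop_prob
    # Scale the survivors so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 1000                          # made-up hidden-layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(dropped.count(0.0), "of", len(dropped), "nodes dropped")
```

Roughly half of the nodes are silenced on each training pass, so the network cannot rely on any single node.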

Deep Learning - The Newest

In Services
Google Translate app
Deep learning inside (works offline)
July 29, 2015 version

In Services
A service I implemented: http://yelpio.hongo.wide.ad.jp/

In Medical Imaging
Segmentation
(Images are offline due to copyright)

In Automatic Devices
Self-driving car
http://blogs.nvidia.com/blog/2015/02/24/deep-learning-drive/

In Speech Interfaces
99% accuracy is very different from 95%
(Andrew Ng, Baidu)
https://medium.com/s-c-a-l-e/how-baidu-mastered-mandarin-with-deep-learning-and-lots-of-data-1d94032564a5

In Drug Discovery
[Diagram: chemical compounds and assay data from the PubChem database → fingerprint (e.g. 10010000110011) + activity (active/inactive) → deep neural net → prediction of drug activity against multiple targets]
http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf

Applications in IoT (Internet of Things)
More difficult tasks in AI, robotics, and information processing:
・Huge amounts of time series data and sensor/device state data → RNN (recurrent neural network)
・Supervised (labelled) data is difficult to get → VAE (variational auto-encoder)
・Taking actions under given conditions and environments → Deep Reinforcement Learning
Image source: http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf

RNN (1/3)
Recurrent Neural Network: a neural network with a loop inside
Represents time series data and sequences of inputs
(speech models, natural language models, …)
[Diagram: input x(t) → recurrent hidden state z(t) → output y(t)]
z(t) = f( z(t-1), x(t) )
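The recurrence z(t) = f( z(t-1), x(t) ) can be sketched with a single scalar state. The form of f (tanh of a weighted sum) and the weight values are assumptions; the slide leaves f unspecified.

```python
import math

def rnn_step(z_prev, x_t, w_z=0.5, w_x=1.0, b=0.0):
    """One step of z(t) = f(z(t-1), x(t)), with f = tanh of a weighted sum (assumed)."""
    return math.tanh(w_z * z_prev + w_x * x_t + b)

def run_rnn(inputs, z0=0.0):
    """Feed a sequence through the loop and collect the state at each time step."""
    states, z = [], z0
    for x_t in inputs:
        z = rnn_step(z, x_t)       # the same weights are reused at every step
        states.append(z)
    return states

states = run_rnn([1.0, 0.0, 0.0])  # a single pulse followed by silence
print(states)
```

The state stays non-zero after the input returns to zero: the loop carries a fading memory of earlier inputs.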

RNN (2/3)
Unrolling the RNN over time
[Diagram: the network unrolled over time steps t = 1, t = 2, t = 3, …, t = T]
How to learn the parameters?
Back propagation through time (BPTT)
+ Long Short-Term Memory

RNN (3/3)
Neural Turing Machine [A. Graves, 2014]
A neural network that can be coupled to external memory
Applications: COPY, PRIORITY SORT
[Diagram: NN with parameters for coupling to external memories]

VAE (variational auto-encoder)
Learn a mapping from some latent variable z to a
complicated distribution on x
p(x) = ∫ p(x, z) dz, where p(x, z) = p(x|z) p(z)
p(z) = something simple, and p(x|z) = f(z) = a neural network
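The integral p(x) = ∫ p(x|z) p(z) dz can be approximated by sampling z from the simple prior. This sketch covers only the generative side, not VAE training (there is no encoder and no variational bound). The standard-normal prior, the Gaussian noise model and the identity "decoder" standing in for the neural network are all assumptions, chosen so the result can be compared with a closed form.

```python
import math
import random

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def decoder(z):
    """Stand-in for the neural network f(z); here just the identity (hypothetical)."""
    return z

def marginal_likelihood(x, num_samples=20000, noise_std=1.0, seed=0):
    """Monte Carlo estimate of p(x) = integral of p(x|z) p(z) dz with p(z) = N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)                          # z ~ p(z), something simple
        total += normal_pdf(x, decoder(z), noise_std)    # p(x|z), centred on f(z)
    return total / num_samples

estimate = marginal_likelihood(0.5)
exact = normal_pdf(0.5, 0.0, math.sqrt(2.0))   # closed form for the identity decoder
print(estimate, exact)
```

With a real neural network as the decoder no closed form exists, which is why approximate methods such as the VAE are needed.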

VAE
Example: generating face images from 29 hidden
variables
http://vdumoulin.github.io/morphing_faces/online_demo.html

VAE
Example: learning from digits and writing style
Experiment (PFN): given a digit x, generate the
remaining digits in the writing style of x
VAE: useful in semi-supervised learning (especially when
there is not enough training data)
Image from ディープラーニングが活かすIoT (IoT Empowered by Deep Learning), Preferred Networks, Inc., Interop 2015 seminar, 2015/06/09

Reinforcement Learning (RL)
[Diagram: the agent (car, robot, …) sends action a(t) to the environment (street, factory, …), which returns state s(t) and reward r(t)]
Total future reward:
$R = r(t) + \mu \, r(t+1) + \mu^2 r(t+2) + \dots \quad (\mu < 1)$
Decide the next action
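The total future reward R can be computed for a finite list of rewards by working backward from the last one. The reward values and the discount µ = 0.5 below are made up for illustration.

```python
def discounted_return(rewards, mu):
    """R = r(t) + mu*r(t+1) + mu^2*r(t+2) + ... for a finite reward list."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + mu * total    # fold in one more step of discounting
    return total

R = discounted_return([1.0, 1.0, 1.0], mu=0.5)
print(R)  # 1 + 0.5 + 0.25 = 1.75
```

Because µ < 1, rewards far in the future count for less than immediate ones.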

Deep Reinforcement Learning (RL)
Q-learning: needs the expected future reward Q(s, a)
of taking action a in state s (Bellman update equation)
$Q(s, a) \leftarrow Q(s, a) + \beta \left( r + \mu \max_{a'} Q(s', a') - Q(s, a) \right)$
Deep Q-Network (DQN) [V. Mnih, 2015]
Approximate Q(s, a) with a deep neural network Q(s, a, w)
Minimize: $L(w) = \mathbb{E}\left[ \left( r + \mu \max_{a'} Q(s', a', w) - Q(s, a, w) \right)^2 \right]$
Useful when there are many states
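The Bellman update can be tried out on a tiny problem where Q fits in a table. The 4-state chain world, the random exploration and the values of β and µ are all hypothetical; a deep Q-network would replace the table with a neural network Q(s, a, w) when there are too many states to enumerate.

```python
import random

N_STATES, ACTIONS = 4, (0, 1)     # hypothetical chain world: 0 = left, 1 = right

def step(state, action):
    """Move along the chain; reaching the last state gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, beta=0.5, mu=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)                    # purely random exploration
            s_next, r, done = step(s, a)
            # No future reward is added once the episode has ended.
            target = r if done else r + mu * max(Q[s_next])
            Q[s][a] += beta * (target - Q[s][a])       # Bellman update
            s = s_next
    return Q

Q = q_learning()
for s in range(N_STATES - 1):
    print(s, Q[s])
```

In every non-terminal state the learned value of moving right ends up higher than that of moving left, so acting greedily on Q reaches the goal.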

PFN - demo: https://www.youtube.com/watch?v=RH2TmreYkdA


The future of the field
Better hardware and bigger machine clusters
Better implementations and optimizations
Understanding video, text and signals
Developing business models
Deep learning is just the tip of the iceberg

Model 1 - Framework development
Bridge the gap between algorithms and implementation
Provide common interfaces and an understandable API for users
Pylearn2

Model 2 - Building deep-learning-capable hardware
New hardware (chipset) architecture for Deep Learning
Synopsys: processors’ convolutional network capabilities
http://www.bdti.com/InsideDSP/2015/04/21/Synopsys
Qualcomm Zeroth Platform: cognitive computing and custom CPU drive Snapdragon 820 processors
https://www.qualcomm.com/news/snapdragon/2015/03/02/cognitive-computing-and-custom-cpu-drive-next-gen-snapdragon-processors
NVIDIA DIGITS DevBox
https://developer.nvidia.com/devbox

Model 3 - Deep Intelligence for IoT
[Diagram labels: Sensing, Action, Feedback, Analysis, Union]
Image from ディープラーニングが活かすIoT (IoT Empowered by Deep Learning), Preferred Networks, Inc., Interop 2015 seminar, 2015/06/09

Model 4 - Hosted API and deep learning service
Data in → Model out
https://www.metamind.io/vision/train
https://www.metamind.io/language/train
Usability + Tuning + Scale out

Model 5 - Personal Deep Learning
Data in → Label out
DeepFace (Facebook): facial verification & tagging
http://www.adweek.com/socialtimes/deepface/433401
Applies previously developed solutions
Almost free

Summary
Deep Learning can find patterns in data by enabling a wide
range of abstractions.
Deep Learning has shown significant results in voice
and image recognition compared with
conventional machine learning methods.
Deep Learning has potential applications and
business models in several key sectors.
The challenge now is your ideas!

“Thank you for listening.”
• Some good materials for learning
• Papers, code, blogs, seminars, online courses
• For beginners: 深層学習 (Deep Learning), 機械学習プロフェッショナルシリーズ (Machine Learning Professional Series)
• Deep Learning - an MIT Press book in preparation
• http://www.iro.umontreal.ca/~bengioy/dlbook/
