Chapter 1
Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY
[Figure: An RBM with hidden states h1, h2, h3 and visible states v1, v2, v3, v4. The tendency of parents to buy different items from different trucks is encoded in the weights.]
• The bias associated with hidden node h_j is denoted by b_j^{(h)}.

P(v_i = 1 \mid \bar{h}) = \frac{1}{1 + \exp\left(-b_i^{(v)} - \sum_{j=1}^{m} h_j w_{ij}\right)}   (2)
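Equation (2) is a logistic sigmoid of the visible bias plus the weighted sum of the hidden states. A minimal sketch in code; the weight values and dimensions below are assumptions for illustration, not from the text:

```python
import numpy as np

def visible_given_hidden(h, W, b_v):
    """P(v_i = 1 | h) for each visible unit i, per Eq. (2):
    sigmoid of the visible bias plus the weighted sum of hidden states."""
    return 1.0 / (1.0 + np.exp(-b_v - W @ h))

# Toy example (values assumed): 4 visible units, 3 hidden units.
W = np.array([[ 1.0, -0.5,  0.2],
              [ 0.3,  0.8, -0.1],
              [-0.4,  0.6,  0.9],
              [ 0.7, -0.2,  0.5]])
b_v = np.zeros(4)
h = np.array([1.0, 0.0, 1.0])

p = visible_given_hidden(h, W, b_v)
```

Each entry of `p` is a Bernoulli probability for the corresponding visible unit, so the values always lie strictly between 0 and 1.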
[Figure: Using a trained RBM for dimensionality reduction and reconstruction. The visible states are fixed to the input data point; multiplying by W yields the hidden states (reduced features), and multiplying by W^T yields the reconstructed visible states. Discrete sampling is replaced with real-valued probabilities.]
\hat{v}_i = \frac{1}{1 + \exp\left(-b_i^{(v)} - \sum_{j=1}^{m} \hat{h}_j w_{ij}\right)}   (4)
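The reconstruction in Equation (4) uses the real-valued hidden probabilities ĥ_j rather than sampled binary states. A minimal sketch under assumed toy values (the matrix and vector below are not from the text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, W, b_v, b_h):
    """Eq. (4): compute real-valued hidden probabilities h_hat
    (instead of sampling discrete states), then map them back
    through W to reconstruct the visible layer."""
    h_hat = sigmoid(b_h + W.T @ v)    # hidden states (reduced features)
    v_hat = sigmoid(b_v + W @ h_hat)  # reconstructed visible states
    return h_hat, v_hat

# Toy values (assumed): 4 visible units, 3 hidden units.
W = np.array([[ 1.0, -0.5,  0.2],
              [ 0.3,  0.8, -0.1],
              [-0.4,  0.6,  0.9],
              [ 0.7, -0.2,  0.5]])
v = np.array([1.0, 0.0, 1.0, 0.0])
h_hat, v_hat = reconstruct(v, W, np.zeros(4), np.zeros(3))
```

Using probabilities instead of sampled states makes the reconstruction deterministic and less noisy, which is convenient when the RBM is used for feature extraction.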
Why Use an RBM to Initialize a Conventional Neural Network?
[Figure: Three stacked RBMs (RBM 1, RBM 2, RBM 3). The hidden representation of each RBM is copied as the input of the next. The parameter matrices W1, W2, and W3 are learned by successively training RBM 1, RBM 2, and RBM 3 individually (pre-training phase).]
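The greedy layer-wise pre-training in the figure can be sketched as follows. This is a minimal illustration using CD-1 (contrastive divergence with one Gibbs step) on random toy data; the hyperparameters and data are assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1):
    """Train one RBM with CD-1; return its weight matrix and the
    hidden representation of the data (to feed the next RBM)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for v in data:
            ph = sigmoid(b_h + W.T @ v)              # positive phase
            h = (rng.random(n_hidden) < ph).astype(float)
            v_hat = sigmoid(b_v + W @ h)             # reconstruction
            ph_hat = sigmoid(b_h + W.T @ v_hat)      # negative phase
            W += lr * (np.outer(v, ph) - np.outer(v_hat, ph_hat))
            b_v += lr * (v - v_hat)
            b_h += lr * (ph - ph_hat)
    return W, sigmoid(data @ W + b_h)

# Pre-training phase: each RBM's hidden representation is copied
# as the visible input of the next RBM (toy binary data, assumed).
data = rng.integers(0, 2, size=(20, 6)).astype(float)
W1, h1 = train_rbm(data, n_hidden=4)
W2, h2 = train_rbm(h1, n_hidden=3)
W3, h3 = train_rbm(h2, n_hidden=2)
```

The matrices W1, W2, W3 learned this way then serve as the initial weights of a conventional feed-forward network, which is subsequently fine-tuned with backpropagation.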
[Figure: A deep autoencoder initialized from the stacked RBM weights. The encoder uses W1, W2, W3 (with the code in the middle) and the decoder uses W3^T, W2^T, with the reconstruction target equal to the input. Fine-tuning with backpropagation adjusts the weights to W1+E6, W2+E5, W3+E4, W3^T+E3, and W2^T+E2.]
– Topic models
– Classification
Collaborative Filtering
[Figure: An RBM for collaborative filtering. Each rated movie is represented by a visible unit with a one-hot encoding of its rating on a 1-to-5 scale (e.g., E.T. with rating 2 is encoded as 0 1 0 0 0, and Nixon with rating 5 as 0 0 0 0 1). Each user defines a separate RBM over only the movies that user has rated: one user rated E.T. (2), Nixon (5), Gandhi (4), and Nero (3), while another rated E.T. (4) and Shrek (5). Both RBMs are connected to hidden units h1 and h2.]
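The one-hot rating encoding used by the collaborative-filtering RBM can be sketched as follows; the movie names and ratings mirror the figure, while the function name is an assumption:

```python
import numpy as np

def one_hot_rating(rating, n_levels=5):
    """One-hot encode a rating on a 1..n_levels scale, as used for
    the softmax-style visible units of the collaborative-filtering RBM."""
    v = np.zeros(n_levels)
    v[rating - 1] = 1.0
    return v

# One user's ratings from the figure: the user's RBM has one
# visible unit per movie that this user actually rated.
user_ratings = {"E.T.": 2, "Nixon": 5, "Gandhi": 4, "Nero": 3}
encoded = {movie: one_hot_rating(r) for movie, r in user_ratings.items()}
```

Because different users rate different movies, each user's RBM contains a different subset of visible units, but all users share the same weights for any given movie.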
[Figure: An RBM for topic modeling. The binary hidden states h1, h2, h3, h4 are connected to multinomial visible states drawn from a lexicon of size d, which is typically larger than the document size. The visible units share the same set of parameters (matrices W and U), but the hidden units do not.]