
Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Restricted Boltzmann Machines

Neural Networks and Deep Learning, Springer, 2018


Chapter 6
Restricted Boltzmann Machines

• Most neural architectures map inputs to outputs.

– Ideal for supervised models.

– Autoencoders can be used for unsupervised models by replicating the input at the output.

• Restricted Boltzmann machines are borrowed from probabilistic graphical models.

– Graph of probabilistic dependencies between binary states that are outcomes of distributions.

– Binary training data provides some examples of states.

– Ideal for unsupervised models.


Key Differences from Conventional Neural Networks

• No input-to-output mapping.

• States are discrete samples of probability distributions with interdependencies among samples.

• Training points provide examples of some (visible) states.

• Computational graph abstraction: The parameterized edges define dependencies among states.

– The computational graph abstraction is the main commonality (can be exploited for pre-training).

– Can approximately convert a sampling-based dependency into a real-valued operation to initialize a related (conventional) neural network.

Historical Significance

• Most of the practical applications of neural networks use supervised learning.

• RBMs can still be used for unsupervised pre-training of conventional neural networks and can also be extended to supervised learning.

– Replace binary state outcomes with fractional probabilities.

– Treat the fractional values as the activations of a conventional neural network.

– Pre-training owes its historical origins to RBMs.


Defining a Restricted Boltzmann Machine

[Figure: bipartite graph with hidden states h1, h2, h3 connected by undirected edges to visible states v1, v2, v3, v4.]

• Bipartite graph of binary hidden states and visible states connected by undirected edges signifying probabilistic dependencies ⇒ The bipartite structure is the origin of the word "restricted".

An Interpretable Boltzmann Machine

[Figure: the hidden states are ice-cream trucks (Ben's, Jerry's, Tom's), which the parents see. The visible states are ice creams (cones, sundae, popsicle, cup), which the child sees in the daily training data. Parents are likely to buy different items from different trucks, which is encoded in the weights; the child only sees the visible states and models the weights.]

• Undirected model ⇒ The probabilities of buying ice creams and of picking trucks depend on one another (through the weights).

What Kind of Model does a Restricted Boltzmann Machine Build?

• Probability distributions of the binary hidden and visible states depend on one another.

– Weights on edges control probabilistic dependencies.

– Training data assumed to be samples of visible states.

• We want to learn weights that are "consistent" with the training samples.

• Use energy function to force "consistency" ⇒ Unsupervised model.

• The model can use the learned weights to output samples that are consistent with the training data ⇒ Generative model.

Notations

• We assume that the binary hidden units are $h_1 \ldots h_m$ and the visible units are $v_1 \ldots v_d$.

• The bias associated with visible node $v_i$ is denoted by $b_i^{(v)}$.

• The bias associated with hidden node $h_j$ is denoted by $b_j^{(h)}$.

• The weight of the edge between visible node $v_i$ and hidden node $h_j$ is denoted by $w_{ij}$.

• Can be generalized to non-binary data with some work.


Probabilistic Relationships

• Want to learn the weights $w_{ij}$ so that samples of the training data are most "consistent" with the following relationships:

$$P(h_j = 1 \mid \overline{v}) = \frac{1}{1 + \exp\left(-b_j^{(h)} - \sum_{i=1}^{d} v_i w_{ij}\right)} \qquad (1)$$

$$P(v_i = 1 \mid \overline{h}) = \frac{1}{1 + \exp\left(-b_i^{(v)} - \sum_{j=1}^{m} h_j w_{ij}\right)} \qquad (2)$$

• Use an energy function to force consistency by minimizing the expected value of
$$E = -\sum_{i} b_i^{(v)} v_i - \sum_{j} b_j^{(h)} h_j - \sum_{i,j} w_{ij} v_i h_j$$
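The conditional distributions and the energy function translate directly into a few lines of NumPy. The following is a minimal sketch; the names W, b_v, b_h and the array shapes are illustrative assumptions, not code from the book.

```python
# A minimal NumPy sketch of Equations 1-2 and the energy function.
# Assumed shapes: W is (d, m), b_v is (d,), b_h is (m,).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v, W, b_h):
    # Equation 1: vector of P(h_j = 1 | v) over all hidden units j.
    return sigmoid(b_h + v @ W)

def p_visible_given_hidden(h, W, b_v):
    # Equation 2: vector of P(v_i = 1 | h) over all visible units i.
    return sigmoid(b_v + W @ h)

def energy(v, h, W, b_v, b_h):
    # E(v, h) = -sum_i b_i^(v) v_i - sum_j b_j^(h) h_j - sum_{i,j} w_ij v_i h_j
    return -(b_v @ v) - (b_h @ h) - (v @ W @ h)
```
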
How Data is Generated from a Boltzmann Machine

• Data is generated by using Gibbs sampling.

• Randomly initialize the visible states and then sample the hidden states using Equation 1 (previous slide).

• Alternately sample hidden states and visible states using Equations 1 and 2 until thermal equilibrium is reached.

• A particular set of visible states at thermal equilibrium provides a sample of a binary training vector.

• The weights implicitly encode the distribution by defining probabilistic dependencies.
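As a toy sketch of this generative process, reusing the helper functions from the previous snippet (the burn-in length n_steps is an arbitrary illustrative choice, not a prescription):

```python
# Generate one approximate sample from a trained RBM by Gibbs sampling.
import numpy as np

def gibbs_sample(W, b_v, b_h, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    d, m = W.shape
    v = rng.integers(0, 2, size=d).astype(float)   # random visible initialization
    for _ in range(n_steps):                        # alternate Equations 1 and 2
        h = (rng.random(m) < p_hidden_given_visible(v, W, b_h)).astype(float)
        v = (rng.random(d) < p_visible_given_hidden(h, W, b_v)).astype(float)
    return v   # approximate sample of a binary visible vector at equilibrium
```
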
Intuition for Weights

• Consider the weights as affinities ⇒ Large positive values of $w_{ij}$ imply that the two states will be "on" together.

• We already have samples showing which visible states are "on" together.

• Weights will be learned in such a way that hidden states are connected to correlated visible states with large weights.

– Biological motivation: In Hebbian learning, a synapse between two neurons is strengthened when the neurons on either side of the synapse have highly correlated outputs.

– The contrastive divergence algorithm learns the weights.


Overview of Contrastive Divergence

• Positive phase: Draw b instances of the hidden states with the visible states fixed to each point in a mini-batch of b training points ⇒ Yields $\langle v_i, h_j \rangle_{pos}$

• Negative phase: For each of the b instances in the positive phase, continue to alternately sample visible states and hidden states from one another for r iterations ⇒ Yields $\langle v_i, h_j \rangle_{neg}$

$$w_{ij} \Leftarrow w_{ij} + \alpha \left( \langle v_i, h_j \rangle_{pos} - \langle v_i, h_j \rangle_{neg} \right)$$

$$b_i^{(v)} \Leftarrow b_i^{(v)} + \alpha \left( \langle v_i, 1 \rangle_{pos} - \langle v_i, 1 \rangle_{neg} \right)$$

$$b_j^{(h)} \Leftarrow b_j^{(h)} + \alpha \left( \langle 1, h_j \rangle_{pos} - \langle 1, h_j \rangle_{neg} \right)$$
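A sketch of one mini-batch update with r = 1 (CD-1) follows, reusing the sigmoid helper from the earlier snippet. Using the hidden probabilities rather than sampled hidden states in the update is a common practical variant and is an assumption here, not necessarily the book's exact recipe.

```python
# One CD-1 update on a mini-batch V of shape (b, d); W is (d, m).
import numpy as np

def cd1_update(V, W, b_v, b_h, alpha=0.01, seed=0):
    rng = np.random.default_rng(seed)
    b = V.shape[0]
    # Positive phase: visible states fixed to the training points.
    p_h_pos = sigmoid(b_h + V @ W)                              # (b, m)
    h_pos = (rng.random(p_h_pos.shape) < p_h_pos).astype(float)
    # Negative phase: one round of alternating sampling (r = 1).
    p_v_neg = sigmoid(b_v + h_pos @ W.T)                        # (b, d)
    v_neg = (rng.random(p_v_neg.shape) < p_v_neg).astype(float)
    p_h_neg = sigmoid(b_h + v_neg @ W)                          # (b, m)
    # Mini-batch averages of the <.,.>_pos and <.,.>_neg correlations.
    W += alpha * (V.T @ p_h_pos - v_neg.T @ p_h_neg) / b
    b_v += alpha * (V - v_neg).mean(axis=0)
    b_h += alpha * (p_h_pos - p_h_neg).mean(axis=0)
    return W, b_v, b_h
```
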
Remarks on Contrastive Divergence

• Strictly speaking, the negative phase needs a very large number of iterations to reach thermal equilibrium.

• The positive phase requires only one iteration because the visible states are fixed to the training points.

• Contrastive divergence says that only a small number of negative-phase iterations are sufficient for a "good" update of the weight vector (even without thermal equilibrium).

• In the early phases of training, one iteration is enough for a "good" update.

• Can increase the number of iterations in later phases.


Utility of Unsupervised Learning

• One can use an RBM to initialize an autoencoder for binary data (later slides).

• Treat the sigmoid-based sampling as a sigmoid activation.

• The basic idea can be extended to multilayer neural networks by using stacked RBMs.

– One of the earliest methods for pretraining.


Equivalence of Directed and Undirected Models

[Figure: the undirected RBM with weight matrix W (left) is equivalent to a directed model (right) that uses W from the visible to the hidden states and W^T from the hidden to the visible states.]

• Replace undirected edges with directed edges:

$$\overline{h} \sim \text{Sigmoid}(\overline{v}, \overline{b}^{(h)}, W)$$

$$\overline{v} \sim \text{Sigmoid}(\overline{h}, \overline{b}^{(v)}, W^T)$$

• Replace sampling with real-valued operations


Using a Trained RBM to Initialize a Conventional
Autoencoder

[Figure: the trained RBM (left) is converted into a conventional autoencoder (right). The visible states in a layer are fixed to the input data point, the hidden states (reduced features) are computed with W, and the reconstructed visible states are computed with W^T; discrete sampling is replaced with real-valued probabilities.]

• The architecture on the right uses real-valued sigmoid operations rather than discrete sampling operations ⇒ a conventional autoencoder!

$$\hat{h}_j = \frac{1}{1 + \exp\left(-b_j^{(h)} - \sum_{i=1}^{d} v_i w_{ij}\right)} \qquad (3)$$

$$\hat{v}_i = \frac{1}{1 + \exp\left(-b_i^{(v)} - \sum_{j=1}^{m} \hat{h}_j w_{ij}\right)} \qquad (4)$$
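Equations 3-4 correspond to the following deterministic forward pass, which can then be fine-tuned with backpropagation on the reconstruction error. This is a sketch reusing the names from the earlier snippets, not the book's code.

```python
# Use the trained RBM parameters as a deterministic autoencoder.
def autoencoder_forward(v, W, b_v, b_h):
    h_hat = sigmoid(b_h + v @ W)         # Equation 3: reduced features (encoder)
    v_hat = sigmoid(b_v + h_hat @ W.T)   # Equation 4: reconstruction (decoder)
    return h_hat, v_hat
```
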
Why Use an RBM to Initialize a Conventional Neural
Network?

• In the early years, conventional neural networks did not train well (especially with increased depth).

– Vanishing and exploding gradient problems.

– An RBM trains with contrastive divergence (no vanishing or exploding gradients).

• The real-valued approximation was used with stacked RBMs to initialize deep networks.

• The approach was later generalized to conventional autoencoders.


Stacked RBM

[Figure: RBM1, RBM2, and RBM3 stacked on top of one another with parameter matrices W1, W2, and W3. The hidden states of each RBM are copied in as the visible states of the next, yielding the stacked representation. The matrices W1, W2, and W3 are learned by successively training RBM1, RBM2, and RBM3 individually (pre-training phase).]

• Train different layers sequentially
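A sketch of this greedy layer-wise pre-training is shown below. It assumes a hypothetical helper train_rbm(data, n_hidden) that runs contrastive divergence (e.g., repeated cd1_update calls) and returns the learned parameters of one RBM; the name and signature are illustrative assumptions.

```python
# Greedy layer-wise pre-training of a stacked RBM (sketch).
def pretrain_stack(data, layer_sizes):
    params = []
    layer_input = data                       # (n, d) binary training matrix
    for n_hidden in layer_sizes:             # e.g., layer_sizes = [256, 128, 64]
        W, b_v, b_h = train_rbm(layer_input, n_hidden)   # hypothetical helper
        params.append((W, b_v, b_h))
        # The hidden probabilities of this RBM become the "visible"
        # data of the next RBM in the stack.
        layer_input = sigmoid(b_h + layer_input @ W)
    return params
```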


Stacked RBM to Conventional Neural Network

[Figure: the stacked RBM is unrolled into a deep autoencoder. The encoder fixes its input to the data point and applies W1, W2, and W3 to produce the code; the decoder applies W3^T, W2^T, and W1^T to reconstruct the input (target = input). Fine-tuning with backpropagation perturbs each pre-trained matrix by a learned increment: W1+E6, W2+E5, W3+E4 in the encoder and W3^T+E3, W2^T+E2, W1^T+E1 in the decoder.]
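As a sketch of the unrolling step, building on the hypothetical pretrain_stack output above (names are illustrative; the fine-tuning itself would be ordinary backpropagation on the reconstruction error):

```python
# Unroll the pre-trained stack (W1, W2, W3) into encoder/decoder layers.
def unroll(params):
    encoder = [(W, b_h) for (W, b_v, b_h) in params]               # W1, W2, W3
    decoder = [(W.T, b_v) for (W, b_v, b_h) in reversed(params)]   # W3^T, W2^T, W1^T
    return encoder, decoder

def reconstruct(v, encoder, decoder):
    # Forward pass of the unrolled autoencoder; fine-tuning then minimizes
    # the reconstruction error between the output and the input.
    x = v
    for W, b in encoder + decoder:
        x = sigmoid(b + x @ W)
    return x
```
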
Applications

• Pretraining can be used for supervised and unsupervised applications.

– Collaborative filtering: Was a component of the Netflix Prize contest.

∗ Gives different results from the autoencoder-like architecture in an earlier lecture.

– Topic models

– Classification

Collaborative Filtering

[Figure: two user-specific RBMs for collaborative filtering. Each visible unit is a one-hot (softmax) encoding of a rating, e.g., E.T. (rating = 2) and Nixon (rating = 5) for one user, and E.T. (rating = 4), Gandhi (rating = 4), Shrek (rating = 5), and Nero (rating = 3) for another. Each RBM contains visible units only for the movies rated by that user, while the hidden units h1, h2 and the weights are shared across users.]

• Changes: softmax activations for the visible units, and weights shared across the user-specific RBMs.
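A rough sketch of the resulting hidden-unit activation is given below. The weight tensor W of shape (n_movies, k, m), the dict-based rating encoding, and the function name are illustrative assumptions about one way to organize the shared parameters, not the slide's or the book's code.

```python
# Hidden-unit probabilities for one user in the collaborative-filtering RBM.
def user_hidden_probs(ratings, W, b_h, k=5):
    # ratings: {movie_index: rating in 1..k} for a single user; only the
    # movies this user has rated contribute to the activation.
    act = b_h.copy()
    for movie, r in ratings.items():
        act += W[movie, r - 1, :]   # weights are shared across all users
    return sigmoid(act)
```
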
Topic Models

[Figure: an RBM for topic modeling with binary hidden states h1 ... h4 and multinomial (softmax) visible states. The visible units share the same set of parameters, but the hidden units do not. The lexicon size d is typically larger than the document size, and the number of softmax visible units equals the document size for each RBM.]

Classification

• Can be used for unsupervised pretraining for classification.

– The goal of the RBM is only to learn features in an unsupervised way.

– The class label does not get a state in the RBM.

• Can also be used for training by treating the class label as a state.

– Hidden features are connected to both the class variables and the feature variables.

– The generative approach of RBMs does not fully optimize for classification accuracy ⇒ Need discriminative Boltzmann machines (Larochelle et al.).

Classification Architecture

[Figure: binary hidden states connected through weight matrix W to binary visible states (features) and through weight matrix U to multinomial visible states (classes).]
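As a sketch of how the hidden states see both groups of visible units (names and shapes are illustrative assumptions; the feature weights W have shape (d, m) and the class weights U have shape (k, m)):

```python
# Hidden-state probabilities when a one-hot class label is a visible state.
def hidden_probs_with_label(v, y_onehot, W, U, b_h):
    # v: binary feature vector of shape (d,); y_onehot: one-hot class (k,).
    return sigmoid(b_h + v @ W + y_onehot @ U)
```
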
Comments

• RBMs represent a special case of probabilistic graphical models.

• They provide an alternative to the autoencoder.

• They can be extended to non-binary data.

• These models are not quite as popular anymore.

• Historically significant for starting the idea of pre-training for deep models.
