
SYS863-01

Deep Generative Modeling: Theory and Applications

Mohammadhadi (Hadi) Shateri


Email: [email protected]
Generative Models
Generative Model

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 2
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:


Consider a D-dimensional random variable X ∈ ℝ^D with probability distribution p(x) = p(x_1, x_2, …, x_D).
Using the chain rule of probability we can write:

p(x) = p(x_1, x_2, …, x_D) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3) … p(x_D|x_{1:D-1}) = ∏_{d=1}^{D} p(x_d|x_{1:d-1})

where we define p(x_1|x_{1:0}) = p(x_1).

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 3
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:


Example: consider the binarized MNIST dataset, for which X ∈ {0,1}^784, or in other words X_d ∈ {0,1}, d = 1, 2, …, 784.

p(x) = p(x_1, x_2, …, x_28, x_29, …, x_784)

[Figure: two samples/realizations of the random variable X, shown as binarized MNIST digits.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 4
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Example (cont.): the pixels are ordered starting from the top-left pixel, moving one pixel to the right each time until reaching the end of that row, then going to the next row and repeating to the end. This is called the raster scan order of an image.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 5-9
Generative Models
Autoregressive (AR)

Note: the raster scan order of an image is the ordering of pixels by rows (row by row, and pixel by pixel within each row); thus, considering the chain rule, every pixel depends on all the pixels above it and to its left.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 10
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Consider a D-dimensional random variable X ∈ ℝ^D with probability distribution p(x) = p(x_1, x_2, …, x_D).
Using the chain rule of probability we can write:

p(x) = p(x_1, x_2, …, x_D) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3) … p(x_D|x_{1:D-1}) = ∏_{d=1}^{D} p(x_d|x_{1:d-1})

where we define p(x_1|x_{1:0}) = p(x_1).

Autoregressive generative models aim to learn the probability distribution p(x) = p(x_1, x_2, …, x_D) by modeling each of the conditional probabilities in the chain rule p(x) = ∏_{d=1}^{D} p(x_d|x_{1:d-1}).

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 11
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Example: we are given the binarized MNIST dataset (denoted by the random variable X ∈ {0,1}^784), which includes images of handwritten digits of size 28×28 where each pixel can either be 0 (black) or 1 (white).

p(x) = p(x_1, x_2, …, x_784) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) … p(x_d|x_1, x_2, …, x_{d-1}) … p(x_784|x_{1:783})

Now we need to model (effectively) each conditional probability.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 12
Generative Models
Autoregressive (AR)

Example: (Cont.)

p(x) = p(x_1, x_2, …, x_784) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) … p(x_d|x_1, x_2, …, x_{d-1}) … p(x_784|x_{1:783})

Each conditional distribution is modeled by a parametrized function, and then the given training data are used to learn the optimum values of the parameters.

We will use the approach presented in Brendan J. Frey, Graphical Models for Machine Learning and Digital Communication, MIT Press, 1998.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 13
Generative Models
Autoregressive (AR)

Example: (Cont.)

p(x) = p(x_1, x_2, …, x_784) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) … p(x_d|x_1, x_2, …, x_{d-1}) … p(x_784|x_{1:783})

p(x_1; w^(1)):                   p(x_1 = 1; w^(1)) = w^(1),   p(x_1 = 0; w^(1)) = 1 − w^(1),   with 0 ≤ w^(1) ≤ 1

p(x_2|x_1; w^(2)):               p(x_2 = 1|x_1; w^(2)) = σ(w_0^(2) + w_1^(2) x_1)

p(x_3|x_1, x_2; w^(3)):          p(x_3 = 1|x_1, x_2; w^(3)) = σ(w_0^(3) + w_1^(3) x_1 + w_2^(3) x_2)

p(x_d|x_1, …, x_{d-1}; w^(d)):   p(x_d = 1|x_{1:d-1}; w^(d)) = σ(w_0^(d) + Σ_{i=1}^{d-1} w_i^(d) x_i)

p(x_784|x_{1:783}; w^(784)):     p(x_784 = 1|x_{1:783}; w^(784)) = σ(w_0^(784) + Σ_{i=1}^{783} w_i^(784) x_i)

Sigmoid (logistic) function: σ(z) = 1 / (1 + e^{−z})

Note: Each conditional distribution is modeled by a parametrized function (here logistic regression), i.e., a logistic regression is used to predict the next pixel given all the previous pixels. This is what we mean by an autoregressive model.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 14-19
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Fully Visible Sigmoid Belief Network

Modelling these conditional distributions p(x) = p(x_1, x_2, …, x_D) = ∏_{d=1}^{D} p(x_d|x_{1:d-1}) using parametrized sigmoids is called a Fully Visible Sigmoid Belief Network (FVSBN).

[Figure: FVSBN as a directed model; each output x̂_d is computed from the inputs x_1, …, x_{d-1}.]

❑ How many parameters does the model have? Assume X ∈ {0,1}^D.

There is 1 parameter for p(x_1), 2 for p(x_2|x_1), …, and D parameters for p(x_D|x_{1:D-1}); thus #parameters = 1 + 2 + ⋯ + D = D(D + 1)/2.
Note: This is a function of D^2.
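For example, for binarized MNIST with D = 784 this gives 784 × 785 / 2 = 307,720 parameters, i.e., one (growing) logistic-regression model per pixel.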

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 20
Generative Models
Autoregressive (AR)

Example: Assume we are given binarized 2×2 images (denoted by the random variable X ∈ {0,1}^4), i.e., each pixel can either be 0 or 1. The probability distribution p(x) is modelled by parametrized sigmoid functions as below:

p(x) = p(x_1, x_2, x_3, x_4) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3)

p(x_1 = 1; w^(1)) = w^(1)
p(x_2 = 1|x_1; w^(2)) = σ(w_0^(2) + w_1^(2) x_1)
p(x_3 = 1|x_1, x_2; w^(3)) = σ(w_0^(3) + w_1^(3) x_1 + w_2^(3) x_2)
p(x_4 = 1|x_1, x_2, x_3; w^(4)) = σ(w_0^(4) + w_1^(4) x_1 + w_2^(4) x_2 + w_3^(4) x_3)

Q1: Find the likelihood for a given sample x = (x_1, x_2, x_3, x_4) = (1, 1, 0, 0).

Q2: Assume the optimum parameters are given as w^(1*) = 0.9, w^(2*) = [−0.5, 0.1], w^(3*) = [1, 0.5, −1], w^(4*) = [−2, 1.5, 0.5, 1].
How can a new sample x̂ = (x̂_1, x̂_2, x̂_3, x̂_4) be generated?
Note: in Python, to sample a random value from a Bernoulli distribution with parameter α we can use numpy.random.choice([0, 1], p=[1 − α, α]).

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 21
Generative Models
Autoregressive (AR)

Solution:

p(x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0) = p(x_1 = 1) p(x_2 = 1|x_1 = 1) p(x_3 = 0|x_1 = 1, x_2 = 1) p(x_4 = 0|x_1 = 1, x_2 = 1, x_3 = 0)

= w^(1) · σ(w_0^(2) + w_1^(2)·1) · [1 − σ(w_0^(3) + w_1^(3)·1 + w_2^(3)·1)] · [1 − σ(w_0^(4) + w_1^(4)·1 + w_2^(4)·1 + w_3^(4)·0)]

Note: The optimum values of these parameters (w^(1), w^(2), w^(3), w^(4)) are found by maximizing the (log-)likelihood over the training data samples, which is equivalent to minimizing the negative (log-)likelihood.

• Sample x̂_1 ~ p(x_1; w^(1)) randomly (e.g., numpy.random.choice([0, 1], p=[1 − w^(1), w^(1)])).
• Sample x̂_2 ~ p(x_2|x_1 = x̂_1; w^(2)) randomly.
• Sample x̂_3 ~ p(x_3|x_1 = x̂_1, x_2 = x̂_2; w^(3)) randomly.
• Sample x̂_4 ~ p(x_4|x_1 = x̂_1, x_2 = x̂_2, x_3 = x̂_3; w^(4)) randomly.
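As a minimal Python/NumPy sketch of Q1 and Q2 (not from the original slides; variable names are illustrative and the sampling follows the numpy.random.choice note above):

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Optimum parameters from Q2; the first entry of each weight vector is the bias w_0.
w1 = 0.9
w2 = np.array([-0.5, 0.1])
w3 = np.array([1.0, 0.5, -1.0])
w4 = np.array([-2.0, 1.5, 0.5, 1.0])

def p_one(w, prefix):
    # p(x_d = 1 | x_{1:d-1}) = sigma(w_0 + sum_i w_i x_i)
    return sigmoid(w[0] + np.dot(w[1:], prefix))

# Q1: likelihood of x = (1, 1, 0, 0).
x = np.array([1, 1, 0, 0])
p1 = [w1, p_one(w2, x[:1]), p_one(w3, x[:2]), p_one(w4, x[:3])]
likelihood = np.prod([p if xd == 1 else 1 - p for p, xd in zip(p1, x)])
print("p(1,1,0,0) =", likelihood)

# Q2: generate a new sample pixel by pixel (ancestral sampling).
x_hat = [np.random.choice([0, 1], p=[1 - w1, w1])]            # x_hat_1 ~ p(x_1)
for w in [w2, w3, w4]:
    alpha = p_one(w, np.array(x_hat))                         # p(x_d = 1 | sampled prefix)
    x_hat.append(np.random.choice([0, 1], p=[1 - alpha, alpha]))
print("sampled 2x2 image:", x_hat)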

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 22
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Fully Visible Sigmoid Belief Network

Here is a result of an FVSBN trained on the Caltech 101 Silhouettes dataset. The result is from Gan et al., PMLR 2015.

From Figure 4 of article Gan et al., PMLR 2015: Training data (on left) and synthesized samples (on right).
MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 23
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

The FVSBN can be improved by using a neural network; in addition, weight sharing can be used. The resulting model is called the Neural Autoregressive Density Estimator, or NADE (Larochelle et al., JMLR 2011).

p(x) = p(x_1, x_2, …, x_D) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3) … p(x_D|x_{1:D-1}) = ∏_{d=1}^{D} p(x_d|x_{1:d-1})

[Figure: a neural network takes x = (x_1, x_2, x_3, …, x_D) as input and returns p(x_d|x_{1:d-1}) for each d; the d-th output should be a function of only the previous d − 1 inputs.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 24
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)


The FVSBN can be improved by using a neural network; in addition, weight sharing can be used. The resulting model is called the Neural Autoregressive Density Estimator, or NADE (Larochelle et al., JMLR 2011).

Let's see how a simple NADE can be formulated. The goal is to use a neural network to model all of the conditional probabilities in p(x) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3) … p(x_D|x_{1:D-1}). This means we need a neural network that receives D inputs x_1, x_2, …, x_D and returns D outputs (each corresponding to one of the terms in the joint factorization). BUT we need to be careful, since the d-th output should be a function of only the previous d − 1 inputs (why? because the d-th term in the factorization is p(x_d|x_{1:d-1})). The next thing we should be careful about is that we cannot simply use a fully connected layer. Why? Because in a fully connected layer every node in the hidden layer sees all the inputs, which means all the nodes in the subsequent layers, up to the output layer, are functions of all the inputs. We must avoid this. BUT how? NADE has a solution for that …

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 25
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

[Figure: the input x is masked (Hadamard/element-wise product with the mask m_d = (1, 1, …, 1, 0, …, 0), whose first d − 1 entries are 1), passed through a shared weight matrix W to give the hidden representation h_d, and then through V_d to produce p(x_d|x_{1:d-1}).]

h_d = σ(W(x ⊙ m_d) + b)

p(x_d|x_{1:d-1}) = σ(V_d^T h_d + c_d)

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 26
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

First, associated to each output, a hidden representation is computed using only the relevant inputs (e.g., by masking the inputs):

h_d = σ(W(x ⊙ m_d) + b)

Then, the outputs are computed as follows:

p(x_d|x_{1:d-1}) = σ(V_d^T h_d + c_d)

where
• W is an M × D matrix (shared across all outputs),
• x = (x_1, …, x_D)^T is a D × 1 vector,
• m_d is a D × 1 mask vector whose first (d − 1) elements are all 1 and the rest are 0,
• b is an M × 1 vector (shared),
• V_d is an M × 1 vector, and c_d is a scalar.

[Figure: NADE architecture with M hidden units h_1, …, h_D and outputs p(x_1), p(x_2|x_1), p(x_3|x_1, x_2), …, p(x_D|x_{1:D-1}) computed through V_1, …, V_D.]

Note: The number of parameters is a function of DM and not D^2!
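A minimal NumPy sketch of this computation (shapes and names are illustrative; the original NADE also reuses partial sums across d for efficiency, which is omitted here):

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def nade_conditionals(x, W, b, V, c):
    # Returns p(x_d = 1 | x_{1:d-1}) for d = 1..D.
    # W: (M, D) shared matrix, b: (M,) shared bias, V: (D, M) with row d = V_d^T, c: (D,).
    D = x.shape[0]
    probs = np.empty(D)
    for d in range(1, D + 1):
        m_d = np.zeros(D)
        m_d[: d - 1] = 1.0                        # mask keeps only x_1, ..., x_{d-1}
        h_d = sigmoid(W @ (x * m_d) + b)          # hidden representation for output d
        probs[d - 1] = sigmoid(V[d - 1] @ h_d + c[d - 1])
    return probs

# Toy usage with D = 4 pixels and M = 3 hidden units.
rng = np.random.default_rng(0)
D, M = 4, 3
W, b = rng.normal(size=(M, D)), rng.normal(size=M)
V, c = rng.normal(size=(D, M)), rng.normal(size=D)
x = np.array([1.0, 0.0, 1.0, 1.0])
print(nade_conditionals(x, W, b, V, c))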

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 27
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

How can NADE be trained? Assume a set of N training data samples S = {x^(i)}_{i=1}^{N} is given.

Training is done by maximum likelihood, or equivalently by minimizing the negative log-likelihood (NLL):

L(θ) = E_X[NLL] = E_X[−log p(X)] = (1/N) Σ_{i=1}^{N} −log p(x^(i)) = (1/N) Σ_{i=1}^{N} Σ_{d=1}^{D} −log p(x^(i)_d | x^(i)_{1:d-1})

where θ = {W, b, V_1, c_1, V_2, c_2, …, V_D, c_D} includes the parameters of the NADE that we aim to learn.
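Continuing the NumPy sketch from the previous slide, the per-sample NLL and the averaged loss over a small batch could look like this (again only a sketch with illustrative names, reusing nade_conditionals, rng, and the toy parameters defined above):

import numpy as np

def nade_nll(x, W, b, V, c):
    # Negative log-likelihood of one binary sample under the NADE sketch above.
    p1 = nade_conditionals(x, W, b, V, c)         # p(x_d = 1 | x_{1:d-1})
    p_obs = np.where(x == 1, p1, 1.0 - p1)        # probability assigned to the observed x_d
    return -np.sum(np.log(p_obs))

# Average NLL over a toy batch X of shape (N, D).
X = rng.integers(0, 2, size=(5, D)).astype(float)
loss = np.mean([nade_nll(xi, W, b, V, c) for xi in X])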

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 28
Generative Models
Autoregressive (AR)

Example: Consider a simple case where X ∈ {0,1}^3, i.e., binary sequences of length D = 3. Compute the loss function of the NADE for a training instance x^(1) = (x^(1)_1, x^(1)_2, x^(1)_3) = (1, 0, 0).

L(θ) = E_X[NLL] = E_X[−log p(X)] = (1/N) Σ_{i=1}^{N} −log p(x^(i)) = (1/N) Σ_{i=1}^{N} Σ_{d=1}^{D} −log p(x^(i)_d | x^(i)_{1:d-1})

Solution:

L(θ)|_{x=x^(1)} = Σ_{d=1}^{3} −log p(x^(1)_d | x^(1)_{1:d-1}) = −log p(x^(1)_1 = 1) − log p(x^(1)_2 = 0 | x^(1)_1) − log p(x^(1)_3 = 0 | x^(1)_1, x^(1)_2)

= −log σ(V_1^T h_1 + c_1) − log[1 − σ(V_2^T h_2 + c_2)] − log[1 − σ(V_3^T h_3 + c_3)]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 29
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

Looking more closely at the loss function of the NADE …

L(θ) = E_X[NLL] = E_X[−log p(X)] = (1/N) Σ_{i=1}^{N} −log p(x^(i)) = (1/N) Σ_{i=1}^{N} Σ_{d=1}^{D} −log p(x^(i)_d | x^(i)_{1:d-1})      (each term is the cross-entropy calculated for pixel d)

Note that this is nothing but the (averaged) summation of cross-entropies. In other words, a cross-entropy is applied to each output (which is binary), and then the (averaged) summation of these cross-entropies is computed.

For example, assume that for a given training instance we have x_3 = 0, which means the true probability distribution (at the third output) is p = (p(x_3 = 0|x_1, x_2), p(x_3 = 1|x_1, x_2)) = (1, 0), while the predicted one is q = (1 − σ(V_3^T h_3 + c_3), σ(V_3^T h_3 + c_3)). So the cross-entropy would be:

CE(p, q) = −1 × log[1 − σ(V_3^T h_3 + c_3)] − 0 × log σ(V_3^T h_3 + c_3)

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 30
Generative Models
Autoregressive (AR)

Example: Consider a simple case where X ∈ {0,1}^3, i.e., binary sequences of length D = 3. Compute the loss function of the NADE for a training instance x^(1) = (x^(1)_1, x^(1)_2, x^(1)_3) = (1, 0, 0).

L(θ) = E_X[NLL] = E_X[−log p(X)] = (1/N) Σ_{i=1}^{N} −log p(x^(i)) = (1/N) Σ_{i=1}^{N} Σ_{d=1}^{D} −log p(x^(i)_d | x^(i)_{1:d-1})

Method 1: (negative log-likelihood)

L(θ)|_{x=x^(1)} = Σ_{d=1}^{3} −log p(x^(1)_d | x^(1)_{1:d-1}) = −log p(x^(1)_1 = 1) − log p(x^(1)_2 = 0 | x^(1)_1) − log p(x^(1)_3 = 0 | x^(1)_1, x^(1)_2)

= −log σ(V_1^T h_1 + c_1) − log[1 − σ(V_2^T h_2 + c_2)] − log[1 − σ(V_3^T h_3 + c_3)]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 31
Generative Models
Autoregressive (AR)

Example: Consider a simple case where X ∈ {0,1}^3, i.e., binary sequences of length D = 3. Compute the loss function of the NADE for a training instance x^(1) = (x^(1)_1, x^(1)_2, x^(1)_3) = (1, 0, 0).

L(θ) = E_X[NLL] = E_X[−log p(X)] = (1/N) Σ_{i=1}^{N} −log p(x^(i)) = (1/N) Σ_{i=1}^{N} Σ_{d=1}^{D} −log p(x^(i)_d | x^(i)_{1:d-1})

Method 2: (cross-entropy)

x^(1)_1 = 1 → p = (0, 1), q = (1 − σ(V_1^T h_1 + c_1), σ(V_1^T h_1 + c_1)) → CE_1(p, q) = −log σ(V_1^T h_1 + c_1)

x^(1)_2 = 0 → p = (1, 0), q = (1 − σ(V_2^T h_2 + c_2), σ(V_2^T h_2 + c_2)) → CE_2(p, q) = −log[1 − σ(V_2^T h_2 + c_2)]

x^(1)_3 = 0 → p = (1, 0), q = (1 − σ(V_3^T h_3 + c_3), σ(V_3^T h_3 + c_3)) → CE_3(p, q) = −log[1 − σ(V_3^T h_3 + c_3)]

⇒ L(θ)|_{x=x^(1)} = CE_1(p, q) + CE_2(p, q) + CE_3(p, q)

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 32
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)

To summarize, for each d = 1, 2, …, D:

h_d = σ(W(x ⊙ m_d) + b)                  (W is an M × D matrix; b and h_d are M × 1 vectors)
p(x_d|x_{1:d-1}) = σ(V_d^T h_d + c_d)

L(θ) = E_X[NLL] = E_X[−log p(X)] = E_X[ Σ_{d=1}^{D} −log p(X_d|X_{1:d-1}) ]      (each term is the cross-entropy calculated for pixel d)

[Figure: NADE with M hidden units; the outputs p(x_1), p(x_2|x_1), p(x_3|x_1, x_2), …, p(x_D|x_{1:D-1}) are computed through V_1, …, V_D.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 33
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Neural Autoregressive Density Estimator (NADE)


Here is a result of the Neural Autoregressive Density Estimator (NADE) from Larochelle et al., JMLR 2011, where a hidden layer of size 500 (M = 500) is used.

From Figure 4 of Larochelle et al., JMLR 2011: result of NADE with one hidden layer of size 500 with sigmoid activation on the binarized MNIST dataset. (Left) generated data and (right) training data. See more details and more examples in the original article.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 34
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

So far, we have seen the Fully Visible Sigmoid Belief Network (FVSBN) and the Neural Autoregressive Density Estimator (NADE) for modeling p(x) = p(x_1, x_2, …, x_D), the probability distribution of D-dimensional data X ∈ {0,1}^D.

Question 1: Can we apply these models to non-binary discrete random variables? For example X ∈ {0, 1, 2, …, 255}^D.
- Yes, we can simply model this multinomial distribution using a Softmax activation at the output (instead of the Sigmoid function).

For each d = 1, 2, …, D:

h_d = σ(W(x ⊙ m_d) + b)                         (W is an M × D matrix)
p(x_d|x_{1:d-1}) = Softmax(V_d^T h_d + c_d)     (V_d is now an M × 256 matrix and c_d a 256-dimensional vector)

L(θ) = E_X[NLL] = E_X[−log p(X)] = E_X[ Σ_{d=1}^{D} −log p(X_d|X_{1:d-1}) ]      (cross-entropy calculated for pixel d)

[Figure: the same NADE architecture with M hidden units, now with a 256-way Softmax at each output.]
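A small sketch of the 256-way output (illustrative only; h_d comes from the same masked hidden layer as before):

import numpy as np

def softmax(z):
    z = z - z.max()                   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# With pixel values in {0, ..., 255}, V_d is (M, 256) and c_d has length 256, so
# p(x_d = k | x_{1:d-1}) = softmax(V_d.T @ h_d + c_d)[k].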
MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 35
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Question 2: What about continuous random variables? e.g., speech signals, where X ∈ ℝ^D.
- In this case, we can assume p(x_d|x_{1:d-1}) is a mixture of K Gaussians, i.e., p(x_d|x_{1:d-1}) = Σ_{j=1}^{K} π_{d,j} N(x_d; μ_{d,j}, σ²_{d,j}), where the parameters π_{d,j}, μ_{d,j}, σ²_{d,j} are computed from x_{1:d-1} by a neural network. This model is called the Real-valued Neural Autoregressive Density Estimator (RNADE). For more details please see the article by Uria et al., NeurIPS 2013.

For each d = 1, 2, …, D:

h_d = σ(W(x ⊙ m_d) + b)                                   (W is an M × D matrix)
π_d = softmax(V_πd^T h_d + c_πd)
μ_d = V_μd^T h_d + c_μd                                   (V_πd, V_μd, and V_σd are M × K)
log σ_d = V_σd^T h_d + c_σd                               (c_πd, c_μd, and c_σd are K × 1)

p(x_d|x_{1:d-1}) = Σ_{j=1}^{K} π_{d,j} N(x_d; μ_{d,j}, σ²_{d,j})

L(θ) = E_X[NLL] = E_X[−log p(X)] = E_X[ Σ_{d=1}^{D} −log p(X_d|X_{1:d-1}) ]

[Figure: NADE-style architecture with M hidden units; for each d the network outputs the mixture parameters π_d, μ_d, σ_d through V_πd, V_μd, V_σd.]
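A minimal sketch of one RNADE conditional, assuming the hidden representation h_d has already been computed as above (shapes and names are illustrative):

import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def rnade_conditional(x_d, h_d, V_pi, c_pi, V_mu, c_mu, V_sigma, c_sigma):
    # p(x_d | x_{1:d-1}) as a K-component Gaussian mixture; V_* are (M, K), c_* are (K,).
    logits = V_pi.T @ h_d + c_pi
    pi = np.exp(logits - logits.max())
    pi = pi / pi.sum()                               # mixture weights (softmax)
    mu = V_mu.T @ h_d + c_mu                         # component means
    sigma = np.exp(V_sigma.T @ h_d + c_sigma)        # component std devs (log-parametrized)
    return np.sum(pi * gaussian_pdf(x_d, mu, sigma ** 2))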

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 36
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Masked Autoencoder for Distribution Estimation (MADE)

On the surface, both the Fully Visible Sigmoid Belief Network (FVSBN) and the Neural Autoregressive Density Estimator (NADE) look very similar to an autoencoder (going from D inputs to D outputs). In the work by Germain et al., ICML 2015, it was shown how autoencoder neural networks can be modified to enable generation. The main idea involves masking the autoencoder parameters to enforce the autoregressive constraints. The resulting model is named the Masked Autoencoder for Distribution Estimation, or MADE.

Plain autoencoder:                          MADE (⊙ is the Hadamard/element-wise product, M^(·) are binary masks):
h^[1] = f^[1](b^[1] + W^[1] x)              h^[1] = f^[1](b^[1] + (W^[1] ⊙ M^(1)) x)
h^[2] = f^[2](b^[2] + W^[2] h^[1])          h^[2] = f^[2](b^[2] + (W^[2] ⊙ M^(2)) h^[1])
x̂ = σ(c + V h^[2])                          x̂ = σ(c + (V ⊙ M^V) h^[2]), with outputs p(x_1), p(x_2|x_1), p(x_3|x_1, x_2)

[Figure: a three-input autoencoder with outputs x̂_1, x̂_2, x̂_3 (left) and its masked MADE counterpart (right), with hidden-unit labels 2, 1, 1, 2 in the first hidden layer and 1, 1, 2, 2 in the second.]
MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 37
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Masked Autoencoder for Distribution Estimation (MADE)

The main question is how the masks can be formed to ensure the autoregressive constraints. Below this is discussed based on the work by Germain et al., ICML 2015.

o Assume an ordering of the D-dimensional data x, e.g., x_2, x_1, x_3, …, and keep it for both the input and output nodes (in the example below: x_1, x_2, x_3).
o For the hidden layers, assign each node a number between 1 and D − 1.
o Each node at the output can be connected to the nodes at the previous layer with a (strictly) smaller node number.
o Each node at the hidden layers can be connected to the nodes at the previous layer with a smaller or equal node number.

For the network shown (input/output labels 1, 2, 3; first hidden layer labels 2, 1, 1, 2; second hidden layer labels 1, 1, 2, 2), the resulting masks are:

M^(1) = [1 1 0; 1 0 0; 1 0 0; 1 1 0],   M^(2) = [0 1 1 0; 0 1 1 0; 1 1 1 1; 1 1 1 1],   M^V = [0 0 0 0; 1 1 0 0; 1 1 1 1]

h^[1] = f^[1](b^[1] + (W^[1] ⊙ M^(1)) x),   h^[2] = f^[2](b^[2] + (W^[2] ⊙ M^(2)) h^[1]),   x̂ = σ(c + (V ⊙ M^V) h^[2])

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 38
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Masked Autoencoder for Distribution Estimation (MADE)

➢ Training is done as for a fully connected network, except that the masks are applied (to ensure the autoregressive constraints).
➢ The loss function is the same as before for the Neural Autoregressive Density Estimator (NADE).

M^(1) = [1 1 0; 1 0 0; 1 0 0; 1 1 0],   M^(2) = [0 1 1 0; 0 1 1 0; 1 1 1 1; 1 1 1 1],   M^V = [0 0 0 0; 1 1 0 0; 1 1 1 1]

[Figure: the unmasked autoencoder with outputs x̂_1, x̂_2, x̂_3 (left) and the masked MADE network with outputs p(x_1), p(x_2|x_1), p(x_3|x_1, x_2) (right); applying the masks W^[l] ⊙ M^(l) is equivalent to removing the forbidden connections.]
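A minimal sketch of this mask construction for the natural ordering x_1, …, x_D (the hidden-unit numbers are drawn at random here, which is one possible choice; names and shapes are illustrative, with the convention mask[out, in]):

import numpy as np

def made_masks(D, hidden_sizes, rng=np.random.default_rng(0)):
    # Label inputs/outputs 1..D and each hidden unit with a number in {1, ..., D-1}.
    labels = [np.arange(1, D + 1)]
    for H in hidden_sizes:
        labels.append(rng.integers(1, D, size=H))
    labels.append(np.arange(1, D + 1))

    masks = []
    for layer in range(1, len(labels)):
        prev, cur = labels[layer - 1], labels[layer]
        if layer == len(labels) - 1:
            masks.append((cur[:, None] > prev[None, :]).astype(float))   # output: strictly smaller label
        else:
            masks.append((cur[:, None] >= prev[None, :]).astype(float))  # hidden: smaller or equal label
    return masks

M1, M2, MV = made_masks(D=3, hidden_sizes=[4, 4])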

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 39
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Masked Autoencoder for Distribution Estimation (MADE)

In the masked autoencoder for distribution estimation (MADE), generation can be done (similar to before) as follows:

1. Sample a value of x_1 from p(x_1).
2. Feed x_1 to the network and compute p(x_2|x_1). Then sample x_2 from p(x_2|x_1).
3. Feed x_1 and x_2 to the network and compute p(x_3|x_1, x_2). Then sample x_3 from p(x_3|x_1, x_2).

[Figure: the same masked network evaluated three times, once per sampling step.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 40
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Let's turn to another modeling approach, based on CNNs.
Can we do this sequential modeling using convolutional layers (Conv 1D)?

RECAP: a 1D convolution is an operation applied to a 1D input where each output is a weighted sum of the nearby inputs.

Let's assume a 1D input x = [x_1, x_2, x_3, x_4, x_5, x_6, x_7] and a 1D convolution filter of size three, w = [w_1, w_2, w_3]^T.

[Figure: the filter slides over the input; with filter/kernel size 3, the first output is z_1 = w_1 x_1 + w_2 x_2 + w_3 x_3.]

Note that normally there is a bias term (i.e., z_1 = w_1 x_1 + w_2 x_2 + w_3 x_3 + b), which for simplicity is ignored here.
MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 41
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: (RECAP, cont.)

Sliding the filter one position to the right gives the next output, z_2 = w_1 x_2 + w_2 x_3 + w_3 x_4, and so on along the input.
MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 42
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: Let's turn to another modeling approach, based on CNNs.
Can we do this sequential modeling using convolutional layers (Conv 1D)?

Yes, we can, but we need to ensure the operation is done in a causal manner. Why? This is the condition enforced by the autoregressive factorization of the data probability distribution:

p(x) = p(x_1, x_2, …, x_D) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) p(x_4|x_1, x_2, x_3) … p(x_D|x_{1:D-1}) = ∏_{d=1}^{D} p(x_d|x_{1:d-1})

The prediction of pixel/timestep d (made from x_{1:d-1}) cannot depend on the current or future pixels/timesteps x_d, x_{d+1}, …, x_D.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 43
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

What is a causal 1D convolution (Causal Conv 1D)? A regular Conv 1D with proper padding (and masking).

[Figure: a regular Conv 1D with filter/kernel size 2 (top) versus a causal Conv 1D (bottom); in the causal version the input is left-padded with zeros, 0 0 x_1 x_2 x_3 x_4 x_5, so that each output x̂_t at t = 1, …, 5 only sees inputs from before time t.]

Q1. If x_5 is not used at the input, then where are we using it?

Q2. In this model the receptive field is 2; in other words, the output at timestep t depends on the inputs at t − 1 and t − 2. How can we increase the receptive field to have a better model?
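A minimal NumPy sketch of the causal Conv 1D shown above (kernel size 2, left zero-padding; names are illustrative):

import numpy as np

def causal_conv1d(x, w):
    # Strictly causal 1D convolution as in the figure: the output x_hat_t is a weighted
    # sum of the k previous inputs x_{t-k}, ..., x_{t-1} only, never of x_t itself.
    k = len(w)
    x_padded = np.concatenate([np.zeros(k), x])           # left-pad with k zeros
    return np.array([np.dot(w, x_padded[t:t + k]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 0.25])                                 # filter/kernel size 2
print(causal_conv1d(x, w))                                # x_hat_t depends on x_{t-2} and x_{t-1}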

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 44
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

What is a causal 1D convolution (Causal Conv 1D)? A regular Conv 1D with proper padding (and masking).

[Figure: a second causal Conv 1D layer (filter/kernel size 2) stacked on top of the first; the left-padded input 0 0 x_1 x_2 x_3 x_4 x_5 feeds the first layer, whose (again zero-padded) outputs feed the second layer producing x̂_1, …, x̂_5 at t = 1, …, 5.]

With this second layer the receptive field is now three!

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 45-46
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Adding more layers (and/or increasing the filter size) increases the receptive field: for L layers with the same kernel size k, the receptive field is L(k − 1) + 1 (i.e., #layers + 1 for kernel size 2). But this increases the computational cost. Is there another solution for increasing the receptive field?

[Figure: a stack of non-dilated causal Conv 1D layers (input, three hidden layers, output x̂_T), after Oord et al., 2016; the output at time T only sees a narrow window of past inputs ending at x_{T-1}.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 47
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

One can use dilated convolutions to increase the receptive field of the convolution.

[Figure: a stack of dilated causal convolutions with dilations 1, 2, 4, 8 from the input (up to x_{T-1}) through three hidden layers to the output x̂_T, after Oord et al., 2016.]
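For example, with kernel size 2 and dilations 1, 2, 4, 8 as in the figure, the receptive field is 1 + (1 + 2 + 4 + 8) = 16 timesteps, whereas four non-dilated layers of the same kernel size only reach 5.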

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 48
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

A Softmax activation is used at the output to model the conditional distribution p(x_t|x_{1:t-1}) (even for audio, Oord et al., 2016).

[Figure: the same dilated causal convolution stack (dilations 1, 2, 4, 8) with input up to x_{t-1} and output p(x_t|x_{1:t-1}), after Oord et al., 2016.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 49
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Two additional tricks improve the strength of the causal Conv 1D (Oord et al., 2016):

➢ Gated activation unit (with learnable filter weights W_{f,k} and W_{g,k} for layer k):

z = tanh(W_{f,k} ∗ x) ⊙ σ(W_{g,k} ∗ x)

[Figure: the input x to layer k passes through two causal Conv 1D layers whose outputs are combined by the tanh/σ gate.]
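A small sketch of the gated activation unit, reusing the causal_conv1d sketch from earlier (the filter values here are illustrative):

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def gated_activation(x, w_f, w_g, conv):
    # z = tanh(W_f * x) ⊙ sigma(W_g * x), where '*' is a causal convolution
    # and ⊙ is the element-wise (Hadamard) product.
    return np.tanh(conv(x, w_f)) * sigmoid(conv(x, w_g))

# Example: z = gated_activation(x, np.array([0.5, 0.25]), np.array([0.1, -0.3]), causal_conv1d)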

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 50
Generative Models
Autoregressive (AR)

Autoregressive Generative Models:

Two additional tricks improve the strength of the causal Conv 1D (Oord et al., 2016):

➢ Skip connections (residual blocks, He et al., 2015): a residual/skip/shortcut connection adds the input of layer k to its gated output.

z = tanh(W_{f,k} ∗ x) ⊙ σ(W_{g,k} ∗ x)

[Figure: the input x to layer k passes through the gated causal Conv 1D block, and the result is summed with x to form the output of the layer.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 51
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: WaveNet

WaveNet was developed by DeepMind (Oord et al., 2016) as a deep generative model for audio and other sequential data. Its strength lies in its ability to generate high-quality, realistic audio and its potential for various applications, such as speech synthesis, music generation, and audio effects processing:

o Changing speaker gender

o Adding emotion, accent, etc.

o Generating music (trained on a dataset of classical piano music)

https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 52
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: PixelCNN

The idea of causal one-dimensional convolution (causal Conv 1D) can be extended to Conv 2D by masking the 2D filter weights (to ensure causality). This leads to causal Conv 2D, or masked convolution (Oord et al., NeurIPS 2016).

[Figure: a 3×3 kernel [w_11 … w_33] applied to the input image is multiplied element-wise (⊙) by the mask [1 1 1; 1 0 0; 0 0 0], so the feature-map value x̂_d at pixel x_d only sees the pixels above it and to its left.]

Note that masked convolution was introduced a few months before the causal Conv 1D of WaveNet (by the same authors), so it is more precise to say that causal Conv 1D was introduced based on masked convolution.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 53
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: PixelCNN

PixelCNN, proposed by Oord et al., NeurIPS 2016, is an autoregressive generative model for image data: it generates images pixel by pixel, conditioning on the previously generated pixels. The key idea behind PixelCNN is to model the conditional probability distribution of each pixel given the previously generated pixels (in its receptive field). This is achieved by using masked convolutions.

At the output layer, a Softmax gives a multinomial distribution over the 256 possible pixel values {0, …, 255}.

Mask type I:  [1 1 1; 1 0 0; 0 0 0]  is applied only to the first convolutional layer (the centre pixel itself is excluded) to ensure the autoregressive constraints.
Mask type II: [1 1 1; 1 1 0; 0 0 0]  is applied to all the subsequent convolutional layers (the centre is allowed, since after the first layer it already carries information from previous pixels only).

[Figure: input → first hidden layer (mask type I) → further hidden layers (mask type II) → output layer with a 256-way Softmax per pixel.]
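A minimal sketch that builds the two masks shown above for an odd kernel size (type I excludes the centre pixel, type II allows it; names are illustrative):

import numpy as np

def pixelcnn_mask(kernel_size, mask_type):
    # Binary mask for a (kernel_size x kernel_size) kernel in raster-scan order.
    k = kernel_size
    mask = np.ones((k, k))
    centre = k // 2
    start = centre + (1 if mask_type == "II" else 0)   # type I also blocks the centre itself
    mask[centre, start:] = 0                           # block the centre row from 'start' onwards
    mask[centre + 1:, :] = 0                           # block all rows below the centre
    return mask

print(pixelcnn_mask(3, "I"))    # [[1 1 1], [1 0 0], [0 0 0]]
print(pixelcnn_mask(3, "II"))   # [[1 1 1], [1 1 0], [0 0 0]]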

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 54
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: PixelCNN

Note: The receptive field can be increased by adding more hidden layers. However, due to the masked convolution, this can lead to blind spots, which means some previous pixels never contribute to the output pixel.

In other words, there is a blind spot for the considered output pixel, and it remains no matter how many layers we add. In the small example from the slides, the network ends up modeling p(x_13 | x_{1:9}, x_{11:12}) and not p(x_13 | x_{1:12})!

[Figure: blind spots (marked ×) above and to the right of the target pixel when modeling position t with a 3×3 kernel.]

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 55
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: PixelCNN

Note 1: To prevent blind spots, each convolution is split into a vertical and a horizontal stack, where the vertical stack looks at the rows above the target pixel while the horizontal stack looks at the pixels to the left of the target pixel in the same row. For more details see the original article, Oord et al., NeurIPS 2016.

Note 2: In implementations of PixelCNN, a gated non-linearity/activation (rooted in LSTM gating) is used to improve the performance. In addition, to improve gradient flow and speed up training, residual connections are often used in PixelCNN models. These connections bypass one or more layers in the model and allow gradients to flow directly to earlier layers.

Note 3: In the case of colored images, each pixel's color channels are modeled sequentially, with the B channel conditioned on (R, G) and the G channel conditioned solely on R. Thus, PixelCNN takes images of size N × N × 3 at its input and returns as output predictions of size N × N × 3 × 256.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 56
Generative Models
Autoregressive (AR)

Autoregressive Generative Models: PixelCNN

Note: There are various extensions of PixelCNN, such as PixelCNN++ (Salimans et al., ICLR 2017) and PixelRNN (Oord et al., ICML 2016), which are not covered in this course.

MH. Shateri, SYS863-01: Deep Generative Modeling: Theory and Applications (Summer 2023) 57
