
DSAIT4115 Deep Reinforcement Learning Exercise Sheet 0

Wendelin Böhmer <[email protected]> voluntary exercises

Math and machine learning primer


Voluntary exercises

The following exercises do not have to be submitted as homework, but they may be helpful for practicing the required math and preparing for the exam. Some questions are taken from old exams and include the rubric that was used. You will not receive points for these questions.

E0.1: Taylor expansion (voluntary)



For the function $\sqrt{1+x}$, write down the Taylor series around $x_0 = 0$ up to 3rd order.

Solution: Approximating $f(x)$ via Taylor expansion at $x_0$:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n$$

i.e. for an expansion around $x_0 = 0$:

$$f(x) \approx f(0) + f'(0)\,x + \tfrac{1}{2} f''(0)\,x^2 + \tfrac{1}{6} f'''(0)\,x^3 + O(x^4)$$

with

$$f'(x) = \tfrac{1}{2}(x+1)^{-1/2} \;\rightarrow\; f'(0) = 1/2$$
$$f''(x) = -\tfrac{1}{4}(x+1)^{-3/2} \;\rightarrow\; f''(0) = -1/4$$
$$f'''(x) = \tfrac{3}{8}(x+1)^{-5/2} \;\rightarrow\; f'''(0) = 3/8$$

the Taylor expansion of $\sqrt{1+x} = (1+x)^{1/2}$ around $x_0 = 0$ reads

$$\sqrt{1+x} \approx 1 + \tfrac{1}{2}x - \tfrac{1}{8}x^2 + \tfrac{1}{16}x^3 + \ldots$$
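
A quick numerical sanity check of the cubic approximation (a sketch, not part of the original sheet):

import math

def sqrt_taylor3(x):
    # 3rd-order Taylor expansion of sqrt(1 + x) around x0 = 0
    return 1 + x / 2 - x**2 / 8 + x**3 / 16

for x in (0.01, 0.1, 0.5):
    exact = math.sqrt(1 + x)
    approx = sqrt_taylor3(x)
    print(f"x={x}: exact={exact:.6f} taylor={approx:.6f} error={abs(exact - approx):.2e}")

The error grows with $|x|$, as expected for a truncated Taylor series.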

E0.2: Critical points (voluntary)

Consider the two functions

$$f(x, y) := c + x^2 + y^2$$
$$g(x, y) := c + x^2 - y^2\,,$$

where $c \in \mathbb{R}$ is a constant.


(a) Show that $a = (0, 0)$ is a critical point of both functions.

(b) Check for $f$ and for $g$ whether $a$ is a minimum, maximum, or saddle point using the Hessian matrix.
Hint: A matrix is positive (negative) definite if and only if all its eigenvalues are positive (negative).

Solution:

$$\nabla f(a) = (2x, 2y)\big|_{(x,y)=a} = (0, 0)$$

and

$$\nabla g(a) = (2x, -2y)\big|_{(x,y)=a} = (0, 0)$$

$\Rightarrow$ The necessary condition for extrema (vanishing gradient) is fulfilled at $a$.

Checking for extrema: $a$ is a minimum if the Hessian matrix $H$ is positive definite (all eigenvalues $> 0$) and a maximum if $H$ is negative definite (all eigenvalues $< 0$). $\rightarrow$ characteristic polynomial, i.e. $\det[H - \lambda I]$:

$$(H_f)(a) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \;\Rightarrow\; (2 - \lambda)^2 \overset{!}{=} 0$$

i.e. all eigenvalues ($2$ and $2$) are real and positive $\Rightarrow$ $H_f$ is positive definite. Thus, $a$ is a minimum of $f$.

$$(H_g)(a) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \;\Rightarrow\; (2 - \lambda)(-2 - \lambda) \overset{!}{=} 0$$

One positive and one negative eigenvalue ($2$ and $-2$) $\Rightarrow$ $H_g$ is neither positive nor negative definite. Therefore $a$ is a saddle point but not an extremum of $g$.
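
The Hessian classification can be checked numerically (a sketch, not part of the original solution):

import numpy as np

# Hessians of f and g at a = (0, 0); the constant c drops out of all second derivatives
hessians = {"f": np.array([[2.0, 0.0], [0.0, 2.0]]),
            "g": np.array([[2.0, 0.0], [0.0, -2.0]])}

for name, H in hessians.items():
    eig = np.linalg.eigvalsh(H)  # symmetric matrix -> real eigenvalues
    if np.all(eig > 0):
        kind = "minimum"
    elif np.all(eig < 0):
        kind = "maximum"
    else:
        kind = "saddle point"
    print(f"{name}: eigenvalues {eig} -> {kind}")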

E0.3: Distributions and expected values (voluntary)

Let $x \in \mathbb{R}$ be a random variable with probability density $p : \mathbb{R} \to \mathbb{R}$ with:

$$p(x) = \begin{cases} c \cdot \sin(x), & x \in [0, \pi] \\ 0, & \text{elsewhere} \end{cases}$$

(a) Determine the parameter $c \in \mathbb{R}$ such that $p(x)$ is indeed a probability density.
(b) Determine the expected value $\mu := \mathbb{E}_p[x]$.
(c) Determine the variance of $x$, $\mathbb{E}_p[(x - \mu)^2]$.

Solution:
(a) For $p$ to be a probability density it is required that (i) $p(x) \geq 0\ \forall x \in \mathbb{R}$, which is fulfilled here. Furthermore, (ii) $p$ must be normalized appropriately:

$$\int_{\mathbb{R}} p(x)\,dx = 1$$

Therefore, we get for the unknown constant $c$:

$$c \int_0^{\pi} \sin(x)\,dx = c\,\big[-\cos(x)\big]_0^{\pi} = 2c \overset{!}{=} 1 \;\rightarrow\; c = 1/2$$


(b) To calculate the expected value, we use integration by parts, i.e., for any functions $f$ and $g$:

$$\int_a^b f g'\,dx = (fg)\Big|_a^b - \int_a^b f' g\,dx$$

$$\mu := \mathbb{E}_p[x] = \tfrac{1}{2}\int_0^{\pi} x \sin(x)\,dx = -\tfrac{1}{2}\, x \cos(x)\Big|_0^{\pi} + \tfrac{1}{2}\underbrace{\int_0^{\pi} \cos(x)\,dx}_{=0} = \pi/2$$

(c) To calculate the variance, we proceed in the same way:

$$\mathbb{E}_p[x^2] = \tfrac{1}{2}\int_0^{\pi} x^2 \sin(x)\,dx = \underbrace{-\tfrac{1}{2}\, x^2 \cos(x)\Big|_0^{\pi}}_{=\frac{\pi^2}{2}} + \underbrace{\int_0^{\pi} x \cos(x)\,dx}_{=k}$$

with

$$k = x \sin(x)\Big|_0^{\pi} - \int_0^{\pi} \sin(x)\,dx = 0 + \cos(x)\Big|_0^{\pi} = 0 - 2$$

and therefore

$$\mathbb{E}_p[x^2] = \tfrac{\pi^2}{2} - 2\,,$$

yielding

$$\mathbb{E}_p[(x - \mu)^2] = \mathbb{E}_p[x^2] - \mu^2 = \tfrac{\pi^2}{2} - 2 - \tfrac{\pi^2}{4} = \tfrac{\pi^2}{4} - 2\,.$$
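
All three results can be verified numerically, e.g. with scipy's quadrature (a sketch, not part of the original sheet):

import numpy as np
from scipy.integrate import quad

p = lambda x: 0.5 * np.sin(x)  # the density on [0, pi] with c = 1/2

norm, _ = quad(p, 0, np.pi)                     # should be 1
mu, _ = quad(lambda x: x * p(x), 0, np.pi)      # should be pi/2
ex2, _ = quad(lambda x: x**2 * p(x), 0, np.pi)  # should be pi^2/2 - 2

print(norm, mu, np.pi / 2)
print(ex2 - mu**2, np.pi**2 / 4 - 2)  # variance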

E0.4: Variance of the empirical mean (old exam question) (voluntary)

Prove that the variance of the empirical mean $f_n := \frac{1}{n}\sum_{i=1}^n x_i$, based on $n$ samples $x_i \in \mathbb{R}$ drawn i.i.d. from the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, is $\mathbb{V}[f_n] = \frac{\sigma^2}{n}$, without using the fact that the variance of a sum of independent variables is the sum of the variables' variances.

Solution: The major insights are that $\mathbb{E}[x_i] = \mu\,, \forall i$, that $\mathbb{E}[x_i x_j] = \mathbb{E}[x_i]\,\mathbb{E}[x_j]$ if $i \neq j$ due to i.i.d. sampling, and that $\mathbb{E}[(x_i - \mu)^2] = \sigma^2$.

$$\mathbb{V}[f_n] = \mathbb{E}\Big[\Big(\tfrac{1}{n}\sum_{i=1}^n x_i - \mu\Big)^2\Big] = \tfrac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\big[(x_i - \mu)(x_j - \mu)\big]$$

$$= \tfrac{1}{n^2} \sum_{i \neq j} \underbrace{\mathbb{E}[x_i - \mu]}_{0}\, \underbrace{\mathbb{E}[x_j - \mu]}_{0} \;+\; \tfrac{1}{n^2} \sum_{i=1}^n \underbrace{\mathbb{E}\big[(x_i - \mu)^2\big]}_{\sigma^2} \;=\; \tfrac{\sigma^2}{n}\,.$$
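
A quick Monte-Carlo check of $\mathbb{V}[f_n] = \sigma^2/n$ (a sketch, not part of the original solution):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 1.0, 2.0, 10, 200_000

# empirical mean of n i.i.d. Gaussian samples, repeated many times
means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
print(means.var(), sigma**2 / n)  # both approximately 0.4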

Rubric:
• 1 point for the correct definition of the variance $\mathbb{V}$
• 1 point for using $\mathbb{E}[x_i] = \mu$
• 1 point for the use of independent samples
• 1 point for the use of the definition of $\sigma^2$
• 1 point for putting it correctly together
• $-\tfrac{1}{2}$ point for minor mistakes (e.g. $\mathbb{E}[x_i x_j] = 0$ for $i \neq j$)
• but no point loss for forgetting little things like one or two $\pm$ mistakes


E0.5: Unbiased variance estimate (voluntary)

Let $\{x_i\}_{i=1}^n$ be a data set that is drawn i.i.d. from the Gaussian distribution $x_i \sim \mathcal{N}(\mu, \sigma^2)$. Let further $\hat\mu := \frac{1}{n}\sum_{i=1}^n x_i$ denote the empirical mean and $\hat\sigma^2 := \frac{1}{n}\sum_{i=1}^n (x_i - \hat\mu)^2$ the equivalent empirical variance. Prove analytically that $\hat\mu$ is unbiased, i.e. $\mathbb{E}[\hat\mu] = \mu$, and that $\hat\sigma^2$ is biased, i.e. $\mathbb{E}[\hat\sigma^2] \neq \sigma^2$.
Bonus-question: Can you derive an unbiased estimator for the empirical variance?
Hint: If $x_i$ and $x_j$ are drawn i.i.d. from $\mathcal{N}(\mu, \sigma^2)$, then it holds $\forall i$:

$$\mathbb{E}[x_i] = \mu\,, \quad \mathbb{E}\big[(x_i - \mu)^2\big] = \sigma^2 \quad \text{and} \quad \mathbb{E}\big[(x_i - \mu)(x_j - \mu)\big] = 0 \text{ if } i \neq j\,.$$

Solution: We prove that $\hat\mu$ is bias-free simply by using its definition:

$$\mathbb{E}[\hat\mu] = \mathbb{E}\Big[\tfrac{1}{n}\sum_{i=1}^n x_i\Big] = \tfrac{1}{n}\sum_{i=1}^n \underbrace{\mathbb{E}[x_i]}_{\mu} = \mu\,.$$

Proving that $\hat\sigma^2$ is biased is more involved, as $\hat\sigma^2$ contains the empirical mean $\hat\mu$:

$$\mathbb{E}[\hat\sigma^2] = \tfrac{1}{n}\sum_{i=1}^n \mathbb{E}\big[(x_i - \hat\mu)^2\big] = \tfrac{1}{n}\sum_{i=1}^n \mathbb{E}[x_i^2] - 2\,\tfrac{1}{n}\sum_{i=1}^n \mathbb{E}[x_i \hat\mu] + \mathbb{E}[\hat\mu^2]$$

$$= \tfrac{1}{n}\sum_{i=1}^n \mathbb{E}[x_i^2] - 2\,\tfrac{1}{n}\sum_{i=1}^n \mathbb{E}\Big[x_i\, \tfrac{1}{n}\sum_{j=1}^n x_j\Big] + \mathbb{E}\Big[\tfrac{1}{n}\sum_{i=1}^n x_i\; \tfrac{1}{n}\sum_{j=1}^n x_j\Big]$$

$$= \tfrac{1}{n}\sum_{i=1}^n \mathbb{E}[x_i^2] - \tfrac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n \mathbb{E}[x_i x_j] \underbrace{-\,\mu^2 + \mu^2}_{0}$$

$$= \tfrac{1}{n}\sum_{i=1}^n \underbrace{\mathbb{E}\big[(x_i - \mu)^2\big]}_{\sigma^2} - \tfrac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n \underbrace{\mathbb{E}\big[(x_i - \mu)(x_j - \mu)\big]}_{\sigma^2 \text{ if } i=j \text{ else } 0}$$

$$= \sigma^2 - \tfrac{1}{n}\sigma^2 = \tfrac{n-1}{n}\sigma^2\,,$$

where we used $\mathbb{E}[(x_i - \mu)(x_j - \mu)] = \mathbb{E}[x_i x_j] - \mathbb{E}[x_i]\mu - \mathbb{E}[x_j]\mu + \mu^2 = \mathbb{E}[x_i x_j] - \mu^2$, because $\mathbb{E}[x_i] = \mu$.
Bonus-question: Note that $\hat\sigma^2$ would be unbiased if we multiplied it by $\frac{n}{n-1}$, and we can therefore define the unbiased empirical estimate of the variance as

$$\hat{\hat\sigma}^2 := \tfrac{1}{n-1}\sum_{i=1}^n (x_i - \hat\mu)^2\,.$$
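
The bias factor $\frac{n-1}{n}$ is easy to see empirically (a sketch, not part of the original sheet):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.0, 5, 100_000

x = rng.normal(mu, sigma, size=(trials, n))
biased = x.var(axis=1, ddof=0).mean()    # divides by n
unbiased = x.var(axis=1, ddof=1).mean()  # divides by n - 1 (Bessel's correction)
print(biased, (n - 1) / n * sigma**2)    # both approximately 0.8
print(unbiased, sigma**2)                # both approximately 1.0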

E0.6: Maximum dice (voluntary)

This question is designed to practice the use of Kronecker-delta functions and to become more familiar with (discrete) probabilities. You are given 3 dice, a D6, a D8 and a D10, where Dx refers to a fair x-sided die on which each of the x sides is numbered uniquely from 1 to x and rolled with the exact same probability.
(a) Prove analytically that the probability that the D6 is among the highest (including equal) numbers when all 3 dice are rolled together is roughly $\rho \approx 19\%$.
(b) Prove analytically that the probability that the D8 rolls among the highest is $\rho' \approx 38\%$.
(c) Prove analytically that the probability that the D10 rolls among the highest is $\rho'' \approx 58\%$.


Hint: You can solve the question however you want, but you are encouraged to use Kronecker-deltas, e.g. $\delta(i > 5)$ is 1 if $i > 5$ and 0 otherwise. You will find that this can simplify complex sums enormously. If you do so, you can use the equalities $\sum_{i=1}^n i \overset{(1)}{=} \frac{n^2+n}{2}$ and $\sum_{i=1}^n i^2 \overset{(2)}{=} \frac{n(n+1)(2n+1)}{6}$.
Bonus-question: Why don't the above numbers sum up to 1?

Solution: The three dice are statistically independent and have the probability $p_x(i) = \frac{1}{x}$ of outcome $1 \leq i \leq x$. The probability of a Dx rolling higher than or equal to a Dy is therefore:

$$p(i \geq j \mid i \sim p_x, j \sim p_y) = \tfrac{1}{xy} \sum_{i=1}^{x}\sum_{j=1}^{y} \delta(i \geq j)\,.$$

Note that if two conditions must be true, one can simply multiply the Kronecker-delta functions.
(a) The probability $\rho$ of the D6 rolling at least as high as the D8 and the D10 is thus:

$$\rho = p(i \geq j \wedge i \geq k \mid i \sim p_6, j \sim p_8, k \sim p_{10}) = \tfrac{1}{6 \cdot 8 \cdot 10} \sum_{i=1}^{6} \underbrace{\sum_{j=1}^{8} \delta(i \geq j)}_{i}\, \underbrace{\sum_{k=1}^{10} \delta(i \geq k)}_{i}$$

$$= \tfrac{1}{480} \sum_{i=1}^{6} i^2 \overset{(2)}{=} \tfrac{1}{480}\, \tfrac{6(6+1)(12+1)}{6} = \tfrac{91}{480} \approx 19\%\,.$$
i=1

(b) The major difference is that $\sum_{j=1}^{6} \delta(i \geq j)$ cannot get larger than 6, even if $i > 6$:

$$\rho' = p(i \geq j \wedge i \geq k \mid i \sim p_8, j \sim p_6, k \sim p_{10}) = \tfrac{1}{6 \cdot 8 \cdot 10} \sum_{i=1}^{8}\sum_{j=1}^{6}\sum_{k=1}^{10} \delta(i \geq j)\, \delta(i \geq k)$$

$$= \tfrac{1}{6 \cdot 8 \cdot 10} \sum_{i=1}^{8} \underbrace{\sum_{j=1}^{6} \delta(i \geq j)}_{\min(i,6)}\, \underbrace{\sum_{k=1}^{10} \delta(i \geq k)}_{i} = \tfrac{1}{6 \cdot 8 \cdot 10} \sum_{i=1}^{8} i\, \min(i, 6)$$

$$= \tfrac{1}{6 \cdot 8 \cdot 10} \Big( \sum_{i=1}^{6} i^2 + \sum_{i=7}^{8} 6i \Big) \overset{(2)}{=} \tfrac{1}{6 \cdot 8 \cdot 10} \Big( \tfrac{6 \cdot 7 \cdot 13}{6} + 6(7 + 8) \Big) = \tfrac{91 + 90}{480} \approx 38\%\,.$$

(c) Similarly for the D10:

$$\rho'' = p(i \geq j \wedge i \geq k \mid i \sim p_{10}, j \sim p_6, k \sim p_8) = \tfrac{1}{6 \cdot 8 \cdot 10} \sum_{i=1}^{10} \underbrace{\sum_{j=1}^{6} \delta(i \geq j)}_{\min(i,6)}\, \underbrace{\sum_{k=1}^{8} \delta(i \geq k)}_{\min(i,8)}$$

$$= \tfrac{1}{480} \Big( \sum_{i=1}^{6} i^2 + \sum_{i=7}^{8} 6i + \sum_{i=9}^{10} 6 \cdot 8 \Big) = \tfrac{91 + 90 + 96}{480} \approx 58\%\,.$$

Bonus-question: Because conditions like $\delta(i \geq j)$ and $\delta(j \geq i)$ overlap in the case $\delta(i = j)$. To get a probability distribution over disjoint outcomes, one would have to consider the cases "D6 is highest, D8 is highest, D10 is highest, D6 and D8 are highest, D8 and D10 are highest, D6 and D10 are highest, and finally all 3 dice are equal (and thus highest)". The probabilities over these cases would sum up to 1.
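
All three probabilities can also be checked by brute-force enumeration of the $6 \cdot 8 \cdot 10 = 480$ equally likely outcomes (a sketch, not part of the original sheet):

from itertools import product

outcomes = list(product(range(1, 7), range(1, 9), range(1, 11)))

# count how often each die rolls at least as high as the other two
for idx, name in enumerate(["D6", "D8", "D10"]):
    hits = sum(1 for roll in outcomes if roll[idx] == max(roll))
    print(f"{name}: {hits}/480 = {hits / 480:.1%}")
# expected: 91/480 ~ 19%, 181/480 ~ 38%, 277/480 ~ 58%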

E0.7: Implement MNIST classification (voluntary)

Implement the MNIST classification example from the lecture slides. Make sure you get the correct
deep CNN model architecture from Lecture 2 (p.18).
(a) Train the model $f_\theta : \mathbb{R}^{28\times 28} \to \mathbb{R}^{10}$ from the lecture slides with a cross-entropy loss for 10 epochs. Plot the average train/test losses during each epoch (y-axis) over all epochs (x-axis). Do the same with the average train/test accuracies during each epoch. Try to program as modularly as possible, as you will re-use the code later.
(b) Change your optimization criterion to a mean-squared-error loss between the same model architecture $f_\theta : \mathbb{R}^{28\times 28} \to \mathbb{R}^{10}$ you used in (a) and a one-hot encoding ($h_i \in \mathbb{R}^{10}$, $h_{ij} = 1$ iff $j = y_i$, otherwise $h_{ij} = 0$) of the labels $y_i$:

$$L := \tfrac{1}{n}\sum_{i=1}^n \big( f_\theta(x_i) - h_i \big)^2$$

Plot the same plots as in (a). Try to reuse as much of your old code as possible, e.g., by defining the criterion (which is now different) as an external function that can be overwritten.
(c) Define a new architecture $f'_\theta : \mathbb{R}^{28\times 28} \to \mathbb{R}$, that is exactly the same as above, but with only one output neuron instead of 10. Train it with a regression mean-squared-error loss between the model output and the scalar class identifier:

$$L' := \tfrac{1}{n}\sum_{i=1}^n \big( f'_\theta(x_i) - y_i \big)^2$$

Plot the same plots as in (a), but for 50 epochs.


(d) Learning in (c) should be significantly slower, in terms of accuracy gain per epoch, than in (a) and (b). Use a transformation of your model output (which can be implemented in the functions that compute the criterion and the accuracy, or as an extra module) as $f''_\theta(x_i) := \alpha f'_\theta(x_i) + \beta$, with $\alpha = \beta = 4.5$. Plot the same plots as in (c). Does the learning behavior change? Why?
Bonus-question: Can you come up with an alternative approach to (d) that has the same speed-up effect?
Hint: Evaluate your test loss and accuracy before every training run to make sure the accuracy is defined correctly (it should be around 0.1 for a model without training). This means that you will always have one more test measurement than training epochs.
Hint: Try to reuse as much of your old code as possible, e.g., by defining the criterion and the accuracy (which will change for some questions) as external functions that can be overwritten later.

Solution: A sample implementation can be found in the accompanying Jupyter Notebook.


Bonus-question: Interestingly, a larger learning rate does not help, as RMSProp seems to automatically compensate for it. However, transforming the output is mathematically equivalent to scaling the last model layer. Both the linear weights model[-1].weight *= 4.5 and the bias model[-1].bias[0] = 4.5 of the torch.nn.Linear layer need to be adjusted. Note that PyTorch does not allow you to modify parameters in-place, as this can break auto-differentiation when done during a forward pass. The context torch.no_grad() allows you to circumvent this safeguard. This context is generally helpful whenever no gradient computation is necessary, as it frees all intermediately computed tensors and can save a lot of memory when used correctly.
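
A minimal sketch of this last-layer rescaling, assuming the model is a torch.nn.Sequential whose final module is the torch.nn.Linear output layer (the exact architecture follows the lecture slides and is not reproduced here):

import torch

def scale_output_layer(model: torch.nn.Sequential, alpha: float = 4.5, beta: float = 4.5):
    # Folds f''(x) = alpha * f'(x) + beta directly into the last linear layer:
    # f'(x) = W z + b becomes alpha * W z + (alpha * b + beta).
    with torch.no_grad():  # in-place parameter edits must not be tracked by autograd
        model[-1].weight *= alpha
        model[-1].bias.mul_(alpha).add_(beta)
    return model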

E0.8: Mean and variance of online estimates (voluntary)

Let $\{y_t\}_{t=1}^{\infty}$ be an infinite training set drawn i.i.d. from the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$. At time $t$, the online estimate $f_t$ of the average over the training set, starting at $f_0$, is defined as

$$f_t = f_{t-1} + \alpha\,(y_t - f_{t-1})\,, \qquad 0 < \alpha < 1\,.$$

(a) Show that for small $t$ the online estimate is biased, i.e., $\mathbb{E}[f_t] \neq \mu$.


(b) Prove that in the limit $t \to \infty$ the online estimate is unbiased, i.e., $\mathbb{E}[f_t] = \mu$.
(c) Prove that in the limit $t \to \infty$ the variance of the online estimate is $\mathbb{E}[f_t^2] - \mathbb{E}[f_t]^2 = \frac{\alpha\,\sigma^2}{2-\alpha}$.
Hint: You can use the geometric series $\sum_{k=0}^{t-1} r^k = \frac{1-r^t}{1-r}\,, \forall\, |r| < 1$.
Bonus-question: Prove that for the decaying learning rate $\alpha_t = \frac{\alpha}{1-(1-\alpha)^t}$ holds $\lim_{\alpha\to 0} \alpha_t = \frac{1}{t}$.
Hint: You can also use the binomial identity $(x + y)^t = \sum_{k=0}^{t} \binom{t}{k}\, x^k y^{t-k}$.

Solution:
(a) Note that by recursion $f_t = (1-\alpha)^t f_0 + \sum_{i=0}^{t-1} \alpha\,(1-\alpha)^i\, y_{t-i}$.

$$\mathbb{E}[f_t] = (1-\alpha)^t f_0 + \alpha \underbrace{\sum_{i=0}^{t-1} (1-\alpha)^i}_{\frac{1-(1-\alpha)^t}{1-(1-\alpha)}}\, \underbrace{\mathbb{E}[y_{t-i}]}_{\mu} = \mu + (1-\alpha)^t (f_0 - \mu) \neq \mu\,.$$

(b) In the limit of (a), the term $(1-\alpha)^t$ goes to 0:

$$\lim_{t\to\infty} \mathbb{E}[f_t] = \mu + \lim_{t\to\infty} \underbrace{(1-\alpha)^t}_{0}\, (f_0 - \mu) = \mu\,.$$

(c) Note that due to the assumption of i.i.d. sampling $\mathbb{E}[y_i y_j] = \mu^2 + \sigma^2 \delta_{ij}$.

$$\lim_{t\to\infty} \mathbb{E}[f_t^2] = \lim_{t\to\infty} \mathbb{E}\Big[\Big((1-\alpha)^t f_0 + \sum_{i=0}^{t-1} \alpha(1-\alpha)^i y_{t-i}\Big)^2\Big]$$

$$= \lim_{t\to\infty} \mathbb{E}\Big[\big((1-\alpha)^t f_0\big)^2 + 2(1-\alpha)^t f_0 \sum_{i=0}^{t-1} \alpha(1-\alpha)^i y_{t-i} + \sum_{i=0}^{t-1} \alpha(1-\alpha)^i y_{t-i} \sum_{j=0}^{t-1} \alpha(1-\alpha)^j y_{t-j}\Big]$$

$$= \lim_{t\to\infty} \underbrace{\big((1-\alpha)^t f_0\big)^2}_{\to 0} + \lim_{t\to\infty} \underbrace{2\alpha f_0 \mu\, \tfrac{(1-\alpha)^t - (1-\alpha)^{2t}}{1-(1-\alpha)}}_{\to 0} + \lim_{t\to\infty} \mathbb{E}\Big[\sum_{i=0}^{t-1} \alpha(1-\alpha)^i y_{t-i} \sum_{j=0}^{t-1} \alpha(1-\alpha)^j y_{t-j}\Big]$$

$$= \alpha^2 \lim_{t\to\infty} \sum_{i,j=0}^{t-1} (1-\alpha)^{i+j}\, \mathbb{E}[y_{t-i}\, y_{t-j}]$$

$$= \mu^2 \underbrace{\Big(\alpha \sum_{i=0}^{\infty} (1-\alpha)^i\Big)^2}_{=1} + \alpha^2 \sigma^2 \lim_{t\to\infty} \sum_{i=0}^{t-1} \big((1-\alpha)^2\big)^i$$

$$= \mu^2 + \frac{\alpha^2 \sigma^2}{1-(1-\alpha)^2} = \mu^2 + \frac{\alpha\,\sigma^2}{2-\alpha}\,.$$

Subtracting $\lim_{t\to\infty} \mathbb{E}[f_t]^2 = \mu^2$ yields the variance.
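
A small simulation of the online estimate confirms the limit variance $\frac{\alpha\,\sigma^2}{2-\alpha}$ (a sketch, not part of the original sheet):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, alpha, t_max, runs = 1.0, 2.0, 0.1, 500, 20_000

f = np.zeros(runs)  # f_0 = 0 for all parallel runs
for _ in range(t_max):
    y = rng.normal(mu, sigma, size=runs)
    f += alpha * (y - f)  # f_t = f_{t-1} + alpha * (y_t - f_{t-1})

print(f.mean(), mu)                             # unbiased in the limit
print(f.var(), alpha / (2 - alpha) * sigma**2)  # both approximately 0.21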


Bonus-question: One can use the binomial identity to reformulate the $(1-\alpha)^t$:

$$\lim_{\alpha\to 0} \alpha_t = \lim_{\alpha\to 0} \tfrac{\alpha}{1-(1-\alpha)^t} = \lim_{\alpha\to 0} \big[\alpha^{-1} - \alpha^{-1}(1-\alpha)^t\big]^{-1}$$

$$= \lim_{\alpha\to 0} \Big[\alpha^{-1} - \sum_{k=0}^{t} \binom{t}{k} (-1)^{t-k}\, \underbrace{\alpha^{t-k}\, \alpha^{-1}}_{\alpha^{t-k-1}}\Big]^{-1}$$

$$= \lim_{\alpha\to 0} \Big[\alpha^{-1} - \sum_{k=0}^{t-2} \binom{t}{k} (-1)^{t-k}\, \alpha^{t-k-1} + \underbrace{\binom{t}{t-1}}_{t}\, \underbrace{\alpha^0}_{1} - \alpha^{-1}\Big]^{-1}$$

$$= \lim_{\alpha\to 0} \Big[t - \sum_{k=0}^{t-2} \binom{t}{k} (-1)^{t-k}\, \alpha^{t-k-1}\Big]^{-1} = \frac{1}{t}\,.$$

Since $t - k - 1 \geq 1$ for all $k \leq t-2$, every term of the remaining sum vanishes as $\alpha \to 0$.

E0.9: Noise in linear functions (voluntary)

Let $\{x_i\}_{i=1}^n \subset \mathbb{R}^m$ denote a set of training samples and $\{y_i\}_{i=1}^n \subset \mathbb{R}$ the set of corresponding training labels. We will use the mean squared loss $L := \frac{1}{n}\sum_i (f(x_i) - y_i)^2$ to learn a function $f(x_i) \approx y_i\,, \forall i$.

(a) Derive the analytical solution for the parameter $a \in \mathbb{R}^m$ of a linear function $f(x) := a^\top x$.
(b) We will now augment the training data by adding i.i.d. noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2) \in \mathbb{R}$ to the training labels, i.e. $\tilde y_i := y_i + \epsilon_i$. Show that this does not change the analytical solution of the expected loss $\mathbb{E}[L]$.
(c) Let $f$ denote the function that minimizes $L$ without label noise, and let $\tilde f$ denote the function that minimizes $L$ with random noise $\epsilon_i$ added to the labels $y_i$ (but not the solution of the expected loss $\mathbb{E}[L]$). Derive the analytical variance $\mathbb{E}[(\tilde f(x) - f(x))^2]$ of the noisy solution $\tilde f$.
(d) We will now augment the training data by adding i.i.d. noise $\epsilon_i \sim \mathcal{N}(0, \Sigma) \in \mathbb{R}^m$ to the training samples: $\tilde x_i = x_i + \epsilon_i$. Derive the analytical solution for the parameter $a \in \mathbb{R}^m$ that minimizes the expected loss $\mathbb{E}[L]$.

Bonus-question: Which popular regularization method is equivalent to (d) and what problem does it solve?
Hint: Summarize all training samples into the matrix $X = [x_1, \ldots, x_n]^\top \in \mathbb{R}^{n\times m}$, all training labels into the vector $y = [y_1, \ldots, y_n]^\top \in \mathbb{R}^n$, and denote the noisy versions $\tilde y \in \mathbb{R}^n$ and $\tilde X \in \mathbb{R}^{n\times m}$.

Solution:
(a) Setting the gradient to zero, $\nabla_a L = \frac{2}{n}\sum_i (a^\top x_i - y_i)\, x_i = \frac{2}{n} X^\top X a - \frac{2}{n} X^\top y \overset{!}{=} 0$, allows us to derive the analytic solution for $a$ if the matrix $X^\top X$ is invertible: $a \overset{!}{=} (X^\top X)^{-1} X^\top y$.

(b) Using the result from (a): $\mathbb{E}[\nabla_a L] = \frac{2}{n} X^\top X a - \frac{2}{n} X^\top \mathbb{E}[\tilde y] = \frac{2}{n} X^\top X a - \frac{2}{n} X^\top y$, because $\mathbb{E}[\tilde y] = y + \mathbb{E}[\epsilon] = y$ due to the zero-mean noise vector $\epsilon := [\epsilon_1, \ldots, \epsilon_n]^\top$.

(c) First note that $\mathbb{E}[(\tilde f(x) - f(x))^2] = \mathbb{E}[\tilde f^2(x)] - 2 f(x)\, \mathbb{E}[\tilde f(x)] + f^2(x) = \mathbb{E}[\tilde f^2(x)] - f^2(x)$, because $\mathbb{E}[\tilde y] = y$. Due to i.i.d. noise we have $\mathbb{E}[\tilde y \tilde y^\top] = y y^\top + y\, \mathbb{E}[\epsilon^\top] + \mathbb{E}[\epsilon]\, y^\top + \mathbb{E}[\epsilon \epsilon^\top] = y y^\top + \sigma^2 I$, and $\mathbb{E}[\tilde f^2(x)] = x^\top (X^\top X)^{-1} X^\top\, \mathbb{E}[\tilde y \tilde y^\top]\, X (X^\top X)^{-1} x = f^2(x) + \sigma^2\, x^\top (X^\top X)^{-1} x$. The variance is therefore $\mathbb{E}[(\tilde f(x) - f(x))^2] = \sigma^2\, x^\top (X^\top X)^{-1} x$.


(d) Let $E := [\epsilon_1, \ldots, \epsilon_n]^\top$, where we know from the definition of all $\epsilon_i$ that $\mathbb{E}[E] = 0 \in \mathbb{R}^{n\times m}$ and $\mathbb{E}[E^\top E] = \mathbb{E}[\sum_i \epsilon_i \epsilon_i^\top] = n\Sigma$. Gradients pass through sums (and therefore expectations): $\nabla_a \mathbb{E}[L] = \frac{2}{n}\, \mathbb{E}[\tilde X^\top \tilde X]\, a - \frac{2}{n}\, \mathbb{E}[\tilde X^\top]\, y = 2\big(\frac{1}{n} X^\top X + \Sigma\big)\, a - \frac{2}{n} X^\top y \overset{!}{=} 0$. Now the optimal solution for the parameter vector is $a \overset{!}{=} (X^\top X + n\Sigma)^{-1} X^\top y$.

Bonus-question: The popular L2 regularization, also called weight decay, adds the term $\lambda \|a\|^2$ to the loss and yields the analytical solution $a \overset{!}{=} (X^\top X + n\lambda I)^{-1} X^\top y$, which is also called ridge regression. This regularization guarantees that the matrix $X^\top X + n\lambda I$ is invertible for all $\lambda > 0$ and yields smoother functions. For $\Sigma = \lambda I$, which corresponds to noising each input dimension independently with variance $\lambda$, the two solutions are the same, which indicates that noising the input smooths the learned function!
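
A small numpy experiment illustrating this equivalence: ordinary least squares on many noisy copies of the inputs converges to the ridge solution (a sketch with made-up data, not part of the original sheet):

import numpy as np

rng = np.random.default_rng(0)
n, m, lam, K = 200, 3, 0.1, 2000

X = rng.normal(size=(n, m))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# ridge regression: a = (X^T X + n*lambda*I)^{-1} X^T y
a_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(m), X.T @ y)

# least squares on K noisy copies of the inputs (Sigma = lambda * I)
X_aug = np.vstack([X + rng.normal(scale=np.sqrt(lam), size=X.shape) for _ in range(K)])
a_noisy = np.linalg.lstsq(X_aug, np.tile(y, K), rcond=None)[0]

print(a_ridge)
print(a_noisy)  # approximately equal for large K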

https://xkcd.com/2343
