SS 2020
Chair of Visual Computing & Artificial Intelligence
Department of Informatics
Technical University of Munich
Endterm
Working instructions
• This exam consists of 20 pages with a total of 8 problems.
Please make sure now that you received a complete copy of the exam.
• If you need additional space for a question, use the additional pages in the back and properly note that
you are using additional space in the question’s solution box.
Problem 1 Multiple Choice Questions: (18 credits)
• For all multiple choice questions any number of answers, i.e. either zero (!), one, all or multiple answers
can be correct.
• For each question, you’ll receive 2 points if all boxes are answered correctly (i.e. correct answers are
checked, wrong answers are not checked) and 0 otherwise.
• If you change your mind again, please place a cross to the left side of the box (interpreted as checked).
a) Which of the following statements regarding successful ImageNet-classification architectures are correct?
VGG16 uses skip connections.
ResNet18 has 11 million parameters more than VGG16.
AlexNet uses filters of different kernel sizes.
b) You train a neural network and the train loss diverges. What are reasonable things to do? (check all that apply)
Decrease the learning rate.
Add dropout.
c) What is the correct order of operations for an optimization with gradient descent?
(a) Update the network weights to minimize the loss.
(b) Calculate the difference between the predicted and target value.
(c) Iteratively repeat the procedure until convergence.
(d) Compute a forward pass.
(e) Initialize the neural network weights.
bcdea
ebadc
eadbc
edbac
d) Consider a simple convolutional neural network with a single convolutional layer. Which of the following
statements is true about this network?
It is rotation invariant.
It is translation equivariant.
It is scale-invariant.
e) Which of the following activation functions can lead to vanishing gradients?
Tanh.
ReLU.
Sigmoid.
Leaky ReLU.
g) A sigmoid layer
cannot be used during backpropagation.
maps surjectively to values in (-1, 1), i.e., hits all values in that interval.
Bad initialization.
i) Which of the following have trainable parameters? (check all that apply)
Leaky ReLU
Batch normalization
Dropout
Max pooling
Problem 2 Activation Functions and Weight Initialization (8 credits)
For your first job, you have to set up a neural network, but you have some issues with its weight initialization. You
remember from your I2DL lecture that you can sample the weights from a zero-centered normal distribution,
but you can’t remember which variance to use. Therefore, you set up a small network and try some numbers.
You initialize the weights one time with Var(w) = 0.02 and one time with Var(w) = 1.0:
[Figure: a small network that combines the inputs i1, i2, i3 with weights w1, w2, w3 in a linear layer followed by tanh]
Inputs:
• i1 = 2, i2 = −4, i3 = 1
Var(w) = 0.02:
a) Compute a forward pass for each set of weights and draw the results of the linear layer in the figure of the tanh plot. You don’t need to compute the tanh.
[Figure: plot of tanh(x) for x ∈ [−2, 2], y ∈ [−1, 1]]
b) Using the results above, explain what problems can arise during backpropagation in deep neural networks when initializing the weights with too small or too large a variance. Also, explain the root of these problems.
c) Which initialization scheme did you learn in the lecture that tackles these problems? What does this initialization try to achieve in the activations of deep layers of the neural network?
d) After switching from tanh to ReLU activation functions, one of your initial problems occurs again. Why does this happen? How can you modify the initialization scheme proposed in c) to adjust it for this new non-linearity?
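To make the experiment above concrete, here is a minimal NumPy sketch of the setup (the layer size, helper name, and seed are illustrative assumptions, not part of the exam):

    import numpy as np

    def linear_outputs(var_w, inputs, n_out=3, seed=0):
        """Sample weights from N(0, var_w) and return the linear layer's outputs."""
        rng = np.random.default_rng(seed)
        w = rng.normal(0.0, np.sqrt(var_w), size=(n_out, inputs.shape[0]))
        return w @ inputs  # pre-activations that would be fed into tanh

    x = np.array([2.0, -4.0, 1.0])      # i1, i2, i3 from the problem statement
    print(linear_outputs(0.02, x))      # small variance: outputs cluster near 0
    print(linear_outputs(1.0, x))       # large variance: outputs reach tanh's saturated range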
Problem 3 Batch Normalization and Computation Graphs (6 credits)
For an input vector x as well as variables γ and β, the general formula of batch normalization is given by

x̂ = (x − E[x]) / √(Var[x]) ,
y = γ · x̂ + β .
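For reference, a minimal NumPy sketch of this formula applied over a batch (the ε term and all names are illustrative additions for numerical stability, not part of the exam statement):

    import numpy as np

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        """Normalize each feature over the batch dimension, then scale and shift."""
        mean = x.mean(axis=0)                  # E[x] per feature
        var = x.var(axis=0)                    # Var[x] per feature
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    x = np.random.randn(8, 4)                  # batch of 8 samples, 4 features
    y = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
    print(y.mean(axis=0), y.var(axis=0))       # approximately 0 and 1 per feature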
c) How is a batch normalization layer applied at training (1p) and at test (1p) time?
d) Computational graph of a batch normalization layer. Fill out the nodes (circles) of the following computational graph. Each node can consist of one of the following operations: +, −, ∗, (·)², √·, 1/(·).
Problem 4 Convolutional Neural Networks and Receptive Field (12 credits)
A friend of yours asked for a quick review of convolutional neural networks. As he has some background in
computer graphics, you start by explaining previous uses of convolutional layers.
a) You are given a two-dimensional input (e.g., a grayscale image). Consider the following convolutional kernels

C1 = (1/9) · [[1, 1, 1], [1, 1, 1], [1, 1, 1]] ,    C2 = [[−1, 2], [1, −1]] .

What are the effects of the filter kernels C1 and C2 when applied to the image?
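For intuition, a small sketch applying the averaging kernel with SciPy (the input image here is a made-up example; only C1 is taken from the problem, and C2 would be applied the same way):

    import numpy as np
    from scipy.signal import convolve2d

    C1 = np.ones((3, 3)) / 9.0          # the 3x3 averaging kernel from the problem
    image = np.random.rand(8, 8)        # made-up grayscale image for illustration

    smoothed = convolve2d(image, C1, mode="valid")
    print(image.std(), smoothed.std())  # the averaged output varies less than the input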
After showing him some results of a trained network, he immediately wants to use them and starts building a model in PyTorch. However, he is unsure about the layer sizes, so you quickly help him out.
b) Given a convolutional layer in a network with 5 filters, a filter size of 7, a stride of 3, and a padding of 1: for an input feature map of 26 × 26 × 26, what is the output dimensionality after applying the convolutional layer to the input?
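A quick way to sanity-check such output shapes in PyTorch (assuming the 26 channels are the depth dimension of the feature map; variable names are illustrative):

    import torch
    import torch.nn as nn

    # 26 input channels, 5 filters, kernel size 7, stride 3, padding 1, as in the question
    conv = nn.Conv2d(in_channels=26, out_channels=5, kernel_size=7, stride=3, padding=1)
    x = torch.randn(1, 26, 26, 26)      # (batch, channels, height, width)
    print(conv(x).shape)                # spatial size follows (H + 2*padding - kernel) // stride + 1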
c) You are given a convolutional layer with 4 filters, kernel size 5, stride 1, and no padding that operates on an RGB image.
1. What is the shape of its weight tensor?
2. Name all dimensions of your weight tensor.
Now that he knows how to combine convolutional layers, he wonders how deep his network should be. After some thinking, you illustrate the concept of the receptive field with the following two examples. For both questions, consider a grayscale 224 × 224 image as the network input.
d) A convolutional neural network consists of 3 consecutive 3 × 3 convolutional layers with stride 1 and no padding. How large is the receptive field of a feature in the last layer of this network?
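A generic helper for this kind of bookkeeping, sketched under the standard receptive-field recurrence (not part of the exam):

    def receptive_field(layers):
        """layers: list of (kernel_size, stride) tuples, first layer first."""
        rf, jump = 1, 1
        for kernel, stride in layers:
            rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) * current jump
            jump *= stride
        return rf

    print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three 3x3 stride-1 convolutions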
e) Consider a network consisting of a single layer.
Blindly, he stacks 10 convolutional layers together to solve his task. However, the gradients seem to vanish and he can’t train the network. You remember from your lecture that ResNet blocks were designed for exactly this purpose.

[Figure: a block mapping an input x to an output H(x)]
f) Draw a ResNet block in the image above (1p) containing two linear layers, which you can represent by l1 and l2. For simplicity, you don’t need to draw any non-linearities. Why does such a block improve the vanishing gradient problem in deep neural networks (1p)?
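For reference, a minimal PyTorch sketch of such a residual block (the module and the dimension are illustrative assumptions):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """H(x) = x + l2(l1(x)): a skip connection around two linear layers."""
        def __init__(self, dim):
            super().__init__()
            self.l1 = nn.Linear(dim, dim)
            self.l2 = nn.Linear(dim, dim)

        def forward(self, x):
            return x + self.l2(self.l1(x))  # the identity path lets gradients flow through unchanged

    block = ResidualBlock(16)
    print(block(torch.randn(4, 16)).shape)  # torch.Size([4, 16])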
g) For your above drawing, given the partial derivative of the residual block R(x) = l2(l1(x)) as ∂R(x)/∂x = r, calculate ∂H(x)/∂x.
Problem 5 Training a Neural Network (15 credits)
A team of architects approaches you for your deep learning expertise. They have collected nearly 5,000
hand-labeled RGB images and want to build a model to classify the buildings into their different architectural
styles. Now they want to classify images of architectures into 3 classes depending on their style:
a) How would you split your dataset? Give meaningful percentages as your answer.
b) After visually inspecting the different splits of the dataset, you realize that the training set only contains pictures taken during the day, whereas the validation set only has pictures taken at night. Explain what the issue is and how you would correct it.
c) As you train your model, you realize that you do not have enough data. Unfortunately, the architects are unable to collect more data, so you have to make the most of the data you have. Provide 4 data augmentation techniques that can be used to overcome the shortage of data.
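For illustration, a few common augmentations expressed with torchvision transforms (one possible selection, not an official answer key):

    from torchvision import transforms

    # Each transform yields a modified view of an image, effectively enlarging the training set.
    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),                      # mirror the building left/right
        transforms.RandomRotation(degrees=10),                       # small rotations
        transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),    # random crops and rescaling
        transforms.ColorJitter(brightness=0.3, contrast=0.3),        # lighting changes
    ])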
d) What is a saddle point, and what is the problem with GD at such a point?
e) While training your classifier, you experience that the loss converges only slowly and always plateaus, independent of the learning rate used. Now you want to use Stochastic Gradient Descent (SGD) instead of Gradient Descent (GD). What is an advantage of SGD compared to GD in dealing with saddle points?
i) There exists a whole zoo of different optimizers. Name an optimizer that uses both first- and second-order moments.
j)
1. Name a problem that will result from using a learning rate that is too high (1p).
2. Name a problem that will arise from using a learning rate that is too low (1p).
k) Finally, you plot the loss curves with a suitable learning rate for both the training data and the validation data. What is the issue in period 2 called? Name a possible action that you could take, without changing the number of parameters in your network, to counteract this problem.

[Figure: loss curves for the training set and the validation set, plotted over periods 1 and 2]
Problem 6 Recurrent Neural Networks and Backpropagation (9 credits)
Consider a vanilla RNN cell of the form ht = tanh(V · ht−1 + W · xt + b). The figure below shows the input sequence x1, x2, and x3.
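A minimal NumPy sketch of this cell (the dimensions below match subquestion a); the helper name and random values are illustrative):

    import numpy as np

    def rnn_step(h_prev, x, V, W, b):
        """One step of the vanilla RNN cell: h_t = tanh(V h_{t-1} + W x_t + b)."""
        return np.tanh(V @ h_prev + W @ x + b)

    V = np.random.randn(5, 5)   # hidden-to-hidden, since ht has 5 entries
    W = np.random.randn(5, 3)   # input-to-hidden, since xt has 3 entries
    b = np.zeros(5)
    h = np.zeros(5)
    for x in [np.random.randn(3) for _ in range(3)]:  # x1, x2, x3
        h = rnn_step(h, x, V, W, b)
    print(h.shape)  # (5,)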
a) Given the dimensions xt ∈ R^3 and ht ∈ R^5, what is the number of parameters in the RNN cell? (Calculate the final number.)
b) If xt and b are the 0 vector, then ht = ht−1 for any value of ht. Discuss whether this statement is correct.

V = −3, W = 3, h0 = 0, x1 = 2, x2 = 3 and x3 = 1.
d) Calculate the derivatives ∂h3/∂V, ∂h3/∂W, and ∂h3/∂x1 for the forward pass of the ReLU-RNN where

V = −2, W = 1, h0 = 2, x1 = 2, x2 = 3/2 and x3 = 4,

with the forward outputs

h1 = 0, h2 = 3/2, h3 = 1.

Use that (∂/∂x) ReLU(x) |x=0 = 0.
e) A Long Short-Term Memory (LSTM) unit is defined as

g1 = σ(W1 · xt + U1 · ht−1) ,
g2 = σ(W2 · xt + U2 · ht−1) ,
g3 = σ(W3 · xt + U3 · ht−1) ,
c̃t = tanh(Wc · xt + Uc · ht−1) ,
ct = g2 ◦ ct−1 + g3 ◦ c̃t ,
ht = g1 ◦ ct .
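A direct NumPy transcription of these equations, with σ as the logistic sigmoid (all shapes and helper names are assumptions for the demo):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, Wc, Uc):
        """One step of the LSTM unit defined above; W and U stack W1..W3 and U1..U3."""
        g1 = sigmoid(W[0] @ x_t + U[0] @ h_prev)
        g2 = sigmoid(W[1] @ x_t + U[1] @ h_prev)
        g3 = sigmoid(W[2] @ x_t + U[2] @ h_prev)
        c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev)
        c_t = g2 * c_prev + g3 * c_tilde   # '◦' is the element-wise product
        h_t = g1 * c_t
        return h_t, c_t

    d_in, d_h = 3, 4                       # made-up dimensions for the demo
    W = np.random.randn(3, d_h, d_in); U = np.random.randn(3, d_h, d_h)
    Wc = np.random.randn(d_h, d_in); Uc = np.random.randn(d_h, d_h)
    h, c = lstm_step(np.random.randn(d_in), np.zeros(d_h), np.zeros(d_h), W, U, Wc, Uc)
    print(h.shape, c.shape)                # (4,) (4,)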
Problem 7 Autoencoder and Network Transfer (11 credits)
You are given a dataset containing 10,000 RGB images with height H and width W of single coins without
any labels or additional information.
To work with the image dataset you build an autoencoder as depicted in the figure below:
[Figure: an autoencoder; the encoder maps the H × W × 3 input image to a latent code z, and the decoder maps z back to an H × W × 3 output]
The input of the encoder is an image of dimension (H × W × 3), which is transformed into a one-dimensional real vector with z entries. The latent code is used to decode the input image with the same dimension (H × W × 3). Both encoder and decoder are neural networks; the combined network is trainable and uses the L2 loss as its optimization objective.
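A compact PyTorch sketch of such an autoencoder trained with an L2 (MSE) reconstruction loss (the fully-connected layout and all sizes are illustrative assumptions, not the exam's architecture):

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, h, w, z):
            super().__init__()
            self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(h * w * 3, z), nn.ReLU())
            self.decoder = nn.Sequential(nn.Linear(z, h * w * 3), nn.Unflatten(1, (3, h, w)))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder(h=32, w=32, z=64)
    x = torch.randn(8, 3, 32, 32)          # a batch of images
    loss = nn.MSELoss()(model(x), x)       # L2 reconstruction loss
    loss.backward()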
b) As the data gets scaled down from the original dimension to a lower-dimensional bottleneck, an autoencoder can be used for data compression. How does an autoencoder as described above differ from linear methods to reduce the dimensionality of the data, such as PCA (principal component analysis)?
c) For an autoencoder we can vary the size of the bottleneck. Discuss briefly what may happen if the bottleneck is chosen too small or too large.
d) Now, you want to generate a random image of a coin. To do so, can you just randomly sample a vector from the latent space to generate a new coin image?
e) Now, someone gives you 1,000 images that are annotated for semantic segmentation of coin and background, as shown in the image above. How would you change the architecture of the discussed autoencoder network to perform semantic segmentation?
f) If you wanted to train the new semantic segmentation network, what loss function would you use and how?
g) How would you leverage your pretrained autoencoder for training a new segmentation network efficiently?
h) Why do you expect the pretrained autoencoder variant to generalize better than a randomly initialized network?
Problem 8 Unsorted Short Questions (11 credits)
b) You are solving the binary classification task of classifying images as cars vs. persons. You design a CNN with a single output neuron. Let the output of this neuron be z. The final output of your network, ŷ, is given by

ŷ = σ(ReLU(z)) ,

where σ denotes the sigmoid function. You classify all inputs with a final value ŷ ≥ 0.5 as car images. What problem are you going to encounter?
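A quick numeric check of this setup (purely illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def y_hat(z):
        return sigmoid(np.maximum(z, 0.0))  # sigma(ReLU(z)) as defined in the question

    for z in [-5.0, -0.1, 0.0, 0.1, 5.0]:
        print(z, y_hat(z))                  # note the smallest value y_hat can take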
c) Suggest a method to solve exploding gradients when training fully-connected neural networks.
d)
e) Why do we often refer to L2-regularization as “weight decay”? Derive the mathematical expression that includes the weights W, the learning rate η, and the L2-regularization hyperparameter λ to explain your point.
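As a numerical illustration of the effect (a sketch assuming plain gradient descent on a loss plus the penalty λ‖W‖²; the data-loss gradient is set to zero so only the regularizer acts):

    import numpy as np

    eta, lam = 0.1, 0.01
    W = np.array([1.0, -2.0, 3.0])
    grad_L = np.zeros_like(W)                 # pretend the data-loss gradient is zero
    # Gradient step on L(W) + lam * ||W||^2; the penalty contributes 2 * lam * W to the gradient
    W_new = W - eta * (grad_L + 2 * lam * W)
    print(W_new, (1 - 2 * eta * lam) * W)     # identical: the weights are scaled ("decayed") each step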
f) You are given input samples x = (x1, ..., xn) for which each component xj is drawn from a distribution with zero mean. For an input vector x, the output s = (s1, ..., sn) is given by

si = Σ_{j=1}^{n} wij · xj ,

where your weights w are initialized from a uniform random distribution U(−α, α).

How do you have to choose α such that the variance of the input data and the output is identical, hence Var(s) = Var(x)?
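To test a candidate α empirically (a sketch; it uses the standard fact that Var[U(−α, α)] = α²/3, and the sizes are illustrative):

    import numpy as np

    n = 512
    alpha = np.sqrt(3.0 / n)                  # candidate: Var(w) = alpha**2 / 3 = 1 / n
    rng = np.random.default_rng(0)

    x = rng.normal(0.0, 1.0, size=(n, 10000)) # zero-mean inputs, many samples per component
    w = rng.uniform(-alpha, alpha, size=(n, n))
    s = w @ x                                  # s_i = sum_j w_ij * x_j
    print(x.var(), s.var())                    # both close to 1 if alpha is chosen well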
Additional space for solutions. Clearly mark the (sub)problem your answers are related to and strike out invalid solutions.