This document is a quiz-style study guide on deep learning fundamentals, covering topics such as hyperparameters, neural network architecture, activation functions, and training techniques. Forty multiple-choice questions are followed by an answer key with brief explanations, giving insight into best practices and common challenges in training deep neural networks.
1. Which of the following is NOT typically considered a hyperparameter in training a deep neural network?
   a. Number of epochs
   b. Learning rate
   c. Weight parameters of the network after training
   d. Batch size

2. In a fully connected feedforward neural network, what best describes the term “layer”?
   a. A collection of nodes receiving input directly from the outside world
   b. A set of neurons where each unit takes input from the previous set of neurons and passes output to the next
   c. A single neuron with trainable weights
   d. The output prediction vector

3. When using the ReLU activation function, what is the output for a negative input value x < 0?
   a. x
   b. |x|
   c. 0
   d. A small negative constant

4. Suppose you have a binary classification problem. Which of the following is the most appropriate output activation function on the last layer?
   a. ReLU
   b. Sigmoid
   c. Tanh
   d. Softmax with equal probabilities

5. If the input dimension to a layer is 128 and the layer has 64 neurons, how many weight parameters (excluding bias) connect these inputs to the next layer?
   a. 64
   b. 128
   c. 8192
   d. 192

6. In the context of gradient descent, what is one common reason for using mini-batch gradient descent instead of full-batch or purely stochastic gradient descent?
   a. It always finds the global minimum
   b. It reduces variance in gradient estimates and improves computational efficiency
   c. It guarantees a better learning rate
   d. It prevents any form of overfitting

7. During forward propagation in a neural network, what is computed at each neuron (ignoring activation functions)?
   a. The gradient of the error with respect to the weights
   b. The weighted sum of inputs plus a bias
   c. The weight decay term
   d. The maximum value of all the inputs

8. Which of the following best describes the purpose of a validation set?
   a. To provide data for updating model parameters during training
   b. To tune hyperparameters and prevent overfitting
   c. To never be used after training
   d. To provide additional data to increase the training set size

9. One hallmark of deep learning is:
   a. Shallow architectures with a single layer
   b. Feature engineering done entirely by humans
   c. Multiple layers of learned feature representations
   d. Guaranteed perfect generalization

10. Which of the following is a common symptom of vanishing gradients during training?
    a. Model weights become extremely large
    b. Learning slows down dramatically or stops altogether
    c. Training loss increases rapidly
    d. The network refuses to compile

11. Weight initialization with small random values close to zero is typically done to:
    a. Break symmetry and ensure each neuron learns different features
    b. Guarantee immediate convergence
    c. Ensure that all neurons learn identical parameters
    d. Prevent the network from using activation functions

12. Given a training set loss and a validation set loss, if you notice your training loss keeps decreasing but your validation loss starts increasing, what phenomenon is most likely occurring?
    a. Underfitting
    b. Overfitting
    c. Proper generalization
    d. Stable convergence

13. Suppose you have a dataset with features on vastly different scales. Which method might you apply before training a deep neural network?
    a. Dropout
    b. Feature scaling (e.g., standardization or normalization)
    c. Early stopping
    d. Gradient clipping

14. The backpropagation algorithm primarily uses which rule to compute gradients?
    a. Hebbian learning rule
    b. Forward-mode differentiation
    c. Chain rule of calculus
    d. Laplacian smoothing

15. Which of the following techniques can help reduce overfitting in deep networks?
    a. Increasing the learning rate indefinitely
    b. Dropout regularization
    c. Using infinitely many layers
    d. Replacing ReLU with linear activation
16. A softmax output layer is commonly used for:
    a. Multi-class classification tasks
    b. Regression tasks with continuous outputs
    c. Binary classification tasks only
    d. Unsupervised dimension reduction

17. Momentum in optimization helps primarily by:
    a. Keeping the network weights unchanged
    b. Accelerating convergence by dampening oscillations in the gradient direction
    c. Discarding previously computed gradients
    d. Making the gradient updates random

18. In a neural network, bias terms are used to:
    a. Reduce overfitting by removing flexibility
    b. Shift the activation function and improve representational power
    c. Ensure the weights remain constant
    d. Increase training time without changing performance

19. Batch normalization is used to:
    a. Eliminate the need for activation functions
    b. Stabilize the distribution of layer inputs and speed up training
    c. Make training data unnecessary
    d. Ensure weights never decay

20. If a network’s parameters are initialized too large, what is a likely outcome when training begins?
    a. Gradients will vanish
    b. Gradients might explode, causing unstable updates
    c. All weights will freeze at zero
    d. Perfect generalization from the first epoch

21. Given a network with L hidden layers, which layer's activations serve as input to the (L+1)-th layer?
    a. The first hidden layer’s input
    b. The previous layer’s activations
    c. The final output layer
    d. The raw input features

22. The term “epoch” in neural network training refers to:
    a. A single update of weights after one batch
    b. One complete pass through the entire training dataset
    c. The time it takes to initialize the network
    d. The final stage of model evaluation

23. L2 regularization encourages weights to be:
    a. Larger in magnitude
    b. Closer to zero
    c. Completely sparse (many zeros)
    d. Randomly shuffled after each update

24. Assume a neural network outputs a probability distribution over classes. Which loss function is most commonly used for multi-class classification tasks?
    a. Mean Squared Error (MSE)
    b. Binary Cross-Entropy
    c. Categorical Cross-Entropy (Softmax Cross-Entropy)
    d. Hinge Loss

25. Early stopping is a form of:
    a. Weight initialization technique
    b. Data augmentation strategy
    c. Regularization that halts training before overfitting
    d. Optimization algorithm

26. The main goal of the activation function in a neuron is to:
    a. Keep the output linear
    b. Introduce non-linearity and allow complex decision boundaries
    c. Scale all outputs to a fixed range without non-linearity
    d. Control the learning rate

27. A deep network that consistently predicts the average of training outputs (e.g., a constant value) for all inputs is likely:
    a. Underfitting the data
    b. Overfitting the data
    c. Perfectly trained
    d. Experiencing exploding gradients

28. The primary difference between the forward pass and backward pass in training a neural network is:
    a. The forward pass computes outputs from inputs, while the backward pass computes gradients from outputs to inputs
    b. The backward pass updates inputs, while the forward pass updates weights
    c. Both passes compute gradients
    d. The backward pass uses no activation functions

29. In practice, to prevent numerical instability when computing the softmax function, one might:
    a. Use very large numbers in the exponent
    b. Subtract the maximum input value from each input before exponentiation
    c. Add a small constant to inputs
    d. Use random values for initial exponentiation
30. “Glorot” (Xavier) initialization is designed to:
    a. Initialize biases to large negative values
    b. Keep the variance of outputs at each layer roughly the same, preventing vanishing/exploding gradients
    c. Guarantee no overfitting
    d. Make gradient calculations unnecessary

31. Dropout works by:
    a. Setting a subset of activations to zero at random during training
    b. Eliminating entire layers permanently
    c. Adding Gaussian noise to the inputs
    d. Forcing all weights to be identical

32. The choice of optimizer (e.g., SGD vs. Adam) primarily affects:
    a. How the architecture of the neural network is designed
    b. How gradients are used to update the parameters, potentially impacting training speed and stability
    c. The training dataset’s size
    d. The number of layers needed in the network

33. Suppose you apply L1 regularization to your neural network. This encourages:
    a. Weights to become sparser (more zeros)
    b. No change in weight magnitude
    c. Weights to grow without bound
    d. Weights to stay strictly positive

34. A potential advantage of deep networks over shallow models is:
    a. They never require regularization
    b. They can learn hierarchical representations of data
    c. They train faster with no tuning
    d. They always have fewer parameters

35. Which is NOT a benefit of using vectorized operations (as opposed to explicit loops) for training neural networks?
    a. Faster computations due to optimized linear algebra libraries
    b. Easier to implement automatic differentiation
    c. Reduction in code complexity
    d. Necessarily higher accuracy on the validation set

36. If you have a large training set and notice your model is still overfitting, which strategy might help?
    a. Increase the model size further
    b. Apply stronger regularization (e.g., dropout, L2)
    c. Train for more epochs
    d. Stop using batch normalization

37. Activation functions like sigmoid or tanh can suffer from saturation, which leads to:
    a. Exploding gradients
    b. Zero gradients in saturated regions, slowing or stopping learning
    c. Infinite gradients
    d. Constant updates to the weights

38. The main idea behind using a validation set separate from the test set is to:
    a. Use it for final model performance reporting
    b. Adjust hyperparameters without contaminating the test performance estimate
    c. Make the model generalize instantly
    d. Skip the need for a training set

39. Suppose a neural network uses sigmoid activation in the output layer for binary classification. The predicted output is 0.8. This output can be interpreted as:
    a. The predicted probability of the positive class is 0.8
    b. The raw class score for the positive class is 0.8 units
    c. The network is certain of the positive class
    d. The margin for a linear classifier

40. When dealing with very high-dimensional inputs (e.g., images), deep networks help by:
    a. Relying solely on handcrafted features
    b. Automatically learning complex features hierarchically from raw data
    c. Ignoring lower-level patterns
    d. Reducing the input dimension to a single scalar without learning

Answer Key

1. c (Hyperparameters are set before training; the weights learned during training are not hyperparameters.)
2. b (A layer is a set of neurons that transform inputs to outputs for the next layer.)
3. c (ReLU(x) = max(0, x), so negative inputs produce 0.)
4. b (For binary classification, a sigmoid output is standard.)
5. c (Number of weights = 128 inputs × 64 neurons = 8192; the layer sketch after the answer key spells this out.)
6. b (Mini-batch gradient descent balances variance and computation.)
7. b (A neuron sums weighted inputs plus a bias, then applies an activation.)
8. b (Validation sets help tune hyperparameters and prevent overfitting.)
9. c (Deep learning involves multiple levels of representation.)
10. b (Vanishing gradients slow or halt learning.)
11. a (Small random initialization breaks symmetry.)
12. b (Training loss ↓ while validation loss ↑ typically indicates overfitting.)
13. b (Feature scaling ensures numeric stability and efficient training; a standardization sketch appears after the answer key.)
14. c (Backpropagation uses the chain rule of calculus.)
15. b (Dropout is a common method to reduce overfitting.)
16. a (Softmax is used for multi-class outputs.)
17. b (Momentum helps accelerate convergence and reduce oscillations; see the momentum sketch below.)
18. b (Bias shifts the activation function and increases representational flexibility.)
19. b (Batch normalization stabilizes input distributions of intermediate layers.)
20. b (Excessively large initial weights can lead to exploding gradients.)
21. b (Each layer receives input from the previous layer’s outputs.)
22. b (An epoch is one full pass over the training set.)
23. b (L2 regularization pushes weights towards zero; compare with L1 in the regularization sketch below.)
24. c (Categorical cross-entropy is standard for multi-class classification.)
25. c (Early stopping prevents overfitting by halting training.)
26. b (Activation functions add non-linearity.)
27. a (Predicting a constant average often means underfitting.)
28. a (Forward pass: inputs → outputs; backward pass: gradients from outputs → inputs.)
29. b (Subtracting the maximum input value reduces potential overflow in the exponent; a stable-softmax sketch follows the answer key.)
30. b (Glorot initialization maintains variance across layers; see the Glorot sketch below.)
31. a (Dropout randomly “drops” certain neurons’ outputs during training; see the dropout sketch below.)
32. b (Optimizers affect how parameters are updated.)
33. a (L1 regularization encourages sparse weights.)
34. b (Deep networks learn hierarchical feature representations.)
35. d (Vectorization does not guarantee higher accuracy; it just makes computation faster and cleaner.)
36. b (If the model is still overfitting, stronger regularization can help.)
37. b (Sigmoid/tanh saturation means gradients approach zero.)
38. b (Validation sets guide hyperparameter tuning without touching the final test set.)
39. a (Sigmoid outputs represent probabilities of the positive class.)
40. b (Deep nets learn features automatically from raw data.)
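To make answers 5 and 7 concrete, here is a minimal NumPy sketch of a fully connected layer with 128 inputs and 64 neurons (the variable names and the small initialization scale are illustrative assumptions, not part of the quiz). The forward pass computes a weighted sum of the inputs plus a bias, ReLU zeroes the negative pre-activations, and the weight matrix holds 128 × 64 = 8192 parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fully connected layer: 128 inputs -> 64 neurons.
n_in, n_out = 128, 64
W = rng.normal(scale=0.01, size=(n_in, n_out))  # small random init breaks symmetry (question 11)
b = np.zeros(n_out)                             # bias shifts the activation (question 18)

x = rng.normal(size=n_in)        # one input example
z = x @ W + b                    # weighted sum of inputs plus bias (question 7)
a = np.maximum(0.0, z)           # ReLU: negative pre-activations become 0 (question 3)

print(W.size)                    # 8192 weight parameters, excluding bias (question 5)
```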
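For answer 13, a quick sketch of standardization, one common form of feature scaling; the toy data matrix is made up for illustration, and in practice the mean and standard deviation would be computed on the training set only and reused for validation and test data.

```python
import numpy as np

X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])                  # two features on very different scales

# Standardization: zero mean, unit variance per feature.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))                   # ~0 for each feature
print(X_scaled.std(axis=0))                    # 1 for each feature
```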
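Answer 17 can be illustrated with classical momentum on a one-dimensional quadratic. The learning rate, momentum coefficient, and toy objective are arbitrary choices for this sketch; the point is that the accumulated velocity smooths successive gradients and damps oscillations.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """One classical-momentum update: accumulate a velocity, then move along it."""
    velocity = beta * velocity - lr * grad   # past gradients damp oscillations
    return w + velocity, velocity

# Minimize f(w) = w**2 (gradient 2 * w), starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2 * w, v)
print(w)                                     # converges toward the minimum at w = 0
```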
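Answers 23 and 33 contrast the two standard penalties. The sketch below only shows the penalty terms and their gradients (the weight vector and the strength lam are arbitrary examples): the L2 gradient shrinks each weight in proportion to its size, while the constant-magnitude L1 gradient tends to drive small weights exactly to zero, which is why L1 produces sparsity.

```python
import numpy as np

w = np.array([-0.8, 0.0, 0.3, 1.5])
lam = 0.01                                     # regularization strength (arbitrary example)

l2_penalty = lam * np.sum(w ** 2)              # added to the loss; gradient 2 * lam * w
l1_penalty = lam * np.sum(np.abs(w))           # added to the loss; gradient lam * sign(w)

# L2 shrinks every weight proportionally to its size (toward, but rarely exactly, zero);
# L1 applies a constant-magnitude pull, so small weights get driven exactly to zero.
print(l2_penalty, l1_penalty)
```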
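Answer 29 mentions subtracting the maximum before exponentiation. Below is a sketch of that standard stabilization in NumPy (the function name and the example logits are mine): shifting all logits by the same constant cancels in the normalization, so the probabilities are unchanged while exp() no longer overflows.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: shift by the max so exp() never overflows."""
    shifted = logits - np.max(logits)      # largest exponent becomes 0
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([1000.0, 1001.0, 1002.0])
print(softmax(logits))                     # well-behaved probabilities, no overflow
# A naive np.exp(logits) / np.exp(logits).sum() would overflow to inf/nan here.
```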
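For answer 30, a rough sketch of Glorot (Xavier) uniform initialization, assuming the commonly used limit sqrt(6 / (fan_in + fan_out)); with that limit the weight variance is 2 / (fan_in + fan_out), which helps keep activation variance roughly constant from layer to layer.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot/Xavier uniform init: limit = sqrt(6 / (fan_in + fan_out))."""
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(128, 64)
# Variance of U(-limit, limit) is limit**2 / 3 = 2 / (fan_in + fan_out).
print(W.var(), 2.0 / (128 + 64))
```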
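Answer 31 describes dropout as zeroing a random subset of activations during training. The sketch below uses the common "inverted dropout" variant (the rescaling by the keep probability is my assumption, not something stated in the quiz) so that no scaling is needed at test time.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero a fraction of activations during training (inverted dropout)."""
    if not training or drop_prob == 0.0:
        return activations                     # no dropout at inference time
    if rng is None:
        rng = np.random.default_rng(0)
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation is unchanged.
    return activations * mask / keep_prob

a = np.ones(10)
print(dropout(a, drop_prob=0.5))               # roughly half the entries are zeroed
print(dropout(a, training=False))              # unchanged at test time
```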