
ASSIGNMENT

Deep Learning with Keras and TensorFlow

ANS1.

#### a. *Hyperparameters*

Hyperparameters are settings or configurations that are specified before training a machine learning
model. These parameters are not learned from the data itself but are manually set by the
practitioner. Examples include the learning rate, number of hidden layers in a neural network, batch
size, and the number of epochs. Choosing the right set of hyperparameters is critical for model
performance, and hyperparameter tuning can be done through techniques like grid search or
random search.
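
For illustration, a minimal Keras sketch (the model architecture and data names are assumptions, not part of the assignment) showing where common hyperparameters appear:

```python
import tensorflow as tf

learning_rate = 1e-3   # hyperparameter: optimizer step size
batch_size = 32        # hyperparameter: samples per weight update
epochs = 10            # hyperparameter: full passes over the training data

# the number of layers and units per layer are hyperparameters too
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss="binary_crossentropy", metrics=["accuracy"])

# x_train and y_train are assumed to be prepared NumPy arrays:
# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
```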

#### b. *Softmax and ReLU Functions*

- *Softmax*:

Softmax is a function used primarily in classification tasks, especially in the output layer of a neural
network. It transforms the output logits (raw predictions) into probabilities, one for each class.
Softmax is defined as:

\[

\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

\]

where \( z_i \) is the raw score for class \(i\) and \( K \) is the total number of classes. Softmax
ensures that the sum of all probabilities is 1, making them interpretable as probabilities.

- *ReLU (Rectified Linear Unit)*:

ReLU is a popular activation function used in hidden layers of a neural network. It is mathematically
expressed as:

\[

\text{ReLU}(x) = \max(0, x)

\]

It replaces all negative values in the input with 0 and leaves positive values unchanged. ReLU is
widely used due to its simplicity and its ability to help mitigate the vanishing gradient problem.
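
A quick NumPy check of the two definitions above (input values chosen only for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def relu(x):
    return np.maximum(0, x)

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))                         # ~[0.659 0.242 0.099], sums to 1
print(relu(np.array([-3.0, 0.0, 2.5])))   # [0.  0.  2.5]
```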

#### c. *Multi-layer Perceptron (MLP)*

A Multi-layer Perceptron (MLP) is a class of feedforward artificial neural network models consisting of
multiple layers: an input layer, one or more hidden layers, and an output layer. Each layer consists of
neurons that use an activation function to produce an output. MLPs are used for a wide range of
tasks like classification and regression and are trained using backpropagation to minimize the error
between predicted and actual outputs.
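
As a sketch, an MLP in Keras with two hidden layers for a 10-class problem (the input size and layer widths are illustrative assumptions):

```python
import tensorflow as tf

mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),               # input layer
    tf.keras.layers.Dense(128, activation="relu"),     # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),      # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),   # output layer
])
mlp.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```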

#### d. *Backpropagation Algorithm*

Backpropagation is the algorithm used to train neural networks by minimizing the error between
predicted outputs and actual labels. It works by calculating the gradient of the loss function with
respect to each weight in the network using the chain rule of calculus, and then updating the weights
in the opposite direction of the gradient to reduce the loss. This is done iteratively through multiple
passes (epochs) over the training dataset.
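
A toy illustration of one such update for a single linear neuron, using TensorFlow's GradientTape to apply the chain rule (all values are made up for the example):

```python
import tensorflow as tf

w = tf.Variable([[0.5], [-0.3]])     # weights: 2 inputs -> 1 output
b = tf.Variable([0.0])               # bias
x = tf.constant([[1.0, 2.0]])        # one training example
y_true = tf.constant([[1.0]])        # its label
lr = 0.1                             # learning rate

with tf.GradientTape() as tape:
    y_pred = tf.matmul(x, w) + b
    loss = tf.reduce_mean(tf.square(y_true - y_pred))   # squared error

grads = tape.gradient(loss, [w, b])  # gradients via the chain rule
w.assign_sub(lr * grads[0])          # step opposite to the gradient
b.assign_sub(lr * grads[1])
```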

#### e. *Dropout and Batch Normalization*

- *Dropout*: Dropout is a regularization technique where, during training, random units (neurons) of
a neural network are set to zero with a certain probability. This prevents overfitting by reducing the
co-dependency between neurons and forcing the network to learn more robust features. During
testing, all neurons are used.

- *Batch Normalization*: Batch normalization normalizes the activations of a neural network layer over
each mini-batch so that they have a mean of zero and a standard deviation of one, and then applies a
learnable scale and shift. This helps to accelerate training, reduces internal covariate shift, and can act
as a form of regularization.
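
Both are available as Keras layers; a sketch (the layer sizes and dropout rate are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(20,)),
    tf.keras.layers.BatchNormalization(),  # normalize activations over the mini-batch
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dropout(0.5),          # zero 50% of units at random, training only
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```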

#### f. *Epoch, Batch, and Iteration in Deep Learning*

- *Epoch*: One full pass over the entire training dataset.

- *Batch*: A subset of the dataset used in one iteration of the training process.

- *Iteration*: One update of the model weights after processing a batch of data. The number of
iterations per epoch is determined by dividing the total number of training samples by the batch size.
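
A worked example of the relationship (the numbers are illustrative):

```python
import math

num_samples = 10_000
batch_size = 32

iterations_per_epoch = math.ceil(num_samples / batch_size)
print(iterations_per_epoch)   # 313 weight updates make up one epoch
```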

#### g. *Data Augmentation*

Data augmentation is a technique used to increase the diversity of the training set by applying
random transformations (e.g., rotations, flips, scaling) to the input data. It is especially useful in tasks
like image classification to help prevent overfitting and to make the model more robust to variations
in the input data.
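
A sketch using Keras preprocessing layers for random image transformations (the specific parameters are illustrative):

```python
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # rotate by up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.2),
])

# applied to a batch of images during training, e.g.:
# augmented = augment(images, training=True)
```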

---
ANS2. *Weight Initialization in a Network*

Weights in a neural network are typically initialized randomly, often using techniques like
*Xavier/Glorot* or *He initialization*. The goal of weight initialization is to break symmetry (so all
neurons don't learn the same thing) and to prevent gradients from either vanishing or exploding.
Adding randomness is important to avoid biased starting points and to allow the network to explore
different regions of the weight space during training, helping to find optimal solutions.
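
In Keras these schemes can be selected explicitly via `kernel_initializer` (a sketch; both are also common defaults):

```python
import tensorflow as tf

glorot_layer = tf.keras.layers.Dense(
    64, activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotUniform())  # Xavier/Glorot

he_layer = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_initializer=tf.keras.initializers.HeNormal())       # He initialization
```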

---

ANS3. *Gradient Descent & Its Types*

- *Gradient Descent* is an optimization algorithm used to minimize a loss function by adjusting the
weights of the network in the direction of the negative gradient (the direction of steepest descent).

- *Batch Gradient Descent (BGD)* computes the gradient using the entire dataset. It can be
computationally expensive for large datasets, but it converges smoothly.

- *Stochastic Gradient Descent (SGD)* computes the gradient for each individual sample. It updates
weights more frequently, which can make it faster but also more noisy, leading to more fluctuations
during training.
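
In Keras the variant is effectively chosen through the `batch_size` argument to `fit()`; a sketch (the model and data names are assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")

# x_train, y_train assumed to be NumPy arrays:
# model.fit(x_train, y_train, batch_size=len(x_train))  # batch GD: whole dataset per update
# model.fit(x_train, y_train, batch_size=1)             # SGD: one sample per update
```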

---

ANS4. *Generative Adversarial Network (GAN)*

A *GAN* consists of two neural networks: the *generator* and the *discriminator*. The generator
creates fake data (e.g., images), and the discriminator attempts to distinguish between real and fake
data. The two networks are trained together in an adversarial process: the generator tries to fool the
discriminator, while the discriminator improves its ability to tell real from fake. Over time, the
generator produces more realistic data.
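
A structural sketch of the two networks for 28x28 images (the latent size and layer widths are illustrative assumptions; the adversarial training loop is omitted):

```python
import tensorflow as tf

latent_dim = 100

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),   # fake image, flattened
    tf.keras.layers.Reshape((28, 28)),
])

discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),       # probability of "real"
])
# Training alternates: update the discriminator on real vs. generated samples,
# then update the generator so the discriminator labels its output as real.
```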

---

ANS5. *Activation Functions*

An activation function introduces non-linearity into the neural network, allowing it to learn complex
patterns. Without it, the network would essentially behave like a linear model.
- *Sigmoid*: Squashes input values to a range between 0 and 1. Used in binary classification.

- *Tanh*: Squashes input values between -1 and 1. It is similar to sigmoid but centered around 0,
which often makes optimization easier, although it still saturates for large inputs and can therefore
suffer from vanishing gradients.

- *ReLU*: As mentioned above, it replaces negative values with 0, making it faster and more effective
in training deep networks.
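
A quick comparison of the three on the same inputs (values chosen only for illustration):

```python
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])
print(tf.keras.activations.sigmoid(x).numpy())  # ~[0.119 0.5   0.881]
print(tf.keras.activations.tanh(x).numpy())     # ~[-0.964 0.    0.964]
print(tf.keras.activations.relu(x).numpy())     # [0. 0. 2.]
```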

---

ANS6. *Autoencoders*

Autoencoders are unsupervised neural networks used for learning efficient representations of data.
They consist of two parts:

- *Encoder*: Compresses the input data into a lower-dimensional latent space representation.

- *Decoder*: Reconstructs the data from this compressed representation.

They are often used for dimensionality reduction, anomaly detection, and data denoising.
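
A sketch of a dense autoencoder for 784-dimensional inputs with a 32-dimensional latent space (the sizes are illustrative assumptions):

```python
import tensorflow as tf

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(32, activation="relu"),        # latent representation
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),    # reconstruction
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# trained to reproduce its own input:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)
```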

---

ANS7. *Why Use Batch Normalization?*

Batch normalization helps speed up training by stabilizing the learning process. It reduces internal
covariate shift (changes in the distribution of the network’s layer inputs during training) and can help
reduce the dependence on initialization and the learning rate. It also acts as a regularizer, reducing
the need for other forms of regularization like dropout.

---

ANS8. *Vanishing and Exploding Gradients*

- *Vanishing Gradients* occur when gradients become very small during backpropagation, leading to
slow learning or no learning at all, especially in deep networks. This is often seen with activation
functions like sigmoid or tanh.
- *Exploding Gradients* occur when gradients become too large, causing numerical instability and
making the model’s weights diverge. This can be mitigated by gradient clipping, proper
initialization, or using activation functions like ReLU.
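
Two of these mitigations as they appear in Keras (the clipping threshold and layer size are illustrative):

```python
import tensorflow as tf

# gradient clipping: rescale any gradient whose norm exceeds 1.0
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# ReLU activation paired with He initialization
layer = tf.keras.layers.Dense(64, activation="relu",
                              kernel_initializer="he_normal")
```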

---

ANS9. *Effect of L1/L2 Regularization on Neural Networks*

- *L1 Regularization* adds a penalty proportional to the sum of the absolute values of the weights,
which encourages sparsity (i.e., many weights become exactly zero).

- *L2 Regularization* adds a penalty proportional to the sum of the squared weights, which discourages
large weights and tends to shrink them toward zero without making them exactly zero.

Both regularizations help prevent overfitting by constraining the complexity of the model.
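
In Keras the penalties are attached per layer through `kernel_regularizer` (the penalty strengths are illustrative):

```python
import tensorflow as tf

l1_layer = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l1(0.01))   # encourages sparse weights

l2_layer = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(0.01))   # shrinks weights toward zero
```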

---

ANS10. *Learning Rate in Neural Network Models*

The learning rate determines the size of the steps the model takes during optimization. A *high
learning rate* might cause the model to overshoot the optimal point, whereas a *low learning rate*
could result in very slow convergence or getting stuck in local minima. Fine-tuning the learning rate is
crucial for ensuring efficient training.
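
A sketch of setting a fixed learning rate versus letting it decay over training in Keras (the values are illustrative):

```python
import tensorflow as tf

fixed = tf.keras.optimizers.Adam(learning_rate=1e-3)        # constant step size

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2, decay_steps=1000, decay_rate=0.9)
decaying = tf.keras.optimizers.SGD(learning_rate=schedule)  # step size shrinks over time
```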
