DL Unit1

The document discusses several key machine learning concepts: 1. Regularization is used in machine learning models to prevent overfitting by adding a penalty term to the loss function that discourages complex patterns. 2. Common regularization techniques include L1 and L2 regularization. 3. Estimators, bias, and variance are important concepts for evaluating machine learning models. Estimators make predictions, bias measures the systematic error of an overly simple model (underfitting), and variance measures sensitivity to the training data. The goal is to balance bias and variance.

Uploaded by

Ankit Mahapatra


Loss Function—Regularization

A loss function, also known as a cost function or objective function, is a critical component in machine
learning algorithms. It quantifies the difference between the predicted values and the actual target
values, serving as a measure of how well the model is performing on the training data. The goal of the
learning process is to minimize the loss function. A low training loss does not by itself guarantee good
generalization to unseen data, which is where regularization comes in.

Regularization is a technique used to prevent overfitting in machine learning models. It involves adding a
penalty term to the loss function that discourages the model from learning overly complex patterns from
the training data. Regularization helps to achieve a balance between fitting the training data well and
maintaining simplicity, reducing the risk of overfitting.

In linear regression and other models with linear relationships, the loss function typically consists of two
parts: the data fitting term (e.g., Mean Squared Error) and the regularization term. The overall loss
function can be written as:

Loss = Data Fitting Term + Regularization Term

The regularization term penalizes large coefficients (weights) in the model, encouraging the model to use
smaller weights and, therefore, simpler representations of the data. Two common types of regularization
are L1 regularization and L2 regularization:

1. L1 Regularization (Lasso Regression):

L1 regularization adds the sum of the absolute values of the model's coefficients to the loss
function. It encourages the model to set some coefficients to exactly zero, effectively performing
feature selection. L1 regularization can lead to sparse models with only a subset of the features
being important.

2. L2 Regularization (Ridge Regression):

L2 regularization adds the sum of the squares of the model's coefficients to the loss function. It
penalizes large weights and encourages all coefficients to be small but non-zero. L2
regularization does not lead to feature selection, and all features contribute to the model.

The amount of regularization is controlled by a hyperparameter, typically denoted as λ (lambda). By
tuning the value of λ, we can control the balance between the data fitting term and the regularization
term, and thus control the model's complexity.

In summary, regularization is used to prevent overfitting in machine learning models by adding a penalty
term to the loss function, encouraging the model to be simpler and more generalizable. It is an essential
technique in building robust and well-performing machine learning models, especially when dealing with
high-dimensional data or complex models.
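
As a rough illustration of how the penalty term enters the loss, the following Python sketch computes Loss = Data Fitting Term + λ × Regularization Term for both L1 and L2 penalties. The function names (mse, l1_penalty, l2_penalty, regularized_loss) and the example numbers are assumptions made for this sketch, not part of the original notes.

```python
def mse(y_true, y_pred):
    """Data fitting term: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(weights):
    """L1 (Lasso) penalty: sum of absolute values of the coefficients."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2 (Ridge) penalty: sum of squares of the coefficients."""
    return sum(w * w for w in weights)


def regularized_loss(y_true, y_pred, weights, lam, kind="l2"):
    """Data fitting term plus lam times the chosen penalty ("l1" or "l2")."""
    penalty = l1_penalty(weights) if kind == "l1" else l2_penalty(weights)
    return mse(y_true, y_pred) + lam * penalty


# Hypothetical example values, chosen only to show the effect of the penalty.
targets = [1.0, 2.0]
predictions = [1.0, 4.0]
weights = [3.0, -4.0]

print("no regularization:", mse(targets, predictions))
print("with L1 penalty:  ", regularized_loss(targets, predictions, weights, 0.1, "l1"))
print("with L2 penalty:  ", regularized_loss(targets, predictions, weights, 0.1, "l2"))
```

Setting lam to 0 recovers the plain data fitting term, while larger values of lam make large weights more expensive than fitting errors.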

McCulloch-Pitts units

McCulloch-Pitts units, also known as McCulloch-Pitts neurons, are the foundational building blocks of
artificial neural networks. They were proposed by Warren McCulloch and Walter Pitts in 1943 and are
one of the earliest formalizations of artificial neurons. McCulloch-Pitts units operate based on a simple
thresholding logic.

Here's how McCulloch-Pitts units work:

1. Inputs and Weights:

Each McCulloch-Pitts unit takes multiple binary inputs (0 or 1) represented as x1, x2, ..., xn. Each input
is associated with a weight (w1, w2, ..., wn), which determines the importance or strength of that input.

2. Thresholding Logic:

The McCulloch-Pitts unit performs a weighted sum of the inputs, and if the sum reaches or exceeds a certain
threshold, the neuron fires and produces an output signal (output is 1). Otherwise, it remains inactive
(output is 0).

3. Activation Function:

The activation function used in McCulloch-Pitts units is a step function or a threshold function. The
output (y) of the neuron is determined as follows:

y = 1, if Σ(xi * wi) ≥ Threshold (T)

y = 0, otherwise

The threshold (T) is a parameter that defines the point at which the neuron activates.

4. Binary Output:

The output of a McCulloch-Pitts unit is binary, either 0 or 1. It represents the neuron's firing state
based on the thresholding logic.

McCulloch-Pitts units were influential in the early development of neural networks and inspired
subsequent research on artificial neurons and artificial neural networks. While these units are simple
and can perform basic logical operations (AND, OR, NOT), they have limitations. For example, they are
unable to learn from data or adapt to new patterns, making them less suitable for complex tasks
compared to modern neural network architectures.

However, the concept of thresholding logic and binary output served as a foundation for more
sophisticated neuron models and paved the way for the development of the perceptron and, eventually,
modern neural network architectures with trainable parameters and different activation functions.
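
The thresholding rule described above can be sketched in a few lines of Python. The function name mp_unit and the specific weights and thresholds chosen for AND, OR, and NOT are assumptions made for illustration. In particular, the negative weight used for NOT stands in for an inhibitory input.

```python
def mp_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logic_and(a, b):
    # Both inputs must be 1 for the sum to reach the threshold of 2.
    return mp_unit([a, b], [1, 1], threshold=2)


def logic_or(a, b):
    # A single active input is enough to reach the threshold of 1.
    return mp_unit([a, b], [1, 1], threshold=1)


def logic_not(a):
    # Assumed negative weight: an active input pulls the sum below the threshold of 0.
    return mp_unit([a], [-1], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logic_and(a, b), "OR:", logic_or(a, b))
print("NOT 0:", logic_not(0), "NOT 1:", logic_not(1))
```

The weights and thresholds here are fixed by hand, which reflects the limitation noted above: nothing in the unit adjusts them from data.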

Estimators – Bias – Variance

Estimators, bias, and variance are fundamental concepts in the context of machine learning and model
evaluation.

Estimators:

In machine learning, an estimator refers to an algorithm or model that learns patterns and relationships
from the data and makes predictions or estimates based on that learning. Estimators are the core
components of machine learning models and are used for various tasks, such as classification,
regression, clustering, and more. The learning process involves finding the best model parameters that
minimize the error between the predicted values and the actual target values.

Bias:

Bias refers to the error introduced by approximating a real-world problem using a simplified model. It
represents the model's tendency to consistently underpredict or overpredict the target values compared
to the true values in the dataset. A model with high bias oversimplifies the data, leading to systematic
errors and poor performance on both the training and test datasets. It typically occurs when the model is
too simple to capture the underlying patterns and relationships in the data.

Variance:

Variance refers to the amount by which a model's predictions would change if it were trained on
different subsets of the training data. It measures how sensitive the model is to the particular data
points in the training set. A model with high variance tends to be overly complex and can capture noise
in the training data, leading to good performance on the training data but poor performance on new,
unseen data. High variance is characteristic of a model that overfits the training data.

Bias-Variance Trade-Off:

The bias-variance trade-off is a fundamental concept in machine learning. It refers to the balance
between a model's bias and variance when making predictions. Models with high bias tend to underfit
the data, while models with high variance tend to overfit the data. The goal is to find a model that strikes
a balance between bias and variance to achieve good generalization performance on unseen data.

To achieve the right balance, various strategies can be employed:

- Bias Reduction: To reduce bias, one can use more complex models or increase the model's
capacity to capture the underlying patterns in the data.

- Variance Reduction: To reduce variance, regularization techniques, ensemble methods, or more
training data can be used. Cross-validation helps to detect high variance and to choose the
regularization strength.

It's important to understand the bias-variance trade-off when developing machine learning models, as
optimizing one aspect often comes at the expense of the other. Proper model evaluation using
techniques like cross-validation and monitoring both bias and variance can guide the process of building
a well-performing and generalizable machine-learning model.
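
To make the two quantities concrete, the sketch below measures bias and variance for a set of predictions of the same target value, each imagined to come from a model trained on a different training subset. The function name bias_variance and both lists of predictions are hypothetical. The sketch also assumes a noise-free true value, so the mean squared error splits exactly into bias² + variance with no irreducible-noise term.

```python
def bias_variance(estimates, true_value):
    """Return (bias, variance, mse) of a list of estimates of one true value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value  # systematic error of the average estimate
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / n  # spread around the average
    mse = sum((e - true_value) ** 2 for e in estimates) / n  # total squared error
    return bias, variance, mse


true_value = 2.0

# Hypothetical predictions from a too-simple model: stable but consistently off.
underfit = [3.0, 3.1, 2.9, 3.0]
# Hypothetical predictions from a too-flexible model: right on average but unstable.
overfit = [0.5, 3.5, 1.0, 3.0]

for name, estimates in (("high bias", underfit), ("high variance", overfit)):
    bias, variance, mse = bias_variance(estimates, true_value)
    print(name, "bias:", bias, "variance:", variance, "mse:", mse)
```

In this setup both kinds of model can end up with a large error, but for different reasons, which is why reducing one term without watching the other does not necessarily help.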

Linear perceptron

The linear perceptron, also known as the single-layer perceptron, is one of the simplest and earliest
neural network architectures. It was introduced by Frank Rosenblatt in 1957 and published in 1958. The
linear perceptron is a binary classification algorithm used for linearly separable datasets.

Architecture of Linear Perceptron:

The linear perceptron consists of an input layer and an output layer. It does not have any hidden layers.
The input layer represents the features of the data, and the output layer produces the binary
classification decision.

Working of Linear Perceptron:

1. Inputs and Weights:

The linear perceptron takes multiple input features, denoted as x1, x2, ..., xn. Each input is associated
with a weight, denoted as w1, w2, ..., wn. The weights represent the importance or contribution of each
feature to the classification decision.

2. Weighted Sum and Activation:

The perceptron computes the weighted sum of the inputs, adds the bias, and applies an activation
function to produce the output. The output (y) of the perceptron is computed as follows:

y = 1, if Σ(xi * wi) + b ≥ 0

y = 0, otherwise

The bias (denoted as b) is an additional parameter that plays the role of a negative threshold (b = -T),
shifting the decision boundary of the perceptron.

3. Activation Function:

The activation function used in the linear perceptron is a step function or a threshold function. The
output is binary, with the perceptron producing a positive (1) or negative (0) classification decision.

4. Training:

The training of the linear perceptron involves adjusting the weights and the bias based on the training
data. The goal is to find weights and a bias that minimize the classification error on the training data.

5. Convergence Theorem:

The perceptron training process is guaranteed to converge and find a solution if the data is linearly
separable. However, if the data is not linearly separable, the perceptron training process may not
converge.
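
The decision rule from step 2 can be sketched as follows. The function name perceptron_predict and the hand-picked weights and bias for AND are assumptions for illustration, not values from the original notes.

```python
def perceptron_predict(x, w, b):
    """Return 1 if the weighted sum plus bias is non-negative, else 0."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if weighted_sum >= 0 else 0


# Hand-picked (assumed) parameters that realize logical AND:
# the sum x1 + x2 - 1.5 is non-negative only when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron_predict(x, and_weights, and_bias))
```

Here the bias of -1.5 is equivalent to a threshold of 1.5 on the weighted sum. XOR, by contrast, is not linearly separable, so no single choice of two weights and a bias classifies all four of its inputs correctly.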

Limitations of Linear Perceptron:

The linear perceptron has several limitations:

- It can only handle linearly separable datasets, making it unsuitable for problems with more complex
decision boundaries.

- It cannot solve problems that require capturing nonlinear relationships between features and the
target variable.

- The training process may not converge if the data is not linearly separable.

- It does not support probabilistic outputs or confidence scores.

Despite these limitations, the linear perceptron played a crucial role in the development of neural
networks and inspired more advanced models, such as multi-layer perceptrons and deep neural
networks, which can address more complex tasks and learn nonlinear patterns in the data.

Perceptron Learning Algorithm

The Perceptron Learning Algorithm (PLA) is a supervised learning algorithm used to train a linear
perceptron for binary classification tasks. It was introduced by Frank Rosenblatt in 1957 and is one of the
earliest learning algorithms for neural networks. The PLA is designed to find weights and a bias for a
linear perceptron that define a decision boundary separating the two classes in the dataset.

In its simplest form, which omits the bias and the learning rate, the algorithm works as follows. We
initialize w with a random vector. We then iterate over all the examples in the data, P ∪ N, where P is the
set of positive examples and N is the set of negative examples. If an input x belongs to P, the dot
product w.x should be greater than or equal to 0, because that is the condition under which the
perceptron outputs 1. If x belongs to N, the dot product must be less than 0. The algorithm therefore
checks for two kinds of mistakes:

Case 1: When x belongs to P and its dot product w.x < 0

Case 2: When x belongs to N and its dot product w.x ≥ 0

Only in these cases do we update w, because they violate the perceptron's decision rule. Otherwise, w is
left unchanged. In Case 1 we add x to w (vector addition), which increases w.x. In Case 2 we subtract x
from w, which decreases w.x.

Algorithm Steps:

1. Initialization:

Initialize the weights (w1, w2, ..., wn) and bias (b) of the perceptron to small random values or zeros.

2. Training Data:

Provide a labeled training dataset where each data point is associated with a target class (either 0 or 1).

3. Training Process:

- For each data point in the training dataset, do the following:

  - Compute the weighted sum of the inputs and the current weights: Σ(xi * wi) + b.

  - Apply the activation function (step function) to the weighted sum to produce the predicted
    output (y_pred).

  - Update the weights and bias based on the prediction and the true label (y_true) as follows:

    - If y_pred is equal to y_true (correct prediction), do not update the weights and bias.

    - If y_pred is 1 and y_true is 0 (false positive), decrease the weights and bias:

      - wi_new = wi_old - α * xi

      - b_new = b_old - α

    - If y_pred is 0 and y_true is 1 (false negative), increase the weights and bias:

      - wi_new = wi_old + α * xi

      - b_new = b_old + α

- Repeat the training process for a fixed number of iterations (epochs) or until the algorithm
  converges to a solution (when all data points are correctly classified).

4. Convergence:

The Perceptron Learning Algorithm is guaranteed to converge and find a solution if the training data is
linearly separable. If the data is not linearly separable, the PLA may not converge, and the algorithm will
keep updating the weights indefinitely.
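
The algorithm steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names predict and train_perceptron, the zero initialization, the default values of alpha and max_epochs, and the AND dataset are all chosen for the example. The two update cases are folded into one line by using error = y_true - y_pred, which is +1 for a false negative and -1 for a false positive.

```python
def predict(x, w, b):
    """Step activation applied to the weighted sum plus bias."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0


def train_perceptron(X, y, alpha=1.0, max_epochs=100):
    """Train on (X, y) with labels in {0, 1}; return the learned (w, b)."""
    w = [0.0] * len(X[0])  # step 1: initialize weights to zeros
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y_true in zip(X, y):
            error = y_true - predict(x, w, b)  # 0 if correct, +1 or -1 otherwise
            if error != 0:
                # Increase w and b on a false negative, decrease them on a false positive.
                w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
                b += alpha * error
                mistakes += 1
        if mistakes == 0:  # every point classified correctly: converged
            break
    return w, b


# Assumed toy dataset: logical AND, which is linearly separable.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]

w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("predictions:", [predict(x, w, b) for x in X])
```

The max_epochs cap is a practical safeguard for data that is not linearly separable, where the loop would otherwise never reach an epoch without mistakes.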

Learning Rate (α):

The learning rate (α) is a hyperparameter of the PLA that controls the step size during weight and bias
updates. It determines how much the weights and bias are adjusted after each misclassification. For a
perceptron with a step activation, α matters less than in gradient-based training: if the weights and bias
start at zero, changing α only rescales them and leaves every prediction unchanged. With non-zero initial
values, a larger α lets the updates outweigh the initial weights sooner, while a smaller α changes the
decision boundary more gradually.

Limitations:

The Perceptron Learning Algorithm has some limitations:

- It can only handle linearly separable datasets and may not converge if the data is not linearly
separable.

- The algorithm does not support probabilistic outputs or confidence scores.

- It is not suitable for problems that require capturing nonlinear relationships between features
and the target variable.

Despite these limitations, the PLA played a crucial role in the history of artificial neural networks and laid
the foundation for more advanced learning algorithms and neural network architectures.

Multilayer perceptron

A multilayer perceptron (MLP) is a type of artificial neural network that consists of multiple layers of
interconnected neurons. It is a feedforward neural network, meaning that the data flows in one
direction, from the input layer through the hidden layers to the output layer, without any feedback
connections. MLPs are one of the foundational architectures in deep learning and are widely used for a
variety of tasks, including classification, regression, and pattern recognition.

Architecture:

The multilayer perceptron typically consists of the following layers:

1. Input Layer:

The input layer is responsible for accepting the input data, which could be a feature vector
representing the characteristics of the data points.

2. Hidden Layers:

MLPs have one or more hidden layers sandwiched between the input and output layers. Each hidden
layer contains multiple neurons, and the number of hidden layers and neurons in each layer is a
hyperparameter that can be adjusted based on the complexity of the task.

3. Output Layer:

The output layer produces the final output of the model, which depends on the specific task being
performed. For binary classification, it might consist of a single neuron with a sigmoid activation function
that outputs a probability between 0 and 1, which is then thresholded (typically at 0.5) to obtain the class
label (0 or 1). For multiclass classification, the output layer might have multiple neurons, each
representing a different class, with a softmax activation function to produce probabilities for each class.

Working:

During the forward pass of an MLP, the input data propagates through the network layer by layer. Each
neuron in a layer performs a weighted sum of its inputs and applies an activation function to produce an
output, which becomes the input to the next layer. This process continues until the final output is
produced.

The weights and biases of the neurons are learned through the process of training using techniques like
backpropagation and gradient descent. The goal of training is to adjust the model's parameters to
minimize the difference between the predicted outputs and the actual target values in the training data.
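
The forward pass described above can be sketched as a loop over layers. The names dense and mlp_forward, the layer representation, and the hand-set weights are assumptions for this example. The weights are chosen by hand (not learned) so that a network with one hidden layer computes XOR, a function that a single linear perceptron cannot represent.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of the inputs, adds its bias, applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Propagate x through the layers; each layer's output is the next layer's input."""
    activations = x
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Hand-set (assumed) parameters: 2 inputs -> 2 hidden ReLU neurons -> 1 sigmoid output.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),  # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    ([[1.0, -2.0]], [-0.5], sigmoid),               # output = sigmoid(h1 - 2*h2 - 0.5)
]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    probability = mlp_forward(x, XOR_LAYERS)[0]
    print(x, "->", round(probability, 3), "class:", 1 if probability >= 0.5 else 0)
```

In practice the weights and biases would be found by training rather than set by hand. The point of the sketch is only the layer-by-layer data flow.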

Activation Functions:

Activation functions introduce non-linearity into the model, allowing MLPs to capture complex
relationships in the data. Some commonly used activation functions in hidden layers include:

- ReLU (Rectified Linear Unit)

- Sigmoid

- Tanh (Hyperbolic Tangent)
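
For reference, the activation functions listed above, together with the softmax function mentioned for multiclass output layers, can be written with Python's standard math module. The function names are assumptions for this sketch. The subtraction of the maximum in softmax is a common numerical-stability measure and does not change the result.

```python
import math


def relu(z):
    """ReLU: passes positive values through and clips negative values to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Tanh: squashes any real number into the interval (-1, 1)."""
    return math.tanh(z)


def softmax(zs):
    """Softmax: turns a list of scores into probabilities that sum to 1."""
    largest = max(zs)  # subtracted for numerical stability
    exps = [math.exp(z - largest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


for z in (-2.0, 0.0, 2.0):
    print(z, "relu:", relu(z), "sigmoid:", round(sigmoid(z), 4), "tanh:", round(tanh(z), 4))
print("softmax:", [round(p, 4) for p in softmax([1.0, 2.0, 3.0])])
```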

Training:

Training an MLP involves feeding the training data through the network, calculating the loss (error)
between the predicted outputs and the actual targets, and then updating the model's parameters
(weights and biases) using an optimization algorithm like gradient descent, with the gradients computed
by backpropagation. The training process continues for multiple epochs until the loss converges or the
model reaches a satisfactory level of performance, ideally measured on held-out validation data.

MLPs can be implemented using deep learning frameworks like Keras, TensorFlow, or PyTorch, which
provide user-friendly APIs to create, train, and evaluate neural network models.

Backpropagation

Backpropagation, short for "backward propagation of errors," is a widely used algorithm for training
artificial neural networks, including multilayer perceptrons (MLPs). It is used in supervised learning to
compute how much each weight and bias contributes to the prediction error, so that an optimizer can
adjust them, allowing the network to learn from the training data and improve its performance over time.

How Backpropagation Works:

1. Forward Pass:

During the forward pass, the input data is fed into the neural network, and the data propagates
through the network layer by layer. Each neuron performs a weighted sum of its inputs, applies an
activation function to produce an output, and passes that output to the next layer as its input. This
process continues until the output layer produces the final predictions.

2. Loss Calculation:

After the forward pass, the neural network produces predictions for the input data. The loss function
(e.g., mean squared error for regression or binary cross-entropy for binary classification) is then used to
measure the difference between the predicted values and the actual target values in the training data.

3. Backward Pass:

The backward pass is the core of the backpropagation algorithm. It involves propagating the error
backward through the network, layer by layer, applying the chain rule to compute the gradients of the loss
function with respect to the model's parameters (weights and biases). The gradients indicate how the loss
function changes with respect to changes in the model's parameters.

4. Gradient Descent:

Once the gradients have been computed, the model's parameters are updated using an optimization
algorithm such as gradient descent. Gradient descent adjusts the weights and biases in the direction that
minimizes the loss function. The learning rate determines the step size in the weight update process.

5. Iterations:

The forward pass, loss calculation, backward pass, and weight updates are performed iteratively over
the entire training dataset. This process is repeated for a fixed number of epochs (full passes over the
training data) or until the model's performance converges to a satisfactory level.
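
The five steps above can be sketched for the smallest possible network: one input, one sigmoid hidden neuron, and one linear output neuron, trained on a single example with the squared-error loss 0.5 * (y_hat - y)². The function names, the parameter layout [w1, b1, w2, b2], the learning rate, and all numeric values are assumptions made for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Step 1, forward pass: return the hidden activation and the prediction."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden neuron
    y_hat = w2 * h + b2       # linear output neuron
    return h, y_hat


def loss_fn(params, x, y):
    """Step 2, loss calculation: squared error for one example."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backward(params, x, y):
    """Step 3, backward pass: apply the chain rule from the output back to the input."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_y_hat = y_hat - y          # dL/dy_hat
    d_w2 = d_y_hat * h           # dL/dw2
    d_b2 = d_y_hat               # dL/db2
    d_h = d_y_hat * w2           # error propagated back to the hidden activation
    d_z = d_h * h * (1.0 - h)    # through the sigmoid: sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x               # dL/dw1
    d_b1 = d_z                   # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


def gradient_descent_step(params, grads, lr):
    """Step 4, gradient descent: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def train(params, x, y, lr=0.1, steps=100):
    """Step 5, iterations: repeat forward, backward, and update."""
    for _ in range(steps):
        params = gradient_descent_step(params, backward(params, x, y), lr)
    return params


initial = [0.3, -0.1, 0.8, 0.2]  # assumed starting values for w1, b1, w2, b2
x, y = 0.5, 1.0                  # a single assumed training example
trained = train(initial, x, y)
print("loss before:", loss_fn(initial, x, y))
print("loss after: ", loss_fn(trained, x, y))
```

A common sanity check for a hand-written backward pass is to compare its gradients with finite-difference estimates of the loss, which should agree to several decimal places.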

Benefits of Backpropagation:

- Backpropagation allows neural networks to learn from data and improve their performance on various
tasks, including classification, regression, and more.

- It enables neural networks to capture complex patterns and relationships in the data by adjusting their
internal parameters (weights and biases).

- Backpropagation facilitates the use of deep learning, as it allows for the training of deep neural
networks with multiple hidden layers.

While backpropagation is a powerful algorithm for training neural networks, it is not without challenges.
For example, it can suffer from vanishing or exploding gradients in deep networks, which can slow down
or hinder learning. However, various techniques, such as careful weight initialization, non-saturating
activation functions like ReLU, and batch normalization, have been developed to address these
challenges and improve the training process.
