
UNIT-1 Syllabus - Deep Learning and its Applications

Machine Learning Basics: Introduction, Learning Algorithms, Maximum Likelihood Estimation, Building Machine Learning Algorithms, Neural Networks: Multi-Layer Perceptron, Backpropagation Algorithm, Stochastic Gradient Descent and its variants, Curse of Dimensionality.

Introduction to ML

Machine Learning allows a machine to learn from examples and experience without being explicitly programmed. Instead of writing the code yourself, you feed data to the algorithm, and the algorithm/machine builds the logic based on the given data.

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell)

"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel)
Learning = Improving with experience at some task
- Improve over task T,
- with respect to performance measure P,
- based on experience E.
ML Learning Types
They can be classified as supervised, unsupervised, semi-supervised or reinforcement learning.
 Supervised Learning: Supervised learning models use external feedback to learn functions that map inputs to output observations. In these models the external environment acts as a "teacher" of the AI algorithms.
 Unsupervised Learning: Unsupervised models focus on learning a pattern in the input data without any external feedback. Clustering is a classic example of an unsupervised learning model.
 Semi-supervised Learning: Semi-supervised learning uses a set of curated, labelled data and tries to infer new labels/attributes on new data sets. Semi-supervised learning models are a solid middle ground between supervised and unsupervised models.
 Reinforcement Learning: Reinforcement learning models use opposing dynamics such as rewards and punishments to "reinforce" different types of knowledge. This type of learning technique is becoming very popular in modern AI solutions.

Need for ML: Machine learning enables analysis of massive

quantities of data. While it generally delivers faster, more accurate

results in order to identify profitable opportunities or dangerous risks,

it may also require additional time and resources to train it properly.

Combining machine learning with AI and cognitive technologies can

make it even more effective in processing large volumes of

information.

 1. Supervised machine learning algorithms can apply what has


been learned in the past to new data using labelled examples to
predict future events.
Eg: Suppose you are given a basket filled with different kinds of fruits.

 Now the first step is to train the machine with all the different fruits one by one, like this:
 If the shape of the object is rounded with a depression at the top and its colour is red, then it will be labelled as Apple.
 If the shape of the object is a long curving cylinder with a green-yellow colour, then it will be labelled as Banana.

 Now suppose that after training, you are given a new fruit from the basket, say a banana, and asked to identify it.
 Since the machine has already learned from the previous data, it now applies that knowledge: it first classifies the fruit by its shape and colour, confirms the fruit name as BANANA, and puts it in the banana category.
 Thus the machine learns from the training data (the basket of fruits) and then applies that knowledge to the test data (the new fruit).
 Supervised learning is classified into two categories of algorithms:
 Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
 Regression: A regression problem is when the output variable is a real value, such as price, weight or height.

How Supervised Learning Works?


The working of Supervised learning can be easily understood by the
below example and diagram:
 Suppose we have a dataset of different types of shapes which
includes square, rectangle, triangle, and Polygon.
 Now the first step is that we need to train the model for each shape.
 If the given shape has four sides, and all the sides are equal, then it
will be labelled as a Square.
 If the given shape has three sides, then it will be labelled as a
triangle.
 If the given shape has six equal sides, then it will be labelled as a hexagon.
 Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
 The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
Steps Involved in Supervised Learning:

 First, determine the type of training dataset.

 Collect/gather the labelled training data.

 Split the dataset into a training set, a test set, and a validation set.

 Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.

 Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.

 Execute the algorithm on the training dataset. Sometimes we need a validation set to tune the control parameters; this set is a subset of the training data.

 Evaluate the accuracy of the model using the test set. If the model predicts the correct outputs, the model is accurate (a small sketch of these steps follows below).
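A minimal sketch of these steps in Python, assuming scikit-learn is available; the tiny "fruit" dataset below is invented purely for illustration and is not part of the original notes.

# Sketch of the supervised-learning steps above (assumes scikit-learn is installed).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: labelled training data (features: [roundness, length_cm], label: fruit name)
X = [[0.9, 7], [0.8, 8], [0.95, 7.5], [0.2, 18], [0.3, 20], [0.25, 19]]
y = ["apple", "apple", "apple", "banana", "banana", "banana"]

# Step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Steps 5-6: choose an algorithm and run it on the training data
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Step 7: evaluate the accuracy on the test set
print(accuracy_score(y_test, model.predict(X_test)))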

Supervised learning can be further divided into two types of problems:

1. Regression: Regression algorithms are used if there is a

relationship between the input variable and the output variable. It is

used for the prediction of continuous variables, such as Weather

forecasting, Market Trends, etc.


2. Classification: Classification algorithms are used when the output variable is categorical, which means the output belongs to a set of discrete classes such as Yes-No, Male-Female, True-False, etc.

Advantages of Supervised Learning:

 With the help of supervised learning, the model can predict

the output on the basis of prior experiences.

 In supervised learning, we can have an exact idea about the

classes of objects.

 Supervised learning model helps us to solve various real-

world problems such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:

 Supervised learning models are not suitable for handling the

complex tasks.

 Supervised learning cannot predict the correct output if the

test data is different from the training dataset.

 Training requires a lot of computation time.

 In supervised learning, we need enough knowledge about the classes of objects.

Unsupervised learning

 Unsupervised learning is the training of a machine using

information that is neither classified nor labelled and allowing the

algorithm to act on that information without guidance.


 Here the task of the machine is to group unsorted information

according to similarities, patterns, and differences without any

prior training of data.

 Therefore, the machine is restricted to find the hidden structure in

unlabelled data by itself.

 For instance, suppose it is given an image having both dogs and


cats which it has never seen.

Unsupervised algorithms are classified into two types:

 Clustering: The task of grouping data points based on their

similarity with each other is called Clustering or Cluster Analysis.

This method is defined under the branch of Unsupervised


Learning, which aims at gaining insights from unlabelled data

points.

Ex: grouping customers by purchasing behaviour.

 Association: Association rule learning is a type of unsupervised

learning technique that checks for the dependency of one data item

on another data item and it is used to find relations or associations

among the variables of dataset.

Ex: people that buy X also tend to buy Y.

Working of unsupervised learning can be understood by the below

diagram.

 Here, we have taken an unlabelled input data, which means it is not

categorized and corresponding outputs are also not given.

 Now, this unlabelled input data is fed to the machine learning

model in order to train it.

 Firstly, it will interpret the raw data to find the hidden patterns in the data and then will apply a suitable algorithm such as K-means clustering, hierarchical clustering, etc.

 Once it applies the suitable algorithm, the algorithm divides the

data objects into groups according to the similarities and

difference between the objects.

Advantages of Unsupervised Learning

 Unsupervised learning is used for more complex tasks as

compared to supervised learning because, in unsupervised

learning, we don't have labelled input data.

 Unsupervised learning is preferable as it is easy to get

unlabelled data in comparison to labelled data.

Disadvantages of Unsupervised Learning

 Unsupervised learning is intrinsically more difficult than

supervised learning as it does not have corresponding output.

 The result of the unsupervised learning algorithm might be

less accurate as input data is not labelled, and algorithms do not

know the exact output in advance.

Reinforcement Learning

 Reinforcement learning(RL) is the problem of getting an agent to act

in the world so as to maximize its rewards.

 Unlike supervised learning, which relies on a training dataset with

predefined answers, RL involves learning through experience.


 In RL, an agent learns to achieve a goal in an uncertain, potentially

complex environment by performing actions and receiving feedback

through rewards or penalties.

 A learner (the program) is not told what actions to take as in most

forms of machine learning, but instead must discover which actions

yield the most reward by trying them. In the most interesting and

challenging cases, actions may affect not only the immediate reward

but also the next situations and, through that, all subsequent rewards.

For example, consider teaching a dog a new trick: we cannot tell it

what to do, but we can reward/punish it if it does the right/wrong

thing. It has to find out what it did that made it get the

reward/punishment. We can use a similar method to train computers

to do many tasks, such as playing backgammon or chess, scheduling

jobs, and controlling robot limbs.

Typical applications of reinforcement learning involve playing games

and some form of robots, e.g., drones, warehouse robots, and more

recently self-driving cars.


Key Concepts of Reinforcement Learning

 Agent: The learner or decision-maker.

 Environment: Everything the agent interacts with.

 State: A specific situation in which the agent finds itself.

 Action: All possible moves the agent can make.

 Reward: Feedback from the environment based on the action taken.

 RL operates on the principle of learning optimal behaviour through

trial and error. The agent takes actions within the environment,

receives rewards or penalties, and adjusts its behaviour to maximize

the cumulative reward.

Supervised Vs Unsupervised Vs Reinforcement Learning

Aspect: Definition
- Supervised Learning: Learning from labelled data to predict outcomes for new data.
- Unsupervised Learning: Learning from unlabelled data to identify patterns and structures.
- Reinforcement Learning: Learning to make decisions by performing actions in an environment and receiving rewards or penalties.

Aspect: Data Requirement
- Supervised Learning: Requires a dataset with input-output pairs; the data must be labelled.
- Unsupervised Learning: Works with unlabelled data; no need for input-output pairs.
- Reinforcement Learning: No predefined dataset; learns from interactions with the environment through trial and error.

Aspect: Output
- Supervised Learning: A predictive model that maps inputs to outputs.
- Unsupervised Learning: A model that identifies the data's patterns, clusters, associations, or features.
- Reinforcement Learning: A policy or strategy that specifies the action to take in each state of the environment.

Aspect: Feedback
- Supervised Learning: Direct feedback (the correct output is known).
- Unsupervised Learning: No explicit feedback; the algorithm infers structures.
- Reinforcement Learning: Indirect feedback (rewards or penalties after actions, not necessarily immediate).

Aspect: Goal
- Supervised Learning: Minimize the error between predicted and actual outputs.
- Unsupervised Learning: Discover the underlying structure of the data.
- Reinforcement Learning: Maximize cumulative reward over time.

Aspect: Examples
- Supervised Learning: Image classification, spam detection, regression tasks.
- Unsupervised Learning: Clustering, dimensionality reduction, market basket analysis.
- Reinforcement Learning: Video game AI, robotic control, dynamic pricing, personalized recommendations.

Aspect: Learning Approach
- Supervised Learning: Learns from examples provided during training.
- Unsupervised Learning: Learns patterns or features from data without specific guidance.
- Reinforcement Learning: Learns from the consequences of its actions rather than from direct instruction.

Aspect: Evaluation
- Supervised Learning: Typically evaluated on a separate test set using accuracy, precision, recall, etc.
- Unsupervised Learning: Evaluated using metrics like silhouette score, within-cluster sum of squares, etc.
- Reinforcement Learning: Evaluated by the amount of reward it can secure over time in the environment.

Aspect: Challenges
- Supervised Learning: Requires a large amount of labelled data, which can be expensive or impractical.
- Unsupervised Learning: Difficult to validate results as there is no true benchmark; interpretation is often subjective.
- Reinforcement Learning: Requires a balance between exploration and exploitation, and can be challenging in environments with sparse rewards.

Deep learning is a branch of machine learning which is based on

artificial neural networks. It is capable of learning complex patterns and

relationships within data. In deep learning, we don’t need to explicitly

program everything. It has become increasingly popular in recent years

due to the advances in processing power and the availability of large

datasets.

Deep Learning is based on Neural Networks.

Neural Networks

The term "Artificial Neural Network" is derived from Biological neural

networks that develop the structure of a human brain. Similar to the

human brain that has neurons interconnected to one another, artificial

neural networks also have neurons that are interconnected to one another

in various layers of the networks. These neurons are known as nodes.


The given figure illustrates the typical diagram of Biological Neural

Network.

The typical Artificial Neural Network looks something like the given

figure.

Dendrites from Biological Neural Network represent inputs in Artificial

Neural Networks, cell nucleus represents Nodes, synapse represents

Weights, and Axon represents Output.

Relationship between Biological neural network and Artificial neural

network:

 An Artificial Neural Network is a construct in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave like interconnected brain cells.

 The human brain contains on the order of 86 billion neurons. Each neuron has association points (connections) somewhere in the range of 1,000 to 100,000.

 In the human brain, data is stored in a distributed manner, and when necessary we can retrieve more than one piece of it from memory in parallel. We can say that the human brain works like an incredibly powerful parallel processor.

 We can understand the artificial neural network with an example,

consider an example of a digital logic gate that takes an input and

gives an output. "OR" gate, which takes two inputs. If one or both the

inputs are "On," then we get "On" in output. If both the inputs are

"Off," then we get "Off" in output. Here the output depends upon

input.

 Our brain does not perform the task in the same fixed way. The relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."
Input Layer:

As the name suggests, it accepts inputs in several different formats

provided by the programmer.

Hidden Layer:

The hidden layer is present between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer,

which finally results in output that is conveyed using this layer.

The artificial neural network takes input and computes the weighted sum

of the inputs and includes a bias. This computation is represented in the

form

of a transfer function.
This weighted total is then passed as input to an activation function to produce the output. The activation function decides whether a node should fire or not; only the nodes that fire pass their signal on to the output layer. Different activation functions are available and are chosen according to the type of task we are performing.
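A minimal sketch of this weighted sum, bias, and activation computation for a single node; the input values, weights, bias, and the choice of a sigmoid activation are arbitrary illustrations, not part of the original text.

import numpy as np

def sigmoid(z):
    # activation function: squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

inputs  = np.array([0.5, 0.3, 0.2])   # values coming from the input layer
weights = np.array([0.4, 0.7, -0.2])  # strength of each connection
bias    = 0.1

weighted_sum = np.dot(inputs, weights) + bias   # the transfer function
output = sigmoid(weighted_sum)                  # the node "fires" according to the activation
print(output)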

Advantages of Artificial Neural Network (ANN)

 Parallel processing capability:

Artificial neural networks have a numerical value that can perform more

than one task simultaneously.

 Storing data on the entire network:

Data that is used in traditional programming is stored on the whole

network, not on a database. The disappearance of a couple of pieces of

data in one place doesn't prevent the network from working.

 Capability to work with incomplete knowledge:

After ANN training, the information may produce output even with

inadequate data. The loss of performance here relies upon the

significance of missing data.

 Having a memory distribution:

For an ANN to be able to adapt, it is important to select representative examples and to train the network toward the desired output by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances; if the problem cannot be shown to the network in all its aspects, the network can produce false output.

 Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Learning Algorithms:

Support Vector Machines(SVM)


 Support Vector Machine or SVM is one of the most popular
Supervised Learning algorithms, which is used for Classification as
well as Regression problems.
 However, primarily, it is used for Classification problems in Machine
Learning.
 The goal of the SVM algorithm is to create the best fit line or
decision boundary that can segregate n-dimensional space into classes
so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
 SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Consider the below diagram in which there are two different
categories that are classified using a decision boundary or
hyperplane:
o Kernel: It is a function used to map a lower-dimensional data
into higher dimensional data.
o Hyperplane: In general SVM, it is a separation line
between two classes, but in SVR, it is a line which helps to
predict the continuous variables and cover most of the data-
points.
o Boundary line: Boundary lines are the two lines apart from
hyperplane, which creates a margin for data-points.
o Support vectors: Support vectors are the data points which are nearest to the hyperplane, from either class.
In SVR, we always try to determine a hyperplane with a maximum
margin, so that maximum number of data-points are covered in that
margin.
 The main goal of SVR is to consider the maximum data-points
within the boundary lines and the hyperplane (best-fit line) must
contain a maximum number of data-points.
Consider the below image:
Here, the blue line is called hyperplane, and the other two lines are
known as boundary lines.
For example: Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.
 We will first train our model with lots of images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this strange creature.
 The SVM creates a decision boundary between the two classes (cat and dog) using the extreme cases (support vectors), and on the basis of these support vectors it classifies the new creature as a cat.
Consider the below diagram:
SVM algorithm can be used for Face detection, image classification,
text categorization, etc.

The steps for the Support Vector Machine (SVM) algorithm are:
 Select a kernel function: Choose a kernel function for the problem

 Define parameters and constraints: Define the parameters and


constraints for the problem

 Solve the optimization problem: Find the optimal hyperplane by solving


the optimization problem

 Make predictions: Use the learned model to make predictions

 Evaluate the model's performance: Evaluate how well the model


performed
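As an illustration of these SVM steps, here is a small sketch assuming scikit-learn is available; the two-class toy points and the chosen kernel/parameters are invented for illustration.

from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]   # training points
y = [0, 0, 0, 1, 1, 1]                                 # class labels

# Select a kernel and its parameters, then solve the optimization problem via fit()
model = SVC(kernel="rbf", C=1.0)
model.fit(X, y)

print(model.support_vectors_)      # the support vectors found by the optimizer
print(model.predict([[4, 4]]))     # make a prediction for a new point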
Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data,


which means if a dataset can be classified into two classes by using
a single straight line, then such data is termed as linearly separable
data, and classifier is used called as Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly
separated data, which means if a dataset cannot be classified by
using a straight line, then such data is termed as non-linear data and
classifier used is called as Non-linear SVM classifier.

Advantages of Support Vector Machine (SVM)

 High-Dimensional Performance: SVM excels in high dimensional


spaces, making it suitable for image classification and gene
expression analysis.
 Non-linear Capability: Utilizing kernel functions such as the RBF (Radial Basis Function, also called Gaussian) kernel and polynomial kernels, SVM effectively handles non-linear relationships.
 Outlier Resilience: The soft margin feature allows SVM to ignore
outliers, enhancing robustness in spam detection and anomaly
detection.
 Binary and Multi class Support: SVM is effective for both binary
classification and multi-class classification, suitable for
applications in text classification.
 Memory Efficiency: SVM focuses on support vectors, making it
memory efficient compared to other algorithms.

Disadvantages of Support Vector Machine (SVM)

 Slow Training: SVM can be slow for large datasets, affecting


performance in SVM in data mining tasks.
 Parameter Tuning Difficulty: Selecting the right kernel and
adjusting parameters like C requires careful tuning, impacting SVM
algorithms.
 Noise Sensitivity: SVM struggles with noisy datasets and
overlapping classes, limiting effectiveness in real-world scenarios.
 Limited Interpretability: The complexity of the hyperplane in
higher dimensions makes SVM less interpretable than other models.
 Feature Scaling Sensitivity: Proper feature scaling is essential;
otherwise, SVM models may perform poorly.

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that

belongs to the supervised learning technique. It can be used for


both Classification and Regression problems in ML. It is based

on the concept of ensemble learning, which is a process

of combining multiple classifiers to solve a complex problem

and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that

contains a number of decision trees on various subsets of the

given dataset and takes the average to improve the

predictive accuracy of that dataset." Instead of relying on one

decision tree, the random forest takes the prediction from each

tree and based on the majority votes of predictions, and it

predicts the final output.

The greater number of trees in the forest leads to higher

accuracy and prevents the problem of overfitting.

The below diagram explains the working of the Random Forest

algorithm:
Assumptions for Random Forest

Since the random forest combines multiple trees to predict the

class of the dataset, it is possible that some decision trees may

predict the correct output, while others may not. But together,

all the trees predict the correct output. Therefore, below are two

assumptions for a better Random forest classifier:

o There should be some actual values in the feature variable

of the dataset so that the classifier can predict accurate

results rather than a guessed result.

o The predictions from each tree must have very low

correlations.

Features of random Forest Algorithm


 It takes less training time as compared to other algorithms.

 It predicts output with high accuracy, even for the large

dataset it runs efficiently.

 It can also maintain accuracy when a large proportion of data

is missing.

Working of Algorithm

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using each tree created in the first phase.

The Working process can be explained in the below steps and

diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected

data points (Subsets).

Step-3: Choose the number N for decision trees that you want to

build.

Step-4: Repeat Step 1 & 2.


Step-5: For new data points, find the predictions of each

decision tree, and assign the new data points to the category that

wins the majority votes.

The working of the algorithm can be better understood by the

below example:

Example: Suppose there is a dataset that contains multiple fruit

images. So, this dataset is given to the Random forest classifier.

The dataset is divided into subsets and given to each decision

tree. During the training phase, each decision tree produces a

prediction result, and when a new data point occurs, then based

on the majority of results, the Random Forest classifier predicts

the final decision. Consider the below image:
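A small sketch of this workflow, assuming scikit-learn is available; the Iris dataset is used here only as a stand-in for the fruit-image example.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number N of decision trees built on random subsets of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each tree votes; the majority vote becomes the final prediction
print(forest.predict(X_test[:5]))
print(forest.score(X_test, y_test))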


Applications of Random Forest

 Banking: Banking sector mostly uses this algorithm for the

identification of loan risk.

 Medicine: With the help of this algorithm, disease trends

and risks of the disease can be identified.

 Land Use: We can identify the areas of similar land.

Marketing: Marketing trends can be identified using this

algorithm.

Advantages of Random Forest


o Random Forest is capable of performing both

Classification and Regression tasks.

o It is capable of handling large datasets with high

dimensionality.

o It enhances the accuracy of the model and prevents the

overfitting issue.

Disadvantages of Random Forest

o Although random forest can be used for both classification and regression tasks, it is less suitable for regression tasks.

Linear Regression:

 Linear regression is a statistical method which is used for

predictive analysis.

 It is one of the very simple and easy algorithms which works

on regression and shows the relationship between the

continuous variables.

 It is used for solving the regression problem in machine

learning.
 Linear regression shows the linear relationship between the

independent variable (X-axis) and the dependent variable (Y-

axis), hence called linear regression.

 If there is only one input variable (x), then such linear

regression is called simple linear regression.

 And if there is more than one input variable, then such linear

regression is called multiple linear regression.

The relationship between variables in the linear regression

model can be explained using the below image.

Here we are predicting the salary of an employee on the basis of

the year of experience.

Below is the mathematical equation for Linear regression:


Y= aX+b

Here, Y = dependent variables (target variables),

X= Independent variables (predictor variables),

a and b are the linear coefficients

Since linear regression shows the linear relationship, which

means it finds how the value of the dependent variable is

changing according to the value of the independent variable.

The linear regression model provides a sloped straight line

representing the relationship between the variables.

Consider the below image:

Types of Linear Regression

Linear regression can be further divided into two types of the

algorithm:

Simple Linear Regression:


If a single independent variable is used to predict the value of a

numerical dependent variable, then such a Linear Regression

algorithm is called Simple Linear Regression.

Multiple Linear regression:

If more than one independent variable is used to predict the

value of a numerical dependent variable, then such a Linear

Regression algorithm is called Multiple Linear Regression.

Linear Regression Line:

A linear line showing the relationship between the dependent

and independent variables is called a regression line.

A regression line can show two types of relationship:

Positive Linear Relationship:

If the dependent variable increases on the Y-axis and

independent variable increases on X-axis, then such a

relationship is termed as a Positive linear relationship.


Negative Linear Relationship:

If the dependent variable decreases on the Y-axis and

independent variable increases on the X-axis, then

such a relationship is called a negative linear relationship.

Finding the best fit line:

When working with linear regression, our main goal is to find

the best fit line that means the error between predicted
values and actual values should be minimized. The best fit

line will have the least error.

 The different values for weights or the coefficient of lines

(a0, a1) gives a different line of regression, so we need to

calculate the best values for a0 and a1 to find the best fit line,

so to calculate this we use cost function.

Cost function

 The different values for weights or coefficient of lines (a0,

a1) gives the different line of regression, and the cost

function is used to estimate the values of the coefficient for

the best fit line.

 Cost function optimizes the regression coefficients or

weights. It measures how a linear regression model is

performing.

 We can use the cost function to find the accuracy of the

mapping function, which maps the input variable to the

output variable. This mapping function is also known as

Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) * sum over i of ( Yi - (a1*xi + a0) )^2

Where,

N = total number of observations,

Yi = actual value for observation i,

(a1*xi + a0) = predicted value for observation i.
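To make the cost-function idea concrete, here is a small sketch that fits Y = a1*X + a0 by least squares and computes the MSE; numpy is assumed to be available, and the experience/salary numbers are invented for illustration.

import numpy as np

X = np.array([1, 2, 3, 4, 5])            # years of experience (illustrative values)
Y = np.array([30, 35, 42, 48, 55])       # salary in thousands (illustrative values)

a1, a0 = np.polyfit(X, Y, 1)             # least-squares estimates of the coefficients
Y_pred = a1 * X + a0                     # predicted values from the fitted line

mse = np.mean((Y - Y_pred) ** 2)         # MSE = (1/N) * sum of squared errors
print(a1, a0, mse)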

Some popular applications of linear regression are:

 Analyzing trends and sales estimates

 Salary forecasting

 Real estate prediction

K-Nearest Neighbor (KNN) Algorithm

o K-Nearest Neighbour is one of the simplest Machine

Learning algorithms based on Supervised Learning

technique.
o K-NN algorithm assumes the similarity between the new

case/data and available cases and put the new case into the

category that is most similar to the available categories.

o K-NN algorithm stores all the available data and classifies

a new data point based on the similarity. This means when

new data appears then it can be easily classified into a well

suite category by using K- NN algorithm.

o K-NN algorithm can be used for Regression as well as for

Classification but mostly it is used for the Classification

problems.

o K-NN is a non-parametric algorithm, which means it

does not make any assumption on underlying data.

o It is also called a lazy learner algorithm because it does

not learn from the training set immediately instead it stores

the dataset and at the time of classification, it performs an

action on the dataset.

o KNN algorithm at the training phase just stores the dataset

and when it gets new data, then it classifies that data into a

category that is much similar to the new data.


o Example: Suppose, we have an image of a creature that

looks similar to cat and dog, but we want to know either it

is a cat or dog. So for this identification, we can use the

KNN algorithm, as it works on a similarity measure. Our

KNN model will find the similar features of the new data

set to the cats and dogs images and based on the most

similar features it will put it in either cat or dog category.

Why do we need a K-NN Algorithm?

Suppose there are two categories, i.e., Category A and Category

B, and we have a new data point x1, so this data point will lie in

which of these categories. To solve this type of problem, we

need a K-NN algorithm. With the help of K-NN, we can easily


identify the category or class of a particular dataset. Consider

the below diagram:

How does K-NN work?

The K-NN working can be explained on the basis of the

below algorithm:

o Step-1: Select the number K of the neighbors.

o Step-2: Calculate the Euclidean distance of K number of

neighbors.

o Step-3: Take the K nearest neighbors as per the calculated

Euclidean distance.

o Step-4: Among these k neighbors, count the number of the

data points in each category.


o Step-5: Assign the new data points to that category for

which the number of the neighbor is maximum.

o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the

required category. Consider the below image:

o Firstly, we will choose the number of neighbors, so we

will choose the k=5.

o Next, we will calculate the Euclidean distance between

the data points. The Euclidean distance is the distance

between two points, which we have already studied in

geometry. It can be calculated as:


o By calculating the Euclidean distance we got the nearest

neighbors, as three nearest neighbors in category A and

two nearest neighbors in category B. Consider the below

image:

o As we can see the 3 nearest neighbors are from category

A, hence this new data point must belong to category A.
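The same procedure can be sketched in a few lines, assuming scikit-learn is available; the category-A and category-B points below are invented for illustration.

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 1], [2, 3], [6, 5], [7, 7]]     # known data points
y = ["A", "A", "A", "B", "B"]                    # their categories

knn = KNeighborsClassifier(n_neighbors=3)        # Step 1: choose K
knn.fit(X, y)                                    # K-NN just stores the data (lazy learner)

# Steps 2-5: distances, nearest neighbours and the majority vote happen inside predict()
print(knn.predict([[3, 3]]))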

How to select the value of K in the K-NN Algorithm?


Below are some points to remember while selecting the value of

K in the K-NN algorithm:

o There is no particular way to determine the best value for

"K", so we need to try some values to find the best out of

them. The most preferred value for K is 5.

o A very low value of K, such as K=1 or K=2, can be noisy and can make the model sensitive to outliers.

o Large values of K are generally good, but they may cause some difficulties.

Advantages of KNN Algorithm:

o It is simple to implement.

o It is robust to the noisy training data

o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o We always need to determine the value of K, which can sometimes be complex.


o The computation cost is high because of calculating the

distance between the data points for all the training

samples.

Decision Trees
 Decision Tree is a Supervised learning technique that can be used

for both classification and Regression problems, but mostly it is

preferred for solving Classification problems.

 It is a tree-structured classifier, where internal nodes represent the

features of a dataset, branches represent the decision

rules and each leaf node represents the outcome.

 In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node.

 Decision nodes are used to make any decision and have multiple

branches, whereas Leaf nodes are the output of those decisions and do

not contain any further branches.

 The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).

 The decisions or the test are performed on the basis of features of the

given dataset.
 It is called a decision tree because, similar to a tree, it starts with the

root node, which expands on further branches and constructs a tree-

like structure.

A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.

Need for Decision Trees


Below are the two reasons for using the Decision tree:

 Decision Trees usually mimic human thinking ability while making a

decision, so it is easy to understand.

 The logic behind the decision tree can be easily understood because it

shows a tree-like structure.

Decision Tree Terminologies

Root Node: Root node is from where the decision tree starts. It represents

the entire dataset, which further gets divided into two or more

homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be

segregated further after getting a leaf node.

Splitting: Splitting is the process of dividing the decision node/root node

into sub-nodes according to the given conditions.

Branch/Sub Tree: A tree formed by splitting the tree.

Pruning: Pruning is the process of removing the unwanted branches from

the tree.

Parent/Child node: The root node of the tree is called the parent node,

and other nodes are called the child nodes.

Working of Decision Tree:

 In a decision tree, for predicting the class of the given dataset, the

algorithm starts from the root node of the tree.

 This algorithm compares the values of root attribute with the record

(real dataset) attribute and, based on the comparison, follows the

branch and jumps to the next node.

 For the next node, the algorithm again compares the attribute value

with the other sub-nodes and move further.

 It continues the process until it reaches the leaf node of the tree.
Example: Suppose there is a candidate who has a job offer and wants

to decide whether he should accept the offer or Not.
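A small sketch of such a decision tree, assuming scikit-learn is available; the salary/distance features and the accept/decline labels for the job-offer example are invented for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_in_lakhs, distance_from_home_km]
X = [[20, 5], [35, 5], [35, 30], [50, 30], [50, 60]]
y = ["Decline", "Accept", "Decline", "Accept", "Decline"]

tree = DecisionTreeClassifier()
tree.fit(X, y)

# Print the learned decision rules (root node, splits, and leaf nodes)
print(export_text(tree, feature_names=["salary", "distance"]))
print(tree.predict([[40, 10]]))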

K-Means Clustering Algorithm:

Clustering or cluster analysis is a machine learning technique, which


groups the unlabelled dataset. It can be defined as "A way of grouping
the data points into different clusters, consisting of similar data
points. The objects with the possible similarities remain in a group that
has less or no similarities with another group."

It does it by finding some similar patterns in the unlabelled dataset


such as shape, size, colour, behaviour, etc., and divides them as per
the presence and absence of those similar patterns.
The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhere similar to the classification


algorithm,
but the difference is the type of dataset that we are using. In

classification, we work with the labelled data set, whereas in

clustering, we work with the unlabelled dataset.

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (centroid). It works in n-dimensional spaces.

Steps involved in K-Means Clustering Algorithm:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select random K points or centroids. (It can be other from


the input dataset).

Step-3: Assign each data point to their closest centroid, which will
form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Example:
Let's take number k of clusters, i.e., K=2, to identify the dataset and
to put them into different clusters. It means here we will try to group
these datasets into two different clusters.

We need to choose some random k points or centroid to form the cluster.


These points can be either the points from the dataset or any other point.

Now we will assign each data point of the scatter plot to its closest K-
point or centroid. We will compute it by applying some mathematics
that we have studied to calculate the distance between two points. So,
we will draw a median between both the centroids.
From the above image, it is clear that the points on the left side of the line are near the K1 or blue centroid, and the points to the right of the line are close to the yellow centroid. Let's colour them blue and yellow for clear visualization.

As we need to find the closest cluster, so we will repeat the process by


choosing a new centroid.

Next, we will reassign each datapoint to the new centroid.


For this, we will repeat the same process of finding a
median line.
From the above image, we can see, one yellow point is on the left side
of the line, and two blue points are right to the line. So, these three
points will be assigned to new centroids.

As reassignment has taken place, so we will again go to the step-4,


which is finding new centroids or K-points.

As we got the new centroids so again will draw the median line and
reassign the data points. So, the image will be:
We can see in the above image; there are no dissimilar data points
on either side of the line, which means our model is formed.

As our model is ready, so we can now remove the assumed centroids,


and the two final clusters will be as shown in the below image:
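The same K-means steps can be sketched with scikit-learn (assumed available); the 2-D points below are invented for illustration.

from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 9], [8, 10]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)   # Step 1: choose K
kmeans.fit(X)                                              # Steps 2-6: assign, recompute, repeat

print(kmeans.cluster_centers_)   # the final centroids
print(kmeans.labels_)            # the cluster assigned to each data point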

Q Learning Algorithm

 Q-learning is a fundamental algorithm in the field of reinforcement

learning (RL), a type of machine learning that focuses on training agents

to make sequential decisions through trial and error.

 In RL, the agent interacts with its environment, learning to achieve a

goal by maximizing cumulative rewards over time.


 Q-learning plays a crucial role within RL by offering a model-free

solution, meaning it does not require a predefined model of the

environment. Instead, the agent learns directly from experience, making it

well-suited for tasks where the environment’s behaviour is complex or

unknown.

 Q-learning is a model-free reinforcement learning algorithm that

enables agents to learn the optimal actions in a given environment

through trial and error.

Key Components of Q-learning

1. Temporal Difference (TD) Update

Q-learning uses temporal difference (TD) learning to update its

knowledge. The agent learns from both the immediate reward and the

expected future reward. This is done by comparing the predicted Q-value

with the actual reward, known as the TD error.

2. Q-Values (Action-Values)

Each state-action pair is assigned a Q-value, representing the estimated

future reward. As the agent interacts with the environment, these Q-

values are updated to reflect better estimates, guiding the agent towards

actions that yield the highest cumulative reward.


3. ϵ-Greedy Policy

The ϵ-greedy policy balances exploration and exploitation.

 Exploration: Trying new actions to discover better rewards.

 Exploitation: Choosing the action with the highest Q-value to

maximize known rewards.

With this policy, the agent selects a random action with a small

probability (ϵ) and the best-known action with the remaining

probability (1-ϵ). This helps the agent explore new strategies

without getting stuck in local optima.

4. Rewards and Episodes

 Rewards: Feedback the agent receives after each action. Positive rewards encourage the action, while negative rewards discourage it.

 Episodes: An episode is a complete cycle from the initial state to

the goal state. Q-learning aims to learn optimal behaviour over

several episodes.
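A minimal sketch of the Q-value (TD) update and the ϵ-greedy choice described above; the number of states and actions and the parameter values are arbitrary illustrations.

import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # Q-table of action-values

alpha, gamma, epsilon = 0.1, 0.9, 0.1              # learning rate, discount, exploration rate

def choose_action(state):
    # epsilon-greedy policy: explore with probability epsilon, otherwise exploit
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # temporal-difference update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    td_target = reward + gamma * max(Q[next_state])
    td_error = td_target - Q[state][action]
    Q[state][action] += alpha * td_error

# one illustrative update: in state 0, action 1 gave reward 1.0 and led to state 2
update(0, 1, 1.0, 2)
print(Q[0])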
Building a Machine Learning Model

The following steps are involved in building a machine learning model.

1. Data Collection

Machine learning requires training data, a lot of it. This data can either be
labelled meaning Supervised Learning or not labelled meaning
Unsupervised Learning.

Accuracy of the model depends on the quality and quantity of the


data. The outcome of this step is generally a representation of data which
will be used for training.

Datasets can be pre-collected from sites like Kaggle, UCI, etc., and form the basis of your machine learning project. Data can also be collected through user surveys, analysis reports, trends, usage metrics, etc.

2. Data Preparation

We cannot work on raw data. Data needs to be processed by


normalization, removing duplicates, errors and biases.

Visualising data can be helpful in searching for patterns and


outliers to check if the data collected is right or if it contains missing
values. This can be done using libraries like seaborn, matplotlib, etc.
Visualize data to help detect relevant relationships between variables or
class imbalances, or perform other exploratory analysis.

After performing data wrangling, we need to prepare the data for


training. Cleaning of data is done that involves steps like removing
duplicates, dealing with missing values, type conversions, correcting
errors, normalizing the data, etc.

Some datasets may not require data preparation at all while for
some data preparation step takes majority of their ML model build
time.

Data can also be randomized, which erases the effects of the particular
order in which we collected and/or otherwise prepared our data.
Later data can be split into training, testing and evaluation sets

3. Choose a Model / Algorithm

The third step consists of selecting the right model. There are many
models which can be used for many different purposes. Once the model is
selected, it needs to meet the specific goal.

Having a complex model does not mean a better model.

Common machine learning algorithms include Decision Trees, Random


Forest, Linear Regression, Support Vector Machines (SVM), Logistic
Regression, K-means, Principal Component Analysis (PCA), Naïve
Bayes, and Neural Networks. Different algorithms need to be applied to
different tasks, and we need to choose the correct one for our application.

4. Training the Model

Training a model forms the basis of machine learning. The goal is to use
our training data and improve the predictions of our model.

Every cycle in training a model involves updating


the weights and biases in each training step. We can use labelled sample
data in case supervised machine learning and unlabelled sample data for
unsupervised learning.

The goal of training is to evaluate and further improve our model


accuracy and performance. Training happens in the form of iterations
which is called a training step.

5. Evaluate the Model


After training the model comes evaluating the model. The larger the
number of variables in the real world, the bigger the training and test data
should be.

Performance metrics are used to measure the performance of the model.


These include precision, recall, accuracy, specificity, etc.

The model is then tested against previously unseen data. The unseen data
is meant to act as representative of model performance in the real world,
but still helps tune the model.

A 70/30 split, or similar, is considered a good train/eval split, which


depends on things like data availability, dataset features, domain, etc.

6. Parameter Tuning

After evaluating the model, the original model parameters need to be revisited and tested. Increasing the amount of training can lead to better results.

Parameter tuning is an experimental process, and hence we need to define when to stop; otherwise we will keep tweaking the model indefinitely.

Hyperparameter tuning is an art and one that requires patience &


experience. Once the model parameters are tuned it can give us better
results. Some common hyperparameters include: number of training
steps, learning rate, initialization values and distribution, etc.
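As one common way to carry out such tuning, here is a sketch of a grid search over hyperparameters, assuming scikit-learn is available; the model and parameter grid are only examples.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)   # try every combination with cross-validation
search.fit(X, y)

print(search.best_params_)   # the tuned hyperparameters
print(search.best_score_)    # cross-validated score of the best model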

7. Make Predictions

After the process of collecting data, preparing the data, selecting a


machine learning algorithm, training the model and evaluating the model
& tuning the parameters, we need to make predictions.

Our machine learning model can make predictions ranging from image
recognition to predictive analytics to natural language processing.

After building the model needs to be tested on a testing set to check how
the model performs on unseen data. It helps to further evaluate the model
and provides better approximation.

Maximum Likelihood Estimation


Maximum likelihood estimation (MLE) is a statistical approach that
determines the models’ parameters in machine learning. The idea is to
find the values of the model parameters that maximize the likelihood of
observed data such that the observed data is most probable.

The method is based on the likelihood function, which measures how


likely the observed data is for different values of the parameters.

MLE is widely used in many fields of research, including biology,


economics, finance, engineering, and social sciences. It provides a
powerful and flexible tool for modeling and analyzing complex data sets.
It is one of the primary and core concepts essential for learning other
advanced machine learning and deep learning techniques and algorithms.

Likelihood describes how to find the best distribution of the data for
some feature or some situation in the data given a certain value of some
feature or situation, while probability describes how to find the chance
of something given a sample distribution of data.

Consider a dataset containing the weight of the customers. Let’s say the
mean of the data is 70 & the standard deviation is 2.5.

When Probability has to be calculated for any situation using this dataset,
then the mean and standard deviation of the dataset will be constant. Let’s
say the probability of weight > 70 kg has to be calculated for a random
record in the dataset, then the equation will contain weight, mean and
standard deviation.

Considering the same dataset, now if we need to calculate the probability of weight > 100 kg, then only the weight part of the equation would change and the rest would remain unchanged.
But in the case of Likelihood, the equation of the conditional probability
flips as compared to the equation in the probability calculation i.e mean
and standard deviation of the dataset will be varied to get the maximum
likelihood for weight > 70 kg.

Working of Maximum Likelihood Estimation


The maximization of the likelihood estimation is the main objective of
the MLE. Let’s understand this with an example. Consider there is a
binary classification problem in which we need to classify the data into
two categories either 0 or 1 based on a feature called “salary”.

So MLE calculates the possibility for each data point of the salary feature and then, using those possibilities, it calculates the likelihood of those data points being classified as either 0 or 1. It repeats this process until the fitted line is the best fit, i.e., until the likelihood is maximized. This process is known as maximization of the likelihood.

The above explains the scenario: as we can see, there is a threshold of 0.5, so if the possibility comes out to be greater than that, the data point is labelled as 1, otherwise 0.
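As a concrete, simplified illustration of MLE, assuming a Gaussian model for the customer-weight example above and numpy available: the maximum likelihood estimates of the mean and standard deviation can be computed directly. The weight values below are invented.

import numpy as np

weights = np.array([68.0, 71.5, 69.2, 72.8, 70.1, 67.9])

mu_mle = weights.mean()                                   # MLE of the Gaussian mean
sigma_mle = np.sqrt(((weights - mu_mle) ** 2).mean())     # MLE of the standard deviation

# log-likelihood of the observed data under the fitted Gaussian
log_likelihood = np.sum(-0.5 * np.log(2 * np.pi * sigma_mle**2)
                        - (weights - mu_mle) ** 2 / (2 * sigma_mle**2))
print(mu_mle, sigma_mle, log_likelihood)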
 Artificial Neural Networks contain artificial neurons which are called
units. These units are arranged in a series of layers that together
constitute the whole Artificial Neural Network in a system.
 A layer can have only a dozen units or millions of units; this depends on how complex the neural network must be to learn the hidden patterns in the dataset.
 Commonly, Artificial Neural Network has an input layer, an output
layer as well as hidden layers.
 The input layer receives data from the outside world which the neural
network needs to analyze or learn about. Then this data passes
through one or multiple hidden layers that transform the input into
data that is valuable for the output layer. Finally, the output layer
provides an output in the form of a response of the Artificial Neural
Networks to input data provided.
 In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another unit. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.
The input layer of an artificial neural network is the first layer, and it
receives input from external sources and releases it to the hidden layer,
which is the second layer. In the hidden layer, each neuron receives input
from the previous layer neurons, computes the weighted sum, and sends it
to the neurons in the next layer.
These connections are weighted, meaning that the effect of each input from the previous layer is scaled up or down by its assigned weight, and these weights are adjusted during the training process to improve model performance.
Non-linear classification example using Neural Networks: XOR

Among the various logical gates, XOR, also known as the "exclusive or" operation, produces an output of 1 when its two binary inputs differ and an output of 0 when both inputs are the same. The outputs generated by XOR logic are not linearly separable in the hyperplane. From the truth table below, it can be inferred that XOR produces an output for different states of inputs, and for the same inputs it does not produce an output. The output of XOR logic is given by the equation shown below.

The linear separability of points

Linear separability of points is the ability to classify the data points in the hyperplane without the classes overlapping in the plane. Each of the classes should fall above or below the separating line; such data points are termed linearly separable. For logical gate operations like AND or OR, the outputs generated by the logic are linearly separable in the hyperplane. Linearly separable data points appear as shown below.

A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons together so that the outputs of some neurons are the inputs of other neurons.

A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and it can have any number of hidden layers, where each hidden layer can have any number of nodes. A schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.

In the multi-layer perceptron diagram above, we can see that there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, therefore there are two output nodes. The nodes in the input layer take the input and forward it for further processing; in the diagram above, the nodes in the input layer forward their output to each of the three nodes in the hidden layer, and in the same way the hidden layer processes the information and passes it to the output layer.

Every node in the multi-layer perceptron uses an activation function. The activation function transforms the given input into the corresponding output.

The XOR problem can be solved with neural networks by using a Multi-Layer Perceptron, i.e., a neural network architecture with an input layer, a hidden layer, and an output layer. During training, as the network propagates inputs forward and the weights of the corresponding layers get updated, it learns to execute the XOR logic (a sketch follows below).
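A minimal sketch of an MLP that solves XOR. The hidden-layer and output-layer weights below are picked by hand purely to illustrate the idea (one hidden neuron acting like OR, one like AND); in practice they would be learned by backpropagation.

import numpy as np

def step(z):
    # threshold activation: fire (1) if the weighted sum is positive
    return (z > 0).astype(int)

def xor_mlp(x1, x2):
    x = np.array([x1, x2])
    # hidden layer: neuron 1 acts like OR, neuron 2 acts like AND
    W_hidden = np.array([[1.0, 1.0],
                         [1.0, 1.0]])
    b_hidden = np.array([-0.5, -1.5])
    h = step(W_hidden @ x + b_hidden)
    # output layer: OR AND (NOT AND) gives XOR
    w_out = np.array([1.0, -2.0])
    b_out = -0.5
    return int((w_out @ h + b_out) > 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))   # prints the XOR truth table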

Back Propagation Algorithm

A backpropagation algorithm, or backward propagation of errors, is


an algorithm that's used to help train neural network models. The
algorithm adjusts the network's weights to minimize any gaps referred to
as errors -- between predicted outputs and the actual target output.
Weights are adjustable parameters that determine the strength of the connections between artificial neurons, also referred to as nodes, in different layers of a neural network. Specifically, weights determine how much influence the output of one neuron has on the input of the next neuron, which can directly influence the network's output and performance.
Backpropagation is designed to test for errors working back from output nodes to input nodes. It is an important mathematical tool for improving the accuracy of predictions in ML and DL processes. Essentially, backpropagation is an algorithm used to quickly calculate derivatives in a neural network, i.e., how much the output changes when the weights are tuned and adjusted.
Backpropagation is a fundamental aspect of training deep neural networks, as it enables these networks to learn complex patterns in data by fine-tuning the weights, improving their performance.
Neural networks are composed of multiple layers of interconnected neurons. These are organized into three main layers: the input layer, the hidden layer and the output layer.
 The input layer receives the raw data features. Each neuron in this
layer corresponds to a specific feature in the input data.
 The hidden layer, of which there can be more than one, processes the
data it receives. Hidden layer neurons apply weights, biases and
activation functions.
 The output layer produces the final output predictions. Neurons in this
layer represent different possible outputs of the model.
Artificial neural networks (ANNs) and deep neural networks use backpropagation to compute gradients. This is done by passing any error information backward through the network, from the output layer through the hidden layer or layers to the input layer. The calculated gradients are then used in an optimization algorithm called gradient descent. Gradient descent minimizes the errors, or gaps, between the network's predicted outputs and the actual target outputs by adjusting the weights of the network.
Gradient descent helps the system minimize the gap, also known as the error, between desired outputs and actual system outputs. The algorithm tunes the system by adjusting the weight values for the various inputs to narrow this difference.
More specifically, gradient descent provides information on how a network's parameters, including weights and biases, need to be adjusted to reduce the error. A cost function, which is a mathematical function that measures this error, guides the process. The algorithm's goal is to determine how the parameters must be adjusted to reduce the cost function and improve overall accuracy.
In backpropagation, the error is propagated backward from the output through the hidden layers, enabling the network to calculate how each weight needs to be adjusted. The term backpropagation refers to this process of propagating errors backward, from output nodes to input nodes.
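As a sketch of the underlying calculation (standard textbook notation, not taken from this text): for a weight w_jk feeding neuron k, the chain rule breaks the derivative of the error E into factors that are available layer by layer, and gradient descent then applies the update

\[
\frac{\partial E}{\partial w_{jk}}
  = \frac{\partial E}{\partial a_k}\,
    \frac{\partial a_k}{\partial z_k}\,
    \frac{\partial z_k}{\partial w_{jk}},
\qquad
w_{jk} \leftarrow w_{jk} - \eta\,\frac{\partial E}{\partial w_{jk}},
\]

where z_k is the neuron's weighted input, a_k its activation, and η the learning rate. Evaluating these factors from the output layer back toward the input layer is precisely the backward propagation of errors described above.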
Activation functions introduce the non-linearity that allows neurons to learn complex patterns; using the propagated error information, the network then adjusts its weights and biases to reduce this error and improve its performance.
Stochastic Gradient Descent And Its variants
Gradient Descent is known as one of the most commonly used optimization algorithms to train machine learning models by minimizing errors between actual and expected results. Further, gradient descent is also used to train Neural Networks.
Optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize the cost function through iterative parameter updates. Once these machine learning models are optimized, they can be used as powerful tools for Artificial Intelligence and Deep Learning applications.
Gradient Descent is defined as one of the most commonly used iterative optimization algorithms of machine learning to train machine learning and deep learning models. It helps in finding the local minimum of a function.
The cost function is defined as the measurement of the difference, or error, between actual values and predicted values at the current position, expressed as a single real number. It helps to improve machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum. Gradient descent continuously iterates along the direction of the negative gradient until the cost function stops decreasing; at that minimum the model stops learning further. Although the cost function and the loss function are often treated as synonymous, there is a minor difference between them: the loss function refers to the error of one training example, while the cost function calculates the average error across the entire training set.
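This distinction can be written compactly (using squared error as an illustrative choice of loss): for one training example with target y_i and prediction ŷ_i, the loss is L_i = (ŷ_i − y_i)², while the cost averages the loss over all N training examples:

\[
J = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2 .
\]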
How does Gradient Descent work?
Before looking at the working principle of gradient descent, we should recall some basic concepts for finding the slope of a line in linear regression. The equation for simple linear regression is given as:
Y=mX+c
Where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
The starting point is just an arbitrary point used to evaluate the performance. At this starting point, we take the first derivative, or slope, and use a tangent line to measure the steepness of this slope. This slope then informs the updates to the parameters (weights and bias).

The slope is steeper at the starting (arbitrary) point; as new parameters are generated, the steepness gradually reduces until the lowest point, called the point of convergence, is approached.
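For the line Y = mX + c, a common choice of cost function (illustrative, not fixed by the text) is the mean squared error, and each gradient-descent iteration updates the slope and intercept by stepping against their partial derivatives:

\[
J(m, c) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + c)\bigr)^2,
\qquad
m \leftarrow m - \alpha\,\frac{\partial J}{\partial m},
\qquad
c \leftarrow c - \alpha\,\frac{\partial J}{\partial c},
\]

where α is the learning rate discussed below.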
The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual outputs. To minimize the cost function, two factors are required:

Direction & Learning Rate
These two factors determine the partial derivative calculations of future iterations and steer the search toward the point of convergence, i.e., a local or global minimum. Let's discuss the learning rate in brief.
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum. A low learning rate gives small step sizes, which compromises overall efficiency but provides more precision.
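The effect of the learning rate can be seen in a minimal sketch, here minimizing the toy function f(x) = x², whose gradient is 2x (the function, starting point, and rates are assumptions for illustration only):

```python
def gradient_descent(lr, steps=25, x=5.0):
    """Minimize f(x) = x**2 by stepping along the negative gradient 2*x."""
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(gradient_descent(lr=0.1))    # small steps: converges smoothly toward 0
print(gradient_descent(lr=0.95))   # large steps: oscillates around the minimum while converging
print(gradient_descent(lr=1.1))    # step too large: overshoots and diverges
```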
Types of Gradient Descent

Based on how much training data is used to compute the error for each update, the Gradient Descent learning algorithm can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. The different types of gradient descent are:
1. Batch Gradient Descent:
Batch gradient descent (BGD) is used to find the error for each point in
the training set and update the model after evaluating all training
examples. This procedure is known as the training epoch. In simple
words, it is a greedy approach where we have to sum over all examples
for each update.
Advantages of Batch gradient descent:

o It produces less noise in comparison to the other gradient descent variants.
o It produces stable gradient descent convergence.
o It is computationally efficient, as all resources are used over all training samples.
2. Stochastic Gradient Descent:

Stochastic gradient descent (SGD) is a type of gradient descent that uses one training example per iteration. In other words, it processes each example within the dataset in turn and updates the parameters one training example at a time. As it requires only one training example at a time, it is easier to store in the allocated memory. However, it loses some computational efficiency in comparison to batch gradient descent because of its frequent updates. Further, due to the frequent updates, the gradient is noisy. However, this noise can sometimes be helpful in escaping local minima and finding the global minimum.
Advantages of Stochastic gradient descent:

In Stochastic gradient descent (SGD), learning happens on every example, and it has a few advantages over the other gradient descent variants.

o It is easier to fit in the allocated memory.
o It is relatively fast to compute compared with batch gradient descent.
o It is more efficient for large datasets.
3. Mini-Batch Gradient Descent:

Mini-batch gradient descent is the combination of both batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and then performs an update on each of those batches separately. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, we can achieve a special type of gradient descent with higher computational efficiency and a less noisy gradient.
Advantages of Mini Batch gradient descent:

o It is easier to fit in the allocated memory.
o It is computationally efficient.
o It produces stable gradient descent convergence.
Curse of Dimensionality
Curse of Dimensionality arises when working with high-dimensional
data, leading to increased computational complexity, overfitting, and
spurious correlations.
Techniques like dimensionality reduction, feature selection, and careful model design are essential for mitigating its effects and improving algorithm performance.
 The curse of dimensionality refers to the phenomenon where the efficiency and effectiveness of algorithms deteriorate rapidly, often exponentially, as the dimensionality of the data increases.
 In high-dimensional spaces, data points become sparse, making it challenging to discern meaningful patterns or relationships due to the vast amount of data required to adequately sample the space.
 The curse of dimensionality leads to increased computational complexity, longer training times, and higher resource requirements. Moreover, it also increases the risk of overfitting and spurious correlations, hindering the algorithms' ability to generalize well to unseen data.
Hughes Phenomenon
The Hughes Phenomenon shows that as the number of features increases, the classifier's performance increases as well until we reach the optimal number of features. Adding more features while keeping the training set the same size will then degrade the classifier's performance.
To overcome the curse of dimensionality, the following strategies are considered.
1. Dimensionality Reduction Techniques:

 Feature Selection: Identify and select the most relevant features from the original dataset while discarding irrelevant or redundant ones. This reduces the dimensionality of the data, simplifying the model and improving its efficiency.
 Feature Extraction: Transform the original high-dimensional data into a lower-dimensional space by creating new features that capture the essential information (a brief sketch follows at the end of this section).
2. Data Preprocessing:
 Normalization: Scale the features to a similar range to prevent
certain features from dominating others, especially in distance-based
algorithms.
 Handling Missing Values: Address missing data appropriately
through imputation or deletion to ensure robustness in the model
training process.
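As a brief sketch of feature extraction combined with normalization, assuming scikit-learn is available (the random data, the choice of PCA as the extraction method, and the number of components are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical high-dimensional data: 200 samples, 50 features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))

# Normalization: scale every feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Feature extraction: project onto a lower-dimensional space (10 components)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                        # (200, 10)
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained
```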