
Machine Learning

Unit 3

Q1. State the Formula For Linear Regression Using the Least Squares Method.
5 Marks

Ans :-
Linear Regression :- The Least Square Method formula is used to find the best-fitting line through a set of data points. For simple linear regression, the line has the form y = mx + c, where y is the dependent variable, x is the independent variable, m is the slope of the line, and c is the y-intercept. The formulas to calculate the slope (m) and intercept (c) of the line are derived from the following equations:
1. Slope (m) Formula: m = (n(Σxy) − (Σx)(Σy)) / (n(Σx²) − (Σx)²)
2. Intercept (c) Formula: c = ((Σy) − m(Σx)) / n

Where:
n is the number of data points,
Σxy is the sum of the products of each pair of x and y values,
Σx is the sum of all x values,
Σy is the sum of all y values,
Σx² is the sum of the squares of the x values.
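A minimal Python sketch of these formulas; the data values below are illustrative, not from the question:

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope: m = (n*Σxy − Σx*Σy) / (n*Σx² − (Σx)²)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: c = (Σy − m*Σx) / n
c = (sum_y - m * sum_x) / n
print(f"y = {m:.3f}x + {c:.3f}")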

Least Square Method :- The Least Square method is a fundamental mathematical technique widely used in data analysis, statistics, and regression modeling to
identify the best-fitting curve or line for a given set of data points. This method
ensures that the overall error is reduced, providing a highly accurate model for
predicting future data trends.
In statistics, when the data can be represented on a cartesian plane by using
the independent and dependent variable as the x and y coordinates, it is called
scatter data. This data might not be useful in making interpretations or
predicting the values of the dependent variable for the independent variable.
So, we try to get an equation of a line that fits best to the given data points
with the help of the Least Square Method.

The Least Square Method is used to derive a generalized linear equation between two variables when the values of the dependent and independent variables are represented as the x and y coordinates in a 2D cartesian coordinate system.
Initially, known values are marked on a plot.
The plot obtained at this point is called a scatter plot.

Then, we try to represent all the marked points as a straight line or a linear
equation. The equation of such a line is obtained with the help of the Least
Square method. This is done to estimate the value of the dependent variable for an independent variable whose value was initially unknown, which lets us make predictions for the dependent variable.
Least Square Method Definition
Least Squares method is a statistical technique used to find the equation of
best-fitting curve or line to a set of data points by minimizing the sum of the
squared differences between the observed values and the values predicted by
the model.
This method aims at minimizing the sum of squares of deviations as much as
possible. The line obtained from such a method is called a regression line or
line of best fit.

Q2. What is Gradient Descent? Explain the Main Steps in the Algorithm. 5 Marks

Ans : -

What is Gradient Descent


Gradient Descent is an iterative optimization algorithm that tries to find the
optimum value (Minimum/Maximum) of an objective function. It is one of the
most used optimization techniques in machine learning projects for updating
the parameters of a model in order to minimize a cost function.
The main aim of gradient descent is to find the best parameters of a model
which gives the highest accuracy on training as well as testing datasets. In
gradient descent, the gradient is a vector that points in the direction of the steepest increase of the function at a specific point. Moving in the opposite direction of the gradient allows the algorithm to gradually descend towards lower values of the function, eventually reaching the minimum of the function.
We know that in any machine learning project our main aim is to maximize accuracy, i.e., to minimize how much the model's predictions differ from the actual data points. Based on the difference between model predictions and actual data points, we try to find the parameters of the model which give better accuracy on our dataset. In order to find these parameters we apply gradient descent to the cost function of the machine learning model.
Steps Required in Gradient Descent Algorithm
 Step 1: Initialize the parameters of the model randomly.
 Step 2: Compute the gradient of the cost function with respect to each parameter. This involves taking the partial derivative of the cost function with respect to each parameter.
 Step 3: Update the parameters of the model by taking steps in the opposite direction of the gradient. Here we choose a hyperparameter called the learning rate, denoted by alpha, which decides the step size of each update.
 Step 4: Repeat steps 2 and 3 iteratively until the parameters converge to good values for the defined model.
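A minimal sketch of these four steps in Python, minimizing the illustrative function f(w) = (w − 3)², whose minimum is at w = 3; the learning rate and iteration count are arbitrary choices:

import numpy as np

w = np.random.randn()          # Step 1: random initialization
alpha = 0.1                    # learning rate (hyperparameter)

for _ in range(100):           # Step 4: repeat
    grad = 2 * (w - 3)         # Step 2: gradient df/dw
    w = w - alpha * grad       # Step 3: move against the gradient
print(round(w, 4))             # approximately 3.0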

Q3. What are the Key Components of Linear Regression? Explain Types of
Linear Regression in brief. 10 Marks
Ans :-
Some key components of linear regression include:
 Linearity
A fundamental property of linear regression, linearity is used in forecasting,
neural networks, and other machine learning algorithms. Linearity refers
to the property of a system or model where the output is directly proportional
to the input, while nonlinearity implies that the relationship between input and output is more complex and cannot be expressed as a simple linear function.
The term "linearity" in machine learning describes a straight-line,
proportionate link between input characteristics and output. According to
linear models, variations in the input feature set cause corresponding
variations in the output.
 Normality
A key assumption of linear regression is that the residuals are normally distributed. This means that the errors are random noise and the model has captured all the signal in the data.

 Correlation coefficient
A statistical tool used to measure the strength of the relationship between two variables. The correlation coefficient produces a value between -1 and 1. Most of the data in the world is interrelated by various factors, and data science deals with understanding the relationships between different variables. This helps us learn the underlying patterns and connections that can give us valuable insights. Correlation analysis is an important tool for understanding the type of relation between variables.

 Coefficient of determination
A measure of how well a linear regression model fits the data. In machine learning, the coefficient of determination, often referred to as R², is a statistical measure used to assess how well the model's predictions approximate the actual data points. Specifically, it is a metric that indicates how well the independent variables (features) explain the variation in the dependent variable (target).

 Gradient descent
A common method for solving linear regression problems. Gradient descent
minimizes the model's error iteratively to optimize the value of coefficients.
Gradient descent is used in various machine learning applications, such as
linear regression, logistic regression, and neural networks, to optimize the
model's parameters and improve its accuracy. It is a fundamental algorithm in machine learning and is essential for training complex models with large amounts of data.
Explain Types of Linear Regression :-

1) Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.
This is the simplest form of linear regression, and it involves only one
independent variable and one dependent variable. The equation for
simple linear regression is:
y = β0 + β1X
where:
 Y is the dependent variable
 X is the independent variable
 β0 is the intercept
 β1 is the slope

2) Multiple Linear Regression :-


If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.
This involves more than one independent variable and one dependent
variable. The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
 Y is the dependent variable
 X1, X2, …, Xn are the independent variables
 β0 is the intercept
 β1, β2, …, βn are the slopes
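A minimal illustration of both types using scikit-learn (assuming it is installed); the data values are made up so the fitted coefficients are known in advance:

import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one independent variable
X_simple = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])                       # generated by y = 1 + 2x
print(LinearRegression().fit(X_simple, y).coef_)  # [2.]

# Multiple linear regression: several independent variables
X_multi = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y2 = X_multi @ np.array([2.0, 0.5]) + 1.0        # y = 1 + 2*x1 + 0.5*x2
print(LinearRegression().fit(X_multi, y2).coef_)  # [2.  0.5]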

Q4. Explain the concept of regression and provide an example of how it can be used in a real-world problem. 5 Marks
Ans :-
Regression :- Regression is a supervised learning technique which helps in
finding the correlation between variables and enables us to predict the
continuous output variable based on the one or more predictor variables. It is
mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.
Regression is a statistical method that helps determine the relationship
between a dependent variable and one or more independent variables. It's used to predict future events, understand the impact of different factors on an outcome, and find cause and effect relationships.
Example :-
For example, the relationship between height and weight may be described by a
linear regression model.

 Predicting stock prices


Regression can be used to predict stock prices and estimate the range of
possible variations around the predicted quantity.

 Analyzing customer satisfaction


Regression can be used to analyze survey data to determine the impact of
factors like product quality, pricing, and customer service on customer
satisfaction.

 Comparing the impact of pollution on temperature


Regression can be used to compare the relationship between pollution levels
and temperature.

 Analyzing the relationship between income and happiness


Regression can be used to analyze the relationship between income and
happiness by surveying people and asking them to rank their happiness on a
scale.

 Analyzing the relationship between age and urea level


Regression can be used to analyze the relationship between age and urea
level in patients attending an accident and emergency unit.
Q5. Describe the steps involved in applying gradient descent to minimize the cost function. 10 Marks
Ans :-
Gradient Descent :-
Gradient Descent is an iterative optimization algorithm that tries to find the
optimum value (Minimum/Maximum) of an objective function. It is one of the
most used optimization techniques in machine learning projects for updating
the parameters of a model in order to minimize a cost function.
The main aim of gradient descent is to find the best parameters of a model
which gives the highest accuracy on training as well as testing datasets. In
gradient descent, The gradient is a vector that points in the direction of the
steepest increase of the function at a specific point. Moving in the opposite
direction of the gradient allows the algorithm to gradually descend towards
lower values of the function, and eventually reaching to the minimum of the
function.

Cost Function :-
Cost function measures the performance of machine learning models.
It quantifies the error between the actual and predicted value of the
observation data.
In linear regression, there are many evaluation metrics (mean absolute error,
mean squared error, R squared, RMSLE, RMSE etc) to quantify the error, but
we generally use Mean Squared Error:
MSE = (1/n) Σ (yᵢ − ŷᵢ)²

where n is the number of observations, yᵢ is the i-th actual value, and ŷᵢ is the corresponding predicted value.
This Mean squared function is also referred to as Cost Function. The steps to
apply gradient descent to minimize a cost function are:
1. Define the cost function: The cost function, also known as the loss function,
measures how well a model predicts outputs.

2. Choose a starting point: Start with an initial guess for the parameters or a
small number like 0.0.

3. Define the learning rate: The learning rate determines how much the
coefficient changes with each calculation.

4. Calculate the gradient: The gradient is the derivative of the cost function,
which indicates the direction to move in on the curve.

5. Update the parameters: Move in the direction opposite to the gradient by a distance determined by the learning rate.

6. Repeat: Repeat steps 4 and 5 until the cost is close to zero or reaches a
minimum.
Gradient descent is an optimization algorithm that's used to train machine
learning models and neural networks. The goal is to find the optimal
parameter values that minimize the cost function and reduce errors between
predicted and actual results.
If the learning curve goes up and down without reaching a lower point, try
decreasing the learning rate.
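A minimal sketch of these steps in Python for a one-parameter model y = w·x with an MSE cost; the data and learning rate are illustrative:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])         # generated by y = 2x, so the true w is 2

w = 0.0                                # step 2: starting point
lr = 0.05                              # step 3: learning rate

for i in range(200):                   # step 6: repeat
    pred = w * x
    cost = np.mean((pred - y) ** 2)    # step 1: MSE cost function
    grad = np.mean(2 * (pred - y) * x) # step 4: gradient dJ/dw
    w -= lr * grad                     # step 5: move against the gradient
print(round(w, 4), round(cost, 6))     # w approaches 2, cost approaches 0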

Q6. Use Gradient Descent to optimize a cost function for a linear regression problem. 10 Marks
Ans :-
To use gradient descent to optimize a cost function for a linear regression
problem, you can follow these steps:

1. Initialize: Set the initial values of the parameters, such as m=0 and c = 0.
2. Calculate the gradient: Calculate the partial derivative of the cost
function with respect to each parameter.
3. Update the parameters: Update the parameters in the opposite
direction of the gradients to minimize the cost function.

4. Repeat: Repeat the process until the cost function is very small or ideally
0.


The values of the parameters left at the end of the process are the optimal values.
Gradient descent is an optimization algorithm that helps find the best-fitting
line for linear regression by minimizing the error between the predicted and
actual values. The cost function used in linear regression is typically the Mean
Squared Error (MSE), which measures the average of the squared errors.
The learning rate determines the speed at which convergence is reached. A
large learning rate can cause you to miss the lowest point, while a small
learning rate can cause the process to take a long time.

What Is Cost Function of Linear Regression?


Cost function measures the performance of a machine learning model for a
data set. Cost function quantifies the error between predicted and expected
values and presents that error in the form of a single real number. Depending
on the problem, cost function can be formed in many different ways. The
purpose of cost function is to be either minimized or maximized. For algorithms relying on gradient descent to optimize model parameters, the cost function has to be differentiable.

Gradient descent is an optimization algorithm that can be used to minimize a cost function for a linear regression problem:

 Goal

Find the best-fitting line for a set of data by minimizing the error
between the predicted and actual values

 Process

Iteratively adjust the model's parameters (weights and bias) to minimize the cost function

 Cost function

For linear regression, the cost function is usually the Mean Squared Error
(MSE), which measures the average of the squared errors

 Algorithm

1. Set initial values for the parameters, such as m=0 and c=0

2. Calculate the partial derivative of the cost function with respect to each
parameter

3. Update the parameters in the opposite direction of the gradients

4. Repeat until the cost function is very small or ideally 0

5. The final values of the parameters are the optimal values
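A minimal NumPy sketch of this algorithm for y = m·x + c, starting from m = 0 and c = 0; the data, learning rate, and iteration count are illustrative:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # generated by y = 2x + 1

m, c = 0.0, 0.0                            # step 1: initialize
lr = 0.01

for _ in range(5000):                      # step 4: repeat
    pred = m * x + c
    # step 2: partial derivatives of the MSE cost w.r.t. m and c
    dm = np.mean(2 * (pred - y) * x)
    dc = np.mean(2 * (pred - y))
    # step 3: update against the gradient
    m -= lr * dm
    c -= lr * dc

print(round(m, 3), round(c, 3))            # step 5: approaches m = 2, c = 1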

Q7. Compare Linear Regression and Polynomial Regression in terms of their strengths and weaknesses. When is one more appropriate than the other? 5 Marks
Ans :-

Linear Regression :- Regression analysis is a cornerstone technique in data science and machine learning, used to model the relationship between a dependent variable and one or more independent variables. Among the various types of regression, Linear Regression and Polynomial Regression are two fundamental approaches.
Linear regression is a statistical method used to model the relationship
between a dependent variable and one or more independent variables by
fitting a linear equation to observed data. The equation for simple linear
regression is:

y = a + bx
 where y is the dependent variable,
 x is the independent variable and
 a,b are coefficients.
 Linear regression is ideal when the relationship between variables is linear.

Linear Regression Strength and Weakness : -


Strength :- Linear regression is straightforward to understand and explain, and
can be regularized to avoid overfitting. In addition, linear models can be
updated easily with new data using stochastic gradient descent.

Weakness :- Linear regression performs poorly when there are non-linear relationships.

Polynomial Regression :- Polynomial regression is an extension of linear regression that models the relationship between the dependent variable and the independent variable(s) as an n-th degree polynomial. The equation for polynomial regression is:
y = a0 + a1x + a2x² + a3x³ + … + anxⁿ
 where y is dependent variable
 x is the independent variable and
 a0, a1, a2, a3, …, an are the coefficients.
Polynomial Regression is useful for modeling non-linear relationships where
the data points form a curve.

Polynomial Regression Strength and Weakness :-
Strength :-

 Flexible: Can be fitted to a wide range of curvatures and functions

 Accurate: Can provide a more accurate representation of relationships between variables

 Applicable: Can be used in a variety of tasks, such as analyzing economic data, environmental factors, or biological processes

Weaknesses :-
 Sensitive to outliers: Can be affected by outliers in the data

 Overfitting: Can lead to overfitting if the degree of the polynomial is too high

 Difficult to interpret: Can be difficult to interpret

 Requires more data: Requires more data than other methods

 Fewer model validation tools: Has fewer model validation techniques available than linear regression.

Q8. Analyze the following data: Height (x) and Weight (y). Is there a linear relationship between them? 5 Marks
Ans :-
When the height and weight pairs are plotted as points, the plot is called a scatter diagram or scatter plot. Looking at such a plot it is evident that there exists a linear relationship between height x and weight y, but not a perfect one. The points appear to follow a line, but not exactly.

To determine if there's a linear relationship between height (x) and weight (y),
we usually use statistical methods like correlation and regression analysis. A
linear relationship implies that a change in height would result in a
proportional change in weight.

Here's a simplified way to analyze the data:

1. Scatter Plot : Plot the height and weight data on a graph. If the points
roughly form a straight line, a linear relationship might exist.

2. Correlation Coefficient : Calculate the correlation coefficient (r). If r is close to 1 or -1, it indicates a strong linear relationship. If r is around 0, it indicates no linear relationship.

3. Regression Line : Perform a linear regression to get an equation of the form y = mx + c. The slope (m) indicates the rate of change of weight with respect to height.

4. Plot the data: Create a scatter plot of height (X) on the horizontal axis and
weight (Y) on the vertical axis. This visual inspection can help you see if there is
an obvious trend or pattern.

5. Calculate the coefficient of determination (R²): R² indicates how well the linear model explains the variation in the data. An R² value close to 1 indicates a strong linear relationship.

6. Perform hypothesis testing: You can test if the slope of the regression line
is significantly different from zero (using a t-test). If the p-value for the slope is
very small (usually less than 0.05), it indicates that the relationship is
statistically significant.
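A minimal NumPy sketch of step 2, reusing the height/weight values from Q9 below for illustration:

import numpy as np

height = np.array([160, 162, 164, 166, 168])
weight = np.array([52, 55, 57, 60, 61])

r = np.corrcoef(height, weight)[0, 1]   # Pearson correlation coefficient
print(f"r   = {r:.3f}")                 # close to 1 => strong linear relationship
print(f"R^2 = {r**2:.3f}")              # coefficient of determination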

Q9. To determine the line of best fit for a set of data using the Least Squares method, we need to calculate the slope (m) and the y-intercept (b) of the line. The formula for the equation of a straight line is y = mx + b. Height (in centimeters): [160, 162, 164, 166, 168]; Weight (in kilograms): [52, 55, 57, 60, 61].

Ans :-

Here, we denote Height as x (independent variable) and Weight as y
(dependent variable). Now, we calculate the means of x and y values denoted
by X and Y respectively.

X = (160 + 162 + 164 + 166 + 168) / 5 = 164
Y = (52 + 55 + 57 + 60 + 61) / 5 = 57

xi     yi    X − xi   Y − yi   (X − xi)(Y − yi)   (X − xi)²
160    52    4        5        20                 16
162    55    2        2        4                  4
164    57    0        0        0                  0
166    60    -2       -3       6                  4
168    61    -4       -4       16                 16
Sum          0        0        46                 40

Now, the slope of the line of best fit can be calculated from the formula as follows:
m = Σ(X − xi)(Y − yi) / Σ(X − xi)²
m = 46/40 = 1.15
Now, the intercept will be calculated from the
formula as follows:
c = Y – mX
c = 57 – 1.15*164 = -131.6
Thus, the equation of the line of best fit becomes, y = 1.15 x – 131.6
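A short NumPy cross-check of the hand computation above, using the same deviation formulas:

import numpy as np

x = np.array([160, 162, 164, 166, 168])
y = np.array([52, 55, 57, 60, 61])

# Slope from the sums of deviation products, intercept from the means
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print(round(m, 2), round(c, 1))   # 1.15 -131.6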

Unit 4
Q1. Define the terms Accuracy, Precision, Recall and F1 Score. 5 Marks
Ans :-

Accuracy :- Accuracy = (TP + TN) / (TP + FP + TN + FN)


This term tells us how many right classifications were made out of all the
classifications. In other words, how many TPs and TNs were done out of TP +
TN + FP + FNs. It tells the ratio of “True”s to the sum of “True”s and “False”s.
The proportion of all classifications that were correct, regardless of whether
they were positive or negative.
Accuracy measures overall correctness, Precision evaluates positive prediction
quality, Recall assesses sensitivity to positive instances, and F1 Score balances
Precision and Recall. Understanding these metrics is crucial for comprehensive
model evaluation and optimization in machine learning applications.
Use case: Out of all the patients who visited the doctor, how many were
correctly diagnosed as Covid positive and Covid negative.

Precision :- Precision = TP / (TP + FP)


Out of all that were marked as positive, how many are actually truly positive.
Use case: Let's take another example of a classification algorithm that marks
emails as spam or not. Here, if emails that are of importance get marked as
positive, then useful emails will end up going to the “Spam” folder, which is
dangerous. Hence, the classification model which has the least FP value needs
to be selected. In other words, a model that has the highest precision needs to
be selected among all the models.

Recall :- Recall = TP / (TP + FN)


Out of all the actual real positive cases, how many were identified as positive.
Use case: Out of all the actual Covid patients who visited the doctor, how
many were actually diagnosed as Covid positive. Hence, the classification model which has the least FN value needs to be selected. In other words, a
model that has the highest recall value needs to be selected among all the
models.

F1-Score:
F1 score = 2* (Precision * Recall) / (Precision + Recall)
As we saw above, sometimes we need to give weightage to FP and sometimes
to FN. F1 score is a weighted average of Precision and Recall, which means
there is equal importance given to FP and FN. This is a very useful metric
compared to “Accuracy”. The problem with using accuracy is that if we have a
highly imbalanced dataset for training (for example, a training dataset with
95% positive class and 5% negative class), the model will end up learning how
to predict the positive class properly and will not learn how to identify the
negative class. But the model will still have very high accuracy in the test
dataset too as it will know how to identify the positives really well.
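A minimal sketch computing all four metrics with scikit-learn (assuming it is installed); the labels are illustrative, with 1 = Covid positive:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # actual diagnoses
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))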

Q2. Describe the Confusion Matrix. 5 Marks


Ans :-
Confusion Matrix :- Confusion Matrix usually causes a lot of confusion even in
those who are using them regularly. Terms used in defining a confusion matrix
are TP, TN, FP, and FN.
Use case: Let's take an example of a patient who has gone to a doctor with certain symptoms. Since it's the season of Covid, let's assume that he went with fever, cough, throat ache, and cold. These are symptoms that can occur during any seasonal changes too. Hence, it is tricky for the doctor to make the right diagnosis.

a)True Positive (TP):


Let's say the patient was actually suffering from Covid and on doing the required assessment, the doctor classified him as a Covid patient. This is called TP or True Positive. This is because the case is positive in real life and at the same time the case was classified correctly. Now, the patient can be given appropriate treatment, which means the decision made by the doctor will have a positive effect on the patient and society.

b)False Positive (FP):


Let's say the patient was not suffering from Covid and he was only showing
symptoms of seasonal flu but the doctor diagnosed him with Covid. This is
called FP or False Positive. This is because the case was actually negative but
was falsely classified as positive. Now, the patient will end up getting admitted
to the hospital or home and will be given treatment for Covid. This is an
unnecessary inconvenience for him and others as he will get unwanted
treatment and quarantine. This is called Type I Error.

c)True Negative (TN):


Let's say the patient was not suffering from Covid and the doctor also gave
him a clean chit. This is called TN or True Negative. This is because the case
was actually negative and was also classified as negative which is the right
thing to do. Now the patient will get treatment for his actual illness instead of
taking Covid treatment.

d)False Negative (FN):


Let's say the patient was suffering from Covid and the doctor did not diagnose
him with Covid. This is called FN or False Negative as the case was actually
positive but was falsely classified as negative. Now the patient will not get the
right treatment and also he will spread the disease to others. This is a highly
dangerous situation in this example. This is also called Type II Error.


A confusion matrix is a table that is used to define the performance of a classification algorithm. It visualizes and summarizes the predictions, for example in a tissue-screening task where benign tissue is labeled healthy and malignant tissue is labeled cancerous.
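A minimal scikit-learn sketch (assuming it is installed) that builds the matrix from illustrative labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # actual classes (1 = positive)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]   # predicted classes

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))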

Q3. Describe The Area Under The Curve (AUC) and the ROC Curve Represent
in Machine Learning. 10Marks
Ans :-
Area Under The Curve (AUC) :-
AUC or Area Under Curve is used in conjunction with the ROC Curve, the Receiver Operating Characteristics Curve. AUC is the area under the ROC Curve, so let's first understand the ROC Curve.
A metric that measures how well a model can rank a positive example higher
than a negative example. AUC is calculated from the ROC curve. A higher AUC indicates better performance. For example, an AUC of 1.0 means the model is
perfect, while an AUC of 0.5 means the model is no better than chance. The
area under the ROC curve (AUC) represents the probability that the model, if
given a randomly chosen positive and negative example, will rank the positive
higher than the negative.

Receiver-operating characteristic curve (ROC) :-


A ROC Curve is drawn by plotting TPR (True Positive Rate, also called Recall or Sensitivity, as seen above) on the y-axis against FPR (False Positive Rate) on the x-axis, where FPR = 1 − Specificity.
TPR = TP / (TP + FN)
FPR = 1 − TN / (TN + FP) = FP / (TN + FP)
If we use a random model to classify, it has a 50% probability of classifying the
positive and negative classes correctly. Here, the AUC = 0.5. A perfect model
has a 100% probability of classifying the positive and negative classes correctly.
Here, the AUC = 1. So when we want to select the best model, we want a
model that is closest to the perfect model. In other words, a model with AUC
close to 1. When we say a model has a high AUC score, it means the model's
ability to separate the classes is very high (high separability). This is a very
important metric that should be checked while selecting a classification model.

The ROC curve is a visual representation of model performance across all


thresholds. The long version of the name, receiver operating characteristic, is a
holdover from WWII radar detection.

The ROC curve is drawn by calculating the true positive rate (TPR) and false
positive rate (FPR) at every possible threshold (in practice, at selected
intervals), then graphing TPR over FPR.
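A minimal scikit-learn sketch (assuming it is installed) computing the ROC curve points and the AUC from illustrative labels and scores:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                     # actual classes
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
print("AUC =", roc_auc_score(y_true, y_score))      # area under that curve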

Q4. What is Logistic Regression, and how does it work? 5 Marks

Ans :-

Logistic Regression : -

Logistic regression is one of the most popular Machine Learning algorithms,


which comes under the Supervised Learning technique. It is used for predicting
the categorical dependent variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable.
Therefore the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

Logistic Regression is quite similar to Linear Regression except in how they are used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, which predicts two maximum values (0 or 1).
 The curve from the logistic function indicates the likelihood of something
such as whether the cells are cancerous or not, a mouse is obese or not
based on its weight, etc.
 Logistic Regression is a significant machine learning algorithm because it has
the ability to provide probabilities and classify new data using continuous
and discrete datasets.
 Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification. The below image is showing the logistic function:

 What it's used for


Logistic regression is often used for classification and predictive analytics in a
variety of applications, including:
 Disease risk prediction: Predicting the risk of developing a disease
 Customer behavior prediction: Predicting a customer's likelihood of
purchasing a product or canceling a subscription
 Business risk prediction: Predicting the likelihood of a homeowner defaulting
on a mortgage
 Disaster planning: Predicting the decisions of building occupants during
evacuations.

 Logistic regression is a statistical model that uses the logistic function, or logit function, as the equation between x and y. The logit function maps y as a sigmoid function of x. If you plot this logistic regression equation, you will get an S-curve as shown below.
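A minimal scikit-learn sketch (assuming it is installed); the single feature and labels are illustrative, e.g., mouse weight vs. obese-or-not:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0], [5.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # binary outcome

clf = LogisticRegression().fit(X, y)
print(clf.predict([[3.2]]))               # predicted class (0 or 1)
print(clf.predict_proba([[3.2]]))         # probabilities between 0 and 1 (the S-curve output)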

Q5. Explain the Support Vector Machine (SVM) Algorithm and explain the types of SVM. 10 Marks
Ans :-

Support Vector Machine (SVM)


Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in Machine
Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. A support vector machine (SVM) is a type of supervised learning algorithm used in machine learning to solve classification and regression tasks; SVMs are particularly good at solving binary classification problems, which require classifying the elements of a data set into two groups.
Consider the below diagram in which there are two different categories that
are classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we have used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; we want a model that can accurately identify whether it is a cat or a dog.

Types of Support Vector Machine :-


There are two types of SVM.

1) Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear
SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into
their respective classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.

2) Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means if a dataset cannot be classified by using a straight line, then such data is termed non-linear data and the classifier used is called the Non-linear SVM classifier.
Non-Linear SVM can be used to classify data when it cannot be separated into
two classes by a straight line (in the case of 2D). By using kernel functions,
nonlinear SVMs can handle nonlinearly separable data. The original input
data is transformed by these kernel functions into a higher-dimensional feature space, where the data points can be linearly separated. A linear SVM
is used to locate a nonlinear decision boundary in this modified space.
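A minimal scikit-learn sketch contrasting the two types on non-linearly separable data (assuming scikit-learn is installed); the exact accuracies depend on the generated data:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(noise=0.1, random_state=0)   # two interleaving half-circles

linear_svm = SVC(kernel="linear").fit(X, y)    # Linear SVM: straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)          # Non-linear SVM: kernel trick

print("linear accuracy:", linear_svm.score(X, y))   # typically lower here
print("rbf accuracy   :", rbf_svm.score(X, y))      # typically near 1.0 on this data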

Q6. Perform K-Fold Cross-Validation on a machine learning model. 5 Marks


Ans :-

K-Fold Cross-Validation :-
K-fold cross-validation approach divides the input dataset into K groups of
samples of equal sizes. These samples are called folds. For each learning set,
the prediction function uses k-1 folds, and the rest of the folds are used for the
test set. This approach is a very popular CV approach because it is easy to
understand, and the output is less biased than other methods.

The steps for k-fold cross-validation are:


 Split the input dataset into K groups
 For each group:
 Take one group as the reserve or test data set.
 Use remaining groups as the training dataset
 Fit the model on the training set and evaluate the performance of the
model using the test set.

Let's take an example of 5-fold cross-validation. So, the dataset is grouped into 5 folds. On the 1st iteration, the first fold is reserved for testing the model, and the rest are used to train the model. On the 2nd iteration, the second fold is used to test the model, and the rest are used to train the model. This process continues until each fold has been used as the test fold.

Stratified k-fold cross-validation :-
This technique is similar to k-fold cross-validation with a few small changes. It works on the concept of stratification, the process of rearranging the data to ensure that each fold or group is a good representative of the complete dataset. It is one of the best approaches for dealing with bias and variance.
It can be understood with an example of housing prices, where the price of some houses can be much higher than that of other houses. To handle such situations, a stratified k-fold cross-validation technique is useful.
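A minimal scikit-learn sketch of both plain and stratified 5-fold cross-validation on the Iris dataset (assuming scikit-learn is installed):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Plain 5-fold CV: each fold serves once as the test set
scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("k-fold scores:", scores.round(3))

# Stratified 5-fold CV keeps class proportions in every fold
strat = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("stratified   :", strat.round(3))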

Q7. Explore the structure of an Artificial Neural Network with its layers and
components. 10 Marks
Ans :-

Artificial Neural Network :- Artificial Neural Networks (ANN) are algorithms based on brain function and are used to model complicated patterns and solve prediction problems. The Artificial Neural Network (ANN) is a deep learning method that arose from the concept of the human brain's biological neural networks. The development of ANN was the result of an attempt to replicate the workings of the human brain. The workings of ANN are extremely similar to those of biological neural networks, although they are not identical. The ANN algorithm accepts only numeric and structured data.

Artificial Neural Network is a computing system based on a collection of nodes


called neurons connected by weighted links. ANN consists of an input layer, an
output layer, and single or multiple hidden layers. During training, the loss is
back-propagated for effective tuning of the weights.

Artificial Neural Networks contain artificial neurons which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system.
Artificial Neural Networks :-
1. There are three layers in the network architecture: the input layer, the hidden layer (there may be more than one), and the output layer. Because of these numerous layers, such networks are sometimes referred to as MLPs (Multi-Layer Perceptrons).

Artificial Neural Network with its layers :-


Input Layer: As the name suggests, it accepts inputs in several different
formats provided by the programmer.

Hidden Layer: The hidden layer sits between the input and output layers. It performs all the calculations to find hidden features and patterns.

Output Layer: The input goes through a series of transformations using the
hidden layer, which finally results in output that is conveyed using this layer.
2. It is possible to think of the hidden layer as a “distillation layer,” which
extracts some of the most relevant patterns from the inputs and sends them
on to the next layer for further analysis. It accelerates and improves the
efficiency of the network by recognizing just the most important information
from the inputs and discarding the redundant information.
3. The activation function is important because it introduces non-linearity into the network, allowing it to learn patterns that a purely linear model cannot.

Artificial Neural Network with Components :-


 Neurons: The basic building blocks of an ANN, neurons are mathematical
functions that process input data and produce an output. Neurons are the
basic building blocks of a neural network. They receive inputs, perform computations, and produce outputs. Each neuron is connected to other
neurons through weighted connections.
 Layers: These are made up of neurons and perform specific operations on the
input data. The three layers of an ANN are the input layer, hidden layer, and
output layer. A neural network is organized into layers, which are composed of
multiple neurons. The input layer receives the input data, the output layer
produces the final output, and the hidden layers are in between.
 Weights: These represent the strength of the connections between
neurons. During training, the weights are adjusted to optimize the network's
performance. Weights and biases are parameters that determine the behavior
of a neural network. Each connection between neurons has an associated
weight, which controls the strength of the connection.
 Activation functions: An activation function introduces non-linearity into the neural network. It takes the weighted sum of inputs from the previous layer and produces an output. Common activation functions include the sigmoid function, the tanh function (which is zero-centered, allowing the gradient to move in both directions), and the rectified linear unit (ReLU) function.
 Learning algorithms: These are used to train the network.
 Back propagation: This is used to tune the weights during training. Back
propagation is a key algorithm used to train neural networks. It computes the
gradient of the loss function with respect to the weights and biases of the
network.
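A minimal NumPy sketch of one forward pass through a 2-3-1 network, tying these components together; the weights here are random stand-ins for values a real network would learn via back propagation:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # activation function

x = np.array([0.5, -0.2])                        # input layer (2 units)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)    # input -> hidden weights and biases
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)    # hidden -> output weights and biases

hidden = sigmoid(W1 @ x + b1)    # hidden layer activations (3 units)
output = sigmoid(W2 @ hidden + b2)   # output layer (1 unit)
print(output)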

Q8. Analyze the working of the Support Vector Machine. 5 Marks


Ans :-

Support Vector Machine :- Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is used for Classification as well
as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. A support vector machine (SVM) is a type
of supervised learning algorithm used in machine learning to solve
classification and regression tasks; SVMs are particularly good at solving binary
classification problems, which require classifying the elements of a data set
into two groups.

Working of Support Vector Machine


Linear SVM:
Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.

As it is a 2-D space, by just using a straight line we can easily separate these two classes. But there can be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.

Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we cannot draw a single straight line.
Consider the below image :

So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data, we
will add a third dimension z. It can be calculated as:

z = x² + y²
By adding the third dimension, the sample space will become as below image:

Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes:
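A minimal NumPy sketch of this transform with hand-picked illustrative points: data that is not linearly separable in (x, y) becomes separable after adding z = x² + y²:

import numpy as np

inner = np.array([[0.5, 0.0], [-0.3, 0.4], [0.0, -0.6], [0.2, 0.2]])   # class 0: near origin
outer = np.array([[3.0, 0.0], [0.0, 3.0], [-2.5, 1.5], [2.0, -2.2]])   # class 1: surrounding ring

for label, pts in [(0, inner), (1, outer)]:
    z = pts[:, 0] ** 2 + pts[:, 1] ** 2   # the added third dimension
    print(f"class {label}: z values = {np.round(z, 2)}")
# Class 0 has z < 1 and class 1 has z > 1, so the plane z = 1 separates them in 3-D.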

Q9. Analyze how the back propagation algorithm is used to minimize the loss function. 5 Marks

Ans :-

Back Propagation Algorithm :- Back propagation is widely used in neural network training and calculates the gradient of the loss function with respect to the weights of the network. It works with a multi-layer neural network and learns the internal representations of the input-output mapping.
Back propagation can work across any number of layers, for example an input layer, two hidden layers, and a final output layer. There are 3 main layer types:
 Input layer
 Hidden layer and
 Output layer

Each layer works independently in its own way to produce the desired output for the given conditions.

Working of Back propagation :-

The working of back propagation can be explained from the figure shown below.
 The input layer receives the inputs X through the preconnected path
 Input is customized by using actual weights 'W', where the weights are selected randomly.
 Output is calculated for every neuron, from the input layer through the hidden layers, until the output data arrives at the output layer.
 Evaluate the errors obtained from the outputs.
 To decrease the error, adjust the weights by going back to the hidden
layer from the output layer.
 Repeat the process until the desired output is obtained.

The difference between the actual output and the desired output is used to
calculate errors obtained in the result.
Error = actual output – desired output.

Loss function in the back propagation algorithm :- Backward propagate the error through the network to compute the gradient of the loss function with respect to each weight. Update the weights in the opposite direction of the gradient using an optimization algorithm such as Stochastic Gradient Descent (SGD).
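A minimal NumPy sketch of this loop for the XOR problem, using sigmoid activations and a squared-error loss; the layer sizes, learning rate, and iteration count are illustrative:

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # desired outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.5

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = out - y                           # actual output - desired output

    # Backward pass: chain rule through each layer
    d_out = err * out * (1 - out)           # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # gradient at the hidden layer

    # Update weights against the gradient
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0] for most initializations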

Q10. Explain the Need for Back Propagation and Explain the Types of Back Propagation Neural Networks. 10 Marks

Ans :-

Need of Back Propagation :- The back propagation technique helps to adjust the weights of the network connections to minimize the difference between the actual output and the desired output of the net, which is measured by a loss function.
Back propagation (short for "Backward Propagation of Errors") is a method
used to train artificial neural networks. Its goal is to reduce the difference
between the model’s predicted output and the actual output by adjusting the
weights and biases in the network.

 Helps to simplify the network structure by removing the weighted links that have minimal effect on the trained network.
 This method is especially applicable in deep neural networks, which
work on error-prone projects like speech and image recognition.
 It functions with multiple inputs using chain rules and power rules.
 It is used to calculate the gradient of the loss function with respect to all
the weights in the network.
 Minimizes the loss function by updating the weights with the gradient
optimization method.
 Modifies the weights of the connected nodes during the process of training to produce 'learning'.
 This method is iterative, recursive, and more efficient.

Types of Back propagation Neural Network


The back propagation neural network is classified into two types. They are,

1) Static Back Propagation Neural Network


In this type of back propagation, the static output is generated due to the
mapping of static input. It is used to resolve static classification problems like
optical character recognition. Used in feed forward neural networks, where
information moves in one direction from input to output nodes. This type of
back propagation is used for static problems, like optical character recognition
(OCR) and predicting the class of an image.

2) Recurrent Back propagation Neural Network


Recurrent back propagation is fed forward until a fixed value or threshold is reached. After that, the error is evaluated and propagated backward. The key difference between the two types is that mapping is static and fast in static back propagation, while in recurrent back propagation it is non-static. Recurrent back propagation is used in recurrent neural networks (RNNs), where data becomes part of a feedback loop in the hidden nodes. This type of back propagation is used for non-static problems that change over time, like sentiment analysis and speech recognition.
