Machine Learning QB
Unit 3
Q1. State the Formula For Linear Regression Using the Least Squares Method.
5 Marks
Ans :-
Linear Regression :- The Least Squares Method formula is used to find the best-
fitting line through a set of data points. For simple linear regression, the line
has the form y = mx + c, where y is the dependent variable, x is the
independent variable, m is the slope of the line, and c is the y-intercept. The
formulas to calculate the slope (m) and intercept (c) of the line are derived
from the following equations:
1. Slope (m) Formula: m = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
2. Intercept (c) Formula: c = [(Σy) − m(Σx)] / n
Where:
n is the number of data points,
Σxy is the sum of the product of each pair of x and y values,
Σx is the sum of all x values,
Σy is the sum of all y values.
Then, we try to represent all the plotted points as a straight line or a linear
equation. The equation of such a line is obtained with the help of the Least
Squares method. This is done to get the value of the dependent variable for an
independent variable for which the value was initially unknown. This helps us
to make predictions for the value of the dependent variable.
Least Square Method Definition
Least Squares method is a statistical technique used to find the equation of the
best-fitting curve or line to a set of data points by minimizing the sum of the
squared differences between the observed values and the values predicted by
the model.
This method aims at minimizing the sum of squares of deviations as much as
possible. The line obtained from such a method is called a regression line or
line of best fit.
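As an illustration, here is a minimal Python sketch of these formulas (the function name and sample data are chosen only for illustration):

def least_squares_fit(x, y):
    # Closed-form least-squares slope and intercept from the formulas above.
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c

m, c = least_squares_fit([1, 2, 3, 4], [2, 4, 5, 8])
print(f"y = {m:.2f}x + {c:.2f}")   # y = 1.90x + 0.00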
Q2. What is Gradient Descent? Explain the Main Steps in the Algorithm. 5 Marks
Ans : -
Gradient descent is an iterative optimization algorithm used to minimize a cost
function. Starting from an initial guess, it repeatedly moves the parameters in
the direction of the negative gradient, stepping toward lower values of the
function, and eventually reaching the minimum of the function.
We know that in any machine learning project our main aim is to maximize
accuracy, or equivalently to minimize how much our model's predictions differ
from the actual data points. Based on the difference between model predictions
and actual data points, we try to find the parameters of the model which give
better accuracy on our dataset. In order to find these parameters we apply
gradient descent to the cost function of the machine learning model.
Steps Required in Gradient Descent Algorithm
Step 1: First initialize the parameters of the model randomly.
Step 2: Compute the gradient of the cost function with respect to each
parameter. This involves taking the partial derivative of the cost function with
respect to each parameter.
Step 3: Update the parameters of the model by taking steps in the
opposite direction of the gradient. Here we choose a hyperparameter called
the learning rate, denoted by alpha, which decides the step size of each
update.
Step 4: Repeat steps 2 and 3 iteratively to get the best parameters for the
defined model (a minimal sketch of these steps follows below).
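A minimal Python sketch of these four steps, minimizing the one-parameter cost function f(w) = (w − 3)² (the cost function, starting point, and learning rate here are illustrative assumptions):

w = 0.0                               # Step 1: initialize the parameter
learning_rate = 0.1                   # alpha: decides the step size

for _ in range(100):                  # Step 4: repeat steps 2 and 3
    gradient = 2 * (w - 3)            # Step 2: derivative of (w - 3)^2
    w = w - learning_rate * gradient  # Step 3: step against the gradient

print(w)                              # converges to ~3.0, the minimum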
Q3. What are the Key Components of Linear Regression? Explain Types of
Linear Regression in brief. 10 Marks
Ans :-
Some key components of linear regression include:
Linearity
A fundamental property of linear regression, linearity is used in forecasting,
neural networks, and other machine learning algorithms. Linearity refers
to the property of a system or model where the output is directly proportional
to the input, while nonlinearity implies that the relationship between input and
output is more complex and cannot be expressed as a simple linear function.
The term "linearity" in machine learning describes a straight-line,
proportionate link between input characteristics and output. According to
linear models, variations in the input feature set cause corresponding
variations in the output.
Normality
A key assumption of linear regression is that the residuals are normally
distributed. This means that the errors are random noise and the model has
captured all the signal in the data.
Correlation coefficient
A statistical tool used to measure the strength of the relationship between
two variables. The correlation coefficient produces a value between −1 and
+1. Most of the data in the world is interrelated by various factors. Data
Science deals with understanding the relationships between different
variables. This helps us learn the underlying patterns and connections that
can give us valuable insights. Correlation analysis is an important tool used
to understand the type of relation between variables.
Coefficient of determination
A measure of how well a linear regression model fits the data. In machine
learning, the coefficient of determination, often referred to as R², is a
statistical measure used to assess how well the model's predictions
approximate the actual data points. Specifically, it is a metric that indicates how
well the independent variables (features) explain the variation in the
dependent variable (target).
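In formula form, with yᵢ the actual values, ŷᵢ the model's predictions and ȳ the mean of the actual values:
R² = 1 − [Σ(yᵢ − ŷᵢ)²] / [Σ(yᵢ − ȳ)²]
An R² of 1 means the model explains all the variation in the target; an R² of 0 means it does no better than always predicting the mean.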
Gradient descent
A common method for solving linear regression problems. Gradient descent
minimizes the model's error iteratively to optimize the value of coefficients.
Gradient descent is used in various machine learning applications, such as
linear regression, logistic regression, and neural networks, to optimize the
model's parameters and improve its accuracy. It is a fundamental algorithm in
machine learning and is essential for training complex models with large
amounts of data.
Explain Types of Linear Regression :-
Cost Function :-
Cost function measures the performance of machine learning models.
It quantifies the error between the actual and predicted value of the
observation data.
In linear regression, there are many evaluation metrics (mean absolute error,
mean squared error, R squared, RMSLE, RMSE etc) to quantify the error, but
we generally use Mean Squared Error:
This Mean squared function is also referred to as Cost Function. The steps to
apply gradient descent to minimize a cost function are:
1. Define the cost function: The cost function, also known as the loss function,
measures how well a model predicts outputs.
2. Choose a starting point: Start with an initial guess for the parameters or a
small number like 0.0.
3. Define the learning rate: The learning rate determines how much the
coefficient changes with each calculation.
4. Calculate the gradient: The gradient is the derivative of the cost function,
which indicates the direction to move in on the curve.
5. Update the parameters: Move each parameter a small step in the opposite
direction of its gradient (new value = old value − learning rate × gradient).
6. Repeat: Repeat steps 4 and 5 until the cost is close to zero or reaches a
minimum.
Gradient descent is an optimization algorithm that's used to train machine
learning models and neural networks. The goal is to find the optimal
parameter values that minimize the cost function and reduce errors between
predicted and actual results.
If the learning curve goes up and down without reaching a lower point, try
decreasing the learning rate.
1. Initialize: Set the initial values of the parameters, such as m=0 and c = 0.
2. Calculate the gradient: Calculate the partial derivative of the cost
function with respect to each parameter.
3. Update the parameters: Update the parameters in the opposite
direction of the gradients to minimize the cost function.
4. Repeat: Repeat the process until the cost function is very small or ideally
0.
The values of the parameters left at the end of the process are the optimum
values.
Gradient Descent for Linear Regression :-
Gradient descent is an optimization algorithm that helps find the best-fitting
line for linear regression by minimizing the error between the predicted and
actual values. The cost function used in linear regression is typically the Mean
Squared Error (MSE), which measures the average of the squared errors.
The learning rate determines the speed at which convergence is reached. A
large learning rate can cause you to miss the lowest point, while a small
learning rate can cause the process to take a long time.
Goal :-
Find the best-fitting line for a set of data by minimizing the error
between the predicted and actual values.
Cost function :-
For linear regression, the cost function is usually the Mean Squared Error
(MSE), which measures the average of the squared errors.
Algorithm :-
1. Set initial values for the parameters, such as m = 0 and c = 0.
2. Calculate the partial derivative of the cost function with respect to each
parameter.
3. Update each parameter in the opposite direction of its gradient.
4. Repeat until the cost function stops decreasing (a sketch follows below).
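A compact Python sketch of this algorithm for a line y = mx + c under MSE (the data, learning rate, and iteration count are illustrative assumptions):

import numpy as np

# Illustrative data: y is exactly 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

m, c = 0.0, 0.0                 # 1. initialize the parameters
alpha = 0.01                    # learning rate
n = len(x)

for _ in range(10000):          # 4. repeat until the cost is small
    y_pred = m * x + c
    dm = (-2 / n) * np.sum(x * (y - y_pred))  # 2. dMSE/dm
    dc = (-2 / n) * np.sum(y - y_pred)        #    dMSE/dc
    m -= alpha * dm             # 3. update against the gradient
    c -= alpha * dc

print(m, c)                     # approaches m = 2, c = 1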
Simple Linear Regression :-
Simple linear regression models the relationship between one independent
variable and the dependent variable with a straight line:
y = a + bx
where y is the dependent variable,
x is the independent variable, and
a, b are the coefficients (a is the intercept and b is the slope).
Linear regression is ideal when the relationship between variables is linear.
Polynomial Regression Strength and Weakness :-
Strengths :-
Flexibility: Can model curved, non-linear relationships that a straight line
cannot capture.
Weaknesses :-
Sensitive to outliers: Can be affected by outliers in the data
Overfitting: Can lead to overfitting if the degree of the polynomial is too high
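A brief NumPy sketch of this overfitting risk (the sample data and polynomial degrees are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=10)  # noisy curve samples

# A moderate degree captures the trend; a high degree chases the noise.
for degree in (3, 7):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    y_fit = np.polyval(coeffs, x)
    print(degree, np.mean((y - y_fit) ** 2)) # training error shrinks with degree,
                                             # but the degree-7 fit generalizes worse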
Q8. Analyse the following data: Height (x) and Weight (y). Is there a linear
relationship between them? 5 Marks.
Ans :-
A plot of the data points, with height on one axis and weight on the other, is
called a scatter diagram or scatter plot. Looking at the plot it is evident that
there exists a linear relationship between height x and weight y, but not a
perfect one. The points appear to be following a line, but not exactly.
To determine if there's a linear relationship between height (x) and weight (y),
we usually use statistical methods like correlation and regression analysis. A
linear relationship implies that a change in height would result in a
proportional change in weight.
1. Scatter plot: Plot height (x) on the horizontal axis and weight (y) on the
vertical axis. If the points roughly form a straight line, a linear relationship
might exist; this visual inspection can help you see if there is an obvious
trend or pattern.
2. Correlation coefficient: Compute the correlation coefficient; a value close
to +1 or −1 indicates a strong linear relationship.
3. Regression analysis: Fit a least-squares line to the data and examine how
well it describes the points.
4. Perform hypothesis testing: You can test if the slope of the regression line
is significantly different from zero (using a t-test). If the p-value for the slope is
very small (usually less than 0.05), it indicates that the relationship is
statistically significant.
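A short Python sketch of steps 1 and 2 (the height/weight numbers are placeholders, borrowed from Q9 for illustration):

import numpy as np
import matplotlib.pyplot as plt

height = np.array([160, 162, 164, 166, 168])  # x, in cm (placeholder data)
weight = np.array([52, 55, 57, 60, 61])       # y, in kg (placeholder data)

plt.scatter(height, weight)                   # step 1: visual inspection
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()

r = np.corrcoef(height, weight)[0, 1]         # step 2: Pearson's r
print(r)   # close to +1 indicates a strong positive linear relationship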
Q9. To determine the line of best fit for a set of data using the Least Squares
method, we need to calculate the slope (m) and the y-intercept (c) of the line.
The formula for the equation of a straight line is y = mx + c. Height (in
centimeters): [160, 162, 164, 166, 168]; Weight (in kilograms): [52, 55, 57,
60, 61].
Ans :-
Here, we denote Height as x (independent variable) and Weight as y
(dependent variable). Now, we calculate the means of the x and y values,
denoted by X and Y respectively:
X = (160 + 162 + 164 + 166 + 168) / 5 = 164
Y = (52 + 55 + 57 + 60 + 61) / 5 = 57

xi     yi    X − xi   Y − yi   (X − xi)(Y − yi)   (X − xi)²
160    52    4        5        20                 16
162    55    2        2        4                  4
164    57    0        0        0                  0
166    60    −2       −3       6                  4
168    61    −4       −4       16                 16
Sum          0        0        46                 40
Now, the slope of the line of best fit can be calculated from the formula as follows:
m = Σ[(X − xi)(Y − yi)] / Σ(X − xi)²
m = 46/40 = 1.15
Now, the intercept will be calculated from the formula as follows:
c = Y − mX
c = 57 − 1.15 × 164 = −131.6
Thus, the equation of the line of best fit becomes y = 1.15x − 131.6
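As a quick check, NumPy's degree-1 polyfit performs the same least-squares fit:

import numpy as np

height = np.array([160, 162, 164, 166, 168])
weight = np.array([52, 55, 57, 60, 61])

m, c = np.polyfit(height, weight, 1)  # degree-1 least-squares fit
print(m, c)                           # 1.15 and -131.6, matching the hand calculation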
Unit 4
Q1. Define the Accuracy, Precision, Recall and F1 Score Terms. 5 Marks
Ans :-
Accuracy: The ratio of correct predictions to all predictions made,
Accuracy = (TP + TN) / (TP + TN + FP + FN). It works well only when the
classes are balanced.
Precision: Out of all predicted positives, the fraction that are actually
positive, Precision = TP / (TP + FP). When FP is costly, the model with the
least FP value (highest precision) should be selected.
Recall: Out of all actual positives, the fraction the model correctly identified,
Recall = TP / (TP + FN). When FN is costly, the model which has the least FN
value needs to be selected. In other words, a model that has the highest
recall value needs to be selected among all the models.
F1-Score:
F1 score = 2* (Precision * Recall) / (Precision + Recall)
As we saw above, sometimes we need to give weightage to FP and sometimes
to FN. F1 score is a weighted average of Precision and Recall, which means
there is equal importance given to FP and FN. This is a very useful metric
compared to “Accuracy”. The problem with using accuracy is that if we have a
highly imbalanced dataset for training (for example, a training dataset with
95% positive class and 5% negative class), the model will end up learning how
to predict the positive class properly and will not learn how to identify the
negative class. But the model will still have very high accuracy in the test
dataset too as it will know how to identify the positives really well.
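A small Python sketch computing all four metrics from raw counts (the TP/FP/FN/TN values are invented to mimic such an imbalanced case):

tp, fp, fn, tn = 80, 10, 20, 890        # invented confusion-matrix counts

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.97, 0.889, 0.80, 0.842
# Accuracy looks excellent while recall is mediocre, which is why
# F1 is the safer summary metric on imbalanced data.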
Q2. Explain the terms True Positive, False Positive, True Negative and False
Negative with an example.
Ans :-
Consider a doctor diagnosing patients for Covid.
a) True Positive (TP):
Let's say the patient was suffering from Covid and the doctor diagnosed him
with Covid. This is called TP or True Positive, as the case was actually positive
and was also classified as positive. Now the patient will get the
appropriate treatment which means, the decision made by the doctor will have
a positive effect on the patient and society.
b) False Positive (FP):
Let's say the patient was not suffering from Covid but the doctor diagnosed
him with Covid. This is called FP or False Positive, as the case was actually
negative but was falsely classified as positive. Now the patient will face
unnecessary inconvenience for him and others as he will get unwanted
treatment and quarantine. This is called Type I Error.
c) True Negative (TN):
Let's say the patient was not suffering from Covid and the doctor also gave
him a clean chit. This is called TN or True Negative. This is because the case
was actually negative and was also classified as negative which is the right
thing to do. Now the patient will get treatment for his actual illness instead of
taking Covid treatment.
d) False Negative (FN):
Let's say the patient was suffering from Covid and the doctor did not diagnose
him with Covid. This is called FN or False Negative as the case was actually
positive but was falsely classified as negative. Now the patient will not get the
right treatment and also he will spread the disease to others. This is a highly
dangerous situation in this example. This is also called Type II Error.
Q3. Describe what the Area Under the Curve (AUC) and the ROC Curve
represent in Machine Learning. 10 Marks
Ans :-
Area Under The Curve (AUC) :-
AUC or Area Under Curve is used in conjunction with the ROC Curve, the
Receiver Operating Characteristics Curve. AUC is the area under the ROC Curve,
so let's first understand the ROC Curve.
A metric that measures how well a model can rank a positive example higher
than a negative example. AUC is calculated from the ROC curve. A higher AUC
indicates better performance. For example, an AUC of 1.0 means the model is
perfect, while an AUC of 0.5 means the model is no better than chance. The
area under the ROC curve (AUC) represents the probability that the model, if
given a randomly chosen positive and negative example, will rank the positive
higher than the negative.
The ROC curve is drawn by calculating the true positive rate (TPR) and false
positive rate (FPR) at every possible threshold (in practice, at selected
intervals), then graphing TPR over FPR.
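In terms of the confusion-matrix counts:
TPR (Recall) = TP / (TP + FN)
FPR = FP / (FP + TN)
Each classification threshold yields one (FPR, TPR) point, and joining these points traces out the ROC curve.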
Q4. What is Logistic Regression, and how does it work? 5 Marks.
Ans :-
Logistic Regression : -
Logistic Regression is very similar to Linear Regression except in how they
are used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something
such as whether the cells are cancerous or not, a mouse is obese or not
based on its weight, etc.
Logistic Regression is a significant machine learning algorithm because it has
the ability to provide probabilities and classify new data using continuous
and discrete datasets.
Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification. The image below shows the logistic function:
Logistic regression is a statistical model that uses the logistic function, also
called the sigmoid function, as the equation between x and y: it maps y as a
sigmoid function of x. If you plot this logistic regression equation, you will
get an S-curve as shown below.
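The logistic (sigmoid) function itself is f(x) = 1 / (1 + e^(−x)), which squashes any real-valued input into the range (0, 1) and produces the S-curve described above.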
Q5. Explain the Support Vector Machine (SVM) Algorithm and the types
of SVM. 10 Marks
Ans :-
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine. A support vector machine (SVM) is a type
of supervised learning algorithm used in machine learning to solve
classification and regression tasks; SVMs are particularly good at solving binary
classification problems, which require classifying the elements of a data set
into two groups.
Consider the below diagram in which there are two different categories that
are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs; if we want a model that can accurately identify whether it is a cat or a
dog, such a model can be created by using the SVM algorithm.
1) Linear SVM: Linear SVM is used for linearly separable data, which means if a
dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called
the Linear SVM classifier.
Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear
SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into
their respective classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.
2) Non-Linear SVM: Non-Linear SVM is used for non-linearly separable data.
Kernel functions map the original input data into a higher-dimensional
feature space, where the data points can be linearly separated. A linear SVM
is used to locate a nonlinear decision boundary in this modified space.
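A minimal scikit-learn sketch contrasting the two kernels (the dataset and parameters are illustrative assumptions):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not linearly separable.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit feature mapping

print(linear_svm.score(X, y))  # lower: a straight line cannot separate the moons
print(rbf_svm.score(X, y))     # higher: a nonlinear boundary fits the classes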
Q6. Explain K-Fold Cross-Validation and Stratified K-Fold Cross-Validation.
Ans :-
K-Fold Cross-Validation :-
K-fold cross-validation approach divides the input dataset into K groups of
samples of equal sizes. These samples are called folds. For each learning set,
the prediction function uses k-1 folds, and the rest of the folds are used for the
test set. This approach is a very popular CV approach because it is easy to
understand, and the output is less biased than other methods.
Stratified k-fold cross-validation:-
This technique is similar to k-fold cross-validation with some small changes.
This approach works on the stratification concept: rearranging the data to
ensure that each fold or group is a good representative of the complete
dataset. It is one of the best approaches to deal with bias and variance.
It can be understood with an example of housing prices, such that the price of
some houses can be much higher than that of other houses. To tackle such
situations, a stratified k-fold cross-validation technique is useful.
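A short scikit-learn sketch of both techniques (the model and dataset are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # keeps class ratios in each fold

print(cross_val_score(model, X, y, cv=kf).mean())
print(cross_val_score(model, X, y, cv=skf).mean())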
Q7. Explore the structure of an Artificial Neural Network with its layers and
components. 10 Marks
Ans :-
Artificial Neural Networks contain artificial neurons which are called units.
These units are arranged in a series of layers that together constitute the
whole Artificial Neural Network in a system.
Artificial Neural Networks :-
1. There are three layers in the network architecture: the input layer, the
hidden layer (there can be more than one), and the output layer. Because of
the numerous layers, such networks are sometimes referred to as MLPs
(Multi-Layer Perceptrons).
Input Layer: The input layer accepts the input data/features and passes them
on to the hidden layers.
Hidden Layer: The hidden layer sits in between the input and output layers.
It performs all the calculations to find hidden features and patterns.
Output Layer: The input goes through a series of transformations using the
hidden layer, which finally results in output that is conveyed using this layer.
2. It is possible to think of the hidden layer as a “distillation layer,” which
extracts some of the most relevant patterns from the inputs and sends them
on to the next layer for further analysis. It accelerates and improves the
efficiency of the network by recognizing just the most important information
from the inputs and discarding the redundant information.
3. The activation function is important for two reasons: first, it introduces
non-linearity into the network so it can learn complex patterns; second, it
keeps each neuron's output within a manageable range.
Components of an Artificial Neural Network :-
Neurons: These are the basic processing units of the network that receive
inputs, perform computations, and produce outputs. Each neuron is connected
to other neurons through weighted connections.
Layers: These are made up of neurons and perform specific operations on the
input data. The three layers of an ANN are the input layer, hidden layer, and
output layer. A neural network is organized into layers, which are composed of
multiple neurons. The input layer receives the input data, the output layer
produces the final output, and the hidden layers are in between.
Weights: These represent the strength of the connections between
neurons. During training, the weights are adjusted to optimize the network's
performance. Weights and biases are parameters that determine the behavior
of a neural network. Each connection between neurons has an associated
weight, which controls the strength of the connection.
Activation functions: An activation function introduces non-linearity into the
neural network. It takes the weighted sum of inputs from the previous layer
and produces an output. Common activation functions include the sigmoid
function, tanh function, and rectified linear unit (ReLU) function. For example,
the hyperbolic tangent (tanh) function is zero-centered, which allows the
gradient to move in both directions.
Learning algorithms: These are used to train the network by adjusting its
weights to reduce the prediction error.
Backpropagation: This is used to tune the weights during training.
Backpropagation is a key algorithm used to train neural networks. It computes
the gradient of the loss function with respect to the weights and biases of the
network.
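A tiny NumPy sketch of one forward pass through such a network (the layer sizes, random weights, and input are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # activation: squashes values into (0, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                # input layer: 3 features

W1 = rng.normal(size=(4, 3))          # weights: input -> hidden (4 units)
b1 = np.zeros(4)                      # biases of the hidden layer
W2 = rng.normal(size=(1, 4))          # weights: hidden -> output (1 unit)
b2 = np.zeros(1)

hidden = sigmoid(W1 @ x + b1)         # hidden layer computation
output = sigmoid(W2 @ hidden + b2)    # output layer
print(output)
# Training would now compare output with the target and backpropagate the error.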
Q8. What is a hyperplane in SVM? Explain how SVM separates linear and
non-linear data.
Ans :-
Support Vector Machine :- Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is used for Classification as well
as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine. A support vector machine (SVM) is a type
of supervised learning algorithm used in machine learning to solve
classification and regression tasks; SVMs are particularly good at solving binary
classification problems, which require classifying the elements of a data set
into two groups.
As it is a 2-D space, just by using a straight line we can easily separate these two
classes. But there can be multiple lines that can separate these classes.
Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
closest points of the lines from both the classes. These points are called support
vectors. The distance between the vectors and the hyperplane is called the
margin. And the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line.
Consider the below image :
So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data, we
will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we
convert it to 2-D space with z = 1, then it will become as:
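A one-line NumPy illustration of this added dimension (the sample points are made up):

import numpy as np

pts = np.array([[0.5, 0.5], [2.0, 2.0], [-0.4, 0.6], [1.5, -1.8]])
x, y = pts[:, 0], pts[:, 1]
z = x**2 + y**2    # new third dimension: squared distance from the origin
print(z)           # inner points get small z, outer points large z, so a
                   # plane in (x, y, z) can now separate the two classes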
Q9. How does an Artificial Neural Network work? 10 Marks
Ans :-
Each layer works independently in its way to get the desired output and the
scenarios correspond to our conditions.
The difference between the actual output and the desired output is used to
calculate errors obtained in the result.
Error = actual output – desired output.
Q10. Explain the Need of Backpropagation and the Types of Backpropagation
Neural Network. 10 Marks.
Ans :-
Backpropagation is needed because it provides an efficient way to compute the
gradient of the loss function with respect to every weight and bias in the
network, so the weights can be updated to reduce the difference between the
actual output and the desired output. Without it, training multi-layer networks
would be computationally impractical.
Types of Backpropagation Neural Network :-
1. Static backpropagation: Maps a static input to a static output; used for
problems such as optical character recognition, where the mapping does not
change over time.
2. Recurrent backpropagation: Activations are fed forward repeatedly until a
fixed value is reached; the error is then computed and propagated backward.
It is used in networks with feedback connections.