
Deccan Education Society’s

WILLINGDON COLLEGE, SANGLI


B.Sc. Computer Science (Entire) Department

B.Sc. Part III

Subject – Machine Learning


B.Sc. Part III Computer Science Entire (Semester I)
Course Title: Machine Learning

3. Machine Learning Modelling


• ML modelling flow; how to treat data in ML
• Types of machine learning, performance measures
• Bias-Variance Trade-Off
• Overfitting & Underfitting, Bootstrap Sampling, Bagging, Aggregation
Types of machine learning
ML algorithms help to solve different business problems, such as regression,
classification, forecasting, clustering, and association.
Based on the methods and way of learning, machine learning is divided
into four main types:
• Supervised Machine Learning
• Unsupervised Machine Learning
• Semi-Supervised Machine Learning
• Reinforcement Learning
Types of machine learning
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on
supervision. In the supervised learning technique, we train the machines
using a "labelled" dataset, and based on the training, the machine
predicts the output. Here, labelled data means that some of the inputs
are already mapped to outputs. More precisely, we first train the machine
with inputs and their corresponding outputs, and then we ask the machine
to predict the output for the test dataset.
Types of machine learning
1. Supervised Machine Learning
Let’s understand it with the help of an example.
Example: Consider a scenario where you have to build an image
classifier to differentiate between cats and dogs. If you feed a dataset
of labelled dog and cat images to the algorithm, the machine will learn
to distinguish dogs from cats using these labelled images. When we
input a new dog or cat image that it has never seen before, it will use
the learned model to predict whether it is a dog or a cat. This is
how supervised learning works, and this particular task is image
classification.
Types of machine learning
1. Supervised Machine Learning
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of
problems, which are given below:
a) Classification b) Regression
a) Classification
Classification algorithms are used to solve classification problems, in
which the output variable is categorical, such as "Yes" or "No", "Male" or
"Female", "Red" or "Blue", etc. Classification algorithms predict the
categories present in the dataset. Some real-world examples of
classification problems are spam detection, email filtering, etc.
Some popular classification algorithms are given below:
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
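A minimal supervised classification sketch with scikit-learn (assuming it is installed; the iris dataset stands in for any labelled dataset, such as the cat/dog images above):

# A minimal supervised classification sketch using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)  # one of the algorithms listed above
model.fit(X_train, y_train)                # learn from labelled training data
print(model.score(X_test, y_test))         # accuracy on unseen test data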
Types of machine learning
1. Supervised Machine Learning
b) Regression
Regression algorithms are used to solve regression problems, in which
there is a relationship between the input and output variables. They are
used to predict continuous output variables, such as market trends,
weather measurements, etc.
Some popular Regression algorithms are given below:
• Simple Linear Regression Algorithm
• Multivariate Regression Algorithm
• Decision Tree Algorithm
• Lasso Regression
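A minimal simple linear regression sketch with scikit-learn (the noisy synthetic data is invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y is roughly 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=100)

model = LinearRegression()
model.fit(X, y)                      # fit a line to the training data
print(model.coef_, model.intercept_) # learned slope and intercept
print(model.predict([[5.0]]))        # continuous prediction for a new input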
Types of machine learning
1. Supervised Machine Learning
Advantages and Disadvantages of Supervised Learning
Advantages -
• Since supervised learning works with labelled datasets, we can have
an exact idea about the classes of objects.
• These algorithms are helpful in predicting the output on the basis of
prior experience.
Disadvantages -
• These algorithms are not able to solve complex tasks.
• It may predict the wrong output if the test data is different from the
training data.
• It requires lots of computational time to train the algorithm.
Types of machine learning
2. Unsupervised Machine Learning
Unsupervised learning is different from the supervised learning
technique: as its name suggests, there is no need for supervision. In
unsupervised machine learning, the machine is trained using an
unlabelled dataset, and it predicts the output without any supervision.
In unsupervised learning, the models are trained with the data that
is neither classified nor labelled, and the model acts on that data
without any supervision.
The main aim of an unsupervised learning algorithm is to group or
categorize the unsorted dataset according to similarities, patterns,
and differences. Machines are instructed to find the hidden patterns
in the input dataset.
Types of machine learning
2. Unsupervised Machine Learning

Let's take an example to understand it more precisely. Suppose
there is a basket of fruit images, and we input it into the machine
learning model. The images are totally unknown to the model, and the
task of the machine is to find the patterns and categories of the objects.
The machine will discover patterns and differences on its own, such as
differences in colour and shape, and predict the output when it is
tested with the test dataset.
Types of machine learning
2. Unsupervised Machine Learning
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which
are given below:
1) Clustering 2) Association
1) Clustering
The clustering technique is used when we want to find the inherent
groups from the data. It is a way to group the objects into a cluster such
that the objects with the most similarities remain in one group and have
fewer or no similarities with the objects of other groups. An example of
the clustering algorithm is grouping the customers by their purchasing
behaviour.
Some of the popular clustering algorithms are given below:
• K-Means clustering algorithm
• Mean-shift algorithm
• DBSCAN algorithm
• Principal Component Analysis
• Independent Component Analysis
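A minimal K-Means clustering sketch with scikit-learn (the two blobs of 2-D points are invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic unlabelled data: two loose blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # group points by similarity, no labels needed
print(kmeans.cluster_centers_)     # centre of each discovered group
print(labels[:10])                 # cluster assignment for the first few points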
Types of machine learning
2. Unsupervised Machine Learning
Categories of Unsupervised Machine Learning
2) Association
Association rule learning is an unsupervised learning technique that
finds interesting relations among variables within a large dataset.
The main aim of this learning algorithm is to find the dependency of one
data item on another data item and map those variables accordingly so
that it can generate maximum profit. This algorithm is mainly applied
in Market Basket analysis, Web usage mining, continuous production,
etc.
Some popular association rule learning algorithms are:
• Apriori algorithm
• Eclat algorithm
• FP-Growth algorithm
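A minimal hand-rolled sketch of the support and confidence measures that association rule algorithms such as Apriori are built on (the tiny transaction list is invented for illustration):

# Each transaction is one shopping basket (invented market-basket data).
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}]
n = len(transactions)

def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / n

# Rule: bread -> milk (how often milk appears when bread does).
antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)
print(rule_support, confidence)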
Types of machine learning
2. Unsupervised Machine Learning
Advantages and Disadvantages of Unsupervised Learning Algorithm
Advantages -
 These algorithms can be used for more complicated tasks than
supervised ones, because they work on unlabelled datasets.
 Unsupervised algorithms are preferable for various tasks, as obtaining
an unlabelled dataset is easier than obtaining a labelled one.
Disadvantages:
 The output of an unsupervised algorithm can be less accurate, as
the dataset is not labelled and the algorithm is not trained with the exact
output in advance.
 Working with unsupervised learning is more difficult, as it works
with unlabelled data that does not map to known outputs.
Types of machine learning
3. Semi-supervised Machine Learning
Semi-Supervised learning is a type of Machine Learning algorithm
that lies between Supervised and Unsupervised machine learning. It
represents the intermediate ground between Supervised (With Labelled
training data) and Unsupervised learning (with no labelled training data)
algorithms and uses the combination of labelled and unlabelled
datasets during the training period.
The main aim of semi-supervised learning is to effectively use all
the available data, rather than only the labelled data as in supervised
learning. Initially, similar data is clustered using an unsupervised
learning algorithm, and this clustering then helps to label the
unlabelled data.
Types of machine learning
3. Semi-supervised Machine Learning
We can understand these algorithms with an example. Supervised
learning is where a student is under the supervision of an instructor at
home and college. Further, if that student analyses the same concept
on their own, without any help from the instructor, it comes under
unsupervised learning. Under semi-supervised learning, the student
revises the concept on their own after studying it under the guidance
of an instructor at college.
Types of machine learning
3. Semi-supervised Machine Learning
Advantages and disadvantages of Semi-supervised Learning
Advantages:
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It is used to solve drawbacks of Supervised and Unsupervised
Learning algorithms.
Disadvantages:
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level data.
• Accuracy is low.
Types of machine learning
4. Reinforcement Machine Learning
Reinforcement learning works on a feedback-based process, in which
an AI agent (a software component) automatically explores its
surroundings by trial and error: taking actions, learning from experience,
and improving its performance. The agent is rewarded for each good
action and punished for each bad action; hence the goal of a
reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data as in supervised
learning; agents learn only from their experiences.
The reinforcement learning process is similar to how a human being
learns; for example, a child learns various things through experience in
day-to-day life. Playing a game is an example of reinforcement learning,
where the game is the environment, the agent's moves at each step define
states, and the goal of the agent is to get a high score. The agent
receives feedback in terms of punishments and rewards.
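A minimal tabular Q-learning sketch for a toy environment (the 5-state corridor and all constants are invented for illustration; this is not a full RL framework):

import random

# Toy corridor: states 0..4, reward 1 only for reaching state 4.
n_states, actions = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != 4:
        # Explore sometimes, otherwise pick the best-known action.
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0      # reward only at the goal
        # Q-learning update: move Q towards reward + discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# Learned policy: the best action in each non-goal state (should be +1, move right).
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(4)})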
Types of machine learning
4. Reinforcement Machine Learning
Due to its way of working, reinforcement learning is employed in
different fields such as game theory, operations research, information
theory, and multi-agent systems.
Categories of Reinforcement Learning
Reinforcement learning is categorized mainly into two types of
methods/algorithms:
Positive Reinforcement Learning - Positive reinforcement learning
increases the tendency that the required behaviour will occur again by
adding something rewarding. It enhances the strength of the agent's
behaviour and positively impacts it.
Negative Reinforcement Learning - Negative reinforcement learning
works in exactly the opposite way to positive RL. It increases the
tendency that a specific behaviour will occur again by avoiding a
negative condition.
Types of machine learning
4. Reinforcement Machine Learning
Advantages and Disadvantages of Reinforcement Learning
Advantages
• It helps in solving complex real-world problems that are difficult
to solve with general techniques.
• The learning model of RL is similar to how human beings learn;
hence highly accurate results can be obtained.
• It helps in achieving long-term results.
Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge data and computations.
• Too much reinforcement learning can lead to an overload of states
which can weaken the results.
Overfitting
• A statistical model is said to be overfitted when it does not make
accurate predictions on testing data. When a model is trained on too
much data, it starts learning from the noise and inaccurate entries in
the data set.
• Testing on test data then results in high variance: the model does
not categorize the data correctly, because of too many details and
noise.
• Common causes of overfitting are non-parametric and non-linear
methods, because these types of machine learning algorithms have
more freedom in building the model from the dataset and can
therefore build unrealistic models. A solution to avoid overfitting is
to use a linear algorithm if we have linear data, or to use parameters
such as the maximal depth if we are using decision trees.
Overfitting
Reasons for Overfitting
• High variance and low bias.
• The model is too complex.
• The size of the training data.

Techniques to Reduce Overfitting
• Improve the quality of the training data: this reduces overfitting by
focusing on meaningful patterns and mitigating the risk of fitting
noise or irrelevant features.
• Increase the training data: more data can improve the model's ability
to generalize to unseen data and reduce the likelihood of overfitting.
• Reduce model complexity.
• Stop training early: keep an eye on the loss over the training period
and, as soon as the validation loss begins to increase, stop training
(see the sketch below).
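A minimal, framework-agnostic early-stopping sketch (train_step and validation_loss are hypothetical placeholders for whatever training loop and held-out evaluation are in use):

# Stop once validation loss has not improved for `patience` consecutive epochs.
def train_with_early_stopping(train_step, validation_loss,
                              max_epochs=100, patience=5):
    best_loss, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()                 # one pass over the training data (placeholder)
        loss = validation_loss()     # loss on held-out validation data (placeholder)
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1          # validation loss is no longer improving
        if bad_epochs >= patience:
            print(f"Early stop at epoch {epoch}, best loss {best_loss:.4f}")
            break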
Underfitting
• A statistical model or a machine learning algorithm is said to
underfit when the model is too simple to capture the complexities of
the data. Underfitting represents the inability of the model to learn the
training data effectively, resulting in poor performance on both the
training and the testing data.

• In simple terms, an underfit model's predictions are inaccurate,
especially when applied to new, unseen examples.

• It mainly happens when we use a very simple model with overly
simplified assumptions. To address the underfitting problem, we need
to use more complex models, with enhanced feature representation
and less regularization.

Note that an underfitting model has high bias and low variance.
Underfitting
Reasons for Underfitting
• The model is too simple, so it may not be capable of representing
the complexities in the data.
• The input features used to train the model are not adequate
representations of the underlying factors influencing the target
variable.
• The size of the training dataset used is not enough.
• Excessive regularization is used to prevent overfitting, which
constrains the model from capturing the data well.
• Features are not scaled.

Techniques to Reduce Underfitting
• Increase model complexity.
• Increase the number of features.
• Remove noise from the data.
Bias-Variance Trade-Off

• Bias is the difference between the values predicted by the machine
learning model and the correct values. High bias gives a large error
on both training and testing data. It is recommended that an
algorithm always be low-biased, to avoid the problem of underfitting.

• The variability of model predictions for a given data point, which
tells us the spread of our predictions, is called the variance of the
model. A model with high variance fits the training data in a very
complex way and is thus unable to fit accurately on data it hasn't
seen before. As a result, such models perform very well on training
data but have high error rates on test data. When a model has high
variance, it is said to overfit the data.
Bias-Variance Trade-Off

While building a machine learning model, it is really important to take
care of bias and variance in order to avoid overfitting and underfitting.
If the model is very simple, with fewer parameters, it may have low
variance and high bias. Whereas, if the model has a large number of
parameters, it will have high variance and low bias. So, it is necessary
to strike a balance between the bias and variance errors, and this
balance is known as the bias-variance trade-off.

For accurate predictions, an algorithm needs both low variance and low
bias. But this is not fully achievable, because bias and variance are
related to each other:
• If we decrease the variance, it will increase the bias.
• If we decrease the bias, it will increase the variance.
Bias-Variance Trade-Off

The bias-variance trade-off is a central issue in supervised learning.
Ideally, we want a model that accurately captures the regularities in the
training data and simultaneously generalizes well to unseen data.
Unfortunately, doing both at once is not possible: a high-variance
algorithm may perform well on training data but may overfit to noisy
data, whereas a high-bias algorithm generates a much simpler model that
may not capture even the important regularities in the data. So, we need
to find a sweet spot between bias and variance to build an optimal model.
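A minimal sketch of the trade-off using polynomial fits of different degrees (NumPy only; the noisy sine data and the chosen degrees are invented for illustration):

import numpy as np

# Noisy samples of a sine curve; a train/test split exposes the trade-off.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 60)
y = np.sin(2 * x) + rng.normal(0, 0.2, 60)
x_tr, y_tr, x_te, y_te = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 4, 15):                     # underfit, balanced, overfit
    coeffs = np.polyfit(x_tr, y_tr, degree)   # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

Degree 1 has high bias (large error everywhere), while degree 15 has high variance (tiny training error, larger test error); a middle degree sits near the sweet spot.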
Bootstrapping
• In statistics and machine learning, bootstrapping is a resampling
technique that involves repeatedly drawing samples from our source
data with replacement, often to estimate a population parameter. By
“with replacement”, we mean that the same data point may be
included in our resampled dataset multiple times. It is used to
determine various parameters of a population.
• A bootstrap plot is a graphical representation of the distribution of a
statistic calculated from a sample of data. It is often used to visualize
the variability and uncertainty of a statistic, such as the mean or
standard deviation, by showing the distribution of the statistic over
many bootstrapped samples of the data.
• The bootstrap plot is a powerful tool for understanding the
uncertainty in a statistic, especially when the underlying distribution
of the data is unknown or complex. It can also be used to generate
confidence intervals for a statistic and to compare the distributions of
different statistics.
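A minimal bootstrap sketch with NumPy, estimating a 95% confidence interval for the mean of a small invented sample:

import numpy as np

rng = np.random.default_rng(0)
data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4])  # invented sample

# Resample with replacement many times and record the statistic each time.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])

print("mean estimate:", data.mean())
print("95% CI:", np.percentile(boot_means, [2.5, 97.5]))   # percentile interval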
Bootstrapping

Advantages of bootstrap:
• It is a non-parametric method, which means it does not require
any assumptions about the underlying distribution of the data.
• It can be used to estimate standard errors and confidence intervals
for a wide range of statistics.
• It can be used to estimate the uncertainty of a statistic even when
the sample size is small.
• It can be used to perform hypothesis tests and compare the
distributions of different statistics.
• It is widely used in many fields such as statistics, finance, and
machine learning.
Bootstrapping

Disadvantages of bootstrap:
• It can be computationally intensive, especially when working with
large datasets.
• It may not be appropriate for all types of data, such as highly
skewed or heavy-tailed distributions.
• It may not be appropriate for estimating the uncertainty of
statistics that have very large variances.
• It may not be appropriate for estimating the uncertainty of
statistics that are not smooth or have very different variances.
• It may not always be a good substitute for other statistical
methods when large sample sizes are available.
Bagging

• Bagging, also known as bootstrap aggregation, is an ensemble
learning method that is commonly used to reduce variance within a
noisy data set. In bagging, random samples of the training set are
selected with replacement, meaning that individual data points can be
chosen more than once.

• Bootstrap aggregating is a machine learning ensemble meta-algorithm
designed to improve the stability and accuracy of machine learning
algorithms used in statistical classification and regression. It
decreases the variance and helps to avoid overfitting. It is usually
applied to decision tree methods.
Bagging
Implementation Steps of Bagging
Step 1: Multiple subsets are created from the original data set with an
equal number of tuples, selecting observations with replacement.
Step 2: A base model is created on each of these subsets.
Step 3: Each model is learned in parallel on its own training set,
independently of the others.
Step 4: The final prediction is determined by combining the predictions
from all the models.
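A minimal bagging sketch with scikit-learn, using decision trees as the base model (the breast-cancer dataset is a toy stand-in; the estimator keyword assumes a recent scikit-learn, older versions call it base_estimator):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 trees, each trained on a bootstrap sample; predictions are combined by vote.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))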

Aggregation

In its simplest form, data aggregation is the process of compiling
typically large amounts of information from a given database and
organizing it into a more consumable and comprehensible medium.
Data aggregation can be applied at any scale, from pivot tables to data
lakes, in order to summarize information and draw conclusions from
data-rich findings.
Because of the growing accessibility of information and the importance
of personalization metrics across the enterprise, the application of data
aggregation has become extremely relevant.
Aggregation

In our technologically advanced world, data is constantly evolving,
expanding, and becoming more convoluted with each actioned input
and output. Data is one of the most valuable currencies of our time, but
data without organization, segmentation, and understanding is
essentially useless.
What makes data valuable is the extraction of insights that point to
key trends and results and give a better understanding of the information
at hand. As a process in which data is searched, gathered, and presented
in a summarized, report-based form, data aggregation helps organizations
achieve specific business objectives or conduct process and human
analysis at almost any scale.
Performance Measures in Machine Learning

Evaluating the performance of a machine learning model is one of the
important steps in building an effective ML model. To evaluate the
performance or quality of a model, different metrics are used, and these
metrics are known as performance metrics or evaluation metrics. These
performance metrics help us understand how well our model has performed
on the given data, and using them we can improve the model's performance
by tuning the hyper-parameters. Every ML model aims to generalize well on
unseen/new data, and performance metrics help determine how well the
model generalizes to a new dataset.
To evaluate the performance of a classification model, different metrics
are used, some of which are as follows:
• Accuracy
• Confusion Matrix
• Precision
• Recall
• F-Score
• AUC-ROC (Area Under the ROC Curve)
Performance Measures in Machine Learning

I. Accuracy
The accuracy metric is one of the simplest classification metrics to
implement. It is defined as the number of correct predictions divided by
the total number of predictions.

It can be formulated as:

Accuracy = (Number of correct predictions) / (Total number of predictions)

To implement an accuracy metric, we can compare the ground-truth and
predicted values in a loop (see the sketch below).
It is good to use the accuracy metric when the target variable classes
in the data are approximately balanced. For example, if 60% of the images
in a fruit dataset are of apples and 40% are of mangoes, the classes are
reasonably balanced, and a high accuracy score (say, 97%) is a meaningful
summary of how well the model distinguishes apples from mangoes.
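A minimal sketch of computing accuracy by comparing ground truth and predictions in a loop (the two label lists are invented for illustration):

# Invented ground-truth labels and model predictions.
y_true = ["apple", "mango", "apple", "apple", "mango", "apple"]
y_pred = ["apple", "apple", "apple", "apple", "mango", "mango"]

correct = 0
for truth, pred in zip(y_true, y_pred):
    if truth == pred:
        correct += 1                 # count correct predictions

accuracy = correct / len(y_true)     # correct / total
print(f"Accuracy: {accuracy:.2f}")   # 4 of 6 correct -> 0.67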
Performance Measures in Machine Learning

II. Confusion Matrix

A confusion matrix is a tabular representation of the prediction
outcomes of a binary classifier, used to describe the performance of the
classification model on a set of test data for which the true values are
known. The confusion matrix is simple to implement, but the terminology
used in it might be confusing for beginners.
A typical confusion matrix for a binary classifier has the layout below
(it can be extended to classifiers with more than two classes):

                     Predicted Positive    Predicted Negative
Actual Positive      True Positive (TP)    False Negative (FN)
Actual Negative      False Positive (FP)   True Negative (TN)
Performance Measures in Machine Learning

II. Confusion Matrix

In general, the table is divided into four terms, which are as follows:
True Positive (TP) - the model predicted positive, and the actual value
is also positive.
True Negative (TN) - the model predicted negative, and the actual value
is also negative.
False Positive (FP) - the model predicted positive, but the actual value
is negative.
False Negative (FN) - the model predicted negative, but the actual value
is positive.
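A minimal confusion matrix sketch with scikit-learn (the label arrays are invented for illustration):

from sklearn.metrics import confusion_matrix

# Invented binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))   # [[3 1] [1 3]] for these labels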
Performance Measures in Machine Learning

III. Precision

The precision metric is used to overcome a limitation of accuracy.
Precision determines the proportion of positive predictions that were
actually correct. It is calculated as the number of true positives
divided by the total number of positive predictions (true positives plus
false positives):

Precision = TP / (TP + FP)
Performance Measures in Machine Learning

IV. Recall or Sensitivity

Recall is similar to the precision metric; however, it aims to calculate
the proportion of actual positives that were identified correctly. It is
calculated as the number of true positives divided by the total number of
actual positives, whether correctly predicted as positive or incorrectly
predicted as negative (true positives plus false negatives).
The formula for calculating recall is:

Recall = TP / (TP + FN)

In simple words, if we maximize precision, it will minimize the FP errors,
and if we maximize recall, it will minimize the FN errors.
Performance Measures in Machine Learning

V. F-Score

The F-score, or F1 score, is a metric to evaluate a binary classification
model on the basis of the predictions made for the positive class. It is
calculated from precision and recall and provides a single score that
represents both. The F1 score is the harmonic mean of precision and
recall, assigning equal weight to each of them.
The formula for calculating the F1 score is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
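A minimal sketch computing precision, recall, and F1 by hand from the confusion-matrix counts above, checked against scikit-learn (same invented labels as before):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# From the confusion matrix above: TP = 3, FP = 1, FN = 1.
tp, fp, fn = 3, 1, 1
precision = tp / (tp + fp)                          # 0.75
recall = tp / (tp + fn)                             # 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, recall, f1)
print(precision_score(y_true, y_pred),              # matches the hand computation
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))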
Performance Measures in Machine Learning

VI. AUC-ROC
Sometimes we need to visualize the performance of a classification
model on a chart; for this, we can use the AUC-ROC curve. It is one of
the most popular and important metrics for evaluating the performance of
a classification model.
Firstly, let's understand the ROC (Receiver Operating Characteristic)
curve. ROC is a graph showing the performance of a classification model
at different threshold levels. The curve is plotted between two
parameters:
• True Positive Rate (TPR)
• False Positive Rate (FPR)
TPR, or True Positive Rate, is a synonym for recall, and hence can be
calculated as:

TPR = TP / (TP + FN)

Similarly, the False Positive Rate is FPR = FP / (FP + TN). AUC, the Area
Under the ROC Curve, summarizes the whole curve as a single number.
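A minimal AUC-ROC sketch with scikit-learn (the true labels and predicted positive-class probabilities are invented for illustration):

from sklearn.metrics import roc_auc_score, roc_curve

# Invented true labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
print("AUC:", roc_auc_score(y_true, y_score))      # area under that curve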
Performance Metrics for Regression
Regression is a supervised learning technique that aims to find the
relationships between the dependent and independent variables. A
predictive regression model predicts a numeric, continuous value. The
metrics used for regression are different from the classification
metrics: we cannot use the accuracy metric (explained above) to evaluate
a regression model. Instead, the performance of a regression model is
reported as errors in the prediction. The following popular metrics are
used to evaluate the performance of regression models:
• Mean Absolute Error
• Mean Squared Error
• R2 Score
• Adjusted R2
Performance Metrics for Regression
I. Mean Absolute Error (MAE)
Mean Absolute Error, or MAE, is one of the simplest metrics. It measures
the absolute difference between actual and predicted values, where
absolute means taking each difference as a positive number.
To understand MAE, let's take the example of linear regression, where
the model draws a best-fit line between the dependent and independent
variables. To measure the error in prediction, we calculate the
difference between the actual and predicted values. To obtain the error
for the complete dataset, we take the mean of the absolute differences
over the whole dataset.
The formula used to calculate MAE is:

MAE = (1/N) × Σ |Y − Y'|
Performance Metrics for Regression
I. Mean Absolute Error (MAE)

Here, Y is the actual outcome, Y' is the predicted outcome, and N is the
total number of data points.
MAE is much more robust to outliers. One limitation of MAE is that it is
not differentiable at zero, which complicates gradient-based optimizers
such as gradient descent. To overcome this limitation, another metric can
be used: Mean Squared Error, or MSE.
Performance Metrics for Regression
II. Mean Squared Error
Mean Squared Error, or MSE, is one of the most suitable metrics for
regression evaluation. It measures the average of the squared differences
between the values predicted by the model and the actual values. Since
the errors in MSE are squared, it only takes non-negative values, and it
is usually positive and non-zero.
Moreover, because the differences are squared, it penalizes large errors
heavily, and hence it can over-estimate how bad the model is.
MSE is a much-preferred metric compared to other regression metrics, as
it is differentiable and hence easier to optimize.
The formula for calculating MSE is:

MSE = (1/N) × Σ (Y − Y')²

Here, Y is the actual outcome, Y' is the predicted outcome, and N is the
total number of data points.
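A minimal sketch computing MAE and MSE with NumPy (the actual and predicted values are invented for illustration):

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # invented actual outcomes
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # invented model predictions

mae = np.mean(np.abs(y_true - y_pred))    # mean of absolute differences
mse = np.mean((y_true - y_pred) ** 2)     # mean of squared differences
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")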
Performance Metrics for Regression
III. R Squared Error
R-squared error is also known as the Coefficient of Determination, and
it is another popular metric for regression model evaluation. The
R-squared metric enables us to compare our model with a constant
baseline and thereby judge the performance of the model. To create the
constant baseline, we take the mean of the target values and draw a
horizontal line at that mean.
The R-squared score is always less than or equal to 1, regardless of
whether the values are large or small.
Performance Metrics for Regression
IV. Adjusted R Squared
Adjusted R-squared, as the name suggests, is an improved version of
R-squared error. R-squared has the limitation that its score improves as
more terms are added, even if the model is not actually improving, which
may mislead data scientists.
To overcome this issue, adjusted R-squared is used, which is always lower
than R². This is because it adjusts for the number of predictors and only
shows an improvement when there is a real improvement.
We can calculate the adjusted R-squared as follows:

Adjusted R² = 1 − [(1 − R²)(N − 1) / (N − k − 1)]

where N is the number of data points and k is the number of independent
variables (predictors).
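A minimal sketch computing R² and adjusted R² (scikit-learn for R²; the adjusted value follows the formula above; the data and the assumed predictor count are invented for illustration):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.2, 6.1])  # invented actual values
y_pred = np.array([2.8, 5.3, 2.9, 6.5, 4.0, 5.8])  # invented predictions

n, k = len(y_true), 2          # sample size and (assumed) number of predictors
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)       # always <= R²
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")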
