
Machine Learning

Unit 4

Overview of Machine Learning concepts – Overfitting and train/test splits
Types of Machine Learning – Supervised, Unsupervised, Reinforcement learning

Vinay S. Prabhavalkar
Application of Machine Learning

1. Social Media Features

• Social media platforms use machine learning algorithms to create attractive and useful features. For instance, Facebook records your activities, chats, likes, comments, and the time you spend on specific kinds of posts. Machine learning then learns from this behavior and makes friend and page suggestions for your profile.

Application of Machine Learning

2. Product Recommendations

• Product recommendation is one of the most popular and well-known applications of machine learning, and it is now a standard feature of almost every e-commerce website. Using machine learning and AI, websites track your behavior based on your previous purchases, search patterns, and cart history, and then make product recommendations.

Application of Machine Learning

3. Image Recognition

• Image recognition, an approach for cataloging and detecting a feature or an object in a digital image, is one of the most significant and notable machine learning and AI techniques. It is used for further analysis such as pattern recognition, face detection, and face recognition.

Application of Machine Learning

4. Sentiment Analysis
• Sentiment analysis is one of the most important applications of machine learning. It is a real-time application that determines the emotion or opinion of a speaker or writer. For instance, if someone has written a review or an email (or any other document), a sentiment analyzer will identify the actual sentiment and tone of the text. Sentiment analysis can be used for review-based websites, decision-making applications, etc.

Application of Machine Learning
5. Virtual Personal Assistants
As the name suggests, Virtual Personal Assistants help users find useful information when asked via text or voice. A few of the major machine learning techniques used here are:
Speech Recognition
Speech-to-Text Conversion
Natural Language Processing
Text-to-Speech Conversion

Application of Machine Learning
6. Self-Driving Cars
• This is one of the most exciting applications of Machine Learning, and it is already in use on the road. Machine Learning plays a very important role in self-driving cars, and Tesla is the best-known leader in this business. Tesla's current Artificial Intelligence is driven by hardware from manufacturer NVIDIA and is based on unsupervised learning algorithms.
• NVIDIA has stated that they did not train the model to detect people or any specific object. The model works on Deep Learning and crowdsources data from all of the vehicles and their drivers, using internal and external sensors that are part of the IoT. According to data gathered by McKinsey, automotive data will hold a tremendous value of about $750 billion.

Application of Machine Learning
7. Entertainment
Companies such as Netflix, Amazon, YouTube, and Spotify provide relevant movie, song, and video recommendations to enhance the customer experience.
This is largely thanks to Deep Learning. Based on a person's browsing history, interests, and behavior, online streaming companies give suggestions that help users make product and service choices.
Deep learning techniques are also used to add sound to silent movies and to generate subtitles automatically.

What is Machine Learning
• Definition of Machine Learning

• Machine learning is an application of AI that enables systems to learn and improve from experience without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.

• Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving their accuracy.

• Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that gives machines the ability to automatically learn from data and past experience to identify patterns and make predictions with minimal human intervention.

What is Machine Learning
• Definition of Machine Learning

• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

• Task (T): The main problem we are interested in solving. The task can involve predictions, recommendations, estimations, etc.
• Experience (E): Learning from historical or past data, which is used to estimate and solve future tasks.
• Performance (P): The capacity of the machine to solve the machine learning task and provide the best outcome. The appropriate performance measure depends on the type of machine learning problem.
• For example, in email spam filtering, T is classifying emails as spam or not spam, E is a corpus of emails labeled by users, and P is the fraction of emails classified correctly.

Terminology in Machine Learning
• An Algorithm is a set of rules that a machine follows to achieve a particular goal.
• Machine Learning is a set of methods that allow computers to learn from data to make and improve predictions (for example cancer, weekly sales, credit default).
• A Learner or Machine Learning Algorithm is the program used to learn a machine learning model from data. Another name is "inducer" (e.g. "tree inducer").
• A Machine Learning Model is the learned program that maps inputs to predictions. A trained machine is also called a Model.
• A Dataset is a table with the data from which the machine learns. An Instance is a row in the dataset. A Feature is a column in the dataset.
• The Target is the information the machine learns to predict. In mathematical formulas, the target is usually called y.
• The Prediction is what the machine learning model "guesses" the target value should be, based on the given features.
• A Training Set is used to train a machine.
• A Test Set is used to evaluate an already trained machine.
• A Validation Set is used to tune and verify a machine during training, before the final evaluation on the test set.
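To make these terms concrete, here is a minimal sketch (assuming pandas and scikit-learn are available; the column names and values are purely hypothetical) that builds a small dataset, separates features from the target, and splits it into training, validation, and test sets:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# A tiny hypothetical dataset: each row is an instance, each column a feature,
# and "default" is the target (y) the model should learn to predict.
data = pd.DataFrame({
    "age":     [25, 40, 31, 52, 46, 23, 36, 58, 29, 44],
    "income":  [30, 80, 45, 90, 70, 28, 55, 95, 38, 65],   # in thousands
    "default": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
})

X = data[["age", "income"]]   # features
y = data["default"]           # target

# First split off a test set, then carve a validation set out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 6 training, 2 validation, 2 test instances
```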
https://christophm.github.io/interpretable-ml-book/terminology.html
Terminology in Machine Learning
• Bias is the gap between the value predicted by the model and the actual or target value.
• Variance tells how scattered the predicted values are around their mean.
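As a rough numerical sketch of these two quantities (the prediction and target arrays below are hypothetical), bias can be estimated as the average gap between predictions and actual values, and variance as the spread of the predictions:

```python
import numpy as np

# Hypothetical predictions from a model and the corresponding true target values.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([3.5, 4.5, 8.0, 8.5, 12.0])

bias = np.mean(y_pred - y_true)   # average gap between predicted and actual values
variance = np.var(y_pred)         # how scattered the predictions are

print(f"bias = {bias:.2f}, variance = {variance:.2f}")
```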

https://christophm.github.io/interpretable-ml-book/terminology.html
Types of Machine Learning

1. Supervised learning:

Supervised learning is applicable when a machine has sample data, i.e., input data together with output data carrying correct labels. The correct labels are used to check the correctness of the model during training.
The supervised learning technique helps us predict future events with the help of past experience and labeled examples. It first analyses the known training dataset, and then it infers a function that makes predictions about output values. It can also detect errors during the learning process and correct them through the learning algorithm.

Example: Let's assume we have a set of images, each tagged as "dog" or "not dog". A machine learning algorithm trained on these labeled images can then distinguish whether a new image contains a dog or not.
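A minimal supervised-learning sketch (assuming scikit-learn; the 2-D feature vectors and labels are hypothetical stand-ins for real image features) that fits a classifier on labeled examples and predicts the label of a new one:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D feature vectors (e.g. extracted from images) with correct labels:
# 1 = "dog", 0 = "not dog".
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.85, 0.75], [0.15, 0.25]]
y_train = [1, 1, 0, 0, 1, 0]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)            # learn from the labeled examples

print(model.predict([[0.7, 0.9]]))     # predicted label for a new, unseen example -> [1]
```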

Types of Machine Learning

2. Unsupervised Learning:

As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised with a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain when learning new things.
Unsupervised learning cannot be applied directly to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data.

Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on labels for this dataset, which means it has no prior idea about the categories in the data. The task of the unsupervised learning algorithm is to identify the image features on its own. It will perform this task by clustering the image dataset into groups according to the similarities between images.
Types of Machine Learning

2. Unsupervised Learning:
Here, we take unlabeled input data, which means it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. The model first interprets the raw data to find the hidden patterns in the data and then applies a suitable clustering algorithm such as k-means clustering or hierarchical clustering.
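A minimal unsupervised-learning sketch (assuming scikit-learn; the 2-D points are hypothetical stand-ins for image features) in which k-means groups unlabeled data into clusters on its own:

```python
from sklearn.cluster import KMeans

# Unlabeled data: no target column, only features.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],      # one natural group
     [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]]      # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)                 # cluster IDs discovered from the data

print(labels)    # e.g. [0 0 0 1 1 1] (cluster numbering may differ)
```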

Types of Machine Learning
3. Reinforcement Learning:

Reinforcement Learning is a feedback-based machine learning technique.

In this type of learning, agents (computer programs) need to explore the environment, perform actions, and, on the basis of their actions, receive rewards as feedback.

For each good action, they get a positive reward, and for each bad action, they get a negative reward.

The goal of a Reinforcement Learning agent is to maximize the cumulative positive reward.

Since there is no labeled data, the agent is bound to learn only from its own experience.
Types of Machine Learning
3. Reinforcement Learning:

An RL problem can be best explained through games. Take the game of Pac-Man, where the goal of the agent (Pac-Man) is to eat the food in the grid while avoiding the ghosts on its way. In this case, the grid world is the interactive environment in which the agent acts. The agent receives a reward for eating food and a punishment if it gets killed by a ghost (loses the game). The states are the locations of the agent in the grid world, and maximizing the total cumulative reward corresponds to the agent winning the game.
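A minimal reinforcement-learning sketch, using tabular Q-learning on a tiny hypothetical 1-D grid world rather than the actual Pac-Man game, to show how an agent can learn from rewards alone:

```python
import random

# Hypothetical 1-D grid world: states 0..4, food (reward +10) at state 4.
# Actions: 0 = move left, 1 = move right. Each step costs -1.
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.1        # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q-table: Q[state][action]

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        action = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

print([q.index(max(q)) for q in Q])   # learned policy: states 0-3 should prefer "move right" (1)
```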

Types of Machine Learning

4. Semi-supervised Learning:

Semi-supervised learning is an intermediate technique between supervised and unsupervised learning. It operates on datasets that contain a small number of labeled examples together with a larger amount of unlabeled data. Because labels are costly to obtain, relying mostly on unlabeled data reduces the cost of building the machine learning model, while the few available labels (often collected for corporate purposes) still help increase its accuracy and performance.

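A minimal semi-supervised sketch (assuming scikit-learn; the points are hypothetical), where only two examples are labeled, the rest are marked -1, and the algorithm propagates labels to the unlabeled points:

```python
from sklearn.semi_supervised import LabelPropagation

# A small dataset where only two points are labeled; -1 marks unlabeled examples.
X = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
     [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
y = [0, -1, -1, 1, -1, -1]

model = LabelPropagation()
model.fit(X, y)                      # labels spread from the few labeled points

print(model.transduction_)           # inferred labels for all points, e.g. [0 0 0 1 1 1]
```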

Machine Learning Problem Categories

1. Classification:

Classification is a task that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain. An easy-to-understand example is classifying emails as "spam" or "not spam."

The main types of classification are:
1. Classification Predictive Modelling
2. Binary Classification
3. Multi-Class Classification
4. Multi-Label Classification
5. Imbalanced Classification

Machine Learning Problem Categories

1. Classification:

1. Classification Predictive Modelling: In machine learning, classification refers to a predictive modeling problem where a class label is predicted for a given example of input data.

Examples of classification problems include:
i. Given an example, classify whether it is spam or not.
ii. Given a handwritten character, classify it as one of the known characters.
iii. Given recent user behavior, classify whether the user will churn or not.

Machine Learning Problem Categories
1. Classification:
2. Binary Classification: It refers to those classification tasks that have two
class labels.
Examples include:
i. Email spam detection (spam or not).
ii. Churn prediction (churn or not).
iii. Conversion prediction (buy or not).

• Typically, binary classification tasks involve one class that is the normal state and another class that is the abnormal state.
• The class label in such problems is commonly modeled with a Bernoulli probability distribution.
• Popular algorithms that can be used for binary classification include:

▪ Logistic Regression
▪ k-Nearest Neighbors
▪ Decision Trees
▪ Support Vector Machine
▪ Naive Bayes
Machine Learning Problem Categories
1. Classification:
3. Multi-Class Classification: It refers to those classification tasks that have
more than two class labels.
Examples include:
i. Face classification.
ii. Plant species classification.
iii. Optical character recognition.

• Unlike binary classification, multi-class classification does not have the notion of normal and abnormal outcomes. Instead, examples are classified as belonging to one of a range of known classes.

• The number of class labels may be very large for some problems. For example, a model may predict that a photo belongs to one among thousands or tens of thousands of faces in a face recognition system.

Machine Learning Problem Categories
• The class labels in such problems are commonly modeled with a Multinoulli (categorical) probability distribution.
• Popular algorithms that can be used for multi-class classification include:
i. k-Nearest Neighbors.
ii. Decision Trees.
iii. Naive Bayes.
iv. Random Forest.
v. Gradient Boosting.

Machine Learning Problem Categories
1. Classification:
4. Multi-Label Classification: It refers to those classification tasks that have
two or more class labels, where one or more class labels may be
predicted for each example.

• Consider the example of photo classification, where a given photo may have multiple objects in the scene and a model may predict the presence of multiple known objects in the photo, such as "bicycle," "apple," "person," etc.

• Special multi-label versions of the algorithms given below are used for classification (see the sketch after this list):
i. Multi-label Decision Trees
ii. Multi-label Random Forests
iii. Multi-label Gradient Boosting
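A minimal multi-label sketch (assuming scikit-learn; the feature vectors and tag sets are hypothetical), where each example can receive several labels at once by fitting one decision tree per label column:

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical photo features; each target row is a set of binary tags:
# columns = [contains_bicycle, contains_apple, contains_person]
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8], [0.5, 0.5]]
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1]]

model = MultiOutputClassifier(DecisionTreeClassifier(random_state=0))
model.fit(X, Y)                       # fits one decision tree per label column

print(model.predict([[0.85, 0.15]]))  # e.g. [[1 0 1]] -> bicycle and person, no apple
```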

Machine Learning Problem Categories
1. Classification:
5. Imbalanced Classification: It refers to classification tasks where the
number of examples in each class is unequally distributed.
• Typically, imbalanced classification tasks are binary classification tasks
where the majority of examples in the training dataset belong to the
normal class and a minority of examples belong to the abnormal class.
Examples include:
i. Fraud detection.
ii. Outlier detection.
iii. Medical diagnostic tests
• Specialized techniques may be used to change the composition of samples in
the training dataset by under-sampling the majority class or oversampling
the minority class.
Examples include:
i. Random Under-sampling
ii. SMOTE Oversampling
Machine Learning Problem Categories
• Specialized modeling algorithms may also be used that pay more attention to the minority class when fitting the model on the training dataset, such as cost-sensitive machine learning algorithms. Examples include:
i. Cost-sensitive Logistic Regression
ii. Cost-sensitive Decision Trees
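A minimal cost-sensitive sketch (assuming scikit-learn; the imbalanced toy data is hypothetical), where class_weight='balanced' makes logistic regression pay more attention to the rare abnormal class:

```python
from sklearn.linear_model import LogisticRegression

# Imbalanced hypothetical data: 8 normal examples (0) and only 2 abnormal ones (1).
X = [[0.10], [0.20], [0.15], [0.30], [0.25], [0.18], [0.22], [0.28],
     [0.90], [0.95]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# class_weight='balanced' re-weights errors so the minority class is not ignored.
model = LogisticRegression(class_weight="balanced")
model.fit(X, y)

print(model.predict([[0.85]]))   # expected to flag this as the abnormal class -> [1]
```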

Machine Learning Problem Categories
2. Clustering:

• Clustering or cluster analysis is a machine learning technique which groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has little or no similarity with another group."
• It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, or behavior, and divides the data according to the presence or absence of those patterns.
• It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabeled dataset.
• After applying the clustering technique, each cluster or group is given a cluster ID. The ML system can use this ID to simplify the processing of large and complex datasets.
• The clustering technique is commonly used for statistical data analysis.
Machine Learning Problem Categories
2. Clustering:
Example: Let's understand the clustering technique with the real-world example of
Mall:
• When we visit a shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another, and in the vegetable section apples, bananas, mangoes, etc. are grouped separately, so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents according to topic.
• The clustering technique can be used in a wide variety of tasks. Some of the most common uses of this technique are:
– Market Segmentation
– Statistical data analysis
– Social network analysis
– Image segmentation
– Anomaly detection, etc.
• Apart from these general usages, clustering is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
Machine Learning Problem Categories
2. Clustering:

The diagram on the original slide illustrates the working of a clustering algorithm: different fruits are divided into several groups with similar properties.

Machine Learning Problem Categories
2. Clustering:

• Types of Clustering Methods: Clustering methods are broadly divided into hard clustering (each data point belongs to only one group) and soft clustering (a data point can also belong to other groups). Beyond this, several other approaches to clustering exist.

• Below are the main clustering methods used in machine learning:
a. Partitioning Clustering
b. Density-Based Clustering
c. Distribution Model-Based Clustering
d. Hierarchical Clustering
e. Fuzzy Clustering

Machine Learning Problem Categories
2. Clustering:
a. Partitioning Clustering:-

• Partitioning clustering divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm.
• In this type, the dataset is divided into a set of k groups, where k defines the number of pre-defined groups. The cluster centers are chosen in such a way that the distance between the data points and their own cluster centroid is minimal compared to the distance to other cluster centroids.
Machine Learning Problem Categories
2. Clustering:
b. Density-Based Clustering:-

• The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters are formed as long as the dense regions can be connected. The algorithm does this by identifying different clusters in the dataset and connecting areas of high density into clusters. The dense areas in data space are separated from each other by sparser areas.
• These algorithms can have difficulty clustering the data points if the dataset has varying densities or high dimensionality.
Machine Learning Problem Categories
2. Clustering:

c. Distribution Model-Based Clustering:-

• In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distributions, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).

Machine Learning Problem Categories
2. Clustering:
d. Hierarchical Clustering:-

• Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created.
• In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram.
• Any number of clusters can then be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical clustering algorithm.
Machine Learning Problem Categories
2. Clustering:

e. Fuzzy Clustering:-
• Fuzzy clustering is a soft clustering method in which a data object may belong to more than one group or cluster.
• Each data point has a set of membership coefficients, which depend on its degree of membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.

Machine Learning Problem Categories
2. Clustering:

Clustering Algorithms:-

• K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It partitions the dataset into clusters of roughly equal variance. The number of clusters must be specified in advance. It is fast, requiring relatively few computations, with linear complexity O(n).
• Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in a smooth density of data points. It is an example of a centroid-based model that works by updating candidate centroids to be the mean of the points within a given region.
• DBSCAN algorithm: DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based model similar to mean-shift, but with some notable advantages. In this algorithm, areas of high density are separated by areas of low density, so clusters can be found in any arbitrary shape.
• Expectation-Maximization clustering using GMM: This algorithm can be used as an alternative to k-means, or for cases where k-means fails. In GMM, the data points are assumed to be Gaussian distributed.
• Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs bottom-up hierarchical clustering. Each data point is initially treated as a single cluster, and clusters are then successively merged. The cluster hierarchy can be represented as a tree structure.
• Affinity Propagation: This algorithm differs from other clustering algorithms in that it does not require the number of clusters to be specified. Each data point exchanges messages with other data points until convergence. Its O(N²T) time complexity is the main drawback of this algorithm.
Machine Learning Problem Categories
3. Regression:
• Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes with respect to one independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
• We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A which runs various advertisements every year and gets sales in return. The list on the original slide shows the advertisement spend made by the company in each of the last 5 years and the corresponding sales.

Now the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction of its sales for this year. To solve this kind of prediction problem in machine learning, we need regression analysis.
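A minimal regression sketch of this scenario (assuming scikit-learn; the advertisement/sales figures are purely hypothetical, since the original table is not reproduced here), fitting a line to past data and predicting the sales for a $200 advertisement spend:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical past data: advertisement spend (in $) and the sales achieved.
ad_spend = [[90], [120], [150], [100], [130]]
sales    = [1000, 1300, 1800, 1200, 1380]

model = LinearRegression()
model.fit(ad_spend, sales)        # learn the linear relationship

print(model.predict([[200]]))     # predicted sales for a $200 advertisement
```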
Machine Learning Problem Categories
3. Regression:
• Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining cause-and-effect relationships between variables.

• In regression, we plot a graph between the variables that best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, regression finds a line or curve through the data points on the target-predictor graph such that the vertical distance between the data points and the regression line is minimized. This distance between the data points and the line tells us whether the model has captured a strong relationship or not.

• Some examples of regression are:
– Prediction of rain using temperature and other factors
– Determining market trends
– Prediction of road accidents due to rash driving

Machine Learning Problem Categories
3. Regression:
Terminologies Related to the Regression Analysis:
• Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors which affect the dependent variable, or which are used to predict its values, are called independent variables, also known as predictors.
• Outliers: An outlier is an observation with a very low or very high value in comparison to the other observed values. An outlier may distort the result, so it should be handled carefully.
• Multicollinearity: If the independent variables are highly correlated with each other, this condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well on the training dataset but not on the test dataset, the problem is called overfitting. If our algorithm does not perform well even on the training dataset, the problem is called underfitting.

Machine Learning Problem Categories
4. Optimization:
• Optimization is the problem of finding a set of inputs to an objective function that results in a maximum or minimum function evaluation.
• Optimization, in simple terms, is a mechanism to make something better, or to define the context of a solution that makes it the best.
• Consider a production scenario: assume there are two machines that produce the desired product:
– one machine requires more energy to run production at high speed but needs less raw material;
– the other requires more raw material but less energy to produce the same output in the same time.
• It is important to understand the patterns in the output based on the variation in inputs; the combination that gives the highest profit is probably the one the production manager wants to know.
• As an analyst, one needs to identify the best possible way to distribute production between the machines so as to obtain the highest profit.

Machine Learning Problem Categories
4. Optimization:
• A graph plotted on the original slide for the various distribution options between the two machines shows a point of highest profit. Identifying this point is the goal of this technique.
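A rough sketch of this idea (the profit function and its coefficients are entirely hypothetical) that searches over possible splits of production between the two machines and reports the split with the highest profit:

```python
import numpy as np

def profit(share_machine_1: np.ndarray) -> np.ndarray:
    """Hypothetical profit for giving a fraction of production to machine 1.

    Machine 1: cheap on raw material, expensive on energy.
    Machine 2: cheap on energy, expensive on raw material.
    """
    s = share_machine_1
    revenue = 100.0                      # same total output either way
    energy_cost = 60.0 * s**2            # machine 1's energy cost grows quickly
    material_cost = 40.0 * (1 - s)**2    # machine 2's raw-material cost grows quickly
    return revenue - energy_cost - material_cost

shares = np.linspace(0.0, 1.0, 101)      # candidate splits: 0%, 1%, ..., 100%
profits = profit(shares)
best = shares[np.argmax(profits)]        # split with the highest profit

print(f"best share for machine 1: {best:.2f}, profit: {profits.max():.2f}")
```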

Data and inconsistencies in Machine learning

• Before understanding overfitting and underfitting, let us define some basic terms that will help in understanding this topic:

• Signal: The true underlying pattern of the data that helps the machine learning model learn from the data.

• Noise: Unnecessary and irrelevant data that reduces the performance of the model.

• Bias: A prediction error introduced in the model by oversimplifying the machine learning algorithm; it is the difference between the predicted values and the actual values.

• Variance: If the machine learning model performs well on the training dataset but does not perform well on the test dataset, variance is said to be high.
Data and inconsistencies in Machine learning
• Underfitting: Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.
• Example: Underfitting can be seen in the output of a linear regression model (shown on the original slide) whose fitted line is unable to capture the data points present in the plot.

How to avoid underfitting:
• By increasing the training time of the model.
• By increasing the number of features.

Data and inconsistencies in Machine learning
• Overfitting: Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model.
• An overfitted model has low bias and high variance.
• The chances of overfitting increase the more we train our model: the longer we train it on the same data, the more likely it is to become overfitted.
• Overfitting is the main problem that occurs in supervised learning.
Data and inconsistencies in Machine learning
• As shown in the graph on the original slide, the overfitted model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not, because the goal of the regression model is to find the best-fit line; if no best-fit line is found, the model will generate prediction errors on new data.
How to avoid Overfitting in a Model
• Both overfitting and underfitting degrade the performance of a machine learning model, but the more common problem is overfitting. There are several ways to reduce the occurrence of overfitting in our model, one of which is sketched after this list:
– Cross-Validation
– Training with more data
– Removing features
– Early stopping of training
– Regularization
– Ensembling
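A minimal sketch of two of these remedies (assuming scikit-learn; the data is synthetic), using cross-validation to measure generalization and Ridge regularization to penalize an overly complex polynomial fit:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic noisy data: y is roughly linear in x.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=30)

# A high-degree polynomial can overfit; Ridge's alpha penalizes large coefficients.
model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1.0))

# 5-fold cross-validation: each fold is held out once as a mini test set.
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated R^2: {scores.mean():.3f}")
```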

Train Test Split
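A minimal train/test split sketch (assuming scikit-learn and its built-in iris dataset), where part of the data is held out so the model is evaluated on examples it never saw during training; a large gap between training and test accuracy is a sign of overfitting:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data as a test set; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# A large gap between these two scores indicates overfitting.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```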
