Unit-I - Machine Learning Concepts
COURSE OBJECTIVE
COURSE OUTCOME
SYLLABUS
Contents
Unit-I Machine Learning: Concept and issues, Supervised versus unsupervised learning,
Regression versus Classification problem, Algorithms versus Models, Model training:
regression and classification models, model assessment, bias-variance trade-off,
hyperparameter tuning, cross validation, ROC curves
Unit-II Tree based methods: Basics of decision trees, a simple tree, tree entropy and information
gain, Trees versus linear models, pros and cons of trees, overfitting, pruning a tree, Trees
versus linear models, bagging, random forests, boosting, fitting of classification and
regression trees
Unit-III Support vector machines (SVMs): Overview, separating hyperplane, maximal margin
classifier, support vector classifier (SVC): linear classification and classification with non-
linear decision boundaries, SVM versus SVC, SVM with more than 2 classes: One-versus-One
and One-versus-All case, kernel functions
Unit-IV Neural Networks: Overview, single and multilayer neural networks, neural networks for
regression and classification. kNN classifier and k means clustering as machine learning
tools.
RECOMMENDED BOOKS
No | Title | Author
1 | Machine Learning Made Easy with R: An Intuitive Step by Step Blueprint for Beginners | Lewis, N.D. (2017), CreateSpace Independent Publishing Platform
2 | Introduction to Machine Learning with R: Rigorous mathematical modeling | Burger, S.V. (2018), O'Reilly
3 | Machine Learning with R: Expert Techniques for Predictive Modeling | Lantz, B. (2019), Packt Publications, 3rd edition
4 | Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems | Aurélien Géron
5 | Machine Learning For Dummies | John Paul Mueller, Luca Massaron
Artificial Intelligence and Machine Learning
Dr. Zahid Ahmed Ansari 1/30/2024
AI AND ML
AI VS ML
Artificial Intelligence | Machine Learning
Artificial intelligence is a technology which enables a machine to simulate human behavior. | Machine learning is a subset of AI which allows a machine to learn automatically from past data without being explicitly programmed.
The goal of AI is to make a smart computer system that solves complex problems like humans do. | The goal of ML is to allow machines to learn from data so that they can give accurate output.
In AI, we make intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result.
Machine learning and deep learning are the two main subsets of AI. | Deep learning is a main subset of machine learning.
AI has a very wide scope. | Machine learning has a limited scope.
AI is working to create an intelligent system which can perform various complex tasks. | Machine learning is working to create machines that can perform only those specific tasks for which they are trained.
Artificial Intelligence | Machine Learning
An AI system is concerned with maximizing the chances of success. | Machine learning is mainly concerned with accuracy and patterns.
The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of ML are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. | ML can also be divided into three main types: Supervised learning, Unsupervised learning, and Reinforcement learning.
It includes learning, reasoning, and self-correction. | It includes learning and self-correction when introduced to new data.
AI deals with structured, semi-structured, and unstructured data. | Machine learning deals with structured and semi-structured data.
• Machine Learning (ML) is the field of computer science that enables computer systems to
make sense of data in much the same way as human beings do.
• In simple words, ML is a type of artificial intelligence that extracts patterns from
raw data by using an algorithm or method.
• The key focus of ML is to allow computer systems to learn from experience without
being explicitly programmed and without human intervention.
• Arthur Samuel, an early American leader in the field of computer gaming and
artificial intelligence, coined the term “Machine Learning” in 1959 while at IBM. He
defined machine learning as:
• The field of study that gives computers the ability to learn without being
explicitly programmed.
• However, there is no universally accepted definition of machine learning. Different
authors define the term differently. Another definition is:
• The field of study known as machine learning is concerned with the
question of how to construct computer programs that automatically
improve with experience.
FORMAL DEFINITION OF ML
BY PROFESSOR MITCHELL
• “A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E.”
• The above definition focuses on three parameters: Task (T), Performance (P) and
Experience (E).
• Human beings are, at this moment, the most intelligent and advanced species on earth
because they can think, evaluate and solve complex problems.
• AI, on the other hand, is still at an early stage and has not surpassed human intelligence in
many respects.
• So why do we need to make machines learn? The most compelling reason
is:
• To make decisions, based on data, with efficiency and at scale.
• Organizations are investing heavily in technologies like AI, ML and Deep Learning to extract
key information from data, perform real-world tasks and solve problems.
• We all need to solve real-world problems efficiently and at a huge scale. That is why the
need for machine learning arises.
5. Self-driving cars:
• One of the most exciting applications of machine learning is self-driving cars, in which
machine learning plays a significant role. Tesla, a well-known car manufacturer, is working
on self-driving cars and trains machine learning models to detect people and objects while
driving.
6. Email Spam and Malware Filtering:
• Whenever we receive a new email, it is filtered automatically as important, normal, or
spam. Important mail arrives in our inbox marked with the important symbol and
spam goes to our spam box; the technology behind this is machine learning. Below are
some spam filters used by Gmail:
• Content filter, Header filter, General blacklists filter, Rules-based filters, Permission filters
• Machine learning algorithms such as Multi-Layer Perceptron, Decision Tree, and Naïve
Bayes classifier are used for email spam filtering and malware detection.
12. Web Search Engine: One of the reasons why search engines like Google, Bing, etc. work so
well is that the system has learnt how to rank pages through a complex learning
algorithm.
13. Automation: Machine learning can drive systems that work autonomously, with little or no
human intervention. For example, robots perform essential process steps in
manufacturing plants.
14. Computer vision: Machine learning algorithms can be used to recognize objects, people,
and other elements in images and videos.
15. Natural language processing: Machine learning algorithms can be used to understand
and generate human language, including tasks such as translation and text classification.
16. Finance Industry: Machine learning is growing in popularity in the finance industry.
Banks mainly use ML to find patterns in their data and to prevent fraud.
17. Government organization: Governments make use of ML to manage public safety and
utilities. Take the example of China, with its large-scale face recognition: the government
uses artificial intelligence to deter jaywalking.
18. Healthcare industry: Healthcare was one of the first industries to use machine learning,
with image detection.
19. Marketing: AI is used broadly in marketing thanks to abundant access to data. Before
the age of mass data, researchers developed advanced mathematical tools like Bayesian
analysis to estimate the value of a customer. With the boom in data, marketing departments
rely on AI to optimize customer relationships and marketing campaigns.
• Types of Machine Learning based on the nature of the learning “signal” or “feedback”
available to a learning system
1. Supervised learning:
2. Unsupervised learning:
3. Reinforcement learning:
• Supervised learning is the type of machine learning in which machines are trained using
well "labelled" training data and, on the basis of that data, predict the output.
Labelled data means that the input data is already tagged with the correct output.
• In supervised learning, the training data provided to the machine works as the supervisor
that teaches it to predict the output correctly. It applies the same concept as a
student learning under the supervision of a teacher.
• Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping
function that maps the input variable (x) to the output variable (y).
• In the real world, supervised learning can be used for risk assessment, image classification,
fraud detection, spam filtering, etc.
SUPERVISED LEARNING
• The main objective of supervised learning algorithms is to learn an association between
input data samples and the corresponding outputs from many training
instances.
• For example, we have
x: Input variables and
Y: Output variable
• We apply an algorithm to learn the mapping function from the input to the output:
Y = f(x)
• The main objective is to approximate the mapping function so well that
when we have new input data (x), we can predict the output variable (Y) for that new
input data.
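The idea of learning Y = f(x) from labelled examples can be sketched in a few lines of Python. This is only an illustration: the toy dataset and the feature names are invented here, and a 1-nearest-neighbour rule is used as a stand-in for the learned function f, since the slides do not prescribe a particular algorithm at this point.

```python
# Toy labelled training data (invented for illustration):
# each input x is (hours_studied, hours_slept), each output Y is a label.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 6.0), "pass"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def f(x):
    """Approximate the mapping Y = f(x) with a 1-nearest-neighbour rule:
    return the label of the training input closest to x."""
    _, nearest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], x)
    )
    return nearest_label


# Predict the output variable for new, unseen inputs.
new_inputs = [(7.0, 7.0), (1.5, 4.5)]
predictions = [f(x) for x in new_inputs]
print(predictions)
```

The "training" here is simply storing the labelled pairs; other supervised algorithms replace the nearest-neighbour rule with a fitted model but keep the same input-to-output contract.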
REGRESSION
• Regression is also a supervised learning problem. It predicts a numeric value, so the outputs are
continuous rather than discrete. For example, predicting stock prices using historical
data.
• It is used for the prediction of continuous variables, such as in weather forecasting, market
trends, etc.
• Below are some popular regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
CLASSIFICATION
• Classification: Inputs are divided into two or more classes, and the learner must
produce a model that assigns unseen inputs to one of these classes (or to several of
them, in multi-label classification). In other words, the model predicts whether or not
something belongs to a particular class. This is typically tackled in a supervised way.
• Classification models can be categorized in two groups: Binary classification and
Multiclass classification. Spam filtering is an example of binary classification,
where the inputs are email (or other) messages, and the classes are “spam” and “not
spam”.
• Classification algorithms are used when the output variable is categorical, which means
it takes one of a fixed set of classes such as Yes-No, Male-Female, True-False, etc.
Popular classification algorithms include:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
• In the previous topic, we covered supervised machine learning, in which models are trained using
labeled data that acts as the supervisor. But there are many cases in which we do
not have labeled data and need to find hidden patterns in the given dataset. To solve
such cases, we need unsupervised learning techniques.
• Unsupervised learning is a machine learning technique in which models are not supervised
using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given
data. It can be compared to the learning which takes place in the human brain while learning new
things. It can be defined as:
• Unsupervised learning is a type of machine learning in which models are trained using an unlabeled
dataset and are allowed to act on that data without any supervision.
• Below are some main reasons which describe the importance of unsupervised learning:
• Unsupervised learning is helpful for finding useful insights in the data.
• Unsupervised learning is similar to the way a human learns to think from their own
experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes
it more widely applicable.
• In the real world, we do not always have input data with the corresponding output, so to
solve such cases, we need unsupervised learning.
• Here, we take unlabeled input data, which means it is not categorized and
the corresponding outputs are not given. This unlabeled input data is fed to
the machine learning model in order to train it. First, the model interprets the raw data
to find hidden patterns, and then a suitable algorithm is applied,
such as k-means clustering or hierarchical clustering.
• Once the suitable algorithm is applied, it divides the data objects into
groups according to the similarities and differences between the objects.
• The most basic disadvantage of any supervised learning algorithm is that the dataset has to be
hand-labeled, either by a Machine Learning Engineer or a Data Scientist. This is a very costly process,
especially when dealing with large volumes of data. The most basic disadvantage of any unsupervised
learning algorithm is that its application spectrum is limited.
SEMI-SUPERVISED LEARNING
• These algorithms or methods are neither fully supervised nor fully unsupervised.
They fall between the two, i.e. between supervised and unsupervised learning methods.
• They generally use a small supervised learning component, i.e. a small
amount of pre-labeled, annotated data, and a large unsupervised learning component, i.e. lots
of unlabeled data, for training. We can follow either of the following approaches for
implementing semi-supervised learning methods:
• The first and simplest approach is to build a supervised model on the small amount
of labeled, annotated data and then apply that model
to the large amount of unlabeled data to obtain more labeled samples. Then train the
model on the enlarged labeled set and repeat the process.
• The second approach needs some extra effort. In this approach, we first use
unsupervised methods to cluster similar data samples, annotate these groups, and then
use a combination of this information to train the model.
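The first approach above (often called self-training or pseudo-labeling) can be sketched as follows. This is a minimal illustration under stated assumptions: the data is one-dimensional and invented, the base model is a simple nearest-centroid classifier, and the confidence rule (`min_margin`) is a hypothetical choice made here, not something the slides specify.

```python
def centroid_classifier(labeled):
    """Fit a nearest-centroid classifier on (value, label) pairs and
    return the per-class centroids as a dict {label: mean value}."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return (label, margin): the closest centroid's label and how much
    closer it is than the runner-up (a crude confidence score)."""
    ranked = sorted(centroids, key=lambda label: abs(centroids[label] - value))
    best, second = ranked[0], ranked[1]
    margin = abs(centroids[second] - value) - abs(centroids[best] - value)
    return best, margin


def self_train(labeled, unlabeled, min_margin=2.0, max_rounds=10):
    """Repeatedly fit on the labeled set, pseudo-label the confident
    unlabeled points, add them to the labeled set, and refit."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = centroid_classifier(labeled)
        confident, remaining = [], []
        for value in unlabeled:
            label, margin = predict(centroids, value)
            if margin >= min_margin:
                confident.append((value, label))
            else:
                remaining.append(value)
        if not confident:
            break  # nothing new was labeled, so stop early
        labeled.extend(confident)
        unlabeled = remaining
    return centroid_classifier(labeled), unlabeled


# Two labeled samples and several unlabeled ones (invented values).
labeled = [(1.0, "small"), (9.0, "large")]
unlabeled = [0.5, 2.0, 3.0, 7.0, 8.5, 5.2]
centroids, still_unlabeled = self_train(labeled, unlabeled)
print(centroids, still_unlabeled)
```

Points the model is unsure about (here the value 5.2, which lies between the two groups) stay unlabeled rather than being forced into a class, which limits the risk of reinforcing wrong pseudo-labels.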
• Semi-supervised learning: Problems where you have a large amount of input data and only some of
the data is labeled are called semi-supervised learning problems.
• These problems sit between supervised and unsupervised learning.
• For example, a photo archive where only some of the images are labeled (e.g. dog, cat, person)
and the majority are unlabeled.
• Semi-supervised learning is particularly useful when there is a large amount of unlabeled data
available, but it’s too expensive or difficult to label all of it. Some examples of semi-supervised learning
applications include:
• Text classification: In text classification, the goal is to classify a given text into one or more
predefined categories. Semi-supervised learning can be used to train a text classification model
using a small amount of labeled data and a large amount of unlabeled text data.
• Image classification: In image classification, the goal is to classify a given image into one or more
predefined categories. Semi-supervised learning can be used to train an image classification
model using a small amount of labeled data and a large amount of unlabeled image data.
• Anomaly detection: In anomaly detection, the goal is to detect patterns or observations that are
unusual or different from the norm.
REINFORCEMENT LEARNING
REGRESSION ANALYSIS IN ML
• Regression analysis is a statistical method to model the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
• Regression analysis helps us to understand how the value of the
dependent variable changes with respect to one independent variable
when the other independent variables are held fixed.
• It predicts continuous/real values such as temperature, age, salary,
price, etc.
• Example: Suppose there is a marketing company A, which runs various
advertisements every year and earns sales from them. The list below shows the
advertising spend of the company in the last 5 years and the
corresponding sales:
• Now, the company wants to spend $200 on advertising in the year
2019 and wants to predict the sales for this year.
• To solve this type of prediction problem in machine learning, we
need regression analysis.
• Regression is a supervised learning technique which helps in finding the correlation between variables
and enables us to predict a continuous output variable based on one or more predictor variables.
• It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect
relationship between variables.
• In regression, we fit a line or curve to the variables that best fits the given datapoints. Using this
fit, the machine learning model can make predictions about the data.
• In simple words, "Regression shows a line or curve that passes as close as possible to the datapoints on the
target-predictor graph, in such a way that the vertical distance between the datapoints and the regression line
is minimum."
• The distance between the datapoints and the line tells whether a model has captured a strong relationship or
not.
• Some examples of regression are:
• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving.
• Dependent Variable: The main factor in regression analysis which we want to predict or
understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors which affect the dependent variable, or which are used
to predict its values, are called independent variables, also called
predictors.
• Outliers: An outlier is an observation which has either a very low or a very high value
in comparison to the other observed values. An outlier may distort the result, so it should be
identified and handled carefully.
• Multicollinearity: If the independent variables are highly correlated with each other,
the condition is called multicollinearity. It should not be present in
the dataset, because it creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well with the training dataset but not
well with the test dataset, the problem is called overfitting. If our algorithm does
not perform well even with the training dataset, the problem is called underfitting.
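One simple way to spot the multicollinearity described above is to compute the Pearson correlation between pairs of predictors. The sketch below uses invented predictor values; how high a correlation must be before it counts as a problem is a judgment call and is not fixed by the slides.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictors: height in cm, the same height in inches
# (redundant information), and an unrelated age column.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]
age_years = [30.0, 22.0, 41.0, 25.0, 35.0]

r_redundant = pearson_correlation(height_cm, height_in)
r_unrelated = pearson_correlation(height_cm, age_years)
print(round(r_redundant, 3), round(r_unrelated, 3))
```

The two height columns are almost perfectly correlated, so keeping both adds no information and makes it hard to say which one "matters"; the age column is only weakly correlated with height.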
TYPES OF REGRESSION
• There are various types of regression used
in data science and machine learning. Each type has its
own importance in different scenarios, but at the core,
all regression methods analyze the effect of the
independent variables on the dependent variable. Some
important types of regression
are given below:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
LINEAR REGRESSION
• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the simplest algorithms for regression and shows the
relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression models a linear relationship between the independent variable (X-axis)
and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), it is called simple linear
regression. If there is more than one input variable, it is
called multiple linear regression.
• The given image explains the relationship between
variables in the linear regression model. We are predicting
the salary of an employee based on years of experience.
• Below is the mathematical equation for linear regression:
Y = aX + b
• Y = dependent variable (target variable),
• X = independent variable (predictor variable),
• a and b are the linear coefficients (a is the slope, b is the intercept)
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
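The coefficients a and b in Y = aX + b are commonly estimated by ordinary least squares. The following is a minimal sketch of that closed-form fit for a single input variable; the experience and salary figures are invented and chosen to lie exactly on a line, so the fit is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of slope a and intercept b in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx          # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# Invented data: salary (in thousands) equals 30 + 5 * years of experience.
years_experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary_thousands = [35.0, 40.0, 45.0, 50.0, 55.0]

a, b = fit_simple_linear_regression(years_experience, salary_thousands)
predicted_salary = a * 6.0 + b  # prediction for 6 years of experience
print(a, b, predicted_salary)
```

With real, noisy data the points will not lie exactly on the line; the same formulas then give the line with the smallest sum of squared vertical distances to the points.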
LOGISTIC REGRESSION
• Logistic regression is another supervised learning algorithm, which is used to solve
classification problems. In classification problems, the dependent variable is in a binary or
discrete format such as 0 or 1.
• The logistic regression algorithm works with categorical target variables such as 0 or 1, Yes or No, True
or False, Spam or Not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression algorithm
in how it is used.
• There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
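To illustrate how binary logistic regression "works on the concept of probability", the sketch below passes a linear score through the sigmoid function to obtain a probability and fits the two coefficients by plain gradient descent. The pass/fail data, the learning rate and the number of epochs are assumptions made for this example, not values from the slides.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent
    on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Invented data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)


def predict_probability(x):
    """Estimated probability of passing after x hours of study."""
    return sigmoid(w * x + b)


def predict_class(x, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_probability(x) >= threshold else 0


print(predict_class(1.0), predict_class(4.0))
```

The output of the model is a probability; the class label only appears once a threshold (0.5 here) is applied, which is what distinguishes the way logistic regression is used from linear regression.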
POLYNOMIAL REGRESSION
• The equation for polynomial regression is also derived
from the linear regression equation: the linear
regression equation Y = b_0 + b_1 x is transformed into the
polynomial regression equation
Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + ... + b_n x^n
• Here Y is the predicted/target output, b_0, b_1, ..., b_n are
the regression coefficients, and x is
our independent/input variable.
• The model is still linear because it is linear in the
coefficients, even though it contains quadratic and higher-order terms of x.
• Note: This is different from multiple linear
regression in that, in polynomial
regression, a single variable appears with different degrees
instead of multiple variables with the same degree.
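The point that the model stays linear in its coefficients can be made concrete: a polynomial prediction is just a weighted sum of the features 1, x, x^2, ..., x^n. The coefficient values below are hypothetical and only serve to show the computation.

```python
def polynomial_features(x, degree):
    """Expand a single input x into the features [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]


def predict(coefficients, x):
    """Evaluate Y = b_0 + b_1 x + ... + b_n x^n as a dot product between
    the coefficients and the expanded features (linear in the b's)."""
    features = polynomial_features(x, len(coefficients) - 1)
    return sum(b * f for b, f in zip(coefficients, features))


# Hypothetical coefficients b_0, b_1, b_2 of a quadratic model.
coefficients = [1.0, 2.0, 3.0]
y_hat = predict(coefficients, 2.0)  # 1 + 2*2 + 3*4
print(polynomial_features(2.0, 2), y_hat)
```

Because the prediction is a dot product with the expanded features, the coefficients can be estimated with the same least-squares machinery as multiple linear regression, applied to the columns x, x^2, ..., x^n.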
• Support Vector Machine (SVM) is a supervised learning algorithm which can be used for
regression as well as classification problems. When it is used for regression problems, it
is termed Support Vector Regression (SVR).
• Support Vector Regression is a regression algorithm which works for continuous variables.
Below are some keywords which are used in Support Vector Regression:
• Kernel: A function used to map lower-dimensional data into a higher-dimensional space.
• Hyperplane: In a general SVM, it is the separating line between two classes, but in SVR, it is the
line which helps to predict the continuous variable and covers most of the datapoints.
• Boundary line: Boundary lines are the two lines on either side of the hyperplane, which create a
margin for the datapoints.
• Support vectors: Support vectors are the datapoints which are nearest to the hyperplane
(in classification, the nearest points of each class).
• Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
• It can handle both categorical and numerical data.
• Decision Tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test, and
each leaf node represents the final decision or result.
• A decision tree is constructed starting from the root node/parent node (dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes
are further divided into their own child nodes, and themselves become the parent
nodes of those nodes. Consider the below image:
• Random forest is one of the most powerful supervised learning algorithms and is
capable of performing regression as well as classification tasks.
• Random Forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output as the average of the individual
tree outputs. The combined decision trees are called base models, and for K trees the
prediction can be represented more formally as:
• g(x) = (f_0(x) + f_1(x) + f_2(x) + ... + f_{K-1}(x)) / K
• Random forest uses the Bagging (Bootstrap Aggregation) technique of ensemble
learning, in which the decision trees are built independently on bootstrap samples, can
run in parallel, and do not interact with each other.
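The bagging idea can be sketched with very small trees. In the example below each base model is a one-split regression tree (a "stump") trained on a bootstrap sample, and the ensemble prediction is the average of the stump outputs. This is a simplified illustration: the data is invented, there is only one input feature, and the random selection of features that full random forests also perform is left out.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree to (x, y) pairs.

    Returns (threshold, left_mean, right_mean): inputs with x <= threshold
    are predicted as left_mean, all others as right_mean.
    """
    points = sorted(points)
    ys = [y for _, y in points]
    overall_mean = sum(ys) / len(ys)
    # Fallback when no split helps: predict the overall mean everywhere.
    best = (float("inf"), overall_mean, overall_mean)
    best_error = sum((y - overall_mean) ** 2 for y in ys)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        left, right = ys[:i], ys[i:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if error < best_error:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best, best_error = (threshold, left_mean, right_mean), error
    return best


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(points) for _ in points]
        forest.append(fit_stump(bootstrap))
    return forest


def predict_forest(forest, x):
    """Ensemble prediction: the average of the individual tree outputs."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Invented step-shaped data: y is 0 for x < 10 and 10 otherwise.
data = [(float(x), 0.0 if x < 10 else 10.0) for x in range(20)]
forest = fit_forest(data)
low = predict_forest(forest, 2.0)
high = predict_forest(forest, 17.0)
print(low, high)
```

Each stump sees a slightly different sample and therefore picks a slightly different threshold; averaging the stumps smooths out these differences, which is the variance-reducing effect bagging is used for.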
CLASSIFICATION ALGORITHM
• Classification algorithms can be further divided into mainly two categories:
• Linear Models
  • Logistic Regression
  • Support Vector Machines
• Non-linear Models
  • Decision Tree Classification
  • Random Forest Classification
  • Naïve Bayes
  • K-Nearest Neighbours
  • Kernel SVM
• Classification algorithms can be used in different places. Below are some popular
use cases of classification algorithms:
  • Email spam detection
  • Speech recognition
  • Identification of cancer tumor cells
  • Drug classification
  • Biometric identification, etc.
CLUSTERING EXAMPLE
• Example: Let's understand the clustering technique with the real-world example of a mall.
When we visit any shopping mall, we can observe that things with similar usage are
grouped together: t-shirts are grouped in one section and trousers in another;
similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are grouped in
separate sections, so that we can easily find things. The clustering technique
works in the same way. Another example of clustering is grouping documents according to
topic.
• The clustering technique can be used in a wide variety of tasks. Some of the most common uses of
this technique are:
• Market segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
• Apart from these general uses, it is used by Amazon in its recommendation system to
provide recommendations based on past product searches. Netflix also uses this
technique to recommend movies and web series to its users based on their watch history.
• The below diagram explains the working of the clustering algorithm. We can see that
the different fruits are divided into several groups with similar properties.
• Clustering methods are broadly divided into hard clustering (each datapoint
belongs to only one group) and soft clustering (a datapoint can belong to more than one
group). Various other approaches to clustering also exist.
• Below are the main clustering methods used in Machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
PARTITIONING CLUSTERING
DENSITY-BASED CLUSTERING
HIERARCHICAL CLUSTERING
FUZZY CLUSTERING
• Fuzzy clustering is a type of soft method in which a data object may belong to more
than one group or cluster.
• Each data object has a set of membership coefficients, which express its degree of
membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes
also known as the Fuzzy k-means algorithm.
CLUSTERING ALGORITHMS
• Clustering algorithms can be divided based on the models explained above.
Many different clustering algorithms have been published, but only a few are commonly
used. The choice of clustering algorithm depends on the kind of data we are using. For example, some
algorithms need the number of clusters in the given dataset to be guessed in advance, whereas others
work by finding the minimum distance between the observations of the dataset.
• Here we discuss the popular clustering algorithms that are widely used in
machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It partitions the dataset by dividing the samples into different clusters of roughly equal
variance. The number of clusters must be specified in advance. It is fast and requires relatively few
computations: each iteration has linear complexity, O(n), in the number of samples.
2. Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in the smooth
density of data points. It is an example of a centroid-based model that works by updating
the candidates for centroids to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to mean-shift, but with
some remarkable advantages. In this algorithm, areas of high density are separated by
areas of low density. Because of this, clusters of any arbitrary shape can be found.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or for those cases where k-means may fail. In a
GMM (Gaussian Mixture Model), it is assumed that the data points are drawn from a mixture of Gaussian distributions.
5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm
performs bottom-up hierarchical clustering. Each data point is treated as a
single cluster at the outset, and clusters are then successively merged. The cluster hierarchy can be
represented as a tree structure.
6. Affinity Propagation: It is different from other clustering algorithms in that it does not require
the number of clusters to be specified. Messages are exchanged between
pairs of data points until convergence. It has O(N^2 T) time complexity, where N is the number of
data points and T the number of iterations, which is the main drawback of this algorithm.
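To make the first algorithm in the list above concrete, here is a minimal k-means sketch for one-dimensional data. It alternates the two standard steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). The data values and the starting centroids are invented for illustration; practical implementations usually choose the starting centroids randomly or with a seeding heuristic.

```python
def k_means_1d(values, initial_centroids, max_iterations=100):
    """Cluster 1-D values with k-means; k is the number of initial centroids."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: put every value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, initial_centroids=[0.0, 5.0])
print(centroids, clusters)
```

Each pass over the data touches every point once per centroid, which is where the linear per-iteration cost in the number of samples comes from.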
APPLICATIONS OF CLUSTERING
• Identification of Cancer Cells: Clustering algorithms are widely used for the
identification of cancerous cells. They divide the cancerous and non-cancerous data into
different groups.
• Search Engines: Search engines also work on the clustering technique. The search results
appear based on the objects closest to the search query. This is done by grouping similar data
objects in one group that is far from other, dissimilar objects. The accuracy of the result of a
query depends on the quality of the clustering algorithm used.
• Customer Segmentation: It is used in market research to segment customers based on
their choices and preferences.
• Biology: It is used in biology to classify different species of plants and animals
using image recognition techniques.
• Land Use: The clustering technique is used to identify areas of similar land use in a
GIS database. This can be very useful for finding the purpose for which a particular piece of land
is most suitable.
• While Machine Learning is rapidly evolving, this segment of AI as a whole still has a
long way to go. The reason is that ML has not yet been able to overcome a number
of challenges. The challenges that ML currently faces are −
• Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Use of low-quality data leads to problems in data
preprocessing and feature extraction.
• Time-consuming task − Another challenge faced by ML models is the consumption
of time, especially for data acquisition, feature extraction and retrieval.
• Lack of specialists − As ML technology is still in its infancy,
finding expert resources is difficult.
• No clear objective for formulating business problems − Having no clear objective
and well-defined goal for business problems is another key challenge for ML
because this technology is not that mature yet.
WHAT IS AN “ALGORITHM” IN ML
WHAT IS A “MODEL” IN ML
• Model evaluation is the process of using metrics to analyze the performance of a machine
learning model.
• Model development is a multi-step process, and we should keep checking how well the model
generalizes to future, unseen data.
• Evaluation is therefore vital for judging the performance of a model.
• Evaluation also helps to reveal a model's key weaknesses.
• Common metrics include Accuracy, Precision, Recall, F1 score, Area Under the Curve, the
Confusion Matrix, and Mean Squared Error.
• Cross-validation is a technique applied during the training phase that also serves as a
model evaluation technique.
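As a rough illustration of the cross-validation idea mentioned above, the following minimal sketch splits sample indices into k folds so that each fold serves once as the validation set. The function name `k_fold_indices` and the choice of 10 samples and 3 folds are assumptions made for this example; the sketch does not shuffle the data, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so averaging a metric over the folds uses every sample for evaluation once.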
ACCURACY
• Accuracy: One of the simplest classification metrics to implement. It is the ratio of the
number of correct predictions to the total number of predictions:
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
• Accuracy is a good choice when the target classes in the data are approximately balanced. For
example, suppose 60% of the images in a fruit dataset are of Apple and 40% are of Mango. If a model
asked to predict whether an image shows an Apple or a Mango reaches, say, 97% accuracy, that figure
is a fair summary of its performance, because neither class dominates.
• Accuracy is not recommended when the target variable belongs mostly to one class. For example,
suppose a disease prediction model is applied to 100 people, of whom only five have the disease and
95 do not. If the model predicts "no disease" for every person (a bad prediction), the accuracy is
still 95%, which is misleading because the model has not detected a single person with the
disease.
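The disease example can be checked with a few lines of code. This is a minimal sketch, assuming labels are encoded as 1 (disease) and 0 (no disease); the helper name `accuracy` is chosen for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 100 people: 5 have the disease (1), 95 do not (0).
y_true = [1] * 5 + [0] * 95
# A useless model that predicts "no disease" for everyone.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.95
print(f"diseased patients detected = {detected} of 5")
```

The score is high even though none of the five patients with the disease is found, which is why accuracy alone is a poor guide on imbalanced data.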
CONFUSION MATRIX
• A confusion matrix is a tabular summary of the
prediction outcomes of a binary classifier. It
describes the performance of a classification model
on a set of test data for which the true values are known.
• In the matrix, columns hold the predicted values and
rows hold the actual values. Both take one of two
classes, Yes or No. If we are predicting the presence
of a disease in a patient, a prediction of Yes means
the patient has the disease, and a prediction of No
means the patient does not have the disease.
• In this example, the classifier made 165 predictions
in total: it predicted Yes 110 times and No 55
times.
• In reality, 105 patients have the disease and 60
patients do not.
CONFUSION MATRIX
• In general, the cells of the table are described by
four terms:
• True Positive (TP): The model predicts the positive
class, and the actual class is positive.
• True Negative (TN): The model predicts the negative
class, and the actual class is negative.
• False Positive (FP): The model predicts the positive
class, but the actual class is negative.
• False Negative (FN): The model predicts the negative
class, but the actual class is positive.
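The four terms can be counted directly from paired lists of actual and predicted labels. The sketch below is illustrative only: the function name `confusion_counts` and the eight sample labels are made up for this example, with 1 standing for Yes and 0 for No.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted Yes, actually Yes
        elif predicted != positive and actual != positive:
            tn += 1  # predicted No, actually No
        elif predicted == positive and actual != positive:
            fp += 1  # predicted Yes, actually No
        else:
            fn += 1  # predicted No, actually Yes
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = Yes (has the disease), 0 = No.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```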
PRECISION
• Precision: The precision metric is used to overcome the limitation of Accuracy. Precision
is the proportion of positive predictions that were actually correct. It is calculated as
the ratio of true positives to all positive predictions (true positives plus false
positives):
$$\text{Precision} = \frac{TP}{TP + FP}$$
RECALL OR SENSITIVITY
• Recall: It is similar to the Precision metric; however, it measures the proportion of
actual positives that were identified correctly.
• It is calculated as the ratio of true positives to the total number of actual positives,
whether they were correctly predicted as positive or incorrectly predicted as negative (true
positives plus false negatives).
• The formula for calculating Recall is given below:
$$\text{Recall} = \frac{TP}{TP + FN}$$
• From the above definitions of Precision and Recall, we can say that recall describes the
performance of a classifier with respect to false negatives, whereas precision describes
the performance of a classifier with respect to false positives.
• So, if we want to minimize false negatives, recall should be as close to 100% as possible,
and if we want to minimize false positives, precision should be as close to 100% as possible.
• In simple words, maximizing precision minimizes the FP errors, and maximizing recall
minimizes the FN errors.
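A short sketch of both formulas follows. The cell counts used here (TP = 100, FP = 10, FN = 5, TN = 50) are an assumption: they are one set of values consistent with the totals quoted in the confusion matrix example (110 predicted Yes, 55 predicted No, 105 actual Yes, 60 actual No), not figures stated in the slides.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Assumed cell counts, chosen to match the totals of the 165-prediction example.
tp, fp, fn, tn = 100, 10, 5, 50

p = precision(tp, fp)  # 100 / 110
r = recall(tp, fn)     # 100 / 105
print(f"precision = {p:.3f}")  # precision = 0.909
print(f"recall    = {r:.3f}")  # recall    = 0.952
```

Under these assumed counts, 10 false positives pull precision down and 5 false negatives pull recall down, matching the roles described above.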
F-SCORES
AUC-ROC
• Sometimes we need to visualize the performance of a classification model on a chart; the
AUC-ROC curve serves this purpose. It is one of the most popular and important tools for
evaluating the performance of a classification model.
• The ROC (Receiver Operating Characteristic) curve is a graph that shows the performance of a
classification model at different threshold levels. The curve plots two
parameters against each other:
• True Positive Rate
• False Positive Rate
• TPR, or True Positive Rate, is a synonym for Recall and is calculated as: $\text{TPR} = \frac{TP}{TP + FN}$
• FPR, or False Positive Rate, is calculated as: $\text{FPR} = \frac{FP}{FP + TN}$
• To compute the points of a ROC curve, we could evaluate a model such as logistic regression
many times with different classification thresholds, but this would be inefficient. Instead, an
efficient sorting-based algorithm provides this information as a single summary known as AUC
(Area Under the ROC Curve), which measures performance aggregated across all thresholds.
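To make the threshold sweep concrete, the sketch below builds the ROC points the straightforward way (re-evaluating TPR and FPR at every distinct score) and then takes the area under them with the trapezoidal rule. This is deliberately the simple, inefficient approach, not the sorting-based algorithm; the function names and the four sample scores are assumptions for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from sweeping the threshold over every score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Start at (0, 0): a threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true labels (1 = positive) and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.2f}")  # AUC = 0.75
```

An AUC of 1.0 would mean every positive example is scored above every negative one; 0.5 corresponds to random ranking.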
• Regression is a supervised learning technique that aims to find the relationship between the
dependent and independent variables. A predictive regression model predicts a continuous
numeric value.
• The metrics used for regression differ from the classification metrics. We cannot use the
Accuracy metric (explained above) to evaluate a regression model; instead, the performance of
a regression model is reported as the error in its predictions.
• The following are popular metrics for evaluating the performance of regression
models:
• Mean Absolute Error
• Mean Squared Error
• R² Score
• Adjusted R²
• Mean Absolute Error, or MAE, measures the absolute difference between actual and predicted values,
where absolute means ignoring the sign and treating every difference as positive.
• To understand MAE, take the example of Linear Regression, where the model draws a best-fit line
between the dependent and independent variables. To measure the error in prediction, we calculate
the difference between each actual value and the corresponding predicted value. To summarize the
error over the complete dataset, we take the mean of these absolute differences.
• The formula to calculate MAE: $$\text{MAE} = \frac{1}{N} \sum |Y - Y'|$$
• Here, Y is the Actual outcome, Y' is the predicted outcome, and N is the total number of data points.
• MAE is more robust to outliers than squared-error metrics. One of its limitations is that it is
not differentiable at zero, which complicates gradient-based optimizers such as Gradient Descent.
To overcome this limitation, another metric can be used: Mean Squared Error, or MSE.
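The MAE formula translates directly into code. This is a minimal sketch with made-up actual and predicted values; the function name `mean_absolute_error` is chosen for this example.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|Y - Y'|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual (Y) and predicted (Y') values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so the mean is 3.0 / 4.
print(f"MAE = {mae}")  # MAE = 0.75
```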
• Mean Squared Error, or MSE, is one of the most suitable metrics for regression evaluation. It
measures the average of the squared differences between the actual values and the values
predicted by the model.
• Since the errors are squared, MSE takes only non-negative values, and it is usually
positive and non-zero.
• Moreover, because the differences are squared, large errors are penalized disproportionately,
so a few outliers can lead to over-estimation of how bad the model is.
• MSE is often preferred over other regression metrics because it is differentiable and
therefore easier to optimize.
• The formula to calculate MSE: $$\text{MSE} = \frac{1}{N} \sum (Y - Y')^2$$
• Here, Y is the Actual outcome, Y' is the predicted outcome, and N is the total number of data
points.
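The effect of squaring can be seen by comparing MSE with MAE on the same predictions, with and without one badly wrong prediction. The data values and function names below are assumptions made for this sketch.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|Y - Y'|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE = (1/N) * sum((Y - Y')^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual values and two sets of predictions.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]          # all errors are small
y_pred_outlier = [2.5, 5.0, 4.0, 17.0]  # last prediction is off by 10

mse = mean_squared_error(y_true, y_pred)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)
mae = mean_absolute_error(y_true, y_pred)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)

print(f"MSE: {mse} -> {mse_outlier}")  # MSE: 0.875 -> 25.625
print(f"MAE: {mae} -> {mae_outlier}")  # MAE: 0.75 -> 3.0
```

In this example a single large error multiplies MAE by 4 but MSE by roughly 29, which illustrates why MSE is more sensitive to outliers.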
R SQUARED SCORE
• The R squared score, also known as the Coefficient of Determination, is another popular metric
for regression model evaluation. It lets us compare our model with a constant baseline to
judge the model's performance. To obtain the constant baseline, we take the mean of the
data and draw a line at the mean.
• The R squared score is always less than or equal to 1, regardless of whether the values are
large or small.
$$R^2 = 1 - \frac{\text{MSE(Model)}}{\text{MSE(Baseline)}}$$
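The formula can be sketched as follows, taking the baseline to be a model that always predicts the mean of the actual values. Because both MSE terms divide by the same N, the code works with sums of squared errors instead. The function name `r2_score` and the data are assumptions for this illustration.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - MSE(model) / MSE(baseline), baseline = mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    # The common 1/N factor cancels, so sums of squares are enough.
    ss_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_baseline = sum((t - mean_y) ** 2 for t in y_true)
    if ss_baseline == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_model / ss_baseline


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

r2 = r2_score(y_true, y_pred)
r2_baseline = r2_score(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(f"R^2 of the model    = {r2:.4f}")  # R^2 of the model    = 0.7241
print(f"R^2 of the baseline = {r2_baseline:.4f}")
```

A model that only predicts the mean scores 0, a perfect model scores 1, and a model worse than the baseline gets a negative score.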
ADJUSTED R SQUARED
• Adjusted R squared, as the name suggests, is an improved version of the R squared score. R squared
has the limitation that its value rises as more terms are added, even when the model is not
actually improving, which may mislead data scientists.
• To overcome this issue, adjusted R squared is used; its value is never higher than R². It
penalizes the addition of predictors and shows an improvement only if there is a real
improvement in the model.
• We can calculate the adjusted R squared as follows:
$$R_a^2 = 1 - \left[\frac{n-1}{n-k-1} \times (1 - R^2)\right]$$
• Here, $n$ is the number of observations
• $k$ denotes the number of independent variables
• $R_a^2$ denotes the adjusted $R^2$
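A quick numerical check of the formula shows how the adjustment grows with the number of predictors. The values R² = 0.80, n = 50 and k = 5 or 10 are made up for this sketch, as is the function name `adjusted_r2`.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical values: the same R^2 reported by models with 5 and 10 predictors.
r2, n = 0.80, 50
adj_5 = adjusted_r2(r2, n, k=5)
adj_10 = adjusted_r2(r2, n, k=10)
print(f"k = 5:  adjusted R^2 = {adj_5:.4f}")
print(f"k = 10: adjusted R^2 = {adj_10:.4f}")
```

For the same R², the model with more predictors receives the lower adjusted score, so extra terms must earn their place by raising R² enough to offset the penalty.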
BIAS
• Bias is the difference between the values
predicted by the ML model and the correct
values.
• High bias gives a large error on both the
training data and the testing data.
• It is recommended that an algorithm should have
low bias to avoid the problem of underfitting.
With high bias, the predictions follow an overly
simple form, such as a straight line, and do not
fit the data in the data set accurately. Such
fitting is known as Underfitting of Data.
VARIANCE
• The variance of a model is the variability of its
prediction for a given data point, which tells us how
much the predictions spread.
• A model with high variance fits the training data in
a very complex way and is therefore unable to fit
accurately on data it has not seen before.
• As a result, such models perform very well on
training data but have high error rates on test data.
• A model with high variance is said to be
Overfitting the Data. Overfitting means fitting the
training set accurately with a complex curve and a
high-order hypothesis, which is not a solution because
the error on unseen data is high.
• While training a model, variance should be kept
low. A high-variance fit is illustrated in the accompanying figure.
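The contrast between underfitting and overfitting can be sketched with two deliberately extreme models on a small synthetic dataset. Everything here is an assumption made for illustration: the data (a noisy version of y = 2x), the "mean model" standing in for high bias, and the 1-nearest-neighbour memorizer standing in for high variance.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data: training targets are y = 2x with alternating noise of +1 / -1.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# Unseen test points that follow y = 2x exactly.
x_test = [0.4, 1.6, 2.4, 3.6, 4.4]
y_test = [0.8, 3.2, 4.8, 7.2, 8.8]


def mean_model(x):
    """High bias: ignores x and always predicts the training mean."""
    return sum(y_train) / len(y_train)


def nearest_neighbour_model(x):
    """High variance: returns the target of the closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


mean_train = mse(y_train, [mean_model(x) for x in x_train])
mean_test = mse(y_test, [mean_model(x) for x in x_test])
nn_train = mse(y_train, [nearest_neighbour_model(x) for x in x_train])
nn_test = mse(y_test, [nearest_neighbour_model(x) for x in x_test])

print(f"mean model (high bias):     train MSE = {mean_train:.2f}, test MSE = {mean_test:.2f}")
print(f"1-NN model (high variance): train MSE = {nn_train:.2f}, test MSE = {nn_test:.2f}")
```

The mean model has a large error on both sets, the signature of underfitting. The memorizer reproduces the training data, noise included, with zero error, yet its error on the unseen points is no longer zero; that gap between training and test error is the signature of overfitting.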
HYPERPARAMETER TUNING
• A Machine Learning model is defined as a mathematical model with a number of parameters that need
to be learned from the data.
• By training a model with existing data, we are able to fit the model parameters.
• However, there is another kind of parameter, known as Hyperparameters, that cannot be directly
learned from the regular training process. They are usually fixed before the actual training process
begins. These parameters express important properties of the model such as its complexity or how fast
it should learn.
• Some examples of model hyperparameters include:
1. The learning rate for training a neural network.
2. The k in k-nearest neighbors.
3. The penalty in a logistic regression classifier, i.e. L1 or L2 regularization.
4. The C and sigma hyperparameters for support vector machines.
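As a rough sketch of what tuning a hyperparameter involves, the code below picks the k of a tiny one-dimensional k-nearest-neighbours classifier by leave-one-out cross-validation. The dataset, the candidate values of k, and all function names are assumptions made for this example; real projects would normally use a library routine for the search.

```python
from collections import Counter


def knn_predict(train, query_x, k):
    """Majority vote among the k training points closest to query_x."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - query_x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point once and predict it from all the others."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical (feature, class) pairs with a few noisy labels.
data = [(1.0, 0), (1.5, 0), (2.2, 0), (3.1, 1), (3.9, 0),
        (5.2, 1), (6.0, 1), (6.7, 0), (7.5, 1), (8.8, 1)]

# k is fixed before training, so each candidate is tried and scored in turn.
candidate_ks = [1, 3, 5]
scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: scores[k])

for k in candidate_ks:
    print(f"k = {k}: leave-one-out accuracy = {scores[k]:.2f}")
print("selected k:", best_k)
```

Odd values of k are used so that a two-class vote cannot tie. The same loop structure applies to any hyperparameter: propose candidate values, score each on held-out data, and keep the best.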
THANK YOU!