Machine Learning: Supervised Learning
Dr. Fehmina Malik
Definition
• Supervised machine learning is a type of machine learning
where the model is trained on labeled data to make
predictions or classifications on new, unseen data.
• The goal is to learn a mapping between the input features
and the output labels based on the given labeled examples.
• The model makes predictions based on this mapping, and
the performance of the model is evaluated based on how
well it generalizes to new, unseen data.
• Examples of supervised learning include linear
regression, logistic regression, decision trees, and
neural networks.
Use Cases of Supervised Learning
1. Image and Object Recognition:
Application: Recognizing objects in images or videos.
Example: Classifying images into categories such as people, animals, or
objects using algorithms like Convolutional Neural Networks (CNNs).
2. Speech Recognition:
Application: Converting spoken language into text.
Example: Developing systems like virtual assistants (e.g., Siri, Google
Assistant) that understand and respond to spoken commands.
3. Recommendation Systems:
Example: Building models that learn from user preferences and
behaviors to provide personalized recommendations.
• Supervised machine learning algorithms can be broadly classified into Regression and
Classification algorithms. Regression algorithms predict the output for continuous
values, whereas Classification algorithms predict categorical values.
• Classification is used to identify the category of new observations on the basis of
training data.
• In classification, a program learns from the given dataset or observations and then
classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1,
Spam or Not Spam, cat or dog. Classes can also be called targets/labels or categories.
• Unlike regression, the output variable of classification is a category, not a numeric
value, such as "Green or Blue" or "fruit or animal". Since the classification algorithm is
a supervised learning technique, it takes labeled input data, meaning each input comes
with the corresponding output.
• In a classification algorithm, the input variable (x) is mapped to a discrete output (y).
• y = f(x), where y is the categorical output.
• The best example of an ML classification algorithm is Email Spam Detector.
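The mapping y = f(x) and the spam-detector example can be illustrated with a tiny sketch. The keyword list and the `classify_email` function below are invented for illustration; a real spam filter learns the mapping from labeled examples rather than using hand-written rules.

```python
# Toy illustration of y = f(x) for classification: a hypothetical
# keyword-based spam detector that maps an email (input x) to one of
# two discrete labels (output y).
SPAM_KEYWORDS = {"winner", "free", "prize", "urgent"}

def classify_email(text: str) -> str:
    """Map input text x to a categorical output y ('spam' or 'not spam')."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    return "spam" if words & SPAM_KEYWORDS else "not spam"

print(classify_email("You are a winner, claim your free prize now"))  # spam
print(classify_email("Meeting moved to 3pm tomorrow"))                # not spam
```

In a learned spam detector, the labeled training emails would determine which words count as evidence of spam instead of the fixed keyword set used here.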
• The main goal of the Classification algorithm is to
identify the category of a given dataset, and these
algorithms are mainly used to predict the output for the
categorical data.
• Classification can be illustrated with a scatter diagram containing two
classes, Class A and Class B. Points within a class have features that are
similar to each other and dissimilar to the other class, and the classifier
learns a boundary that separates them.
[Figure: Regression fits a continuous line through the data, while Classification separates the data into discrete classes.]
• Classification algorithms can be used in different places. Below are some popular use
cases of Classification Algorithms:
• Email Spam Detection
• Speech Recognition
• Identification of cancer tumor cells.
• Drugs Classification
• Biometric Identification, etc.
Decision Trees for Classification
• Decision trees can be used for both Classification and Regression
problems, but they are mostly preferred for solving Classification
problems.
• A decision tree is a tree-structured classifier in which internal nodes
represent the features of a dataset, branches represent the decision
rules, and each leaf node represents the outcome.
• In a decision tree, there are two kinds of nodes: the Decision Node
and the Leaf Node. Decision nodes are used to make decisions and
have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of features of the
given dataset.
• It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
Decision Trees Structure
• Similar to a tree, it starts with the root node,
which expands on further branches and
constructs a tree-like structure.
• A decision tree simply asks a question and, based on the answer
(Yes/No), further splits the tree into subtrees.
Terminologies
• Root Node: The root node is where the decision tree starts. It
represents the entire dataset, which further gets divided into
two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree
cannot be segregated further after reaching a leaf node.
• Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given
conditions.
• Branch/Sub-Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted
branches from the tree.
• Parent/Child Node: The root node of the tree is called the
parent node, and the other nodes are called the child nodes.
Why use Decision Trees?
• Decision trees usually mimic human thinking ability while
making a decision, so they are easy to understand.
• The logic behind a decision tree can be easily understood
because it shows a tree-like structure.
How does the Decision Tree algorithm
Work?
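In outline, the algorithm starts at the root node with the whole dataset, chooses the feature whose split best separates the classes (for example, by minimizing Gini impurity), splits the data on that feature, and recurses on each branch until a node is pure and becomes a leaf. The sketch below is a simplified illustration on an invented toy dataset, not a production implementation.

```python
# Minimal decision-tree construction: at each decision node, pick the
# feature whose split gives the lowest weighted Gini impurity, then
# recurse until a node is pure (a leaf).

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def build_tree(rows, labels, features):
    # Leaf node: all labels identical (pure) or no features left to test.
    if len(set(labels)) == 1 or not features:
        return max(set(labels), key=labels.count)
    # Decision node: choose the feature with the lowest weighted impurity.
    def split_score(f):
        score = 0.0
        for value in {row[f] for row in rows}:
            subset = [l for row, l in zip(rows, labels) if row[f] == value]
            score += len(subset) / len(labels) * gini(subset)
        return score
    best = min(features, key=split_score)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        sub_rows = [row for row in rows if row[best] == value]
        sub_labels = [l for row, l in zip(rows, labels) if row[best] == value]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return (best, branches)

def predict(tree, row):
    # Follow decision nodes down to a leaf label.
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree

# Invented toy data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "no", "yes", "no"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "sunny", "windy": "no"}))  # yes
```

On this toy data the algorithm selects "windy" as the root decision node, since splitting on it yields two pure subsets (impurity 0), while splitting on "outlook" leaves both branches mixed.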
Measures
• F-score
• Accuracy
• Error Rate
Confusion Matrix
• It is a table with the 4 different combinations of predicted and actual
values.
Precision
• Precision is the ratio of true positives to all predicted positives. A high
precision indicates that the model has a low rate of false positives. In
other words, when the model predicts a positive outcome, it is likely to
be correct.
F-score
• If one model has low precision and high recall, and another has high precision and low recall, it is
difficult to compare them directly. For this purpose, we can use the F-score.
• This score helps us to evaluate the recall and precision at the same time. The F-score is maximum if the recall
is equal to the precision. It can be calculated using the below formula:
F-score = (2 × Recall × Precision) / (Recall + Precision)
• F-score is the harmonic mean of precision and recall. It provides a balance between precision and recall,
offering a single metric to evaluate the model's performance.
• F-score is useful when there is a need to consider both false positives and false negatives. A higher F-score
indicates a model that performs well in terms of both precision and recall.
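The formula can be checked with a short sketch; the true-positive, false-positive, and false-negative counts below are invented for illustration.

```python
# Compute precision, recall, and F-score from raw confusion-matrix counts.

def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * recall * precision / (recall + precision)

# Example: 8 true positives, 2 false positives, 2 false negatives.
# Precision = 8/10 = 0.8 and recall = 8/10 = 0.8, so the F-score is
# also 0.8 -- the maximum case where recall equals precision.
print(round(f_score(tp=8, fp=2, fn=2), 3))  # 0.8
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with very low precision cannot hide behind high recall (or vice versa) in its F-score.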
Accuracy
• From all the classes (positive and negative), accuracy measures how
many we have predicted correctly: Accuracy = (TP + TN) / (TP + TN + FP + FN).
Error rate
• Error rate is the ratio of incorrectly predicted instances (sum of false positives and
false negatives) to the total number of instances. It represents the overall rate of
misclassifications made by the model.
• A lower error rate indicates better model performance. However, like accuracy, error
rate may not be suitable for imbalanced datasets as it can be dominated by the
majority class.
• True Positive:
• Interpretation: You predicted positive and it’s true.
• You predicted that a patient has cancer and she/he actually has cancer.
• True Negative:
• Interpretation: You predicted negative and it’s true.
• You predicted that a patient does not have cancer and she/he actually does not have cancer.
• False Positive: (Type 1 Error)
• Interpretation: You predicted positive and it’s false.
• You predicted that a patient has cancer and she/he actually does not have cancer.
• False Negative: (Type 2 Error)
• Interpretation: You predicted negative and it’s false.
• You predicted that a patient does not have cancer and she/he actually has cancer.
• Just remember: Positive and Negative describe the predicted value, while True and False describe
whether that prediction was correct.
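The four cells and the metrics built from them can be tallied directly from predicted and actual labels; the label lists below are invented for illustration.

```python
# Tally the four confusion-matrix cells from actual and predicted labels,
# then derive accuracy and error rate.

def confusion_counts(actual, predicted):
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # Type 1 error
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # Type 2 error
    return tp, tn, fp, fn

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
total = tp + tn + fp + fn
accuracy = (tp + tn) / total       # correctly predicted / all instances
error_rate = (fp + fn) / total     # error rate = 1 - accuracy
print(tp, tn, fp, fn)              # 3 3 1 1
print(accuracy, error_rate)        # 0.75 0.25
```

Note that error rate is simply the complement of accuracy, so on an imbalanced dataset both share the same weakness: the majority class dominates them.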
Overfitting and Underfitting in Machine
Learning
• Before understanding overfitting and underfitting, let's understand some
basic terms that will help to understand this topic well:
• Signal/Trend: It refers to the true underlying pattern of the data that helps
the machine learning model to learn from the data.
• Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
• Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
• Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance occurs.
Overfitting
• If our algorithm works well with the training dataset but not with the
test dataset, the problem is called overfitting.
• Overfitting occurs when our machine learning model tries to fit all
the data points, even more than required, in the given dataset.
Because of this, the model starts capturing the noise and inaccurate
values present in the dataset, and all these factors reduce the
efficiency and accuracy of the model. An overfitted model has low
bias and high variance.
• The chances of overfitting increase the more training we provide to
our model: the longer we train, the more likely the model becomes
overfitted.
• Overfitting is the main problem that occurs in supervised learning.
Example: The concept of overfitting can be understood from the graph of a
regression model whose fitted curve passes through every data point in the
scatter plot. It may look efficient, but in reality it is not: the goal of
the regression model is to find the best-fit line, and a curve that chases
every point is not a best fit, so it will generate prediction errors on
new data.
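The low-bias/high-variance behavior can be shown with a small sketch (all numbers invented): a "memorizer" model that stores every training point fits the training set perfectly but does worse on unseen inputs.

```python
# Overfitting in miniature: a model that memorizes the noisy training
# points has zero training error (low bias) but larger error on unseen
# test points (high variance).
import random

random.seed(0)
true_f = lambda x: 2.0 * x                                  # underlying signal
train = [(x, true_f(x) + random.gauss(0, 1)) for x in range(10)]   # noisy labels
test  = [(x + 0.5, true_f(x + 0.5)) for x in range(10)]            # unseen inputs

def memorizer(x):
    """Answer with the y of the nearest memorized training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorizer, train))                          # 0.0: perfect on training data
print(mse(memorizer, test) > mse(memorizer, train))   # True: worse on unseen data
```

The memorizer has "covered all the data points", noise included, which is exactly why its test error exceeds its training error.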
Underfitting
• If our algorithm does not perform well even with the training dataset, the
problem is called underfitting.
• Underfitting occurs when our machine learning model is not able to
capture the underlying trend of the data. To avoid overfitting, the
feeding of training data can be stopped at an early stage, due to which
the model may not learn enough from the training data. As a result,
it may fail to find the best fit of the dominant trend in the data.
• In the case of underfitting, the model is not able to learn enough from the
training data, and hence it reduces the accuracy and produces unreliable
predictions.
• An underfitted model has high bias and low variance.
• Example: Underfitting can be understood from the output of a linear
regression model fitted to data whose trend it is too simple to capture:
the fitted line is unable to follow the data points present in the plot.
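The high-bias/low-variance behavior can likewise be sketched (all numbers invented): a constant model that always predicts the mean is too simple for a linear trend, so it has high error even on its own training data.

```python
# Underfitting in miniature: a constant "predict the mean" model cannot
# capture a clear linear trend, so its training error stays high
# (high bias, low variance).
train = [(x, 2.0 * x) for x in range(10)]        # clear linear trend y = 2x

mean_y = sum(y for _, y in train) / len(train)   # constant model's only parameter
underfit = lambda x: mean_y                      # always predicts the mean
better_fit = lambda x: 2.0 * x                   # a model matching the trend

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(underfit, train) > mse(better_fit, train))  # True: high training error
```

This is the mirror image of the overfitting case: the overfitted model fails only on new data, while the underfitted model fails on the training data it has already seen.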