Classification

Supervised Learning is a machine learning approach where models learn from labeled data, identifying patterns between input variables and their corresponding labels. It encompasses various tasks, including classification, which categorizes data into predefined classes using algorithms based on training data. Key types of classification tasks include binary, multi-class, multi-label, and imbalanced classification, each with specific algorithms and applications.

Uploaded by

Amita Garg
Copyright
© All Rights Reserved

What is Supervised Learning?

Before we dive into Classification, let’s take a look at what Supervised Learning is.
Suppose you are trying to learn a new concept in maths and after solving a problem,
you may refer to the solutions to see if you were right or not. Once you are confident in
your ability to solve a particular type of problem, you will stop referring to the answers
and solve the questions put before you by yourself.

This is also how Supervised Learning works with machine learning models. In
Supervised Learning, the model learns by example. Along with our input variable, we
also give our model the corresponding correct labels. While training, the model gets to
look at which label corresponds to our data and hence can find patterns between our
data and those labels.

Some examples of Supervised Learning include:

1. Spam detection, where a model is taught which mail is spam and which is not.

2. Speech recognition where you teach a machine to recognize your voice.

3. Object Recognition by showing a machine what an object looks like and having it pick
that object from among other objects.

We can further divide Supervised Learning into the following:

Figure 1: Supervised Learning Subdivisions

What is Classification?
Classification is defined as the process of recognizing, understanding, and grouping
objects and ideas into preset categories, a.k.a. “sub-populations.” With the help of
pre-categorized training datasets, classification programs in machine learning leverage
a wide range of algorithms to classify future datasets into the respective and relevant
categories.

Classification algorithms used in machine learning use input training data to predict
the likelihood or probability that subsequent data will fall into one of the
predetermined categories. One of the most common applications of classification is
filtering emails into “spam” or “non-spam”, as used by today’s top email service
providers.

In short, classification is a form of “pattern recognition.” Here, classification algorithms
applied to the training data find the same pattern (similar number sequences, words or
sentiments, and the like) in future data sets.

We will explore classification algorithms in detail, and discover how text analysis
software can perform tasks such as sentiment analysis, which categorizes
unstructured text by opinion polarity (positive, negative, neutral, and the like).

Figure 2: Classification of vegetables and groceries

What is a Classification Algorithm?

A classification algorithm is a Supervised Learning technique that uses training data to
categorize new observations. In classification, a program learns from the dataset or
observations provided how to categorize new observations into various classes or
groups, for instance 0 or 1, red or blue, yes or no, spam or not spam. Classes can also
be called targets, labels, or categories. Because it is a supervised learning technique,
a classification algorithm uses labeled input data, which comprises both input and
output information. In the classification process, a discrete output function (y) is
mapped to an input variable (x).

In simple words, classification is a type of pattern recognition in which classification
algorithms are run on training data to discover the same pattern in new data
sets.

Learners in Classification Problems

There are two types of learners.

• Lazy Learners

A lazy learner first stores the training dataset and waits until the test dataset
arrives. Classification is then carried out using the most closely related data in the
training dataset. Less time is spent on training, but more time is spent on predictions.
Examples include case-based reasoning and the KNN algorithm.

• Eager Learners

Before obtaining a test dataset, eager learners build a classification model using a
training dataset. They spend more time training and less time predicting. Examples
include ANN, naive Bayes, and Decision trees.

Now, let us discuss four types of Classification Tasks in Machine Learning.

4 Types Of Classification Tasks In Machine Learning

Before diving into the four types of Classification Tasks in Machine Learning, let us first
discuss Classification Predictive Modeling.

Classification Predictive Modeling


A classification problem in machine learning is one in which a class label is predicted
for a specific example of input data.

Examples of classification problems include the following:

• Given an example, classify whether it is spam or not.

• Identify a handwritten character as one of the known characters.

• Determine whether to label the current user behavior as churn.

A training dataset with numerous examples of inputs and outputs is necessary for
classification from a modeling standpoint.

A model will determine the optimal way to map samples of input data to certain class
labels using the training dataset. The training dataset must therefore contain a large
number of samples of each class label and be suitably representative of the problem.

When providing class labels to a modeling algorithm, string values like "spam" or "not
spam" must first be converted to numeric values. Label encoding, which is frequently
used, assigns a distinct integer to every class label, such as "spam" = 0, "not spam" = 1.
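
As a minimal sketch of label encoding, the snippet below maps each distinct string label to an integer in order of first appearance. The helper name `encode_labels` and the sample labels are illustrative assumptions, not part of any particular library.

```python
def encode_labels(labels):
    """Map each distinct class label to an integer, in order of first appearance."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused integer
    return [mapping[label] for label in labels], mapping


# Hypothetical labels for five emails.
labels = ["spam", "not spam", "spam", "not spam", "not spam"]
encoded, mapping = encode_labels(labels)
print(mapping)  # {'spam': 0, 'not spam': 1}
print(encoded)  # [0, 1, 0, 1, 1]
```

Library encoders may assign the integers in a different order (for example alphabetically), so the mapping should always be stored alongside the model.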

There are many different types of algorithms for classification predictive modeling
problems.

Because there is no strong theory on how to map algorithms onto problem types, it is
typically advised that a practitioner run controlled experiments to determine which
algorithm and algorithm configuration produces the best performance for a given
classification task.

Classification predictive modeling algorithms are assessed based on their output. A
common metric for assessing a model's performance based on predicted class labels
is classification accuracy. Although not perfect, classification accuracy is a reasonable
place to start for many classification tasks.

Some tasks may call for a class membership probability prediction for each example
rather than class labels. This conveys additional uncertainty in the prediction, which a
user or application can subsequently interpret. The ROC Curve is a popular diagnostic
for assessing predicted probabilities.

There are four different types of Classification Tasks in Machine Learning:

• Binary Classification

• Multi-Class Classification

• Multi-Label Classification

• Imbalanced Classification

Now, let us look at each of them in detail.

Binary Classification

Classification tasks with only two class labels are referred to as binary
classification.

Examples include:

• Prediction of conversion (buy or not).

• Churn prediction (churn or not).

• Detection of spam email (spam or not).

Binary classification problems typically involve two classes, one representing the normal
state and the other representing the abnormal state.

For instance, the normal condition is "not spam," while the abnormal state is "spam."
Another illustration is when a task involving a medical test has a normal condition of
"cancer not identified" and an abnormal state of "cancer detected."

Class label 0 is given to the class in the normal state, whereas class label 1 is given to
the class in the abnormal condition.

A binary classification task is frequently modeled with a model that predicts a
Bernoulli probability distribution for each example.

The Bernoulli distribution is a discrete probability distribution that covers the
situation where an event has a binary outcome of either 0 or 1. In terms of classification,
this means that the model predicts the probability that an example belongs to
class 1, or the abnormal state.

The following are well-known binary classification algorithms:

• Logistic Regression

• Support Vector Machines

• Naive Bayes

• Decision Trees

Some algorithms, such as Support Vector Machines and Logistic Regression, were
designed specifically for binary classification and do not natively support more than two
classes.

Let us now discuss Multi-Class Classification.

Multi-Class Classification

Multi-class classification refers to classification tasks that have more than two class
labels.

Examples include:

• Categorization of faces.

• Classifying plant species.

• Optical character recognition.

Unlike binary classification, multi-class classification does not have the notion of
normal and abnormal outcomes. Instead, examples are classified as belonging to one of a
range of known classes.

In some cases, the number of class labels could be rather high. In a facial recognition
system, for instance, a model might predict that a photo belongs to one of thousands or
tens of thousands of faces.

Text translation models and other problems involving word prediction could be
categorized as a particular case of multi-class classification. Each word in the sequence
of words to be predicted requires a multi-class classification, where the vocabulary size
determines the number of possible classes that may be predicted and may range from
tens of thousands to hundreds of thousands of words.

Multi-class classification tasks are frequently modeled with a model that predicts a
Multinoulli probability distribution for each example.

The Multinoulli distribution is a discrete probability distribution that covers an event
with a categorical outcome, e.g. k in {1, 2, 3, ..., K}. In terms of classification, this
means that the model predicts the probability that a given example belongs to each
class label.

For multi-class classification, many binary classification techniques are applicable.

The following well-known algorithms can be used for multi-class classification:

• Gradient Boosting

• Decision Trees

• k-Nearest Neighbors

• Random Forest

• Naive Bayes

Algorithms designed for binary classification can also be adapted to multi-class problems.

This involves fitting multiple binary classification models, either one for each class
versus all other classes (called one-vs-rest) or one for each pair of classes (called
one-vs-one).

• One-vs-Rest: Fit one binary classification model for each class versus all other
classes.

• One-vs-One: Fit one binary classification model for each pair of classes.

The following binary classification algorithms can apply these multi-class classification
techniques:

• Support Vector Machine

• Logistic Regression
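
To make the difference between the two strategies concrete, the sketch below only enumerates the binary sub-problems each strategy would create for K classes; it does not train any model. The class names and helper functions are hypothetical.

```python
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary task per class: that class versus all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = ["cat", "dog", "bird", "fish"]  # hypothetical class labels, K = 4
ovr = one_vs_rest_tasks(classes)
ovo = one_vs_one_tasks(classes)
print(len(ovr))  # 4 binary models: K
print(len(ovo))  # 6 binary models: K * (K - 1) / 2
```

One-vs-rest needs fewer models, while one-vs-one trains each model on only the examples of two classes.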

Let us now learn about Multi-Label Classification.

Multi-Label Classification

Multi-label classification problems are those that feature two or more class labels and
allow for the prediction of one or more class labels for each example.

Consider the example of photo classification. Here a model can predict the presence of
multiple known objects in a photo, such as “person”, “apple”, “bicycle”, etc., since a
given photo may have multiple objects in the scene.

This contrasts with multi-class classification and binary classification, which
predict a single class label for each example.

Multi-label classification problems are frequently modeled with a model that predicts
multiple outputs, with each output being predicted as a Bernoulli probability
distribution. In essence, this approach predicts several binary classifications for each
example.
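
The idea of several independent binary decisions per example can be sketched as follows. The per-label probabilities are made-up numbers standing in for a model's output, and the 0.5 threshold is an assumption.

```python
def predict_labels(probabilities, threshold=0.5):
    """Keep every label whose predicted probability reaches the threshold."""
    return [label for label, p in probabilities.items() if p >= threshold]


# Hypothetical per-label probabilities for one photo; each is a separate
# Bernoulli ("present" or "absent") prediction, so they need not sum to 1.
photo_probs = {"person": 0.92, "apple": 0.08, "bicycle": 0.71}
print(predict_labels(photo_probs))  # ['person', 'bicycle']
```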

Classification algorithms designed for binary or multi-class classification cannot be
applied directly to multi-label classification. Specialized versions of the conventional
classification algorithms, the so-called multi-label versions, can be used instead,
including:

• Multi-label Gradient Boosting

• Multi-label Random Forests

• Multi-label Decision Trees

Another strategy is to use a separate classification algorithm to predict each class label.

Now, we will look into the Imbalanced Classification Task in detail.

Imbalanced Classification

The term "imbalanced classification" describes classification tasks where the distribution
of examples across the classes is not equal.

Imbalanced classification tasks are typically binary classification tasks in which the
majority of the training dataset's examples belong to the normal class and a minority
belong to the abnormal class.

Examples include:

• Medical diagnostic tests

• Detection of outliers

• Fraud detection

These problems are modeled as binary classification tasks, although they may require
specialized techniques.

Specialized techniques can be used to change the composition of samples in the training
dataset by oversampling the minority class or undersampling the majority class.

Examples include:

• SMOTE Oversampling

• Random Undersampling
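
A minimal sketch of random undersampling is shown below: every class is randomly reduced to the size of the smallest class. The function name, the fixed seed, and the toy data are assumptions for illustration; SMOTE, which synthesizes new minority examples, is not shown.

```python
import random


def random_undersample(examples, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # sample without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced data: 90 normal (0) and 10 abnormal (1) examples.
examples = list(range(100))
labels = [0] * 90 + [1] * 10
bal_x, bal_y = random_undersample(examples, labels)
print(bal_y.count(0), bal_y.count(1))  # 10 10
```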

It is possible to use specialized modeling techniques, like cost-sensitive machine
learning algorithms, that give the minority class more consideration when fitting the
model to the training dataset.

Examples include:

• Cost-sensitive Support Vector Machines

• Cost-sensitive Decision Trees

• Cost-sensitive Logistic Regression

Since reporting the classification accuracy may be misleading, alternative performance
metrics may be necessary.

Examples include:

• F-Measure

• Recall

• Precision

Now, we will discuss the types of Machine Learning Classification Algorithms.

Types of Classification Algorithms

You can apply many different classification methods depending on the dataset you are
working with, because the study of classification in statistics is extensive. Six
widely used machine learning algorithms are listed below.

1. Logistic Regression

Logistic regression is a supervised learning classification technique that predicts the
probability of a target variable. The target has only two possible classes: data is coded
as either 1 or yes, representing success, or as 0 or no, representing failure. Logistic
regression is well suited to predicting such a binary dependent variable, so you can use
it when the prediction is categorical, such as true or false, yes or no, or 0 or 1. For
example, a logistic regression model can be used to determine whether or not an email is
spam.
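
As a rough sketch of how logistic regression turns inputs into a probability, the snippet below applies the logistic (sigmoid) function to a weighted sum of features. The weights, bias, and feature values are made-up assumptions; in practice they are learned from training data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 given a linear score of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model with two features, e.g. "contains a link" and "known sender".
weights, bias = [2.0, -1.5], -1.0
p_spam = predict_proba([1, 0], weights, bias)
label = 1 if p_spam >= 0.5 else 0  # threshold the probability to get a class
print(sigmoid(0))  # 0.5
print(label)       # 1
```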

2. Naive Bayes

Naive Bayes determines whether a data point falls into a particular category. It can be
used to classify phrases or words in text analysis as either falling within a
predetermined classification or not.

Text                                        Tag

“A great game”                              Sports

“The election is over”                      Not Sports

“What a great score”                        Sports

“A clean and unforgettable game”            Sports

“The spelling bee winner was a surprise”    Not Sports

3. K-Nearest Neighbors

It estimates which group a data point belongs to based on the groups of the data points
closest to it. When using k-NN for classification, you classify a data point according
to the classes of its k nearest neighbors.
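
A minimal k-NN classifier can be sketched with the standard library alone. It uses Euclidean distance and a majority vote among the k closest training points; the points, labels, and choice of k = 3 are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data forming two groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["blue", "blue", "blue", "red", "red", "red"]
print(knn_predict(points, labels, (2, 2)))  # blue
print(knn_predict(points, labels, (6, 5)))  # red
```

Note that all the work happens at prediction time, which is why k-NN is a lazy learner.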

4. Decision Tree

A decision tree is an example of supervised learning. Although it can solve both
regression and classification problems, it excels at classification. Similar to a flow
chart, it divides data points into two similar groups at a time, starting with the "tree
trunk" and moving through the "branches" and "leaves" until the data points within each
category are closely related to one another.

5. Random Forest Algorithm

The random forest algorithm is an extension of the Decision Tree algorithm. You first
build a number of decision trees from the training data, which together form the ‘random
forest’. New data is then passed through every tree, and the trees' outputs are combined,
by majority vote for classification or by averaging for regression. Combining many trees
mitigates the decision tree’s problem of forcing data points unnecessarily into a
category.

6. Support Vector Machine

Support Vector Machine is a popular supervised machine learning technique for
classification and regression problems. It goes beyond simple X/Y prediction: for
classification, it looks for the boundary (a hyperplane) that separates the classes with
the widest possible margin.

Types of ML Classification Algorithms

1. Supervised Learning Approach

The supervised learning approach explicitly trains algorithms under close human
supervision. Both the input and the output data are first provided to the algorithm. The
algorithm then develops rules that map the input to the output. The training procedure is
repeated until the highest level of performance is attained.

The two types of supervised learning approaches are:

• Regression

• Classification

2. Unsupervised Learning

This approach is applied to examine the inherent structure of data and derive insightful
information from it. It looks for patterns in unlabeled data that can lead to better
results.

There are two types of unsupervised learning:

• Clustering

• Dimensionality reduction

3. Semi-supervised Learning

Semi-supervised learning lies on the spectrum between unsupervised and supervised
learning. It combines the most significant aspects of both worlds to provide a unique set
of algorithms.

4. Reinforcement Learning

The goal of reinforcement learning is to create autonomous, self-improving algorithms.
The algorithm improves itself through a continual cycle of trial and error, based on the
interactions between the incoming and labeled data.

Classification Models

• Naive Bayes: Naive Bayes is a classification algorithm that assumes that the predictors
in a dataset are independent, i.e. that the features are unrelated to each other. For
example, given a banana, the classifier will see that the fruit is yellow, oblong, and
long and tapered. Each of these features contributes independently to the probability of
it being a banana and does not depend on the others. Naive Bayes is based on Bayes’
theorem, which is given as:

Figure 3: Bayes’ Theorem

P(A | B) = P(B | A) P(A) / P(B)

Where:

P(A | B) = how often A happens given that B happens

P(A) = how likely A will happen

P(B) = how likely B will happen

P(B | A) = how often B happens given that A happens
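
A small worked example may help here. It applies Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), with A = "the email is spam" and B = "the email contains the word offer". All probabilities below are made-up assumptions for illustration.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: 20% of emails are spam, "offer" appears in 60% of
# spam emails and in 5% of non-spam emails.
p_spam = 0.2
p_offer_given_spam = 0.6
p_offer_given_ham = 0.05

# Total probability of seeing "offer" in any email.
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

p_spam_given_offer = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(round(p_spam_given_offer, 2))  # 0.75
```

Seeing the word raises the estimated probability of spam from the prior of 0.2 to about 0.75.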

• Decision Trees: A Decision Tree is an algorithm that is used to visually represent
decision-making. A Decision Tree can be made by asking a yes/no question and
splitting on the answer to lead to another decision. The question sits at a node, and
the resulting decisions are placed below it at the leaves. The tree depicted below is
used to decide if we can play tennis.

Figure 4: Decision Tree

In the above figure, depending on the weather conditions, the humidity, and the wind, we
can systematically decide if we should play tennis or not. In decision trees, all the False
statements lie on the left of the tree and the True statements branch off to the right.
Knowing this, we can make a tree which has the features at the nodes and the resulting
classes at the leaves.
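
A decision tree of this kind can be written directly as nested conditions. Figure 4 is not reproduced here, so the exact splits below (outlook first, then humidity or wind) are an assumption based on the common textbook version of the play-tennis tree.

```python
def play_tennis(outlook, humidity, wind):
    """Walk a small hand-written decision tree; each `if` is a node, each return a leaf."""
    if outlook == "sunny":
        return humidity == "normal"  # play only if it is not too humid
    if outlook == "overcast":
        return True                  # always play
    if outlook == "rain":
        return wind == "weak"        # play only if the wind is weak
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_tennis("sunny", "high", "weak"))       # False
print(play_tennis("overcast", "high", "strong"))  # True
print(play_tennis("rain", "normal", "weak"))      # True
```

A learned decision tree has the same shape, except that the questions and their order are chosen from the training data.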

• K-Nearest Neighbors: K-Nearest Neighbor is a classification and prediction algorithm
that divides data into classes based on the distance between the data
points. K-Nearest Neighbor assumes that data points which are close to one another
must be similar, and hence the data point to be classified is grouped with the
closest cluster.

Figure 5: Data to be classified


Figure 6: Classification using K-Nearest Neighbours

Evaluating a Classification Model

After our model is finished, we must assess its performance, whether it is a
regression or a classification model. We have the following options for assessing a
classification model:

1. Confusion Matrix

• The confusion matrix describes the model's performance and gives us a matrix or table
as an output.

• It is also known as the error matrix.

• The matrix summarizes the results of the predictions in condensed form, together with
the total number of correct and incorrect predictions.

The matrix has the following layout:

                      Actual Positive    Actual Negative

Predicted Positive    True Positive      False Positive

Predicted Negative    False Negative     True Negative

Accuracy = (TP + TN) / Total Population
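
The four cells of the confusion matrix and the accuracy formula can be computed with a short sketch like the following. The actual and predicted labels are made-up values, with 1 as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


actual = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions
tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(tp, fp, fn, tn)  # 3 1 1 3
print(accuracy)        # 0.75
```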

2. Log Loss or Cross-Entropy Loss

• It is used to assess the performance of a classifier whose output is a probability
value between 0 and 1.

• A good binary classification model should have a log loss value that is close to
0.

• The log loss increases as the predicted value diverges from the actual value.

• A lower log loss indicates a more accurate model.

Cross-entropy for binary classification can be calculated as:

$-(y \log(p) + (1 - y) \log(1 - p))$

Where p = Predicted Output, y = Actual output.
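
The binary cross-entropy for one example, -(y log(p) + (1 - y) log(1 - p)), and its average over a dataset can be sketched as below. Clipping p away from exactly 0 and 1 with a small `eps` is an added assumption to avoid taking log(0); the sample values are made up.

```python
import math


def binary_cross_entropy(y, p, eps=1e-15):
    """Loss for one example: y is the actual label (0 or 1), p the predicted probability."""
    p = min(max(p, eps), 1 - eps)  # keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def log_loss(y_true, y_prob):
    """Mean binary cross-entropy over all examples."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a confident wrong one costs a lot.
good = binary_cross_entropy(1, 0.9)
bad = binary_cross_entropy(1, 0.1)
print(good < bad)                         # True
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```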

3. AUC-ROC Curve

• AUC stands for Area Under the Curve, and ROC stands for Receiver Operating
Characteristic.

• It is a graph that displays the classification model's performance at various
thresholds.

• The AUC-ROC curve is used to show how well a classification model performs; for a
multi-class model it can be drawn one class at a time.

• The ROC curve is drawn with the True Positive Rate (TPR) on the Y-axis and the False
Positive Rate (FPR) on the X-axis.
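
Each point on a ROC curve comes from one threshold. The sketch below computes the (FPR, TPR) pair for a given threshold, treating a score at or above the threshold as a positive prediction; the scores and labels are made-up values, and the data is assumed to contain both classes.

```python
def roc_point(actual, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for a, s in zip(actual, scores):
        predicted_positive = s >= threshold
        if predicted_positive and a == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif a == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # share of actual positives that were found
    fpr = fp / (fp + tn)  # share of actual negatives that were flagged
    return fpr, tpr


actual = [1, 1, 0, 1, 0, 0]              # hypothetical true labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical predicted probabilities
for threshold in (1.1, 0.5, 0.0):
    print(threshold, roc_point(actual, scores, threshold))
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1).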

Now, let us discuss the use cases of Classification Algorithms.

Use Cases Of Classification Algorithms

There are many applications for classification algorithms. Here are a few of them:

• Speech Recognition

• Detecting Spam Emails

• Categorization of Drugs

• Cancer Tumor Cell Identification

• Biometric Authentication, etc.

Classifier Evaluation

Once a classifier is finished, the most crucial step is to evaluate it to verify its
accuracy and effectiveness. We can evaluate a classifier in a variety of ways. Let's look
at the techniques listed below, beginning with Cross-Validation.

Cross-Validation

The most prominent issue with most machine learning models is overfitting. K-fold
cross-validation makes it possible to check a model for overfitting.

With this technique, the data set is randomly divided into k equal-sized, mutually
exclusive subsets. One is retained for testing, while the others are used for training
the model. The same procedure is repeated for each of the k folds, so that every subset
serves as the test set once.
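
The splitting step of k-fold cross-validation can be sketched as a generator of (train, test) index lists. The function name and fixed seed are assumptions; model training and scoring on each split are left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # k mutually exclusive subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(n_samples=10, k=5):
    print(len(train), len(test))  # 8 2 on every fold
```

A large gap between the training score and the average test score across folds is a sign of overfitting.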

Holdout Method

This is the approach used most frequently to assess classifiers. In this
method, the given data set is split into a test set and a train set, comprising 20%
and 80% of the total data respectively.

The model is trained using the train set, and the unseen test set is then used to evaluate
its predictive ability.
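
A minimal sketch of the holdout split, assuming a random shuffle and the 80/20 proportions mentioned above, is shown below. The function name and seed are illustrative.

```python
import random


def holdout_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data and split it into (train, test) lists."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = holdout_split(range(100))
print(len(train_set), len(test_set))  # 80 20
```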

ROC Curve

For a visual comparison of classification models, the ROC curve, also known as the
receiver operating characteristic curve, is used. It illustrates the trade-off between
the false positive rate and the true positive rate. The area under the ROC curve
summarizes how well the model separates the classes.

Bias and Variance

Bias is the difference between our actual and predicted values. It arises from the
simplifying assumptions that our model makes about our data in order to predict on new
data, and it directly affects which patterns in our data the model can find. When the
bias is high, the assumptions made by our model are too basic and the model can’t capture
the important features of our data. This is called underfitting.

Figure 7: Bias

We can define variance as the model’s sensitivity to fluctuations in the data. Our model
may learn from noise, which causes it to treat trivial features as important.
When the variance is high, our model captures all the features of the data given to it
and tunes itself to that data, so it predicts on it very well. New data, however, may not
have exactly the same features, and the model won’t be able to predict on it as well. We
call this overfitting.

Figure 8: Example of Variance

Precision and Recall

Precision measures the model's ability to classify values correctly. It is given
by dividing the number of correctly classified data points by the total number of
data points classified under that class label:

Precision = TP / (TP + FP)

Where:

TP = True Positives, when our model correctly classifies the data point to the class it
belongs to.

FP = False Positives, when the model falsely classifies the data point.

Recall measures the ability of the model to predict positive values. It answers the
question "How often does the model predict the correct positive values?" and is
calculated as the ratio of true positives to the total number of actual positive values:

Recall = TP / (TP + FN)

where FN = False Negatives, actual positives that the model classifies as negative.
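
Precision and recall can be computed from the confusion-matrix counts as sketched below, together with the F-measure (here the F1 score, the harmonic mean of the two). The counts are made-up values.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 0.5
print(round(f1_score(tp, fp, fn), 3))
```

This model is right most of the time when it predicts the positive class, yet it misses half of the actual positives, which accuracy alone would not reveal.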

Now, let us look at Algorithm Selection.

Algorithm Selection

In addition to the strategy described above, we may apply the procedure listed below
to choose the best algorithm for the model.

• Read the data.

• Create dependent and independent data sets based on our dependent and independent
features.

• Split the data into training and test sets.

• Train the model using several algorithms, such as SVM, Decision Tree, KNN, etc.

• Evaluate each classifier.

• Choose the most accurate classifier.

Although it may take more time than expected to select the best algorithm for your
model, accuracy is the best guide to making your model effective.

Our Learners Also Asked

1. What is a classification algorithm, with example?

Classification involves predicting a class label for a specific example of input data. For
example, it can identify whether or not an email is spam, or classify a handwritten
character as one of the known characters.

2. What is the best classification algorithm?

No single algorithm is best for every classification task. That said, the Naive Bayes
classifier can produce better results than other classification algorithms such as
Logistic Regression, Support Vector Machines, and Decision Trees on some problems.

3. What is the most straightforward classification algorithm?

One of the most straightforward classification techniques is kNN.

4. Classifier vs. Algorithm in Machine Learning?

The technique, or set of rules, that computers use to categorize data is known as a
classifier. The classification model, in turn, is the result of the classifier's machine
learning. The classifier is used to train the model, which then eventually classifies
your data.

5. What are classification and types?

Classification is a category or division in a system that organizes objects
into groups or types. You can encounter the following four categories of classification
tasks: Binary, Multi-class, Multi-label, and Imbalanced classification.

6. What is the difference between classification and clustering?

The goal of clustering is to group similar items together, so that items in the same
group are similar to one another and items in different groups are not. This differs
from classification, where the goal is to predict the target class from labeled
examples.
