Classification
Before we dive into Classification, let's take a look at what Supervised Learning is.
Suppose you are trying to learn a new concept in maths. After solving a problem, you may refer to the solutions to see whether you were right. Once you are confident in your ability to solve a particular type of problem, you will stop referring to the answers and solve the questions put before you by yourself.
This is also how Supervised Learning works with machine learning models. In
Supervised Learning, the model learns by example. Along with our input variable, we
also give our model the corresponding correct labels. While training, the model gets to
look at which label corresponds to our data and hence can find patterns between our
data and those labels.
Some real-world applications of supervised learning include:
1. Spam Detection: teaching a model which mail is spam and which is not, so that it can classify new mail.
2. Object Recognition: showing a machine what an object looks like and having it pick that object out from among other objects.
What is Classification?
Classification is defined as the process of recognizing, understanding, and grouping objects and ideas into preset categories, a.k.a. "sub-populations." With the help of these pre-categorized training datasets, classification in machine learning programs leverages a wide range of algorithms to classify future datasets into the relevant categories.
Classification algorithms used in machine learning utilize input training data for the
purpose of predicting the likelihood or probability that the data that follows will fall into
one of the predetermined categories. One of the most common applications of
classification is for filtering emails into “spam” or “non-spam”, as used by today’s top
email service providers.
We will explore classification algorithms in detail and discover how text analysis software can perform actions like sentiment analysis, which is used for categorizing unstructured text by opinion polarity (positive, negative, neutral, and the like).
Lazy Learners
A lazy learner first stores the training dataset and waits until a test dataset arrives. The classification is then carried out using the most closely related data in the stored training dataset. Lazy learners spend less time on training but more time on predictions. Examples include case-based reasoning and the KNN algorithm.
Eager Learners
Eager learners build a classification model from the training dataset before receiving a test dataset. They spend more time on training and less time on prediction. Examples include ANN, Naive Bayes, and Decision Trees.
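To make the distinction concrete, here is a minimal sketch, assuming scikit-learn is available, that fits a lazy learner (KNN) and an eager learner (Naive Bayes) on the built-in iris dataset; the dataset and parameters are only placeholders:

```python
# Contrast a lazy learner (KNN) with an eager learner (Naive Bayes) on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lazy learner: "training" mostly just stores the data; work happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Eager learner: builds its model (class priors and feature statistics) during fit().
nb = GaussianNB().fit(X_train, y_train)

print("KNN accuracy:", knn.score(X_test, y_test))
print("Naive Bayes accuracy:", nb.score(X_test, y_test))
```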
Before diving into the four types of Classification Tasks in Machine Learning, let us first
discuss Classification Predictive Modeling.
A training dataset with numerous examples of inputs and outputs is necessary for
classification from a modeling standpoint.
A model will determine the optimal way to map samples of input data to certain class
labels using the training dataset. The training dataset must therefore contain a large
number of samples of each class label and be suitably representative of the problem.
When providing class labels to a modeling algorithm, string values like "spam" or "not
spam" must first be converted to numeric values. Label encoding, which is frequently
used, assigns a distinct integer to every class label, such as "spam" = 0, "not spam" = 1.
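A minimal sketch of label encoding, assuming scikit-learn; the label strings are placeholders (note that LabelEncoder assigns integers alphabetically, so the exact mapping may differ from the one above):

```python
# Encode string class labels as integers with scikit-learn's LabelEncoder.
from sklearn.preprocessing import LabelEncoder

labels = ["spam", "not spam", "not spam", "spam"]
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(encoded)            # e.g. [1 0 0 1] -- integers are assigned alphabetically
print(encoder.classes_)   # ['not spam' 'spam']
```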
Some tasks may call for predicting a class membership probability for each example rather than a class label. This provides additional uncertainty in the prediction, which a user or application can then interpret. The ROC Curve is a popular diagnostic for evaluating predicted probabilities.
There are four different types of Classification Tasks in Machine Learning and they are
following -
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
Binary Classification
Classification tasks with only two class labels are referred to as binary classification.
Binary classification problems typically involve two classes, one representing the normal state and the other representing the abnormal state.
For instance, the normal condition is "not spam," while the abnormal state is "spam."
Another illustration is when a task involving a medical test has a normal condition of
"cancer not identified" and an abnormal state of "cancer detected."
Class label 0 is given to the class in the normal state, whereas class label 1 is given to
the class in the abnormal condition.
A model that forecasts a Bernoulli probability distribution for each case is frequently
used to represent a binary classification task.
The discrete probability distribution known as the Bernoulli distribution deals with the
situation where an event has a binary result of either 0 or 1. In terms of classification,
this indicates that the model forecasts the likelihood that an example would fall within
class 1, or the abnormal state.
Popular algorithms that can be used for binary classification include:
Logistic Regression
Support Vector Machines
Naive Bayes
Decision Trees
Some algorithms, such as Support Vector Machines and Logistic Regression, were
created expressly for binary classification and do not by default support more than two
classes.
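As a rough illustration, here is a minimal binary-classification sketch assuming scikit-learn; the synthetic dataset stands in for a real spam / not-spam dataset:

```python
# Binary classification with logistic regression on a synthetic two-class dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns the Bernoulli probability of each class;
# column 1 is the probability of the abnormal class (label 1).
print(clf.predict_proba(X_test[:3]))
print(clf.predict(X_test[:3]))
```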
Multi-Class Classification
Multi-class classification refers to classification tasks that have more than two class labels.
Examples include face categorization.
In contrast to binary classification, multi-class classification does not have the notion of normal and abnormal outcomes. Instead, instances are grouped into one of a range of known classes.
In some cases, the number of class labels could be rather high. In a facial recognition
system, for instance, a model might predict that a shot belongs to one of thousands or
tens of thousands of faces.
Text translation models and other problems involving word prediction could be
categorized as a particular case of multi-class classification. Each word in the sequence
of words to be predicted requires a multi-class classification, where the vocabulary size
determines the number of possible classes that may be predicted and may range from
tens of thousands to hundreds of thousands of words.
Multiclass classification tasks are frequently modeled using a model that forecasts a
Multinoulli probability distribution for each example.
Popular algorithms that can be used for multi-class classification include:
Gradient Boosting
Decision Trees
K-Nearest Neighbors
Random Forest
Naive Bayes
Multi-class problems can be solved using algorithms created for binary classification.
To do this, strategies known as "one-vs-rest" and "one-vs-one" are used, which involve fitting multiple binary classification models:
One-vs-Rest: Fit a single binary classification model for each class versus all other classes.
One-vs-One: Fit a single binary classification model for each pair of classes.
Binary classification algorithms that can apply these multi-class classification techniques include:
Logistic Regression
Support Vector Machines
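A minimal sketch of both strategies, assuming scikit-learn; the synthetic three-class dataset is a placeholder:

```python
# Apply a binary algorithm to a multi-class problem via one-vs-rest and one-vs-one wrappers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           random_state=1)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)  # one model per class
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)   # one model per pair

print(len(ovr.estimators_))  # 3 models (one per class)
print(len(ovo.estimators_))  # 3 models (one per pair of classes)
```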
Multi-Label Classification
Multi-label classification problems are those that feature two or more class labels and
allow for the prediction of one or more class labels for each example.
Think about the photo classification example. Here a model can predict the existence of
many known things in a photo, such as “person”, “apple”, "bicycle," etc. A particular
photo may have multiple objects in the scene.
This greatly contrasts with multi-class classification and binary classification, which
anticipate a single class label for each occurrence.
Multi-label classification problems are frequently modeled using a model that forecasts
many outcomes, with each outcome being forecast as a Bernoulli probability
distribution. In essence, this approach predicts several binary classifications for each
example.
Classification algorithms designed for binary or multi-class classification cannot be applied directly to multi-label classification. Instead, specialized multi-label versions of the conventional classification algorithms are used, such as multi-label decision trees and multi-label random forests.
Another strategy is to use a separate classification algorithm to predict the labels for each class.
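A minimal multi-label sketch, assuming scikit-learn; the synthetic dataset and the choice of logistic regression as the base model are placeholders:

```python
# Multi-label classification: each example can carry several labels at once,
# and each label is predicted as its own binary outcome.
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_multilabel_classification(n_samples=500, n_classes=3, random_state=1)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Each prediction is a vector of 0/1 flags, one per label
# (e.g. "person", "apple", "bicycle" present or not).
print(clf.predict(X[:3]))
```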
Imbalanced Classification
The term "imbalanced classification" describes classification jobs where the distribution
of examples within each class is not equal.
Imbalanced classification tasks are generally binary classification tasks in which the majority of the training dataset's examples belong to the normal class and a minority belong to the abnormal class.
Examples include:
Fraud detection
Outlier detection
Although they may require specialized techniques, these problems are modeled as binary classification tasks.
Specialized sampling techniques may be used to change the composition of the training dataset, for example:
SMOTE Oversampling
Random Undersampling
Alternative performance metrics that focus on the minority class may also be required, for example:
Precision
Recall
F-Measure
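A minimal imbalanced-classification sketch, assuming scikit-learn (SMOTE itself lives in the separate imbalanced-learn package and is not shown); the class proportions and the use of class weighting are illustrative choices:

```python
# Handle an imbalanced binary problem with class weighting and minority-focused metrics.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# 95% normal class vs 5% abnormal class.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=7)

# class_weight="balanced" is one simple way to compensate for the imbalance.
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
print("f1:       ", f1_score(y_test, y_pred))
```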
You can apply many different classification methods depending on the dataset you are working with, because the study of classification in statistics is extensive. The top five machine learning classification algorithms are listed below.
1. Logistic Regression
Logistic regression predicts the probability that an example belongs to one of two classes using the logistic (sigmoid) function.
2. Naive Bayes
Naive Bayes determines whether a data point falls into a particular category. It can be
used to classify phrases or words in text analysis as either falling within a
predetermined classification or not.
3. K-Nearest Neighbors
K-nearest neighbors calculates the likelihood that a data point belongs to a group based on which group the data points closest to it belong to. When using k-NN for classification, you determine how to classify the data according to its nearest neighbors.
4. Decision Tree
A decision tree asks a series of questions about the features, following the branches until it reaches a class label at a leaf (see the worked example under Decision Trees further below).
5. Random Forest
The random forest algorithm is an extension of the decision tree algorithm: you first create a number of decision trees using training data and then fit your new data into one of the created trees as a "random forest". It averages the data to connect it to the nearest tree on the data scale. These models are great for addressing the decision tree's problem of forcing data points into a category unnecessarily.
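A minimal random forest sketch, assuming scikit-learn; the iris dataset and the number of trees are placeholders:

```python
# Random forest: an ensemble of decision trees whose votes are combined into one prediction.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```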
1. Supervised Learning
The supervised learning approach explicitly trains algorithms under close human supervision. Both the input and the output data are first provided to the algorithm. The algorithm then develops rules that map the input to the output. The training procedure is repeated until the highest level of performance is attained.
Supervised learning problems are divided into two categories:
Regression
Classification
2. Unsupervised Learning
This approach is applied to examine the inherent structure of the data and derive insightful information from it. It looks for patterns in unlabeled data that can produce better results.
Clustering
Dimensionality reduction
3. Semi-supervised Learning
This approach trains a model on a combination of a small amount of labeled data and a large amount of unlabeled data.
4. Reinforcement Learning
In this approach, an agent learns by interacting with an environment and adjusting its behavior based on rewards and penalties.
Naive Bayes: Naive Bayes is a classification algorithm that assumes that predictors
in a dataset are independent. This means that it assumes the features are unrelated
to each other. For example, if given a banana, the classifier will see that the fruit is of
yellow color, oblong-shaped and long and tapered. All of these features will
contribute independently to the probability of it being a banana and are not
dependent on each other. Naive Bayes is based on Bayes’ theorem, which is given
as:
P(A|B) = (P(B|A) * P(A)) / P(B)
Where:
P(A|B) is the posterior probability of class A given predictor B,
P(B|A) is the likelihood of predictor B given class A,
P(A) is the prior probability of the class, and
P(B) is the prior probability of the predictor.
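A minimal Naive Bayes sketch for text, assuming scikit-learn; the tiny corpus and its spam labels are made up for illustration:

```python
# Naive Bayes for text classification: word counts as independent features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free cash offer", "project status update"]
labels = [1, 0, 1, 0]  # 1 = spam (abnormal), 0 = not spam (normal)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

nb = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["free prize offer"])
print(nb.predict(test))        # predicted class label
print(nb.predict_proba(test))  # posterior probabilities from Bayes' theorem
```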
Decision Trees: Consider deciding whether we should play tennis based on the weather conditions, the humidity, and the wind. In a decision tree for this problem, all the False statements lie on the left of the tree and the True statements branch off to the right. Knowing this, we can make a tree which has the features at the nodes and the resulting classes at the leaves.
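A minimal decision tree sketch, assuming scikit-learn; the tiny play-tennis table below is invented to mirror the example above:

```python
# Fit a small decision tree and print its learned rules:
# features at the nodes, class labels at the leaves.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook (0=sunny, 1=overcast, 2=rain), humidity (0=normal, 1=high), windy (0/1)]
X = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [2, 0, 0], [2, 0, 1], [1, 0, 1], [0, 0, 0]]
y = [0, 0, 1, 1, 0, 1, 1]  # 1 = play tennis, 0 = don't play

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity", "windy"]))
```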
After our model is finished, we must assess its performance, whether it is a regression or a classification model. We have the following options for assessing a classification model:
1. Confusion Matrix
The confusion matrix describes the model performance and gives us a matrix or table
as an output.
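A minimal confusion-matrix sketch, assuming scikit-learn; the label vectors are placeholders:

```python
# Compute a confusion matrix for a binary classifier's predictions.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```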
2. Log Loss or Cross-Entropy Loss
A successful binary classification model should have a log loss value that is close to 0. The value of log loss rises if the predicted value differs from the actual value. For an example with actual label y and predicted probability p, the log loss is:
-(y log(p) + (1 - y) log(1 - p))
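A minimal sketch of computing log loss, assuming scikit-learn; the probabilities are placeholders chosen to contrast a good and a bad model:

```python
# Well-calibrated, confident probabilities give a log loss close to 0;
# confidently wrong probabilities inflate it.
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
good_probs = [0.1, 0.9, 0.8, 0.2]  # close to the true labels
bad_probs = [0.9, 0.2, 0.3, 0.8]   # far from the true labels

print(log_loss(y_true, good_probs))  # small value
print(log_loss(y_true, bad_probs))   # much larger value
```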
3. AUC-ROC Curve
AUC is for Area Under the Curve, and ROC refers to Receiver Operating
Characteristics Curve.
The AUC-ROC Curve is used to show how well the multi-class classification model
performs.
The TPR and FPR are used to draw the ROC curve, with the True Positive Rate
(TPR) on the Y-axis and the FPR (False Positive Rate) on the X-axis.
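A minimal ROC/AUC sketch, assuming scikit-learn; the synthetic dataset and logistic regression model are placeholders:

```python
# Compute the ROC curve and AUC from a binary classifier's predicted probabilities.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, probs)  # FPR on the X-axis, TPR on the Y-axis
print("AUC:", roc_auc_score(y_test, probs))
```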
Now, let us discuss the use cases of Classification Algorithms.
There are many applications for classification algorithms. Here are a few of them:
Speech Recognition
Categorization of Drugs
Classifier Evaluation
The evaluation to verify a classifier's accuracy and effectiveness is the most crucial step
after it is finished. We can evaluate a classifier in a variety of ways. Let's look at these
techniques that are stated below, beginning with Cross-Validation.
Cross-Validation
The most prominent issue with most machine learning models is over-fitting. It is
possible to check the model's overfitting with K-fold cross-validation.
With this technique, the data set is randomly divided into k equal-sized, mutually
exclusive subsets. One is retained for testing, while the others are utilized for training
the model. For each of the k folds, the same procedure is followed.
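A minimal k-fold cross-validation sketch, assuming scikit-learn; k = 5, the iris dataset, and the decision tree model are placeholders:

```python
# k-fold cross-validation: each fold takes a turn as the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```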
Holdout Method
This is the approach used most frequently to assess classifiers. According to this method, the given data set is split into a test set and a training set, comprising 20% and 80% of the total data, respectively.
The model is trained on the training set, and its predictive ability is then evaluated on the unseen test set.
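A minimal holdout sketch, assuming scikit-learn; the 80/20 split matches the proportions described above:

```python
# Holdout evaluation: 80% of the data for training, 20% held back as an unseen test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on the unseen test set:", model.score(X_test, y_test))
```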
ROC Curve
For a visual comparison of classification models, the ROC curve, also known as the receiver operating characteristic, is used. It illustrates the trade-off between the true positive rate and the false positive rate. The area under the ROC curve measures the accuracy of the model.
Bias is the difference between our actual and predicted values. Bias refers to the simplifying assumptions that our model makes about the data in order to predict on new data, and it directly corresponds to the patterns found in our data. When bias is high, the assumptions made by our model are too basic; the model cannot capture the important features of our data, which is called underfitting.
Figure 7: Bias
We can define variance as the model's sensitivity to fluctuations in the data. Our model may learn from noise, which causes it to treat trivial features as important. When variance is high, the model captures all the features of the data given to it, tunes itself to that data, and predicts it very well; but new data may not have exactly the same features, so the model cannot predict it well. We call this overfitting.
Figure 8: Example of Variance
Precision is used to calculate the model's ability to classify values correctly. It is given
by dividing the number of correctly classified data points by the total number of
classified data points for that class label.
Precision = TP / (TP + FP)
Where:
TP = True Positives, when our model correctly classifies the data point to the class it
belongs to.
FP = False Positives, when the model falsely classifies the data point.
Recall is used to calculate the model's ability to predict positive values: "How often does the model predict the correct positive values?" It is calculated as the ratio of true positives to the total number of actual positive values:
Recall = TP / (TP + FN)
where FN = False Negatives, when the model incorrectly classifies an actual positive as negative.
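A minimal precision and recall sketch, assuming scikit-learn; the label vectors are placeholders:

```python
# Compute precision and recall for a binary classifier's predictions.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("precision = TP / (TP + FP):", precision_score(y_true, y_pred))
print("recall    = TP / (TP + FN):", recall_score(y_true, y_pred))
```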
Algorithm Selection
In addition to the strategy described above, we may apply the procedures listed below
to choose the optimum algorithm for the model.
Read the data.
Create dependent and independent data sets based on our dependent and independent features.
Train the model using several algorithms, such as SVM, Decision Tree, KNN, etc.
Even though selecting the optimum algorithm for your model can take longer than necessary, comparing accuracy is the best path forward to making your model efficient.
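A minimal algorithm-selection sketch, assuming scikit-learn; the iris dataset and the three candidate algorithms are placeholders:

```python
# Train several candidate algorithms and compare their cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the dependent/independent data sets

candidates = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```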
Classification involves predicting a class label for a specific example of input data. For example, it can identify whether or not an email is spam, or whether a handwriting sample corresponds to one of the known characters.
The technique, or set of guidelines, that computers use to categorize data is known as a classifier. The classification model, on the other hand, is the end result of the classifier's machine learning: the classifier is trained on the training data, and the resulting model then classifies new data.
5. What is the difference between classification and clustering?
The goal of clustering is to group similar kinds of items together, under the criterion that items in the same group should be similar to each other and different from items in other groups. This differs from classification, where the goal is to predict the target class.