Ch4
Ch4
Classification:
✓ The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program learns
from the given dataset or observations and then classifies new observation into a number of
classes or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can
be called as targets/labels or categories.
✓ Unlike regression, the output variable of Classification is a category, not a value, such as "Green
or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised learning
technique, hence it takes labeled input data, which means it contains input with the
corresponding output.
✓ The best example of an ML classification algorithm is Email Spam Detector.
✓ The main goal of the Classification algorithm is to identify the category of a given dataset, and
these algorithms are mainly used to predict the output for the categorical data.
✓ Classification algorithms can be better understood using the below diagram. In the below
diagram, there are two classes, class A and Class B. These classes have features that are similar
to each other and dissimilar to other classes.
The algorithm which implements the classification on a dataset is known as a classifier. There are two
types of Classifications:
1. Binary Classifier: If the classification problem has only two possible outcomes, then it is called
as Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
2. Multi-class Classifier: If a classification problem has more than two outcomes, then it is called
as Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
Regression:
✓ Regression analysis is a statistical method to model the relationship between a dependent
(target) and independent (predictor) variables with one or more independent variables.
✓ More specifically, Regression analysis helps us to understand how the value of the dependent
variable is changing corresponding to an independent variable when other independent
variables are held fixed. It predicts continuous/real values such as temperature, age, salary,
price, etc.
Example: Suppose there is a marketing company A, who does various advertisement every year and
get sales on that. The below list shows the advertisement made by the company in the last 5 years and
the corresponding sales:
✓ Now, the company wants to do the advertisement of $200 in the year 2019 and wants to know
the prediction about the sales for this year. So to solve such type of prediction problems in
machine learning, we need regression analysis.
✓ Regression is a supervised learning technique which helps in finding the correlation between
variables and enables us to predict the continuous output variable based on the one or more
predictor variables. It is mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.
✓ In Regression, we plot a graph between the variables which best fits the given datapoints, using
this plot, the machine learning model can make predictions about the data. In simple
words, "Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum." The distance between datapoints and line tells whether a model
has captured a strong relationship or not.
Some examples of regression can be as:
The training is provided to the machine with the set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects with similar
patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights
from the huge amount of data. It can be further classifieds into two categories of algorithms:
o Clustering
o Association
Clustering:
✓ Clustering or cluster analysis is a machine learning technique, which groups the unlabeled
dataset. It can be defined as "A way of grouping the data points into different clusters,
consisting of similar data points. The objects with the possible similarities remain in a group
that has less or no similarities with another group."
✓ It does it by finding some similar patterns in the unlabeled dataset such as shape, size, color,
behavior, etc., and divides them as per the presence and absence of those similar patterns.
✓ It is an unsupervised learning method; hence no supervision is provided to the algorithm, and it
deals with the unlabeled dataset.
✓ After applying this clustering technique, each cluster or group is provided with a cluster-ID.
ML system can use this id to simplify the processing of large and complex datasets.
✓ The clustering technique is commonly used for statistical data analysis.
Association:
✓ Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps accordingly so that it can be more
profitable.
✓ It tries to find some interesting relations or associations among the variables of dataset. It is
based on different rules to discover the interesting relations between variables in the database.
✓ The association rule learning is one of the very important concepts of machine learning, and it
is employed in Market Basket analysis, Web usage mining, continuous production, etc.
✓ Here market basket analysis is a technique used by the various big retailer to discover the
associations between items. We can understand it by taking an example of a supermarket, as in
a supermarket, all products that are purchased together are put together.
y= a0+a1x+ ε
Here,
The values for x and y variables are training datasets for Linear Regression model representation.
✓ Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
✓ Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or
False, etc. but instead of giving the exact value as 0 and 1, it gives the probabilistic values
which lie between 0 and 1.
✓ Logistic Regression is much similar to the Linear Regression except that how they are used.
Linear Regression is used for solving Regression problems, whereas Logistic regression is
used for solving the classification problems.
✓ In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic
function, which predicts two maximum values (0 or 1).
✓ The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
✓ Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
✓ Logistic Regression can be used to classify the observations using different types of data
and can easily determine the most effective variables used for the classification. The below
image is showing the logistic function: