SUPERVISED LEARNING
WHY IS IT IMPORTANT?
Supervised learning gives the algorithm experience that can be used to output predictions for new, unseen data.
This experience also helps in optimizing the performance of the algorithm.
USES OF REGRESSION
Determining the strength of predictors (the strength of the effect that the independent variables have on the dependent variable)
Forecasting an effect
Trend forecasting
LINEAR REGRESSION
R-SQUARED VALUE
The R-squared value is a statistical measure of how close the data are to the fitted regression line.
GOODNESS OF FIT
When the R-squared value is equal to 1, all of the actual values lie exactly on the regression line.
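As a minimal sketch, R-squared can be computed by hand for a least-squares line; the data below are made up purely for illustration:

    import numpy as np

    # Made-up data, purely for illustration
    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([2, 4, 5, 4, 5], dtype=float)

    # Fit a straight line y = m*x + c by least squares
    m, c = np.polyfit(x, y, 1)
    y_pred = m * x + c

    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    print(r_squared)  # a value of 1 would mean every point lies on the line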
GRADIENT DESCENT
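A minimal sketch of gradient descent applied to simple linear regression, minimizing the mean squared error; the learning rate, iteration count, and data are illustrative choices:

    import numpy as np

    # Made-up data, purely for illustration
    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([2, 4, 5, 4, 5], dtype=float)

    m, c = 0.0, 0.0   # initial slope and intercept
    lr = 0.01         # learning rate (illustrative choice)
    for _ in range(5000):
        y_pred = m * x + c
        error = y_pred - y
        # Gradients of the mean squared error with respect to m and c
        dm = 2 * np.mean(error * x)
        dc = 2 * np.mean(error)
        # Step in the direction opposite to the gradient
        m -= lr * dm
        c -= lr * dc

    print(m, c)  # approaches the least-squares slope and intercept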
EXAMPLE
For the slope of the least-squares line: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
CODE
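A minimal sketch of how the slope formula above can be turned into code, together with the intercept c = ȳ - m·x̄; the data and the new input value are illustrative assumptions:

    import numpy as np

    # Made-up data, purely for illustration
    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([2, 4, 5, 4, 5], dtype=float)

    # Slope from the least-squares formula, intercept from c = mean(y) - m * mean(x)
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    c = y.mean() - m * x.mean()

    # Predict for a new, unseen value of x
    x_new = 6.0
    print(m * x_new + c)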
LOGISTIC REGRESSION
Logistic regression produces results in a binary format and is used to predict the outcome of a categorical dependent variable. The outcome should therefore be discrete/categorical, such as 0 or 1, yes or no, or true or false.
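Concretely, the model passes a linear combination of the inputs through the sigmoid (logistic) function and thresholds the resulting probability to obtain a discrete class; a minimal sketch with an illustrative value:

    import numpy as np

    def sigmoid(z):
        # Maps any real number into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    z = 0.8  # illustrative linear combination of inputs and weights
    probability = sigmoid(z)
    label = 1 if probability >= 0.5 else 0  # threshold the probability to get a binary class
    print(probability, label)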
CLASSIFICATION
Classification is the process of categorizing a given set of data into classes. It can be performed on both structured and unstructured data.
The process starts with predicting the class of given data points. The classes are often referred to as targets, labels, or categories.
CLASSIFICATION TERMINOLOGIES
CLASSIFICATION ALGORITHMS
In machine learning, classification is a supervised learning task that categorizes a set of data into classes.
LOGISTIC REGRESSION
It is a classification algorithm in machine learning that uses one or more independent variables to determine an outcome, and that outcome has only two possible values.
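A minimal scikit-learn sketch, assuming a tiny made-up binary data set:

    from sklearn.linear_model import LogisticRegression

    # Made-up data: one feature, binary labels
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 1, 1, 1]

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    print(clf.predict([[1.5], [4.5]]))   # predicted classes
    print(clf.predict_proba([[4.5]]))    # class probabilities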
K-NEAREST NEIGHBOR
It is a lazy learning algorithm that stores all instances corresponding to the training data in n-dimensional space.
Rather than constructing a general internal model, it simply stores the training instances and classifies a new point from the classes of its nearest stored neighbors.
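A minimal scikit-learn sketch of k-nearest neighbor classification, again with made-up data (the value k = 3 is an illustrative choice):

    from sklearn.neighbors import KNeighborsClassifier

    # Made-up training instances and labels
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 1, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)  # "training" essentially just stores the instances
    print(knn.predict([[2.2], [5.1]]))  # majority class among the 3 nearest neighbors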
DECISION TREE
The decision tree algorithm builds the classification model in the form of a tree
structure.
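A minimal scikit-learn sketch of a decision tree classifier on made-up data (max_depth = 2 is an illustrative choice):

    from sklearn.tree import DecisionTreeClassifier

    # Made-up training data and labels
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 1, 1, 1]

    tree = DecisionTreeClassifier(max_depth=2)
    tree.fit(X, y)  # learns a tree of if/else splits on the feature
    print(tree.predict([[2.5], [5.5]]))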
RANDOM FOREST
Random decision trees, or random forests, are an ensemble learning method for classification, regression, and other tasks.
The method operates by constructing a multitude of decision trees at training time and outputs the class that is the mode of the individual trees' classes (classification) or their mean prediction (regression).
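A minimal scikit-learn sketch of a random forest classifier; the number of trees and the data are illustrative assumptions:

    from sklearn.ensemble import RandomForestClassifier

    # Made-up training data and labels
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 1, 1, 1]

    # Each tree is trained on a bootstrap sample; class predictions are combined by majority vote
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y)
    print(forest.predict([[2.5], [5.5]]))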
CLASSIFIER EVALUATION
HOLDOUT METHOD
This is the most common method to evaluate a classifier. In this method, the given data set is divided into two parts, a training set and a test set, typically 80% and 20% respectively.
The training set is used to train the model, and the unseen test set is used to test its predictive power.
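A minimal scikit-learn sketch of the holdout method, assuming the built-in Iris data set as an illustrative example:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Hold out 20% of the data for testing; train on the remaining 80%
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # accuracy on the unseen test set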
CROSS-VALIDATION
Overfitting is one of the most common problems in machine learning models. K-fold cross-validation can be conducted to check whether the model is overfitted.
In this method, the data set is randomly partitioned into k mutually exclusive subsets of equal size. One subset is kept for testing while the others are used to train the model, and the same process is repeated for all k folds.
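A minimal scikit-learn sketch of k-fold cross-validation (k = 5 and the Iris data set are illustrative choices):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000)

    # 5-fold cross-validation: each fold is held out once while the other folds train the model
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores, scores.mean())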
CLASSIFICATION REPORT
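In scikit-learn, a classification report summarizes precision, recall, F1-score, and support for each class. A minimal sketch with made-up true and predicted labels:

    from sklearn.metrics import classification_report

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # made-up ground-truth labels
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # made-up model predictions
    print(classification_report(y_true, y_pred))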
ROC CURVE
The receiver operating characteristic (ROC) curve is used for visual comparison of classification models; it shows the relationship between the true positive rate and the false positive rate. The area under the ROC curve is a measure of the model's accuracy.
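A minimal scikit-learn sketch that computes the points of the ROC curve and the area under it, using made-up labels and scores:

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # made-up ground-truth labels
    y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]   # made-up predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)    # points on the ROC curve
    print(roc_auc_score(y_true, y_score))                # area under the curve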
ALGORITHM SELECTION