Machine Learning (Part 1)
IYKRA DATA FELLOWSHIP BATCH 3
Outline
• Introduction to Machine
Learning
• Regression
• Linear Regression
• Classification
• Logistic Regression
• Naïve Bayes
• Support Vector Machine
• K-Nearest Neighbours
• Decision Tree
• Random Forest
[Machine Learning is the] field of
study that gives computers the
ability to learn without being
explicitly programmed.
SIMPLE LINEAR REGRESSION
This is the most basic regression model, in which predictions are formed from a single, univariate feature of the data.

MULTIPLE LINEAR REGRESSION
As the name implies, in this regression model the predictions are formed from multiple features of the data.
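The simple (univariate) case above can be sketched in a few lines using the closed-form least-squares solution; the function name and toy data below are illustrative, not from the slides.

```python
# Minimal sketch of simple linear regression via closed-form least squares.
def fit_simple_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                    # generated from y = 2x + 1
slope, intercept = fit_simple_linear(xs, ys)
print(slope, intercept)                  # → 2.0 1.0
```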
Applications
❖Forecasting or Predictive Analysis
❖Optimization
❖Error Correction
❖Economics
❖Finance
Gradient Descent and Cost
Function
Gradient descent is an optimization
algorithm used to minimize some function
by iteratively moving in the direction of
steepest descent as defined by the negative
of the gradient. In machine learning, we use
gradient descent to update
the parameters of our model. Parameters
refer to coefficients in Linear
Regression and weights in neural networks.
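The update rule described above can be sketched for the mean-squared-error cost of simple linear regression; the function name, learning rate, and toy data are illustrative assumptions.

```python
# Gradient descent on the MSE cost of simple linear regression.
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0                      # parameters: coefficient and intercept
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w                 # step in the direction of steepest descent
        b -= lr * grad_b
    return w, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                    # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))          # converges toward w = 2, b = 1
```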
Logistic
Regression
Logistic regression is a
supervised learning
classification algorithm used to
predict the probability of a
target variable. The nature of
target or dependent variable is
dichotomous, which means
there would be only two
possible classes.
In simple words, the dependent
variable is binary in nature
having data coded as either 1
(stands for success/yes) or 0
(stands for failure/no).
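A minimal sketch of the idea on one feature, assuming a simple stochastic gradient update of the log-likelihood (the names and toy data below are illustrative):

```python
import math

# Logistic regression on a single feature, trained by stochastic gradient
# ascent on the log-likelihood.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)       # predicted probability of class 1
            w += lr * (y - p) * x        # gradient of the log-likelihood
            b += lr * (y - p)
    return w, b

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]                  # dichotomous target: 0 = no, 1 = yes
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 0 + b) < 0.5, sigmoid(w * 5 + b) > 0.5)   # → True True
```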
Types of Logistic Regression
BINARY OR BINOMIAL
The target variable has only two possible categories, e.g. 1/0 or yes/no.

MULTINOMIAL
The target variable has three or more possible categories with no natural ordering.

ORDINAL
The target variable has three or more categories with a natural ordering, such as low, medium, high.
Types of Naïve Bayes

GAUSSIAN
It is the simplest Naïve Bayes classifier, with the assumption that the data from each label is drawn from a simple Gaussian distribution.

MULTINOMIAL
The features are assumed to be drawn from a simple multinomial distribution.

BERNOULLI
Another important model is Bernoulli Naïve Bayes, in which features are assumed to be binary (0s and 1s).
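The Gaussian variant can be sketched on one feature: summarise each class by the mean and variance of its values, then predict the class with the highest log posterior. The names and toy data below are illustrative.

```python
import math

# Gaussian Naive Bayes on a single feature.
def gaussian_log_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gnb(values, labels):
    model = {}
    for c in set(labels):
        vals = [v for v, l in zip(values, labels) if l == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        model[c] = (mean, var, len(vals) / len(values))   # mean, var, prior
    return model

def predict_gnb(model, x):
    # Pick the class maximising log prior + log likelihood.
    return max(model, key=lambda c: math.log(model[c][2]) +
               gaussian_log_pdf(x, model[c][0], model[c][1]))

values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ['a', 'a', 'a', 'b', 'b', 'b']
model = fit_gnb(values, labels)
print(predict_gnb(model, 1.1), predict_gnb(model, 5.1))   # → a b
```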
Pros and Cons of Naïve Bayes
Algorithm
PROS
➢ Multi-class prediction
➢ Text classification
➢ Recommendation systems

CONS
➢ Assumes all features are independent, which rarely holds in real data
➢ A category unseen in the training data is assigned zero probability (the "zero-frequency" problem)
Support Vector
Machine
A set of supervised learning
methods which learn from
the dataset and can be used
for both regression and
classification
Working of SVM
• Support Vectors − Data points that are closest to the
hyperplane are called support vectors. The separating line
is defined with the help of these data points.
• Hyperplane − The decision plane or boundary that
divides a set of objects having different classes.
• Margin − The gap between the two lines on the closest
data points of different classes. It can be calculated as
the perpendicular distance from the line to the support
vectors. A large margin is considered a good margin and
a small margin is considered a bad margin.
Kernels
The kernel method is used by SVM to
perform non-linear classification.
Kernels take a low-dimensional input
space and convert it into a higher-
dimensional space, turning non-
separable classes into separable ones:
they find a way to separate the data on
the basis of the labels we define.
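The lifting idea can be shown with an explicit feature map rather than a full SVM solver (a hedged illustration; the data and threshold are made up): points labelled by whether |x| < 0.5 are not linearly separable in one dimension, but become separable after mapping x → (x, x²).

```python
# Kernel idea via an explicit feature map: lift 1-D points to 2-D, where a
# single hyperplane (a threshold on x^2) separates the classes.
xs = [-0.9, -0.6, -0.2, 0.1, 0.4, 0.8]
labels = [1 if abs(x) < 0.5 else 0 for x in xs]   # not separable by one cut in 1-D

phi = [(x, x * x) for x in xs]                     # lift to higher-dimensional space
preds = [1 if x2 < 0.25 else 0 for _, x2 in phi]   # hyperplane: x^2 = 0.25
print(preds == labels)                             # → True
```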
Pros and Cons associated with
SVM
PROS
➢ It works really well with a clear margin of separation.
➢ It is effective in high-dimensional spaces.
➢ It is effective in cases where the number of dimensions is greater than the number of samples.
➢ It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

CONS
➢ It doesn't perform well on large datasets, because the required training time is higher.
➢ It also doesn't perform very well when the dataset has more noise, i.e. the target classes are overlapping.
➢ SVM doesn't directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, included in the related SVC method of the Python scikit-learn library.
K-Nearest
Neighbours
Works by finding the
distances between a query
and all the examples in the
data, selecting the specified
number of examples (K)
closest to the query, then
voting for the most frequent
label (in the case of
classification) or averaging
the labels (in the case of
regression).
Working of KNN
1. Load the dataset
2. Choose a value of K
3. Compute the distance between the query and every example in the data
4. Select the K examples closest to the query
5. Take the most frequent label (classification) or the average of the labels (regression)

Pros and Cons of KNN

PROS
➢ It is very useful for nonlinear data, because this algorithm makes no assumption about the data.

CONS
➢ High memory storage is required compared to other supervised learning algorithms.
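The steps above can be sketched on one numeric feature (the function name and toy data are illustrative):

```python
from collections import Counter

# KNN classification: measure distances, keep the K closest training
# examples, and vote for the most frequent label.
def knn_predict(train, query, k=3):
    by_dist = sorted(train, key=lambda pl: abs(pl[0] - query))   # distances
    votes = Counter(label for _, label in by_dist[:k])           # K closest
    return votes.most_common(1)[0][0]                            # majority vote

train = [(1.0, 'a'), (1.2, 'a'), (0.9, 'a'),
         (4.0, 'b'), (4.3, 'b'), (3.8, 'b')]
print(knn_predict(train, 1.1), knn_predict(train, 4.1))   # → a b
```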
Types of Decision Tree

BINARY VARIABLE
A Decision Tree that has a binary target variable is called a Binary Variable Decision Tree.

CONTINUOUS VARIABLE
A Decision Tree that has a continuous target variable is called a Continuous Variable Decision Tree.
Advantages and
Disadvantages of Decision
Tree
ADVANTAGES
• Easy to understand
• Useful in data exploration
• Less data cleaning required

DISADVANTAGES
• Prone to overfitting
• Not fit for continuous variables
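The smallest possible decision tree, a one-level "stump" on a single numeric feature, can be sketched as follows; the names and toy data are illustrative assumptions, and real trees grow such splits recursively.

```python
# A one-level decision tree ("stump"): try every candidate threshold and
# keep the split with the fewest misclassifications.
def fit_stump(values, labels):
    best = None
    for t in sorted(set(values)):                 # candidate split points
        for left, right in ((0, 1), (1, 0)):      # label assignment per side
            preds = [left if v <= t else right for v in values]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]                               # (threshold, left, right)

def predict_stump(stump, v):
    t, left, right = stump
    return left if v <= t else right

values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]                       # binary target variable
stump = fit_stump(values, labels)
print(predict_stump(stump, 2), predict_stump(stump, 11))   # → 0 1
```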