Machine Learning in a Nutshell
TOPICS COVERED
Introduction
Building an ML Model
Types of ML Algorithms
ML Drawbacks
Neural Nets
MACHINE LEARNING
First, we split the entire dataset into training data and testing data.
Train the Model: (the model is the ML algorithm itself) We feed the training
data to the model so that it learns the features and their corresponding
class labels. (Features are the columns of the data; the class label is the
expected output.)
Loss Calculation: A loss function calculates the difference between the actual
output (y) and the output predicted by the model (ŷ). Many different
functions are available to calculate loss.
Retrain the Model after tuning parameters: After calculating the loss, we tune
the parameters so that the algorithm improves at predicting the output.
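The steps above can be sketched as a minimal training loop. The one-parameter linear model, the toy dataset, and the learning rate are illustrative assumptions, not part of the slides:

```python
# A minimal sketch of the split / train / loss / retrain loop, using a
# one-parameter linear model y_hat = w * x and squared-error loss.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]  # (x, y)

# Step 1: split the dataset into training and testing parts
train, test = data[:4], data[4:]

w = 0.0      # model parameter, tuned during training
lr = 0.01    # learning rate (assumed)

def loss(w, points):
    # Mean squared error between actual y and predicted y_hat = w * x
    return sum((y - w * x) ** 2 for x, y in points) / len(points)

# Steps 2-4: feed training data, compute the loss, tune w, and repeat
for epoch in range(200):
    grad = sum(-2 * x * (y - w * x) for x, y in train) / len(train)
    w -= lr * grad

print(round(w, 2))              # learned slope, close to 2
print(round(loss(w, test), 2))  # loss on unseen test data
```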
TYPES OF ML ALGORITHMS
1. Supervised Learning
Classification Methods
Regression
2. Unsupervised Learning
Clustering
Dimensionality Reduction
3. Reinforcement Learning
SUPERVISED LEARNING
Models that are trained with the “expected output” are called supervised
learning techniques (they train on labelled data).
Typically, the expected output is the last column in the dataset. It is known
by different names: target variable, class label.
K NEAREST NEIGHBORS
Algorithm:
Select a number “K”; the algorithm considers K neighbors around a data point
to classify it.
When a new data point is introduced, KNN calculates the distance between that
point and all other existing points.
It then selects the K data points with the smallest distances and considers
them the nearest neighbors.
The new instance is classified to whichever class appears most often among
those K data points.
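The steps above can be sketched in a few lines. The toy points and the choice K = 3 are illustrative assumptions:

```python
# A minimal KNN sketch: Euclidean distance to every stored point,
# pick the K closest, and take a majority vote among their labels.
from collections import Counter
import math

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
         ((3.0, 4.0), "B"), ((3.5, 4.5), "B"), ((5.0, 7.0), "B")]

def knn_classify(point, train, k=3):
    # 1. distance from the new point to all existing points
    dists = [(math.dist(point, p), label) for p, label in train]
    # 2. the K points with the smallest distance are the nearest neighbors
    nearest = sorted(dists)[:k]
    # 3. the majority class among the K neighbors wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1.2, 1.4), train))  # -> A
```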
DISTANCE METRICS
1. Euclidean Distance
2. Manhattan Distance
3. Cosine Distance
4. Squared Euclidean Distance
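The four metrics listed above can be written in plain Python for two equal-length vectors:

```python
# The four distance metrics from the list, for vectors a and b.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine similarity of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a, b = (0.0, 3.0), (4.0, 0.0)
print(euclidean(a, b))          # 5.0
print(manhattan(a, b))          # 7.0
print(squared_euclidean(a, b))  # 25.0
```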
NAÏVE BAYES
Naïve Bayes is a family of classification algorithms based on Bayes’ theorem.
It classifies based on probability.
The key assumption in all of these algorithms is that every single feature
contributes equally to the probability of the class, without being influenced
by the other features.
The model predicts the probability that an instance belongs to a class, given
a set of feature values.
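The independence assumption above lets the class probability be computed as the class prior times a product of per-feature conditional probabilities. A minimal sketch, using a small made-up categorical dataset:

```python
# Naïve Bayes from counts: P(class) * product of P(feature_i | class),
# assuming features are independent given the class. Data is illustrative.

# (weather, temperature) -> play?
rows = [("sunny", "hot", "no"), ("sunny", "mild", "yes"),
        ("rainy", "mild", "no"), ("overcast", "hot", "yes"),
        ("overcast", "mild", "yes"), ("rainy", "hot", "no")]

def predict(features):
    best, best_p = None, -1.0
    for label in {r[-1] for r in rows}:
        subset = [r for r in rows if r[-1] == label]
        p = len(subset) / len(rows)          # prior P(class)
        for i, value in enumerate(features): # naïve product of P(feature|class)
            p *= sum(1 for r in subset if r[i] == value) / len(subset)
        if p > best_p:
            best, best_p = label, p
    return best

print(predict(("sunny", "mild")))  # -> yes
```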
SVM
• SVM’s classify using a “Hyper plane”.
• They find a hyper-plane , which separates data points into different classes.
• If there are 3 input features then a 3D hyper plane is formed, if there are 2
features then a 2D plane is formed in the space.
• We only need one plane, so the goal is to find a best plane among them.
• The best hyperplane has the maximum distance between the plane and the
closest data points (support vectors) on each side.
• Separation margin / marginal distance = the distance between the plane and
the support vectors on each side.
• So, you choose the hyperplane that maximizes the separation margin; this is
called the “maximum margin hyperplane” or “hard margin”.
• In the picture, one data point (blue ball) is outside its class (an
outlier), so we cannot draw a clean hyperplane.
• In such cases the SVM ignores the outliers and finds the best hyperplane as
discussed above.
• In addition, a penalty is added for every data point that falls inside the
margin or is misclassified.
• A slack variable (ξi) is introduced into the objective function for every
data point, and the penalty is added through it.
• E.g.: properly classified, (penalty) ξ1 = 0
falls inside the margin, ξ2 = 0.5
misclassified, ξ3 = 1
• Non-linear SVM: In most cases the data is not linearly separable, so the
non-linear SVM transforms the data into a higher-dimensional space using the
kernel trick.
The kernel trick allows the data to be transformed into a higher dimension
without explicitly calculating the coordinates of that higher dimension, with
the help of “kernel functions”.
Some common kernel functions include linear, polynomial, radial basis function
(RBF), and sigmoid.
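The kernel-trick claim above can be checked numerically: the polynomial kernel (a·b + 1)² gives the same value as an ordinary dot product in a 6-dimensional feature space, without ever constructing that space. The 2-D vectors are illustrative:

```python
# Kernel trick check: (a . b + 1)^2 equals a dot product in a
# higher-dimensional feature space, computed without building it.
import math

def poly_kernel(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    return (dot + 1) ** 2

def explicit_map(v):
    # The 6-D feature map that (a . b + 1)^2 implicitly corresponds to
    x, y = v
    return (x * x, y * y, math.sqrt(2) * x * y,
            math.sqrt(2) * x, math.sqrt(2) * y, 1.0)

a, b = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(a, b)                                        # cheap: 2-D math
rhs = sum(p * q for p, q in zip(explicit_map(a), explicit_map(b)))  # explicit 6-D
print(lhs, rhs)  # both approximately 25.0
```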
DECISION TREES
RANDOM FORESTS
• The random forests algorithm is based on the technique of “ensemble
learning”.
• A random forest is a classifier that contains more than one decision tree;
the trees feed on various subsets of the dataset, and the forest takes a
majority vote to improve the accuracy of the prediction.
• Because the trees feed on different subsets of the data, random forests
prevent the problem of overfitting and improve accuracy.
• Many techniques are available to split the data into different subsets to
feed the model.
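The bootstrap-plus-majority-vote idea can be sketched with a deliberately simple base learner. Here each “tree” is just a one-feature threshold stump (a real random forest grows full decision trees); the 1-D data and the 25-member ensemble are illustrative assumptions:

```python
# Ensemble sketch: train each stump on a bootstrap sample of the data,
# then classify new points by majority vote across all stumps.
import random
from collections import Counter

random.seed(0)
data = [(0.5, "A"), (1.0, "A"), (1.4, "A"),
        (2.6, "B"), (3.1, "B"), (3.5, "B")]   # (feature, label), made up

def train_stump(sample):
    # Pick the threshold that best splits this bootstrap sample
    best_t, best_err = None, len(sample) + 1
    for t in [x for x, _ in sample]:
        err = sum(1 for x, y in sample if (y == "B") != (x > t))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def forest_predict(x, stumps):
    votes = Counter("B" if x > t else "A" for t in stumps)
    return votes.most_common(1)[0][0]

# Each stump feeds on a different bootstrap subset of the data
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]
print(forest_predict(0.4, stumps), forest_predict(3.6, stumps))  # -> A B
```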
REGRESSION TASKS
A model that predicts a continuous variable from a given data point is said to
perform a regression task.
These models determine the relation between one or more independent variables
and one dependent variable.
The dependent variable is the expected output, i.e. the class label, and the
independent variables are the features of the dataset.
The model tries to find the mapping function that maps input to output:
y = b1x + b0
• Since most real-world data has more than one feature:
y = b0 + b1x1 + b2x2 + b3x3 + ……….
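For the single-feature case y = b1x + b0, the two coefficients have closed-form least-squares solutions. A minimal sketch on made-up data:

```python
# Simple linear regression: fit b1 and b0 with the standard
# least-squares formulas (no iteration needed for one feature).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]   # roughly y = 2x + 1 (made up)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# b1 = covariance(x, y) / variance(x); b0 = mean_y - b1 * mean_x
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

print(round(b1, 2), round(b0, 2))  # slope close to 2, intercept close to 1
```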
LOGISTIC REGRESSION
UNSUPERVISED LEARNING
These models train on a dataset without the class label
(unlabelled data).
CLUSTERING
The process of grouping similar data points together is called clustering.
K Means Clustering
K MEANS CLUSTERING
The clustering algorithm groups all data points into “K” clusters, where “K”
is defined by the programmer.
Algorithm:
Choose K initial centroids.
Calculate the distance between each data point and all centroids, and assign
each point to its nearest centroid.
Recompute each centroid as the mean of the points assigned to it, and repeat
until the centroids stop changing.
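The assign/update loop can be sketched in a few lines. K = 2 and the 1-D points are illustrative assumptions:

```python
# K-means sketch: assign every point to its nearest centroid,
# recompute centroids as cluster means, repeat until stable.

points = [1.0, 1.5, 0.5, 8.0, 8.5, 7.5]
centroids = [0.0, 10.0]   # initial guesses for K = 2 centroids

for _ in range(10):
    # Assignment step: each point joins the cluster of its closest centroid
    clusters = [[], []]
    for p in points:
        j = min(range(2), key=lambda j: abs(p - centroids[j]))
        clusters[j].append(p)
    # Update step: move each centroid to the mean of its cluster
    centroids = [sum(c) / len(c) if c else centroids[j]
                 for j, c in enumerate(clusters)]

print(centroids)  # -> [1.0, 8.0]
```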
DIMENSIONALITY REDUCTION
• Datasets with many features are hard to visualize and model, so we try to
obtain the most important features from the dataset; this is known as
“dimensionality reduction”.
• In PCA, the principal components are ordered by the variance they capture:
PC1 > PC2 > PC3 > …………… > PCn-1 > PCn
• The directions picked by PCA (for the principal components) are the
eigenvectors of the covariance matrix, i.e. PC1 = eigenvector 1,
PC2 = eigenvector 2, and so on.
• So, PCA converts the correlations of all features into a lower-dimensional
space.
• This results in highly correlated data points clustering together.
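The eigenvector claim above can be sketched for 2-D data: build the covariance matrix and find its dominant eigenvector (the direction of PC1) by power iteration. The correlated toy data is an illustrative assumption:

```python
# PCA sketch: covariance matrix of centered 2-D data, then power
# iteration to find the eigenvector with the largest eigenvalue (PC1).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0]   # strongly correlated with xs (made up)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Entries of the 2x2 covariance matrix of the centered data
cxx = sum((x - mx) ** 2 for x in xs) / n
cyy = sum((y - my) ** 2 for y in ys) / n
cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Power iteration: repeatedly apply the matrix and normalize; the vector
# converges to the dominant eigenvector, i.e. the direction of PC1
v = (1.0, 0.0)
for _ in range(50):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(*w)
    v = (w[0] / norm, w[1] / norm)

print(v)  # PC1 points along the diagonal, roughly (0.71, 0.71)
```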
REINFORCEMENT LEARNING
• Reinforcement learning is a technique based on the trial-and-error method.
• An agent finds its moves through trial and error, guided by rewards and
penalties.
• It completes the task in such a way that it attains the maximum reward over
the entire process of reaching the goal.
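The reward-driven trial and error described above can be sketched with tabular Q-learning (one standard RL algorithm, named here as a stand-in since the slides do not specify one). The 5-cell world, rewards, and hyperparameters are all illustrative:

```python
# Q-learning sketch: an agent on a 1-D world of 5 cells learns by trial
# and error that moving right reaches the +1 reward in the last cell.
import random

random.seed(1)
N_STATES, ACTIONS = 5, (-1, +1)          # actions: move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2        # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # Trial and error: mostly exploit the best known move, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else -0.01   # reward, small step penalty
        # Update: nudge the estimate toward reward + discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right (+1) from every state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # -> [1, 1, 1, 1]
```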
DRAWBACKS OF ML
NEURAL NETS
A neuron computes output = f(w·x + b), where:
b = bias term
w = weights
x = input
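A single neuron with the b, w, x of the legend above can be sketched as follows; the sigmoid activation and all the numbers are illustrative assumptions:

```python
# One neuron: weighted sum of inputs plus bias, passed through an activation.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x = [0.5, -1.0, 2.0]   # inputs
w = [0.8, 0.2, -0.4]   # one weight per input
b = 0.1                # bias term

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
y = sigmoid(z)         # activation squashes z into (0, 1)
print(round(y, 3))     # -> 0.378
```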
1. Sigmoid function
2. Tanh function
3. ReLU function
4. Leaky ReLU
5. Softmax
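The five activation functions listed above, sketched in plain Python:

```python
# The five listed activation functions (tanh comes straight from math.tanh).
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # Unlike ReLU, negative inputs keep a small slope instead of becoming 0
    return z if z > 0 else slope * z

def softmax(zs):
    # Turns a vector of scores into probabilities that sum to 1
    exps = [math.exp(z - max(zs)) for z in zs]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), leaky_relu(-2.0))   # 0.0 and -0.02
print(sum(softmax([1.0, 2.0, 3.0])))  # ~1.0
```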
Praneetha.G
[email protected]