
MACHINE LEARNING
IN A NUTSHELL

TOPICS COVERED

 Introduction
 Building an ML Model
 Types of ML Algorithms
 ML Drawbacks
 Neural Nets
MACHINE LEARNING

• Machine Learning is the study of making machines learn by training algorithms on data, without explicit programming.
• ML is a subset of Artificial Intelligence.
BUILDING AN ML MODEL

 Split the Data : First, we split the entire dataset into training and testing data.
 Train the Model : (The model is the ML algorithm itself.) We feed training data to the model so that it learns the features and their respective class labels. (Features are the columns of the data; the class label is the expected output.)
 Loss Calculation : A loss function calculates the difference between the actual output (y) and the output predicted by the model (y hat). Many different functions are available to calculate loss.
 Retrain the Model after Tuning Parameters : After calculating the loss, we tune the parameters so that the algorithm improves at predicting the output.

 Validate the Model : We hold back part of the training data to validate the model before "testing" it. Over-fitting or under-fitting during training can be identified at this step, which also helps attain the best version of the model. (If performance stops improving on the validation set, we can stop training to prevent overfitting.)
 Test the Model : The test data is used to check the model's performance on unseen data (in supervised learning we don't provide the actual output to the model at this stage).
 Performance Metrics : The results obtained on the test data are used to calculate the performance of the model through the different "performance metrics" available in machine learning.
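As a concrete illustration, here is a minimal sketch of this pipeline in scikit-learn. The Iris dataset, the decision-tree model, and accuracy as the metric are illustrative assumptions, not choices prescribed above:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)            # features and class labels

    # split into train / validation / test portions
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    model = DecisionTreeClassifier()             # the "model" is the ML algorithm itself
    model.fit(X_train, y_train)                  # train on labelled data

    print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))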

TYPES OF ML ALGORITHMS

1. Supervised Learning
   • Classification Methods
   • Regression Methods
2. Unsupervised Learning
   • Clustering
   • Dimensionality Reduction
3. Reinforcement Learning
SUPERVISED LEARNING

 Models which are trained with the "expected output" are called supervised learning techniques. (They train on labelled data.)

 Typically, the expected output is the last column in the dataset. It is known by different names: target variable, class label.

Supervised learning techniques are further classified into:

• Classification
• Regression tasks
CLASSIFICATION TASKS

 A model which assigns a data point (essentially a row in the dataset) to one of the class labels available in the dataset is said to perform classification.

 Some common classification algorithms are:

1. KNN – K Nearest Neighbours
2. SVM – Support Vector Machine
3. Naïve Bayes
4. Decision Trees
5. Random Forests

K NEAREST NEIGHBORS

• The KNN algorithm classifies data points based on distance metrics.

Algorithm:
 Select a number "K"; the algorithm considers K neighbours around a data point to classify it.
 When a new data point is introduced, KNN calculates the distance between that point and all existing points.
 It then selects the K data points with the smallest distances and considers them the nearest neighbours.
 The new instance is assigned to whichever class appears most often among those K points.
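A minimal from-scratch sketch of this algorithm, assuming NumPy arrays for the data (the function name knn_predict is ours, not a library call):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        # distance from the new point to every existing point (Euclidean)
        dists = np.linalg.norm(X_train - x_new, axis=1)
        # indices of the k points with the smallest distances
        nearest = np.argsort(dists)[:k]
        # majority vote among the neighbours' class labels
        return Counter(y_train[nearest]).most_common(1)[0][0]

For example, knn_predict(np.array([[0, 0], [0, 1], [5, 5]]), np.array([0, 0, 1]), np.array([0.2, 0.5]), k=3) returns 0, since two of the three nearest neighbours belong to class 0.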

DISTANCE METRICS

Some popular distance metrics are:

1. Euclidean Distance
2. Manhattan Distance
3. Cosine Distance
4. Squared Euclidean
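These metrics can be written directly in NumPy; a small sketch for feature vectors a and b:

    import numpy as np

    def euclidean(a, b):    return np.sqrt(np.sum((a - b) ** 2))
    def manhattan(a, b):    return np.sum(np.abs(a - b))
    def cosine_dist(a, b):  return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    def sq_euclidean(a, b): return np.sum((a - b) ** 2)

Squared Euclidean skips the square root, which is cheaper to compute and preserves the ordering of neighbours.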

NAÏVE BAYES
 Naïve Bayes is a family of classification algorithms based on Bayes' theorem. It classifies based on probability.
 The key assumption in all these algorithms is that "every single feature contributes equally to the probability of the class, without being influenced by the other features."
 The model predicts the probability that an instance belongs to a class, given a set of feature values.
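Underlying this is Bayes' theorem: P(class | features) ∝ P(class) · Π P(feature_i | class). A minimal sketch with scikit-learn's GaussianNB, which additionally assumes each feature is Gaussian within a class (the toy data is illustrative):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.array([[1.0, 2.0], [1.2, 1.9], [7.0, 8.0], [6.8, 8.2]])  # toy features
    y = np.array([0, 0, 1, 1])                                      # class labels

    model = GaussianNB()
    model.fit(X, y)                 # learns P(class) and per-feature P(x_i | class)
    print(model.predict_proba(np.array([[1.1, 2.1]])))  # probability of each class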
SVM
• SVMs classify using a "hyperplane".

• They find a hyperplane which separates the data points into different classes.

• The hyperplane lives in an N-dimensional space, where N is the number of features.

• With 3 input features the separating hyperplane is a 2D plane in 3D space; with 2 features it is a line in the plane.

• In the picture there are 2 input features, and the data points are separated by multiple candidate hyperplanes (green lines).

• We only need one plane, so the goal is to find the best plane among them.

• The best hyperplane has the maximum distance between the plane and the closest data points (support vectors) on each side.
• Separation margin / marginal distance = the distance between the plane and the support vectors on each side.
• So, we choose the hyperplane which maximizes the separation margin; this is called the "maximum margin hyperplane" or "hard margin".

• So, from the figure we choose the L2 hyperplane.



• In the picture, one data point (a blue ball) lies outside its class (an outlier), so we cannot draw a clean separating hyperplane.
• In such cases the SVM ignores the outliers and finds the best hyperplane as discussed above.
• In addition, for every data point which falls inside the margin or is misclassified, a penalty is added.
• A slack variable (ξi) is introduced into the objective function for every data point, and the penalty is added through it.
• E.g.: properly classified (no penalty), ξ1 = 0
  falls inside the margin, ξ2 = 0.5
  misclassified, ξ3 > 1

• Non-Linear SVM : In most cases the data is not linearly separable, so the non-linear SVM transforms the data into a higher-dimensional space using the kernel trick.

The kernel trick allows transforming data into a higher dimension without explicitly calculating the coordinates in that higher dimension, with the help of "kernel functions".
Some common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid.
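A minimal sketch of a non-linear SVM in scikit-learn; make_moons generates a toy dataset that is not linearly separable, so the RBF kernel is a natural (illustrative) choice:

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(noise=0.1, random_state=0)   # two interleaving half-circles
    clf = SVC(kernel="rbf", C=1.0)                 # C scales the slack-variable penalty
    clf.fit(X, y)
    print(clf.score(X, y))                         # training accuracy

Swapping kernel="rbf" for "linear", "poly", or "sigmoid" selects the other common kernel functions.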
DECISION TREES

• A decision tree is a tree-like structure which classifies a data point through conditional branching.
• A decision tree consists of a root node, internal nodes, branches, and leaf nodes.
• It selects the best attribute according to metrics such as Gini index, entropy, or information gain.
• Each time it selects the best attribute, it splits the tree based on the values that feature holds.
• Finally, the tree ends at the leaf nodes, where the class labels are present.

METRICS USED IN DECISION TREES


Information Gain
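The standard definitions, where p_i is the proportion of class i in set S and S_v is the subset of S in which attribute A takes value v:

    Entropy(S) = - Σ_i p_i · log2(p_i)
    Gini(S) = 1 - Σ_i p_i²
    Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v)

The attribute with the highest gain (or lowest Gini) is chosen for the split.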

RANDOM FORESTS
• The random forests algorithm is based on the technique of "ensemble learning".

• Ensemble learning suggests combining multiple classifiers to increase the accuracy of the model and to solve a complex problem.

• A random forest is a classifier that contains more than one decision tree; the trees feed on various subsets of the dataset, and the forest takes a majority vote to improve the accuracy of the prediction.

• This prevents the problem of overfitting and improves accuracy, as the trees feed on different subsets of the data.

• Many techniques are available to split the data into different subsets to feed the model.
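A minimal scikit-learn sketch (the dataset and tree count are illustrative assumptions):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    # each of the 100 trees trains on a bootstrap sample of the rows and
    # considers a random subset of features at each split; the forest
    # predicts by majority vote over the trees
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:3]))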
REGRESSION TASKS

 A model which predicts a continuous variable from a given data point is said to perform a regression task.

 These models determine the relationship between one or more independent variables and one dependent variable.

 The dependent variable is the expected output, i.e. the class label, and the independent variables are the features of the dataset.

 The model tries to find the mapping function which maps input to output:

ŷ = F(x), where x = features of the dataset and ŷ = predicted output.

 Some common regression techniques are:

1. Simple Linear Regression
2. Polynomial Regression
3. Logistic Regression

SIMPLE LINEAR REGRESSION

• Simple linear regression maps a single independent variable (a feature) to a dependent variable (the class label):

y = b1·x + b0

• where b1 = the slope of the line and b0 = the intercept of the line equation.
• The algorithm should find the best fit (the best line equation) that models our data.
• The best fit decreases the residuals as much as possible.
• So, linear regression finds a linear relationship between a single independent variable and the dependent variable.
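The closed-form least-squares estimates can be computed directly; a small sketch with illustrative toy values:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # single independent variable
    y = np.array([2.1, 4.2, 6.1, 7.9, 10.2])   # dependent variable

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
    b0 = y.mean() - b1 * x.mean()               # intercept
    residuals = y - (b1 * x + b0)
    print(b1, b0, np.sum(residuals ** 2))       # the best fit minimizes this sum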
MULTIPLE LINEAR REGRESSION

• Most of the data available in the real world has more than one feature.

• Multiple linear regression has "multiple" independent variables and a single dependent variable:

y = b0 + b1x1 + b2x2 + b3x3 + …
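The same fit extends to several features; a minimal scikit-learn sketch with illustrative toy values:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])  # two independent variables
    y = np.array([5.0, 4.1, 11.2, 10.1])            # dependent variable

    reg = LinearRegression().fit(X, y)
    print(reg.intercept_, reg.coef_)                # b0 and [b1, b2]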
LOGISTIC REGRESSION

This regression technique is mostly used for classification tasks.
The algorithm uses a sigmoid function, whose output lies between 0 and 1.
It takes the features as inputs and maps them through the sigmoid function.
If the sigmoid output falls below 0.5 the point is assigned to one class, otherwise to the other (conventionally, outputs ≥ 0.5 map to the positive class).
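A small sketch of this decision rule (the weights and bias here are illustrative, standing in for values learned during training):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes any real number into (0, 1)

    w = np.array([0.8, -0.4])             # illustrative learned weights
    b = 0.1                               # illustrative learned bias
    x = np.array([2.0, 1.0])              # one data point's features

    p = sigmoid(np.dot(w, x) + b)         # probability of the positive class
    label = 1 if p >= 0.5 else 0          # the usual 0.5 decision threshold
    print(p, label)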

UNSUPERVISED LEARNING
 These models train on datasets without a class label (unlabelled data).

 They are supposed to find unknown patterns in the data.

 They are classified into:

1. Clustering
2. Dimensionality Reduction

CLUSTERING

 The process of grouping similar data points together is called clustering.
 Example: K Means Clustering
K MEANS CLUSTERING

The clustering algorithm groups all data points into "K" clusters, where "K" is defined by the programmer.

Algorithm (a sketch follows below):

 Randomly choose "K" points and call them "centroids".

 Calculate the distance from every data point to all the centroids.

 Assign each data point to its closest centroid.

 Compute "K" new centroids (the mean of the points in each cluster).

 Repeat the process until the results stay the same.

 Hard Clustering : the clusters don't overlap.

 Soft Clustering : clusters may overlap.
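A minimal sketch with scikit-learn (the toy points and K=2 are illustrative):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 1], [1.5, 2], [0.5, 1.2], [8, 8], [8, 8.5], [9, 8]])
    km = KMeans(n_clusters=2, n_init=10, random_state=0)  # the programmer picks K
    km.fit(X)
    print(km.labels_)            # the cluster each point was assigned to
    print(km.cluster_centers_)   # the final centroids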



DIMENSIONALITY REDUCTION

• In ML every feature, i.e. every column, is considered a dimension.

• Real-world data may have hundreds of features in the dataset, which is very hard to handle.

• So, we try to obtain the important features from the dataset; this is known as "dimensionality reduction".

• Algorithm for dimensionality reduction : PCA (Principal Component Analysis)

PRINCIPAL COMPONENT ANALYSIS

• A technique which addresses the "CURSE OF DIMENSIONALITY".

• The "curse of dimensionality" refers to the difficulties that arise for algorithms in high-dimensional spaces.
• To work with high-dimensional data we might try to consider only a few features, but this risks losing important information provided by the other features.
• So, PCA considers all the features and combines them to form new features, called principal components, and then maps the data onto them (without losing important information).
• The principal components are ranked from most important to least important:

PC1 > PC2 > PC3 > …… > PCn-1 > PCn

• The directions PCA picks (for the principal components) are the eigenvectors of the covariance matrix, i.e. PC1 = eigenvector 1, PC2 = eigenvector 2.
• So, PCA converts the correlations of all the features into a lower-dimensional space.
• As a result, data points which are highly correlated cluster together.
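A minimal scikit-learn sketch, reducing the 4 Iris features to the 2 top-ranked components (the dataset and component count are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)      # 4 original features
    pca = PCA(n_components=2)              # keep PC1 and PC2
    X_reduced = pca.fit_transform(X)       # projects data onto the top eigenvectors
    print(pca.explained_variance_ratio_)   # PC1 explains the most variance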

REINFORCEMENT LEARNING
• Reinforcement learning is a technique based on trial and error.

• An agent works out its moves through trial and error, guided by rewards and penalties.

• When the agent (e.g. an AI robot) is placed in an environment to attain a goal, it performs a move and then receives a reward or a penalty based on that move.

• It completes the task in such a way that it attains the maximum reward over the entire process of reaching the goal.
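One common way to implement this reward-driven trial and error is Q-learning; a minimal sketch of its update rule (the state/action counts, learning rate, and discount factor are illustrative assumptions, and Q-learning is one technique among several):

    import numpy as np

    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))   # expected future reward per (state, action)
    alpha, gamma = 0.1, 0.9               # learning rate and discount factor

    def q_update(s, a, reward, s_next):
        # nudge Q(s, a) toward the reward plus the best value of the next state
        Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])

    q_update(0, 1, reward=1.0, s_next=2)  # the agent tried action 1 in state 0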

DRAWBACKS OF ML

1. Classical ML models scale poorly to huge datasets.

2. They struggle to capture highly non-linear patterns in data.

3. The features have to be picked manually (feature engineering) before the data is given to the model.

4. Unstructured data (images, audio, free text) cannot be handled well by traditional ML.



NEURAL NETS

• Neural networks are inspired by the human brain.
• NN models overcome the drawbacks that ML models hold.
• NN models are based on an architecture, unlike classical ML models.
• A single neuron in a neural net consists of inputs, an output, weights, and a bias term.
• The input neurons accept the features of a data point, and every input has its own weight:

Z = Σ wi·xi + b

where Z = the weighted sum of the inputs
      b = bias term
      w = weights
      x = inputs

• Z (the weighted sum of the inputs) is given to the activation function, and the activation function returns the neuron's output.

• SOME COMMON ACTIVATION FUNCTIONS :

1. Sigmoid function
2. Tanh function
3. ReLU function
4. Leaky ReLU
5. Softmax
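A minimal sketch of a single neuron using the formula above, with ReLU as the (illustrative) activation choice:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)        # returns z if positive, else 0

    x = np.array([0.5, -1.0, 2.0])       # one data point's features
    w = np.array([0.2, 0.4, -0.1])       # one weight per input
    b = 0.3                              # bias term

    z = np.dot(w, x) + b                 # Z = Σ wi·xi + b
    print(relu(z))                       # the neuron's output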

SIMPLE ARCHITECTURE OF A NEURAL NET


• A simple neural net consists of 3 important layers :
1. Input layer
2. Hidden layer
3. Output layer
• The input layer accepts the features; each neuron in the next layer takes the weighted sum of its inputs, passes it through the activation function to produce a single output, and passes that output on to all the neurons in the subsequent layer, and this process continues until the output layer.
• The hidden layers are the layers between the input and output layers.
• A NN always flows from left to right, and as we go from left to right the model gets deeper into the intricacies of the features, i.e. it becomes more complex.
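A minimal NumPy sketch of one such left-to-right (forward) pass; the layer sizes and random weights are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = np.array([0.5, -1.0, 2.0])                  # input layer: 3 features
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

    h = sigmoid(W1 @ x + b1)             # each hidden neuron: weighted sum + activation
    y_hat = sigmoid(W2 @ h + b2)         # the output layer receives the hidden outputs
    print(y_hat)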
 SOME IMPORTANT ARCHITECTURES OF NNs are:
1. CNNs (Convolutional Neural Networks)
2. RNNs (Recurrent Neural Networks)
3. Transformers
4. Auto-Encoders
5. GANs (Generative Adversarial Networks)
THANK YOU

Praneetha.G
[email protected]
