
UNIT – I

Machine Learning Basics

Learning Algorithms, Capacity, Overfitting and Underfitting, Hyperparameters and Validation Sets,
Estimators, Bias and Variance, Maximum Likelihood Estimation, Bayesian Statistics.
Supervised Learning Algorithms, Unsupervised Learning Algorithms, Stochastic Gradient
Descent, Building a Machine Learning Algorithm, Challenges Motivating Deep Learning.

Machine Learning Basics: Learning Algorithms


What is Machine Learning?
Machine Learning is defined as a technology that is used to train
machines to perform various actions such as predictions,
recommendations, estimations, etc., based on historical data or past
experience.

Machine Learning enables computers to behave like human beings by
training them with the help of past experience and data.

There are three key aspects of Machine Learning, which are as follows:

o Task: A task is defined as the main problem in which we are
interested. This task/problem can be related to predictions,
recommendations, estimations, etc.
o Experience: It is defined as learning from historical or past data,
which is used to estimate and resolve future tasks.
o Performance: It is defined as the capacity of any machine to
resolve a machine learning task or problem and provide the best
outcome for the same. However, performance depends on the
type of machine learning problem.

Types of Machine Learning Algorithms

Machine Learning algorithms can be broadly classified into three types:

1. Supervised Learning Algorithms
2. Unsupervised Learning Algorithms
3. Reinforcement Learning Algorithms
1) Supervised Learning Algorithm
Supervised learning is a type of machine learning in which the machine needs external
supervision to learn. Supervised learning models are trained using a labelled dataset.
Once the training and processing are done, the model is tested by providing sample test data
to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning
is based on supervision, just as a student learns things under the teacher's
supervision. An example of supervised learning is spam filtering.

Supervised learning can be divided further into two categories of problem:

o Classification
o Regression

Examples of some popular supervised learning algorithms are Simple Linear Regression,
Decision Tree, Logistic Regression, the KNN algorithm, etc.

2) Unsupervised Learning Algorithm


It is a type of machine learning in which the machine does not need any external supervision
to learn from the data, hence it is called unsupervised learning. Unsupervised models are
trained using an unlabelled dataset that is neither classified nor categorized, and the algorithm
needs to act on that data without any supervision. In unsupervised learning, the model doesn't
have a predefined output, and it tries to find useful insights from large amounts of data.
These algorithms are used to solve Association and Clustering problems. Hence,
unsupervised learning can further be classified into two types:

o Clustering
o Association

Examples of some unsupervised learning algorithms are K-means Clustering,
the Apriori algorithm, Eclat, etc.

3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by producing actions, and
it learns with the help of feedback. The feedback is given to the agent in the form of rewards:
for each good action, it receives a positive reward, and for each bad action, it receives a
negative reward. There is no supervision provided to the agent. The Q-Learning
algorithm is commonly used in reinforcement learning.

List of Popular Machine Learning Algorithms


1. Linear Regression Algorithm
2. Logistic Regression Algorithm
3. Decision Tree
4. SVM
5. Naïve Bayes
6. KNN
7. K-Means Clustering
8. Random Forest
9. Apriori
10. PCA

1. Linear Regression
Linear regression is one of the most popular and simple machine learning
algorithms used for predictive analysis. Here, predictive
analysis means predicting something, and linear regression makes
predictions for continuous numbers such as salary, age, etc.

It shows the linear relationship between the dependent and independent
variables, and shows how the dependent variable (y) changes according to
the independent variable (x).
It tries to fit a best line between the dependent and independent
variables, and this best fit line is known as the regression line.

The equation for the regression line is:

y = a0 + a1*x

Here, y = dependent variable

x = independent variable

a0 = intercept of the line, a1 = linear regression coefficient (slope)

Linear regression is further divided into two types:

o Simple Linear Regression: In simple linear regression, a single
independent variable is used to predict the value of the dependent
variable.
o Multiple Linear Regression: In multiple linear regression, more
than one independent variable is used to predict the value of the
dependent variable.
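
To make this concrete, here is a minimal, hedged sketch of simple linear regression with scikit-learn; the tiny experience/salary dataset is invented purely for illustration.

```python
# Minimal sketch: simple linear regression with scikit-learn.
# The experience/salary numbers below are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # independent variable x (years of experience)
y = np.array([30, 35, 42, 48, 55])        # dependent variable y (salary, in thousands)

model = LinearRegression().fit(X, y)
print("intercept a0:", model.intercept_)  # a0 in y = a0 + a1*x
print("slope a1:", model.coef_[0])        # a1 in y = a0 + a1*x
print("prediction for x=6:", model.predict([[6]])[0])
```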

2. Logistic Regression
Logistic regression is a supervised learning algorithm used
to predict categorical variables or discrete values. It can be
used for classification problems in machine learning, and the output of
the logistic regression algorithm can be Yes or No, 0 or 1, Red or
Blue, etc.

Logistic regression is similar to linear regression except in how the two are
used: linear regression is used to solve regression problems
and predict continuous values, whereas logistic regression is used to
solve classification problems and predict discrete values.
Instead of fitting a best fit line, it forms an S-shaped curve that lies
between 0 and 1. The S-shaped curve is also known as the logistic function,
which uses the concept of a threshold. Any value above the threshold will
tend to 1, and any value below the threshold will tend to 0.
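
The S-shaped curve and threshold described above can be sketched in a few lines; the raw scores below are arbitrary illustrative values, not output from any particular model.

```python
# Minimal sketch of the logistic (sigmoid) function with a 0.5 threshold.
import numpy as np

def sigmoid(z):
    """S-shaped logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])  # arbitrary raw scores (z = w*x + b)
probs = sigmoid(scores)                          # values between 0 and 1
labels = (probs >= 0.5).astype(int)              # above the threshold -> 1, below -> 0
print(probs, labels)
```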

3. Decision Tree Algorithm

A decision tree is a supervised learning algorithm that is mainly used to
solve classification problems but can also be used for solving
regression problems. It can work with both categorical and
continuous variables. It has a tree-like structure of nodes
and branches, starting with the root node, which expands into further
branches until the leaf nodes are reached. Internal nodes represent
the features of the dataset, branches represent the decision
rules, and leaf nodes represent the outcome of the problem.

Some real-world applications of decision tree algorithms are distinguishing
between cancerous and non-cancerous cells, suggesting which car a customer
should buy, etc.
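
A minimal sketch with scikit-learn's DecisionTreeClassifier follows; the [age, owns_house] features and labels are hypothetical, chosen only to show nodes, branches, and leaves.

```python
# Minimal sketch: a decision tree on invented [age, owns_house] data.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 0], [40, 1], [35, 1], [22, 0], [50, 1]]  # hypothetical features
y = [0, 1, 1, 0, 1]                                # hypothetical labels: buys a car or not

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "owns_house"]))  # branches = decision rules
print(tree.predict([[30, 1]]))                     # a leaf node gives the outcome
```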

5. Naïve Bayes Algorithm

The Naïve Bayes classifier is a supervised learning algorithm used to
make predictions based on the probability of an object. The algorithm
is named Naïve Bayes because it is based on Bayes' theorem and follows
the naïve assumption that the variables are independent of each other.

Bayes' theorem is based on conditional probability; it gives the
likelihood that event A will happen given that event B has
already happened. The equation for Bayes' theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

The Naïve Bayes classifier is one of the best classifiers for providing good
results on a given problem. It is easy to build a naïve Bayesian model, and it
is well suited to very large datasets. It is mostly used for text
classification.
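
As a hedged illustration, the sketch below applies scikit-learn's Gaussian Naïve Bayes to a toy two-feature dataset; all numbers are invented.

```python
# Minimal sketch: Gaussian Naive Bayes on toy data (values are illustrative).
from sklearn.naive_bayes import GaussianNB

X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]]  # two features, treated as independent
y = [0, 0, 1, 1]

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1, 2.0]]))        # most probable class under Bayes' theorem
print(clf.predict_proba([[1.1, 2.0]]))  # posterior probability for each class
```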

6. K-Nearest Neighbour (KNN)

K-Nearest Neighbour is a supervised learning algorithm that can be used
for both classification and regression problems. The algorithm works by
assessing the similarity between the new data point and the available data
points. Based on these similarities, the new data point is put in the
most similar category. It is also known as a lazy learner algorithm, as it
stores all the available data and classifies each new case with the
help of its K neighbours. The new case is assigned to the class with the
most similar neighbours, and a distance function measures the distance
between the data points. The distance function can be Euclidean,
Minkowski, Manhattan, or Hamming distance, depending on the
requirement.
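
A minimal KNN sketch, assuming scikit-learn and its default Euclidean distance; the 2-D points are made up.

```python
# Minimal sketch: k-nearest neighbours with k=3 (Euclidean distance by default).
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]]  # invented points
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # k is a hyperparameter chosen by the user
knn.fit(X, y)                              # "lazy learner": fit mostly just stores the data
print(knn.predict([[2, 2], [6, 7]]))       # each new case gets its 3 neighbours' majority class
```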

7. K-Means Clustering
K-means clustering is one of the simplest unsupervised learning
algorithms, used to solve clustering problems. The data points
are grouped into K different clusters based on similarities and
dissimilarities; that is, data points with the most commonalities remain
in one cluster and have few or no commonalities with points in other
clusters. In K-means, K refers to the number of clusters, and means refers
to averaging the data in order to find the centroid.

It is a centroid-based algorithm, and each cluster is associated with a
centroid. The algorithm aims to reduce the distance between the data
points and their centroids within a cluster.

The algorithm starts with a group of randomly selected centroids that
form the initial clusters, and then performs an iterative process to
optimize the positions of these centroids.

It can be used for spam detection and filtering, identification of fake news,
etc.
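
The centroid-update loop described above is handled internally by scikit-learn's KMeans; here is a minimal sketch on invented 2-D points.

```python
# Minimal sketch: K-means with K=2 on toy 2-D points (values are illustrative).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment of each point
print(km.cluster_centers_)  # centroids = per-cluster means, refined iteratively
```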

8. Random Forest Algorithm

Random forest is a supervised learning algorithm that can be used for
both classification and regression problems in machine learning. It is an
ensemble learning technique that makes predictions by combining
multiple classifiers, improving the performance of the model.

It contains multiple decision trees built on subsets of the given dataset and
averages their results to improve the predictive accuracy of the model. A
random forest typically contains 64-128 trees; a greater number of trees
leads to higher accuracy.

To classify a new dataset or object, each tree gives a classification
result, and based on the majority vote, the algorithm predicts the final
output.

Random forest is a fast algorithm and can deal efficiently with
missing and incorrect data.
9. Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm used to
solve association problems. It uses frequent itemsets to generate
association rules and is designed to work on databases that contain
transactions. With the help of these association rules, it determines how
strongly or weakly two objects are connected to each other. The
algorithm uses a breadth-first search and a hash tree to compute the
itemsets efficiently.

The algorithm proceeds iteratively to find the frequent itemsets in a
large dataset.

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in
1994. It is mainly used for market basket analysis and helps to
identify products that are likely to be bought together. It can also be used
in the healthcare field to find drug reactions in patients.

10. Principal Component Analysis

Principal Component Analysis (PCA) is an unsupervised learning
technique used for dimensionality reduction. It helps in reducing
the dimensionality of a dataset that contains many features correlated
with each other. It is a statistical process that converts the observations of
correlated features into a set of linearly uncorrelated features with the
help of an orthogonal transformation. It is one of the popular tools
used for exploratory data analysis and predictive modeling.

PCA works by considering the variance of each attribute, because high
variance indicates a good split between the classes; it then reduces
the dimensionality accordingly.

Some real-world applications of PCA are image processing, movie
recommendation systems, and optimizing power allocation in various
communication channels.
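
A minimal PCA sketch follows; it builds a synthetic dataset in which the second feature is deliberately correlated with the first, then reduces three features to two principal components.

```python
# Minimal sketch: PCA reducing 3 correlated features to 2 components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1,                                        # feature 1
                     2 * x1 + rng.normal(scale=0.1, size=100),  # correlated feature 2
                     rng.normal(size=100)])                     # independent feature 3

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)       # orthogonal transformation
print(pca.explained_variance_ratio_)   # the high-variance directions are kept
print(X.shape, "->", X_reduced.shape)  # (100, 3) -> (100, 2)
```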

Overfitting and Underfitting in Machine Learning

Overfitting and underfitting are the two main problems that occur in
machine learning and degrade the performance of machine learning
models.

The main goal of each machine learning model is to generalize well.
Here, generalization is the ability of an ML model to provide a
suitable output when given a set of unknown inputs. It means that after
being trained on the dataset, the model can produce reliable and accurate
output. Hence, underfitting and overfitting are the two things to check in
order to judge the performance of the model and whether it is
generalizing well or not.

Before understanding overfitting and underfitting, let's understand
some basic terms that will help to understand this topic well:

o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the
performance of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance
occurs.

Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more
than the required data points, present in the given dataset. Because of this, the model starts
capturing the noise and inaccurate values present in the dataset, and all these factors reduce the
efficiency and accuracy of the model. An overfitted model has low bias and high variance.

The chance of overfitting increases the more training we give our model:
the more we train the model, the higher the chance that it becomes
overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood from the
graph of a linear regression output:
The model tries to cover all the data points present in the
scatter plot. It may look efficient, but in reality it is not. Because the goal of a regression
model is to find the best fit line, and here we have not got a best fit, the model will generate
prediction errors.

How to avoid Overfitting in a Model

Both overfitting and underfitting degrade the performance of a machine learning
model. But overfitting is the more common problem, and there are several ways to reduce
its occurrence in our model (a brief sketch of regularization follows the list):

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
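
As a hedged illustration of one item from the list, regularization, the sketch below compares plain least squares with Ridge (L2) regression; the data and the alpha value are invented.

```python
# Minimal sketch of regularization: Ridge (L2) shrinks coefficients to curb overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))            # few samples, many features: easy to overfit
y = X[:, 0] + 0.1 * rng.normal(size=20)  # only the first feature truly matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)       # alpha is an illustrative penalty strength
print(np.abs(plain.coef_).round(2))      # noisy weights on irrelevant features
print(np.abs(ridge.coef_).round(2))      # the penalty pulls them towards zero
```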

Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying
trend of the data. To avoid overfitting, the feeding of training data can be stopped
at an early stage, but then the model may not learn enough from the training data. As a
result, it may fail to find the best fit for the dominant trend in the data.

In the case of underfitting, the model is not able to learn enough from the training data, which
reduces its accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.

Example: We can understand underfitting using the output of a linear
regression model:

In this case, the model is unable to capture the data
points present in the plot.

How to avoid underfitting:


o By increasing the training time of the model.
o By increasing the number of features.

Goodness of Fit
The term "goodness of fit" is taken from statistics, and the goal of
machine learning models is to achieve a good fit. In statistical
modelling, it describes how closely the results or predicted values match
the true values of the dataset.

A model with a good fit lies between an underfitted and an overfitted model;
ideally, it makes predictions with zero error, but in practice this is difficult
to achieve.

As we train our model over time, the errors on the training data go
down, and the same happens with the test data. But if we train the model
for too long, its performance may decrease due to
overfitting, as the model also learns the noise present in the dataset.
The errors on the test dataset then start increasing, so the point just
before the errors start rising is the good point, and we can stop there to
achieve a good model.
Hyperparameters
Hyperparameters in machine learning are parameters that are
explicitly defined by the user to control the learning process.

What are hyperparameters?

In machine learning/deep learning, a model is represented by its
parameters. In contrast, the training process involves selecting the
best/optimal hyperparameters that the learning algorithm uses to
provide the best result. So, what are these hyperparameters? The answer
is: "Hyperparameters are defined as the parameters that are
explicitly defined by the user to control the learning process."

Here the prefix "hyper" suggests that the parameters are top-level
parameters that are used in controlling the learning process. The value of
the Hyperparameter is selected and set by the machine learning engineer
before the learning algorithm begins training the model. Hence, these
are external to the model, and their values cannot be changed
during the training process.

Some examples of Hyperparameters in Machine Learning


o The k in kNN or K-Nearest Neighbor algorithm
o Learning rate for training a neural network
o Train-test split ratio
o Batch Size
o Number of Epochs
o Branches in Decision Tree
o Number of clusters in Clustering Algorithm

Validation Dataset: The sample of data used to provide an unbiased
evaluation of a model fit on the training dataset while tuning model
hyperparameters. The evaluation becomes more biased as skill on the
validation dataset is incorporated into the model configuration.

The validation set is used to evaluate a given model, but this is for
frequent evaluation. We, as machine learning engineers, use this data to
fine-tune the model hyperparameters. Hence the model occasionally sees
this data, but it never "learns" from it. We use the validation set
results to update higher-level hyperparameters. So the validation set
affects a model, but only indirectly. The validation set is also known as
the Dev set or the Development set, which makes sense since this dataset
helps during the "development" stage of the model.
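
A minimal sketch of carving out train, validation, and test sets follows; the 60/20/20 ratio and the toy arrays are illustrative choices, not fixed rules.

```python
# Minimal sketch: train / validation / test split with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(50, 2), np.arange(50)  # toy data

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Tune hyperparameters (e.g. k in kNN, the learning rate) against the validation
# set, and touch the test set only once, for the final unbiased evaluation.
print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```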

Machine Learning Basics: Estimators

What is an estimator?

In machine learning, an estimator is an equation for picking the "best," or
most likely accurate, data model based upon observations in reality. Not to
be confused with estimation in general, the estimator is the formula that
evaluates a given quantity (the estimand) and generates an estimate. This
estimate is then inserted into the deep learning classifier system to
determine what action to take.

Uses of Estimators

By quantifying guesses, estimators are how machine learning in theory is
implemented in practice. Without the ability to estimate the parameters of a
dataset (such as the layers in a neural network or the bandwidth in a
kernel), there would be no way for an AI system to “learn.”

A simple example of estimators and estimation in practice is the so-called
"German Tank Problem" from World War Two. The Allies had no way to
know for sure how many tanks the Germans were building every month. By
counting the serial numbers of captured or destroyed tanks (the estimand),
Allied statisticians created an estimator rule. This equation calculated the
maximum possible number of tanks based upon the sequential serial
numbers, and applied minimum-variance analysis to generate the most likely
estimate of how many new tanks Germany was building.
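
As a hedged sketch of the story above: a classic point estimator for this problem is m + m/k - 1, where m is the largest observed serial number and k the number of observations; the serial numbers below are invented.

```python
# Minimal sketch of a German Tank estimate (serial numbers are invented).
serials = [19, 40, 42, 60]   # observed serial numbers of captured tanks
k, m = len(serials), max(serials)

estimate = m + m / k - 1     # classic minimum-variance unbiased point estimate
print(estimate)              # 74.0
```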

Types of Estimators

Estimators come in two broad categories: point and interval. Point
equations generate single-value results, such as a standard deviation, that
can be plugged into a deep learning algorithm's classifier functions. Interval
equations generate a range of likely values, such as a confidence interval,
for analysis.

In addition, each estimator rule can be tailored to generate different types
of estimates:

 Biased: Either an overestimate or an underestimate.
 Efficient: Smallest variance analysis. The smallest possible variance is
referred to as the "best" estimate.
 Invariant: Less flexible estimates that aren't easily changed by data
transformations.
 Shrinkage: An unprocessed estimate that's combined with other variables
to create complex estimates.
 Sufficient: Estimating the total population's parameter from a limited
dataset.
 Unbiased: An exact-match estimate value that neither underestimates nor
overestimates.

Bias and Variance in Machine Learning

Machine learning is a branch of Artificial Intelligence that allows
machines to perform data analysis and make predictions. However, if the
machine learning model is not accurate, it can make prediction errors,
and these prediction errors are usually known as bias and variance. In
machine learning, these errors will always be present, as there is always a
slight difference between the model's predictions and the actual values.
The main aim of ML/data science analysts is to reduce these errors in
order to get more accurate results. In this topic, we are going to discuss
bias and variance, the bias-variance trade-off, and underfitting and
overfitting. But before starting, let's first understand what errors in
machine learning are.

What is Bias?
In general, a machine learning model analyses the data, finds patterns in it,
and makes predictions. While training, the model learns these patterns in
the dataset and applies them to test data for prediction. While making
predictions, a difference occurs between the prediction values made
by the model and the actual values/expected values, and this
difference is known as bias error or error due to bias. It can be
defined as the inability of machine learning algorithms such as Linear
Regression to capture the true relationship between the data points. Each
algorithm begins with some amount of bias, because bias arises from
assumptions in the model that make the target function simpler to
learn. A model has either:

o Low Bias: A low bias model will make fewer assumptions about the
form of the target function.
o High Bias: A model with a high bias makes more assumptions, and
the model becomes unable to capture the important features of our
dataset. A high bias model also cannot perform well on new
data.

Generally, a linear algorithm has high bias, which makes it learn fast.
The simpler the algorithm, the more bias it is likely to
introduce, whereas a nonlinear algorithm often has low bias.

Some examples of machine learning algorithms with low bias are
Decision Trees, k-Nearest Neighbours and Support Vector
Machines. At the same time, algorithms with high bias include Linear
Regression, Linear Discriminant Analysis and Logistic Regression.

What is a Variance Error?

Variance specifies the amount of variation in the prediction if
different training data were used. In simple words, variance tells
how much a random variable differs from its expected
value. Ideally, a model should not vary too much from one training
dataset to another, which means the algorithm should be good at
understanding the hidden mapping between input and output variables.
Variance errors are either low variance or high variance.

Low variance means there is a small variation in the prediction of the
target function with changes in the training data set. At the same
time, high variance shows a large variation in the prediction of the
target function with changes in the training dataset.

A model that shows high variance learns a lot and performs well on the
training dataset but does not generalize well to unseen data. As
a result, such a model gives good results on the training dataset but
shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it
leads to overfitting. A model with high variance has the
following problems:

o A high variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility to fit the model,
have high variance.

Some examples of machine learning algorithms with low variance
are Linear Regression, Logistic Regression, and Linear
Discriminant Analysis. At the same time, algorithms with high variance
are Decision Tree, Support Vector Machine, and K-Nearest
Neighbours.

Ways to Reduce High Variance:

o Reduce the input features or the number of parameters if the model is
overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term.
Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is a probabilistic
approach to determining values for the parameters of a model.
Parameters can be seen as a blueprint for the model,
because the algorithm works based on them. MLE is a widely
used technique in machine learning, time series, panel data,
and discrete data.
What is the likelihood?
The likelihood function measures the extent to which the data
provide support for different values of the parameter. It
indicates how likely it is that a particular population will produce
a sample. For example, if we compare the likelihood function at
two parameter points and find that the likelihood is greater for the
first parameter than for the other, the first parameter can be
interpreted as a more plausible value for the learner than
the second parameter. More loosely, it could be said that MLE uses a
hypothesis to conclude the result. Both frequentist and
Bayesian analyses use the likelihood function. The
likelihood function is different from the probability density
function.

Working of Maximum Likelihood Estimation

Maximizing the likelihood is the main
objective of MLE. Let's understand this with an example.
Consider a binary classification problem in which we
need to classify the data into two categories, either 0 or 1, based
on a feature called "salary".

MLE will calculate the probability for each data point with respect to
salary and then, using that probability, it will calculate the
likelihood of those data points in order to classify them as either 0 or 1.
It repeats this process until the learned line is
best fitted. This process is known as maximization of the
likelihood.
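
A minimal numeric sketch of MLE follows: for invented coin-flip data, we scan candidate values of the heads-probability theta and keep the one maximizing the log-likelihood, which matches the analytic answer heads/total.

```python
# Minimal sketch: maximum likelihood for a coin's heads-probability theta.
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # invented flips: 1 = heads

def log_likelihood(theta, x):
    # log of the Bernoulli likelihood of the sample x under parameter theta
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)        # candidate parameter values
best = thetas[np.argmax([log_likelihood(t, data) for t in thetas])]
print(best, data.mean())                    # both are 0.75
```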
What is Bayesian machine learning?
Bayesian ML is a paradigm for constructing statistical models based on Bayes'
Theorem:

p(θ|x) = p(x|θ) p(θ) / p(x)

Generally speaking, the goal of Bayesian ML is to estimate the posterior
distribution, p(θ|x), given the likelihood, p(x|θ), and the prior distribution, p(θ).

The likelihood is something that can be estimated from the training data.
In fact, that's exactly what we're doing when training a regular machine learning
model. We're performing Maximum Likelihood Estimation, an iterative process which
updates the model's parameters in an attempt to maximize the probability of seeing
the training data x given the model parameters θ.

Estimating the parameters that maximize the posterior instead is a process
called Maximum a Posteriori (MAP) estimation. It's easier, however, to think about it
in terms of the likelihood function. By Bayes' Theorem we can write the posterior as

p(θ|x) ∝ p(x|θ) p(θ)

Here we leave out the denominator, p(x), because we are taking the maximization
with respect to θ, on which p(x) does not depend. Therefore, we can ignore it in
the maximization procedure. The key piece of the puzzle which leads Bayesian
models to differ from their classical counterparts trained by MLE is the inclusion of
the term p(θ). We call this the prior distribution over θ.
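
To contrast MLE with the Bayesian MAP estimate described above, here is a minimal coin-flip sketch; the Beta(2, 2) prior is an assumed, illustrative choice that prefers theta near 0.5.

```python
# Minimal sketch: MLE vs MAP for a coin's heads-probability theta.
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # same invented flips as before
heads, n = data.sum(), len(data)
a, b = 2.0, 2.0                             # assumed Beta prior hyperparameters

theta_mle = heads / n                          # maximizes the likelihood p(x|theta)
theta_map = (heads + a - 1) / (n + a + b - 2)  # closed form for a Beta prior
print(theta_mle, theta_map)                    # 0.75 vs 0.7: the prior p(theta) pulls toward 0.5
```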

Supervised Machine Learning

Supervised learning is the type of machine learning in which machines
are trained using well "labelled" training data, and on the basis of that
data, machines predict the output. Labelled data means that some input
data is already tagged with the correct output.

In supervised learning, the training data provided to the machines works as
a supervisor that teaches the machines to predict the output correctly.
It applies the same concept as a student learning under the supervision of a
teacher.

Supervised learning is a process of providing input data as well as correct
output data to the machine learning model. The aim of a supervised
learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment,
image classification, fraud detection, spam filtering, etc.

How Supervised Learning Works

In supervised learning, models are trained using a labelled dataset, where
the model learns about each type of data. Once the training process is
completed, the model is tested on held-out test data, and
then it predicts the output.

The working of supervised learning can be easily understood by the
example and diagram below:
Suppose we have a dataset of different types of shapes which includes
square, rectangle, triangle, and Polygon. Now the first step is that we
need to train the model for each shape.

o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.

Now, after training, we test our model using the test set, and the task of
the model is to identify the shape.

The machine is already trained on all types of shapes, and when it finds a
new shape, it classifies the shape on the basis of its number of sides and
predicts the output.

Steps Involved in Supervised Learning:


o First Determine the type of training dataset
o Collect/Gather the labelled training data.
o Split the training dataset into training dataset, test dataset, and
validation dataset.
o Determine the input features of the training dataset, which should have
enough knowledge so that the model can accurately predict the output.
o Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need
validation sets as control parameters, which are a subset of the training
dataset.
o Evaluate the accuracy of the model by providing the test set. If the model
predicts the correct output, our model is accurate.

Types of Supervised Machine Learning Algorithms:
Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used if there is a relationship between the input
variable and the output variable. They are used for the prediction of
continuous variables, such as weather forecasting, market trends, etc. Below
are some popular regression algorithms that come under supervised
learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical,
which means there are two classes, such as Yes-No, Male-Female, True-
False, etc. Spam filtering is a common example. Popular classification
algorithms include:

o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Advantages of Supervised learning:


o With the help of supervised learning, the model can predict the output on
the basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of
objects.
o Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling complex
tasks.
o Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of
objects.

Unsupervised Machine Learning


In the previous topic, we learned supervised machine learning in which
models are trained using labeled data under the supervision of training
data. But there may be many cases in which we do not have labeled data
and need to find the hidden patterns from the given dataset. So, to solve
such types of cases in machine learning, we need unsupervised learning
techniques.

What is Unsupervised Learning?

As the name suggests, unsupervised learning is a machine learning
technique in which models are not supervised using a training dataset.
Instead, the models themselves find the hidden patterns and insights in the
given data. It can be compared to the learning that takes place in the human
brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using an unlabelled
dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or
classification problem because, unlike supervised learning, we have the
input data but no corresponding output data. The goal of unsupervised
learning is to find the underlying structure of the dataset, group the
data according to similarities, and represent the dataset in a
compressed format.

Example: Suppose the unsupervised learning algorithm is given an input
dataset containing images of different types of cats and dogs. The
algorithm is never trained on the given dataset, which means it does
not have any idea about the features of the dataset. The task of the
unsupervised learning algorithm is to identify the image features on its
own. The unsupervised learning algorithm will perform this task by clustering
the image dataset into groups according to the similarities between
images.

Why use Unsupervised Learning?


Below are some main reasons which describe the importance of
Unsupervised Learning:

o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is very similar to how a human learns to think from their
own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabelled and uncategorized data, which
makes unsupervised learning more important.
o In the real world, we do not always have input data with corresponding
output, so to solve such cases, we need unsupervised learning.

Working of Unsupervised Learning


Working of unsupervised learning can be understood by the below
diagram:

Here, we have taken unlabelled input data, which means it is not categorized
and corresponding outputs are also not given. This unlabelled input data is
fed to the machine learning model in order to train it. First, the model will
interpret the raw data to find the hidden patterns in the data, and then it
will apply suitable algorithms such as k-means clustering, the Apriori
algorithm, etc.

Types of Unsupervised Learning Algorithm:


The unsupervised learning algorithm can be further categorized into two
types of problems:
o Clustering: Clustering is a method of grouping objects into clusters
such that objects with the most similarities remain in a group and have few
or no similarities with the objects of another group. Cluster analysis finds
the commonalities between the data objects and categorizes them as per
the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method
used for finding relationships between variables in a large
database. It determines the sets of items that occur together in the
dataset. Association rules make marketing strategy more effective. For
example, people who buy item X (say, bread) also tend to purchase item Y
(butter/jam). A typical example of an association rule is Market Basket
Analysis.

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbours)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition

Advantages of Unsupervised Learning


o Unsupervised learning is used for more complex tasks as compared to
supervised learning because, in unsupervised learning, we don't have
labeled input data.
o Unsupervised learning is preferable as it is easy to get unlabeled data in
comparison to labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than supervised
learning as it does not have corresponding output.
o The result of the unsupervised learning algorithm might be less accurate
as input data is not labeled, and algorithms do not know the exact output
in advance.

Stochastic Gradient Descent (SGD)


Gradient Descent in Brief
 Gradient Descent is a generic optimization algorithm capable of finding
optimal solutions to a wide range of problems.
 The general idea is to tweak parameters iteratively in order to minimize
the cost function.
 An important parameter of Gradient Descent (GD) is the size of the steps,
determined by the learning rate hyperparameter. If the learning rate is
too small, the algorithm will have to go through many iterations to
converge, which takes a long time; if it is too high, we may overshoot the
optimal value.

Types of Gradient Descent:


 Typically, there are three types of Gradient Descent:

1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-batch Gradient Descent
In this article, we will be discussing Stochastic Gradient Descent (SGD).
Stochastic Gradient Descent (SGD):

The word ‘stochastic’ means a system or process linked with random
probability. Hence, in Stochastic Gradient Descent, a few samples are
selected randomly instead of the whole data set for each iteration. In
Gradient Descent, there is a term called “batch” which denotes the total
number of samples from a dataset that is used for calculating the gradient for
each iteration. In typical Gradient Descent optimization, like Batch Gradient
Descent, the batch is taken to be the whole dataset. Although using the
whole dataset is really useful for getting to the minima in a less noisy and
less random manner, the problem arises when our dataset gets big.
Suppose, you have a million samples in your dataset, so if you use a typical
Gradient Descent optimization technique, you will have to use all of the one
million samples for completing one iteration while performing the Gradient
Descent, and it has to be done for every iteration until the minima are
reached. Hence, it becomes computationally very expensive to perform.
This problem is solved by Stochastic Gradient Descent. SGD uses only
a single sample, i.e., a batch size of one, to perform each iteration. The
samples are randomly shuffled and selected for each iteration.
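
A minimal NumPy sketch of SGD with a batch size of one follows; the learning rate, epoch count, and synthetic data are all illustrative choices.

```python
# Minimal sketch: stochastic gradient descent (batch size 1) for linear regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=1000)  # true slope 3, intercept 2

w, b, lr = 0.0, 0.0, 0.01                # illustrative learning rate
for epoch in range(5):
    for i in rng.permutation(len(X)):    # shuffle, then one random sample per update
        err = (w * X[i, 0] + b) - y[i]   # prediction error on this single sample
        w -= lr * err * X[i, 0]          # gradient step for the weight
        b -= lr * err                    # gradient step for the bias
print(w, b)                              # close to 3 and 2
```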

Building a Machine Learning Algorithm

1. Review Different Machine Learning Algorithms And Choose The Algorithm To Build
You need to first understand your own project requirements. Project teams use
different machine learning methods for different purposes.

Data scientists might use predictive analytics for data science-specific use
cases, whereas, another Artificial Intelligence (AI) team might build machine
learning systems for other reasons. E.g., a project team might use machine
learning with AI capabilities like natural language processing (NLP), computer
vision, etc.

Review the prominent machine learning algorithms before choosing the right
algorithm to build. The following are examples of important machine learning
algorithms:
A. NAÏVE BAYES CLASSIFIER ALGORITHM
ML (machine learning) project teams use this popular algorithm to solve
classification problems. It uses the supervised learning approach, i.e., it works
with “labeled” input data.

B. K MEANS CLUSTERING ALGORITHM


It’s one of the unsupervised learning algorithms. ML project teams utilize this
for clustering of the input data set.

C. SUPPORT VECTOR MACHINE ALGORITHM


While most project teams use the “Support Vector Machine” (SVM) algorithm
for classification problems, some of them use it to solve regression problems.
It’s one of the well-known supervised learning algorithms.

D. LINEAR REGRESSION
Data scientists and ML project teams make great use of this supervised
learning algorithm to solve linear regression problems.

E. LOGISTIC REGRESSION
This supervised learning algorithm helps to address machine learning
problems where you need to find discrete values of dependent variables from
independent variables.

F. ARTIFICIAL NEURAL NETWORKS (ANNS)


Artificial Neural Networks have significant utility in deep learning. You design
and create Artificial Neural Networks by taking inspiration from the way the
human brain operates. These algorithms can be used with supervised,
unsupervised, and reinforcement learning approaches.

G. DECISION TREES
This supervised learning algorithm helps to create flow charts that look like
trees. ML projects use it for solving many real-world problems like binary
classification problems.

2. Hire Developers To Develop A Machine Learning Algorithm


You need the right developers to develop effective algorithms and machine
learning models. We recommend you hire a Python developer to develop a
machine learning algorithm. Python has a great reputation among artificial
intelligence/machine learning developers and data scientists.

Look for programming skills when hiring developers; however, a deeper
understanding of machine learning is even more important. The programmer
understanding of machine learning is even more important. The programmer
you hire should know what it takes to create good models and algorithms.
The developer needs a thorough understanding of different algorithms.
Programmers should know how to improve a machine learning model
performance.

Developers should know about different types of mathematical problems like
ordinary least squares and binary classification problems. Depending on the
ordinary least squares and binary classification problems. Depending on the
project, programmers might need to know about loss functions like the “Mean
Squared Error” (MSE).

3. Learn About The Algorithm Before Diving Deep Into How To Develop A
Machine Learning Algorithm
You need to learn sufficiently about the algorithm that you have decided to
build. Understand the functionality of the algorithm, and understand where it’s
used. Learn when you shouldn’t use this algorithm.

4. Data Collection And Data Preparation


You might collect data for your machine learning model and algorithm from
different data sources. You can't use that data straight away after you collect
it, though.

An ML project team needs to prepare data sets first. This enables them to have
clean, consistent, and accurate data sets.

You need to take help from business stakeholders and data scientists for this.
They need the same unlimited access to the data that your ML developers
have.

Implement a set of repeatable steps so that you can execute them for new data
sets. Invest in technology solutions so that you can prepare more data when
you need it with the same scale and speed.

The data preparation steps are as follows:

A. DATA COLLECTION
You need to first collect data from the relevant data sources. Your ML project
team should work on the following challenges at this stage:

 Scanning external data sources and identifying relevant data;

 Determining the relevant attributes in data sets;

 Parsing data from files like XML and JSON into tabular formats;
 Combining data into the appropriate number of data sets;

 Preparing plans to remove biases from the input data sets.


B. EXPLORE DATA AND CREATE DATA PROFILES
You now need to assess the condition of the input data that you have collected.
Do the following at this stage:

 Identify trends in the input data sets.

 Examine the data sets for outliers.

 Find out the various exceptions in the data sets.

 Make a list of incorrect or missing data points.

 Identify the inconsistencies in the data sets.

 Look for issues that could introduce biases in your expected outputs.
C. ORGANIZE THE DATA SETS IN THE APPROPRIATE FORMAT FOR
CONSISTENCY
You might have gathered data for your training and test sets from different data
sources. They might have different formats.

Furthermore, you might not be the only one to manually update the data sets.
Other users might have unlimited access to the data sets, and they might
update them. All of the above examples might result in different formats in
different data sets.

However, your machine learning model might need the data in a certain format.
Your team needs to organize your input data sets in that format. This task
might require standardizing certain values in several columns.

D. IMPROVE THE QUALITY OF THE DATA SETS


Improve the quality of your input data sets. You might need to do the following:

 Build a strategy to correct data errors.

 Manage the missing values.

 Manage the extreme values in the data sets.

 Find a solution to outliers in the input data sets.

 Review the distribution of your data and identify discrepancies.


 Analyze the “outliers” in your data sets.

 Use appropriate data preparation tools.

 Ensure that your modified data sets are similar to the real data sets.
E. FEATURE ENGINEERING AFTER ANALYZING THE INPUT VARIABLES
The term “feature engineering” refers to the act of modifying raw data into
features for the understanding of machine learning algorithms. This step helps
ML algorithms to understand the data better since they can see patterns in the
data.

Feature engineering might involve decomposing the input data sets into
multiple parts. An ML project team might do this to categorize data by different
values.

Each part of the data set will help the ML algorithm to understand specific
relationships in the data sets. The ML algorithms can also find patterns in the
data.

F. SPLIT DATA SETS INTO TRAINING DATA AND TEST DATA SETS
You can now divide your input data sets into two sets. One of these two sets is
to train the ML algorithm that you are building. You should use the other data
set for testing your algorithm.

What if you have heavily skewed training examples in your input data? This
can result in biases, which can adversely impact the performance of your
machine learning model, especially with respect to complex
problems. Choose the split carefully: a stratified split preserves the class
proportions, and the "random state" argument makes the split reproducible.
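
A minimal sketch of a reproducible, stratified split follows; the toy data and the 80/20 ratio are illustrative.

```python
# Minimal sketch: a reproducible split; stratify=y preserves skewed class proportions.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [0] * 15 + [1] * 5                    # skewed labels: 75% class 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(y_train.count(1), y_test.count(1))  # 4 and 1: proportions preserved in both sets
```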

5. Design And Implement A Robust Information Security Solution


You use AI and ML to build autonomous systems. Such systems differ
fundamentally from explicitly-programmed systems.

AI and ML systems learn from input data sets and improve their performance
over time. The quality of learning influences their performance, therefore, you
need to feed them with high-quality training data.

Depending on the sensitivity of your ML project, protecting the sanctity of the
training and test data sets can be hard. Malicious players might try to tamper
with the training data, which is called “data poisoning”. ML models can make
wrong inferences based on manipulated training data.
Analyze the information security risks faced by your organization. Strategize
and design an information security solution to prevent “data poisoning” and
other attacks. Implement the information security solution.

6. Create The Pseudocode For The Machine Learning Algorithm


Before you start coding, you need to create the pseudocode for the ML
algorithm that you plan to build. Write the pseudocode in as much detail as you
can. That will help you to understand the algorithm in more detail than what you
learned so far.

Take the simple example of a linear regression algorithm. Under which
conditions will you get the “best-fit” straight line in the output? By creating the
pseudocode, you get this understanding even before the programming phase.

The exact work in this phase will depend upon the algorithm you are
developing. You can refer to authoritative books and blog posts for more
information before you create the pseudocode.

You should have the pseudocode reviewed, and your ML project
team should incorporate the relevant findings from the review.

7. Code The Machine Learning Algorithm


Having created the pseudocode, you now need to develop the ML algorithm.
Your project plan should include a structured code review process. This helps
you to detect defects even before you start testing.

8. Train The Machine Learning Algorithm You Have Created


You had earlier created separate input data sets for training and testing. Now,
you need to utilize the training data set to train the new algorithm you have
created.

Review the machine learning model created during this training, and analyze
the outliers. You might find problems with the input data that earlier escaped
your attention.

Analyze data errors if you find them. Run the previously-created data
preparation process to create better training data. Reiterate the training and
review processes.
9. Test The Machine Learning Algorithm
You now need to validate the ML algorithm with the help of your test data set.
Execute the algorithm and create an ML model. Review the output in detail.
Pay special attention to outliers and exceptions, and examine the reasons.

Check whether the outliers and exceptions originated due to errors in the input
data sets. In that case, make the necessary corrections in the input data sets.
Rerun the tests. Reiterate the review process.

You would want to compare the output of your ML algorithm against a standard
implementation of that algorithm on the same input data set. Scikit-learn, a
popular Python library, already includes standard implementations of many
popular ML algorithms. The following are a few examples:

 Scikit-learn Naïve Bayes Classifier;

 Scikit-learn K-Means Clustering;

 Scikit-learn Support Vector Machine;

 Scikit-learn Linear Regression;

 Scikit-learn Logistic Regression;

 Scikit-learn Decision Tree.
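
As a hedged sketch of this validation step, the code below compares a hand-rolled least-squares fit (the "algorithm under test") against scikit-learn's standard LinearRegression on the same synthetic data.

```python
# Minimal sketch: validate a custom implementation against scikit-learn's.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + 0.01 * rng.normal(size=50)

Xb = np.column_stack([np.ones(len(X)), X])     # add a bias column
theta = np.linalg.lstsq(Xb, y, rcond=None)[0]  # our "algorithm under test"

ref = LinearRegression().fit(X, y)             # the standard implementation
print(np.allclose(theta[0], ref.intercept_, atol=1e-6))  # True
print(np.allclose(theta[1:], ref.coef_, atol=1e-6))      # True
```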

Challenges in Deep Learning

[Figure: a deep neural network with an input layer, two hidden layers, and an output layer]
Deep Learning has become one of the primary research areas in
developing intelligent machines. Most of the well-known
applications (such as Speech Recognition, Image Processing and
NLP) of AI are driven by Deep Learning. Deep Learning algorithms
mimic human brains using artificial neural networks and
progressively learn to accurately solve a given problem. But there
are significant challenges in Deep Learning systems which we have
to look out for.
In the words of Andrew Ng, one of the most prominent names in
Deep Learning:
“I believe Deep Learning is our best shot at progress towards
real AI.”
If you look around, you might realize the power of the above
statement by Andrew. From Siri and Cortana to Google Photos, and
from Grammarly to Spotify's music recommendations, all of these are
powered by Deep Learning. These are just a few examples of how
deeply Deep Learning has entered our lives.
But with great technological advances come complex difficulties
and hurdles. In this post, we shall discuss prominent challenges in
Deep Learning.
Challenges in Deep Learning
Lots and lots of data
Deep learning algorithms are trained to learn progressively using
data. Large data sets are needed to make sure that the machine
delivers desired results. Just as the human brain needs a lot of experience
to learn and deduce information, the analogous artificial neural
network requires copious amounts of data. The more powerful the
abstraction you want, the more parameters need to be tuned, and
more parameters require more data.
For example, a speech recognition program would require data
from multiple dialects, demographics, and time scales. Researchers
feed terabytes of data to the algorithm for it to learn a single language.
This is a time-consuming process that requires tremendous data
processing capabilities. To some extent, the scope of solving a
problem through Deep Learning is subject to the availability of a huge
corpus of data to train on.
The complexity of a neural network can be expressed through the
number of parameters. In the case of deep neural networks, this
number can be in the range of millions, tens of millions and in some
cases even hundreds of millions.
Let’s call this number P. Since you want to be sure of the model’s
ability to generalize, a good rule of thumb for the number of data
points is at least P*P.
Overfitting in neural networks
At times, there is a sharp difference between the error on the training
data set and the error encountered on a new, unseen data set. This
occurs in complex models, such as those with too many parameters
relative to the number of observations. The efficacy of a model is
judged by its ability to perform well on an unseen data set, not
by its performance on the training data fed to it.

[Figure: training error (blue) and validation error (red) as a function of the number of training cycles, illustrating overfitting. Credits: Wikipedia]
In general, a model is typically trained by maximizing its
performance on a particular training data set. The model thus
memorizes the training examples but does not learn to generalize to
new situations and data set.
Hyperparameter Optimization
Hyperparameters are the parameters whose value is defined prior
to the commencement of the learning process. Changing the value
of such parameters by a small amount can invoke a large change in
the performance of your model.
Relying on the default parameters and not performing hyperparameter
optimization can have a significant impact on model
performance. Likewise, hand-tuning a small number of hyperparameters
rather than optimizing them through proven methods also drives
performance.
Requires high-performance hardware
Training a data set for a Deep Learning solution requires a lot of
data. To perform a task to solve real world problems, the machine
needs to be equipped with adequate processing power. To ensure
better efficiency and less time consumption, data scientists switch
to multi-core high performing GPUs and similar processing units.
These processing units are costly and consume a lot of power.
Industry level Deep Learning systems require high-end data centers while smart devices such
as drones, robots other mobile devices require small but efficient processing units. Deploying
Deep Learning solution to the real world thus becomes a costly and power consuming affair.
Neural networks are essentially black boxes
We know our model's parameters, the data we feed to the neural networks,
and how the layers are put together. But we usually do not understand how they
arrive at a particular solution. Neural networks are essentially black boxes, and
researchers have a hard time understanding how they deduce conclusions.

Lack of Flexibility and Multitasking


Deep Learning models, once trained, can deliver tremendously
efficient and accurate solutions to a specific problem. However, in
the current landscape, neural network architectures are highly
specialized to specific domains of application.
Google DeepMind Research Scientist Raia Hadsell summed it
up:
“There is no neural network in the world, and no method right
now that can be trained to identify objects and images, play
Space Invaders, and listen to music.”
