UNIT1
There are three key aspects of Machine Learning, which are as follows
The goal of supervised learning is to map input data to output data. Supervised learning
is based on supervision, much as a student learns under a teacher's
supervision. Spam filtering is an example of supervised learning.
o Classification
o Regression
Examples of some popular supervised learning algorithms are Simple Linear regression,
Decision Tree, Logistic Regression, KNN algorithm, etc.
o Clustering
o Association
3) Reinforcement Learning
In Reinforcement learning, an agent interacts with its environment by producing actions, and
learns with the help of feedback. The feedback is given to the agent in the form of rewards:
for each good action it gets a positive reward, and for each bad action it gets a
negative reward. There is no supervision provided to the agent. The Q-Learning
algorithm is used in reinforcement learning.
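The reward feedback described above can be sketched as a single tabular Q-learning update. This is a minimal illustration, not a full agent: the state names, action names, reward values, learning rate (alpha) and discount factor (gamma) are all made-up assumptions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table q (a dict of dicts) in place."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[next_state].values())
    # Move the old estimate towards reward + discounted future value.
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

# Hypothetical two-state environment with two actions per state.
q_table = {
    "start": {"left": 0.0, "right": 0.0},
    "goal": {"left": 0.0, "right": 0.0},
}

# A good action earns a positive reward, a bad action a negative one.
good_value = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
bad_value = q_update(q_table, "start", "left", reward=-1.0, next_state="start")
print(good_value, bad_value)
```

After these two updates the table prefers "right" over "left" in the "start" state, which is how rewards shape the agent's behaviour without any labelled examples.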
1. Linear Regression
Linear regression is one of the most popular and simple machine learning
algorithms used for predictive analysis. Here, predictive
analysis means predicting an outcome from data, and linear regression makes
predictions for continuous numbers such as salary, age, etc.
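y = a0 + a1*x + ε
where y = dependent variable, x = independent variable, a0 = intercept,
a1 = slope (regression coefficient), ε = random error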
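As a rough sketch of how the line is fitted, the intercept and slope can be computed with ordinary least squares. The experience and salary figures below are made-up values for illustration only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

intercept, slope = fit_line(years, salary)
predicted = intercept + slope * 6  # prediction for 6 years of experience
print(intercept, slope, predicted)
```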
2. Logistic Regression
Logistic regression is a supervised learning algorithm used
to predict categorical variables or discrete values. It is
used for classification problems in machine learning, and its output
is one of two classes, such as Yes or No, 0 or 1, Red or
Blue, etc.
Logistic regression is similar to linear regression except in how it is
used: linear regression solves regression problems
and predicts continuous values, whereas logistic regression
solves classification problems and predicts discrete values.
Instead of fitting a best-fit line, it fits an S-shaped curve that lies
between 0 and 1. This S-shaped curve is known as the logistic function,
and its output is compared with a threshold: any value above the threshold
is assigned to class 1, and any value below it to class 0.
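The S-shaped curve and the threshold can be sketched in a few lines. The threshold of 0.5 is a common default and is an assumption here; the input z stands for the linear combination of the features.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Turn the sigmoid output into a discrete class using the threshold."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_label(z))
```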
Naïve Bayes classifier is a simple classifier that often gives good
results despite its simplicity. A naïve Bayesian model is easy to build and
well suited to very large datasets. It is mostly used for text
classification.
7. K-Means Clustering
K-means clustering is one of the simplest unsupervised learning
algorithms and is used to solve clustering problems. The data points
are grouped into K different clusters based on similarities and
dissimilarities: points with most in common stay
in one cluster, which has little or nothing in common with the other
clusters. In K-means, K refers to the number of clusters, and means refers
to averaging the data points in a cluster in order to find its centroid.
It can be used for spam detection and filtering, identification of fake news,
etc.
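The two alternating steps of K-means (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. The sample points and the starting centroids are made-up values, and real implementations usually pick the starting centroids at random.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, iterations=10):
    """Cluster points around K centroids; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, labels

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids, labels)
```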
A random forest contains multiple decision trees built on subsets of the given dataset, and
averages their outputs to improve the predictive accuracy of the model. A
commonly quoted guideline is 64-128 trees. More trees generally make the
predictions more stable, though the gains diminish and training takes longer.
Random forest is a fast algorithm, and can cope well with
missing and incorrect data.
9. Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm that is used to
solve association problems. It uses frequent itemsets to generate
association rules, and it is designed to work on databases that contain
transactions. With the help of these association rules, it determines how
strongly or how weakly two objects are connected to each other. The
algorithm uses a breadth-first search and a hash tree to count the
itemsets efficiently.
It works iteratively to find the frequent itemsets in
a large dataset.
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in
1994. It is mainly used for market basket analysis and helps to
understand which products are likely to be bought together. It can also be used
in the healthcare field to find drug reactions in patients.
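The level-by-level (breadth-first) search for frequent itemsets can be sketched as below. This simplified version counts support by scanning every transaction and does not use a hash tree; the shopping baskets and the minimum support of 2 are made-up assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent itemsets into candidates one item larger, keeping only
        # those whose smaller subsets are all frequent (the Apriori property).
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

Association rules are then derived from these frequent itemsets, for example by comparing the support of {bread, milk} with the support of {bread}.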
PCA works by considering the variance along each direction in the data, because directions
of high variance carry most of the information and often separate the classes well. The
low-variance directions can then be dropped, which reduces the dimensionality.
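For two attributes, the direction of highest variance can be found in closed form from the covariance matrix. The sketch below is limited to 2-D data and uses made-up points; general PCA uses an eigen-decomposition or SVD of the covariance matrix instead.

```python
import math

def first_principal_component(points):
    """Return (unit direction, explained variance ratio) for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[var_x, cov], [cov, var_y]].
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix.
    largest = (var_x + var_y) / 2 + math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    if cov != 0:
        direction = (cov, largest - var_x)  # matching eigenvector
    else:
        direction = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    length = math.hypot(direction[0], direction[1])
    direction = (direction[0] / length, direction[1] / length)
    return direction, largest / (var_x + var_y)

# Hypothetical data lying exactly on the line y = x.
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, ratio = first_principal_component(data)
print(direction, ratio)
```

Here one direction explains all of the variance, so the two attributes can be replaced by a single component without losing information.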
o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the
performance of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithm. It is the difference
between the predicted values and the actual values.
o Variance: Variance is the sensitivity of the model to the particular training
data. High variance shows up when the model performs well with the training
dataset but does not perform well with the test dataset.
Overfitting
Overfitting occurs when our machine learning
model tries to cover all the data points, or more than the required data points, present in the given
dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset,
and all these factors reduce the efficiency and accuracy of the model. The overfitted model has low
bias and high variance.
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying
trend of the data. To avoid overfitting, the feeding of training data can be stopped
at an early stage, but then the model may not learn enough from the training data. As a
result, it may fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and
hence its accuracy drops and its predictions are unreliable.
An underfitted model has high bias and low variance.
Example: We can understand underfitting using the output of a linear
regression model:
[Figure: output of the linear regression model]
As we can see from the diagram, the model is unable to capture the data
points present in the plot.
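The same effect can be reproduced numerically. In this sketch a straight line is fitted to made-up data that follows y = x^2, and its error is compared with a model that is given x^2 as its input feature; the data and helper names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_squared_error(xs, ys, intercept, slope):
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # the true trend is curved

# Underfitted model: a straight line in x cannot follow the curve.
b0, b1 = fit_line(xs, ys)
mse_line = mean_squared_error(xs, ys, b0, b1)

# Better model: the same fitting routine, but with x squared as the feature.
squared = [x ** 2 for x in xs]
c0, c1 = fit_line(squared, ys)
mse_curve = mean_squared_error(squared, ys, c0, c1)

print(mse_line, mse_curve)
```

The straight line has a large error even on its own training data, which is the signature of high bias.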
Goodness of Fit
The term "goodness of fit" is taken from statistics, and the goal of
machine learning models is to achieve a good fit. In statistical
modelling, it describes how closely the predicted values match the
true values of the dataset.
The model with a good fit lies between the underfitted and the overfitted model.
Ideally it makes predictions with zero error, but in practice this is difficult
to achieve.
When we train our model for a while, the errors on the training data go
down, and the same happens with the test data. But if we train the model for
too long, its performance may decrease due to
overfitting, as the model also learns the noise present in the dataset.
The errors on the test dataset then start increasing, so the point just before the
errors begin to rise is a good point to stop training and obtain a
good model.
Hyperparameters
Hyperparameters in Machine learning are those parameters that are
explicitly defined by the user to control the learning process.
Here the prefix "hyper" suggests that the parameters are top-level
parameters that are used in controlling the learning process. The value of
a hyperparameter is selected and set by the machine learning engineer
before the learning algorithm begins training the model. Hence, these
are external to the model, and their values are not learned or changed
during the training process.
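The difference between a hyperparameter and a learned parameter can be sketched with a tiny gradient descent loop. Here learning_rate and epochs are hyperparameters chosen before training, while the weight w is learned from the data; the data and the chosen values are made-up assumptions.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent and return the learned weight w."""
    w = 0.0  # model parameter: updated during training
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w

xs = [1, 2, 3]
ys = [2, 4, 6]  # generated from y = 2x

# The same data and algorithm, with two different hyperparameter settings.
w_good = train(xs, ys, learning_rate=0.05, epochs=200)
w_slow = train(xs, ys, learning_rate=0.0001, epochs=200)
print(w_good, w_slow)
```

With the very small learning rate the weight is still far from 2 after the same number of epochs, which is why hyperparameters usually need tuning.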
What is an estimator?
Uses of Estimators
Types of Estimators
What is Bias?
In general, a machine learning model analyses the data, finds patterns in it
and makes predictions. While training, the model learns these patterns in
the dataset and applies them to test data for prediction. While making
predictions, a difference occurs between the prediction values made
by the model and the actual/expected values, and this
difference is known as bias error or error due to bias. It can be
defined as the inability of machine learning algorithms such as Linear
Regression to capture the true relationship between the data points. Each
algorithm begins with some amount of bias, because bias arises from the
assumptions in the model that make the target function simpler to
learn. A model has either:
o Low Bias: A low bias model makes fewer assumptions about the
form of the target function.
o High Bias: A model with a high bias makes more assumptions, and
the model becomes unable to capture the important features of our
dataset. A high bias model also cannot perform well on new
data.
Generally, a linear algorithm has a high bias, which also makes it fast to learn.
The simpler the algorithm, the more bias it is likely to
introduce, whereas a nonlinear algorithm often has low bias.
A model that shows high variance learns a lot and performs well with the
training dataset, but does not generalize well to unseen data. As
a result, such a model gives good results with the training dataset but
shows high error rates on the test dataset.
Since, with high variance, the model learns too much from the dataset, it
leads to overfitting of the model. A model with high variance has the
below problems:
Usually, nonlinear algorithms, which have a lot of flexibility to fit the data, have
high variance.
p(θ|x) = p(x|θ) p(θ) / p(x)
Generally speaking, the goal of Bayesian ML is to estimate the posterior distribution
p(θ|x) given the likelihood p(x|θ) and the prior distribution p(θ).
The likelihood is something that can be estimated from the training data.
In fact, that’s exactly what we’re doing when training a regular machine learning
model. We’re performing Maximum Likelihood Estimation (MLE), an iterative process which
updates the model’s parameters in an attempt to maximize the probability of seeing
the training data x given the model parameters θ. When we maximize the posterior
distribution instead, we call the process Maximum a Posteriori (MAP). It’s easier,
however, to think about it in terms
of the likelihood function. By Bayes’ Theorem we can write the posterior as
p(θ|x) ∝ p(x|θ) p(θ)
Here we leave out the denominator, p(x), because we are taking the maximization
with respect to θ, which p(x) does not depend on. Therefore, we can ignore it in
the maximization procedure. The key piece of the puzzle which leads Bayesian
models to differ from their classical counterparts trained by MLE is the inclusion of
the term p(θ). We call this the prior distribution over θ.
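The effect of the prior can be sketched with a coin-flip example, where θ is the probability of heads. The MLE uses only the observed flips, while the MAP estimate also includes a Beta(a, b) prior over θ. The flip counts and the choice of a Beta(2, 2) prior are assumptions made for illustration.

```python
def mle_estimate(heads, flips):
    """Value of theta that maximizes the likelihood p(x | theta)."""
    return heads / flips

def map_estimate(heads, flips, prior_a, prior_b):
    """Value of theta that maximizes p(x | theta) * p(theta) for a Beta prior.

    This is the mode of the Beta(heads + a, tails + b) posterior, which is
    valid when both posterior parameters are greater than 1.
    """
    return (heads + prior_a - 1) / (flips + prior_a + prior_b - 2)

heads, flips = 7, 10
theta_mle = mle_estimate(heads, flips)
theta_map = map_estimate(heads, flips, prior_a=2, prior_b=2)
print(theta_mle, theta_map)
```

The prior pulls the MAP estimate towards 0.5, and with a uniform Beta(1, 1) prior the MAP estimate equals the MLE.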
o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of
the model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a
new shape, it classifies the shape on the basis of its number of sides and
predicts the output.
1. Regression
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Spam Filtering,
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled
dataset and are allowed to act on that data without any supervision.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to how a human learns to think from their
own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which
makes it all the more important.
o In the real world, we do not always have input data with the corresponding
output, so to solve such cases we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized
and the corresponding outputs are not given. This unlabeled input data is
fed to the machine learning model in order to train it. First, the model interprets the
raw data to find the hidden patterns in it, and then applies suitable
algorithms such as k-means clustering, hierarchical clustering, etc.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Data scientists might use predictive analytics for data science-specific use
cases, whereas, another Artificial Intelligence (AI) team might build machine
learning systems for other reasons. E.g., a project team might use machine
learning with AI capabilities like natural language processing (NLP), computer
vision, etc.
Review the prominent machine learning algorithms before choosing the right
algorithm to build. The following are examples of important machine learning
algorithms:
A. NAÏVE BAYES CLASSIFIER ALGORITHM
ML (machine learning) project teams use this popular algorithm to solve
classification problems. It uses the supervised learning approach, i.e., it works
with “labeled” input data.
D. LINEAR REGRESSION
Data scientists and ML project teams make great use of this supervised
learning algorithm to solve linear regression problems.
E. LOGISTIC REGRESSION
This supervised learning algorithm helps to address machine learning
problems where you need to find discrete values of dependent variables from
independent variables.
G. DECISION TREES
This supervised learning algorithm helps to create flow charts that look like
trees. ML projects use it for solving many real-world problems like binary
classification problems.
3. Learn About The Algorithm Before Diving Deep Into How To Develop A
Machine Learning Algorithm
You need to learn sufficiently about the algorithm that you have decided to
build. Understand the functionality of the algorithm, and understand where it’s
used. Learn when you shouldn’t use this algorithm.
An ML project team needs to prepare data sets first. This enables them to have
clean, consistent, and accurate data sets.
You need to take help from business stakeholders and data scientists for this.
They need the same unlimited access to the data that your ML developers
have.
Implement a set of repeatable steps so that you can execute them for new data
sets. Invest in technology solutions so that you can prepare more data when
you need it with the same scale and speed.
A. DATA COLLECTION
You need to first collect data from the relevant data sources. Your ML project
team should work on the following challenges at this stage:
Parsing data from files like XML and JSON into tabular formats;
Combining data into the appropriate number of data sets;
Looking for issues that could introduce biases in your expected outputs.
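The first challenge, parsing JSON into a tabular format, can be sketched with the Python standard library. The field names and records below are made up, and the sketch assumes the JSON text holds a flat list of objects.

```python
import csv
import io
import json

def json_records_to_csv(json_text, columns):
    """Convert a JSON list of flat objects into CSV text with fixed columns."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells so every row has the same shape.
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()

# Hypothetical input: the second record has no "price" field.
raw = '[{"id": 1, "city": "Pune", "price": 120}, {"id": 2, "city": "Delhi"}]'
table = json_records_to_csv(raw, ["id", "city", "price"])
print(table)
```

Empty cells produced this way are worth reviewing, since systematically missing values are one of the issues that can bias the outputs.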
C. ORGANIZE THE DATA SETS IN THE APPROPRIATE FORMAT FOR
CONSISTENCY
You might have gathered data for your training and test sets from different data
sources. They might have different formats.
Furthermore, you might not be the only one to manually update the data sets.
Other users might have unlimited access to the data sets, and they might
update them. All of the above examples might result in different formats in
different data sets.
However, your machine learning model might need the data in a certain format.
Your team needs to organize your input data sets in that format. This task
might require standardizing certain values in several columns.
Ensure that your modified data sets are similar to the real data sets.
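Standardizing values in a column can be sketched as follows for a date column that arrives in different formats from different sources. The list of accepted formats is an assumption; a real project would list the formats that actually occur in its data.

```python
from datetime import datetime

# Hypothetical formats seen across the different data sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for date_format in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), date_format).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    return None

column = ["2023-01-05", "05/01/2023", " 05-01-2023 ", "unknown"]
cleaned = [standardize_date(value) for value in column]
print(cleaned)
```

Values that come back as None can be collected and reviewed by hand instead of being silently guessed.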
E. FEATURE ENGINEERING AFTER ANALYZING THE INPUT VARIABLES
The term “feature engineering” refers to the act of modifying raw data into
features for the understanding of machine learning algorithms. This step helps
ML algorithms to understand the data better since they can see patterns in the
data.
Feature engineering might involve decomposing the inputs data sets into
multiple parts. An ML project team might do this to categorize data by different
values.
Each part of the data set will help the ML algorithm to understand specific
relationships in the data sets. The ML algorithms can also find patterns in the
data.
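Decomposing an input into multiple parts can be sketched with a timestamp that is split into separate features. The timestamp format and the chosen features are assumptions for illustration.

```python
from datetime import datetime

def timestamp_features(timestamp_text):
    """Split one raw timestamp into several features a model can use."""
    moment = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M")
    return {
        "month": moment.month,
        "day_of_week": moment.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": moment.hour,
        "is_weekend": moment.weekday() >= 5,
    }

features = timestamp_features("2024-03-16 14:30")
print(features)
```

A model can now pick up patterns such as weekend or afternoon effects, which are hard to see in the raw timestamp string.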
F. SPLIT DATA SETS INTO TRAINING DATA AND TEST DATA SETS
You can now divide your input data sets into two sets. One of these two sets is
to train the ML algorithm that you are building. You should use the other data
set for testing your algorithm.
What if you have heavily skewed training examples in your input data? This
can result in biases that adversely impact the performance of your
machine learning model, especially on complex
problems. Shuffle the data before splitting, and choose the “random state” deliberately. This argument
seeds the shuffle, so a fixed value makes the split reproducible. It does not remove skew by
itself, so check that both sets contain a representative mix of examples.
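A shuffled split with a fixed seed can be sketched with the standard library. The function below is a hypothetical helper written for this note; its name and arguments only mirror the common convention and it is not a library API.

```python
import random

def train_test_split(rows, test_fraction=0.25, random_state=0):
    """Shuffle rows with a seeded generator and split them into two lists."""
    shuffled = list(rows)
    # The same random_state always produces the same shuffle order.
    random.Random(random_state).shuffle(shuffled)
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]

rows = list(range(20))  # stand-in for 20 data records
train_rows, test_rows = train_test_split(rows, test_fraction=0.25, random_state=42)
print(len(train_rows), len(test_rows))
```

Because the seed is fixed, rerunning the split gives the same training and test sets, which keeps later comparisons between models fair.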
AI and ML systems learn from input data sets and improve their performance
over time. The quality of learning influences their performance. Therefore, you
need to feed them with high-quality training data.
The exact work in this phase will depend upon the algorithm you are
developing. You can refer to authoritative books and blog posts for more
information before you create the pseudocode. The following are a few
examples of authoritative resources:
Review the machine learning model created during this training, and analyze
the outliers. You might find problems with the input data that earlier escaped
your attention.
Analyze data errors if you find them. Run the previously-created data
preparation process to create better training data. Repeat the training and
review processes.
9. Test The Machine Learning Algorithm
You now need to validate the ML algorithm with the help of your test data set.
Execute the algorithm and create an ML model. Review the output in detail.
Pay special attention to outliers and exceptions, and examine the reasons.
Check whether the outliers and exceptions originated due to errors in the input
data sets. In that case, make the necessary corrections in the input data sets.
Rerun the tests and repeat the review process.
You would want to compare the output of your ML algorithm against a standard
implementation of that algorithm on the same input data set. Scikit-learn, a
popular Python library, already includes standard implementations of many
popular ML algorithms. The following are a few examples:
[Figure: neural network with an input layer, hidden layer 1, hidden layer 2 and an output layer]
Deep Learning has become one of the primary research areas in
developing intelligent machines. Most of the well-known
applications (such as Speech Recognition, Image Processing and
NLP) of AI are driven by Deep Learning. Deep Learning algorithms
mimic human brains using artificial neural networks and
progressively learn to accurately solve a given problem. But there
are significant challenges in Deep Learning systems which we have
to look out for.
In the words of Andrew Ng, one of the most prominent names in
Deep Learning:
“I believe Deep Learning is our best shot at progress towards
real AI.”
If you look around, you might realize the power of the above
statement by Andrew. Siri and Cortana, Google Photos,
Grammarly and Spotify’s music recommendations are all
powered by Deep Learning. These are just a few examples of how
deeply Deep Learning has entered our lives.
But with great technological advances come complex difficulties
and hurdles. In this post, we shall discuss prominent challenges in
Deep Learning.
Challenges in Deep Learning
Lots and lots of data
Deep learning algorithms are trained to learn progressively using
data. Large data sets are needed to make sure that the machine
delivers the desired results. Just as the human brain needs a lot of experience
to learn and deduce information, the analogous artificial neural
network requires a copious amount of data. The more powerful the
abstraction you want, the more parameters need to be tuned, and
more parameters require more data.
For example, a speech recognition program would require data
from multiple dialects, demographics and time scales. Researchers
feed terabytes of data for the algorithm to learn a single language.
This is a time-consuming process and requires tremendous data
processing capabilities. To some extent, whether a problem can be
solved through Deep Learning depends on the availability of a huge
corpus of data to train on.
The complexity of a neural network can be expressed through the
number of parameters. In the case of deep neural networks, this
number can be in the range of millions, tens of millions and in some
cases even hundreds of millions.
Let’s call this number P. Since you want to be sure of the model’s
ability to generalize, a good rule of thumb for the number of data
points is at least P*P.
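To get a feel for the size of P, the parameters of a fully connected network can be counted layer by layer (weights plus biases). The layer sizes below are a made-up example, and the P*P figure simply applies the rule of thumb quoted above.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection
        total += outputs           # one bias per output unit
    return total

# Hypothetical network: 784 inputs, two hidden layers, 10 outputs.
layers = [784, 128, 64, 10]
P = dense_parameter_count(layers)
print("parameters:", P)
print("data points suggested by the P*P rule of thumb:", P * P)
```

Even this small network has over a hundred thousand parameters, which shows how quickly the data requirement grows under this rule.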
Overfitting in neural networks
At times, there is a sharp difference between the error on the training
data set and the error on a new, unseen data set. This
occurs in complex models, such as those with too many parameters
relative to the number of observations. The efficacy of a model is
judged by its ability to perform well on an unseen data set and not
by its performance on the training data fed to it.