Unit 4: Machine Learning
Vinay S. Prabhavalkar
Application of Machine Learning
2. Product Recommendations
3. Image Recognition
4. Sentiment Analysis
• Sentiment analysis is one of the most widely used applications of machine
learning. It is a real-time machine learning application that determines the
emotion or opinion of a speaker or writer. For instance, if someone has
written a review or an email (or any other form of document), a sentiment
analyzer will instantly identify the underlying thought and tone of the text.
Sentiment analysis can be used in review-based websites, decision-making
applications, etc.
5. Virtual Personal Assistants
As the name suggests, Virtual Personal Assistants assist in finding useful
information when asked via text or voice. A few of the major applications of
Machine Learning here are:
• Speech Recognition
• Speech to Text Conversion
• Natural Language Processing
• Text to Speech Conversion
6. Self Driving Cars
• This is one of the coolest applications of Machine Learning, and it is
already in use. Machine Learning plays a very important role in self-driving
cars. Tesla, the leader in this business, builds its current Artificial
Intelligence on hardware from NVIDIA, based on an unsupervised learning
algorithm.
• NVIDIA stated that they did not explicitly train their model to detect
people or any particular object. The model works on Deep Learning and it
crowdsources data from all of its vehicles and drivers. It uses internal and
external sensors which are part of the IoT. According to data gathered by
McKinsey, automotive data will hold a tremendous value of around $750
billion.
7. Entertainment
Companies such as Netflix, Amazon, YouTube, and Spotify provide relevant
movie, song, and video recommendations to enhance their customer
experience.
This is all thanks to Deep Learning. Based on a person's browsing history,
interests, and behavior, online streaming companies give suggestions to help
them make product and service choices.
Deep learning techniques are also used to add sound to silent movies
and generate subtitles automatically.
What is Machine Learning
• Definition of Machine Learning: Arthur Samuel (1959) described Machine
Learning as the field of study that gives computers the ability to learn
without being explicitly programmed. Tom Mitchell's more formal definition:
a computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.
Terminology in Machine Learning
• An Algorithm is a set of rules that a machine follows to achieve a particular goal.
• Machine Learning is a set of methods that allow computers to learn from data to
make and improve predictions (for example cancer, weekly sales, credit default).
• A Learner or Machine Learning Algorithm is the program used to learn a
machine learning model from data. Another name is "inducer" (e.g. "tree
inducer").
• A Machine Learning Model is the learned program that maps inputs to
predictions. A trained machine is also called a Model.
• A Dataset is a table with the data from which the machine learns. An Instance is
a row in the dataset. A feature is a column in the dataset.
• The Target is the information the machine learns to predict. In mathematical
formulas, the target is usually called y.
• The Prediction is what the machine learning model "guesses" the target
value should be, based on the given features.
• A Training set is used to train a machine.
• A Test Set is used to test an already trained machine.
• A validation set is used to verify a trained machine.
https://christophm.github.io/interpretable-ml-book/terminology.html
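A minimal sketch in Python (with a small hypothetical dataset) showing how these terms map to code:

import pandas as pd

# A Dataset: each row is an Instance, each column (except the target) is a Feature.
dataset = pd.DataFrame({
    "age": [25, 47, 35, 52],        # feature
    "income": [40, 90, 60, 120],    # feature (hypothetical values, in thousands)
    "default": [0, 1, 0, 1],        # Target (y): what the learner tries to predict
})

X = dataset[["age", "income"]]      # features
y = dataset["default"]              # target
print(X.shape, y.shape)             # (4, 2) (4,)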
• Bias is the gap between the values predicted by the model and the actual
(target) values.
• Variance tells how scattered the predicted values are.
https://christophm.github.io/interpretable-ml-book/terminology.html
Types of Machine Learning
1. Supervised learning:
It is applicable when a machine has sample data, i.e., input as well as output
data with correct labels. These correct labels are used to check the
correctness of the model's predictions.
The supervised learning technique helps us to predict future events with the
help of past experience and labeled examples. Initially, it analyses the known
training dataset, and then it derives an inferred function that makes
predictions about output values. It also measures its errors during this
learning process and corrects them through the learning algorithm.
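A minimal supervised-learning sketch in Python with scikit-learn (the tiny dataset below is hypothetical): the learner is given inputs together with their correct labels, fits a model, and then predicts labels for new inputs.

from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 0], [1, 1], [1, 0], [0, 1]]   # input features
y_train = [0, 1, 1, 0]                        # correct labels supplied with the data

model = DecisionTreeClassifier()              # the learner / inducer
model.fit(X_train, y_train)                   # learn from the labeled examples

print(model.predict([[1, 1], [0, 0]]))        # predictions for new inputs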
2. Unsupervised Learning:
Here, we take unlabeled input data, which means it is not categorized and
the corresponding outputs are also not given. This unlabeled input data is
fed to the machine learning model in order to train it. The model first
interprets the raw data to find the hidden patterns in the data and then
applies suitable algorithms such as k-means clustering, hierarchical
clustering, etc.
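A minimal unsupervised-learning sketch in Python (hypothetical points): k-means groups the unlabeled inputs into clusters using only the structure of the data, with no labels given.

from sklearn.cluster import KMeans

X = [[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]]   # unlabeled data

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)         # cluster assignments discovered from the data

print(labels)                          # e.g. [0 0 1 1]
print(kmeans.cluster_centers_)         # the two cluster centres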
3. Reinforcement Learning:
In reinforcement learning, an agent learns by interacting with an
environment and observing the feedback of its own actions.
For each good action, the agent gets a positive reward, and for each bad
action, it gets a negative reward.
Since there is no labeled data, the agent is bound to learn from its
experience only.
An RL problem can best be explained through games. Let's take the game of
Pac-Man, where the goal of the agent (Pac-Man) is to eat the food in the grid
while avoiding the ghosts on its way. In this case, the grid world is the
interactive environment in which the agent acts. The agent receives a reward
for eating food and a punishment if it gets killed by a ghost (loses the game).
The states are the locations of the agent in the grid world, and maximizing
the total cumulative reward corresponds to the agent winning the game.
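A minimal tabular Q-learning sketch in Python on a hypothetical toy environment (not actual Pac-Man): after every action the agent updates its value estimate for that state-action pair using the reward it received.

import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # value estimate for each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2              # learning rate, discount factor, exploration rate

def step(state, action):
    """Hypothetical environment: returns (next_state, reward)."""
    next_state = (state + 1) % n_states
    reward = 1.0 if action == 1 else -0.1          # positive reward for a "good" action, negative otherwise
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy: mostly exploit the best known action, sometimes explore
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = Q[state].index(max(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update: move the estimate toward reward + discounted best future value
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)   # action 1 ends up with the higher estimated value in every state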
4. Semi-supervised Learning:
Semi-supervised learning falls between supervised and unsupervised
learning: the model is trained on a small amount of labeled data together
with a large amount of unlabeled data.
Machine Learning Problem Categories
1. Classification:
2. Binary Classification: It refers to those classification tasks that have two
class labels.
Examples include:
i. Email spam detection (spam or not).
ii. Churn prediction (churn or not).
iii. Conversion prediction (buy or not).
• Typically, binary classification tasks involve one class that is the normal
state and another class that is the abnormal state.
• A Bernoulli probability distribution is typically used to model such a
problem: the model predicts the probability of an example belonging to the
abnormal (positive) class.
• Popular algorithms that can be used for binary classification include:
▪ Logistic Regression
▪ k-Nearest Neighbors
▪ Decision Trees
▪ Support Vector Machine
▪ Naive Bayes
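A minimal binary-classification sketch in Python (hypothetical spam-like features): logistic regression predicts a Bernoulli probability for the positive class.

from sklearn.linear_model import LogisticRegression

X = [[0.1, 3], [0.9, 20], [0.2, 1], [0.8, 15], [0.7, 18], [0.05, 2]]   # features
y = [0, 1, 0, 1, 1, 0]                                                  # 0 = normal (not spam), 1 = abnormal (spam)

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[0.85, 17]]))         # predicted class label (0 or 1)
print(clf.predict_proba([[0.85, 17]]))   # probability for each of the two classes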
3. Multi-Class Classification: It refers to those classification tasks that have
more than two class labels.
Examples include:
i. Face classification.
ii. Plant species classification.
iii. Optical character recognition.
• The number of class labels may be very large on some problems. For
example, a model may predict a photo as belonging to one among
thousands or tens of thousands of faces in a face recognition system.
• A Multinoulli (categorical) probability distribution is typically used to
model such a problem: the model predicts the probability of an example
belonging to each of the class labels.
• Popular algorithms that can be used for multi-class classification include:
i. k-Nearest Neighbors.
ii. Decision Trees.
iii. Naive Bayes.
iv. Random Forest.
v. Gradient Boosting.
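A minimal multi-class sketch in Python (hypothetical measurements labeled with three plant species): the model predicts one label out of several and a probability for each class.

from sklearn.ensemble import RandomForestClassifier

X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5], [7.6, 3.0], [7.2, 3.6]]
y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

print(clf.predict([[6.0, 3.0]]))         # one label out of the three classes
print(clf.predict_proba([[6.0, 3.0]]))   # probability for each of the three classes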
4. Multi-Label Classification: It refers to those classification tasks that have
two or more class labels, where one or more class labels may be
predicted for each example.
• Specialized multi-label versions of the algorithms given below are used
for this type of classification:
i. Multi-label Decision Trees
ii. Multi-label Random Forests
iii. Multi-label Gradient Boosting
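A minimal multi-label sketch in Python (hypothetical image-tagging data, e.g. "contains person" / "contains car"): each example can receive several 0/1 labels at once, here by wrapping a decision tree in scikit-learn's MultiOutputClassifier.

from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[0.2, 0.7], [0.9, 0.1], [0.8, 0.8], [0.1, 0.2]]
Y = [[1, 0],   # person, no car
     [0, 1],   # car, no person
     [1, 1],   # both labels present
     [0, 0]]   # neither label

clf = MultiOutputClassifier(DecisionTreeClassifier())
clf.fit(X, Y)

print(clf.predict([[0.85, 0.75]]))   # one 0/1 prediction per label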
5. Imbalanced Classification: It refers to classification tasks where the
number of examples in each class is unequally distributed.
• Typically, imbalanced classification tasks are binary classification tasks
where the majority of examples in the training dataset belong to the
normal class and a minority of examples belong to the abnormal class.
Examples include:
i. Fraud detection.
ii. Outlier detection.
iii. Medical diagnostic tests
• Specialized techniques may be used to change the composition of samples in
the training dataset by under-sampling the majority class or oversampling
the minority class.
Examples include:
i. Random Under-sampling
ii. SMOTE Oversampling
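A minimal re-sampling sketch in Python, assuming the imbalanced-learn package is installed: SMOTE synthesizes new minority-class examples so the training data becomes roughly balanced.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced dataset: roughly 95% normal class, 5% abnormal class.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))    # the two classes are now roughly equal in size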
• Specialized modeling algorithms may be used that pay more attention to
the minority class when fitting the model on the training dataset, such as
cost-sensitive machine learning algorithms.
Examples include:
i. Cost-sensitive Logistic Regression
ii. Cost-sensitive Decision Trees
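A minimal cost-sensitive sketch in Python: scikit-learn's class_weight parameter makes errors on the minority class more costly during fitting (the 1:10 weighting below is illustrative only).

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# Penalize mistakes on class 1 (the rare, abnormal class) ten times more than on class 0.
clf = LogisticRegression(class_weight={0: 1, 1: 10})
clf.fit(X, y)

print(clf.predict(X[:5]))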
2. Clustering:
As an example of how a clustering algorithm works, a collection of different
fruits can be divided into several groups with similar properties.
a. Partitioning Clustering:-
d. Hierarchical Clustering:-
e. Fuzzy Clustering:-
• Fuzzy clustering is a type of soft clustering method in which a data object
may belong to more than one group or cluster.
• Each data point has a set of membership coefficients, which depend on its
degree of membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is
sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms:-
• K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It
classifies the dataset by dividing the samples into different clusters of equal variances. The
number of clusters must be specified in this algorithm. It is fast with fewer computations
required, with the linear complexity of O(n).
• Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth density
of data points. It is an example of a centroid-based model, that works on updating the
candidates for centroid to be the center of the points within a given region.
• DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It
is an example of a density-based model similar to the mean-shift, but with some remarkable
advantages. In this algorithm, the areas of high density are separated by the areas of low
density. Because of this, the clusters can be found in any arbitrary shape.
• Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or for cases where k-means can fail. In GMM, it
is assumed that the data points are Gaussian distributed.
• Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm performs the
bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at the
outset and then successively merged. The cluster hierarchy can be represented as a tree-
structure.
• Affinity Propagation: It is different from other clustering algorithms as it does not
require the number of clusters to be specified. In this algorithm, each data point sends
messages between pairs of data points until convergence. Its O(N²T) time complexity is
the main drawback of this algorithm.
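A minimal sketch in Python contrasting two of the algorithms above on the same hypothetical points: k-means needs the number of clusters up front, while DBSCAN discovers it from the density of the data.

import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),     # dense blob around (0, 0)
               rng.normal(5, 0.3, (50, 2))])    # dense blob around (5, 5)

print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)[:5])
print(DBSCAN(eps=0.8, min_samples=5).fit_predict(X)[:5])    # label -1 would mark noise points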
3. Regression:
• Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us to understand how the value of the dependent
variable changes corresponding to an independent variable when the other independent
variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.
• In regression, we plot a graph between the variables which best fits the given data
points; using this plot, the machine learning model can make predictions about the
data. In simple words, "Regression shows a line or curve that passes through all the
data points on the target-predictor graph in such a way that the vertical distance
between the data points and the regression line is minimum." The distance between the
data points and the line tells whether the model has captured a strong relationship or
not.
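A minimal regression sketch in Python (hypothetical salary-vs-experience data): the model fits a line through the points and then predicts a continuous value for a new input.

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]     # years of experience (predictor)
y = [30, 35, 42, 48, 55]          # salary in thousands (continuous target)

reg = LinearRegression()
reg.fit(X, y)

print(reg.coef_, reg.intercept_)  # slope and intercept of the fitted line
print(reg.predict([[6]]))         # predicted salary for 6 years of experience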
Terminologies Related to the Regression Analysis:
• Dependent Variable: The main factor in regression analysis which we want to predict
or understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors which affect the dependent variable, or which are
used to predict its values, are called independent variables, also known as predictors.
• Outliers: An outlier is an observation with either a very low or a very high value in
comparison to the other observed values. An outlier may hamper the results, so it
should be avoided.
• Multicollinearity: If the independent variables are highly correlated with each other,
this condition is called multicollinearity. It should not be present in the dataset,
because it creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well with the training dataset but
not well with the test dataset, the problem is called overfitting. And if our algorithm
does not perform well even with the training dataset, the problem is called
underfitting.
4. Optimization:
• Optimization is the problem of finding a set of inputs to an objective
function that results in a maximum or minimum function evaluation.
• Optimization, in simple terms, is a mechanism to make something better or
define a context for a solution that makes it the best.
• Consider a production scenario. Assume there are two machines that
produce the desired product:
– one machine requires more energy for high-speed production but less
raw material;
– the other requires more raw material and less energy to produce the
same output in the same time.
• It is important to understand the patterns in the output based on the
variation in inputs; a combination that gives the highest profits would
probably be the one the production manager would want to know.
• As an analyst, one needs to identify the best possible way to distribute the
production between the machines that gives them the highest profit.
• When profit is plotted for the various options for distributing production
between the two machines, the curve shows a point of highest profit.
Identifying this point is the goal of this technique.
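A minimal sketch in Python of the two-machine problem posed as a linear program with SciPy (all coefficients below are hypothetical): maximize profit subject to limited energy and raw-material budgets by choosing how many units each machine produces.

from scipy.optimize import linprog

profit = [-3.0, -2.5]     # negated profit per unit for machines 1 and 2, because linprog minimizes
A_ub = [[4.0, 1.0],       # energy used per unit by machine 1 and machine 2
        [1.0, 3.0]]       # raw material used per unit by machine 1 and machine 2
b_ub = [100.0, 90.0]      # total energy and raw material available

res = linprog(c=profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)              # production split between the two machines with the highest profit
print(-res.fun)           # the maximum profit itself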
Data and Inconsistencies in Machine Learning
• Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
• Variance: If the machine learning model performs well with the training
dataset but does not perform well with the test dataset, the model has high
variance.
• Under-fitting: Underfitting occurs when our machine learning model is
not able to capture the underlying trend of the data.
• Example: when a simple linear regression model is fitted to data with a
clearly non-linear trend, the fitted line is unable to capture the data points
present in the plot.
• Overfitting: Overfitting occurs when our machine learning model tries to
cover all the data points or more than the required data points present in
the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values
present in the dataset, and all these factors reduce the efficiency and
accuracy of the model.
• The overfitted model has low bias and high variance.
Train Test Split
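A minimal train/test split sketch in Python with scikit-learn (using the built-in Iris data): the model learns only from the training portion and is then evaluated on the held-out test portion.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))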