01. ML,Types,Application,Life Cycle,Issues
01. ML,Types,Application,Life Cycle,Issues
Machine Learning tutorial provides basic and advanced concepts of machine learning.
Our machine learning tutorial is designed for students and working professionals.
This machine learning tutorial gives you an introduction to machine learning along with
the wide range of machine learning techniques such as Supervised, Unsupervised,
and Reinforcement learning. You will learn about regression and classification models,
clustering methods, hidden Markov models, and various sequential models.
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together for creating predictive models. Machine learning constructs or uses the
algorithms that learn from historical data. The more we will provide the information, the
higher will be the performance.
A machine has the ability to learn if it can improve its performance by gaining more
data.
We can train machine learning algorithms by providing them the huge amount of data
and let them explore the data, construct the models, and predict the required output
automatically. The performance of the machine learning algorithm depends on the
amount of data, and it can be determined by the cost function. With the help of machine
learning, we can save both time and money.
The importance of machine learning can be easily understood by its uses cases, Currently,
machine learning is used in self-driving cars, cyber fraud detection, face recognition,
and friend suggestion by Facebook, etc. Various top companies such as Netflix and
Amazon have build machine learning models that are using a vast amount of data to
analyze the user interest and recommend product accordingly.
Following are some key points which show the importance of Machine Learning:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it
predicts the output.
The system creates a model using labeled data to understand the datasets and learn about
each data, once the training and processing are done then we test the model by providing
a sample data to check whether it is predicting the exact output or not.
The goal of supervised learning is to map input data with the output data. The supervised
learning is based on supervision, and it is the same as when a student learns things in the
supervision of the teacher. The example of supervised learning is spam filtering.
o Classification
o Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from the huge amount of data. It can be further classifieds into two
categories of algorithms:
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent
gets a reward for each right action and gets a penalty for each wrong action. The agent
learns automatically with these feedbacks and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The goal of an agent
is to get the most reward points, and hence, it improves its performance.
The robotic dog, which automatically learns the movement of his arms, is an example of
Reinforcement learning.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used
to identify objects, persons, places, digital images, etc. The popular use case of image
recognition and face detection is, Automatic friend tagging suggestion:
Facebook provides us a feature of auto friend tagging suggestion. Whenever we upload
a photo with our Facebook friends, then we automatically get a tagging suggestion with
name, and the technology behind this is machine learning's face
detection and recognition algorithm.
It is based on the Facebook project named "Deep Face," which is responsible for face
recognition and person identification in the picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice," it comes under speech
recognition, and it's a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also
known as "Speech to text", or "Computer speech recognition." At present, machine
learning algorithms are widely used by various applications of speech recognition. Google
assistant, Siri, Cortana, and Alexa are using speech recognition technology to follow the
voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.
It predicts the traffic conditions such as whether traffic is cleared, slow-moving, or heavily
congested with the help of two ways:
o Real Time location of the vehicle form Google Map app and sensors
o Average time has taken on past days at the same time.
Everyone who is using Google Map is helping this app to make it better. It takes
information from the user and sends back to its database to improve the performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies
such as Amazon, Netflix, etc., for product recommendation to the user. Whenever we
search for some product on Amazon, then we started getting an advertisement for the
same product while internet surfing on the same browser and this is because of machine
learning.
Google understands the user interest using various machine learning algorithms and
suggests the product as per customer interest.
As similar, when we use Netflix, we find some recommendations for entertainment series,
movies, etc., and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine
learning plays a significant role in self-driving cars. Tesla, the most popular car
manufacturing company is working on self-driving car. It is using unsupervised learning
method to train the car models to detect people and objects while driving.
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
For each genuine transaction, the output is converted into some hash values, and these
values become the input for the next round. For each genuine transaction, there is a
specific pattern which gets change for the fraud transaction hence, it detects it and makes
our online transactions more secure.
Machine learning life cycle involves seven major steps, which are given below:
o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
The most important thing in the complete process is to understand the problem and to
know the purpose of the problem. Therefore, before starting the life cycle, we need to
understand the problem because the good result depends on the better understanding
of the problem.
In the complete life cycle process, to solve a problem, we create a machine learning system
called "model", and this model is created by providing "training". But to train a model, we
need data, hence, life cycle starts by collecting data.
1. Gathering Data:
Data Gathering is the first step of the machine learning life cycle. The goal of this step is
to identify and obtain all data-related problems.
In this step, we need to identify the different data sources, as data can be collected from
various sources such as files, database, internet, or mobile devices. It is one of the most
important steps of the life cycle. The quantity and quality of the collected data will
determine the efficiency of the output. The more will be the data, the more accurate will
be the prediction.
By performing the above task, we get a coherent set of data, also called as a dataset. It
will be used in further steps.
2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is a
step where we put our data into a suitable place and prepare it to use in our machine
learning training.
In this step, first, we put all data together, and then randomize the ordering of data.
This step can be further divided into two processes:
o Data exploration: It is used to understand the nature of data that we have to work
with. We need to understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
o Data pre-processing: Now the next step is preprocessing of data for its analysis.
3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a useable format.
It is the process of cleaning the data, selecting the variable to use, and transforming the
data in a proper format to make it more suitable for analysis in the next step. It is one of
the most important steps of the complete process. Cleaning of data is required to address
the quality issues.
It is not necessary that data we have collected is always of our use as some of the data
may not be useful. In real-world applications, collected data may have various issues,
including:
o Missing Values
o Duplicate data
o Invalid data
o Noise
It is mandatory to detect and remove the above issues because it can negatively affect
the quality of the outcome.
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:
The aim of this step is to build a machine learning model to analyze the data using various
analytical techniques and review the outcome. It starts with the determination of the type
of the problems, where we select the machine learning techniques such
as Classification, Regression, Cluster analysis, Association, etc. then build the model
using prepared data, and evaluate the model.
Hence, in this step, we take the data and use machine learning algorithms to build the
model.
5. Train Model
Now the next step is to train the model, in this step we train our model to improve its
performance for better outcome of the problem.
We use datasets to train the model using various machine learning algorithms. Training a
model is required so that it can understand the various patterns, rules, and, features.
6. Test Model
Once our machine learning model has been trained on a given dataset, then we test the
model. In this step, we check for the accuracy of our model by providing a test dataset to
it.
Testing the model determines the percentage accuracy of the model as per the
requirement of project or problem.
7. Deployment
The last step of machine learning life cycle is deployment, where we deploy the model in
the real-world system.
If the above-prepared model is producing an accurate result as per our requirement with
acceptable speed, then we deploy the model in the real system. But before deploying the
project, we will check whether it is improving its performance using available data or not.
The deployment phase is similar to making the final report for a project.
The goal of AI is to make a smart The goal of ML is to allow machines to learn from
computer system like humans to solve data so that they can give accurate output.
complex problems.
In AI, we make intelligent systems to In ML, we teach machines with data to perform a
perform any task like a human. particular task and give an accurate result.
Machine learning and deep learning Deep learning is a main subset of machine learning.
are the two main subsets of AI.
AI has a very wide range of scope. Machine learning has a limited scope.
The main applications of AI are Siri, The main applications of machine learning
customer support using catboats, are Online recommender system, Google search
Expert System, Online game playing, algorithms, Facebook auto friend tagging
intelligent humanoid robot, etc. suggestions, etc.
On the basis of capabilities, AI can be Machine learning can also be divided into mainly
divided into three types, which three types that are Supervised
are, Weak AI, General AI, and Strong learning, Unsupervised learning,
AI. and Reinforcement learning.
It includes learning, reasoning, and self- It includes learning and self-correction when
correction. introduced with new data.
AI completely deals with Structured, Machine learning deals with Structured and semi-
semi-structured, and unstructured structured data.
data.
When creating a machine learning project, it is not always a case that we come across the
clean and formatted data. And while doing any operation with data, it is mandatory to
clean it and put in a formatted way. So for this, we use data preprocessing task.
In supervised learning, the training data provided to the machines work as the supervisor
that teaches the machines to predict the output correctly. It applies the same concept as
a student learns in the supervision of the teacher.
Supervised learning is a process of providing input data as well as correct output data to
the machine learning model. The aim of a supervised learning algorithm is to find a
mapping function to map the input variable(x) with the output variable(y).
In the real-world, supervised learning can be used for Risk Assessment, Image
classification, Fraud Detection, spam filtering, etc.
The working of Supervised learning can be easily understood by the below example and
diagram:
Suppose we have a dataset of different types of shapes which includes square, rectangle,
triangle, and Polygon. Now the first step is that we need to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled
as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the bases of a number of sides, and predicts the output.
1. Regression
Regression algorithms are used if there is a relationship between the input variable and
the output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which
come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means
there are two classes such as Yes-No, Male-Female, True-false, etc.
Spam Filtering,
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.
Here, we have taken an unlabeled input data, which means it is not categorized and
corresponding outputs are also not given. Now, this unlabeled input data is fed to the
machine learning model in order to train it. Firstly, it will interpret the raw data to find the
hidden patterns from the data and then will apply suitable algorithms such as k-means
clustering, Decision tree, etc.
Once it applies the suitable algorithm, the algorithm divides the data objects into groups
according to the similarities and difference between the objects.
Types of Unsupervised Learning Algorithm:
The unsupervised learning algorithm can be further categorized into two types of
problems:
o Clustering: Clustering is a method of grouping the objects into clusters such that
objects with most similarities remains into a group and has less or no similarities
with the objects of another group. Cluster analysis finds the commonalities
between the data objects and categorizes them as per the presence and absence
of those commonalities.
o Association: An association rule is an unsupervised learning method which is used
for finding the relationships between variables in the large database. It determines
the set of items that occurs together in the dataset. Association rule makes
marketing strategy more effective. Such as people who buy X item (suppose a
bread) are also tend to purchase Y (Butter/Jam) item. A typical example of
Association rule is Market Basket Analysis.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchal clustering
o Anomaly detection
o Neural Networks
o Principle Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Supervised learning model predicts the Unsupervised learning model finds the
output. hidden patterns in data.
In supervised learning, input data is provided In unsupervised learning, only input data is
to the model along with the output. provided to the model.
The goal of supervised learning is to train the The goal of unsupervised learning is to find
model so that it can predict the output when it the hidden patterns and useful insights from
is given new data. the unknown dataset.
Supervised learning needs supervision to train Unsupervised learning does not need any
the model. supervision to train the model.
Supervised learning can be used for those Unsupervised learning can be used for those
cases where we know the input as well as cases where we have only input data and no
corresponding outputs. corresponding output data.
Supervised learning model produces an Unsupervised learning model may give less
accurate result. accurate result as compared to supervised
learning.
Supervised learning is not close to true Unsupervised learning is more close to the
Artificial intelligence as in this, we first train the true Artificial Intelligence as it learns similarly
model for each data, and then only it can as a child learns daily routine things by his
predict the correct output. experiences.
o Noisy Data- It is responsible for an inaccurate prediction that affects the decision
as well as accuracy in classification tasks.
o Incorrect data- It is also responsible for faulty programming and results obtained
in machine learning models. Hence, incorrect data may affect the accuracy of the
results also.
o Generalizing of output data- Sometimes, it is also found that generalizing output
data becomes complex, which results in comparatively poor future actions.
Further, if we are using non-representative training data in the model, it results in less
accurate predictions. A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is less training data, then there
will be a sampling noise in the model, called the non-representative training set. It won't
be accurate in predictions. To overcome this, it will be biased against one class or a group.
Hence, we should use representative data in training to protect against being biased and
make accurate predictions without any drift.
Overfitting is one of the most common issues faced by Machine Learning engineers and
data scientists. Whenever a machine learning model is trained with a huge amount of
data, it starts capturing noise and inaccurate data into the training data set. It negatively
affects the performance of the model. Let's understand with a simple example where we
have a few training data sets such as 1000 mangoes, 1000 apples, 1000 bananas, and 5000
papayas. Then there is a considerable probability of identification of an apple as papaya
because we have a massive amount of biased data in the training data set; hence
prediction got negatively affected. The main reason behind overfitting is using non-linear
methods used in machine learning algorithms as they build non-realistic data models. We
can overcome overfitting by using linear and parametric algorithms in the machine
learning models.
Underfitting:
Underfitting occurs when our model is too simple to understand the base structure of the
data, just like an undersized pant. This generally happens when we have limited data into
the data set, and we try to build a linear model with non-linear data. In such scenarios,
the complexity of the model destroys, and rules of the machine learning model become
too easy to be applied on this data set, and the model starts doing wrong predictions as
well.
8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine learning
algorithm. To identify the customers who paid for the recommendations shown by the
model and who don't even check them. Hence, an algorithm is necessary to recognize the
customer behavior and trigger a relevant recommendation for the user based on past
experience.