01. ML,Types,Application,Life Cycle,Issues

The Machine Learning tutorial covers fundamental and advanced concepts of machine learning, aimed at students and professionals. It explains the various types of machine learning, including supervised, unsupervised, and reinforcement learning, along with their applications in real-world scenarios such as image recognition, speech recognition, and self-driving cars. The tutorial also outlines the machine learning life cycle, detailing the steps from data gathering to model deployment.

Uploaded by

siam.rehman447
Copyright
© All Rights Reserved

Machine Learning Tutorial

This machine learning tutorial covers basic and advanced concepts of machine learning.
It is designed for students and working professionals.

Machine learning is a growing technology that enables computers to learn automatically
from past data. It uses various algorithms to build mathematical models and make
predictions from historical data or information. It is currently used for tasks such as
image recognition, speech recognition, email filtering, Facebook auto-tagging,
recommender systems, and many more.

This machine learning tutorial gives you an introduction to machine learning along with
a wide range of machine learning techniques such as supervised, unsupervised,
and reinforcement learning. You will learn about regression and classification models,
clustering methods, hidden Markov models, and various sequential models.

What is Machine Learning


In the real world, we are surrounded by humans who can learn from their experiences,
and we have computers or machines that work on our instructions. But can a machine also
learn from experience or past data the way a human does? This is where machine learning
comes in.

Machine learning is a subset of artificial intelligence that is mainly concerned with
the development of algorithms which allow a computer to learn from data and past
experiences on its own. The term machine learning was first introduced by Arthur
Samuel in 1959. We can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.

With the help of sample historical data, known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together to create predictive models. It constructs or uses algorithms that
learn from historical data. In general, the more relevant, good-quality data we provide,
the better the performance tends to be.

A machine has the ability to learn if it can improve its performance by gaining more
data.

How does Machine Learning work


A machine learning system learns from historical data, builds prediction models, and,
whenever it receives new data, predicts the output for it. The accuracy of the predicted
output depends on the amount and quality of the data: a large amount of representative
data generally helps to build a better model, which predicts the output more accurately.
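
As a minimal sketch of this learn-then-predict loop, the example below fits a straight
line to a few historical points with ordinary least squares and then predicts the output
for a new input. The function names `fit_line` and `predict` and the sample numbers are
illustrative assumptions, not part of the tutorial.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from historical (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x_new):
    """Apply the learned line to a new, unseen input."""
    slope, intercept = model
    return slope * x_new + intercept


# Hypothetical historical data that happens to follow y = 2x + 1
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(history_x, history_y)   # "training" on past data
print(predict(model, 10.0))              # prediction for new data
```

Here no rule such as "multiply by two and add one" is written by hand; the relationship
is recovered from the data alone.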

Suppose we have a complex problem in which we need to make some predictions. Instead
of writing code for it by hand, we feed the data to generic algorithms, and with the
help of these algorithms the machine builds the logic from the data and predicts the
output. Machine learning has changed our way of thinking about such problems. The
block diagram below explains the working of a machine learning algorithm:

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is very similar to data mining, as it also deals with huge
amounts of data.

Need for Machine Learning


The need for machine learning is increasing day by day. The reason is that it is capable
of doing tasks that are too complex for a person to implement directly. As humans, we
have limitations: we cannot process huge amounts of data manually. For this we need
computer systems, and this is where machine learning makes things easier for us.

We can train machine learning algorithms by providing them with a huge amount of data
and letting them explore the data, construct models, and predict the required output
automatically. The performance of a machine learning algorithm depends on the amount
of data, and it can be measured with a cost function. With the help of machine
learning, we can save both time and money.
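
To make the idea of a cost function concrete, here is a small sketch using mean squared
error, one common choice for numeric predictions. The tutorial does not name a specific
cost function, so the choice of mean squared error, the function name, and the numbers
are assumptions for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between true and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true values and the outputs of two candidate models
actual = [3.0, 5.0, 7.0]
good_model_output = [3.0, 5.0, 8.0]
poor_model_output = [0.0, 9.0, 2.0]

good_cost = mean_squared_error(actual, good_model_output)
poor_cost = mean_squared_error(actual, poor_model_output)
print(good_cost, poor_cost)   # the lower cost indicates the better model
```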
The importance of machine learning can be easily understood from its use cases. Currently,
machine learning is used in self-driving cars, cyber fraud detection, face recognition,
friend suggestion by Facebook, and more. Top companies such as Netflix and Amazon have
built machine learning models that use vast amounts of data to analyze user interests
and recommend products accordingly.

Following are some key points which show the importance of Machine Learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis it
predicts the output.

The system creates a model using labeled data to understand the dataset and learn about
each data point. Once training and processing are done, we test the model by providing
sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, in the same way that a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:

o Classification
o Regression

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained on a set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision. The
goal of unsupervised learning is to restructure the input data into new features or
groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from a huge amount of data. It can be further classified into two
categories of algorithms:

o Clustering
o Association

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent
gets a reward for each right action and a penalty for each wrong action. The agent
learns automatically from this feedback and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The goal of the agent
is to collect the most reward points, and hence it improves its performance.

A robotic dog that automatically learns how to move its limbs is an example of
reinforcement learning.
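
The reward-and-penalty loop can be sketched with a very small agent that chooses between
two actions. This is only an illustration under stated assumptions: the environment
(action 0 is penalized with -1, action 1 is rewarded with +1), the epsilon-greedy
exploration rule, and the name `train_agent` are not from the tutorial.

```python
import random


def train_agent(rewards, steps=200, epsilon=0.2, seed=0):
    """Estimate the value of each action from reward/penalty feedback."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)   # current estimate of each action's reward
    counts = [0] * len(rewards)     # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action that currently looks best
            action = max(range(len(rewards)), key=lambda a: values[a])
        reward = rewards[action]    # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action
        values[action] += (reward - values[action]) / counts[action]
    return values


# Hypothetical environment: action 0 is "wrong" (-1), action 1 is "right" (+1)
values = train_agent(rewards=[-1.0, 1.0])
best_action = max(range(len(values)), key=lambda a: values[a])
print(values, best_action)
```

After training, the agent prefers the rewarded action, which is the behavior the
paragraph above describes.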

Applications of Machine learning


Machine learning is a buzzword in today's technology, and it is growing very rapidly.
We use machine learning in our daily life, often without knowing it, through services
such as Google Maps, Google Assistant, and Alexa. Below are some of the most popular
real-world applications of machine learning:

1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used
to identify objects, persons, places, and so on in digital images. A popular use case of
image recognition and face detection is automatic friend tagging suggestion:

Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload
a photo with our Facebook friends, we automatically get a tagging suggestion with their
names, and the technology behind this is machine learning's face detection and
recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.

2. Speech Recognition
While using Google, we get the option "Search by voice." This comes under speech
recognition, and it is a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text. It is also
known as "speech to text" or "computer speech recognition." At present, machine
learning algorithms are widely used in speech recognition applications. Google
Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow
voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the
shortest route and predicts the traffic conditions.

It predicts whether traffic is clear, slow-moving, or heavily congested in two ways:

o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better. It takes
information from the user and sends it back to its database to improve performance.

4. Product recommendations:
Machine learning is widely used by e-commerce and entertainment companies such as
Amazon and Netflix for product recommendations. Whenever we search for a product on
Amazon, we start getting advertisements for the same product while browsing the
internet in the same browser, and this is because of machine learning.

Google understands user interests using various machine learning algorithms and
suggests products according to customer interest.

Similarly, when we use Netflix, we find recommendations for series, movies, and so on,
and this is also done with the help of machine learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, in which
machine learning plays a significant role. Tesla, a popular car manufacturer, is
working on self-driving cars. It uses machine learning methods to train its models to
detect people and objects while driving.

6. Email Spam and Malware Filtering:


Whenever we receive a new email, it is filtered automatically as important, normal, or
spam. Important mail arrives in our inbox marked with the important symbol, and spam
emails go to our spam box; the technology behind this is machine learning. Below
are some spam filters used by Gmail:

o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Machine learning algorithms such as the multi-layer perceptron, decision tree, and
Naïve Bayes classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:


We have various virtual personal assistants such as Google Assistant, Alexa, Cortana,
and Siri. As the name suggests, they help us find information using our voice
instructions. These assistants can help us in various ways just through voice
instructions such as: play music, call someone, open an email, schedule an appointment,
etc.

Machine learning algorithms are an important part of these virtual assistants.
They record our voice instructions, send them to a server in the cloud, decode them
using ML algorithms, and act accordingly.

8. Online Fraud Detection:


Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways in
which fraud can take place, such as fake accounts, fake IDs, and money stolen in the
middle of a transaction. To detect this, a feed-forward neural network helps us by
checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these
values become the input for the next round. Each genuine transaction has a specific
pattern, which changes for a fraudulent transaction; hence, the network detects it and
makes our online transactions more secure.

9. Stock Market trading:


Machine learning is widely used in stock market trading. In the stock market, there is
always a risk of ups and downs in share prices, so machine learning's long short-term
memory (LSTM) neural network is used for the prediction of stock market trends.

10. Medical Diagnosis:


In medical science, machine learning is used for disease diagnosis. With this, medical
technology is growing very fast and is able to build 3D models that can predict the
exact position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases more easily.

11. Automatic Language Translation:


Nowadays, if we visit a new place and do not know the language, it is not a problem at
all, as machine learning helps us by converting text into languages we know. Google's
GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine
translation system that translates text into our familiar language, and this is called
automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning
algorithm, which is used together with image recognition to translate text from one
language to another.

Machine learning Life cycle
Machine learning gives computer systems the ability to learn automatically without
being explicitly programmed. But how does a machine learning system work? It can be
described using the machine learning life cycle. The life cycle is a cyclic process for
building an efficient machine learning project. Its main purpose is to find a solution
to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
The most important thing in the complete process is to understand the problem and to
know its purpose. Therefore, before starting the life cycle, we need to understand the
problem, because a good result depends on a good understanding of the problem.

In the complete life cycle, to solve a problem we create a machine learning system
called a "model", and this model is created by "training" it. But to train a model, we
need data; hence, the life cycle starts with collecting data.

1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this
step is to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected
from various sources such as files, databases, the internet, or mobile devices. It is
one of the most important steps of the life cycle. The quantity and quality of the
collected data will determine the quality of the output. In general, the more
good-quality data we have, the more accurate the prediction will be.

This step includes the following tasks:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It
will be used in the further steps.

2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is a
step where we put our data into a suitable place and prepare it for use in machine
learning training.

In this step, we first put all the data together and then randomize its ordering.
This step can be further divided into two processes:

o Data exploration: It is used to understand the nature of the data that we have to work
with. We need to understand the characteristics, format, and quality of the data.
A better understanding of the data leads to a more effective outcome. Here we look for
correlations, general trends, and outliers.
o Data pre-processing: The next step is to preprocess the data for analysis.
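
The "put all data together, then randomize the ordering" part of this step can be
sketched as follows. The record layout, the two source lists, and the function name
`combine_and_shuffle` are hypothetical; a fixed seed is used only so that the shuffle is
reproducible.

```python
import random


def combine_and_shuffle(sources, seed=42):
    """Put records from all sources into one list and randomize their order."""
    combined = [record for source in sources for record in source]
    rng = random.Random(seed)   # fixed seed so the result is reproducible
    rng.shuffle(combined)
    return combined


# Hypothetical records gathered from two sources: (height_cm, label)
from_files = [(150, "short"), (155, "short")]
from_database = [(180, "tall"), (185, "tall")]

dataset = combine_and_shuffle([from_files, from_database])
print(dataset)
```

Randomizing the order helps ensure that a later split into training and test data does
not simply separate one source from the other.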

3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a usable format.
It involves cleaning the data, selecting the variables to use, and transforming the
data into a proper format to make it more suitable for analysis in the next step. It is
one of the most important steps of the complete process. Cleaning the data is required
to address quality issues.

Not all of the data we have collected is necessarily useful. In real-world
applications, collected data may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is important to detect and remove the above issues because they can negatively affect
the quality of the outcome.
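
A minimal sketch of such filtering is shown below. It handles three of the issues listed
above (missing values, invalid values, and duplicates); noise reduction is not shown. The
record layout `(name, age)`, the valid age range, and the function name `clean_records`
are assumptions made for illustration.

```python
def clean_records(records, valid_range=(0, 120)):
    """Drop rows with missing, invalid, or duplicate values."""
    low, high = valid_range
    seen = set()
    cleaned = []
    for name, age in records:
        if age is None:                 # missing value
            continue
        if not (low <= age <= high):    # invalid value (outside plausible range)
            continue
        if (name, age) in seen:         # duplicate row
            continue
        seen.add((name, age))
        cleaned.append((name, age))
    return cleaned


# Hypothetical raw data containing one missing, one duplicate, and one invalid row
raw = [("ann", 34), ("bob", None), ("ann", 34), ("cid", -5), ("dee", 51)]
print(clean_records(raw))
```

Dropping rows is only one option; missing values are also often filled in, for example
with the mean of the column.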

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selecting analytical techniques
o Building models
o Reviewing the result

The aim of this step is to build a machine learning model to analyze the data using
various analytical techniques and to review the outcome. It starts with determining the
type of problem, where we select a machine learning technique such as classification,
regression, cluster analysis, or association. We then build the model using the
prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the
model.

5. Train Model
The next step is to train the model. In this step, we train our model to improve its
performance and obtain a better outcome for the problem.

We use datasets to train the model using various machine learning algorithms. Training a
model is required so that it can learn the various patterns, rules, and features.

6. Test Model
Once our machine learning model has been trained on a given dataset, we test the
model. In this step, we check the accuracy of our model by providing a test dataset to
it.

Testing the model determines its percentage accuracy as per the requirements of the
project or problem.
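
Percentage accuracy is simply the share of test examples that the model labels
correctly. A small sketch, with a hypothetical function name and made-up labels:

```python
def accuracy_percent(true_labels, predicted_labels):
    """Percentage of test examples whose predicted label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical test set: the model gets 3 of 4 emails right
true_labels = ["spam", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "ham", "ham"]
print(accuracy_percent(true_labels, predicted_labels))   # 75.0
```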

7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the
model in a real-world system.

If the model prepared above produces accurate results as per our requirements with
acceptable speed, then we deploy it in the real system. But before deploying the
project, we check whether it is improving its performance using the available data.
The deployment phase is similar to making the final report for a project.

Key differences between Artificial Intelligence (AI) and Machine learning (ML):

| Artificial Intelligence | Machine learning |
|---|---|
| Artificial intelligence is a technology which enables a machine to simulate human behavior. | Machine learning is a subset of AI which allows a machine to automatically learn from past data without being explicitly programmed. |
| The goal of AI is to make a smart computer system like humans to solve complex problems. | The goal of ML is to allow machines to learn from data so that they can give accurate output. |
| In AI, we make intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result. |
| Machine learning and deep learning are the two main subsets of AI. | Deep learning is the main subset of machine learning. |
| AI has a very wide scope. | Machine learning has a limited scope. |
| AI is working to create an intelligent system which can perform various complex tasks. | Machine learning is working to create machines that can perform only those specific tasks for which they are trained. |
| An AI system is concerned with maximizing the chances of success. | Machine learning is mainly concerned with accuracy and patterns. |
| The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc. |
| On the basis of capabilities, AI can be divided into three types: Weak (Narrow) AI, General AI, and Super AI. | Machine learning can also be divided into mainly three types: supervised learning, unsupervised learning, and reinforcement learning. |
| It includes learning, reasoning, and self-correction. | It includes learning and self-correction when introduced to new data. |
| AI deals with structured, semi-structured, and unstructured data. | Machine learning deals with structured and semi-structured data. |

Data Preprocessing in Machine learning


Data preprocessing is the process of preparing raw data and making it suitable for a
machine learning model. It is the first and a crucial step in creating a machine
learning model.

When creating a machine learning project, we do not always come across clean and
formatted data. Before doing any operation with data, it is necessary to clean it and
put it in a formatted way. For this, we use data preprocessing.

Why do we need Data Preprocessing?


Real-world data generally contains noise and missing values, and may be in an unusable
format that cannot be directly used for machine learning models. Data preprocessing is
the required task of cleaning the data and making it suitable for a machine learning
model, which also increases the accuracy and efficiency of the model.

It involves the following steps:

o Getting the dataset
o Importing libraries
o Importing datasets
o Finding missing data
o Encoding categorical data
o Splitting the dataset into training and test sets
o Feature scaling
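
The last four steps in this list can be sketched in plain Python as follows. In
practice a library would usually be used for this; the tutorial excerpt does not show
one, so the helper names, the tiny dataset, and the specific choices (mean imputation,
one-hot encoding, min-max scaling, a 25% test split) are all assumptions for
illustration.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def one_hot(categories):
    """Encode categorical values as 0/1 vectors, one position per category."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


def min_max_scale(values):
    """Rescale numeric values to [0, 1]; assumes the values are not all equal."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def split_train_test(rows, test_fraction=0.25):
    """Hold out the last part of the (already shuffled) rows as the test set."""
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


# Hypothetical dataset with one missing age and a categorical country column
ages = [20, None, 40, 60]
countries = ["India", "France", "India", "Spain"]

ages_filled = fill_missing_with_mean(ages)    # finding and filling missing data
ages_scaled = min_max_scale(ages_filled)      # feature scaling
countries_encoded = one_hot(countries)        # columns: France, India, Spain
rows = [encoded + [age] for encoded, age in zip(countries_encoded, ages_scaled)]
train_rows, test_rows = split_train_test(rows)
print(train_rows, test_rows)
```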

Supervised Machine Learning


Supervised learning is the type of machine learning in which machines are trained using
well "labelled" training data, and on the basis of that data, machines predict the
output. Labelled data means that some input data is already tagged with the correct
output.

In supervised learning, the training data provided to the machines works as the
supervisor that teaches the machines to predict the output correctly. It applies the
same concept as a student learning under the supervision of a teacher.

Supervised learning is the process of providing input data as well as correct output
data to the machine learning model. The aim of a supervised learning algorithm is to
find a mapping function that maps the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.

How Supervised Learning Works?


In supervised learning, models are trained using a labelled dataset, from which the
model learns about each type of data. Once the training process is completed, the model
is tested on test data (a held-out portion of the labelled data that was not used for
training), and then it predicts the output.

The working of supervised learning can be easily understood from the example and
diagram below:

Suppose we have a dataset of different types of shapes, which includes squares,
rectangles, triangles, and polygons. The first step is to train the model for each
shape.

o If the given shape has four sides, and all the sides are equal, then it will be labelled
as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.

Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.

The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the basis of its number of sides and predicts the output.
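
The shape example can be sketched with a 1-nearest-neighbour classifier, chosen here
only because it is short; the tutorial does not say which algorithm is used. The feature
encoding (number of sides, whether all sides are equal) and the names `train` and
`classify` are assumptions for illustration.

```python
def train(examples):
    """'Training' for 1-nearest-neighbour simply stores the labelled examples."""
    return list(examples)


def classify(model, features):
    """Return the label of the stored example closest to the given features."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(model, key=lambda example: distance(example[0], features))
    return label


# Features: (number of sides, 1 if all sides are equal else 0), with labels
labelled_shapes = [
    ((4, 1), "square"),
    ((4, 0), "rectangle"),
    ((3, 1), "triangle"),
    ((3, 0), "triangle"),
    ((6, 1), "hexagon"),
]

model = train(labelled_shapes)
print(classify(model, (3, 0)))   # a new three-sided shape
print(classify(model, (6, 1)))   # a new shape with six equal sides
```

The labels supplied with the training examples play the role of the supervisor: the
model only ever outputs a label it has been shown.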

Steps Involved in Supervised Learning:


o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the labelled dataset into a training dataset, a test dataset, and a validation
dataset.
o Determine the input features of the training dataset, which should carry enough
information for the model to predict the output accurately.
o Determine a suitable algorithm for the model, such as a support vector machine,
decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need a validation set,
held out from the training data, to tune the control parameters.
o Evaluate the accuracy of the model by providing the test set. If the model predicts
the correct output, our model is accurate.

Types of supervised Machine learning Algorithms:


Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used if there is a relationship between the input variable and
the output variable. They are used for the prediction of continuous variables, as in
weather forecasting, market trends, etc. Below are some popular regression algorithms
which come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification

Classification algorithms are used when the output variable is categorical, which means
it takes one of two or more classes, such as Yes-No, Male-Female, True-False, etc.
Spam filtering is a typical example. Some popular classification algorithms are:

o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Advantages of Supervised learning:


o With the help of supervised learning, the model can predict the output on the basis
of prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such
as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling very complex tasks.
o Supervised learning may not predict the correct output if the test data is very
different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Unsupervised Machine Learning


In the previous topic, we learned about supervised machine learning, in which models are
trained using labeled data under the supervision of the training data. But there may be
many cases in which we do not have labeled data and need to find hidden patterns in the
given dataset. To solve such cases in machine learning, we need unsupervised learning
techniques.

What is Unsupervised Learning?

As the name suggests, unsupervised learning is a machine learning technique in which
models are not supervised using a labeled training dataset. Instead, the models
themselves find the hidden patterns and insights in the given data. It can be compared
to the learning which takes place in the human brain while learning new things. It can
be defined as:

Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the underlying
structure of the dataset, group the data according to similarities, and represent the
dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset
containing images of different types of cats and dogs. The algorithm is never trained on
labels for the given dataset, which means it does not have any prior idea about the
features of the dataset. The task of the unsupervised learning algorithm is to identify
the image features on its own. It will perform this task by clustering the image dataset
into groups according to the similarities between images.

Why use Unsupervised Learning?


Below are some main reasons which describe the importance of unsupervised learning:

o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to the way a human learns to think through their own
experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes
it more important.
o In the real world, we do not always have input data with the corresponding output, so
to solve such cases we need unsupervised learning.

Working of Unsupervised Learning


The working of unsupervised learning can be understood from the diagram below:

Here, we have taken unlabeled input data, which means it is not categorized and the
corresponding outputs are not given. This unlabeled input data is fed to the machine
learning model in order to train it. First, the model interprets the raw data to find
hidden patterns in it, and then applies a suitable algorithm such as k-means
clustering or hierarchical clustering.

Once applied, the algorithm divides the data objects into groups according to the
similarities and differences between the objects.
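
As a rough sketch of how such grouping can happen without labels, the example below runs
a plain k-means loop on a handful of 2-D points. The points, the choice of two starting
centroids, and the function name `k_means` are assumptions for illustration; real
implementations choose starting centroids more carefully and stop when nothing changes.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data: two visually separate groups of points
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are given at any point; the two groups emerge only from the distances between
the points.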
Types of Unsupervised Learning Algorithm:

Unsupervised learning algorithms can be further categorized into two types of
problems:

o Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no similarities
with the objects of another group. Cluster analysis finds the commonalities
between the data objects and categorizes them according to the presence and absence
of those commonalities.
o Association: An association rule is an unsupervised learning method which is used
for finding relationships between variables in a large database. It determines
the sets of items that occur together in the dataset. Association rules make
marketing strategy more effective. For example, people who buy item X (say,
bread) also tend to purchase item Y (butter or jam). A typical example of
association rules is market basket analysis.
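
The bread-and-butter rule can be quantified with two standard measures, support and
confidence. The sketch below computes both for a made-up list of baskets; the
transactions and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["milk", "eggs"],
]

print(support(baskets, ["bread", "butter"]))        # how often both appear together
print(confidence(baskets, ["bread"], ["butter"]))   # how often bread buyers take butter
```

In this toy data, bread and butter appear together in half of the baskets, and two out
of three bread buyers also buy butter.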

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchal clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition

Advantages of Unsupervised Learning


o Unsupervised learning can be used for more complex tasks than supervised
learning because it does not require labeled input data.
o Unsupervised learning is often preferable because unlabeled data is easier to
obtain than labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than supervised learning as it
does not have corresponding output.
o The result of the unsupervised learning algorithm might be less accurate as input
data is not labeled, and algorithms do not know the exact output in advance.

Difference between Supervised and Unsupervised Learning

The main differences between Supervised and Unsupervised learning are given below:

| Supervised Learning | Unsupervised Learning |
|---|---|
| Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data. |
| A supervised learning model takes direct feedback to check whether it is predicting the correct output. | An unsupervised learning model does not take any feedback. |
| A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in data. |
| In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model. |
| The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights in an unknown dataset. |
| Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model. |
| Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems. |
| Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for cases where we have only input data and no corresponding output data. |
| A supervised learning model generally produces more accurate results. | An unsupervised learning model may give less accurate results than a supervised one. |
| Supervised learning is not close to true Artificial Intelligence, as we must first train the model on labeled examples before it can predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns the way a child learns everyday things from experience. |
| It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as K-means clustering, hierarchical clustering, and the Apriori algorithm. |

Common issues in Machine Learning
Although machine learning is used across many industries and helps organizations make
more informed, data-driven choices that are often more effective than classical
methodologies, it still has many problems that cannot be ignored. Here are some
common issues that professionals face when building ML skills and
creating an application from scratch.

1. Inadequate Training Data


The major issue when using machine learning algorithms is the lack of quality
as well as quantity of data. Data plays a vital role in machine learning, and
many data scientists report that inadequate, noisy, and unclean data severely
degrade machine learning algorithms. For example, a
simple task may require thousands of samples, and an advanced task such as speech or
image recognition may need millions of examples. Data quality is also
important for the algorithms to work well, yet it is often lacking
in machine learning applications. Data quality can be affected by factors such as the following:

o Noisy data: It leads to inaccurate predictions, which affects decisions
as well as accuracy in classification tasks.
o Incorrect data: It leads to faulty models and faulty results. Hence,
incorrect data may also reduce the accuracy of the results.
o Generalizing of output data: Sometimes generalizing output
data becomes complex, which results in comparatively poor future actions.

2. Poor quality of data


As discussed above, data plays a significant role in machine learning, and it must
be of good quality as well. Noisy, incomplete, inaccurate, and unclean data
lead to lower accuracy in classification and low-quality results. Hence, data quality can also
be considered a major common problem when applying machine learning
algorithms.
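
A first pass at spotting incomplete or unclean data can be as simple as counting missing values and duplicate rows before training. The sketch below assumes the records are plain dictionaries; the `quality_report` helper and the sample rows are hypothetical.

```python
def quality_report(rows):
    """Count rows that have a missing value and rows that exactly repeat an earlier row."""
    with_missing = sum(any(v is None for v in row.values()) for row in rows)
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "with_missing": with_missing, "duplicates": duplicates}

# Hypothetical records, invented for illustration.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 34, "income": 52000},
    {"age": 29, "income": None},
]
report = quality_report(rows)
print(report)
```

Checks like these do not repair the data, but they show how much cleaning is needed before a model is trained on it.
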

3. Non-representative training data


To make sure our model generalizes well, we have to ensure that the sample
training data is representative of the new cases we need to generalize to. The training
data should cover the cases that have already occurred as well as those occurring now.

If we use non-representative training data, the model makes less
accurate predictions. A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is too little training data, the
sample is affected by sampling noise and becomes a non-representative training set. A model
trained on it will not be accurate in its predictions and will be biased toward or against one class or group.

Hence, we should use representative data in training to protect against bias and
make accurate predictions without any drift.
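
One common way to keep a sample representative is stratified sampling, where each class keeps the same share in every split. The following is a minimal sketch under that assumption; the `stratified_split` helper, the labels, and the 20% test fraction are chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class's share kept roughly equal in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label column: 20% "spam", 80% "ham".
labels = ["spam"] * 20 + ["ham"] * 80
train_idx, test_idx = stratified_split(labels)
test_spam = sum(labels[i] == "spam" for i in test_idx)
print(len(train_idx), len(test_idx), test_spam)
```

A purely random split of a small dataset could by chance leave a class almost absent from one side; splitting class by class avoids that.
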

4. Overfitting and Underfitting


Overfitting:

Overfitting is one of the most common issues faced by Machine Learning engineers and
data scientists. It occurs when a model fits its training data too closely and
starts capturing the noise and inaccurate entries in the training data set along with the
real pattern. This negatively affects the performance of the model on new data. Skewed
training data causes a related problem. Let's understand with a simple example where the
training data set contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000
papayas. There is then a considerable probability that an apple is identified as a papaya,
because the training data set is heavily biased toward papayas; hence
prediction is negatively affected. A main cause of overfitting is the use of highly flexible
non-linear methods, which can fit the training data in unrealistic detail. We
can reduce overfitting by using simpler models, such as linear and parametric algorithms.

Methods to reduce overfitting:

o Increase training data in a dataset.
o Reduce model complexity by selecting a model with fewer
parameters.
o Use Ridge Regularization and Lasso Regularization.
o Use early stopping during the training phase.
o Reduce the noise.
o Reduce the number of attributes in training data.
o Constrain the model.
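
To show what regularization does, the sketch below applies a ridge penalty to the simplest possible model, a single slope with no intercept, where the closed-form solution is w = sum(x*y) / (sum(x*x) + penalty). The `ridge_slope` function, the data, and the penalty value are assumptions for the example.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w of y = w*x minimizing squared error plus penalty * w**2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

# Hypothetical noisy measurements of roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

unregularized = ridge_slope(xs, ys, penalty=0.0)
regularized = ridge_slope(xs, ys, penalty=10.0)
print(unregularized, regularized)
```

The penalty pulls the coefficient toward zero, which constrains the model. With many features, the same idea keeps the model from using large weights to chase noise in the training data.
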

Underfitting:

Underfitting is just the opposite of overfitting. A machine learning model that is
trained with too little data or is too constrained cannot learn the underlying pattern; as a
result, it makes inaccurate predictions even on the training data.

Underfitting occurs when our model is too simple to understand the base structure of the
data, just like an undersized pair of pants. This generally happens when we have limited data in
the data set, or when we try to fit a linear model to non-linear data. In such scenarios,
the model is not complex enough, its rules are
too simple for the data set, and it starts making wrong predictions.

Methods to reduce Underfitting:

o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints.
o Increase the number of epochs to get better results.
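
The case of a linear model on non-linear data can be reproduced in a few lines. In this sketch, the data follows y = x*x exactly; the helper names and the data are assumptions for illustration. A straight line in x underfits, while the same least-squares fit on a better feature (x squared) does not.

```python
def fit_line(features, targets):
    """Ordinary least squares for targets = intercept + slope * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    var = sum((f - mean_f) ** 2 for f in features)
    slope = cov / var
    return mean_t - slope * mean_f, slope

def mean_squared_error(features, targets, intercept, slope):
    """Average squared gap between the fitted line and the targets."""
    errors = ((intercept + slope * f - t) ** 2 for f, t in zip(features, targets))
    return sum(errors) / len(targets)

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]  # the true relationship is non-linear

# Too simple: a straight line in x cannot follow the curve.
a, b = fit_line(xs, ys)
underfit_error = mean_squared_error(xs, ys, a, b)

# Better feature: fit the same kind of line on x squared instead.
squared = [x * x for x in xs]
a2, b2 = fit_line(squared, ys)
better_error = mean_squared_error(squared, ys, a2, b2)
print(underfit_error, better_error)
```

The straight line has a large error even on its own training data, which is the signature of underfitting; the richer feature removes it.
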

5. Monitoring and maintenance


Every machine learning model must keep producing well-generalized output as its data changes;
hence, regular monitoring and maintenance are compulsory. Changing data leads to
different results, so the code has to be updated and resources have to be
set aside for monitoring.

6. Getting bad recommendations


A machine learning model operates within a specific context; when that context changes, the
result can be bad recommendations and concept drift in the model. Let's understand with an example:
at one point in time a customer is looking for some gadgets, but the customer's
requirements change over time, yet the machine learning model keeps showing the same
recommendations even though the customer's expectations have changed. This
is called data drift. It generally occurs when new data is introduced or the
interpretation of data changes. We can overcome this by regularly monitoring the data and
updating the model according to the new expectations.
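
A very simple form of such monitoring compares recent values of a feature against the values the model was trained on. The sketch below is one possible heuristic, not a standard test: the `drift_score` function, the price data, and the alert threshold of 3.0 are all hypothetical.

```python
import statistics

def drift_score(reference, recent):
    """Shift of the recent mean from the reference mean, in reference standard deviations."""
    spread = statistics.stdev(reference)
    return abs(statistics.mean(recent) - statistics.mean(reference)) / spread

# Hypothetical feature values at training time and in recent traffic.
reference_prices = [100, 102, 98, 101, 99, 100]
recent_prices = [120, 118, 122, 121, 119, 120]

THRESHOLD = 3.0  # hypothetical alert level, to be tuned per application
score = drift_score(reference_prices, recent_prices)
needs_retraining = score > THRESHOLD
print(round(score, 2), needs_retraining)
```

When the score crosses the threshold, the incoming data no longer resembles the training data, which is a signal to review or retrain the model.
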

7. Lack of skilled resources


Although Machine Learning and Artificial Intelligence are continuously growing in the
market, these fields are still young in comparison to others. The shortage of skilled
people is also an issue. Hence, we need people with in-depth
knowledge of mathematics, science, and technology for developing and
managing machine learning systems.

8. Customer Segmentation
Customer segmentation is also an important issue when developing a machine learning
algorithm: we need to distinguish the customers who pay for the recommendations shown by the
model from those who do not even check them. Hence, an algorithm is necessary to recognize
customer behavior and trigger a relevant recommendation for the user based on past
experience.

9. Process Complexity of Machine Learning


The machine learning process is very complex, which is another major issue faced by
machine learning engineers and data scientists. Machine Learning and Artificial
Intelligence are relatively new technologies that are still in an experimental phase and
are continuously changing over time. Much of the work is trial and error;
hence the probability of error is higher than expected. Further, the process includes analyzing
the data, removing data bias, training the model, applying complex mathematical calculations,
etc., which makes the procedure more complicated and quite tedious.

10. Data Bias


Data bias is also a big challenge in Machine Learning. These errors arise when
certain elements of the dataset are weighted more heavily or given more importance than
others. Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors. We can resolve this by determining where the data is actually biased in
the dataset and then taking the necessary steps to reduce it.

Methods to remove Data Bias:

o Research more for customer segmentation.
o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors to resolve them easily.
o Review the collected and annotated data.
o Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.
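
Bias testing can start with a look at how the classes are distributed. The sketch below reuses the fruit counts from the overfitting example and computes inverse-frequency class weights, n_samples / (n_classes * class_count), one common heuristic for compensating an over-represented class. The `class_weights` helper is an assumption for illustration.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {name: total / (len(counts) * count) for name, count in counts.items()}

# Class counts taken from the fruit example earlier in this section.
labels = (
    ["mango"] * 1000 + ["apple"] * 1000 + ["banana"] * 1000 + ["papaya"] * 5000
)
weights = class_weights(labels)
print(weights)
```

Rare classes receive a weight above 1 and the dominant class a weight below 1, so a training procedure that accepts per-class weights pays less attention to the papayas.
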

11. Lack of Explainability


This basically means the outputs of a model cannot be easily comprehended, because the model is
built in specific ways to deliver results for certain conditions. This lack of explainability
reduces the credibility of machine learning algorithms.

12. Slow implementations and results


This issue is also very common in machine learning models. Machine
learning models can be highly effective at producing accurate results, but they are time-consuming.
Slow programs, excessive requirements, and overloaded data mean it takes longer than expected to
provide accurate results. The model therefore needs continuous maintenance and
monitoring to deliver accurate results.

13. Irrelevant features


Although machine learning models are intended to give the best possible outcome, if we
feed garbage data as input, then the result will also be garbage. Hence, we should use
relevant features in our training sample. A machine learning model is said to be good if
its training data has a good set of features with few or no irrelevant features.
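
One simple way to screen for irrelevant features is to measure how strongly each feature correlates with the target and drop those below a cut-off. The sketch below is a rough filter only, since Pearson correlation detects linear relationships; the feature names, data, and the 0.5 cut-off are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical training sample: one useful feature and one unrelated one.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "size": [2.0, 4.1, 5.9, 8.2, 9.8, 12.1],
    "row_parity": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}

CUTOFF = 0.5  # hypothetical relevance threshold
selected = [
    name for name, values in features.items()
    if abs(pearson(values, target)) >= CUTOFF
]
print(selected)
```

The alternating `row_parity` column carries almost no information about the target, so it falls below the cut-off and is left out of training.
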
