Machine Learning - 1 (UNIT - 1)
In the real world, humans learn from their experiences, while computers and machines simply follow our instructions. But can a machine also learn from experience or past data the way a human does? This is where Machine Learning comes in.
Machine learning algorithms build a mathematical model from sample historical data, known as training data, and use it to make predictions or decisions without being explicitly programmed. Machine learning brings together statistics and computer science to develop predictive models; it constructs or applies algorithms that learn from historical data, and its performance generally improves as the amount of data we provide grows.
In short, a machine can learn if it can use additional data to improve its performance.
How does Machine Learning work?
A machine learning system learns from previous data, builds prediction models, and predicts the output for new data whenever it receives it. The more data it has, the better the model it can build, and the more accurate its predictions become.
Suppose we have a complex problem that requires predictions. Instead of writing explicit rules in code, we feed the data to generic algorithms, which build the logic from the data and predict the output. Machine learning has changed our perspective on such problems: the workflow is to collect past data, train a model on it, and use the trained model to predict the output for new data.
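To make this concrete, the short sketch below shows that workflow with scikit-learn (the library choice, the feature, and the numbers are illustrative assumptions, not something specified above): historical input-output pairs are fed to a generic algorithm, and the fitted model is then asked to predict the output for new data.

    # Sketch of the machine learning workflow described above.
    # Assumes scikit-learn is installed; the data is invented for illustration.
    from sklearn.linear_model import LinearRegression

    # Historical (training) data: hours studied -> exam score
    X_train = [[1], [2], [3], [4], [5]]   # inputs
    y_train = [52, 57, 66, 70, 78]        # known outputs

    # Feed the data to a generic algorithm; the model builds the logic itself
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict the output for new, unseen data
    print(model.predict([[6]]))           # predicted score for 6 hours of study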
AI vs ML vs DL
o Definition: AI simulates human intelligence to perform tasks and make decisions. ML is a subset of AI that uses algorithms to learn patterns from data. DL is a subset of ML that employs artificial neural networks for complex tasks.
o Data requirements: AI may or may not require large datasets and can use predefined rules. ML relies heavily on labeled data for training and making predictions. DL requires extensive labeled data and performs exceptionally well with big datasets.
o Human intervention: AI can be rule-based, requiring human programming and intervention. ML automates learning from data and requires less manual intervention. DL also automates feature extraction, reducing the need for manual engineering.
o Scope of tasks: AI can handle various tasks, from simple to complex, across domains. ML specializes in data-driven tasks like classification, regression, etc. DL excels at complex tasks like image recognition, natural language processing, and more.
o Algorithms: AI algorithms can be simple or complex, depending on the application. ML employs various algorithms like decision trees, SVM, and random forests. DL relies on deep neural networks, which can have numerous hidden layers for complex learning.
o Training time and resources: AI may require less training time and fewer resources for rule-based systems. ML training time varies with algorithm complexity and dataset size. DL training demands substantial computational resources and time for deep networks.
o Interpretability: AI systems may offer interpretable results based on human rules. ML models can be more or less interpretable depending on the algorithm. DL models are often considered less interpretable due to their complex network architectures.
o Applications: AI is used in virtual assistants, recommendation systems, and more. ML is applied in image recognition, spam filtering, and other data tasks. DL is utilized in autonomous vehicles, speech recognition, and advanced AI applications.
Poor quality and quantity of data:
The major issue in using machine learning algorithms is a lack of quality as well as quantity of data. Data plays a vital role in machine learning, and many data scientists report that inadequate, noisy, and unclean data severely hamper machine learning algorithms. For example, a simple task may require thousands of training examples, while an advanced task such as speech or image recognition may need millions of examples. Data quality is equally important for the algorithms to work well, yet poor-quality data is common in machine learning applications. Data quality can be affected by factors such as the following:
o Noisy data- It leads to inaccurate predictions and reduces accuracy in classification tasks.
o Incorrect data- It produces faulty models and results; hence, incorrect data can also reduce the accuracy of the results.
o Generalizing of output data- Sometimes generalizing from the output data becomes complex, which results in comparatively poor future actions.
As discussed above, data plays a significant role in machine learning, and it must also be of good quality. Noisy, incomplete, inaccurate, and unclean data lead to lower classification accuracy and low-quality results. Hence, poor data quality is one of the most common problems when working with machine learning algorithms.
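As a brief illustration of handling incomplete and noisy data before training (a sketch only; pandas, the column names, and the thresholds are assumptions not taken from the text above), one common approach is to fill missing values and filter out obviously implausible entries:

    # Sketch of basic data cleaning before training (assumed library: pandas).
    import pandas as pd

    # Hypothetical raw dataset with missing and noisy values
    df = pd.DataFrame({
        "height_cm": [170, 165, None, 180, 999],   # 999 is an obvious outlier
        "weight_kg": [70, None, 60, 82, 75],
    })

    # Incomplete data: fill missing numeric values with each column's median
    df = df.fillna(df.median(numeric_only=True))

    # Noisy data: keep only physically plausible heights
    df = df[(df["height_cm"] > 100) & (df["height_cm"] < 250)]

    print(df)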
Non-representative training data:
To make sure our model generalizes well, the sample training data must be representative of the new cases to which we want to generalize. The training data should cover the range of cases that have occurred and that are likely to occur.
If we use non-representative training data, the model will make less accurate predictions. A machine learning model is considered good if it generalizes well and makes accurate decisions on new cases. If there is too little training data, the sample will contain sampling noise and form a non-representative training set, and the resulting model will be biased toward or against particular classes or groups and will not predict accurately. Hence, we should use representative training data to protect against such bias and to make accurate predictions without drift.
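One simple way to keep a split of the data representative of its classes (a sketch, assuming scikit-learn; the fruit labels and counts are invented) is a stratified train/test split, which preserves the class proportions in both parts:

    # Sketch: a stratified split keeps class proportions representative
    # (assumed library: scikit-learn; data below is illustrative).
    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(20)]
    y = ["apple"] * 12 + ["papaya"] * 8      # imbalanced classes

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )

    # Both splits now contain apples and papayas in roughly the same ratio
    print(y_train.count("papaya"), y_test.count("papaya"))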
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data scientists. When a machine learning model is trained on a huge amount of data, it starts capturing the noise and inaccurate values present in the training data set, which negatively affects the performance of the model. Consider a simple example: a training set containing 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. With such a biased training set there is a considerable probability of an apple being identified as a papaya, so the predictions suffer. A common cause of overfitting is the use of highly flexible non-linear methods, which can fit unrealistic models to the data; overfitting can often be reduced by using simpler linear and parametric algorithms in the machine learning models.
Methods to reduce overfitting:
o Increase the amount and diversity of training data.
o Use cross-validation to check how well the model generalizes (see the sketch after this list).
o Simplify the model or use regularization to penalize overly complex models.
o Use early stopping so training halts before the model starts memorizing noise.
o Remove irrelevant features from the training data.
o Use ensembling techniques such as bagging and boosting.
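The sketch below illustrates two of these ideas, cross-validation and a simpler model (scikit-learn, the synthetic dataset, and the chosen depths are assumptions for illustration): a training score far above the cross-validated score suggests the model is memorizing noise, and limiting complexity narrows the gap.

    # Sketch: spotting overfitting by comparing training vs. cross-validation score
    # (assumed library: scikit-learn; the synthetic data is illustrative).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    for depth in (None, 3):                  # unlimited depth vs. a simpler tree
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X, y)
        train_acc = tree.score(X, y)
        cv_acc = cross_val_score(tree, X, y, cv=5).mean()
        print(f"max_depth={depth}: train={train_acc:.2f}, cross-val={cv_acc:.2f}")

    # A much higher training score than cross-validation score indicates
    # overfitting; restricting max_depth (a simpler model) usually reduces the gap.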
Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning model is trained on too little data, or is too simple, it fails to capture the underlying patterns and gives inaccurate results, which hurts the accuracy of the machine learning model.
Underfitting occurs when the model is too simple to capture the underlying structure of the data, much like a pant that is undersized. It generally happens when we have limited data in the data set and try to fit a linear model to non-linear data. In such scenarios the model is not complex enough, its rules are too simple for the data set, and it starts making wrong predictions.
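To illustrate underfitting (a sketch under assumed NumPy/scikit-learn usage; the quadratic data is invented), a straight-line model fit to clearly non-linear data scores poorly, while adding polynomial features lets the same linear algorithm capture the structure:

    # Sketch: a linear model underfits non-linear (quadratic) data
    # (assumed libraries: NumPy and scikit-learn; data is synthetic).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = X.ravel() ** 2                       # clearly non-linear relationship

    linear = LinearRegression().fit(X, y)
    poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

    print("linear R^2:", round(linear.score(X, y), 2))   # low: the model underfits
    print("poly   R^2:", round(poly.score(X, y), 2))     # near 1.0: structure captured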
Irrelevant features:
Although machine learning models are intended to give the best possible outcome, if we feed garbage data as input, the result will also be garbage. Hence, we should use relevant features in our training sample. A machine learning model is considered good when its training data has a good set of features with few or no irrelevant features.
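As a brief sketch of weeding out irrelevant features (scikit-learn's SelectKBest is an assumed choice here, not something named in the text above), a univariate statistical test can score each feature and keep only the most informative ones before training:

    # Sketch: keep only the most relevant features before training
    # (assumed library: scikit-learn; synthetic data with known noise columns).
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    # 20 features, but only 5 are actually informative; the rest are noise
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               n_redundant=0, random_state=0)

    selector = SelectKBest(score_func=f_classif, k=5)
    X_selected = selector.fit_transform(X, y)

    print("original shape:", X.shape)           # (200, 20)
    print("selected shape:", X_selected.shape)  # (200, 5)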