Machine Learning - 1 (UNIT - 1)

Uploaded by

Niharika Khanna
Copyright
© All Rights Reserved

What is Machine Learning

Humans learn from experience, while computers traditionally do only what we explicitly
instruct them to do. But can a machine also learn from experience or past data, as a human
does? This is where Machine Learning comes in.

Introduction to Machine Learning

Machine learning is a subset of artificial intelligence that focuses primarily on the
creation of algorithms that enable a computer to learn from data and previous experience on
its own. Arthur Samuel first used the term "machine learning" in 1959. It can be
summarized as follows:

Machine learning enables a machine to automatically learn from data, improve its
performance with experience, and make predictions without being explicitly programmed.

Machine learning algorithms build a mathematical model from sample historical data, known
as training data, and use it to make predictions or decisions without being explicitly
programmed. For the purpose of developing predictive models, machine learning brings
together statistics and computer science. In practice, we either construct or reuse
algorithms that learn from historical data. Performance generally improves as we provide
more relevant, good-quality data, although the gains are not strictly proportional.

A machine can be said to learn if its performance improves as it gains more data.

How does Machine Learning work

A machine learning system learns from previous data, builds a prediction model, and predicts
the output whenever it receives new data. The accuracy of the predicted output depends on
the amount and quality of the training data: more data generally helps to build a model that
predicts the output more accurately.

Let's say we have a complex problem in which we need to make predictions. Instead of
hand-writing the decision logic, we feed the data to generic algorithms, which build the
logic from the data and predict the output. Machine learning has thus changed the way we
approach such problems. The operation of a machine learning algorithm is depicted in the
following block diagram:

[Figure: block diagram of the operation of a machine learning algorithm]
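
The learn-then-predict flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data (hours studied versus exam score) is made up, the helper names fit_line and predict are hypothetical, and ordinary least squares stands in for the "generic algorithm".

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from past data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: hours studied and the resulting exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(hours, scores)   # the "logic" is built from the data
prediction = predict(model, 6.0)  # output for an input the model has not seen
print(model, prediction)
```

No rule such as "each hour adds two points" is written by hand here; the relationship is recovered from the examples and then reused for the new input.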

Need of Machine Learning

Following are some key points which show the importance of Machine Learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

AI vs ML vs DL

Artificial Intelligence (AI) | Machine Learning (ML) | Deep Learning (DL)
--- | --- | ---
AI simulates human intelligence to perform tasks and make decisions. | ML is a subset of AI that uses algorithms to learn patterns from data. | DL is a subset of ML that employs artificial neural networks for complex tasks.
AI may or may not require large datasets; it can use predefined rules. | ML heavily relies on labeled data for training and making predictions. | DL requires extensive labeled data and performs exceptionally with big datasets.
AI can be rule-based, requiring human programming and intervention. | ML automates learning from data and requires less manual intervention. | DL automates feature extraction, reducing the need for manual engineering.
AI can handle various tasks, from simple to complex, across domains. | ML specializes in data-driven tasks like classification, regression, etc. | DL excels at complex tasks like image recognition, natural language processing, and more.
AI algorithms can be simple or complex, depending on the application. | ML employs various algorithms like decision trees, SVM, and random forests. | DL relies on deep neural networks, which can have numerous hidden layers for complex learning.
AI may require less training time and resources for rule-based systems. | ML training time varies with the algorithm complexity and dataset size. | DL training demands substantial computational resources and time for deep networks.
AI systems may offer interpretable results based on human rules. | ML models can be interpretable or less interpretable based on the algorithm. | DL models are often considered less interpretable due to complex network architectures.
AI is used in virtual assistants, recommendation systems, and more. | ML is applied in image recognition, spam filtering, and other data tasks. | DL is utilized in autonomous vehicles, speech recognition, and advanced AI applications.

Challenges in Machine Learning


1. Inadequate Training Data

A major issue when using machine learning algorithms is the lack of data in both quality and
quantity. Because data plays a vital role in machine learning, many data scientists report
that inadequate, noisy, and unclean data severely hamper the algorithms. For example, a
simple task may require thousands of training examples, and an advanced task such as speech
or image recognition may need millions. Data quality matters just as much for the
algorithms to work well, yet it is often lacking in machine learning applications. Data
quality can be affected by factors such as the following:

o Noisy data - It leads to inaccurate predictions, which affects decisions as well as
accuracy in classification tasks.
o Incorrect data - It leads to faulty models and results, and so lowers the accuracy of
the results obtained.
o Generalizing of output data - Sometimes generalizing from the output data becomes
complex, which results in comparatively poor future actions.

2. Poor quality of data

As discussed above, data plays a significant role in machine learning, and it must be of
good quality as well. Noisy, incomplete, inaccurate, and unclean data lead to lower accuracy
in classification and low-quality results. Hence, poor data quality is one of the most
common problems in applying machine learning algorithms.
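
A basic data-quality pass can be sketched as follows. It is a minimal illustration, not a complete cleaning pipeline: the records (body temperature and a label), the valid range, and the helper name clean_records are all assumptions made for the example.

```python
import math


def clean_records(records, valid_range):
    """Drop records with missing, non-finite, or out-of-range values."""
    low, high = valid_range
    cleaned = []
    for value, label in records:
        if value is None or label is None:
            continue  # incomplete record
        if not math.isfinite(value):
            continue  # NaN or infinity, e.g. from a failed sensor reading
        if not (low <= value <= high):
            continue  # implausible value, most likely an entry error
        cleaned.append((value, label))
    return cleaned


# Made-up records: (body temperature in degrees Celsius, label).
raw = [
    (36.6, "healthy"),
    (None, "healthy"),       # missing value
    (39.2, "fever"),
    (370.0, "fever"),        # misplaced decimal point
    (38.5, None),            # missing label
    (float("nan"), "fever"),
]

cleaned = clean_records(raw, valid_range=(30.0, 45.0))
print(cleaned)
```

Dropping records is the simplest option; depending on the task, repairing or imputing values may preserve more data.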

3. Non-representative training data

To make sure our model generalizes well, we have to ensure that the sample training data is
representative of the new cases to which we want to generalize. The training data should
cover the cases that have already occurred as well as those that are occurring now.

If we use non-representative training data, the model makes less accurate predictions. A
machine learning model is said to be ideal if it predicts well for generalized cases and
provides accurate decisions. If there is too little training data, the sample contains
sampling noise, that is, it is non-representative by chance. Such a model won't be accurate
in its predictions and will be biased against one class or group.

Hence, we should use representative data in training to protect against bias and make
accurate predictions without any drift.
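
One common way to keep a sample representative with respect to the class labels is stratified sampling, sketched below. The helper name stratified_sample, the spam/ham labels, and the 25% fraction are illustrative assumptions; the source does not prescribe a specific technique.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return indices sampled so each class keeps roughly its original share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        # Sample the same fraction from every class (at least one example).
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Made-up label column: 20% spam, 80% ham.
labels = ["spam"] * 20 + ["ham"] * 80

sample = stratified_sample(labels, fraction=0.25)
counts = Counter(labels[i] for i in sample)
print(counts)
```

The sample keeps the 20/80 class ratio of the full data. Stratifying only protects the proportions of the variable we stratify on; it does not fix data that was collected from the wrong population in the first place.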

4. Overfitting and Underfitting

Overfitting:

Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists. It occurs when a model fits its training data too closely and starts capturing
the noise and inaccurate entries in the training data set instead of the underlying pattern.
The model then performs well on the training data but poorly on new data. A biased training
set can hurt predictions in a similar way. Let's understand with a simple example where we
have a training data set of 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There
is a considerable probability that an apple is identified as a papaya because the training
data set is heavily skewed toward papayas; hence prediction is negatively affected. A common
cause of overfitting is a model that is too flexible for the amount of data available, for
example a highly non-linear method fitted to a small or noisy data set. Simpler models, such
as linear and parametric algorithms, are less prone to overfitting.

Methods to reduce overfitting:

o Increase the amount of training data.
o Reduce model complexity by selecting a model with fewer parameters.
o Use Ridge regularization or Lasso regularization.
o Use early stopping during the training phase.
o Reduce the noise in the training data.
o Reduce the number of attributes in the training data.
o Constrain the model.
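
As an illustration of one item in the list above, Ridge regularization, the sketch below uses the simplest possible setting: one feature and no intercept, where minimizing sum((y - w*x)^2) + lam * w^2 has the closed-form solution w = sum(x*y) / (sum(x*x) + lam). The data and the helper name ridge_weight are assumptions made for the example.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x (one feature, no intercept)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


# Made-up, slightly noisy data that roughly follows y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# lam = 0 is plain least squares; larger lam constrains the weight more.
weights = {lam: ridge_weight(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
for lam, w in weights.items():
    print(f"lambda={lam}: w={w:.4f}")
```

The penalty pulls the weight toward zero as lam grows. In models with many parameters this shrinkage limits how closely the model can chase noise, at the cost of some bias, so lam is usually chosen on validation data.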

Underfitting:

Underfitting is just the opposite of overfitting. It occurs when a machine learning model
cannot capture the pattern in the training data, so it produces inaccurate results on both
the training data and new data.

Underfitting occurs when our model is too simple to understand the base structure of the data,
just like an undersized pair of pants. This generally happens when we have limited data in
the data set, or when we try to fit a linear model to non-linear data. In such scenarios,
the model's rules are too simple for the data set, and the model starts making wrong
predictions.

Methods to reduce underfitting:

o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints on the model.
o Increase the number of epochs to get better results.
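
The effect of fitting a linear model to non-linear data, and of adding a better feature, can be seen in a small sketch. The data (exactly y = x^2) and the helper names fit_weight and mse are assumptions chosen to keep the arithmetic easy to follow.

```python
def fit_weight(features, ys):
    """Least-squares weight for y ~ w * feature (no intercept)."""
    return sum(f * y for f, y in zip(features, ys)) / sum(f * f for f in features)


def mse(features, ys, w):
    """Mean squared error of the predictions w * feature."""
    return sum((y - w * f) ** 2 for f, y in zip(features, ys)) / len(ys)


# Made-up non-linear data: y = x^2.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]

# Too simple: a straight line through the origin cannot follow the curve.
w_linear = fit_weight(xs, ys)
linear_error = mse(xs, ys, w_linear)

# Better feature: give the model x^2 to work with.
squared = [x * x for x in xs]
w_quadratic = fit_weight(squared, ys)
quadratic_error = mse(squared, ys, w_quadratic)

print(linear_error, quadratic_error)
```

The linear model has a large error even on its own training data, which is the signature of underfitting; with the squared feature the same fitting procedure reaches zero error.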

5. Irrelevant features

Although machine learning models are intended to give the best possible outcome, if we feed
garbage data as input, then the result will also be garbage. Hence, we should use relevant
features in our training sample. A machine learning model is more likely to perform well if
its training data has a good set of features with few or no irrelevant ones.
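
One simple, rough way to screen for irrelevant features is to check how strongly each feature correlates with the target. The sketch below is an assumption-laden illustration: the feature names, the housing-style data, and the 0.3 threshold are all made up, and Pearson correlation only detects linear relationships, so a low score is a hint rather than proof of irrelevance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: price depends on area, not on the listing ID digit.
features = {
    "area_sq_m": [50.0, 60.0, 80.0, 100.0, 120.0],
    "listing_id_last_digit": [2.0, 1.0, 3.0, 1.0, 2.0],
}
price = [150.0, 180.0, 240.0, 300.0, 360.0]

threshold = 0.3  # arbitrary cut-off chosen for this illustration
relevant = [
    name
    for name, values in features.items()
    if abs(pearson(values, price)) >= threshold
]
print(relevant)
```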

6. Offline Learning & Deployment of the model

Simplifying model deployment and machine learning operations (MLOps)

The challenge: Deploying and managing ML models can be a complex and time-consuming
process. ML models need to be deployed to a production environment, monitored for
performance, and updated as needed. This process, often referred to as MLOps, can be
challenging and requires significant resources.

7. Choosing the right production requirements for machine learning solutions

One of the biggest challenges in developing and deploying ML solutions is choosing the
right production requirements. Production requirements include factors such as data size,
processing speed, and security considerations. These requirements must be considered
carefully so that the ML solution performs well in the production environment.
