

R20-MACHINE LEARNING
Unit-1:
Unit I: Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning
Systems, Main Challenges of Machine Learning.

Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs
in Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator, Empirical Risk
Minimization.

The evolution of machine learning from 1950 is depicted in Figure 1.1:



INTRODUCTION- ARTIFICIAL INTELLIGENCE, MACHINE LEARNING, DEEP LEARNING

 Artificial Intelligence is the concept of creating intelligent machines.


 Machine Learning is a subset of artificial intelligence that helps you build AI-driven applications.
 Deep Learning is a subset of machine learning that uses vast volumes of data and complex
algorithms to train a model.

What is Artificial Intelligence?

“Artificial intelligence is the capability of a computer system to mimic human functions such as learning
and problem-solving. Through AI, a computer system uses maths and logic to simulate the reasoning that
people use to learn from new information and make decisions.”

“Artificial intelligence, commonly referred to as AI, is the process of imparting data, information, and
human intelligence to machines. The main goal of Artificial Intelligence is to develop self-reliant machines
that can think and act like humans. These machines can mimic human behavior and perform tasks by
learning and problem-solving. Most of the AI systems simulate natural intelligence to solve complex
problems.”

Let’s have a look at an example of an AI-driven product - Amazon Echo.


Capabilities of AI and machine learning


Companies in almost every industry are discovering new opportunities through the connection between AI
and machine learning. These are just a few capabilities that have become valuable in helping companies transform
their processes and products:
Predictive analytics
This capability helps companies predict trends and behavioural patterns by discovering cause-and-effect
relationships in data.
Recommendation engines
With recommendation engines, companies use data analysis to recommend products that someone might be
interested in.
Speech recognition and natural language understanding
Speech recognition enables a computer system to identify words in spoken language, and natural language
understanding recognizes meaning in written or spoken language.
Image and video processing
These capabilities make it possible to recognise faces, objects, and actions in images and videos, and implement
functionalities such as visual search.
Sentiment analysis
A computer system uses sentiment analysis to identify and categorise positive, neutral, and negative attitudes that
are expressed in text.

Types of Artificial Intelligence

Reactive Machines - These are systems that only react. These systems don’t form memories, and they don’t
use any past experiences for making new decisions.
Limited Memory - These systems reference the past, and information is added over a period of time. The
referenced information is short-lived.
Theory of Mind - This covers systems that are able to understand human emotions and how they affect
decision making. They are trained to adjust their behavior accordingly.
Self-awareness - These systems are designed and created to be aware of themselves. They understand their
own internal states, predict other people’s feelings, and act appropriately.

Examples of Applications of Artificial Intelligence:

 Machine Translation such as Google Translate


 Self Driving Vehicles such as Google’s Waymo
 AI Robots such as Sophia and Aibo
 Speech Recognition applications like Apple’s Siri or OK Google


Applications of AI and machine learning


Companies in several industries are building applications that take advantage of the connection between
artificial intelligence and machine learning. These are just a few ways that AI and machine learning are
helping companies transform their processes and products:
 Retail
Retailers use AI and machine learning to optimise their inventories, build recommendation engines, and
enhance the customer experience with visual search.
 Healthcare
Health organizations put AI and machine learning to use in applications such as image processing for
improved cancer detection and predictive analytics for genomics research.
 Banking and finance
In financial contexts, AI and machine learning are valuable tools for purposes such as detecting fraud,
predicting risk, and providing more proactive financial advice.
 Sales and marketing
Sales and marketing teams use AI and machine learning for personalised offers, campaign optimisation,
sales forecasting, sentiment analysis, and prediction of customer churn.
 Cyber security
AI and machine learning are powerful weapons for cybersecurity, helping organisations protect
themselves and their customers by detecting anomalies.
 Customer service
Companies in a wide range of industries use chatbots and cognitive search to answer questions, gauge
customer intent, and provide virtual assistance.
 Transportation
AI and machine learning are valuable in transportation applications, where they help companies improve
the efficiency of their routes and use predictive analytics for purposes such as traffic forecasting.
 Manufacturing
Manufacturing companies use AI and machine learning for predictive maintenance and to make their
operations more efficient than ever.

An introduction to Machine Learning


Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence,
coined the term “Machine Learning ” in 1959 while at IBM. He defined machine learning as “the field of study
that gives computers the ability to learn without being explicitly programmed “. However, there is no universally
accepted definition for machine learning. Different authors define the term differently. We give below two more
definitions.
What is Machine Learning?
“Machine learning is an application of AI. It’s the process of using mathematical models of data to help a
computer learn without direct instruction. This enables a computer system to continue learning and improving on
its own, based on experience.”
Machine learning is programming computers to optimize a performance criterion using example data or
past experience. We have a model defined up to some parameters, and learning is the execution of a computer
program to optimize the parameters of the model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain knowledge from data.
The field of study known as machine learning is concerned with the question of how to construct computer
programs that automatically improve with experience.


Definition of learning: A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Examples

1. Handwriting recognition learning problem


Task T: Recognizing and classifying handwritten words within images
Performance P: Percent of words correctly classified
Training experience E: A dataset of handwritten words with given classifications
2. A robot driving learning problem
Task T: Driving on highways using vision sensors
Performance P: Average distance travelled before an error
Training experience E: A sequence of images and steering commands recorded while observing a human
driver
Definition: A computer program which learns from experience is called a machine learning program or simply a
learning program.
How Does Machine Learning Work?
Machine learning accesses vast amounts of data (both structured and unstructured) and learns from it to
predict the future. It learns from the data by using multiple algorithms and techniques. Below is a diagram
that shows how a machine learns from data.

Why Use Machine Learning?


Consider how you would write a spam filter using traditional programming techniques (see the figure below):
1. First you would look at what spam typically looks like. You might notice that some words or phrases
(such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would
also notice a few other patterns in the sender’s name, the email’s body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would
flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.


Figure : The traditional approach

Since the problem is not trivial, your program will likely become a long list of complex rules, which is pretty hard
to maintain.
In contrast, a spam filter based on Machine Learning techniques automatically learns which words and
phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam
examples compared to the ham examples (see the figure below). The program is much shorter, easier to maintain,
and most likely more accurate.

Figure : Machine Learning approach


Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing
“For U” instead. A spam filter using traditional programming techniques would need to be updated to flag
“For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules
forever.


In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has
become unusually frequent in spam flagged by users, and it starts flagging them without your intervention.

Figure : Automatically adapting to change


Finally, Machine Learning can help humans learn (Below Figure ): ML algorithms can be inspected to see
what they have learned (although for some algorithms this can be tricky). For instance, once the spam filter
has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of
words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or
new trends, and thereby lead to a better understanding of the problem.
Applying ML techniques to dig into large amounts of data can help discover patterns that were not
immediately apparent. This is called data mining.

Figure: Machine Learning can help humans learn


Machine Learning is great for:
 Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one
Machine Learning algorithm can often simplify code and perform better.
 Complex problems for which there is no good solution at all using a traditional approach: the best
Machine Learning techniques can find a solution.
 Fluctuating environments: a Machine Learning system can adapt to new data.

 Getting insights about complex problems and large amounts of data.

TYPES OF MACHINE LEARNING (or)


CLASSIFICATION OF MACHINE LEARNING
Machine learning implementations are classified into four major categories, depending on the nature of the
learning “signal” or “response” available to a learning system which are as follows:
A. Supervised learning:
Supervised learning is the machine learning task of learning a function that maps an input to an output based on
example input-output pairs. The given data is labeled. Both classification and regression problems are supervised
learning problems.
Example - Consider the following data regarding patients entering a clinic. The data consists of the gender and
age of the patients, and each patient is labeled as “healthy” or “sick”.

gender age label


M 48 sick
M 67 sick
F 53 healthy
M 49 sick
F 32 healthy
M 34 healthy
M 21 healthy

Supervised learning:
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels

Figure : A labeled training set for supervised learning (e.g., spam classification)

A typical supervised learning task is classification. The spam filter is a good example of this: it is trained
with many example emails along with their class (spam or ham), and it must learn how to classify new
emails.
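
A minimal sketch of this idea in Python, assuming scikit-learn is available; the numeric gender encoding and the choice of a decision tree are illustrative, not prescribed by the text:

    # Supervised-learning sketch on the labeled patient table above.
    # Assumes scikit-learn is installed; the encoding and model choice are illustrative.
    from sklearn.tree import DecisionTreeClassifier

    # Features: gender encoded as 0 = F, 1 = M; the second column is age.
    X = [[1, 48], [1, 67], [0, 53], [1, 49], [0, 32], [1, 34], [1, 21]]
    y = ["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy"]

    model = DecisionTreeClassifier().fit(X, y)    # learn from the labeled examples
    print(model.predict([[1, 60], [0, 25]]))      # predict labels for new patients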

B. Unsupervised learning:
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of
input data without labeled responses. In unsupervised learning algorithms, classification or categorization is not
included in the observations. Example: Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients.

gender age
M 48
M 67
F 53
M 49
F 34
M 21

As a kind of learning, it resembles the methods humans use to figure out that certain objects or events are from the
same class, such as by observing the degree of similarity between objects. Some recommendation systems that you
find on the web in the form of marketing automation are based on this type of learning.
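
A minimal clustering sketch, assuming scikit-learn is available; grouping the unlabeled patients by age into two clusters is an illustrative choice:

    # Unsupervised-learning sketch: cluster the unlabeled patients by age.
    # Assumes scikit-learn is installed; the number of clusters is an illustrative choice.
    from sklearn.cluster import KMeans

    ages = [[48], [67], [53], [49], [34], [21]]    # no labels, only a feature
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ages)
    print(kmeans.labels_)                          # cluster assignment for each patient
    print(kmeans.cluster_centers_)                 # average age of each cluster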

C.Reinforcement learning:
Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards.
A learner is not told what actions to take as in most forms of machine learning but instead must discover
which actions yield the most reward by trying them. For example, consider teaching a dog a new trick: we
cannot tell it what to do or what not to do, but we can reward or punish it if it does the right or wrong thing.
A reinforcement learning program is similarly clumsy and unskilled at first, but it steadily improves
with training.
Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward in a
particular situation. It is employed by various software and machines to find the best possible behavior or path it
should take in a specific situation. Reinforcement learning differs from supervised learning in a way that in
supervised learning the training data has the answer key with it so the model is trained with the correct answer
itself whereas in reinforcement learning, there is no answer but the reinforcement agent decides what to do to
perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

Example: The problem is as follows: We have an agent and a reward, with many hurdles in between. The agent is
supposed to find the best possible path to reach the reward. The following example illustrates the problem.

The image above shows a robot, a diamond, and fire. The goal of the robot is to get the reward, which is the
diamond, while avoiding the hurdles, which are the fire. The robot learns by trying all the possible paths and then
choosing the path which gives it the reward with the fewest hurdles. Each right step gives the robot a reward and
each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward,
the diamond.
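
A minimal tabular Q-learning sketch in Python; the one-dimensional grid, rewards, and hyperparameters below are illustrative assumptions rather than part of the text. The agent starts at one end, the "diamond" sits at the other, and trial and error teaches it which action to take in each state:

    # Tabular Q-learning sketch on a tiny 1-D grid (illustrative assumptions throughout).
    import random

    N_STATES = 5          # cells 0..4; the reward ("diamond") sits in cell 4
    ACTIONS = [-1, +1]    # move left or move right
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.1

    for episode in range(500):
        state = 0
        while state != N_STATES - 1:
            # epsilon-greedy: usually exploit the best known action, sometimes explore
            a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
            next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else -0.01   # small cost per step
            # Q-learning update rule
            Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
            state = next_state

    # After training, the greedy policy should move right toward the reward in every state.
    print([Q[s].index(max(Q[s])) for s in range(N_STATES - 1)])    # typically [1, 1, 1, 1]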

D. Semi-supervised learning:
In semi-supervised learning, an incomplete training signal is given: a training set with some (often many) of the
target outputs missing. There is a special case of this principle known as transduction, where the entire set of
problem instances is known at learning time, except that part of the targets are missing. Semi-supervised learning
is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled
data during training. Semi-supervised learning falls between unsupervised learning and supervised learning.
Example:
Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your
family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11,
while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm
(clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and
it is able to name everyone in every photo, which is useful for searching photos.

Figure : Semi-supervised learning
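
A minimal semi-supervised sketch, assuming scikit-learn is available; most points below are unlabeled (marked -1) and a label-propagation model spreads the two known labels to them. The toy data is purely illustrative:

    # Semi-supervised sketch: a few labeled points, many unlabeled ones (label -1).
    # Assumes scikit-learn is installed; the toy data is illustrative.
    from sklearn.semi_supervised import LabelPropagation

    X = [[1.0], [1.1], [0.9], [2.0], [2.1], [1.9]]    # two natural clusters
    y = [0, -1, -1, 1, -1, -1]                         # only one label is known per cluster

    model = LabelPropagation().fit(X, y)
    print(model.transduction_)                         # inferred labels for all six points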

Introduction to Deep Learning

What is Deep Learning?


“Deep learning is a branch of machine learning which is completely based on artificial neural networks; since a
neural network mimics the human brain, deep learning is also a kind of mimicry of the human brain. In deep
learning, we do not need to explicitly program everything.”

Types of Deep Neural Networks

DNN and ANN :- Deep Learning is a subset of Machine Learning that is based on artificial neural
networks (ANNs) with multiple layers, also known as deep neural networks (DNNs). These neural networks are
inspired by the structure and function of the human brain, and they are designed to learn from large amounts of
data in an unsupervised or semi-supervised manner.

Deep Learning models are able to automatically learn features from the data, which makes them well-
suited for tasks such as image recognition, speech recognition, and natural language processing.

Common architectures:- The most widely used architectures in deep learning are feedforward neural networks
(FNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
FNN:- Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image classification, speech
recognition, and natural language processing.
CNN:- Convolutional Neural Networks (CNNs) are a special type of FNNs designed specifically for
image and video recognition tasks. CNNs are able to automatically learn features from the images, which makes
them well-suited for tasks such as image classification, object detection, and image segmentation.

The major difference between deep learning and machine learning is the way data is presented to the
machine. Machine learning algorithms usually require structured data, whereas deep learning networks work on
multiple layers of artificial neural networks.

This is what a simple neural network looks like:

The network has an input layer that accepts inputs from the data. The hidden layer is used to find any hidden
features from the data. The output layer then provides the expected output.

Recurrent Neural Networks (RNNs) are a type of neural networks that are able to process sequential data,
such as time series and natural language. RNNs are able to maintain an internal state that captures information
about the previous inputs, which makes them well-suited for tasks such as speech recognition, natural language
processing, and language translation.
The human brain contains approximately 100 billion neurons, and each neuron is connected to thousands of its
neighbours. The question is how we recreate these neurons in a computer. We create an artificial structure called
an artificial neural network, where we have nodes or neurons. We have some neurons for input values and some
for output values, and in between there may be many interconnected neurons in the hidden layers.

Difference between Machine Learning and Deep Learning :


Machine Learning                                        Deep Learning

Works on a small amount of data for accuracy.           Works on a large amount of data.
Dependent on low-end machines.                          Heavily dependent on high-end machines.
Divides the task into sub-tasks, solves them            Solves the problem end to end.
individually and finally combines the results.
Takes less time to train.                               Takes a longer time to train.
Testing time may increase.                              Takes less time to test the data.

Here is an example of a neural network that uses large sets of unlabeled data of eye
retinas.
The network model is trained on this data to find out whether or not a person has diabetic retinopathy.

How Does Deep Learning Work?


1. Calculate the weighted sums.
2. The calculated sum of weights is passed as input to the activation function.
3. The activation function takes the “weighted sum of input” as the input to the function, adds a bias,
and decides whether the neuron should be fired or not.
4. The output layer gives the predicted output.
5. The model output is compared with the actual output. After training the neural network, the model
uses the backpropagation method to improve the performance of the network. The cost function helps
to reduce the error rate.
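
A minimal sketch of steps 1-3 for a single artificial neuron, assuming NumPy is available; the weights, bias, and sigmoid activation are illustrative choices:

    # Single-neuron sketch of steps 1-3 above (illustrative weights, bias, and activation).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))     # activation: squashes the weighted sum into (0, 1)

    x = np.array([0.5, -1.2, 3.0])           # inputs
    w = np.array([0.4, 0.1, -0.6])           # weights
    b = 0.05                                  # bias

    weighted_sum = np.dot(w, x)               # step 1: weighted sum of the inputs
    output = sigmoid(weighted_sum + b)        # steps 2-3: add the bias, apply the activation
    print(output)                             # the neuron "fires" strongly when this is near 1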

Applications of Deep Learning


Automatic Text Generation – A corpus of text is learned, and from this model new text is generated word-by-word
or character-by-character. The model is capable of learning how to spell, punctuate and form sentences, and it
may even capture the style.
Healthcare – Helps in diagnosing various diseases and treating them.
Automatic Machine Translation – Certain words, sentences or phrases in one language are transformed into
another language (deep learning is achieving top results in the areas of text and images).
Image Recognition – Recognizes and identifies people and objects in images, as well as understanding content
and context. This area is already being used in gaming, retail, tourism, etc.
Predicting Earthquakes – Teaches a computer to perform the viscoelastic computations which are used in predicting
earthquakes.
Deep learning has a wide range of applications in various fields such as computer vision, speech recognition,
natural language processing, and many more. Some of the most common applications include:
Image and Video Recognition: Deep learning models are used to automatically classify images and videos,
detect objects, and identify faces. Applications include image and video search engines, self-driving cars, and
surveillance systems.
Speech Recognition: Deep learning models are used to transcribe and translate speech in real time, which is used
in voice-controlled devices, such as virtual assistants, and in accessibility technology for people with hearing
impairments.
Natural Language Processing: Deep learning models are used to understand, generate and translate human
languages. Applications include machine translation, text summarization, and sentiment analysis.
Robotics: Deep learning models are used to control robots and drones, and to improve their ability to perceive and
interact with the environment.
Healthcare: Deep learning models are used in medical imaging to detect diseases, in drug discovery to identify
new treatments, and in genomics to understand the underlying causes of diseases.
Finance: Deep learning models are used to detect fraud, predict stock prices, and analyze financial data.
Gaming: Deep learning models are used to create more realistic characters and environments, and to improve the
gameplay experience.
Recommender Systems: Deep learning models are used to make personalized recommendations to users, such as
product recommendations, movie recommendations, and news recommendations.
Social Media: Deep learning models are used to identify fake news, to flag harmful content and to filter out spam.
Autonomous systems: Deep learning models are used in self-driving cars, drones, and other autonomous systems
to make decisions based on sensor data.

Types of Machine Learning Systems


Main Challenges of Machine Learning

In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go
wrong are “bad algorithm” and “bad data.” Let’s start with examples of bad data.

1. Insufficient Quantity of Training Data


For a toddler to learn what an apple is, all it takes is for you to point to an apple and say “apple” (possibly
repeating this procedure a few times). Now the child is able to recognize apples in all sorts of colors and shapes.
Genius.
Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work
properly. Even for very simple problems you typically need thousands of examples, and for complex problems
such as image or speech recognition you may need millions of examples (unless you can reuse parts of an existing
model).
The Unreasonable Effectiveness of Data: In a famous paper published in 2001, Microsoft researchers Michele
Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones,
performed almost identically well on a complex problem of natural language disambiguation once they were
given enough data (as you can see in Figure 1-20).


Figure 1-20. The importance of data versus algorithms


As the authors put it: “these results suggest that we may want to reconsider the tradeoff between spending time
and money on algorithm development versus spending it on corpus development.”

2. Nonrepresentative Training Data


In order to generalize well, it is crucial that your training data be representative of the new cases you want to
generalize to. This is true whether you use instance-based learning or model-based learning.
For example, the set of countries we used earlier for training the linear model was not perfectly representative; a
few countries were missing. Figure 1-21 shows what the data looks like when you add the missing countries.

Figure 1-21. A more representative training sample


If you train a linear model on this data, you get the solid line, while the old model is represented by the dotted line.
As you can see, not only does adding a few missing countries significantly alter the model, but it makes it clear
that such a simple linear model is probably never going to work well. It seems that very rich countries are not
happier than moderately rich countries (in fact they seem unhappier), and conversely some poor countries seem
happier than many rich countries.
3. Poor-Quality Data
Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will
make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is
often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a
significant part of their time doing just that. For example:
 If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.
 If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you
must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing
values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
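
As a minimal sketch of the missing-value and outlier cases above, assuming pandas is available; the column names, values, and outlier rule are illustrative:

    # Data-cleaning sketch for the cases above (illustrative columns, values, and rules).
    import pandas as pd

    df = pd.DataFrame({"age": [25, 31, None, 47, 230, 39],    # None = missing, 230 = clear outlier
                       "income": [30e3, 42e3, 35e3, 58e3, 61e3, 44e3]})

    df = df[df["age"].isna() | (df["age"] < 120)]              # discard the obvious outlier
    df["age"] = df["age"].fillna(df["age"].median())           # fill missing ages with the median
    print(df)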


4. Irrelevant Features
As the saying goes: garbage in, garbage out. Your system will only be capable of learning if the training data
contains enough relevant features and not too many irrelevant ones. A critical part of the success of a Machine
Learning project is coming up with a good set of features to train on. This process, called feature engineering,
involves:
 Feature selection: selecting the most useful features to train on among existing features.
 Feature extraction: combining existing features to produce a more useful one (as we saw earlier,
dimensionality reduction algorithms can help).
 Creating new features by gathering new data.
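
A minimal feature-selection sketch, assuming scikit-learn is available; the synthetic data and the choice of k = 2 are illustrative:

    # Feature-selection sketch: keep only the most informative features.
    # Assumes scikit-learn is installed; the synthetic data and k = 2 are illustrative.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)                           # binary target
    informative = np.column_stack([y + rng.normal(0, 0.3, 200),
                                   2 * y + rng.normal(0, 0.3, 200)])
    noise = rng.normal(size=(200, 3))                          # three irrelevant features
    X = np.hstack([informative, noise])

    selector = SelectKBest(f_classif, k=2).fit(X, y)
    print(selector.get_support())                              # the two informative columns should be kept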
5. Overfitting the Training Data
Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi
drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and
unfortunately machines can fall into the same trap if we are not careful. In Machine Learning this is called
overfitting: it means that the model performs well on the training data, but it does not generalize well.
Figure 1-22 shows an example of a high-degree polynomial life satisfaction model that strongly overfits the
training data. Even though it performs much better on the training data than the simple linear model, would you
really trust its predictions?

Figure 1-22. Overfitting the training data


Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The
possible solutions are:
 To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-
degree polynomial model), by reducing the number of attributes in the training data or by constraining the
model
 To gather more training data
 To reduce the noise in the training data (e.g., fix data errors and remove outliers)
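
A minimal overfitting sketch, assuming NumPy and scikit-learn are available; the synthetic data, the two polynomial degrees, and the seed are illustrative:

    # Overfitting sketch: a simple model vs. an over-complex polynomial on the same data.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X_train = np.linspace(0, 1, 15).reshape(-1, 1)
    y_train = 2 * X_train.ravel() + rng.normal(0, 0.2, 15)     # truly linear data plus noise
    X_test = rng.uniform(0, 1, 200).reshape(-1, 1)
    y_test = 2 * X_test.ravel() + rng.normal(0, 0.2, 200)

    for degree in (1, 12):                                     # simple vs. over-complex model
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
        train_mse = np.mean((model.predict(X_train) - y_train) ** 2)
        test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

On most random draws the degree-12 model pushes the training error below that of the linear model while its test error is higher, which is the overfitting pattern described above; constraining the model (lowering the degree or adding regularization) is the corresponding remedy.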

6. Underfitting the Training Data


As you might guess, underfitting is the opposite of overfitting: it occurs when your model is too simple to learn
the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfit; reality is
just more complex than the model, so its predictions are bound to be inaccurate, even on the training examples.
The main options to fix this problem are:
 Selecting a more powerful model, with more parameters
 Feeding better features to the learning algorithm (feature engineering)
 Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
7. Stepping Back
By now you already know a lot about Machine Learning. However, we went through so many concepts that you
may be feeling a little lost, so let’s step back and look at the big picture:
• Machine Learning is about making machines get better at some task by learning from data, instead of having to
explicitly code rules.
• There are many different types of ML systems: supervised or not, batch or online, instance-based or model-
based, and so on.
• In an ML project you gather data in a training set, and you feed the training set to a learning algorithm. If the
algorithm is model-based it tunes some parameters to fit the model to the training set (i.e., to make good
predictions on the training set itself), and then hopefully it will be able to make good predictions on new cases as
well. If the algorithm is instance-based, it just learns the examples by heart and generalizes to new instances by
comparing them to the learned instances using a similarity measure.
• The system will not perform well if your training set is too small, or if the data is not representative, noisy, or
polluted with irrelevant features (garbage in, garbage out). Lastly, your model needs to be neither too simple (in
which case it will underfit) nor too complex (in which case it will overfit).

Statistical Learning: Introduction


There are two major goals for modeling data:
1) To accurately predict some future quantity of interest, given some observed data, and
2) To discover unusual or interesting patterns in the data. To achieve these goals, one must rely on knowledge
from three important pillars of the mathematical sciences.
Function approximation:- Building a mathematical model for data usually means understanding how one
data variable depends on another data variable. The most natural way to represent the relationship between
variables is via a mathematical function or map. We usually assume that this mathematical function is not
completely known, but can be approximated well given enough computing power and data. Thus, data
scientists have to understand how best to approximate and represent functions using the least amount of
computer processing and memory.
Optimization:- Given a class of mathematical models, we wish to find the best possible model in that class.
This requires some kind of efficient search or optimization procedure. The optimization step can be viewed as
a process of fitting or calibrating a function to observed data. This step usually requires knowledge of
optimization algorithms and efficient computer coding or programming.
Probability and Statistics: In general, the data used to fit the model is viewed as a realization of a random
process or numerical vector, whose probability law determines the accuracy with which we can predict
future observations. Thus, in order to quantify the uncertainty inherent in making predictions about the
future, and the sources of error in the model, data scientists need a firm grasp of probability theory and
statistical inference.


Supervised and Unsupervised Learning:

Feature and response: Given an input or feature vector x, one of the main goals of machine learning is
to predict an output or response variable y.

For example,
 x could be a digitized signature and y a binary variable that indicates whether the signature is genuine or
false.
 x represents the weight and smoking habits of an expecting mother and y the birth weight of the baby.

Prediction function: a function g which takes as input x and outputs a guess g(x) for y (denoted by ŷ).

Regression: the response variable y can take any real value.


Classification: when y can only lie in a finite set, say y ∈ {0, . . . , c − 1}, predicting y is conceptually the same as
classifying the input x into one of c categories, and so prediction becomes a classification problem.

Loss function: We can measure the accuracy of a prediction ŷ with respect to a given response y by using some
loss function Loss(y, ŷ). In a regression setting the usual choice is the squared-error loss

    Loss(y, ŷ) = (y − ŷ)².

In the case of classification, the zero–one (also written 0–1) loss function Loss(y, ŷ) = 1{y ≠ ŷ} is often
used, which incurs a loss of 1 whenever the predicted class ŷ is not equal to the class y.
We will also encounter various other useful loss functions, such as the cross-entropy and hinge loss functions.

Error is often used as a measure of distance between a “true” object y and some approximation ŷ thereof. If
y is real-valued, the absolute error |y − ŷ| and the squared error (y − ŷ)² are both well-established error
concepts, as are the norm ||y − ŷ|| and squared norm ||y − ŷ||² for vectors. The squared error (y − ŷ)² is just
one example.
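
A minimal sketch of the two loss functions above in Python (the example values are illustrative):

    # Squared-error and zero-one losses as defined above.
    def squared_error_loss(y, y_hat):
        return (y - y_hat) ** 2               # regression loss

    def zero_one_loss(y, y_hat):
        return float(y != y_hat)              # classification loss: 1 if the predicted class is wrong

    print(squared_error_loss(3.0, 2.5))       # 0.25
    print(zero_one_loss("spam", "ham"))       # 1.0
    print(zero_one_loss("spam", "spam"))      # 0.0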


Supervised Learning: One tries to learn the functional relationship between the feature vector x and
response y in the presence of a teacher who provides n examples. It is common to speak of “explaining” or
predicting y on the basis of explanatory x, where x is a vector of explanatory variables.
An example of supervised learning is email spam detection.

Unsupervised learning: This learning makes no distinction between response and explanatory variables, and the
objective is simply to learn the structure of the unknown distribution of the data. In other words, we need to
learn f(x). In this case the guess g(x) is an approximation of f(x) and the risk is of the form ℓ(g) = E Loss(f(X), g(X)).

Training and Test Loss:


Given an arbitrary prediction function g, it is typically not possible to compute its risk ℓ(g) = E Loss(Y, g(X)).
However, using the training sample τ = {(x1, y1), . . . , (xn, yn)}, we can approximate it via the empirical
(sample average) risk

    ℓτ(g) = (1/n) Σ Loss(yi, g(xi)),   summing over i = 1, . . . , n,

which we call the training loss. The training loss is thus an unbiased estimator of the risk (the expected loss)
for a fixed prediction function g, based on the training data.

To approximate the optimal prediction function g* (the minimizer of the risk), we first select a suitable
collection of approximating functions G and then take our learner gτ to be the function in G that minimizes the
training loss; that is,

    gτ = argmin over g in G of ℓτ(g).

The prediction accuracy on new pairs of data is measured by the generalization risk of the learner. For a
fixed training set τ it is defined as

    ℓ(gτ) = E Loss(Y, gτ(X)),

where the expectation is taken over a new pair (X, Y) drawn independently of the training set.

Figure: The generalization risk for a fixed training set is the weighted-average loss over all possible pairs (x,
y).

Figure: The expected generalization risk is the weighted-average loss over all possible pairs (x, y) and over
all training sets.

For any outcome τ of the training data, we can estimate the generalization risk without bias by taking the
sample average

    ℓτ′(gτ) = (1/n′) Σ Loss(y′i, gτ(x′i)),   summing over the n′ test pairs,

which is called the test loss. Here τ′ = {(x′1, y′1), . . . , (x′n′, y′n′)} is a so-called test sample. The test sample is
completely separate from τ, but is drawn in the same way as τ.
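
A minimal sketch of the training loss and test loss with a squared-error loss, assuming NumPy and scikit-learn are available; the synthetic data and split are illustrative:

    # Training loss vs. test loss with the squared-error loss (illustrative synthetic data).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (100, 1))
    y = 3 * X.ravel() + rng.normal(0, 0.5, 100)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    g = LinearRegression().fit(X_train, y_train)                # learner chosen by minimizing the training loss

    train_loss = np.mean((y_train - g.predict(X_train)) ** 2)   # empirical risk on the training sample
    test_loss = np.mean((y_test - g.predict(X_test)) ** 2)      # estimate of the generalization risk
    print(train_loss, test_loss)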

Tradeoffs in Statistical Learning:


To understand the relation between model complexity, computational simplicity, and estimation accuracy, it is
useful to decompose the generalization risk into several parts, so that the tradeoffs between these parts can be
studied.

We will consider two such decompositions: the approximation–estimation tradeoff and the bias–variance
tradeoff.
We can decompose the generalization risk into the following three components:

    generalization risk = irreducible risk + approximation error + statistical error,

where the irreducible risk is the risk of the best possible prediction function, the approximation error is the extra
risk incurred by restricting the search to the class G, and the statistical (estimation) error is the extra risk incurred
because the learner is chosen using only a finite training set. The decomposition can now be interpreted as follows:
enlarging the class G reduces the approximation error but tends to increase the statistical error, which is the
approximation–estimation tradeoff.

Thus, when using a squared-error loss, the expected generalization risk can further be decomposed as:

    expected generalization risk = irreducible error + (bias)² + variance.

Note that in the first decomposition the statistical error is the only term that depends on the training set.
The errors in a machine learning model can be broken down into 2 parts:
1. Reducible Error
2. Irreducible Error

Irreducible errors are errors that cannot be reduced even if you use any other machine learning model.

Reducible error, on the other hand, can be further broken down into the square of the bias and the variance. It is
this bias-variance balance that causes a machine learning model to either overfit or underfit the given data.

What exactly is Bias?

Bias is the inability of a machine learning model to capture the true relationship between the data
variables. It is caused by the erroneous assumptions that are inherent to the learning algorithm. For example,
in linear regression, the relationship between the X and the Y variable is assumed to be linear, when in
reality the relationship may not be perfectly linear.
In general,
High Bias indicates more assumptions in the learning algorithm about the relationships between the
variables.
Less Bias indicates fewer assumptions in the learning algorithm.


What is the Variance Error?


This is nothing but the concept of the model overfitting a particular dataset. If the model learns to fit
very closely to the points in a particular dataset, then when it is used to predict on another dataset it may not
predict as accurately as it did on the first.

Variance is the difference in the fits between different datasets.

Generally, nonlinear machine learning algorithms like decision trees have a high variance. It is even higher
if the branches are not pruned during training.

Low-variance ML algorithms: Linear Regression, Logistic Regression, Linear Discriminant


Analysis.

High-variance ML algorithms: Decision Trees, K-NN, and Support Vector Machines.

Bias – Variance Tradeoff

Let’s summarize:
 If a model uses a simple machine learning algorithm, as in the case of a linear model, the model will
have high bias and low variance (it underfits the data).
 If a model uses a complex machine learning algorithm, it will have high variance and low bias (it
overfits the data).
 You need to find a good balance between the bias and variance of the model. This tradeoff in
complexity is what is referred to as the bias-variance tradeoff. With an optimal balance of bias and
variance, the model will neither overfit nor underfit.
 This tradeoff applies to all forms of supervised learning: classification, regression, and structured
output learning.
How to fix bias and variance problems?
Fixing High Bias
 Adding more input features will help the model fit the data better.
 Add more polynomial features to improve the complexity of the model.
 Decrease the regularization term to have a balance between bias and variance.
Fixing High Variance
 Reduce the number of input features; use only the features with the highest feature importance to reduce
overfitting of the data.
 Getting more training data will help in this case, because a high-variance model will not work well on an
independent dataset if you have very little data.
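
A minimal Monte Carlo sketch of the bias-variance tradeoff, assuming NumPy is available: the same model class is refit on many independent training sets, and the squared bias and variance of its prediction at one point are estimated. The data-generating function, sample size, and polynomial degrees are illustrative assumptions:

    # Bias-variance sketch: refit a simple and a flexible model on many training sets.
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):                                    # "true" relationship (illustrative)
        return np.sin(2 * np.pi * x)

    x0 = 0.3                                     # point at which the error is studied

    for degree in (1, 7):                        # simple vs. flexible polynomial model
        preds = []
        for _ in range(500):                     # many independent training sets
            x = rng.uniform(0, 1, 20)
            y = f(x) + rng.normal(0, 0.3, 20)
            coefs = np.polyfit(x, y, degree)     # least-squares polynomial fit
            preds.append(np.polyval(coefs, x0))
        preds = np.array(preds)
        bias2 = (preds.mean() - f(x0)) ** 2      # squared bias at x0
        var = preds.var()                        # variance at x0
        print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")

Typically the simple model shows high squared bias and low variance, and the flexible model the reverse, which is the tradeoff summarized above.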

Estimating Risk Statistics:


Different methods of estimating risk measures:
1. In-Sample Risk
2. Cross-Validation

1. In-Sample Risk: Due to the phenomenon of overfitting, the training loss of the learner is not a good estimate of
the generalization risk of the learner.

2. Cross-Validation:

The idea is to make multiple identical copies of the data set, and to partition each copy into different
training and test sets, as illustrated in Below Figure. Here, there are four copies of the data set (consisting of
response and explanatory variables). Each copy is divided into a test set (colored blue) and training set
(colored pink). For each of these sets, we estimate the model parameters using only training data and then
predict the responses for the test set. The average loss between the predicted and observed responses is then
a measure for the predictive power of the model.


Figure: An illustration of four-fold cross-validation, representing four copies of the same data set. The data
in each copy is partitioned into a training set (pink) and a test set (blue). The darker columns represent the
response variable and the lighter ones the explanatory variables.
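
A minimal four-fold cross-validation sketch to match the figure above, assuming NumPy and scikit-learn are available; the synthetic data is illustrative:

    # Four-fold cross-validation sketch (illustrative synthetic data).
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (80, 2))                              # explanatory variables
    y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.3, 80)      # response variable

    # Each fold is trained on 3/4 of the data and evaluated on the remaining 1/4.
    scores = cross_val_score(LinearRegression(), X, y, cv=4,
                             scoring="neg_mean_squared_error")
    print(-scores)            # per-fold test losses
    print(-scores.mean())     # cross-validation estimate of the model's predictive power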

Sampling distribution of an estimator:


In statistics, the sampling distribution of an estimator is the probability distribution of that statistic, computed on
the basis of a random sample. It provides a general basis for statistical inference. An estimator is the mathematical
rule used to calculate a sample statistic; an estimate is the result of the estimation.

The sampling distribution of an estimator depends on the sample size, so the effect of changing the sample size
has to be determined. An estimate that takes a single numerical value is called a point estimate.
There are various estimators, such as the sample mean, sample standard deviation, proportion, variance, and range.

Sampling distribution of the mean: This is the distribution of the sample mean over repeated samples drawn from
the population. For any sample size, it is normal if the population distribution is normal. The population mean is
equal to the mean of the sampling distribution of the mean. The sampling distribution of the mean has standard
deviation

    σx̄ = σ / √n,

where σx̄ is the standard deviation of the sampling mean, σ is the population standard deviation and n is
the sample size.
As the size of the sample increases, the spread of the sampling distribution of the mean decreases. But the
mean of the distribution remains the same and it is not affected by the sample size.

The standard deviation of the sampling distribution of the sample standard deviation is the standard error of the
standard deviation. For a normal population it is approximately

    σs ≈ σ / √(2n).

The sampling distribution of the standard deviation is positively skewed for
small n, but it becomes approximately normal for sample sizes greater than 30.
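
A minimal simulation sketch, assuming NumPy is available, checking that the spread of the sample mean shrinks roughly like σ/√n; the population parameters are illustrative:

    # Simulate the sampling distribution of the mean for two sample sizes.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 2.0                        # illustrative population mean and standard deviation

    for n in (25, 100):
        means = rng.normal(mu, sigma, size=(20000, n)).mean(axis=1)   # 20,000 sample means
        print(f"n={n}: mean of sample means = {means.mean():.3f}, "
              f"std of sample means = {means.std():.3f}, sigma/sqrt(n) = {sigma / np.sqrt(n):.3f}")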

Empirical Risk Minimization:

Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family
of learning algorithms and is used to give theoretical bounds on their performance. The core idea is that we
cannot know exactly how well an algorithm will work in practice (the true "risk") because we don't know the
true distribution of data that the algorithm will work on, but we can instead measure its performance on a
known set of training data (the "empirical" risk).

In general, the risk R(h) cannot be computed because the distribution P(x, y) is unknown to the learning
algorithm (this situation is referred to as agnostic learning). However, we can compute an approximation,
called the empirical risk, by averaging the loss function over the training set; more formally, by computing the
expectation with respect to the empirical measure:

    Remp(h) = (1/n) Σ L(h(xi), yi),   averaging over the n training examples.

The empirical risk minimization principle states that the learning algorithm should choose a
hypothesis ĥ which minimizes the empirical risk:

    ĥ = argmin over h in H of Remp(h).

Thus the learning algorithm defined by the ERM principle consists in solving the
above optimization problem.
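
A minimal ERM sketch, assuming NumPy is available: the hypothesis class is the set of lines h(x) = a·x + b, the loss is the squared error, and plain gradient descent minimizes the empirical risk. The data, learning rate, and iteration count are illustrative:

    # ERM sketch: choose the line h(x) = a*x + b that minimizes the empirical risk
    # (average squared-error loss) over the training set, using gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 50)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 50)        # training set

    a, b, lr = 0.0, 0.0, 0.1
    for _ in range(2000):
        residual = (a * x + b) - y                     # h(x_i) - y_i
        emp_risk = np.mean(residual ** 2)              # R_emp(h) for the current hypothesis
        a -= lr * np.mean(2 * residual * x)            # gradient step on a
        b -= lr * np.mean(2 * residual)                # gradient step on b

    print(a, b, emp_risk)                              # a and b should approach 2 and 1

For richer hypothesis classes the same principle applies; only the parameterization and the optimizer change.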
