ML unit-1
UNIT-I
Introduction: Learning, Types of Machine Learning, Supervised Learning, The Brain and the
Neuron, Design a Learning System, Perspectives and Issues in Machine Learning, Concept
Learning Task, Concept Learning as Search, Finding a Maximally Specific Hypothesis,
Version Spaces and the Candidate Elimination Algorithm, Linear Discriminates, Perceptron,
Linear Separability, Linear Regression
Learning:
Getting better at some task through practice is called learning.
Machine Learning:
Machine Learning is the study of computer algorithms that allow computer programs
to automatically improve through experience.
Machine Learning is about making computers modify or adapt their actions so that these
actions get more accurate.
Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions.
Types of Machine Learning:
Supervised Learning:
A training set of examples with correct responses is provided; based on this, the
algorithm generalises so that it responds correctly to inputs it has not seen before. This is
called supervised learning.
Works on labelled data.
Unsupervised Learning:
Correct answers are not provided, but instead the algorithm tries to identify similarities
between the inputs so that inputs that have something in common are categorised
together.
Works on unlabelled data.
Reinforcement Learning:
Reinforcement learning works on a feedback-based process, in which an AI agent
automatically explores its surroundings by trial and error, taking actions, learning from
experience, and improving its performance.
The agent gets rewarded for each good action and punished for each bad action; hence
the goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents
learn from their experiences only.
Evolutionary Learning:
Biological evolution can be seen as a learning process.
Biological organisms adapt to improve their survival rates and chance of having
offspring in their environment.
R22 Machine Learning Lecture Notes
Uses a fitness score, which measures how good the current solution is.
Supervised Learning:
Generalization is the ability of the algorithm to produce sensible outputs for inputs
that were not encountered during learning.
Supervised machine learning can be classified into two types of problems:
o Classification
o Regression
Classification algorithms are used to solve classification problems in which the
output variable is categorical, such as "Yes" or "No".
The classification algorithms predict the categories present in the dataset.
Example: Spam Detection, Email Filtering
Regression algorithms are used to solve regression problems in which the output
variable is a continuous value that depends on the input variables.
These are used to predict continuous output variables, such as market trends, weather
prediction, etc.
The Machine Learning Process:
1. Data Collection and Preparation
Machine learning algorithms need significant amounts of data, preferably
without too much noise.
Data is often hard to collect because it sits in a variety of places and formats,
and merging it appropriately is difficult.
We should ensure that the data is clean, i.e., that it does not have significant errors
or missing values.
2. Feature Selection
It consists of identifying the features that are most useful for the problem
3. Choose Appropriate Algorithm
The knowledge of the underlying principles of each algorithm and examples of
their use is required.
4. Parameter and Model Selection
For many of the algorithms there are parameters that have to be set manually,
or that require experimentation to identify appropriate values.
5. Training
Given the dataset, algorithm, and parameters, training is simply the use
of computational resources to build a model.
6. Evaluation
Before a system can be deployed it needs to be tested and evaluated for accuracy
on data that it was not trained on.
The Brain and the Neuron:
In animals, learning occurs within the brain.
If we can understand how the brain works, then there might be things in there for us to
copy and use for our machine learning systems.
The brain is an impressively powerful and complicated system.
It deals with noisy and even inconsistent data, and produces answers that are usually
correct from very high dimensional data (such as images) very quickly.
It weighs about 1.5 kg and is losing parts of itself all the time (neurons die as you age
at impressive/depressing rates), but its performance does not degrade appreciably.
The processing unit of the brain is called the neuron (nerve cell).
Signals are received through the dendrites, and a signal is sent down the axon once
enough input signals have been received.
We can mimic most of this process with a function that receives a list of
weighted input signals and outputs a signal of its own if the sum of these weighted
inputs reaches a certain threshold.
Each neuron can be viewed as a separate processor, performing a very simple
computation: deciding whether or not to fire.
This makes the brain a massively parallel computer made up of about 10^11 processing
elements.
The inputs xi are multiplied by the weights wi, and the neurons sum their values.
If this sum is greater than the threshold then the neuron fires; otherwise it does not.
The McCulloch and Pitts model of the neuron therefore consists of:
o a set of weighted inputs wi that correspond to the synapses
o an adder that sums the input signals (equivalent to the membrane of the cell that
collects electrical charge)
o an activation function (initially a threshold function) that decides whether the neuron
fires for the current inputs.
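This weighted-sum-and-threshold behaviour can be sketched as a small function. The weights, inputs, and threshold values below are assumed purely for illustration:

```python
# A minimal sketch of the McCulloch and Pitts threshold neuron described above.

def mcp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Example: a 3-input neuron with assumed weights 0.5 and threshold 1.0
print(mcp_neuron([1, 0, 1], [0.5, 0.5, 0.5], 1.0))  # weighted sum 1.0 >= 1.0 -> fires (1)
print(mcp_neuron([1, 0, 0], [0.5, 0.5, 0.5], 1.0))  # weighted sum 0.5 <  1.0 -> 0
```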
Perspectives and Issues in Machine Learning:
What is the best way to reduce the learning task to one or more function approximation
problems?
What specific functions should the system attempt to learn? Can this process itself be
automated?
How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?
The answers to the above questions would resolve most of the issues in machine learning.
Specific issues in Machine Learning:
1. Inadequate Training Data
2. Poor quality of data
3. Non-representative training data
4. Overfitting and Underfitting
5. Monitoring and Maintenance
6. Getting Bad Recommendations
7. Lack of Skilled Resources
8. Customer Segmentation
9. Process Complexity of Machine Learning
10. Data Bias
Concept Learning Task:
Inferring a Boolean-valued function from training examples of its input and output is
called concept learning.
Example Problem: Days on which my friend enjoys his favourite water sport.
Table: Positive and negative training examples for the target concept EnjoySport

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The most general hypothesis – that every day is a positive example – is represented by
<?, ?, ?, ?, ?, ?>
The most specific hypothesis – that no day is a positive example – is represented by
<∅, ∅, ∅, ∅, ∅, ∅>
Finding a Maximally Specific Hypothesis (FIND-S):
Example: EnjoySport
The first step of the FIND-S algorithm is to initialize h to the most specific hypothesis
in H:
h ← <∅, ∅, ∅, ∅, ∅, ∅>
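The algorithm then generalises h just enough to cover each positive example, ignoring the negatives. A sketch in Python, assuming the standard four EnjoySport training examples from Mitchell's textbook:

```python
# FIND-S on the EnjoySport data. The four training examples are the standard
# ones from Mitchell's textbook (an assumption if the class used different data).

POS, NEG = "Yes", "No"
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   POS),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   POS),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), NEG),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), POS),
]

def find_s(examples):
    h = ["0"] * 6                  # "0" stands for the empty symbol in <0,0,0,0,0,0>
    for x, label in examples:
        if label != POS:           # FIND-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == "0":        # first positive example: copy the attribute value
                h[i] = value
            elif h[i] != value:    # value conflicts: generalise to '?'
                h[i] = "?"
    return h

print(find_s(examples))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```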
Version Spaces:
The version space represents the set of all hypotheses that are consistent with the
training examples.
It includes all hypotheses that correctly classify all the training examples as belonging
to their respective categories.
It is possible to represent the version space by two sets of hypotheses:
(1) the most specific consistent hypotheses
(2) the most general consistent hypotheses
The Candidate Elimination Algorithm:
The key idea in the Candidate Elimination Algorithm is to output a description of the
set of all hypotheses consistent with the training examples.
Surprisingly, this algorithm computes the description of the set without explicitly
enumerating all of its members.
Example: EnjoySport
Initially: G0 = {<?, ?, ?, ?, ?, ?>} (the most general boundary)
S0 = {<∅, ∅, ∅, ∅, ∅, ∅>} (the most specific boundary)
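The boundary updates can be sketched as follows. This is a simplified implementation for conjunctive hypotheses; the attribute domains and training examples are the standard EnjoySport ones, assumed here:

```python
# Candidate Elimination for conjunctive hypotheses on the EnjoySport data
# (a sketch; it omits the mutual-pruning of G/S members, which this data never needs).

domains = [
    ("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def matches(h, x):
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True if h1 is more general than or equal to h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

S = [("0",) * 6]                   # "0" stands for the empty symbol (matches nothing)
G = [("?",) * 6]

for x, positive in examples:
    if positive:
        G = [g for g in G if matches(g, x)]
        new_S = []
        for s in S:
            if matches(s, x):
                new_S.append(s)
            else:                  # minimally generalise s so that it covers x
                s = tuple(xv if sv == "0" else (sv if sv == xv else "?")
                          for sv, xv in zip(s, x))
                if any(more_general(g, s) for g in G):
                    new_S.append(s)
        S = new_S
    else:
        S = [s for s in S if not matches(s, x)]
        new_G = []
        for g in G:
            if not matches(g, x):
                new_G.append(g)
                continue
            for i, dom in enumerate(domains):   # minimal specialisations of g
                if g[i] != "?":
                    continue
                for v in dom:
                    if v != x[i]:
                        cand = g[:i] + (v,) + g[i + 1:]
                        if any(more_general(cand, s) for s in S):
                            new_G.append(cand)
        G = new_G

print("S:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G:", G)   # the hypotheses <Sunny,?,?,?,?,?> and <?,Warm,?,?,?,?>
```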
Linear Discriminants:
The method is based on discriminant functions that are estimated from a set of data
called the training set.
These discriminant functions are linear with respect to the characteristic vector, and
usually have the form
f(x) = w^T x + b0,
where w represents the weight vector, x the input vector, and b0 a threshold (bias).
Perceptron:
The Perceptron is nothing more than a collection of McCulloch and Pitts neurons
together with a set of inputs and some weights that connect the inputs to the neurons.
Notice that the neurons in the Perceptron are completely independent of each other:
It doesn’t matter to any neuron what the others are doing
Each neuron works out whether or not to fire by multiplying its own weights by the
inputs, summing the results, and comparing the sum to its own threshold.
The result is a pattern of firing and non-firing neurons, which looks like a vector of 0s
and 1s.
Example: (0,1,0,0,1)
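Since each neuron is independent, it is enough to sketch one. The example below trains a single perceptron neuron on the logical OR function; the learning rate, the bias-as-extra-input convention, and the standard perceptron update rule w ← w + η(t − y)x are assumptions of this sketch:

```python
# A minimal single-neuron perceptron trained on logical OR.

def train_perceptron(data, epochs=10, eta=0.25):
    w = [0.0, 0.0, 0.0]            # weights for [bias, x1, x2]
    for _ in range(epochs):
        for x1, x2, target in data:
            x = [-1, x1, x2]       # -1 is the fixed bias input
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # perceptron learning rule: w <- w + eta * (target - y) * x
            w = [wi + eta * (target - y) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x1, x2):
    return 1 if -w[0] + w[1] * x1 + w[2] * x2 > 0 else 0

or_data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w = train_perceptron(or_data)
print([predict(w, a, b) for a, b, _ in or_data])  # [0, 1, 1, 1]
```

OR is linearly separable, so the perceptron converges; XOR would not, which foreshadows the linear separability discussion below.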
Linear Separability:
Linear separability implies that if there are two classes then there will be a point, line,
plane, or hyperplane that splits the input features in such a way that all points of one
class are in one-half space and the second class is in the other half-space.
For example, consider selling a house based on area and price. We have a
number of data points, each with a class label: house Sold/Not Sold.
Linear Regression:
Linear regression is a type of supervised machine learning algorithm that computes the
linear relationship between the dependent variable and one or more independent
features by fitting a linear equation to observed data.
When there is only one independent feature, it is known as Simple Linear Regression,
and when there is more than one feature, it is known as Multiple Linear Regression.
Similarly, when there is only one dependent variable, it is considered Univariate Linear
Regression, while when there is more than one dependent variable, it is known as
Multivariate Regression.
Types of Linear Regression:
There are two main types of linear regression
1. Simple Linear Regression
2. Multiple Linear Regression
Simple Linear Regression:
This is the simplest form of linear regression.
It involves only one independent variable and one dependent variable.
The equation for simple linear regression is:
y=β0+β1X
where:
y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope
Multiple Linear Regression:
This involves more than one independent variable and one dependent variable.
The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept
β1, β2, …, βn are the slopes
The goal of the algorithm is to find the best-fit line equation that can predict the values
based on the independent variables.
Best-Fit Line:
The best-fit line equation provides a straight line that represents the relationship between
the dependent and independent variables.
The slope of the line indicates how much the dependent variable changes for a unit change
in the independent variable(s).
Our primary objective in linear regression is to locate the best-fit line, which
implies that the error between the predicted and actual values should be kept to a minimum;
the best-fit line is the one with the least error.
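For simple linear regression, the best-fit line can be computed in closed form by least squares. A minimal sketch; the toy data set below (exactly y = 1 + 2x) is an assumption for illustration:

```python
# Closed-form least-squares estimates of the slope and intercept of y = b0 + b1*x.

def fit_simple_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x      # the best-fit line passes through the means
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]              # exactly y = 1 + 2x
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)  # 1.0 2.0
```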
*****