Linear Regression
Data Mining problems
• Data mining problems are often divided into Predictive tasks and
Descriptive tasks.
• Predictive Analytics (Supervised learning):
Given observed data (x1, y1), (x2, y2), ..., (xn, yn), learn a model to predict Y from X.
 If Yi is a continuous numeric value, this task is called prediction (e.g., Yi = stock price, income, survival time)
 If Yi is a discrete or symbolic value, this task is called classification (e.g., Yi ∈ {0, 1}, Yi ∈ {spam, not spam}, Yi ∈ {1, 2, 3, 4})
• Descriptive Analytics (Unsupervised learning):
Given data x1, x2, ..., xn, identify some underlying patterns or structure in the data.
Regression in data mining
• Predict a real-valued output for a given input, given a training set
– Examples:
• Predict rainfall in cm for a month
• Predict a stock price for the next day
• Predict the number of users who will click on an internet advertisement
• Classification problem
– A set of predefined categories/classes
– Training examples with both attribute and class information available: supervised learning
– Classification task: predict the class label for a new example (predictive mining)
• Clustering task:
– No predefined classes
– Attempt to find homogeneous groups in the data: exploratory data mining
– Training examples have attribute values only
– No class information is available
• Regression:
– Predictive data mining
– Given the attribute values of an example, predict the output
– The output is not a class; it is a real value
– Supervised learning
Linear Regression
• Linear regression aims to predict the response Y by estimating the best
linear predictor: the linear function that is closest to the true regression
function f.
• Task: predict a real-valued Y, given a real-valued vector x, using a regression model f.
• An error function, e.g., least squares, is often used.
• Why is this usually called linear regression?
– The model is linear in the parameters (see the sketch below)
• Goal: the function f applied to the training data should produce values as close as possible, in aggregate, to the actual outputs.
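To make "linear in the parameters" concrete, here is a minimal sketch (NumPy, with made-up data): a quadratic curve is fitted by ordinary linear regression, because the model is a linear combination of the non-linear features [1, x, x²].

```python
import numpy as np

# "Linear" refers to the parameters, not the inputs: y = a0 + a1*x + a2*x^2
# is still linear regression, since y is linear in a0, a1, a2.
rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, size=40)
y = 0.5 - x + 2 * x**2 + rng.normal(scale=0.05, size=40)  # hypothetical data

X = np.column_stack([np.ones_like(x), x, x**2])  # non-linear features of x
a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares
print(a_hat)                                     # approx [0.5, -1.0, 2.0]
```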
• For example:
– xi = temperature today
– yi = rainfall volume tomorrow
• Another example:
– xi = temperature today
– yi = traffic density
• The training set consists of pairs (x1, y1), (x2, y2), ..., (xn, yn), and the regression task is to predict the value yn+1 for a new input xn+1.
• When x is a single value, this is called univariate regression.
• Multivariate regression:
– Training set: shown below (for the two-input case)
– There is a single output y but multiple inputs x1, x2, ... Example: predict the temperature of a place based on humidity and pressure.
– There can also be multiple outputs.
• Regression model:
y = f(x1, x2, ..., xn) → multivariate
y = f(x) → univariate
y: output or dependent variable
x1, x2, ..., xn: inputs or independent variables
f: regression function or model
Training set (two inputs): (x11, x21, y1), (x12, x22, y2), ..., (x1n, x2n, yn)
• The model f determines how the dependent variable y
depends on the independent variable x.
• Linear regression:
f is a linear function:
y = f(x1, x2, ..., xn)
In general, for linear regression:
y = a0 + a1x1 + a2x2 + ... + anxn
where a0, a1, a2, ..., an are the regression coefficients.
• Univariate case: y = a0 + a1x1, a line
• Multivariate case: a plane (hyperplane)
• Given: (x1, y1), (x2, y2), ..., (xn, yn), find a0, a1 such that the line y = a0 + a1x best fits the given data.
a1, a2, ... are the slopes of the regression and a0 is the bias or axis intercept.
• Training a regression model:
Given a training set:
– Find the values of the regression coefficients that best match/fit the training data
– Univariate regression: find values of a0, a1 such that the line best fits the data.
Univariate training set: (x1, y1), (x2, y2), ..., (xn, yn)
Multivariate model: y = a0 + a1x1 + a2x2 + ... + anxn
Univariate model: y = a0 + a1x
Multivariate training set: (x11, x21, ..., xk1, y1), (x12, x22, ..., xk2, y2), ..., (x1n, x2n, ..., xkn, yn)
Least squares error
• To find the line having the least error, define an error function for a line:
SSE = Σi=1..n ei²
• where ei = the difference between the actual value of yi and the model-predicted value of yi
• For a given value xi, the actual value is yi and the predicted value is a0 + a1xi
• So, for the univariate case, the error is
ei = yi − (a0 + a1xi)
and the error function is
S = Σi=1..n ei² = Σi=1..n (yi − a0 − a1xi)²
• The square is taken in the error function so that positive and negative errors are given equal importance: both are equally bad.
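As a minimal sketch (NumPy, toy data): setting the derivatives of S with respect to a0 and a1 to zero gives the closed-form estimates a1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a0 = ȳ − a1x̄, which the code below implements.

```python
import numpy as np

def fit_univariate(x, y):
    """Closed-form least squares for y = a0 + a1*x.

    Minimises S = sum_i (y_i - a0 - a1*x_i)^2 by solving the two
    normal equations dS/da0 = 0 and dS/da1 = 0.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a0 = y.mean() - a1 * x.mean()
    return a0, a1

# Toy data (hypothetical): temperature today vs. rainfall tomorrow.
x = np.array([20.0, 22.0, 25.0, 27.0, 30.0])
y = np.array([5.1, 4.8, 4.0, 3.5, 2.9])
a0, a1 = fit_univariate(x, y)
print(a0, a1, np.sum((y - (a0 + a1 * x)) ** 2))  # coefficients and SSE
```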
• For the multivariate case,
S = Σi=1..n ei² = Σi=1..n (yi − a0 − a1xi1 − a2xi2 − ... − akxik)²
• Find values of the regression coefficients a0, a1, ... such that the sum of squared errors is minimised.
• Predictions based on this equation are the best predictions possible in the sense that they will be unbiased (equal to the true values on average) and will have the smallest expected squared error compared to any unbiased estimates, under the following assumptions:
– Linearity of the relationship between the dependent and independent variables
– Statistical independence of the errors
– Homoskedasticity (constant variance) of the errors
– Normality of the error distribution
Linear Regression
• Model structure: y = a0 + a1x1 + a2x2 + ... + akxk = f(X : a)
• Model parameters: a = [a0, a1, a2, ..., ak]
• Error function: S(a) = Σi ei², where ei = yi − f(Xi ; a)
• With this compact notation, the linear regression model can be written in the form
y = Xa + e
where y = (y1, y2, ..., yn)ᵀ is the vector of outputs, X is the n × (p+1) design matrix whose i-th row is (1, xi1, ..., xip), a = (a0, a1, ..., ap)ᵀ is the coefficient vector, and e = (e1, e2, ..., en)ᵀ is the vector of errors.
• Linear regression estimates the parameters by finding the parameter values that minimise the residual sum of squares
S = Σi=1..n ei² = eᵀe = (y − Xa)ᵀ(y − Xa)
• By solving, we get the final estimates â = (XᵀX)⁻¹Xᵀy
• The solutions of this equation are called the direct regression estimators, usually known as the ordinary least squares (OLS) estimators.
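A minimal NumPy sketch of the OLS estimate on synthetic data. It solves the normal equations (XᵀX)a = Xᵀy directly rather than forming (XᵀX)⁻¹ explicitly, which is cheaper and numerically safer.

```python
import numpy as np

def ols_fit(X_raw, y):
    """OLS estimate for y = X a + e; X gains a leading column of
    ones so that a[0] plays the role of the intercept a0."""
    X = np.column_stack([np.ones(len(y)), X_raw])   # design matrix
    # Solve X'X a = X'y instead of computing an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 2))                   # two inputs per example
y = 1.5 + 2.0 * X_raw[:, 0] - 0.7 * X_raw[:, 1] + rng.normal(scale=0.1, size=100)
print(ols_fit(X_raw, y))                            # approx [1.5, 2.0, -0.7]
```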
• S(a) is defined on the training data:
S(a) = Σ (actual value − model-predicted value)²
• We are really interested in finding the a that best predicts y on future data, i.e., minimising the expected squared error, where the expectation is over future data.
• This is known as empirical learning, which is based on observed data. We are interested not only in minimising error on the training data but also in getting the best prediction on unknown future data.
• The usual assumption is that future data will behave the way past data behaved.
– If we have a model which minimises error on past data, it will also minimise the error on future data.
– If the training data is large and the model is simple, we are assuming that the best f on the training data is also the best predictor f on future test data.
Limitations of linear regression
• The true relationship between X and Y might be non-linear
– suggests generalisations to non-linear models
• Complexity:
– computational cost and time complexity increase with the number of attributes
• Correlation/collinearity among the X variables
– can cause numerical instability (the inverse does not exist if the matrix is not full rank), as the sketch below shows
– problems in interpretability (identifiability: determining whether the model's true parameters can be recovered from the observed data)
• All variables are included in the model.
– But what if there are 1000 attributes and only 3 variables are actually related to Y?
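A small demonstration of the collinearity problem on synthetic data: when one column of X is a multiple of another, XᵀX is rank-deficient and the normal equations have no unique solution; the pseudo-inverse is one common workaround.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, 2.0 * x1])   # third column = 2 * second
y = 3.0 + x1 + rng.normal(scale=0.1, size=50)

print(np.linalg.matrix_rank(X.T @ X))              # 2 < 3: X'X is singular
try:
    np.linalg.solve(X.T @ X, X.T @ y)              # direct solve fails
except np.linalg.LinAlgError as err:
    print("normal equations fail:", err)

# The pseudo-inverse returns the minimum-norm solution among the
# infinitely many coefficient vectors that fit the data equally well.
print(np.linalg.pinv(X) @ y)
```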
Complexity vs. goodness of fit
• Suppose the regression model is linear and too simple
– A simple model does not fit the data well and has a large training set error
– A biased solution
• Suppose a more complex, non-linear regression model is fit to the training data itself
– The complex model has low training set error but high error on future points: it overfits
– Small changes to the data change the solution a lot
– A high-variance solution
• Occam's Razor principle (Principle of Parsimony):
– The principle states that "entities should not be multiplied unnecessarily."
– "When you have two competing theories that make exactly the same predictions, the simpler one is the better."
– Use the simplest model which gives acceptable accuracy on the training set; do not complicate the model to overfit the training data.
• Choose the model which sacrifices some training set error for better performance on future samples.
• Penalise complex models based on
– Prior information (bias)
– Information criteria (MDL, AIC, BIC), as in the sketch below
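As a hedged sketch of an information-criterion penalty: for least-squares fits with Gaussian errors, AIC and BIC are often written (up to an additive constant) as n·ln(SSE/n) + 2k and n·ln(SSE/n) + k·ln(n), where k is the number of estimated coefficients.

```python
import numpy as np

def aic_bic(y, y_pred, k):
    """Gaussian least-squares AIC/BIC, up to an additive constant.

    k = number of estimated coefficients; lower scores are better,
    and both criteria charge a price for extra parameters.
    """
    n = len(y)
    sse = np.sum((y - y_pred) ** 2)
    return n * np.log(sse / n) + 2 * k, n * np.log(sse / n) + k * np.log(n)

# Compare a simple and a needlessly complex fit on the same data.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 1 + 2 * x + rng.normal(scale=0.1, size=30)
for deg in (1, 5):
    aic, bic = aic_bic(y, np.polyval(np.polyfit(x, y, deg), x), k=deg + 1)
    print(f"degree {deg}: AIC={aic:.1f}, BIC={bic:.1f}")
```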
Bias and variance for regression
• For regression, we can easily decompose the error of the learned model into two
parts: bias (error 1) and variance (error 2)
• Bias:
– The difference between the average prediction of our model and the correct value we are trying to predict.
– How much does the mean of the predictor differ from the optimal predictor?
• Variance:
– The variability of the model prediction for a given data point: how spread out the predictions are.
– How much does the predictor vary about its mean across different training datasets?
– The variance of a learning algorithm is a measure of its precision. A high variance error implies that the model is highly sensitive to small fluctuations; the simulation below illustrates both terms.
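A simulation sketch of the decomposition, with a made-up true function and noise level: refit a simple line on many independent training sets and measure, at one test point, how far the average prediction is from the truth (bias²) and how much predictions scatter (variance).

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * np.pi * x)       # hypothetical true regression function
x0 = 0.25                                 # test point where we measure the errors
preds = []
for _ in range(1000):                     # many independent training sets
    x = rng.uniform(0, 1, size=20)
    y = f(x) + rng.normal(scale=0.2, size=20)
    a1, a0 = np.polyfit(x, y, 1)          # refit a simple line each time
    preds.append(a0 + a1 * x0)
preds = np.array(preds)
print("bias^2  :", (preds.mean() - f(x0)) ** 2)  # systematic error of the mean prediction
print("variance:", preds.var())                  # scatter of predictions around their mean
```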
Training and Test Error
• Given a dataset, the training data is used to fit the parameters of the model under a chosen loss function, e.g., squared error for regression.
• The training error is the mean error over the training sample.
• The test (or generalization) error is the expected prediction error over an independent test sample.
• The prediction error or true (generalization) error (over the whole population) is the target performance measure, i.e., performance on a random test point (X, Y).
• Training error is not a good estimator for test error, as the demonstration below shows.
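A small demonstration on synthetic data: as polynomial degree grows, training MSE keeps falling while test MSE eventually rises, which is why training error is a poor estimate of test error.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * np.pi * x)         # made-up true function
x_tr = rng.uniform(0, 1, 20)
y_tr = f(x_tr) + rng.normal(scale=0.3, size=20)
x_te = rng.uniform(0, 1, 1000)              # stand-in for "future" data
y_te = f(x_te) + rng.normal(scale=0.3, size=1000)

for deg in (1, 3, 9):
    coef = np.polyfit(x_tr, y_tr, deg)      # least-squares polynomial fit
    train_mse = np.mean((y_tr - np.polyval(coef, x_tr)) ** 2)
    test_mse = np.mean((y_te - np.polyval(coef, x_te)) ** 2)
    print(f"degree {deg}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```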
Model Complexity and Generalization
• A model's ability to adapt to patterns in the data is what we call model complexity.
• A model with greater complexity might be theoretically more accurate (i.e., low bias).
– But you have less control over what it might predict on a tiny training data set.
– Different training data sets will result in widely varying predictions for the same test instance.
• Generalization ability: we want good predictions on new data, i.e., 'generalization'. What is the out-of-sample error of the learner f?
• Training error can be reduced by making the hypothesis more sensitive to the training data, but this may lead to overfitting and poor generalization.
Model Selection and Assessment
• When we want to estimate test error, we may have two different goals in
mind:
1. Model selection: Estimate the performance of different hypotheses or
algorithms in order to choose the (approximately) best one.
2. Model assessment: Having chosen a final hypothesis or algorithm,
estimate its generalization error on new data.
• Trade-off between bias and variance:
– Simple Models: High Bias, Low Variance
– Complex Models: Low Bias, High Variance
• Thus, a designer is virtually always confronted with the following dilemma:
– On one hand, if the model is too simple, it will give a poor approximation of the phenomenon (underfitting).
– On the other hand, if the model is too complex, it will be able to fit exactly the examples available, without finding a consistent way of modelling (overfitting).
• Choice of models balances bias and variance.
– Over-fitting → variance is too high
– Under-fitting → bias is too high
Training, Validation and Test Data
• In a data-rich situation, the best approach to both model selection and
model assessment is to randomly divide the dataset into three parts:
1. A training set used to fit the models.
2. A validation set (or development test set) used to estimate test error for
model selection.
3. A test set (or evaluation test set) used for assessment of the
generalization error of the finally chosen model.
• Training: train different models
• Validation: evaluate different models
• Test: evaluate the accuracy of the final model
The trained model can then be used to make predictions on unseen observations; a minimal split sketch follows.
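A minimal sketch of the three-way split, assuming NumPy arrays and illustrative 60/20/20 fractions.

```python
import numpy as np

def train_val_test_split(X, y, frac=(0.6, 0.2, 0.2), seed=0):
    """Randomly split a dataset into training, validation and test parts."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_tr = int(frac[0] * len(y))
    n_va = int(frac[1] * len(y))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.random.default_rng(5).normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.default_rng(6).normal(size=100)
train, val, test = train_val_test_split(X, y)
# Fit candidate models on train, pick the best by validation error,
# and report the chosen model's error on test exactly once.
```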