Lecture 4 - Linear Regression, a lecture in the subject module Statistical & Mach... (Maninda Edirisooriya)
Linear regression is the simplest machine learning algorithm and one of the most fundamental statistical learning techniques. This was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
2. Data Mining problems
• Data mining problems are often divided into predictive tasks and descriptive tasks.
• Predictive analytics (supervised learning): given observed data (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), learn a model to predict Y from X.
– If y_i is a continuous numeric value, the task is called prediction (e.g., y_i = stock price, income, survival time).
– If y_i is a discrete or symbolic value, the task is called classification (e.g., y_i ∈ {0, 1}, y_i ∈ {spam, email}, y_i ∈ {1, 2, 3, 4}).
• Descriptive analytics (unsupervised learning): given data x_1, x_2, \ldots, x_n, identify underlying patterns or structure in the data.
3. Regression in data mining
• Predict a real-valued output for a given input, given a training set.
– Examples:
• Predict rainfall in cm for a month
• Predict the stock price on the next day
• Predict the number of users who will click on an internet advertisement
4. • Classification problem:
– A set of predefined categories/classes
– Training examples with attribute as well as class information available (supervised learning)
– Classification task: predict the class label for a new example (predictive mining)
• Clustering task:
– No predefined classes
– Attempt to find homogeneous groups in the data (exploratory data mining)
– Training examples have attribute values only; no class information is available
5. • Regression:
– It is predictive data mining
– Given the attribute values of an example, you have to predict the output
– The output is not a class; it is a real value
– Supervised learning
6. Linear Regression
• Linear regression aims to predict the response Y by estimating the best linear predictor: the linear function that is closest to the true regression function f.
• Task: predict a real-valued Y, given a real-valued vector x, using a regression model f.
• An error function, e.g., least squares, is often used.
• Why is this usually called linear regression?
– The model is linear in the parameters.
• Goal: the function f applied to the training data should produce values as close as possible, in aggregate, to the actual outputs.
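A quick clarifying note on the "linear in the parameters" point (my example, not from the slides): a model such as

y = a_0 + a_1 x + a_2 x^2

is still linear regression, because each coefficient a_j enters the model linearly even though the model is nonlinear in x; by contrast, something like y = a_0 e^{a_1 x} is nonlinear in its parameters and is not a linear regression model.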
7. • For example:
– x_i = temperature today
– y_i = rainfall volume tomorrow
• Another example:
– x_i = temperature today
– y_i = traffic density
• The training set consists of pairs (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), and the regression task is to predict the value of y_{n+1} for x_{n+1}.
• When x is a single value, this is called univariate regression.
8. • Multivariate regression:
– Training set: (x_1^{(1)}, x_2^{(1)}, y^{(1)}), (x_1^{(2)}, x_2^{(2)}, y^{(2)}), \ldots, (x_1^{(n)}, x_2^{(n)}, y^{(n)})
– There is a single output y, but there are multiple inputs x_1, x_2. Example: predict the temperature of a place based on humidity and pressure.
– There can be multiple outputs as well.
• Regression model:
y = f(x_1, x_2, \ldots, x_n) (multivariate)
y = f(x) (univariate)
y: the output, or dependent variable
x_1, x_2, \ldots, x_n: the inputs, or independent variables
f: the regression function or model
9. • The model f determines how the dependent variable y depends on the independent variable x.
• Linear regression: f is a linear function, y = f(x_1, x_2, \ldots, x_n).
• In general, for linear regression:
y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n
where a_0, a_1, a_2, \ldots, a_n are the regression coefficients.
• Univariate case: a line, y = a_0 + a_1 x
• Multivariate case: a plane
10. • Given (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), find a_0, a_1 such that the line y = a_0 + a_1 x best fits the given data.
• a_1, a_2, \ldots are the slopes of the regression and a_0 is the bias, or axis intercept.
• Training a regression model:
– Given a training set (x_1^{(1)}, x_2^{(1)}, \ldots, x_k^{(1)}, y^{(1)}), (x_1^{(2)}, \ldots, x_k^{(2)}, y^{(2)}), \ldots, (x_1^{(n)}, \ldots, x_k^{(n)}, y^{(n)}), find the values of the regression coefficients in y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k that best match/fit the training data.
– Univariate regression: find the values of a_0, a_1 such that the line y = a_0 + a_1 x best fits the data.
11. Least square error
• To find the line having the least error, define an error function of a line:
SSE = \sum_{i=1}^{n} e_i^2
• where e_i is the difference between the actual value of y_i and the model-predicted value of y_i.
• For a given value x_i, the actual value is y_i and the predicted value is a_0 + a_1 x_i, so the error is
e_i = y_i - (a_0 + a_1 x_i)
• For the univariate case:
S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2
• The square is taken in the error function so that positive and negative errors are given equal importance; both are equally bad.
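A minimal sketch of this univariate least-squares fit in Python (illustrative code with made-up data; the variable names x, y, a0, a1 mirror the slide notation but are my own). The closed-form minimiser of S sets a1 to the ratio of the sample covariance of x and y to the variance of x, and a0 so that the line passes through the means:

import numpy as np

# Toy training data: pairs (x_i, y_i), e.g., temperature today -> rainfall tomorrow
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for y = a0 + a1 * x:
# a1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2), a0 = y_bar - a1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
a1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a0 = y_bar - a1 * x_bar

# Sum of squared errors S = sum_i (y_i - a0 - a1 * x_i)^2 on the training data
e = y - (a0 + a1 * x)
S = np.sum(e ** 2)
print(a0, a1, S)

The printed S is the training-set error; later slides contrast it with the error on future data.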
12. • For the multivariate case:
S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_k x_{ik})^2
• Find the values of the regression coefficients a_0, a_1, \ldots, a_k such that the sum of squared errors is minimised.
• Predictions based on this equation are the best predictions possible, in the sense that they will be unbiased (equal to the true values on average) and will have the smallest expected squared error compared to any unbiased estimates, under the following assumptions:
– Linearity of the relationship between the dependent and independent variables
– Statistical independence of the errors
– Homoskedasticity, or constant variance, of the errors
– Normality of the error distribution
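For the multivariate case, a short sketch under the same least-squares criterion (again with my own toy data, not the lecture's): numpy's least-squares solver minimises S directly and avoids explicitly inverting any matrix:

import numpy as np

# Toy multivariate training data: n examples, k = 2 features (e.g., humidity, pressure)
X = np.array([[0.3, 1.2],
              [0.5, 1.0],
              [0.8, 0.9],
              [0.9, 1.4]])
y = np.array([20.0, 22.5, 25.1, 24.0])

# Prepend a column of ones so the intercept a0 is learned along with the slopes
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve min_a ||y - X1 a||^2; lstsq also handles rank-deficient X1 gracefully
a, residuals, rank, _ = np.linalg.lstsq(X1, y, rcond=None)
print(a)  # [a0, a1, a2]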
13. Linear Regression
• Model structure: y = f(X) = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k (univariate: y = a_0 + a_1 x)
• Model parameters: a = [a_0, a_1, a_2, \ldots, a_k]
• Error function: S(a) = \sum_i e_i^2 = \sum_i (y_i - f(X_i; a))^2, where e_i = y_i - f(X_i; a)
16. • The error function S(a) = \sum_i (y_i - \hat{y}_i)^2 (actual value minus model-predicted value) is defined for the training data.
• We are really interested in finding the f that best predicts y on future data, i.e., minimising the expected sum of squared errors, where the expectation is over future data.
• This is known as empirical learning, which is based on observed data. We are interested not only in minimising S on the training data but also in getting the best predictions on unknown future data.
17. • The usual assumption is that future data will behave the same way past data behaved.
– If we have a model which minimises the error on past data, it will also minimise the error on future data.
– If the training data is large and the model is simple, we are assuming that the best f on the training data is also the best predictor f on future test data.
18. Limitation of Linear regression
• The true relationship of X and Y might be non-linear
– this suggests generalisations to non-linear models
• Complexity:
– the cost of the computational operations and the time complexity increase with the number of attributes
• Correlation/collinearity among the X variables
– can cause numerical instability (the inverse does not exist if the matrix is not full rank), as the sketch after this list illustrates
– causes problems in interpretability (identifiability: determining whether the model's true parameters can be recovered from the observed data)
• Linear regression includes all variables in the model
– but what if there are 1000 attributes and only 3 variables are actually related to Y?
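The sketch referred to above, a small invented example of the collinearity problem: when one feature is an exact copy of another, X is not full rank, X^T X is singular, and the normal-equations solution is not unique:

import numpy as np

# Two perfectly correlated columns make X rank-deficient
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 5.0, 5.0]])
print(np.linalg.matrix_rank(X))  # 2, not 3: X is not full rank

XtX = X.T @ X
try:
    np.linalg.inv(XtX)           # the normal equations break down here
except np.linalg.LinAlgError as err:
    print("X^T X is singular:", err)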
19. Complexity vs. goodness of fit
• Suppose the regression model is linear and it is too simple
– a simple model that does not fit the data well has a large training set error
– a biased solution
• Suppose instead that fitting the training data closely makes the model a more complex, nonlinear regression model
– a complex model has low training set error but high error on future points, which causes overfitting
– small changes to the data change the solution a lot
– a high-variance solution
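A concrete (invented) illustration of this trade-off: fitting polynomials of increasing degree to a few noisy points drawn from a truly linear relationship drives the training error toward zero while the error on held-out points typically grows:

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
x_test = np.linspace(0.05, 0.95, 8)
f = lambda x: 2 * x + 1                              # the true relationship is linear
y_train = f(x_train) + rng.normal(0, 0.2, 8)
y_test = f(x_test) + rng.normal(0, 0.2, 8)

for degree in (1, 3, 7):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse, 4), round(test_mse, 4))

# Training MSE falls as the degree grows; degree 7 interpolates the 8 points
# (train MSE ~ 0) but typically does much worse on the held-out points.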
20. • Occam's Razor principle (principle of parsimony):
– The principle states that "entities should not be multiplied unnecessarily."
– "When you have two competing theories that make exactly the same predictions, the simpler one is the better."
– Use the simplest model which gives acceptable accuracy on the training set; do not complicate the model to overfit the training data.
• Choose the model which sacrifices some training set error for better performance on future samples.
• Penalise complex models based on:
– Prior information (bias)
– Information criteria (MDL, AIC, BIC)
21. Bias and variance for regression
• For regression, we can easily decompose the error of the learned model into two parts: bias (error 1) and variance (error 2), as shown after this list.
• Bias:
– The difference between the average prediction of our model and the correct value which we are trying to predict.
– How much does the mean of the predictor differ from the optimal predictor?
• Variance:
– The variability of the model's prediction for a given data point; it tells us the spread of the predictions.
– How much does the predictor vary about its mean across different training datasets?
– The variance of a learning algorithm is a measure of its precision. A high variance error implies that the model is highly sensitive to small fluctuations.
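The standard decomposition behind these two terms, stated here in symbols where the slides state it in words (notation mine): for squared error at a point x, with irreducible noise variance \sigma^2 and expectations taken over training sets,

\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}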
23. Training and Test Error
• Given a dataset, the training data is used to fit the parameters of the model. For training we choose a loss function, e.g., squared error for regression.
• The training error is the mean error over the training sample.
• The test (or generalization) error is the expected prediction error over an independent test sample.
• The prediction error, or true (generalization) error over the whole population, is the target performance measure, i.e., performance on a random test point (X, Y).
• Training error is not a good estimator of test error.
24. Model Complexity and Generalization
• A model's ability to adapt to patterns in the data is what we call model complexity.
• A model with greater complexity might be theoretically more accurate (i.e., low bias).
– But you have less control over what it might predict on a tiny training data set.
– Different training data sets will result in widely varying predictions for the same test instance.
• Generalization ability: we want good predictions on new data, i.e., 'generalization'. What is the out-of-sample error of the learner f?
• Training error can be reduced by making the hypothesis more sensitive to the training data, but this may lead to overfitting and poor generalization.
25. Model Selection and Assessment
• When we want to estimate test error, we may have two different goals in
mind:
1. Model selection: Estimate the performance of different hypotheses or
algorithms in order to choose the (approximately) best one.
2. Model assessment: Having chosen a final hypothesis or algorithm,
estimate its generalization error on new data.
• Trade-off between bias and variance:
– Simple Models: High Bias, Low Variance
– Complex Models: Low Bias, High Variance
26. • Thus, a designer is virtually always confronted with the following dilemma:
– On one hand, if the model is too simple, it will give a poor approximation of the phenomenon (underfitting).
– On the other hand, if the model is too complex, it will be able to fit the available examples exactly, without finding a consistent way of modelling the phenomenon (overfitting).
27. • The choice of model balances bias and variance.
– Overfitting: variance is too high
– Underfitting: bias is too high
28. Training, Validation and Test Data
• In a data-rich situation, the best approach to both model selection and
model assessment is to randomly divide the dataset into three parts:
1. A training set used to fit the models.
2. A validation set (or development test set) used to estimate test error for
model selection.
3. A test set (or evaluation test set) used for assessment of the
generalization error of the finally chosen model.
29. • Training: train different models
• Validation: evaluate different models
• Test: evaluate the accuracy of the final model
The trained model can then be used to make predictions on
unseen observations
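A minimal sketch of such a three-way random split in Python (the 60/20/20 proportions and the toy data are my own choices; the slides do not specify them):

import numpy as np

rng = np.random.default_rng(42)
n = 100                                  # toy dataset size
X = rng.normal(size=(n, 3))              # features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, n)

idx = rng.permutation(n)                 # shuffle before splitting
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

# Train different models on (X[train_idx], y[train_idx]),
# pick the best one by its error on (X[val_idx], y[val_idx]),
# and report the chosen model's error on (X[test_idx], y[test_idx])
# once, at the very end.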