8.1 Linear regression and correlation analysis: glossary
Uploaded by Atul Sangal (SUSBS Associate Professor)

Topic 8 Regression and correlation analysis

Regression and correlation analysis – regresná a korelačná analýza

Definition:
Regression and correlation analysis investigates the relationships between two or more
quantitative statistical variables (bivariate or multivariate data). Its aim is to find a suitable
regression function and estimate its parameters (regression analysis) and to measure the strength
of the relationship and the goodness of fit (correlation analysis).
Regression analysis is a statistical procedure used to develop a mathematical equation that
describes how the variables are related. Correlation analysis is a procedure for determining the
extent to which the variables are linearly related; if such a relationship exists, correlation
analysis provides a measure of its relative strength.
Note:
In simple correlation and regression studies, data are collected on two quantitative variables to
determine whether a relationship exists between the two variables. To graphically analyze the
data, we can display the data on a two-dimensional graph. Such plots are called scatter plots.
The variable along the vertical axis is called the dependent variable, and the variable along
the horizontal axis is called the independent variable. The variable that is being predicted by
the mathematical equation is called the dependent variable. The variable(s) being used to
predict the value of the dependent variable are called the independent variables.
Notation: We will let y represent the dependent variable, and we will let x represent the
independent variable.
In multiple correlation and regression studies, data are collected on more than two
quantitative variables (one dependent variable and more than one independent variable) to
determine whether a relationship exists between these variables.
Deterministic model is a relationship between an independent variable and a dependent
variable in which specifying the value of the independent variable allows one to compute the
value of the dependent variable exactly (the predicted value equals the actual value).
Probabilistic model is a relationship between an independent variable and a dependent
variable in which specifying the value of the independent variable is not sufficient to
determine the value of the dependent variable exactly; the model yields only a predicted value,
and the actual value also contains a random error component.
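The difference between the two model types can be sketched in a few lines of Python. The line y = 2 + 3x and the error standard deviation are hypothetical values chosen only for illustration; the probabilistic model is assumed to be the same straight line plus a normally distributed random error term.

```python
import random

def deterministic_y(x: float) -> float:
    """Deterministic model: y is fully determined by x (hypothetical line y = 2 + 3x)."""
    return 2.0 + 3.0 * x

def probabilistic_y(x: float, rng: random.Random, sigma: float = 1.0) -> float:
    """Probabilistic model: the same line plus a random error epsilon ~ N(0, sigma^2)."""
    return 2.0 + 3.0 * x + rng.gauss(0.0, sigma)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
x = 1.0
for _ in range(3):
    # The deterministic value never changes for the same x;
    # the probabilistic value differs from draw to draw.
    print(deterministic_y(x), round(probabilistic_y(x, rng), 3))
```

For the same x, the first column is always 5.0, while the second varies around 5.0 because of the error term.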

Scatter plot – graf závislostí

Definition:
A scatter plot is a graph of the ordered pairs (x, y) of values for the independent variable x
and the dependent variable y.

[Figure: Scatter plot of the number of cans (y, dependent variable) against temperature (x, independent variable)]

Elaborated by: Ing. Martina Majorová, Dept. of Statistics and Operations Research, FEM SUA in Nitra
Reference: JAISINGH, L.: Statistics for the Utterly Confused

Two variables are said to be positively related if larger values of one variable tend to be
associated with larger values of the other.
Two variables are said to be negatively related if larger values of one variable tend to be
associated with smaller values of the other.
If all data points lie exactly on a straight line, there is a perfect (positive or negative) linear
association between the variables.
Scatter plots can display various patterns:
• linear – the points are scattered around a straight line
• nonlinear – the points follow a curved pattern

[Figure: Example scatter plots: perfect negative association, very strong positive association, perfect positive association, very strong negative association, no association, nonlinear association]

Correlation coefficient – korelačný koeficient

Definition:
Correlation coefficient is a numerical measure of the association between two variables. It
measures the strength and direction of the linear relationship between them. It is denoted by
the letter r (sample correlation coefficient) or ρ (population correlation coefficient).

Properties of the correlation coefficient:

• The correlation coefficient ranges from -1 to +1.
• If there is a perfect positive linear relationship between the variables, the value of r
will be equal to +1.
• If there is a perfect negative linear relationship between the variables, the value of r
will be equal to -1.
• If there is a strong positive linear relationship between the variables, the value of r
will be close to +1.
• If there is a strong negative linear relationship between the variables, the value of r
will be close to -1.
• If there is little or no linear relationship between the variables, the value of r will be
close to 0.
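The properties above can be checked numerically. The sketch below assumes the standard Pearson formula r = Sxy / sqrt(Sxx · Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx, Syy are the sums of squared deviations; the three small data sets are hypothetical.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample (Pearson) correlation coefficient r of paired data x, y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative linear relationship
print(pearson_r([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # U-shaped data: no linear relationship
```

The third data set follows a clear curve, yet r is 0: the correlation coefficient measures only linear association.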

Coefficient of determination – koeficient (index) determinácie

Definition:
The coefficient of determination measures the proportion of the variability in the dependent
variable (y variable) that is explained by the regression model through the independent
variable (x variable). It is a measure of the goodness of fit for the estimated regression model.

Properties of the coefficient of determination:

• In simple linear regression, the coefficient of determination is obtained by squaring the
value of the correlation coefficient.
• The symbol used is $R^2$.
• Note that $0 \le R^2 \le 1$.
• $R^2$ values close to 1 imply that the model explains most of the variation in the
dependent variable and may be a very useful model.
• $R^2$ values close to 0 imply that the model explains little of the variation in the
dependent variable and may not be a useful model.

Residual plots – grafy rezíduí

Definition:
Residuals are the observed errors of the model. A residual is the difference between an actual
observed y value and the corresponding predicted y value, i.e. $e_i = y_i - \hat{y}_i$. A standardized
residual is the residual divided by (an estimate of) its standard deviation.

Note:
Plots of residuals may display patterns that indicate whether the model is appropriate. If the
functional form of the regression model is incorrect, the residual plots constructed from the
model will often display a pattern, which can then be used to propose a more appropriate
model.

[Figure: Linear residual plot; nonlinear residual plot]

Outliers and Influential points – outliery a vplyvné body

Definition:
A value that is well separated from the rest of the data set is called an outlier. With respect to
the line of best fit, an outlier is an observation with a large absolute residual value. That is, an
outlier will fall far from the regression line and will not follow the pattern of the linear
relationship expressed by the line of best fit. An observation that causes the values of the
slope and the intercept in the line of best fit to be considerably different from what they would
be if the observation were removed from the data set is said to be influential.

[Figure: Plot illustrating outliers and influential points]

Least-Squares Regression Line – regresná priamka MNŠ

When investigating the relationship between two variables, the first step after collecting the
data is to prepare a scatter plot, from which any pattern can be observed. If the correlation
coefficient is reasonably large in absolute value (positive or negative), the next step is to fit
the regression line that best models the data (the line of best fit).

[Figure: Line of best fit]

Regression analysis allows us to determine which line best represents the relationship. In
elementary statistics, the equation of the regression line is usually written as $\hat{y} = ax + b$,
where a is the slope, b is the y-intercept, and $\hat{y}$ (read "y hat") is the predicted y value for
a given x value. Least-squares analysis determines the values of a and b for which the
regression line best represents the relationship between the two variables by minimizing the
error sum of squares, that is, by minimizing $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i - \hat{y}_i$ is
the error for a given y value. This regression line is usually called the line of best fit. This type
of regression analysis is usually called simple regression analysis, since it deals only with
straight-line models involving one independent variable. If there is more than one
independent (x) variable, the analysis is known as multiple regression analysis. The equations
that one can use to compute the values of a and b are:
$$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \bar{y} - a\bar{x}$$

Note: The least-squares method develops the estimated regression equation that minimizes
the sum of squared residuals. When the model includes an intercept, the resulting residuals
also sum to zero.
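The least-squares formulas for the slope a and intercept b of $\hat{y} = ax + b$ can be written directly in Python. This is a sketch of the standard textbook computation, not code from the original course materials; the data sets are hypothetical. It also checks the property from the note that the residuals of a line with an intercept sum to zero (up to rounding).

```python
def least_squares(x, y):
    """Slope a and intercept b of y_hat = a*x + b minimizing sum((y_i - y_hat_i)^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = sum_y / n - a * sum_x / n  # b = y_bar - a * x_bar
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.9, 5.2, 6.8, 9.1, 11.0]  # hypothetical data
a, b = least_squares(x, y)
residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
print(round(a, 4), round(b, 4), round(sum(residuals), 10))  # residual sum is (numerically) zero
```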

Regression analysis report (simple regression in which monthly sales of certain goods are the
dependent variable and advertising costs are the independent variable)
Interpretation of simple regression analysis report:
Regression statistics – ukazovatele tesnosti závislosti a kvality modelu
Multiple R – (jednoduchý) korelačný koeficient
Multiple R = 0.9146, i.e. there is a very strong linear relationship between the dependent and
the independent variable. The closer the value is to 1, the stronger the relationship. (In simple
regression, Multiple R is the absolute value of r; it measures the strength of the relationship,
not how much the dependent variable changes when the independent variable changes, which
is described by the slope.)
R Square – index (koeficient) determinácie
R Square = 0.8365, i.e. approximately 83.65% of the variability of the dependent variable is
explained by the regression model through the independent variable. The closer the value is
to 1, the better the fit.
Adjusted R Square – korigovaný (upravený) koeficient determinácie
Adjusted R Square = 0.8239; like R Square, it is a measure of the goodness of fit of the
estimated regression equation, but it accounts for the number of independent variables in the
model (which is why its value is smaller than the value of R Square).
Note: if the value of R Square is small and the model contains a large number of independent
variables, the adjusted coefficient of determination can be negative.
Standard Error – štandardná chyba modelu
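Standard Error = 13.2056; it is the estimated standard deviation of the residuals (the square
root of the residual mean square, MS Residual), expressed in the units of the dependent
variable. It measures the typical size of the prediction errors that remain after fitting the
model; the smaller the standard error, the better the fit.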
Observation – počet pozorovaní
Observation = 15, i.e. there were 15 observations, each consisting of a value of the dependent
variable and a value of the independent variable
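The adjusted value in the report can be reproduced from R Square with the usual formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where n is the number of observations and k the number of independent variables. The sketch below plugs in the report's R Square = 0.8365, n = 15 and k = 1; the second call uses hypothetical values to show that the adjusted value can turn negative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted coefficient of determination for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the simple regression report
print(round(adjusted_r2(0.8365, n=15, k=1), 4))  # 0.8239
# Hypothetical: small R Square and many independent variables gives a negative value
print(round(adjusted_r2(0.10, n=15, k=6), 4))
```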

ANOVA – výstup pre analýzu rozptylu

Regression – vysvetlená variabilita
Residual – nevysvetlená (reziduálna) variabilita
Total – celková variabilita
SS (sum of squares) – súčet štvorcov
MS (mean squares) – priemer štvorcov
F – testovacie kritérium F testu; the test statistic of the F-test, computed as MS Regression divided by MS Residual

Significance F – teoretická hladina významnosti; the observed significance level (p-value) of the
F-test, used to evaluate the test (Significance F is compared with the level of significance, alpha):
if Significance F > 0.05, the regression model as a whole is not statistically significant (–)
if Significance F < 0.05, the regression model as a whole is statistically significant at the
0.05 level of significance (+)
if Significance F < 0.01, the regression model as a whole is statistically significant at the
0.01 level of significance (++)
Significance F = 1.81E-06 (i.e. $1.81 \cdot 10^{-6}$)
Since Significance F < 0.01, the regression model as a whole is statistically significant at the
0.01 level of significance (++)
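The ANOVA quantities fit together as follows: SS Total = SS Regression + SS Residual, each MS is its SS divided by its degrees of freedom (k for Regression, n − k − 1 for Residual), and F = MS Regression / MS Residual. Because dividing both sums of squares by SS Total does not change their ratio, F can also be written as $(R^2/k) / ((1-R^2)/(n-k-1))$. The sketch below applies this to the report's rounded R Square, so the result only approximates the F shown in the original output table.

```python
def anova_f_from_r2(r2: float, n: int, k: int) -> tuple[int, int, float]:
    """Degrees of freedom and F statistic implied by R^2 (F = MS Regression / MS Residual)."""
    df_regression = k
    df_residual = n - k - 1
    ms_regression_share = r2 / df_regression    # (SSR / SST) divided by its df
    ms_residual_share = (1 - r2) / df_residual  # (SSE / SST) divided by its df
    return df_regression, df_residual, ms_regression_share / ms_residual_share

df1, df2, f = anova_f_from_r2(0.8365, n=15, k=1)
print(df1, df2, round(f, 1))
```

Significance F is the upper-tail probability of the F distribution with (df1, df2) degrees of freedom at this value; the Python standard library has no F distribution, but, for example, SciPy's `scipy.stats.f.sf(f, df1, df2)` computes it. In simple regression F equals the square of the slope's t Stat, which is why Significance F and the p-value of X Variable1 in the report are the same (1.81E-06).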

Parameter estimates and their significance

Coefficients – vypočítané koeficienty (parametre) regresného modelu; the estimated coefficients (parameters) of the regression model
Intercept – lokujúca konštanta, absolútny člen, aditívna premenná (all synonymous Slovak
terms); it is the predicted value of y (dependent variable) when x (independent variable) is
equal to zero, i.e. the point where the regression line crosses the y-axis
Intercept = 32.468, i.e. predicted sales of certain goods are approximately 32.47 mil. SKK at
zero advertising costs
X Variable1 – regresný koeficient; it is the slope (the change in the predicted dependent
variable divided by the corresponding change in the independent variable)
X Variable1 = 14.6, i.e. if advertising costs increase by a single unit (1 mil. SKK), predicted
sales of certain goods increase by 14.6 mil. SKK on average
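Standard error – štandardná chyba jednotlivých parametrov; the standard error of each
individual parameter estimate (as a rule of thumb, it should not exceed half the absolute value
of the corresponding coefficient, which is equivalent to |t Stat| ≥ 2)
t Stat – the test statistic of the t-test of the significance of the corresponding coefficient (the
coefficient divided by its standard error)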
p-value – probability value; the p-value is compared with the level of significance, alpha:
if p-value > 0.05, the corresponding coefficient is not statistically significant (–)
if p-value < 0.05, the corresponding coefficient is statistically significant at the 0.05 level of
significance (+)
if p-value < 0.01, the corresponding coefficient is statistically significant at the 0.01 level of
significance (++)
p-value (Intercept) = 0.016
Since p-value < 0.05, the intercept is statistically significant at the 0.05 level of
significance (+)
p-value (X Variable1) = 1.81E-06 (i.e. $1.81 \cdot 10^{-6}$)
Since p-value < 0.01, the slope is statistically significant at the 0.01 level of
significance (++)
Lower 95%, Upper 95% – 95% interval estimates (confidence intervals) of the parameters
(coefficients); with 95% confidence, the true (population) value of the corresponding
coefficient lies within the given interval
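The quantities in the parameter table are linked by simple formulas: t Stat = coefficient / standard error, and the 95% interval is coefficient ± t_crit · standard error, where t_crit is the two-sided 95% critical value of the t distribution with n − k − 1 degrees of freedom (2.160 for 13 degrees of freedom). The sketch below also encodes this glossary's (–)/(+)/(++) marking rule, writing (–) as an ASCII "-". The standard error of 1.8 is hypothetical, because the report's value is not reproduced in this text.

```python
def t_stat(coef: float, se: float) -> float:
    """t Stat = coefficient divided by its standard error."""
    return coef / se

def significance_mark(p: float) -> str:
    """Glossary marks: '++' (p < 0.01), '+' (p < 0.05), '-' (not significant)."""
    if p < 0.01:
        return "++"
    if p < 0.05:
        return "+"
    return "-"

def confidence_interval(coef: float, se: float, t_crit: float) -> tuple[float, float]:
    """Interval estimate coef +/- t_crit * se for the population parameter."""
    return coef - t_crit * se, coef + t_crit * se

# p-values from the simple regression report
print(significance_mark(0.016), significance_mark(1.81e-06))
# Hypothetical standard error for the slope; 2.160 = t critical value, 13 df, 95%
coef, se = 14.6, 1.8
print(round(t_stat(coef, se), 2), confidence_interval(coef, se, 2.160))
```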

Regression analysis report (multiple regression in which monthly sales of certain goods are the
dependent variable and advertising costs and the number of agents are the independent
variables)
The multiple regression analysis report is interpreted in the same way as the simple
regression analysis report, with one difference in interpreting the regression coefficients
(slopes):

X Variable1 = 138.1, i.e. if advertising costs increase by a single unit (1 mil. SKK) and the
other independent variable(s) are held constant (ceteris paribus), predicted sales of certain
goods increase by 138.1 mil. SKK
X Variable2 = 1.38, i.e. if the number of agents increases by a single unit (1 person) and the
other independent variable(s) are held constant (ceteris paribus), predicted sales of certain
goods increase by 1.38 mil. SKK
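The ceteris paribus interpretation can be checked with the fitted equation $\hat{y} = b_0 + b_1x_1 + b_2x_2$: raising one independent variable by one unit while holding the other fixed changes the prediction by exactly that variable's coefficient. The slopes 138.1 and 1.38 come from the report; the intercept and the starting values are hypothetical, since the report's intercept is not reproduced in this text.

```python
def predict_sales(advertising: float, agents: float,
                  intercept: float, b1: float = 138.1, b2: float = 1.38) -> float:
    """Predicted sales (mil. SKK) from y_hat = b0 + b1*advertising + b2*agents."""
    return intercept + b1 * advertising + b2 * agents

b0 = 10.0  # hypothetical intercept
base = predict_sales(advertising=2.0, agents=20, intercept=b0)
more_ads = predict_sales(advertising=3.0, agents=20, intercept=b0)     # agents held constant
more_agents = predict_sales(advertising=2.0, agents=21, intercept=b0)  # advertising held constant
print(round(more_ads - base, 2), round(more_agents - base, 2))
```

The differences equal the coefficients regardless of the intercept or the starting point, which is what "ceteris paribus" means for a linear model.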

Closing remarks:

[Table: Synonyms]
