8.1-linear-regression-and-correlation-analysis-glossary
Scatter plot
y (dependent variable) – number of cans
x (independent variable) – temperature
Elaborated by: Ing. Martina Majorová, Dept. of Statistics and Operations Research, FEM SUA in Nitra
Reference: JAISINGH, L.: Statistics for the Utterly Confused
Topic 8 Regression and correlation analysis
Two variables are said to be positively related if larger values of one variable tend to be
associated with larger values of the other.
Two variables are said to be negatively related if larger values of one variable tend to be
associated with smaller values of the other.
If all the data points lie on a straight line, there is a perfect linear association (positive
or negative) between the two variables.
Scatter plots can display various patterns:
• linear – the points cluster around a straight line
• nonlinear – the points follow a curved pattern rather than a straight line
Note:
Plots of residuals may display patterns that would give some idea about the appropriateness
of the model. If the functional form of the regression model is incorrect, the residual plots
constructed by using the model will often display a pattern. The pattern can then be used to
propose a more appropriate model.
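As an illustration of such a pattern, the sketch below fits a straight line to data that actually follow a curve and then inspects the signs of the residuals. The data (y = x squared) and the helper name `fit_line` are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for a straight-line model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical curved data: the true relationship is quadratic, not linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Residual signs in order of x: positive at both ends, negative in the middle.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)  # +----+
```

The residuals are not scattered randomly around zero but form a U shape, which suggests adding a squared term to the model.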
Note: The least squares method develops the estimated regression equation by
minimizing the sum of squared residuals; when the model contains an intercept,
the residuals of the fitted equation also sum to zero.
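Both properties can be checked on a small example. This is a sketch under stated assumptions: the data are invented, the closed-form formulas for a simple regression with an intercept are used, and the names `least_squares_line` and `sse` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Closed-form least squares estimates for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: advertising costs (x) and monthly sales (y).
xs = [2, 4, 5, 7, 9, 12]
ys = [25, 38, 41, 60, 68, 95]

def sse(b0, b1):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = least_squares_line(xs, ys)
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))
best_sse = sse(b0, b1)

# Any other line gives a larger sum of squared residuals.
nearby_sse = [sse(b0 + 0.5, b1), sse(b0 - 0.5, b1),
              sse(b0, b1 + 0.1), sse(b0, b1 - 0.1)]

print(residual_sum)                            # zero up to rounding error
print(all(s > best_sse for s in nearby_sse))   # True
```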
Regression analysis report (simple regression where monthly sales for certain goods represent
a dependent variable and advertising costs represent an independent variable)
Interpretation of simple regression analysis report:
Regression statistics – measures of the strength of the relationship and of the quality of the model
Multiple R – (simple) correlation coefficient
Multiple R=0.9146, i.e. there is a very strong linear relationship between the dependent and the
independent variable (the observed points lie close to the fitted regression line). The closer the
value is to 1, the stronger the relationship.
R Square – coefficient (index) of determination
R Square=0.8365, i.e. approximately 83.65% of the variability of the dependent variable is
explained by the regression model through the independent variable. The closer the value is
to 1, the better.
Adjusted R Square – adjusted (corrected) coefficient of determination
Adjusted R Square=0.8239; like R Square, it is a measure of the goodness of fit of the estimated
regression equation, but it accounts for the number of independent variables in the model
(that is why its value is smaller than the value of R Square).
Note: if the value of R Square is small and the model contains a large number of independent
variables, the adjusted coefficient of determination can take on a negative value.
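The figures quoted above can be cross-checked with a few lines of code. The sketch assumes the usual textbook adjustment, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables; the handout does not state which formula the report software uses, and the function name `adjusted_r_square` is hypothetical.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Textbook adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values taken from the simple regression report discussed above.
multiple_r = 0.9146
r_square = 0.8365
n_obs = 15

# In simple regression, R Square is the square of Multiple R.
r_square_from_r = round(multiple_r ** 2, 4)
print(r_square_from_r)   # 0.8365

# One independent variable (k = 1) reproduces the reported adjusted value.
adj_simple = round(adjusted_r_square(r_square, n_obs, 1), 4)
print(adj_simple)        # 0.8239

# Hypothetical case from the note: small R Square, many independent variables.
adj_negative = adjusted_r_square(0.10, 15, 5)
print(adj_negative < 0)  # True
```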
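Standard Error – standard error of the model
Standard Error=13.2056; it is the estimated standard deviation of the residuals, i.e. the typical
distance between the observed values of the dependent variable and the values predicted by the
model, in the units of the dependent variable; the smaller the standard error is, the better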
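As a rough sketch of how such a figure arises, the standard error of a regression is commonly computed as the square root of the sum of squared residuals divided by n - k - 1 (n observations, k independent variables). The data below are invented and do not reproduce the report value of 13.2056; the names `xs`, `ys` and `std_error` are hypothetical.

```python
from math import sqrt

# Hypothetical data for a simple regression (k = 1 independent variable).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
k = 1

# Least squares slope and intercept.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

# Sum of squared residuals and standard error with n - k - 1 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
std_error = sqrt(sse / (n - k - 1))

print(round(std_error, 4))  # 0.8944
```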
Observations – number of observations
Observations=15, i.e. there were 15 observations of both the dependent variable and the
independent variable
Regression analysis report (multiple regression where monthly sales for certain goods
represent a dependent variable and advertising costs and the number of agents represent the
independent variables)
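Interpretation of a multiple regression analysis report is the same as interpretation of a simple
regression analysis report, with one difference when interpreting the regression
coefficients (slopes): each slope gives the expected change in the dependent variable for a
one-unit increase in its independent variable while the other independent variables are held constant.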
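The interpretation of a slope in multiple regression can be illustrated with a small sketch. It assumes invented, noise-free data generated from sales = 10 + 3 × advertising + 5 × agents, so the fitted coefficients are known in advance; the helpers `solve`, `fit_multiple` and `predict` are hypothetical and solve the normal equations directly rather than using any statistical package.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Least squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (advertising costs, number of agents) for six months.
rows = [(2, 1), (4, 2), (5, 1), (7, 3), (9, 2), (12, 4)]
ys = [10 + 3 * adv + 5 * agents for adv, agents in rows]  # noise-free by construction

b0, b1, b2 = fit_multiple(rows, ys)

def predict(adv, agents):
    return b0 + b1 * adv + b2 * agents

# One more unit of advertising with the number of agents held at 2:
change = predict(6, 2) - predict(5, 2)
print(b1, change)  # both approximately 3
```

The change in predicted sales equals the advertising slope only because the number of agents is kept fixed in the comparison.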
Closing remarks:
Synonyms