Stat Notes On Correlation Regression
(Notes On Statistics)
Correlation
A perfect positive correlation, r = 1.0, means that as one variable increases, the other
increases by a proportional amount, so every point falls exactly on a rising straight line. An
example is data set A in Table 1: as x increases, y increases by the same amount. Figure 1
shows the straight-line plot of data set A.
A perfect negative correlation, r = -1.0, means that as one variable increases, the other
decreases by a proportional amount. The trend in data set B (Table 2) shows a perfect negative
correlation: as x increases, y decreases by the same amount. The trend is also shown in
Figure 2.
Table 2. Data set B
x y
1 1
2 2
3 3
4 4
5 5
When a low correlation exists between two variables, the plotted (x, y) points are scattered
across the graph (Figure 3) and no clear trend is observed.
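Formula:
$$ r_{xy} = \frac{\sum d_x d_y}{\sqrt{\left(\sum d_x^2\right)\left(\sum d_y^2\right)}} $$
Where:
$r_{xy}$ = coefficient of correlation
$d_x$, $d_y$ = deviation values of x and y (each score minus its mean)
$\sum d_x^2$ = sum of the dx² column
$\sum d_y^2$ = sum of the dy² column
$\sum d_x d_y$ = sum of the dxdy column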
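The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pearson_r` and the two small data series are made up for the sketch and are not taken from the workbook.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient from deviation scores: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]   # deviation values of x
    dy = [yi - mean_y for yi in y]   # deviation values of y
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Illustrative series (assumed for this sketch)
x = [1, 2, 3, 4, 5]
y_up = [1, 2, 3, 4, 5]     # y rises by one unit for every unit of x
y_down = [5, 4, 3, 2, 1]   # y falls by one unit for every unit of x

print(pearson_r(x, y_up))    # 1.0
print(pearson_r(x, y_down))  # -1.0
```

The first series gives r = 1.0 and the second r = -1.0, matching the descriptions of perfect positive and perfect negative correlation.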
Example: Find the correlation between the grain yield in tons/ha and the number of tillers/m² of
the PSB-RC 22 variety.
Solution:
Step 1.
Ho: There is no significant correlation between the grain yield/ha and the number of
tillers/m².
Ha: There is a significant correlation between the grain yield/ha and the number of tillers/m².
Step 2.
Level of significance at 5%, degrees of freedom = n-1
Step 3.
Test statistic:
$$ r_{xy} = \frac{\sum d_x d_y}{\sqrt{\left(\sum d_x^2\right)\left(\sum d_y^2\right)}} $$
Step 4.
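Computation (using the column totals of the tiller and grain yield data tabulated in the
Illustrative Example):
r_xy = 54.6 / √((12541.48)(0.34)) = 0.84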
Step 5. Tabular value of r at the 0.05 level with df = 7 is 0.6664, which is lower than the
computed r = 0.84.
Step 6. Since the computed value is greater than the tabular value, reject the null hypothesis.
Step 7. Interpretation: There is a significant relationship between the grain yield and the
number of tillers/m² of the PSB-RC 22 rice variety.
Step 8. Decision: The more tillers produced the higher the grain yield.
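Steps 3 to 6 can be checked with a short Python sketch. It assumes the column totals used in the regression computation of these notes (Σdxdy = 54.6, Σdx² = 12541.48, Σdy² = 0.34) and takes the tabular value 0.6664 as quoted in Step 5. The variable names are chosen only for illustration.

```python
from math import sqrt

# Column totals as used in the notes' computation
sum_dxdy = 54.6       # sum of the dxdy column
sum_dx2 = 12541.48    # sum of the dx² column
sum_dy2 = 0.34        # sum of the dy² column

r_tabular = 0.6664    # tabular r at the 0.05 level, as quoted in Step 5

# Step 4: computed r
r_computed = sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Steps 5 and 6: reject Ho when the computed value exceeds the tabular value
reject_null = abs(r_computed) > r_tabular

print(f"computed r = {r_computed:.2f}")
print("reject Ho" if reject_null else "fail to reject Ho")
```

With these totals the computed r rounds to 0.84 and the null hypothesis is rejected, in agreement with Step 6.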
Regression analysis is the process by which the values of one variable (the criterion
variable, Y) are predicted from information obtained from another variable (the predictor
variable, X). In the above example, one can predict the grain yield from the number of
tillers. The criterion variable (Y) is the grain yield, while the predictor variable (X) is
the number of tillers.
Correlation and regression are related. If there is perfect correlation, there is also
perfect prediction or regression. If there is no correlation, X cannot be used to predict Y.
Regression Equation
The regression equation is expressed as the equation of a straight line, the least squares
regression line. "Least squares" means that the most accurate trend line that may be drawn is
the one for which the sum of the squares of the vertical distances of the points from the line
is a minimum.
$$ y = a + bx $$
$$ a = \bar{y} - b\bar{x} $$
The constant a is the y-intercept, the value of y when x = 0. The regression coefficient b, on
the other hand, is the slope of the line. It indicates the increase in y for every unit
increase in x. For example, if the value of b is 1.0, y increases by one unit for every unit
increase in x.
The regression coefficient is computed by the formula:
$$ b = \frac{\sum d_x d_y}{\sum d_x^2} $$
Where: $\sum d_x d_y$ = sum of the dxdy column, which is obtained by multiplying the paired
deviation values of x and y
$\sum d_x^2$ = sum of the squared deviation values of x
Once the values of a and b are obtained, the predicted values of y can be derived.
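The two formulas and the least squares idea can be sketched together in Python. The function names `fit_line` and `sum_sq_vertical` and the five data points are hypothetical, chosen only for illustration. The loop checks that nudging a or b away from the fitted values does not reduce the sum of squared vertical distances.

```python
def fit_line(x, y):
    """Return (a, b) for y = a + b*x, with b = sum(dx*dy) / sum(dx^2) and a = mean_y - b*mean_x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_dxdy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_dxdy / sum_dx2
    a = mean_y - b * mean_x
    return a, b

def sum_sq_vertical(x, y, a, b):
    """Sum of squared vertical distances of the points from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, not from the workbook
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_sq_vertical(x, y, a, b)

# Lines with a slightly different intercept or slope fit no better
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_sq_vertical(x, y, a + da, b + db) >= best

print(round(a, 2), round(b, 2), round(best, 2))
```

For this data the fitted line is y = 2.2 + 0.6x, so y rises by 0.6 units for every unit increase in x.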
Illustrative Example:
x (No. of Tillers)  y (Grain Yield)  y²      dx      dy    dx²       dy²   dxdy
150                 4.9              24.01   -56.75  -0.3  3220.56   0.09  17.025
165                 5.2              27.04   -41.75  0     1743.06   0     0
182                 5.1              26.01   -24.75  -0.1  612.56    0.01  2.475
185                 5.0              25.00   -21.75  -0.2  473.06    0.04  4.350
228                 5.2              27.04   21.25   0     451.56    0     0
230                 5.4              29.16   23.25   0.2   540.56    0.04  4.65
242                 5.2              27.04   35.25   0     1242.56   0     0
272                 5.6              31.36   65.25   0.4   4257.56   0.16  26.10
Total 1654          41.6             216.66                12541.48  0.34  54.6
x̄ = 206.75
ȳ = 5.2
From the above information, the following values of a and b are obtained:
b = 54.6 / 12541.48 = 0.00435
a = 5.2 - (0.00435)(206.75) = 4.30064
Thus, the regression equation is:
ŷ = a + bx
ŷ = 4.30064 + 0.00435x
Hence, the predicted values of y (ŷ) for each value of x are:
ŷ = 4.30064 + 0.00435 (150) = 4.95
ŷ = 4.30064 + 0.00435 (165) = 5.02
ŷ = 4.30064 + 0.00435 (182) = 5.09
ŷ = 4.30064 + 0.00435 (185) = 5.11
ŷ = 4.30064 + 0.00435 (228) = 5.29
ŷ = 4.30064 + 0.00435 (230) = 5.30
ŷ = 4.30064 + 0.00435 (242) = 5.35
ŷ = 4.30064 + 0.00435 (272) = 5.48
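The arithmetic of the worked solution can be retraced with a short Python sketch. It assumes the totals and means stated in the notes and follows the same rounding order (b to five decimals first, then a from the rounded b). The helper name `predict` is hypothetical.

```python
# Totals and means as stated in the notes
sum_dxdy = 54.6      # sum of the dxdy column
sum_dx2 = 12541.48   # sum of the dx² column
mean_x = 206.75      # mean number of tillers
mean_y = 5.2         # mean grain yield

# Round as the worked solution does: b first, then a from the rounded b
b = round(sum_dxdy / sum_dx2, 5)
a = round(mean_y - b * mean_x, 5)

def predict(x):
    """Predicted grain yield for a tiller count x, using y = a + b*x."""
    return a + b * x

print(b, a)
for tillers in (150, 272):   # smallest and largest tiller counts in the table
    print(tillers, round(predict(tillers), 2))
```

For 150 and 272 tillers the sketch gives 4.95 and 5.48, the first and last predicted values listed above.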
Note: When there is perfect correlation, the predicted values are the same as the actual values
of y, that is, when plotted, they fall on the same points in the line. However, this is not
always the case, as shown by the example.
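The first part of the note can be illustrated with a short Python sketch. The data are hypothetical and lie exactly on the line y = 1 + 2x, so the correlation is perfect and every predicted value should equal the actual value.

```python
# Hypothetical data lying exactly on the line y = 1 + 2x (perfect correlation)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Regression coefficient and intercept from the deviation formulas
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

predicted = [a + b * xi for xi in x]
differences = [yi - pi for yi, pi in zip(y, predicted)]   # actual minus predicted

print(predicted)
print(differences)
```

Every difference is zero here. With data that are correlated but not perfectly so, such as the tiller and grain yield example, the differences are not all zero.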
Source: BAT Workbook in Practical Statistics and Computer Education. AUSAID. Pp. 62-69