Stat Notes On Correlation Regression

Irrigation

Uploaded by

marivicbaldoza7
Copyright
© © All Rights Reserved

CORRELATION AND REGRESSION

(Notes On Statistics)

Correlation

Correlation is a measure of the strength of relationship between the variables.


The coefficient of correlation is used to assess the validity, reliability and objectivity of
scores, data or observations. It also indicates the amount of agreement or disagreement between
groups of scores, measures or individuals.

A perfect positive correlation, r = +1.0, means that as one variable increases, the other
increases in exact proportion. An example is shown in Table 1: as x increases, y increases by
the same amount. Figure 1 shows the straight-line plot of data set A.

Table 1. Data set A


x y
1 1
2 2
3 3
4 4
5 5

Figure 1. Curve of data set A

A perfect negative correlation, r = -1.0, means that as one variable increases, the other
decreases in exact proportion. Data set B (Table 2) is meant to show a perfect negative
correlation: as x increases, y decreases by the same amount. The trend is also shown in
Figure 2.

Table 2. Data set B

x y
1 5
2 4
3 3
4 2
5 1

Figure 2. Curve of data set B

When a low correlation exists between two variables, the plotted (x, y) points are scattered
across the graph (Figure 3) and no clear trend is observed.

Table 3. Data set C

x y
1 3
2 4
3 2
4 1
5 5

Figure 3. Curve of data set C

The coefficient of correlation can be interpreted as:

± 0.00 to ± 0.20 - slight correlation, almost negligible relationship
± 0.21 to ± 0.40 - low correlation, definite but small relationship
± 0.41 to ± 0.70 - moderate correlation, substantial relationship
± 0.71 to ± 0.90 - high correlation, marked relationship
± 0.91 to ± 1.00 - very high correlation, very dependable relationship
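
As a rough illustration, the scale can be written as a small lookup. This sketch is not part of the source notes: the function name `interpret_r` is hypothetical, and it assumes that the bands apply to the absolute value of r, that a value falling between two listed bands (such as 0.205) belongs to the lower band, and that 0.41 starts the "substantial" band.

```python
def interpret_r(r: float) -> str:
    """Return the relationship label for a correlation coefficient r.

    Hypothetical helper: band edges follow the scale in the notes, with
    in-between values assigned to the lower band (an assumption).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the sign only gives the direction of the relationship
    if size < 0.21:
        return "almost negligible relationship"
    if size < 0.41:
        return "definite but small relationship"
    if size < 0.71:
        return "substantial relationship"
    if size < 0.91:
        return "marked relationship"
    return "very dependable relationship"


print(interpret_r(0.84))   # marked relationship
print(interpret_r(-0.10))  # almost negligible relationship
```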

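Computing the coefficient of correlation by the Pearson Product-Moment Formula

Formula:

$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\left(\sum d_x^2\right)\left(\sum d_y^2\right)}}$$

Where:
$r_{xy}$ = coefficient of correlation
$d_x$, $d_y$ = deviations of each x and y value from the mean of x and the mean of y
$\sum d_x^2$ = sum of the dx² column
$\sum d_y^2$ = sum of the dy² column
$\sum d_x d_y$ = sum of the dxdy column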
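
The deviation method above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the source notes: the function name `pearson_r` is hypothetical, and the check uses data sets A and C from Tables 1 and 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation computed from deviation scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


data_a = ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])  # Table 1
data_c = ([1, 2, 3, 4, 5], [3, 4, 2, 1, 5])  # Table 3

print(round(pearson_r(*data_a), 2))  # 1.0
print(round(pearson_r(*data_c), 2))  # 0.1
```

Data set A gives r = 1.0 (perfect correlation), while data set C gives r = 0.1, which falls in the lowest band of the interpretation scale.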

Example: Find the correlation between the grain yield in tons/ha and the number of tillers/m² of
the PSB-RC 22 variety.
Solution:
Step 1.
Ho: There is no significant correlation between the grain yield/ha and the number of
tillers/m²
Ha: There is a significant correlation between the grain yield/ha and the number of tillers/m²
Step 2.
Level of significance: 5%; degrees of freedom = n - 2 = 8 - 2 = 6
Step 3.
Test statistic:

$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\left(\sum d_x^2\right)\left(\sum d_y^2\right)}}$$

Step 4.
Computation:

No. of tillers (x)   Grain yield (y)   dx (x - x̄)   dy (y - ȳ)   dx²        dy²    dxdy
150                  4.9               -56.75       -0.3         3220.56    0.09   17.025
165                  5.2               -41.75        0           1743.06    0       0
182                  5.1               -24.75       -0.1          612.56    0.01    2.475
185                  5.0               -21.75       -0.2          473.06    0.04    4.35
228                  5.2                21.25        0            451.56    0       0
230                  5.4                23.25        0.2          540.56    0.04    4.65
242                  5.2                35.25        0           1242.56    0       0
272                  5.6                65.25        0.4         4257.56    0.16   26.1
Total 1654           41.6                                       12541.48    0.34   54.6

Note:
x̄ = mean of x = 1654 / 8 = 206.75
ȳ = mean of y = 41.6 / 8 = 5.2
Σx = 1654, Σy = 41.6, Σdx² = 12541.48, Σdy² = 0.34, Σdxdy = 54.6

$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\left(\sum d_x^2\right)\left(\sum d_y^2\right)}} = \frac{54.6}{\sqrt{(12541.48)(0.34)}} = 0.84$$

Step 5. The tabular value of r at the 0.05 level with df = 6 is 0.7067, which is lower than the
computed r = 0.84.

Step 6. Since the computed value is greater than the tabular value, reject the null hypothesis.

Step 7. Interpretation: There is a significant relationship between the grain yield and the
number of tillers/m² of the PSB RC 22 rice variety.

Step 8. Conclusion: The more tillers produced, the higher the grain yield.
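
The column totals in Step 4 can be cross-checked with a short script. This is a sketch for checking the arithmetic only, not part of the source notes, and the variable names are illustrative. It works with unrounded deviations, so the sum of squared x deviations comes out as 12541.50 rather than the 12541.48 obtained by adding the rounded table entries. The value of r is unaffected at two decimals.

```python
import math

# Data from the worked example: tillers per square metre (x), grain yield in tons/ha (y).
tillers = [150, 165, 182, 185, 228, 230, 242, 272]
grain_yield = [4.9, 5.2, 5.1, 5.0, 5.2, 5.4, 5.2, 5.6]

n = len(tillers)
mean_x = sum(tillers) / n      # 206.75
mean_y = sum(grain_yield) / n  # 5.2, up to floating-point rounding

sum_dx2 = sum_dy2 = sum_dxdy = 0.0
for x, y in zip(tillers, grain_yield):
    dx = x - mean_x  # deviation of x from its mean
    dy = y - mean_y  # deviation of y from its mean
    sum_dx2 += dx * dx
    sum_dy2 += dy * dy
    sum_dxdy += dx * dy

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(f"sum dx^2 = {sum_dx2:.2f}")   # sum dx^2 = 12541.50
print(f"sum dy^2 = {sum_dy2:.2f}")   # sum dy^2 = 0.34
print(f"sum dxdy = {sum_dxdy:.2f}")  # sum dxdy = 54.60
print(f"r = {r:.2f}")                # r = 0.84
```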

Linear Regression or Prediction

Regression analysis is the process by which the values of one variable (the criterion variable,
Y) are predicted from the information obtained from another variable (the predictor variable, X).
In the above example, one can predict the grain yield from the number of tillers. The criterion
variable (Y) is the grain yield, while the predictor variable (X) is the number of tillers.
Correlation and regression are related. If there is perfect correlation, there is also
perfect prediction or regression. If there is no correlation, X cannot be used to predict Y.

Regression Equation

The regression equation is expressed as the equation of a straight line, the least-squares
regression line. "Least squares" means that the most accurate trend line that may be drawn is
the one for which the sum of the squares of the vertical distances of the points from the line
is a minimum.
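
In symbols, the least-squares criterion can be written as follows. This is an illustrative restatement rather than a formula taken from the source notes: the line is written as y = a + bx with intercept a and slope b, and the symbol S for the sum of squared vertical distances is introduced here only for convenience.

```latex
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```

The regression line is the one whose values of a and b make S as small as possible for the n observed pairs $(x_i, y_i)$.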

The regression equation is:

$$y = a + bx$$

Where: y = the predicted value
x = a particular value of the predictor variable
a = the constant or intercept
b = the regression coefficient

The intercept or constant (a) is the point where the straight line crosses the y-axis.

It is computed by the formula

$$a = \bar{y} - b\bar{x}$$

Where: $\bar{y}$ = mean of the dependent or criterion variable y
$\bar{x}$ = mean of the predictor variable x
b = computed regression coefficient

On the other hand, the regression coefficient or b is the slope of the line. It indicates the
increase in y for every unit increase in x. For example, if the value of b is 1.0, it means that y
increases by one unit for every unit increase in x.
The regression coefficient is computed by the formula:

$$b = \frac{\sum d_x d_y}{\sum d_x^2}$$

Where: $\sum d_x d_y$ = sum of the dxdy column, which is derived by multiplying the paired
deviation values of x and y
$\sum d_x^2$ = sum of the squared deviation values of x

Once the values of a and b are obtained, the predicted values of y can be derived.
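
The two formulas can be combined into a small routine. This is a minimal sketch, not part of the source notes: the names `fit_line` and `predict` are hypothetical, and the demonstration uses data set A from Table 1, where the correlation is perfect.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    if sum_dx2 == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = sum_dxdy / sum_dx2   # regression coefficient (slope)
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Predicted y for a given x on the fitted line."""
    return a + b * x


a, b = fit_line([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])  # data set A
print(a, b)              # 0.0 1.0
print(predict(a, b, 6))  # 6.0
```

For data set A the slope is 1 and the intercept is 0, so every predicted value equals the observed value.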

Illustrative Example:

x (No. of Tillers)   y (Grain Yield)   y²       dx       dy     dx²        dy²    dxdy
150                  4.9               24.01    -56.75   -0.3   3220.56    0.09   17.025
165                  5.2               27.04    -41.75    0     1743.06    0       0
182                  5.1               26.01    -24.75   -0.1    612.56    0.01    2.475
185                  5.0               25.00    -21.75   -0.2    473.06    0.04    4.350
228                  5.2               27.04     21.25    0      451.56    0       0
230                  5.4               29.16     23.25    0.2    540.56    0.04    4.65
242                  5.2               27.04     35.25    0     1242.56    0       0
272                  5.6               31.36     65.25    0.4   4257.56    0.16   26.10
Total 1654           41.6              216.66                  12541.48    0.34   54.6

Mean of x ($\bar{x}$) = 206.75

Mean of y ($\bar{y}$) = 5.2
From the above information, the following values of a and b are obtained:

$$b = \frac{54.6}{12541.48} = 0.00435$$

$$a = 5.2 - (0.00435)(206.75) = 4.30064$$

Thus, the regression equation is:
y' = a + bx
y' = 4.30064 + 0.00435x
Hence, the predicted values of y (written y') for each value of x are:
y' = 4.30064 + 0.00435 (150) = 4.95
y' = 4.30064 + 0.00435 (165) = 5.02
y' = 4.30064 + 0.00435 (182) = 5.09
y' = 4.30064 + 0.00435 (185) = 5.11
y' = 4.30064 + 0.00435 (228) = 5.29
y' = 4.30064 + 0.00435 (230) = 5.30
y' = 4.30064 + 0.00435 (242) = 5.35
y' = 4.30064 + 0.00435 (272) = 5.48
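
The same computation can be sketched in Python as a check on the arithmetic. This is an illustration, not part of the source notes. It uses the tiller counts that add up to the stated total of 1654 and, as an assumption made to mirror the hand calculation, rounds b and a to five decimal places before predicting. The variable names are illustrative.

```python
# Tillers per square metre (x) and grain yield in tons/ha (y) from the example.
tillers = [150, 165, 182, 185, 228, 230, 242, 272]
grain_yield = [4.9, 5.2, 5.1, 5.0, 5.2, 5.4, 5.2, 5.6]

n = len(tillers)
mean_x = sum(tillers) / n
mean_y = sum(grain_yield) / n

sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tillers, grain_yield))
sum_dx2 = sum((x - mean_x) ** 2 for x in tillers)

# Round to five decimals, as the hand calculation does (an assumption of this sketch).
b = round(sum_dxdy / sum_dx2, 5)     # regression coefficient
a = round(mean_y - b * mean_x, 5)    # intercept

predicted = [a + b * x for x in tillers]

print(f"b = {b}, a = {a}")  # b = 0.00435, a = 4.30064
for x, y_hat in zip(tillers, predicted):
    print(f"x = {x}: predicted y = {y_hat:.2f}")
```

The predicted yields come out as 4.95, 5.02, 5.09, 5.11, 5.29, 5.30, 5.35 and 5.48, which agrees with the values worked out by hand.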

Note: When there is perfect correlation, the predicted values are the same as the actual values
of y, that is, when plotted, they fall on the same points in the line. When the
correlation is less than perfect, the predicted values differ from the actual values, as
shown by the example.

Source: BAT Workbook in Practical Statistics and Computer Education. AUSAID. Pp. 62-69
