
Chapter 7 Correlation and Regression

This document discusses correlation and regression analysis. It defines correlation as the relationship between two variables, whether moving in the same or opposite directions. There are different types of correlation based on the number of variables, direction, and linearity. Common measures of correlation discussed include scatter diagrams, Karl Pearson's coefficient of correlation, and Spearman's rank correlation coefficient. Regression analysis allows one to predict the change in one variable given changes in another variable. The regression line represents the best fit relationship between the variables.

Uploaded by

dauzisohail1999
Copyright © All Rights Reserved

Correlation and Regression


➢ Correlation :
Correlation refers to the sympathetic movement of two variables, either in the same direction or in opposite directions.
The measure of correlation is called the correlation coefficient (r).
The degree of relationship is expressed by this coefficient, whose value ranges from -1 to +1 (-1 ≤ r ≤ +1).
Correlation analysis gives an idea of both the degree and the direction of the relationship
between the two variables under study.

• Types of Correlation :

▪ On the basis of number of variables: Simple Correlation, Multiple Correlation, Partial Correlation
▪ On the basis of direction of correlation: Positive Correlation, Negative Correlation, No Correlation
▪ On the basis of linearity: Linear Correlation, Non-Linear Correlation


❖ On the basis of number of variables :

❖ On the basis of direction of correlation :

❖ On the basis of linearity :


• Measures/Methods of Correlation :
❖ Scatter Diagram :
A scatter diagram is a graph of observed points in which each point represents a pair of values of X and Y as
coordinates.
It portrays the relationship between the two variables graphically.
If the points rise from left to right, the diagram shows positive correlation.
Similarly, if the points fall from left to right, it shows negative correlation.
The degree of correlation is indicated by how closely the points cluster around a straight line: the closer they lie to the line, the higher the degree of correlation.
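A scatter diagram is normally drawn on graph paper or with plotting software. As a dependency-free sketch (an illustration of our own, not part of the original notes), the hypothetical function `text_scatter` below marks each (X, Y) pair with `*` on a character grid, with X increasing to the right and Y increasing upward, so a rising or falling pattern of points can be read directly.

```python
def text_scatter(X, Y, width=20, height=10):
    """Return a character-grid scatter diagram with one '*' per (X, Y) pair."""
    if len(X) != len(Y) or not X:
        raise ValueError("X and Y must be non-empty paired series")
    x_min, x_max = min(X), max(X)
    y_min, y_max = min(Y), max(Y)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(X, Y):
        # Scale each value onto the grid; a constant series is placed at 0.
        col = round((x - x_min) / (x_max - x_min) * (width - 1)) if x_max > x_min else 0
        row = round((y - y_min) / (y_max - y_min) * (height - 1)) if y_max > y_min else 0
        # Row 0 is printed at the top, so flip the Y axis.
        grid[height - 1 - row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # the X axis
    return "\n".join(lines)


# Hypothetical data with an upward (positive) pattern
print(text_scatter([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], width=10, height=5))
```

Points climbing toward the upper right suggest positive correlation; points falling toward the lower right suggest negative correlation.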

❖ Karl Pearson's Coefficient of Correlation :


Karl Pearson’s coefficient of correlation is denoted by ‘r’, where -1 ≤ r ≤ +1. This is the most commonly used method.


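The coefficient of correlation ‘r’ measures the degree of linear relationship between two variables, say X and
Y.

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{\text{Covariance}}{SD_x \times SD_y} $$

where x = (X - X̄), y = (Y - Ȳ), SD = standard deviation (computed with the same divisor n), and

$$ \text{Covariance} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} $$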
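The two forms of Pearson's formula can be checked against each other with a short Python sketch. The function names and the sample data below are hypothetical; both functions divide by n, as in the covariance formula, so they return the same r.

```python
import math


def pearson_r(X, Y):
    """r = Σxy / sqrt(Σx² · Σy²), where x = X − X̄ and y = Y − Ȳ."""
    n = len(X)
    if n != len(Y) or n < 2:
        raise ValueError("X and Y must be paired series of length >= 2")
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [v - mean_x for v in X]  # deviations from the mean of X
    y = [v - mean_y for v in Y]  # deviations from the mean of Y
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when either series is constant")
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def pearson_r_cov(X, Y):
    """r = Covariance / (SD_x × SD_y), using n as the divisor throughout."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in X) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in Y) / n)
    return cov / (sd_x * sd_y)


X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 4))      # 0.7746
print(round(pearson_r_cov(X, Y), 4))  # 0.7746
```

With these hypothetical values, Σxy = 6, Σx² = 10 and Σy² = 6, so r = 6/√60 ≈ 0.7746.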

Properties of Coefficient of Correlation


▪ The value of the coefficient of correlation (r) always lies between -1 and +1.
▪ r = +1 indicates perfect positive correlation.
▪ r = -1 indicates perfect negative correlation.
▪ r = 0 indicates no (linear) correlation.
▪ The coefficient of correlation is independent of change of origin and scale.
▪ Change of origin: subtracting (or adding) any constant from the values of X and Y
leaves the value of “r” unchanged.
▪ Change of scale: multiplying or dividing the values of X and Y by any positive constant
has no effect on the value of “r”. (If the two constants have opposite signs, only the sign of r changes.)
▪ The coefficient of correlation is “zero” when the variables X and Y are independent. However,
the converse is not true: r = 0 only means that there is no linear relationship.
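The properties above can be checked numerically. The sketch below uses hypothetical data and a compact, self-contained version of Pearson's formula; it shifts and rescales the data, flips the sign of one scale factor, and finally uses Y = X² on values symmetric about zero to show a dependent pair with r = 0.

```python
import math


def pearson_r(X, Y):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)

# Change of origin: subtract constants from X and Y.
r_origin = pearson_r([a - 100 for a in X], [b - 7 for b in Y])
# Change of scale: multiply X and divide Y by positive constants.
r_scale = pearson_r([a * 3 for a in X], [b / 2 for b in Y])
# A negative scale factor on one variable flips only the sign of r.
r_flip = pearson_r([a * -3 for a in X], Y)

# Y depends exactly on X (Y = X²), yet r = 0: the relationship is not linear.
U = [-2, -1, 0, 1, 2]
V = [u * u for u in U]
r_zero = pearson_r(U, V)

print(round(r, 4), round(r_origin, 4), round(r_scale, 4), round(r_flip, 4), r_zero)
```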

• Probable Error of Correlation Coefficient :


The probable error of the correlation coefficient helps determine the accuracy and reliability of the value
of the coefficient, insofar as that value depends on random sampling.
The probable error of the correlation coefficient can be obtained by applying the following formula:

$$ P.E._r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

Where, r = coefficient of correlation &
N = number of observations

Probable Error is used to:


1. Interpret the value of ‘r’
▪ If r < P.E., then r is not at all significant (no correlation).
▪ If r > 6 P.E., then r is highly significant.
▪ If P.E. < r < 6 P.E., then we cannot say anything definite about the significance of r.

2. Construct confidence limits within which the correlation coefficient of the population (ρ) is expected to lie.
▪ Adding the value of P.E. to the value of ‘r’ gives the upper limit,
▪ and subtracting it gives the lower limit, within which the population correlation coefficient is expected to lie.
▪ Symbolically, this can be expressed as:

$$ \rho = r \pm P.E. $$

Where, ρ denotes the correlation coefficient in the population.
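The probable-error rules can be written as a small Python sketch, using P.E. = 0.6745 × (1 - r²)/√N and the limits ρ = r ± P.E. The function names and the test values of r and N are hypothetical; as an assumption, the rules are applied to the absolute value of r so that strong negative correlations are judged the same way as positive ones.

```python
import math


def probable_error(r, n):
    """P.E. = 0.6745 × (1 − r²) / √N."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def probable_error_summary(r, n):
    """Return (P.E., verdict, (lower, upper)) following the rules in the text."""
    pe = probable_error(r, n)
    size = abs(r)  # assumption: the rules compare the magnitude of r
    if size < pe:
        verdict = "not significant"
    elif size > 6 * pe:
        verdict = "highly significant"
    else:
        verdict = "cannot say"
    limits = (r - pe, r + pe)  # ρ = r ± P.E.
    return pe, verdict, limits


pe, verdict, (low, high) = probable_error_summary(0.8, 64)
print(round(pe, 4), verdict, round(low, 4), round(high, 4))
```

For r = 0.8 and N = 64, P.E. = 0.6745 × 0.36 / 8 ≈ 0.0304; since 0.8 exceeds 6 P.E. ≈ 0.182, r is highly significant, and ρ is expected to lie between about 0.770 and 0.830.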

Conditions under which Probable error is used:


The probable error can be used only when the following three conditions are fulfilled:
▪ The data must approximate a bell-shaped curve, i.e. a normal frequency curve.
▪ The statistical measure for which the probable error is computed must have been obtained from a
sample.
▪ The sample items must be selected in an unbiased manner and must be independent of each
other.
Thus, the probable error is calculated to check the reliability of the value of the coefficient
calculated from a random sample.
❖ Spearman’s Rank Correlation Coefficient :
Spearman’s rank correlation coefficient is a non-parametric statistical measure used to study the
strength of association between two ranked variables.
This method is applied to ordinal data, i.e. values that can be arranged in order, one after the other,
so that a rank can be given to each.
When the variables under study are not capable of quantitative measurement but can be arranged in serial order,
Pearson’s correlation coefficient cannot be used; in such cases Spearman’s rank correlation can be used.

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

Where, R = Rank correlation coefficient
D = Difference of rank between paired items in the two series
N = Total number of observations
Mathematical Properties of Spearman’s Rank Correlation Coefficient :
The value of R lies between -1 and +1:
▪ R = +1: there is complete agreement in the order of ranks, and the ranks move in the same direction.
▪ R = -1: the ranks are in exactly opposite order (complete disagreement).
▪ R =0, there is no association in the ranks.

Types of Problems :

• The following steps are used to calculate the rank correlation coefficient:
1. Where actual ranks are assigned:
a) Calculate the difference between the ranks (R1 - R2), denoted by D.
b) Square these differences to remove the negative signs, and obtain their sum ∑D².
c) Substitute the values obtained in the formula.
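Steps (a) to (c) translate directly into code using R = 1 - 6∑D²/(N(N² - 1)). A minimal sketch, assuming the ranks are already assigned and contain no ties; the function name and the two judges' rankings below are hypothetical.

```python
def spearman_from_ranks(R1, R2):
    """Spearman's R = 1 − 6ΣD² / (N(N² − 1)) for untied ranks."""
    n = len(R1)
    if n != len(R2) or n < 2:
        raise ValueError("R1 and R2 must be paired rank lists of length >= 2")
    D = [a - b for a, b in zip(R1, R2)]        # (a) D = R1 − R2
    sum_d2 = sum(d * d for d in D)             # (b) square and sum: ΣD²
    return 1 - 6 * sum_d2 / (n * (n * n - 1))  # (c) substitute in the formula


# Hypothetical ranks given by two judges to five entries
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_from_ranks(judge_1, judge_2))
```

Here D = (-1, 1, -1, 1, 0), so ∑D² = 4 and R = 1 - 24/120 = 0.8. Identical rankings give R = +1 and exactly reversed rankings give R = -1.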

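2. Where ranks are tied:
The formula to calculate the rank correlation coefficient when there is a tie in the ranks is:

$$ R = 1 - \frac{6 \left[ \sum D^2 + \sum \frac{m^3 - m}{12} \right]}{N(N^2 - 1)} $$

Where m = number of items whose ranks are common; the correction term (m³ - m)/12 is added once for each group of tied items. Tied items are each given the average of the ranks they would otherwise occupy.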
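When raw values rather than ranks are given, ties must be ranked first. The sketch below assumes the usual convention of giving tied items the average of the ranks they would occupy (rank 1 = largest value) and then applies R = 1 - 6[∑D² + ∑(m³ - m)/12] / (N(N² - 1)), with correction terms from both series; all function names and the data are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every item tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tie_correction(values):
    """Σ (m³ − m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())


def spearman_with_ties(X, Y):
    """R = 1 − 6[ΣD² + Σ(m³ − m)/12] / (N(N² − 1))."""
    n = len(X)
    D = [a - b for a, b in zip(average_ranks(X), average_ranks(Y))]
    sum_d2 = sum(d * d for d in D)
    correction = tie_correction(X) + tie_correction(Y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


X = [10, 20, 20, 30]  # hypothetical marks; two items tie at 20
Y = [1, 2, 3, 4]
print(average_ranks(X))  # [4.0, 2.5, 2.5, 1.0]
print(spearman_with_ties(X, Y))
```

Here the two 20s share rank (2 + 3)/2 = 2.5, ∑D² = 0.5, the single tie group (m = 2) contributes (8 - 2)/12 = 0.5, and R = 1 - 6(0.5 + 0.5)/60 = 0.9.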

• Regression :
Regression analysis is the scientific technique for predicting the value of one variable from the known value of another.
M. M. Blair has described regression analysis as a mathematical measure of the average relationship between two
or more variables in terms of the original units of the data.


Regression Analysis:
▪ Regression analysis is a statistical tool used to determine the probable change in one variable for a given amount of change in another.
▪ It is used to obtain a measure of the error involved in using the regression line as a basis for estimation.
▪ It estimates the values of the dependent variable from the values of the independent variable. This means the value of the unknown variable can be estimated from the known value of another variable.

Regression Line:
▪ The regression lines reflect the degree to which the variables are correlated with each other: the higher the correlation, the closer the two lines, and they coincide when r = ±1.
▪ The regression line is the single line that best fits the data: it is drawn through the plotted points so that the sum of the squared deviations of the points from the line is the smallest (the least-squares criterion).

The regression lines have equations:

Regression line of Y on X:

$$ Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = r\,\frac{SD_y}{SD_x} = \frac{\sum xy}{\sum x^2} $$

This gives the most probable values of Y from the given values of X.

Regression line of X on Y:

$$ X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = r\,\frac{SD_x}{SD_y} = \frac{\sum xy}{\sum y^2} $$

This gives the most probable values of X from the given values of Y.
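Both regression lines can be fitted from the same deviation sums used for r. A minimal sketch with a hypothetical function name and data, assuming the deviation forms Y - Ȳ = b_yx(X - X̄) with b_yx = ∑xy/∑x², and X - X̄ = b_xy(Y - Ȳ) with b_xy = ∑xy/∑y²:

```python
def regression_lines(X, Y):
    """Fit the regression lines of Y on X and of X on Y by least squares.

    Returns (b_yx, b_xy, predict_y, predict_x).
    """
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y))
    sum_x2 = sum((a - mean_x) ** 2 for a in X)
    sum_y2 = sum((b - mean_y) ** 2 for b in Y)
    b_yx = sum_xy / sum_x2  # slope of the line of Y on X
    b_xy = sum_xy / sum_y2  # slope of the line of X on Y

    def predict_y(x):
        """Most probable Y for a given X: Y = Ȳ + b_yx (X − X̄)."""
        return mean_y + b_yx * (x - mean_x)

    def predict_x(y):
        """Most probable X for a given Y: X = X̄ + b_xy (Y − Ȳ)."""
        return mean_x + b_xy * (y - mean_y)

    return b_yx, b_xy, predict_y, predict_x


X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]
b_yx, b_xy, predict_y, predict_x = regression_lines(X, Y)
print(b_yx, b_xy)  # 0.6 1.0
print(predict_y(6), predict_x(3))
```

With these values b_yx = 0.6 and b_xy = 1.0, so b_yx × b_xy = 0.6 = r² (r ≈ 0.7746). Each line passes through the point of means (X̄, Ȳ) = (3, 4), but the two lines are different, so each should be used only for the direction of prediction it was fitted for.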


Properties of Regression Coefficient

▪ The constant ‘b’ in the regression equation (Ye = a + bx) is called the regression coefficient.
▪ It determines the slope of the line, i.e. the change in the value of Y corresponding to a unit
change in X; therefore, it is also called the “slope coefficient.”
▪ The correlation coefficient is the geometric mean of the two regression coefficients, taking their common sign:

$$ r^2 = b_{yx} \times b_{xy}, \qquad r = \pm\sqrt{b_{yx} \times b_{xy}} $$

▪ Since the value of the coefficient of correlation cannot exceed unity (in absolute value), the product of the regression coefficients cannot exceed 1:

$$ b_{yx} \times b_{xy} \le 1 $$

▪ Both regression coefficients have the same sign, i.e. they are either both positive or both negative, and r takes that same sign.
▪ The regression coefficient is an absolute measure, expressed in the units of the data (units of Y per unit of X).
▪ The arithmetic mean of the two regression coefficients is greater than or equal to the
correlation coefficient (in absolute value).
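These properties can be checked numerically on a hypothetical data set with negative correlation. The sketch computes b_yx = ∑xy/∑x², b_xy = ∑xy/∑y² and r directly and asserts each property; the function name is our own.

```python
import math


def regression_coefficients(X, Y):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]  # hypothetical data, negatively related
Y = [10, 8, 9, 5, 3]
b_yx, b_xy, r = regression_coefficients(X, Y)

# Both coefficients share a sign, and r takes that sign.
assert (b_yx > 0) == (b_xy > 0) == (r > 0)
# r is the geometric mean of the two coefficients (with their common sign).
assert math.isclose(r, math.copysign(math.sqrt(b_yx * b_xy), b_yx))
# The product of the coefficients cannot exceed 1.
assert b_yx * b_xy <= 1
# The arithmetic mean is at least as large as |r|.
assert abs((b_yx + b_xy) / 2) >= abs(r)
print(b_yx, b_xy, round(r, 4))
```

Here ∑xy = -17, ∑x² = 10 and ∑y² = 34, so b_yx = -1.7, b_xy = -0.5, their product is 0.85, and r = -√0.85 ≈ -0.922, while the arithmetic mean of the coefficients is -1.1.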
