Correlation and Regression

Correlation
 About
 2 variables are said to be correlated if a change in one is accompanied by a corresponding change in the other variable.
 It is a numerical or quantitative measure of the relationship or association b/w any 2 variables.

 Types of correlation
1. Based on the direction of change of variables  i) Positive, ii) Negative
2. Based upon the constancy of the ratio of change b/w the variables  i) Linear ii) Non-linear
3. Based upon the no. of variables studied  i) Simple ii) Multiple iii) Partial

Positive and Negative


 When one variable increases as the other increases, the correlation is positive;
when one decreases as the other increases, it is negative.

Linear and Non-linear


 When the amount of change in one variable tends to keep a constant ratio to the amount of change in
the other variable, then the correlation is said to be Linear.
But if the amount of change in one variable does not bear a constant ratio to the amount of change in
the other variable, then the correlation is said to be Non-linear.
 The distinction b/w linear and non-linear is based upon the constancy of the ratio of change b/w the
variables.

Simple and Multiple


 Under simple correlation [correlation of ZERO order], we study the relationship b/w 2 variables only.
E.g., b/w the yield of wheat and the amount of rainfall or b/w the demand and supply of a commodity.
 In case of multiple correlation, the relationship is studied among 3 or more variables. E.g., the
relationship of yield of wheat may be studied with both chemical fertilizers and the pesticides.
- There are 2 categories of Multiple correlation analysis
i) Total correlation  It is based upon all variables.
ii) Partial correlation  Here, the relationship among 3 or more variables is studied in such a
way that only one dependent variable and one independent variable are considered and
all the others are kept constant. E.g., the coefficient of correlation b/w the yield of wheat and
chemical fertilizers, excluding the effects of pesticides and manures, is called Partial
correlation.
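- For reference (a standard result that is not written out in these notes): the partial correlation coefficient b/w variables 1 and 2, keeping variable 3 constant, is
r12.3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)],
where r12, r13 and r23 are the simple (zero-order) correlation coefficients.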

 Measures of correlation
1. Scatter (Dot) Diagram method.
2. Karl Pearson’s coefficient of correlation.
3. Spearman’s Rank correlation coefficient

1. Scatter (Dot) diagram method


 It is a graphical method to find correlation or relationship b/w 2 variables.
 Merits
i) Simplest and non-mathematical method.
ii) Not influenced by the size of extreme items.
iii) 1st step in investigating the relationship b/w 2 variables.
Demerits
i) It does not give the MAGNITUDE of correlation.
i.e., the scatter diagram doesn't give a QUANTITATIVE measure of the relationship b/w the
variables. It only gives a QUALITATIVE expression of the quantitative change.
ii) It does not show the relationship for more than 2 variables.
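As a quick illustration (a minimal sketch with made-up data; not part of the original notes), a scatter diagram can be drawn in Python with matplotlib. An upward drift of the dots suggests positive correlation, a downward drift suggests negative correlation:

import matplotlib.pyplot as plt

# Hypothetical data: rainfall (cm) vs. yield of wheat (quintals per acre)
rainfall = [20, 35, 40, 55, 60, 75, 80]
wheat_yield = [18, 22, 24, 30, 29, 35, 38]

plt.scatter(rainfall, wheat_yield)   # each (x, y) pair becomes one dot
plt.xlabel("Rainfall (cm)")
plt.ylabel("Yield of wheat (quintals per acre)")
plt.title("Scatter diagram of yield vs. rainfall")
plt.show()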
2. Karl Pearson’s coefficient of correlation
 About
i) Also called Pearson’s product moment correlation coefficient.
ii) It is a mathematical method.
iii) It assumes that there is a linear relationship b/w the 2 variables.
iv) It is denoted by ‘r’, which is the most common correlation coefficient.
v) ‘r’ measures the degree of linear relationship b/w 2 variables x and y.

 Properties
i) The correlation coefficient possesses the property of symmetry, i.e., it is
symmetrical w.r.t. x and y: r(x, y) = r(y, x).
ii) ‘r’ is free from the unit of measurement. i.e., ‘r’ is a pure no. which is suitable for
comparison.
iii) Correlation coefficient is independent of change of origin and scale.
iv) If 2 variables are independent, then r = 0. BUT converse is not always true.

 Merits
1. It gives a single mathematical value which summarises both the degree and the direction of correlation.
Demerits
1. It always assumes a linear relationship.
2. Calculation of ‘r’ is difficult.
3. ‘r’ is affected by extreme obs.
4. Time consuming method.
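As a sketch (made-up data; not from the notes), r can be computed directly from its standard formula r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]:

from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / spread

x = [10, 20, 30, 40, 50]   # hypothetical independent series
y = [12, 25, 35, 41, 55]   # hypothetical dependent series
print(pearson_r(x, y))     # close to +1, i.e., a strong positive linear correlation

# Property iii) above: r is unchanged by a change of origin and scale,
# so ranking the same x values after the transformation 2x + 7 gives the same r.
print(pearson_r([2 * a + 7 for a in x], y))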

3. Spearman’s Rank correlation coefficient


 About
i) Spearman’s Rank correlation coefficient or Spearman’s Rank Difference Method or
Formula is a method of calculating the correlation coefficient of QUALITATIVE variables
and was developed in 1904 by Charles Edward Spearman.
ii) In other words, the formula determines the correlation coefficient of variables like beauty,
ability, honesty, etc., whose QUANTITATIVE measurement is not possible. Therefore,
these attributes are ranked or put in the order of their preference.
iii) Also called  Product Moment correlation coefficient b/w the Ranks.
iv) It is a non-parametric test.

 Merits and demerits


i) Easy to understand & simple to apply.
ii) This is the only method that can be used to find correlation coefficient of data having
QUALITATIVE characteristics like beauty, intelligence, honesty, etc.
iii) This is the only method that can be used where the ranks are given and not the actual
bivariate data on 2 variables.

Demerits
i) This method can’t be used for finding correlation in the case of bivariate frequency
distribution.
ii) This method is very difficult to apply when the no. of items is more than 30.

 Note for Formula
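The formula itself is not reproduced in these notes; the standard expressions (stated here as an assumption of what was intended) are:
Un-tied ranks: ρ = 1 − (6 Σd²) / [n(n² − 1)], where d = difference b/w the 2 ranks of an item and n = no. of pairs of obs.
Tied ranks: ρ = 1 − 6[Σd² + Σ(m³ − m)/12] / [n(n² − 1)], where m = no. of items sharing a common rank (one correction term per group of tied items).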

 There are 3 cases to compute ρ (rho):


Case 1  when actual ranks are given.
Step 1: Find & apply un-tied rank formula.
Case 2  when ranks are NOT given.
Step 1: Assign ranks by taking either the HIGHEST or LOWEST value as 1.
Step 2: Find & apply un-tied rank formula.
Case 3  when ranks are equal or repeated.
If there are 2 or more items with the same rank in either series, then assign common rank
to each repeated item.
Common rank is the avg. of the ranks which these items would have got had they been
different from each other; the next item is given the rank next to the ranks already used in
computing the common rank.
e.g., suppose there are 2 items at rank 4, then the common rank assigned to each item is
(4 + 5)/2 = 4.5. The next item will be assigned rank 6.

Step 1: assign common rank to each repeated item.


Step 2: Find & apply tied rank formula.

 If we are given data in the form of ranks BUT the highest rank in the series exceeds the no. of pairs of obs.,
then in such situations the ranks are treated as values and fresh ranks are determined.
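As a sketch of the whole procedure (made-up data; the ranking helper below is hypothetical, not from the notes), ties are given the common (average) rank as in Case 3, and ρ is then obtained by applying the product moment formula to the ranks, which is equivalent to the tied-rank formula:

def average_ranks(values):
    """Rank values taking the highest as 1; tied items share the avg. (common) rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the group while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        common = ((pos + 1) + (end + 1)) / 2   # avg. of the ranks the tied group spans
        for k in range(pos, end + 1):
            ranks[order[k]] = common
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical marks awarded by 2 judges (note the tie at 13 in the second series)
print(spearman_rho([48, 33, 40, 9, 16], [13, 13, 24, 6, 15]))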

Regression

 About
 The literal or dictionary meaning of regression is “stepping back” or “moving backward” or “returning to
avg. value”.
 The term regression was 1st used by Sir Francis Galton in 1877.
 It is a functional or mathematical relationship b/w 2 variables.
 Regression analysis
 It means the estimation or prediction of the unknown value of one variable (DEPENDENT variable)
from the known values of one or more other variables (INDEPENDENT variables).
- The variable whose value is to be predicted is called the DEPENDENT/ EXPLAINED/
PREDICTED/ REGRESSED variable.
- The variables whose values are used to predict the value of the dependent variable are called
INDEPENDENT/ EXPLANATORY/ PREDICTOR/ REGRESSOR variables.
 Regression analysis confined to the study of only 2 variables, a dependent and an independent
variable, is called Simple Regression Analysis.

 Linear regression
 When the relationship b/w the dependent and independent variable is linear, the technique for prediction is
called Simple Linear Regression.

 There are only 2 regression lines b/w variables X and Y.


i) Regression line of Y on X.
ii) Regression line of X on Y.

 The line of regression of Y on X is given by  Y = a + bX.


where ‘a’  INTERCEPT (constant) of the equation.
‘b’  SLOPE of the equation.
These are used to predict the unknown value of Y (dependent variable) when the value of X
(independent variable) is known.
Values of ‘a’ & ‘b’ are calculated by the Principle of Least Squares (i.e., by the Normal equations).

 Regression line  also called “line of best fit”.
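As a sketch (made-up data; not from the notes), solving the normal equations for the line of Y on X gives b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄:

def fit_y_on_x(x, y):
    """Least-squares intercept 'a' and slope 'b' of the regression line Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]                    # hypothetical independent variable
y = [2.1, 4.3, 5.9, 8.2, 9.8]          # hypothetical dependent variable
a, b = fit_y_on_x(x, y)
print(f"Line of Y on X: Y = {a:.2f} + {b:.2f} X")   # the 'line of best fit'
print(a + b * 6)                       # predicted Y for a new X = 6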
