Regression Correlation
Biostatistics
Somia Bakhtiar Lone
Lecture 3: Regression & Correlation
Regression & Correlation
1. Scatter diagram
2. Correlation coefficient
3. Straight line regression model
4. Interpretation of the regression coefficient and the correlation coefficient
Dependent and Independent Variables
• Regression and correlation are used to measure the relationship between two or more variables.
y = f(x), i.e., y is a function of x.
• One variable is always the response, or dependent, variable: the variable
to be predicted from, or explained by, the other variables. In other words,
the dependent variable is the one the investigator is trying to estimate
or predict (denoted y).
• The other variables are called predictors, explanatory variables or
independent variables. In other words, the independent variable is the
variable under the investigator’s control (denoted x).
• In many applications, the dependent variable (y) of interest is a
continuous variable that we can assume, perhaps after an appropriate
transformation, to be normally (symmetrically) distributed.
• Can a relationship be used to predict what happens to y as x changes (i.e.
what happens to the dependent variable as the independent variable
changes)?
Applications
In analyzing data for the health sciences disciplines, sometimes we
may, for example, be interested in studying the relationship between
1. blood pressure and age,
2. height and weight,
3. the concentration of an injected drug and heart rate,
4. the consumption level of some nutrient and weight gain,
5. total family income and medical care expenditures.
The nature and strength of the relationships between variables such as
these may be examined using linear models such as regression and
correlation analysis.
Scatter Plots
• Scatterplots play a crucial role in regression and correlation analysis by
visually representing the relationships between variables. They provide a
clear and intuitive way to observe patterns, trends, and associations
between two continuous variables.
• Visualizing Relationships between Variables: Scatterplots allow
researchers to see how one variable changes with respect to another.
By plotting the data points on a graph, scatterplots help identify the
direction, strength and form of the relationship between variables.
They are essential for detecting linear or non-linear patterns in the
data.
Scatter Diagram
• The diagrammatic way of representing
bivariate data is called a scatter diagram.
Suppose (x1, y1), (x2, y2), …, (xn, yn) are
n pairs of observations. If the values of
the variables x and y are plotted along the
x-axis and y-axis, respectively, in the
xy-plane, the resulting diagram of dots is
known as a scatter diagram.
• For example, data (n = 55) on age and
systolic BP were collected.
• Figure: Scatter plot of systolic BP versus age.
Example: Two weeks ago Ben started a new job as a car salesman. His
supervisor advises him that the more test drives per day he gets his
customers to take, the more sales he will make per day. He records the
following data over the past week.
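For reference, the standard (Pearson product-moment) correlation coefficient for n pairs (x_i, y_i) with means x̄ and ȳ can be written in two equivalent forms. The first uses deviations from the means; the second uses raw sums and is the form usually applied when computing r by hand:

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}
\qquad \text{or} \qquad
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
```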
Interpretation of correlation coefficient
The correlation coefficient r ranges from -1 to +1, where +1 indicates a
perfect positive linear relationship, -1 indicates a perfect negative linear
relationship, and 0 indicates no linear relationship.
The correlation is strong (r close to +1 or -1) when the observations lie
close to a straight line, and weak (r close to 0) when the observations are
widely scattered.
r = 1: Perfect positive correlation
0.7 < r < 1: Strong positive correlation
0.4 < r < 0.7: Fairly positive correlation
0 < r < 0.4: Weak positive correlation
r = 0: No correlation
-0.4 < r < 0: Weak negative correlation
-0.7 < r < -0.4: Fairly negative correlation
-1 < r < -0.7: Strong negative correlation
r = -1: Perfect negative correlation
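As a minimal sketch, the cut-offs above can be turned into a small Python helper. The function name `describe_correlation` is hypothetical, and because the table leaves the boundary values 0.4 and 0.7 unassigned, this sketch assumes a boundary value belongs to the weaker band.

```python
def describe_correlation(r: float) -> str:
    """Return the verbal label for r using the cut-offs in the table above.

    Assumption: values exactly on a cut-off (|r| = 0.4 or 0.7) are put in
    the weaker band, since the table does not assign them.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 1.0:
        return "Perfect positive correlation"
    if r == -1.0:
        return "Perfect negative correlation"
    if r == 0.0:
        return "No correlation"

    strength = abs(r)  # the size of r gives the strength
    if strength > 0.7:
        label = "Strong"
    elif strength > 0.4:
        label = "Fairly"
    else:
        label = "Weak"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    return f"{label} {direction} correlation"


print(describe_correlation(-0.55))
```

For example, r = -0.55 falls in the -0.7 < r < -0.4 band, so the call above prints "Fairly negative correlation".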
Properties of correlation coefficient
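1. The correlation coefficient lies between -1 and +1, i.e., -1 ≤ rxy ≤ 1.
2. The correlation coefficient is symmetric, i.e., rxy = ryx.
3. If two variables are independent, their (population) correlation coefficient is zero; a sample r
will be close to, but rarely exactly, zero. The converse does not hold: r = 0 only rules out a linear relationship.
4. It is unit-free: it does not depend on the units in which x and y are measured.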
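Properties 1, 2 and 4 can be checked numerically. The sketch below implements r from deviations about the means in plain Python; the helper name `pearson_r` and the height/weight values are hypothetical illustration data, not taken from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data: height in cm and weight in kg.
height_cm = [150, 160, 165, 170, 180]
weight_kg = [52, 58, 63, 66, 75]

r_xy = pearson_r(height_cm, weight_kg)
r_yx = pearson_r(weight_kg, height_cm)        # property 2: swapping x and y
height_in = [h / 2.54 for h in height_cm]      # same heights in inches
weight_lb = [w * 2.20462 for w in weight_kg]   # same weights in pounds
r_units = pearson_r(height_in, weight_lb)      # property 4: change of units

print(round(r_xy, 4), round(r_yx, 4), round(r_units, 4))
```

All three printed values agree (up to floating-point rounding) and lie between -1 and +1, because r is symmetric and converting to other units only rescales x and y by positive constants.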
Advantages: It summarizes the relationship in a single value that conveys both
the degree (strength) and the direction of the correlation.
Limitations:
1. It assumes a linear relationship; a strong non-linear relationship can still give r close to 0.
2. Interpreting the value of r can be difficult.
3. The value of the correlation coefficient is affected by extreme values (outliers).
4. Computing it by hand is time-consuming.
• Example: Let’s reconsider the previous example of Ben the car
salesman and compute the correlation coefficient r for his data set.
• We have 7 data points, so n = 7. Let’s compute r.
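The hand calculation uses the raw sums Σx, Σy, Σxy, Σx² and Σy². Because Ben's seven recorded values are not listed here, the numbers below are hypothetical placeholders with n = 7, not his data; the helper name `correlation_from_sums` is also hypothetical. Substituting the recorded values gives r for the actual example.

```python
import math


def correlation_from_sums(x, y):
    """Pearson r via the raw-sums (computational) formula used for hand calculation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical placeholder values for 7 days (NOT Ben's recorded data).
test_drives = [3, 5, 2, 4, 6, 3, 5]
sales = [1, 3, 1, 2, 4, 2, 3]

r = correlation_from_sums(test_drives, sales)
print(f"n = {len(test_drives)}, r = {r:.3f}")
```

With these placeholders, Σx = 28, Σy = 16, Σxy = 73, Σx² = 124 and Σy² = 44, so r = 63 / √(84 × 52) ≈ 0.953, a strong positive correlation.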
• So the slope is m = 0.84: each additional test drive per day increases the predicted
number of sales by 0.84 per day. Next we compute the y-intercept b.
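Assuming the slope above is the ordinary least-squares slope for the line y = mx + b, both m and b can be computed from the same raw sums: m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = (Σy − mΣx) / n = ȳ − m·x̄. The helper name `least_squares_line` and the data are hypothetical placeholders, not Ben's records, so they do not reproduce m = 0.84.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit of y = m*x + b computed from raw sums.

    m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    b = (Σy - m*Σx) / n, which equals ȳ - m*x̄
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical placeholder values for 7 days (NOT Ben's recorded data).
test_drives = [3, 5, 2, 4, 6, 3, 5]
sales = [1, 3, 1, 2, 4, 2, 3]

m, b = least_squares_line(test_drives, sales)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
print(f"predicted sales for 4 test drives: {m * 4 + b:.2f}")
```

With these placeholders, m = 63/84 = 0.75 and b = (16 − 0.75 × 28)/7 ≈ −0.714, so each extra test drive adds 0.75 to the predicted daily sales. The slope is interpreted exactly as in the slide; only the numbers differ.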