Regression Correlation

The document discusses regression and correlation analysis techniques. It defines dependent and independent variables, and explains how to interpret correlation coefficients and regression lines. Examples are provided to demonstrate computing correlation coefficients and linear regression equations from data.

Uploaded by

Shafqat Ullah
Copyright
© All Rights Reserved
Research Methodologies and

Biostatistics
Somia Bakhtiar Lone
Lecture 3: Regression & Correlation
Regression & Correlation
1. Scatter diagram
2. Correlation coefficient
3. Straight line regression model
4. Interpretation of the regression coefficient and the correlation coefficient
Dependent and Independent Variable
• Regression and correlation measure the relationship between two or more
variables, written as
Y = f(X)
• One variable is always the response, or dependent, variable: the variable
to be predicted from or explained by the other variables. In other words, the
dependent variable is the one the investigator is trying to estimate
or predict (denoted y).
• The other variables are called predictors, explanatory variables, or
independent variables. In other words, the independent variable is the
variable that is under the investigator’s control (denoted x).
• In a variety of applications, the dependent variable (y) of interest is a
continuous variable that we can assume, possibly after an appropriate
transformation, to be normally (symmetrically) distributed.
• Can a relationship be used to predict what happens to y as x changes (i.e.
what happens to the dependent variable as the independent variable
changes)?
Applications
In analyzing data for the health sciences disciplines, sometimes we
may, for example, be interested in studying the relationship between
1. blood pressure and age,
2. height and weight,
3. the concentration of an injected drug and heart rate,
4. the consumption level of some nutrient and weight gain,
5. total family income and medical care expenditures.
The nature and strength of the relationships between variables such as
these may be examined using linear models such as regression and
correlation analysis.
Scatter Plots
• Scatterplots play a crucial role in regression and correlation analysis by
visually representing the relationships between variables. They provide a
clear and intuitive way to observe patterns, trends, and associations
between two continuous variables.
• Visualizing Relationships between Variables: Scatterplots allow
researchers to see how one variable changes in relation to another
variable. Plotting the data points on a graph helps in
identifying the direction, strength, and form of the relationship between
the variables. Scatterplots are essential for detecting linear or non-linear
patterns in the data.
Scatter Diagram
• The diagrammatic way of representing bivariate data is called a scatter
diagram. Suppose (x1, y1), (x2, y2), …, (xn, yn) are n pairs of observations.
If the values of the variables x and y are plotted along the x-axis and
y-axis respectively in the xy-plane, the diagram of dots so obtained is
known as a scatter diagram.
• For example, data (n = 55) on age and systolic BP were collected.
• [Figure: scatter plot of systolic BP versus age]
Example: Two weeks ago Ben started a new job as a car salesman. His
supervisor advised him that the more test drives per day he gets his
customers to take, the more sales he will make per day. He records the
following data over the past week.
[Table and scatter plot: test drives per day (x) versus sales per day (y)]

The data clearly show a relationship between the two variables.
As x increases, y also increases. This is called a positive linear
correlation between the two variables.
Correlation
Correlation measures the extent to which two variables are related. It
quantifies the strength and direction of the relationship between variables.
There are three types of correlation: positive, negative, and zero correlation.
• Positive Correlation: A positive correlation exists when both variables
move in the same direction. This means that as one variable increases, the
other variable also increases. An example is the relationship between
height and weight, where taller individuals tend to weigh more.
• Negative Correlation: In a negative correlation, one variable increases as
the other decreases. An example is the relationship between height above
sea level and temperature, where as altitude increases, temperature
decreases.
• Zero Correlation: Zero correlation indicates no relationship between two
variables. For instance, there is no correlation between the amount of tea
consumed and intelligence level.
Pearson's Correlation
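• Pearson's Correlation Coefficient: Pearson's correlation coefficient (r)
is a numerical measure that quantifies the strength and direction of a
linear (straight-line) relationship between two continuous variables.
Let (x1, y1), (x2, y2), …, (xn, yn) be n pairs of observations. Then the
correlation coefficient between x and y is denoted by r and defined
as

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

• or

$$ r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right] \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]}} $$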
Interpretation of correlation coefficient
• It ranges from -1 to +1, where +1 indicates a perfect positive linear
relationship, -1 indicates a perfect negative linear relationship, and 0
indicates no linear relationship.
• The correlation is high if observations lie close to a straight line (i.e.,
values close to +1 or -1) and low if observations are widely scattered
(correlation value close to 0).
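
As an illustration of the definition, here is a minimal Python sketch that computes r from deviations about the means. The function name `pearson_r` and the age and blood pressure values are hypothetical and chosen only for illustration; they are not the lecture's data.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only
ages = [30, 40, 50, 60, 70]
systolic_bp = [118, 125, 131, 140, 146]
r = pearson_r(ages, systolic_bp)
print(round(r, 3))
```

Points that lie exactly on a rising straight line give r = +1, and points on a falling straight line give r = -1.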
Interpretation of correlation coefficient
• r = 1: Perfect positive correlation
• 0.7 ≤ r < 1: Strong positive correlation
• 0.4 ≤ r < 0.7: Moderate positive correlation
• 0 < r < 0.4: Weak positive correlation
• r = 0: No correlation
• -0.4 < r < 0: Weak negative correlation
• -0.7 < r ≤ -0.4: Moderate negative correlation
• -1 < r ≤ -0.7: Strong negative correlation
• r = -1: Perfect negative correlation
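
The scale above can be sketched as a small lookup function. This is only a sketch: the function name `describe_correlation` is hypothetical, the middle band is called "moderate" here, and the boundary values 0.4 and 0.7 are assumed to belong to the stronger band.

```python
def describe_correlation(r):
    """Map a correlation coefficient to a verbal label.

    Assumption: |r| exactly equal to 0.4 or 0.7 falls in the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.85))
print(describe_correlation(-0.3))
```

These cut-offs are a rule of thumb; what counts as strong or weak depends on the field of study.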
Properties of correlation coefficient
1. The correlation coefficient lies between -1 and +1, i.e., -1 ≤ rxy ≤ 1.
2. The correlation coefficient is symmetric, i.e., rxy = ryx.
3. For two independent variables, the correlation coefficient is zero.
4. It is always unit free.
Advantages: It summarizes the relationship in one value, tells the degree of
correlation, and also describes the direction of the correlation.
Limitations:
1. It always assumes a linear relationship.
2. Interpreting the value of r is difficult.
3. The value of the correlation coefficient is affected by extreme values.
4. Computing it by hand is time-consuming.
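
Limitations 1 and 3 can be seen numerically in the following sketch. It uses the raw-sums (computational) form of r, and all data values are hypothetical and chosen only to make the effects visible.

```python
import math


def pearson_r(x, y):
    """Pearson's r via the computational (raw sums) form."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Limitation 1: y depends exactly on x (y = x^2), but not linearly,
# so r does not detect the relationship.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Limitation 3: a single extreme point changes r substantially.
x_base = [1, 2, 3, 4, 5, 6]
y_base = [2, 1, 3, 2, 3, 2]
r_without = pearson_r(x_base, y_base)
r_with = pearson_r(x_base + [20], y_base + [20])

print(round(r_curve, 3), round(r_without, 3), round(r_with, 3))
```

Checking a scatter plot before computing r helps catch both situations.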
• Example: Let’s reconsider our previous example with Ben the car
salesman. Compute the correlation coefficient r for the data set.
• We have 7 data points, so n = 7. Let’s compute r.

Since the correlation coefficient r is nearly 1, there is a strong linear
correlation between x and y. The sign of r is positive, which also
indicates that as the number of test drives per day increases, the
number of cars sold per day also tends to increase.
Spearman Rank correlation coefficient
• It is a non-parametric measure of correlation. This procedure makes
use of the two sets of ranks that may be assigned to the sample
values of x and y.
• The Spearman rank correlation coefficient can be computed in the
following cases:
  - Both variables are quantitative.
  - Both variables are qualitative ordinal.
  - One variable is quantitative and the other is qualitative ordinal.
Procedure:
1. Rank the values of X from 1 to n, where n is the sample size.
2. Rank the values of Y from 1 to n.
3. Compute the differences, di = rank of Xi - rank of Yi.
4. Apply the following formula:

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

5. The value of rs denotes the magnitude and nature of the association,
with the same interpretation as simple r.
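
The procedure above can be sketched in Python as follows. The sketch assumes the standard formula rs = 1 - 6Σdi² / (n(n² - 1)) and no tied values. The helper names `ranks` and `spearman_rs` and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks, not handled here")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Spearman rank correlation from rank differences (no ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    rank_x = ranks(x)  # step 1
    rank_y = ranks(y)  # step 2
    # Step 3: squared differences between paired ranks
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Step 4: apply the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical values, for illustration only
x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 2, 5]
rs = spearman_rs(x, y)
print(rs)
```

Because only the ranks are used, any strictly increasing relationship, linear or not, gives rs = 1.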
Introduction to Regression
Regression analysis is a statistical method used to examine the relationship between one dependent
variable and one or more independent variables. It aims to understand how the value of the
dependent variable changes when one or more independent variables are varied.
Purpose: The primary purpose of regression analysis is to predict the value of the dependent
variable based on the values of one or more independent variables. It helps in understanding the
strength and direction of the relationship between variables, making it a valuable tool in forecasting
and decision-making.
Types of Regression: There are many types of regression. The two main types are:
• Linear Regression: Linear regression is a type of regression analysis where the relationship
between the dependent variable and independent variable(s) is modeled as a linear equation. It
is used when there is a linear relationship between the variables.
• Multiple Regression: Multiple regression involves predicting the value of a dependent variable
based on two or more independent variables. It extends the concepts of simple linear regression
to more complex relationships.
Linear Regression
• In linear regression, the relationship between the dependent variable (Y) and
independent variable (X) is represented by the equation
Y=a+bX
• Here, "a" represents the intercept of the line with the Y-axis, and "b"
represents the slope of the line, indicating how much Y changes for a unit
change in X.
• Interpretation of Coefficients (a and b):
Intercept (a): The intercept "a" in the linear regression equation represents the
value of Y when X is zero. It indicates where the regression line crosses the Y-
axis.
Slope (b): The slope "b" in the linear regression equation signifies how much Y
changes for a one-unit change in X. It reflects the rate of change in Y with
respect to X.
Straight line Regression Model
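The equation of the regression line is y = b + mx, where

$$ m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad b = \frac{\sum y_i - m \sum x_i}{n} $$

Here b is the intercept and m is the slope (a and b, respectively, in the
notation Y = a + bX).
• The model above is called simple because it contains only one
independent variable (simple linear regression model).
• It is linear because the independent variable appears only in the first
power; if we graph the mean of Y versus X, the graph is a straight line
with intercept b and slope m.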
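
A minimal sketch of fitting the line y = b + mx by least squares is given below. It assumes the usual least-squares expressions for the slope and intercept. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (m, b) for the least-squares line y = b + m*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    m = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    b = (sum_y - m * sum_x) / n                     # intercept
    return m, b


# Hypothetical values, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = fit_line(x, y)
print(round(m, 2), round(b, 2))
```

The intercept formula is equivalent to b = ȳ - m·x̄, so the fitted line always passes through the point of means (x̄, ȳ).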
Example: Find the equation of the regression line. Round the slope m
and the intercept b to two decimals.
• We have 6 data points, so n = 6. Let's compute the slope m of the regression line.

• So m = 0.84: each additional test drive per day is associated with an average increase of
0.84 in sales per day. Now we will compute the y-intercept b.

This gives b = -1.53. The regression line for this data set is
y = −1.53 + 0.84x
Now we can substitute values of x into the equation above to compute
the fitted values of y and plot the regression line.
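
As a sketch of this last step, the snippet below evaluates the fitted line y = −1.53 + 0.84x at a few x values. The function name `predict` and the x values are hypothetical, since the original data table is not reproduced here.

```python
def predict(x, intercept=-1.53, slope=0.84):
    """Fitted value of y on the line y = intercept + slope * x."""
    return intercept + slope * x


# Hypothetical numbers of test drives per day, for illustration only
test_drives = [2, 4, 6, 8]
fitted = [round(predict(x), 2) for x in test_drives]

for x, y_hat in zip(test_drives, fitted):
    print(x, y_hat)
```

Joining the fitted points gives the regression line. Fitted values are only meaningful within the range of the observed x values; at x = 0, for example, the line gives a negative number of sales.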
