Presentation4 - Bivariate Analysis and Simple Linear Regression
Techniques
Introduction to Bivariate Analysis and Simple
Linear Regression
• Welcome to the lecture on Bivariate Analysis and Simple Linear
Regression.
• Today, we will cover key concepts including:
• Correlation coefficient
• Simple linear regression model
• Interpretation of regression coefficients
• Goodness of fit (R-squared).
• We will use a real-world case study to understand the relationship
between advertising spend and sales revenue.
• Bivariate analysis involves the analysis of two variables to
determine the empirical relationship between them.
• Simple linear regression is a statistical method used to
understand and quantify the relationship between two continuous
variables.
Correlation Coefficient
• The correlation coefficient (r) measures the strength and direction of the
linear relationship between two variables.
• It ranges from -1 to 1:
• r = 1: Perfect positive linear relationship
• r = -1: Perfect negative linear relationship
• r = 0: No linear relationship
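• Formula: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$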
• Outliers are data points that differ significantly from the others in the
dataset.
• Outliers can distort the correlation coefficient, making the relationship
appear stronger or weaker than it actually is.
• Correlation alone does not establish causation. To investigate causation, use
additional methods such as experiments or longitudinal studies.
• To handle outliers, perform robust statistical analyses, consider data
transformations, or conduct the analysis both with and without the outliers.
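As a minimal sketch of the points above, the following computes r directly from its definition and shows how a single outlier can change it. The data are the training hours and performance ratings from the exercise table in this deck; the extra point (50 hours, rating 40) is a hypothetical outlier added purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Training hours (x) and performance ratings (y) from the exercise table.
hours = [5, 7, 10, 15, 20, 25, 30, 35, 40, 45]
rating = [60, 65, 70, 80, 85, 90, 95, 100, 105, 110]

r_clean = pearson_r(hours, rating)

# Hypothetical outlier (not in the source data): many hours, very low rating.
r_outlier = pearson_r(hours + [50], rating + [40])

print(f"r without the outlier: {r_clean:.3f}")
print(f"r with the outlier:    {r_outlier:.3f}")
```

Running the analysis both with and without the suspect point, as done here, is one simple way to see how much a single observation drives the result.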
Correlation and causation
• Understand the nature of the linear relationship between the independent variable x and the
dependent variable y.
• When a hypothesis test indicates that a significant linear relationship exists between the
variables (that is, the null hypothesis has been rejected at a specific significance level α), the
researcher must consider the following five possibilities:
1. There is a direct cause-and-effect relationship between the variables, e.g. water causes plants
to grow.
2. There is a reverse cause-and-effect relationship between the variables, e.g. it may appear that
excessive coffee consumption causes nervousness, when in fact an extremely nervous person
craves coffee to calm the nerves.
3. The relationship between the variables may be caused by a third variable, e.g. the number of
drowning deaths correlates with the number of soft drinks consumed in the summer, but
both are driven by high temperatures.
4. There may be a complexity of interrelationships among many variables, e.g. students'
secondary school grades correlate with their college grades, but hours of study, motivation,
IQ, etc. may also play a part.
5. The relationship may be coincidental.
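Possibility 3 can be illustrated with a small simulation. This is a sketch on synthetic data, not real figures: the temperature range, the effect sizes, and the noise levels are all assumptions chosen for illustration. Temperature drives both soft-drink sales and drownings, so the two correlate strongly, yet the correlation largely disappears once the linear effect of temperature is removed from each.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(x, y):
    """Residuals of y after removing its least-squares linear dependence on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


random.seed(0)  # fixed seed so the synthetic data are reproducible

# Synthetic data: temperature is the hidden third variable.
temperature = [random.uniform(15, 35) for _ in range(500)]
soft_drinks = [10 * t + random.gauss(0, 20) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 1) for t in temperature]

raw_r = pearson_r(soft_drinks, drownings)

# Correlation that remains after temperature is accounted for in both series.
partial_r = pearson_r(residuals(temperature, soft_drinks),
                      residuals(temperature, drownings))

print(f"raw correlation:               {raw_r:.3f}")
print(f"after removing temperature:    {partial_r:.3f}")
```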
Linear regression
• When studying the relationship between two variables, collect the data, then
construct a scatter plot to determine the nature of the relationship:
1. Positive linear relationship
2. Negative linear relationship
3. Curvilinear relationship
4. No discernible relationship
• Next, compute the correlation coefficient and test whether it is significant.
• If the correlation is significant, you can proceed to calculate the regression
line, i.e. the data's line of best fit.
• The purpose of the regression line is to enable the researcher to see the trend
and make predictions on the basis of the data.
• It is of no use to do regression analysis if the correlation is not significant,
because you cannot make reliable predictions from such data.
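One common way to check whether a correlation is significant is a t-test of the null hypothesis that the population correlation is zero, using t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to the exercise data from this deck. The choice of this particular test, the significance level α = 0.05, and the critical value 2.306 (read from a standard two-tailed t-table for 8 degrees of freedom) are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Training hours (x) and performance ratings (y) from the exercise table.
hours = [5, 7, 10, 15, 20, 25, 30, 35, 40, 45]
rating = [60, 65, 70, 80, 85, 90, 95, 100, 105, 110]

n = len(hours)
r = pearson_r(hours, rating)
t_stat = correlation_t_statistic(r, n)

# Assumed: alpha = 0.05, two-tailed, df = n - 2 = 8, value taken from a t-table.
T_CRITICAL = 2.306
significant = abs(t_stat) > T_CRITICAL

print(f"r = {r:.3f}, t = {t_stat:.2f}, significant at 0.05: {significant}")
```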
Simple Linear Regression Model
• Simple linear regression aims to model the relationship between a
dependent variable (y) and an independent variable (x) using a
linear equation.
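• Model Equation: $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error term.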
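Assuming the usual form of the linear equation, y = b0 + b1·x + error, the intercept b0 and slope b1 are typically estimated by ordinary least squares: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The sketch below applies these formulas to the exercise data from this deck; the function name `fit_line` is a hypothetical helper, not something defined in the lecture.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Training hours (x) and performance ratings (y) from the exercise table.
hours = [5, 7, 10, 15, 20, 25, 30, 35, 40, 45]
rating = [60, 65, 70, 80, 85, 90, 95, 100, 105, 110]

b0, b1 = fit_line(hours, rating)
print(f"fitted line: rating = {b0:.2f} + {b1:.3f} * hours")
```

The slope is read as the estimated change in y for a one-unit increase in x, and the intercept as the estimated value of y when x is zero.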
Regression model
• A regression model describes a relation between variables in which
changes in some variables may "explain" or possibly "cause"
changes in other variables.
• Explanatory variables are termed the independent variables,
and the variables to be explained are termed the dependent
variables.
• A regression model estimates the nature of the relationship
between the independent and dependent variables:
• The change in the dependent variable that results from changes in
the independent variables, i.e. the size of the relationship.
• The strength of the relationship.
• The statistical significance of the relationship.
Examples
• Dependent variable is retail price of gasoline in Regina – independent
variable is the price of crude oil.
• Dependent variable is employment income – independent variables might
be hours of work, education, occupation, sex, age, region, years of
experience, unionization status, etc.
[Figure: Crude oil price index (1997=100, left axis) and regular gasoline prices in Regina (cents per litre, right axis); time axis labelled 1981M01 to 2008M01.]
The R^2 value of 0.992 indicates that approximately 99.2% of the variance in
sales revenue is explained by advertising spend.
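R-squared can be computed as 1 − SSE/SST, where SSE is the sum of squared residuals around the fitted line and SST is the total sum of squares around the mean of y. The advertising and sales figures behind the 0.992 value are not reproduced in this text, so the sketch below uses the exercise data from this deck instead and will give a different number; `r_squared` is a hypothetical helper name.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared residuals around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation: squared deviations around the mean of y.
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst


# Training hours (x) and performance ratings (y) from the exercise table.
hours = [5, 7, 10, 15, 20, 25, 30, 35, 40, 45]
rating = [60, 65, 70, 80, 85, 90, 95, 100, 105, 110]

r2 = r_squared(hours, rating)
print(f"R-squared = {r2:.3f}")
```

In simple linear regression this value equals the square of the correlation coefficient r.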
Exercise - Employee Performance and
Training Hours
• Scenario: Telekom Networks Malawi (TNM) operates a call centre
manned by 100 call agents. TNM has invested a huge sum of money in
training the call agents. TNM now wants to analyze the impact of
training hours on employee performance.
• Conduct the following analyses:
• Correlation Coefficient: To see if there is a relationship between
training hours and performance ratings.
• Regression Model: To predict performance based on training hours.
• Interpretation: Helps in designing effective training programs.
• Goodness of Fit: To measure how well the training explains
performance variations.
Employee ID   Training Hours (x)   Performance Rating (y)
1             5                    60
2             7                    65
3             10                   70
4             15                   80
5             20                   85
6             25                   90
7             30                   95
8             35                   100
9             40                   105
10            45                   110
Summary and Key Takeaways
• We covered the correlation coefficient, simple linear regression,
interpretation of regression coefficients, and goodness of fit.
• We used a case study on the relationship between advertising
spend and sales revenue.
• Key takeaway: Regression analysis is a powerful tool for
understanding relationships between variables and making
predictions.