Presentation4 - Bivariate Analysis and Simple Linear Regression

for data analysis

Uploaded by

cfchalimba


Analytics and Quantitative Techniques
Introduction to Bivariate Analysis and Simple Linear Regression
• Welcome to the lecture on Bivariate Analysis and Simple Linear
Regression.
• Today, we will cover key concepts including:
• Correlation coefficient
• Simple linear regression model
• Interpretation of regression coefficients
• Goodness of fit (R-squared).
• We will use a real-world case study to understand the relationship
between advertising spend and sales revenue.
Introduction to Bivariate Analysis and
Simple Linear Regression…
• Bivariate analysis involves the analysis of two variables to
determine the empirical relationship between them.
• Simple linear regression is a statistical method used to
understand and quantify the relationship between two continuous
variables.
Correlation Coefficient
• The correlation coefficient (r) measures the strength and direction of the
linear relationship between two variables.
• It ranges from -1 to 1:
• r = 1: Perfect positive linear relationship
• r = -1: Perfect negative linear relationship
• r = 0: No linear relationship
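• Formula: r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}},
where n is the number of (x, y) pairs.
• Round the value of r to three decimal places.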

Scatter plot
• A scatter plot is a type of data visualization that shows the
relationship between two quantitative variables.
• Each point on the scatter plot represents an observation in the
dataset

Subject Age (x) Blood-Pressure (y)

A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152
Scatter plot …
• Scatter plots can reveal different types of
patterns (linear, non-linear).
• A linear pattern indicates a straight-line
relationship between variables.
• Non-linear patterns include curves or clusters.
• Direction of the relationship indicates whether
the variables increase together (positive
correlation) or if one increases while the other
decreases (negative correlation).
• No clear direction suggests no correlation
Scatter plot…
• A strong correlation means data points are closely packed around
the line of best fit.
• A weak correlation means data points are more scattered.
• Outliers are data points that are significantly different from other
observations.
• Outliers can distort the overall impression of the data's pattern
and strength.
• It is important to investigate outliers to understand their cause.
• Steps to interpret scatter plots: look for patterns, determine
direction, assess strength, and identify outliers.
Correlation Coefficient
• Example: given the dataset (Advertising Spend, Sales Revenue):
Advertising Spend (x) Sales Revenue (y)
100 200
150 300
200 400
250 450
300 500
• Calculation Steps:
1. Calculate the sums of x, y, x^2, y^2, and xy.
2. Substitute into the formula to find r.
Correlation Coefficient…
• n = 5, Σx = 1000, Σy = 1850, Σxy = 407500, Σx^2 = 225000, Σy^2 = 742500
• r = (5 × 407500 − 1000 × 1850) / sqrt[(5 × 225000 − 1000^2)(5 × 742500 − 1850^2)]
= 187500 / sqrt(125000 × 290000) ≈ 0.985
• The correlation coefficient is 0.985, indicating a very strong positive linear
relationship between advertising spend and sales revenue.
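The two calculation steps (compute the sums, then substitute them into the formula) can be sketched in Python. This is a minimal illustration using only the standard library; the function name `pearson_r` is an illustrative choice, not something defined in the lecture.

```python
import math

# Advertising dataset from the example.
x = [100, 150, 200, 250, 300]   # advertising spend
y = [200, 300, 400, 450, 500]   # sales revenue


def pearson_r(xs, ys):
    """Pearson correlation computed from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))  # prints 0.985
```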
Example 2
Consider a dataset of a company’s advertising spend ($) and corresponding sales ($).
Calculate the correlation coefficient and interpret the results.

Advertising Spend (x) Sales (y)

10 25
20 45
30 65
40 70
50 85
Example 2…
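A worked solution for Example 2, sketched with the same sums-based formula for r (the layout below is a reconstruction, since the original calculation did not survive in the slides):

```latex
\begin{aligned}
n &= 5,\quad \sum x = 150,\quad \sum y = 290,\quad \sum xy = 10150,\quad
     \sum x^2 = 5500,\quad \sum y^2 = 19000 \\
r &= \frac{5(10150) - (150)(290)}
          {\sqrt{\left[5(5500) - 150^2\right]\left[5(19000) - 290^2\right]}}
   = \frac{7250}{\sqrt{(5000)(10900)}} \approx 0.982
\end{aligned}
```

Interpretation: r ≈ 0.982 indicates a very strong positive linear relationship between advertising spend and sales.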
Correlation Limitations
• Correlation does not imply causation.
• Correlation measures the strength of a relationship between two
variables but does not indicate that one variable causes the other.
• For example, ice cream sales and drowning incidents may have a high
correlation during summer months.
• However, this does not mean ice cream sales cause drowning
incidents.
• Both variables are likely influenced by a third variable: hot weather.
• Another example, children's shoe size and reading ability may be
correlated.
• However, larger shoe sizes do not cause better reading ability.
• Both variables are influenced by age.
Correlation Limitations…

• Outliers are data points that are significantly different from others in the
dataset.
• Outliers can distort the correlation coefficient, making it appear stronger or
weaker than it actually is.
• To establish causation, use additional methods such as experiments or longitudinal studies.
• To handle outliers, perform robust statistical analyses, consider data transformations, or
conduct analyses with and without outliers.
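To see how much a single outlier can move r, one can run the analysis with and without it. The sketch below reuses the advertising dataset and appends one hypothetical outlier, the point (350, 100), which is invented purely for illustration and is not part of the lecture data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


x = [100, 150, 200, 250, 300]
y = [200, 300, 400, 450, 500]

# Hypothetical outlier (assumption for illustration only): high spend, low revenue.
x_with = x + [350]
y_with = y + [100]

r_without = pearson_r(x, y)
r_with = pearson_r(x_with, y_with)
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

With this one added point, r falls from about 0.985 to about 0.05, which is why reporting both analyses is worthwhile.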
Correlation and causation
• Understand the nature of the linear relationship between the independent variable x and the
dependent variable y.
• When a hypothesis test indicates that a significant linear relationship exists between the variables
(that is, the null hypothesis has been rejected at a specific significance level α), the researcher
must consider the following five possibilities:
1. There is a direct cause-and-effect relationship between the variables, e.g. water causes plants
to grow.
2. There is a reverse cause-and-effect relationship between the variables, e.g. it may appear that
heavy coffee consumption causes nervousness, when in fact an extremely nervous person craves
coffee to calm the nerves.
3. The relationship between the variables may be caused by a third variable, e.g. the number of
drowning deaths correlates with the number of soft drinks consumed in the summer, but both
are driven by high temperature.
4. There may be a complexity of interrelationships among many variables, e.g. secondary school
grades correlate with college grades, but hours of study, motivation, IQ, etc. may also play a
part.
5. The relationship may be coincidental.
Linear regression
• In studying relationships between two variables, collect the data, then construct a
scatter plot to determine the nature of the relationship:
1. Positive linear relationship
2. Negative linear relationship
3. Curvilinear relationship
4. No discernible relationship
• Next, calculate the correlation coefficient and determine whether it is significant.
• If the correlation is significant, proceed to calculate the regression line,
i.e. the data's line of best fit.
• The purpose of the regression line is to enable the researcher to see the trend
and make predictions on the basis of the data.
• Regression analysis is of little use if the correlation is not significant,
because predictions based on such data are unreliable.
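The slides do not say how significance is checked. One common choice, assumed here, is the t test for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r^2) with n − 2 degrees of freedom. The sketch below applies it to the advertising dataset; the critical value 3.182 (two-tailed, α = 0.05, 3 degrees of freedom) is hard-coded from a standard t table.

```python
import math

x = [100, 150, 200, 250, 300]   # advertising spend
y = [200, 300, 400, 450, 500]   # sales revenue
n = len(x)

# Correlation coefficient from deviations about the means.
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
r = s_xy / math.sqrt(s_xx * s_yy)

# Test statistic for H0: no linear correlation (assumed test, see lead-in).
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, df = n - 2 = 3 (from a t table)
significant = abs(t) > T_CRITICAL
print(round(t, 2), significant)  # prints 9.82 True
```

Because |t| exceeds the critical value, the correlation is significant at the 5% level under these assumptions, so fitting a regression line is justified.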
Simple Linear Regression Model
• Simple linear regression aims to model the relationship between a
dependent variable (y) and an independent variable (x) using a
linear equation.
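• Model Equation: y = \beta_0 + \beta_1 x + \varepsilon, where \beta_0 is the intercept,
\beta_1 is the slope, and \varepsilon is the error term.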
Regression model
• A regression model describes a relation between variables where changes in some
variables may “explain” or possibly “cause” changes in other variables.
• Explanatory variables are termed the independent variables
and the variables to be explained are termed the dependent
variables.
• A regression model estimates the nature of the relationship
between the independent and dependent variables:
• Size of the relationship, i.e. the change in the dependent variable that
results from changes in the independent variables.
• Strength of the relationship.
• Statistical significance of the relationship.
Examples
• Dependent variable is retail price of gasoline in Regina – independent
variable is the price of crude oil.
• Dependent variable is employment income – independent variables might
be hours of work, education, occupation, sex, age, region, years of
experience, unionization status, etc.

• Price of a product and quantity produced or sold:

• Quantity sold affected by price. Dependent variable is quantity of
product sold – independent variable is price.
• Price affected by quantity offered for sale. Dependent variable is price –
independent variable is quantity sold.
[Figure: Crude oil price index (1997=100, left axis) and regular gasoline prices in Regina (cents per
litre, right axis), monthly, January 1981 to January 2008.]
Source: CANSIM II Database (Vectors v1576530 and v735048, respectively)
Bivariate and multivariate models
• Bivariate or simple regression model:
Education (x) → Income (y)
• Multivariate or multiple regression model:
Education (x1), Sex (x2), Experience (x3), Age (x4) → Income (y)
• Model with simultaneous relationship:
Price of wheat ↔ Quantity of wheat produced
Assumptions of the Model
• For the simple linear regression model to be valid, the following
assumptions must be satisfied:
1. Linearity: The relationship between the independent and
dependent variables is linear.
2. Independence: The residuals (errors) are independent.
3. Homoscedasticity: The residuals have constant variance at
every level of x.
4. Normality: The residuals are normally distributed.
Fitting a Simple Linear Regression Model
• Using the same dataset, we will fit a simple linear regression
model.
Advertising Spend (x) Sales Revenue (y)
100 200
150 300
200 400
250 450
300 500
• Steps:
1. Calculate the means of x and y.
2. Calculate the slope (β1).
3. Calculate the intercept (β0).
4. Formulas: \beta_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} and
\beta_0 = \bar{y} - \beta_1 \bar{x}
Fitting a Simple Linear Regression Model…
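Calculation Steps:
• Means: x̄ = 1000 / 5 = 200, ȳ = 1850 / 5 = 370
• Slope: β1 = (5 × 407500 − 1000 × 1850) / (5 × 225000 − 1000^2) = 187500 / 125000 = 1.5
• Intercept: β0 = 370 − 1.5 × 200 = 70

Regression Equation: y = 70 + 1.5x

Interpretation of Regression Coefficients
• Intercept (β0): The expected value of y when x is zero. In this case,
70: the model predicts sales revenue of 70 units with no advertising
spend. Because x = 0 lies outside the observed range (100 to 300), this
is an extrapolation, and the intercept may not be meaningful in all contexts.
• Slope (β1): The change in y for a one-unit change in x. Here, for
every additional unit of advertising spend, sales revenue
increases by 1.5 units on average.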
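As a cross-check on the fitting steps, the slope and intercept can be computed directly from the dataset. This is a minimal Python sketch; the names `fit_line` and `predict`, and the spend value 275 used in the prediction, are illustrative choices rather than part of the lecture.

```python
x = [100, 150, 200, 250, 300]   # advertising spend
y = [200, 300, 400, 450, 500]   # sales revenue


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * (sum_x / n)
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x_new


b0, b1 = fit_line(x, y)
print(b1, b0)                 # prints 1.5 70.0
print(predict(b0, b1, 275))   # prints 482.5 (275 is a hypothetical spend)
```

Predictions are most trustworthy inside the observed range of x (100 to 300 here).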
Goodness of Fit (R-squared)
• R-squared measures the proportion of variance in the
dependent variable that is predictable from the independent
variable.
• It ranges from 0 to 1:
• R^2 = 1: Perfect fit
• R^2 = 0: No fit
• Formula: R^2 = 1 - \frac{RSS}{TSS}, where RSS = \sum (y_i - \hat{y}_i)^2 and
TSS = \sum (y_i - \bar{y})^2
Goodness of Fit (R-squared)
• Calculation Steps:
1. Compute the predicted values.
2. Compute the residual sum of squares (RSS) and total sum of
squares (TSS).
3. Calculate R^2 = 1 − RSS / TSS.
Advertising Spend (x) Sales Revenue (y) Predicted Sales Revenue
100 200 70 + 1.5*100 = 220
150 300 70 + 1.5*150 = 295
200 400 70 + 1.5*200 = 370
250 450 70 + 1.5*250 = 445
300 500 70 + 1.5*300 = 520

• RSS = (−20)^2 + 5^2 + 30^2 + 5^2 + (−20)^2 = 1750
• TSS = (−170)^2 + (−70)^2 + 30^2 + 80^2 + 130^2 = 58000
• R^2 = 1 − 1750 / 58000 ≈ 0.970
The R^2 value of 0.970 indicates that approximately 97.0% of the variance in
sales revenue is explained by advertising spend.
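The three calculation steps can be sketched in Python as follows. The line is refitted from the data inside the block so that it runs on its own; the variable names are illustrative.

```python
x = [100, 150, 200, 250, 300]   # advertising spend
y = [200, 300, 400, 450, 500]   # sales revenue
n = len(x)

# Least-squares fit of the line (slope first, then intercept).
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Step 1: predicted values.
predicted = [intercept + slope * a for a in x]

# Step 2: residual and total sums of squares.
rss = sum((b - p) ** 2 for b, p in zip(y, predicted))
tss = sum((b - mean_y) ** 2 for b in y)

# Step 3: R-squared.
r_squared = 1 - rss / tss

print(predicted)             # prints [220.0, 295.0, 370.0, 445.0, 520.0]
print(rss, tss)              # prints 1750.0 58000.0
print(round(r_squared, 3))   # prints 0.97
```

For simple linear regression, R^2 equals the square of the correlation coefficient, so this value can also be checked as r^2.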
Exercise - Employee Performance and
Training Hours
• Scenario: Telekom Networks Malawi (TNM) operates a call centre staffed by 100 call
agents. TNM has invested a large sum of money in training the call agents and now
wants to analyze the impact of training hours on employee performance.
• Conduct the following analyses:
• Correlation Coefficient: To see if there is a relationship between training hours and
performance ratings.
• Regression Model: To predict performance based on training hours.
• Interpretation: Helps in designing effective training programs.
• Goodness of Fit: To measure how well the training explains performance variations.
Employee ID Training Hours (x) Performance Rating (y)
1 5 60
2 7 65
3 10 70
4 15 80
5 20 85
6 25 90
7 30 95
8 35 100
9 40 105
10 45 110
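One way to check hand calculations for this exercise is a short Python script that runs all three analyses on the table above. This is a sketch, not a model answer; the function name `simple_linear_regression` and the returned dictionary keys are illustrative choices.

```python
import math

training_hours = [5, 7, 10, 15, 20, 25, 30, 35, 40, 45]          # x
performance = [60, 65, 70, 80, 85, 90, 95, 100, 105, 110]        # y


def simple_linear_regression(xs, ys):
    """Return r, slope, intercept and R^2 for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)

    r = s_xy / math.sqrt(s_xx * s_yy)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x

    # Goodness of fit from residual and total sums of squares.
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    r_squared = 1 - rss / s_yy
    return {"r": r, "slope": slope, "intercept": intercept, "r_squared": r_squared}


result = simple_linear_regression(training_hours, performance)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Compare the printed values with your own answers, then interpret the slope as the expected change in performance rating per additional training hour.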
Summary and Key Takeaways
• We covered the correlation coefficient, simple linear regression,
interpretation of regression coefficients, and goodness of fit.
• We used a case study on the relationship between advertising
spend and sales revenue.
• Key takeaway: Regression analysis is a powerful tool for
understanding relationships between variables and making
predictions.
