
FEEG6017 lecture:
Relationship between two variables: correlation, covariance and r-squared

Markus Brede
[email protected]
Relationships between variables
• So far we have looked at ways of
characterizing the distribution of a single
variable, and testing hypotheses about the
population based on a sample.
• We're now moving on to the ways in which
two variables can be examined together.
• This comes up a lot in research!
Relationships between variables
• You might want to know:
o To what extent the change in a patient's blood
pressure is linked to the dosage level of a drug
they've been given.
o To what degree the number of plant species in an
ecosystem is related to the number of animal
species.
o Whether temperature affects the rate of a chemical
reaction.
Relationships between variables
• We assume that for each case we have at
least two real-valued variables.
• For example: both height (cm) and weight
(kg) recorded for a group of people.
• The standard way to display this is using a
dot plot or scatterplot.
Positive Relationship
Negative Relationship
No Relationship
Measuring relationships?
• We're going to need a way of measuring
whether one variable changes when another
one does.
• Another way of putting it: when we know the
value of variable A, how much information do
we have about variable B's value?
Recap of the one-variable case
• Perhaps we can borrow some ideas about
the way we characterized variation in the
single-variable case.
• With one variable, we start out by finding the
mean, which is also the expectation of the
distribution.
Sum of the squared deviations
• Then find the sum
of all the squared
deviations from the
mean.
• This gives us a
measure of the
total variation: it will
be higher for bigger
samples.
Sum of the squared deviations

SS = ∑i (Xi − X̄)²
The variance
• This is a good measure of how much
variation exists in the sample, normalized by
sample size.
• It has the nice property of being additive.
• The only problem is that the variance is
measured in units squared.
• So we take the square root to get...
The standard deviation

SD = √V[X]
The standard deviation
• With a good estimate of the population SD,
we can reason about the standard deviation
of the distribution of sample means.
• That's a number that gets smaller as the
sample sizes get bigger.
• To calculate this from the sample standard
deviation we divide through by the square
root of N, the sample size, to get...
The standard error
• This measures the precision of our
estimation of the true population mean.
• Plus or minus 1.96 standard errors from the
sample mean should capture the true
population mean 95% of the time.
• The standard error is itself the standard
deviation of the distribution of the sample
means.
Variation in one variable
• So, these four measures all describe
aspects of the variation in a single variable:
a. Sum of the squared deviations
b. Variance
c. Standard deviation
d. Standard error
• Can we adapt them for thinking about the
way in which two variables might vary
together?
Two variable example
• Consider a small sample of four records with
two variables recorded, X and Y.
• X and Y could be anything.
• Let's say X is hours spent fishing, Y is
number of fish caught.
• Values: (1,1) (4,3) (7,5) (8,7).
Two variable example
• We can see there's a positive relationship
but how should we quantify it?
• We can start by calculating the mean for
each variable.

• Mean of X = 5.
• Mean of Y = 4.
Two variable example
• In the one-variable case, the next step would
be to find the deviations from the mean and
then square them.
• In the two-variable case, we need to connect
the variables.
• We do this by multiplying each X-deviation
by its associated Y-deviation.
Calculating covariance
• -4 × -3 = 12
• -1 × -1 = 1
• 2 × 1 = 2
• 3 × 3 = 9
• Total of the cross-multiplied deviates = 24.

∑i (Xi − X̄)(Yi − Ȳ)
In Formulae
• Variance:

V[X] = E[(X − X̄)²]
V[X] = 1/(N−1) ∑i (Xi − X̄)²

• Covariance:

Cov[X, Y] = E[(X − X̄)(Y − Ȳ)]
Cov[X, Y] = 1/(N−1) ∑i (Xi − X̄)(Yi − Ȳ)

• Note Bessel's correction in the sample versions.
Calculating covariance
• Divide by N if this is the population, or divide
by N-1 if this is a sample and we're
estimating the population.
• If this were the whole population, we would get 24 / 4 = 6.
• If this is a sample and we want to estimate the true population value, we get 24 / 3 = 8.
• Assuming this is a sample, we have a
measure of 8 "fish-hours" for the estimated
covariance between X and Y.
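A short sketch reproducing this calculation (assuming NumPy; the data are the fishing example above):

import numpy as np

x = np.array([1.0, 4.0, 7.0, 8.0])    # hours spent fishing
y = np.array([1.0, 3.0, 5.0, 7.0])    # number of fish caught

cross = (x - x.mean()) * (y - y.mean())
print(cross.sum())                     # 24.0: total of the cross-multiplied deviates
print(cross.sum() / (len(x) - 1))      # 8.0: sample covariance (divide by N-1)
print(np.cov(x, y, ddof=1)[0, 1])      # the same value from NumPy's cov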
Properties of covariance
• You might remember the formula for the variance of the sum of two independent random variates. If they are correlated we instead have:

V[X + Y] = V[X] + V[Y] + 2 Cov[X, Y]

• Also, Cov[·,·] is linear:

Cov[X + Y, Z] = Cov[X, Z] + Cov[Y, Z]
Cov[aX, Y] = a Cov[X, Y]
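A quick numerical sanity check of these identities (a sketch using simulated data; not part of the original slides):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(100, 10, 100_000)
y = x + rng.normal(0, 10, 100_000)       # y is correlated with x by construction

lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y, ddof=1)[0, 1]
print(lhs, rhs)                          # the two sides agree up to rounding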
Interpreting covariance?
• Covariance has some of the properties we
want: positive, negative, and absent
relationships can be recognized.

• But "fish-hours" is difficult to interpret.

• Can we scale it in some way? Well, the standard deviation of X is in hours, and the standard deviation of Y is in fish...
The correlation coefficient
• So, if we take the covariance and divide by the two standard deviations, we obtain a dimensionless measure:

r = Cov[X, Y] / (√V[X] · √V[Y])

• This is the correlation coefficient, or more technically the Pearson product-moment correlation coefficient.
The correlation coefficient
• What magnitude will the measure have?
• You can't get anything more strongly related than something with itself (or more strongly anti-related than with minus itself).
• Recall that the covariance of X with itself is just the variance, so r(X, X) = V[X] / (√V[X] · √V[X]) = 1, which pins the extremes of the measure at +1 and -1.
The correlation coefficient
• This measure runs between -1 and 1, and represents negative, absent, and positive relationships.
• It's often referred to as "r".
• It's extremely popular as a way of measuring the strength of a linear relationship.
The correlation coefficient
• In our case, the sample standard deviations
of X and Y are 3.16 and 2.58 respectively.
• r = 8 / (3.16 * 2.58) = 0.98.
• This is a very strong positive relationship, as
we can see from the original scatter plot.
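The same figure can be checked numerically (a sketch assuming NumPy):

import numpy as np

x = np.array([1.0, 4.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

r_manual = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
print(r_manual)                      # approximately 0.98
print(np.corrcoef(x, y)[0, 1])       # same result from corrcoef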
Another example
• Invented data set where X is normally distributed, mean = 100, SD = 10.
• For each of 500 cases, Y is equal to X plus a normal variate, mean = 100, SD = 10.
• Y and X are clearly related, but there's also a substantial part of the variation in Y that has nothing to do with X.
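One way such a data set might be generated (a sketch; the lecture's own code is linked at the end):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(100, 10, 500)         # X ~ Normal(mean=100, SD=10)
y = x + rng.normal(100, 10, 500)     # Y = X plus an independent normal variate

print(np.corrcoef(x, y)[0, 1])       # roughly 0.7, as quoted on the next slide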
Calculating the correlation coefficient

• In Python, we use pylab.corrcoef(a,b) where a and b are lists (returns a matrix).
• In R, it's cor(a,b) where a and b are variable names. You can also use cor(data) to get a matrix showing the correlation of everything with everything else in the data frame.
• For the previous example, r = 0.72.
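For example, corrcoef applied to several variables at once returns the full correlation matrix, analogous to cor(data) in R (a sketch; the variables here are illustrative):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(100, 10, 500)
y = x + rng.normal(100, 10, 500)
z = rng.normal(0, 1, 500)                # an unrelated third variable

# rows are variables; the result is a 3x3 matrix of pairwise correlations
print(np.corrcoef(np.vstack([x, y, z])))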


Interpreting correlation coefficients
• 0.0 - 0.3: Weak relationship; this may be an artefact of the data set, and in fact there may be no relationship at all.
• 0.3 - 0.6: Moderate relationship; you might
be on to something, or you might not.
• 0.6 - 0.9: Strong relationship; you can be
confident that these two variables are
connected in some way.
• 0.9 - 1.0: Very strong relationship; variables
are almost measuring the same thing.
Correlations measure linear relationships only
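For example (a sketch, not from the slides), a relationship that is perfectly deterministic but nonlinear can still produce an r near zero:

import numpy as np

x = np.linspace(-3, 3, 201)
y = x ** 2                          # y is completely determined by x, but not linearly

print(np.corrcoef(x, y)[0, 1])      # essentially 0: Pearson's r misses the relationship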
Correlation is not causality
• Of course, just because X and Y are correlated does not mean that X causes Y.
• They could both be caused by some other factor Z.
• Y might cause X instead.
• And a low correlation might reflect no causal linkage at all, just sampling noise.
Range effects
• Two variables can be strongly related across the whole of their range, but with no strong relationship in a limited subset of that range.
• Consider the relationship between price and top speed in cars: broadly positive.
• But if we look only at very expensive cars, the two values may be uncorrelated.
Range effects
• Consider the X, Y scatterplot from a few slides back.
• If we limit the range of X to between 95 and 105, the correlation coefficient is only 0.27.
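A sketch of this range effect using simulated data like the earlier example (the exact restricted-range value depends on the random sample):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(100, 10, 500)
y = x + rng.normal(100, 10, 500)

mask = (x > 95) & (x < 105)                     # restrict X to a narrow band
print(np.corrcoef(x, y)[0, 1])                  # around 0.7 over the full range
print(np.corrcoef(x[mask], y[mask])[0, 1])      # much weaker within the band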
Confidence intervals
• Confidence intervals for correlation coefficients can be calculated in much the same way as for means.
• As an exercise: using the Python code for this lecture, try drawing samples of size 50 repeatedly from the X, Y distribution and look at the range of values for r you get.
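One way the exercise might look (a sketch, not the lecture's own code):

import numpy as np

rng = np.random.default_rng(4)
rs = []
for _ in range(1000):
    x = rng.normal(100, 10, 50)              # repeated samples of size 50
    y = x + rng.normal(100, 10, 50)
    rs.append(np.corrcoef(x, y)[0, 1])

rs = np.array(rs)
# the spread of r over repeated samples shows its sampling variability
print(rs.mean(), np.percentile(rs, [2.5, 97.5]))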
Permutation tests

• Another method is via permutation tests. This is a way to judge how much apparent correlation could be due to noise, especially with small sample sizes.
• Take the data (Xi, Yi) and randomly re-pair the Y values with the X values. Each shuffled data set breaks any real association, so the r values from many such permutations give you a null distribution.
• The last step is to test whether the r from your actual data is likely to have been drawn from that null distribution.
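A minimal sketch of such a permutation test (the function name and number of permutations are illustrative choices, not from the lecture):

import numpy as np

def permutation_test_r(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for Pearson's r, by re-pairing y at random."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    r_null = np.array([
        np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)
    ])
    # how often does shuffled data look at least as correlated as the real data?
    return r_obs, np.mean(np.abs(r_null) >= abs(r_obs))

x = np.array([1.0, 4.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(permutation_test_r(x, y))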
Information about Y from X
• If I know the correlation between two things,
what does knowing one thing tell me about
the value of the other?
• Consider the X, Y example. X was a random
variable, and Y was equal to X plus another
random variable from the same distribution.
• The correlation worked out at about 0.7.
Why?
R-squared
• Turns out that if we square the correlation coefficient we get a direct measure of the proportion of the variance explained.
• In our example case we know that X explains exactly 50% of the variance in Y.
• The square root of 0.5 ≈ 0.71.
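The 50% figure follows from how the example was built; a short derivation (assuming X and the noise term e are independent with equal variance):

Y = X + e, with X and e independent and V[X] = V[e] = σ²
Cov[X, Y] = Cov[X, X] + Cov[X, e] = σ²
V[Y] = V[X] + V[e] = 2σ²
r = Cov[X, Y] / (√V[X] · √V[Y]) = σ² / (σ · √2 σ) = 1/√2 ≈ 0.71
r² = 1/2, i.e. X explains half the variance in Y.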


R-squared
• r = 0.3 explains 9% of the variance.
• r = 0.6 explains 36% of the variance.
• r = 0.9 explains 81% of the variance.
• "R-squared" is a standard way of measuring
the proportion of variance we can explain in
one variable using one or more other
variables. This connects with the next lecture
on ANOVA.
Python code
• The Python code used to produce the
graphs and correlation coefficients in this
lecture is available here.