8_2_correlations+models_ninell
Linear Models
Lecture 13
Empirical Methods 2 & Theory of Science
01.11.2024 2
Today
• Associations & Scatter Plots
• Correlation (Pearson’s r)
• Linear Models & how to fit them
• Residuals
• Assumptions of Linear Models &
Steps of Hypothesis Testing
Associations
[Figure: scatter plot with axes labelled "Price" and "Amount of ice cream"]
Questions to ask:
• Who or What is Described?
• What Variables are Included?
• How are the Variables Measured?
• What Types of Variables Are There?
Pearson’s r
Definition: Pearson’s r measures the direction and strength of the linear relationship between two quantitative variables.
• Can range from -1 (perfect negative correlation) to +1 (perfect positive correlation); 0 means no linear relationship
Assumptions:
• Normality
• Linearity
• Homoscedasticity – constant scatter pattern across the range
• Independence of errors
* Correlation
Based on Means and Standard Deviations
• Correlation is calculated using the average values (means) and variability (standard
deviations) of two variables.
No Need for Independent or Dependent Variables
• Correlation measures the relationship without assuming one variable depends on the
other.
Applies Only to Continuous Variables
• Both variables need to be continuous (e.g., height, temperature) for correlation to be
meaningful.
Standardized Values, No Units
• Correlation uses standardized scores (z-scores), so it has no units of measurement: it is just a number.
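$r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} \frac{x_i - \mu_x}{s_x} \cdot \frac{y_i - \mu_y}{s_y} = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{(n-1)\, s_x s_y}$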
* Correlation Formula
$r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} \frac{x_i - \mu_x}{s_x} \cdot \frac{y_i - \mu_y}{s_y} = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{(n-1)\, s_x s_y}$
Formula of the standard deviation of group x:
$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu_x)^2}{n-1}}$
• $x_i$: one data point from group x
• $\mu_x$: the mean (average) of group x
• $\sum$: sum of all individual differences
• Square to remove the sign
• Square root to reverse the square
• “n-1” instead of “n” is the “Bessel's correction” (look it up if you’re interested, not covered here ;))
Correlation
Size (x) | Price (y) | (xi-μx) / sx | (yi-μy) / sy | Multiply
Sum of the "Multiply" column divided by n-1 = our correlation
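The table's column-by-column calculation can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own code: the function name `pearson_r` and the size/price values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)          # sample SD (uses n - 1)
    zx = [(x - mx) / sx for x in xs]       # column (xi - mean_x) / sx
    zy = [(y - my) / sy for y in ys]       # column (yi - mean_y) / sy
    products = [a * b for a, b in zip(zx, zy)]  # column "Multiply"
    return sum(products) / (n - 1)

# Hypothetical data: ice cream size (scoops) and price
sizes = [1, 2, 3, 4, 5]
prices = [2.0, 3.5, 4.0, 5.5, 6.0]

r = pearson_r(sizes, prices)
print(round(r, 3))
```

Because both variables are standardized first, the result does not change if the prices are converted to another currency or the sizes to another unit.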
*
Correlation
• Measures the strength of linear relationships only.
• Is very sensitive to sample size (estimates from small samples are unreliable).
• Is sensitive to outliers, so use caution!
• Cannot prove causality; it can only indicate the presence of a relationship.
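Two of these caveats are easy to see numerically. The sketch below uses made-up numbers (all values and the helper name `pearson_r` are assumptions for illustration): a perfectly curved relationship yields r of about 0, and a single outlier flips a perfect negative correlation into a strong positive one.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r via z-scores (sample SD, n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Caveat 1: only linear relationships are measured.
# y depends perfectly on x (y = x^2), yet r is (numerically) zero.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Caveat 2: sensitivity to outliers.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]                                   # perfectly negative
r_clean = pearson_r(xs, ys)
r_outlier = pearson_r(xs + [20], ys + [20])            # one extreme point added

print(r_curve, r_clean, r_outlier)
```

This is why a scatter plot should be inspected before a correlation coefficient is interpreted.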
* Variables
Definition of a Linear Model
• Dependent Variable (Response Variable): 𝑦
• The outcome you are trying to predict or explain, e.g. ice cream price.
• Independent Variables (Explanatory Variables): 𝑥1, 𝑥2, …
• The predictors whose changes are used to explain changes in the dependent variable, e.g. ice cream size.
Purpose of Variables
• Objective: To explain patterns in the dependent variable (𝑦) using the
independent variables (𝑥1, 𝑥2, …).
Model Representation
• Relationship: 𝑦 = model + error
• Variance Partitioning: The variance in 𝑦 can be divided into:
• Explained Variance: Variance attributed to the independent variables.
• Unexplained Variance: Variance due to random error or other factors.
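The partitioning can be checked numerically. The following is a sketch under assumptions: it fits a straight line by ordinary least squares, works with sums of squares (variance multiplied by n - 1), and uses hypothetical size/price data; `fit_line` is an illustrative helper, not something defined in the lecture.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: ice cream size (x) and price (y)
sizes = [1, 2, 3, 4, 5]
prices = [2.0, 3.5, 4.0, 5.5, 6.0]

a, b = fit_line(sizes, prices)
predicted = [a + b * x for x in sizes]
mean_y = sum(prices) / len(prices)

ss_total = sum((y - mean_y) ** 2 for y in prices)                    # all variation in y
ss_explained = sum((p - mean_y) ** 2 for p in predicted)             # captured by the model
ss_unexplained = sum((y - p) ** 2 for y, p in zip(prices, predicted))  # left in the error

r_squared = ss_explained / ss_total   # share of variance explained
print(ss_total, ss_explained + ss_unexplained, r_squared)
```

For a least-squares line with an intercept, the explained and unexplained parts add up to the total, which is what makes the ratio interpretable as a proportion.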
*
The Regression Equation
$Y = a + bX + \varepsilon$, where $a + bX$ is the linear function of $X$ and $\varepsilon$ is the random error
Linear Function of an Independent Variable
• Represents the relationship between the independent 𝑋 and the dependent 𝑌.
Random Error
• Accounts for the variability in 𝑌 that cannot be explained by the linear function.
Intercept (a):
• Represents the value of 𝑌 when 𝑋 = 0.
Slope (b):
• How much 𝑌 changes for a one-unit change in 𝑋.
• Is a ratio of change, or the steepness of the line.
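One common way to obtain the intercept and slope is ordinary least squares; the slides do not spell out the fitting method, so treat that choice as an assumption. In this sketch the helper `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    a = mean_y - b * mean_x
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical data: ice cream size (x) and price (y)
sizes = [1, 2, 3, 4, 5]
prices = [2.0, 3.5, 4.0, 5.5, 6.0]

a, b = fit_line(sizes, prices)

def predict(x):
    """Predicted Y for a given X using the fitted line."""
    return a + b * x

print("intercept:", round(a, 2), "slope:", round(b, 2))
print("prediction at X = 0 equals the intercept:", predict(0) == a)
print("change in Y per unit of X:", round(predict(4) - predict(3), 2))
```

The last two lines mirror the definitions above: the intercept is the prediction at X = 0, and the slope is the change in the prediction when X increases by one unit.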
Residuals
A residual is the difference between the observed value and the value predicted by the model. Examining the residuals helps assess how well the line describes the data.
The model can under-estimate (positive residual) or over-estimate (negative residual).
residual = observed y – predicted y
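As a small illustration of this formula, the sketch below fits a least-squares line to hypothetical size/price data and lists each residual with its sign. The helper `fit_line` and all numbers are assumptions for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical data: ice cream size (x) and price (y)
sizes = [1, 2, 3, 4, 5]
prices = [2.0, 3.5, 4.0, 5.5, 6.0]

a, b = fit_line(sizes, prices)
predicted = [a + b * x for x in sizes]

# residual = observed y - predicted y
residuals = [y - p for y, p in zip(prices, predicted)]

for x, y, p, e in zip(sizes, prices, predicted, residuals):
    verdict = "model under-estimates" if e > 0 else "model over-estimates"
    print(f"x={x}  observed={y:.1f}  predicted={p:.1f}  residual={e:+.1f}  ({verdict})")
```

With a least-squares line that includes an intercept, the residuals sum to zero (up to rounding), so over- and under-estimates balance out.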
* Residuals
A residual plot is a scatterplot of the regression residuals against the explanatory variable; it helps us assess the model assumptions. If the regression line fits the overall pattern of the data, there should be no pattern in the residuals.
A trend in dispersion can indicate that a data transformation is necessary.
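To see what a pattern in the residuals looks like, here is a minimal sketch that fits a straight line to deliberately curved, made-up data (y = x²) and prints a crude text version of the residual plot. The data and the helper `fit_line` are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Made-up curved data: a straight line is the wrong model here
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Crude text "residual plot": residuals against the explanatory variable
for x, e in zip(xs, residuals):
    bar = ("+" if e > 0 else "-") * int(abs(e))
    print(f"x={x}  residual={e:+.1f}  {bar}")
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern that signals the linearity assumption is violated.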
*
Regression Assumptions
• Additivity and linearity (same as correlation!)
• Independent errors (independent and identically distributed, the IID assumption)
• Normally distributed errors
• Predictors are uncorrelated with external variables (nothing lurking!)
• No multicollinearity (predictors are not strongly correlated with each other)
• Homoscedasticity (constant variance of the errors)
• Non-zero variance (something to explain!)
Thanks! :)