Cheatsheet Part 2
B coefficient of the group variable = vertical difference between the lines at X = 0 (group – reference)
B coefficient of the x variable = slope of the reference line
B coefficient of the interaction = difference in slopes (group – reference): start where the lines cross, move 1 unit to the right, and read off the difference between the lines
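The three coefficients above can be checked on simulated data. This is a minimal sketch, assuming a two-level group factor and made-up true values; the names x, group, y and fit are hypothetical.

```r
# Simulated data: two groups with different intercepts and slopes
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
group <- factor(rep(c("ref", "treat"), each = n / 2))  # "ref" is the reference level
# Assumed true lines: ref: y = 2 + 0.5x, treat: y = 5 + 1.5x
y <- ifelse(group == "ref", 2 + 0.5 * x, 5 + 1.5 * x) + rnorm(n, sd = 0.5)

fit <- lm(y ~ x * group)
b <- coef(fit)
b["grouptreat"]    # difference between the lines at x = 0 (treat - ref), close to 3
b["x"]             # slope of the reference line, close to 0.5
b["x:grouptreat"]  # difference in slopes (treat - ref), close to 1
```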
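Residuals should look like: overall normal, unrelated to the predicted values (check the correlation between residuals and predicted values), and with the same variance everywhere (in a scatterplot of each variable against the residuals, the points should fill a box)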
If residuals are problematic: 1) Change/reconceptualize the variables, e.g. take the log of a variable. 2) Change the model, e.g. include extra variables (the omitted-variable problem)
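A rough sketch of remedy 1, assuming simulated data in which the error is multiplicative, so that taking the log of Y evens out the spread of the residuals. All names are hypothetical, and the correlation between absolute residuals and predicted values is only an informal indicator of unequal spread, not a formal test.

```r
set.seed(8)
n <- 300
x <- runif(n, 0, 10)
y <- exp(1 + 0.3 * x + rnorm(n, sd = 0.4))  # spread of y grows with its level

fit_raw <- lm(y ~ x)        # model on the original scale
fit_log <- lm(log(y) ~ x)   # model after taking the log of Y

# Do the absolute residuals grow with the predicted values?
spread_raw <- cor(abs(residuals(fit_raw)), fitted(fit_raw))
spread_log <- cor(abs(residuals(fit_log)), fitted(fit_log))
c(raw = spread_raw, log = spread_log)
```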
Plotted against the observed Y, the residuals are negative where Y is low, positive where Y is high, and a mix of negative and positive in the middle.
So check the residuals against the predicted values of Y, not against Y itself, because residuals and observed Y are always related.
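A small sketch of why the predicted values are used, assuming a simple simulated regression with an intercept (names hypothetical). In ordinary least squares the residuals are uncorrelated with the predicted values by construction, while their correlation with the observed Y equals sqrt(1 - R^2).

```r
set.seed(2)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

cor_pred <- cor(residuals(fit), fitted(fit))  # zero up to rounding error
cor_obs  <- cor(residuals(fit), y)            # positive, equals sqrt(1 - R^2)
c(predicted = cor_pred, observed = cor_obs)
```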
Non-linearity = the line bends slightly over time. Parabolic = the line bends all the way back toward its starting level over time (quadratic).
Logarithmic = the line flattens and does not bend again. To remove non-linearity: use a squared term or the square root of Y.
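A minimal sketch of the squared-term remedy, assuming a simulated parabolic relationship and reading "squared term" as the square of the predictor (names hypothetical). The quadratic model should fit clearly better than the straight line.

```r
set.seed(3)
x <- runif(150, 0, 10)
y <- 1 + 3 * x - 0.3 * x^2 + rnorm(150)  # rises, then bends back down

fit_lin  <- lm(y ~ x)            # straight line only
fit_quad <- lm(y ~ x + I(x^2))   # adds the squared term

r2_lin  <- summary(fit_lin)$r.squared
r2_quad <- summary(fit_quad)$r.squared
c(linear = r2_lin, quadratic = r2_quad)
anova(fit_lin, fit_quad)         # F test for the added squared term
```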
Shapiro-Wilk = W ranges between 0 and 1; 1 means perfect normality. W measures how closely the sample matches a normal distribution; the p value tells you how (un)likely such a sample is if the population is normal.
Null hypothesis = the population is normally distributed. P value below 0.05 = null hypothesis rejected = not a normal distribution.
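A short sketch using shapiro.test() from base R on two simulated samples (names hypothetical). In practice the test would be run on the model residuals.

```r
set.seed(4)
normal_sample <- rnorm(200)  # drawn from a normal distribution
skewed_sample <- rexp(200)   # strongly right-skewed

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

sw_normal$statistic  # W close to 1
sw_skewed$statistic  # W clearly lower
sw_skewed$p.value    # far below 0.05, so normality is rejected
```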
Levene’s test = mainly for groups: detects whether the error variance in one group differs from that in another. Null hypothesis = equal variances.
Disadvantage: with a big sample size the test is almost always significant, even though unequal variances do not really affect the s.e. estimates in larger samples.
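A sketch of the idea behind Levene's test in base R, assuming two simulated groups (names hypothetical). It is written out by hand as a one-way ANOVA on the absolute deviations from the group means. Packaged versions may center on the median instead, so their numbers can differ slightly.

```r
set.seed(5)
g <- factor(rep(c("a", "b"), each = 100))
y <- c(rnorm(100, sd = 1), rnorm(100, sd = 3))  # group b has a larger variance

# Absolute deviation of each observation from its own group mean
abs_dev <- abs(y - ave(y, g, FUN = mean))

# One-way ANOVA on those deviations: do they differ between groups?
levene <- anova(lm(abs_dev ~ g))
levene_p <- levene[["Pr(>F)"]][1]
levene_p  # below 0.05, so equal variances are rejected
```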
Breusch-Pagan test = for linear models: tests whether the size of the residuals is associated with one or more variables. Association = no homogeneity (heteroscedasticity).
Null hypothesis = homogeneity. P value above 0.05 = null hypothesis not rejected, so equal variance in the residuals can be assumed.
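A hand-written sketch of the Breusch-Pagan idea in base R, assuming simulated data in which the error spread grows with x (names hypothetical). It uses the studentized (Koenker) form: regress the squared residuals on the predictors and compare n times R^2 to a chi-square distribution.

```r
set.seed(6)
n <- 300
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # residual spread grows with x

fit <- lm(y ~ x)
aux <- lm(residuals(fit)^2 ~ x)            # auxiliary regression on squared residuals

bp_stat <- n * summary(aux)$r.squared      # test statistic
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)  # df = number of predictors in aux
c(statistic = bp_stat, p = bp_p)           # small p: homogeneity rejected
```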
Residual = how far a point lies from the estimated line/model. Leverage = how far a point lies out on the independent variable (x), i.e. an outlier on x.
Influence = how much the slope of the line is affected by a data point (high residual + high leverage)
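A brief sketch of the three quantities with base R functions, assuming simulated data to which one extreme point is added (names hypothetical). hatvalues() gives leverage, rstandard() gives standardized residuals, and cooks.distance() is one common measure of influence.

```r
set.seed(7)
x <- c(rnorm(30), 8)                      # point 31 lies far out on x
y <- c(1 + 2 * x[1:30] + rnorm(30), -10)  # and far below the trend of the others
fit <- lm(y ~ x)

lev  <- hatvalues(fit)       # leverage: how unusual each point is on x
res  <- rstandard(fit)       # standardized residuals
cook <- cooks.distance(fit)  # influence: combines residual and leverage

which.max(lev)   # point 31
which.max(cook)  # point 31
```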
R:
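# adding residuals and predicted values (dataname and model are placeholders)
library(dplyr)    # provides the %>% pipe
library(ggplot2)
dataname$res1 <- model$residuals
dataname$pred1 <- model$fitted.values
# residuals against predicted values; alternatively use plot(model, 1)
dataname %>%
  ggplot(aes(x = pred1, y = res1)) + geom_...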