Cheatsheet Part 2

Basic lvl2 Statistics

Uploaded by

kidkapper007

Linear equations:

B coefficient of the group variable = difference between the lines at x = 0, group – reference
B coefficient of the x variable = slope of the reference line
B coefficient of the interaction = start at x = 0 (where the lines cross the y-axis), move 1 to the right, and the change in the gap between the lines (group – reference) is the interaction coefficient

Addition = both variables independently affect the dependent variable


Interaction = one variable affects the relation between the other independent variable and the dependent; with a dichotomy, take the group with the lowest expected effect as 0 (the reference)
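The coefficient rules above can be checked numerically. A minimal sketch with made-up coefficient values (b0–b3 are purely illustrative):

```python
# Toy model: y = b0 + b1*x + b2*group + b3*x*group
# (all coefficient values are made up for illustration)
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.3

def predict(x, group):
    return b0 + b1 * x + b2 * group + b3 * x * group

# b2 = difference between the group line and the reference line at x = 0
diff_at_0 = predict(0, 1) - predict(0, 0)

# b1 = slope of the reference line (group = 0)
ref_slope = predict(1, 0) - predict(0, 0)

# b3 = moving 1 to the right changes the gap between the lines by b3
gap_change = (predict(1, 1) - predict(1, 0)) - (predict(0, 1) - predict(0, 0))

print(diff_at_0, ref_slope, gap_change)
```

Running this recovers b2, b1 and b3 exactly, which is the point of the reading rules above.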

Residuals should look like: overall normality (plot residuals against predicted values) and equal variance (the scatterplot of predicted values vs. residuals should fall in an even band, a "box")

If residuals are problematic: 1) change/reconceptualize variables, e.g. take the log of a variable; 2) change the model / include extra variables (a so-called omitted variable)

Where the predicted Y is low, the residuals will be negative; where the predicted Y is high, the residuals will be positive; in the middle there will be both
negative and positive residuals. The plot is about the predicted value of Y against the residuals, not the observed Y itself, because observed Y and the
residuals always have a relationship.

Non-linearity = the line bends slightly over its range. Parabolic = the line bends all the way back to its starting level (quadratic).
Logarithmic = the line flattens but does not bend back. To remove non-linearity: use a quadratic term, or take the square root of Y
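A quick sketch of why the square-root transformation removes a parabolic (quadratic) bend, using a made-up perfectly quadratic relationship:

```python
import math

# Hypothetical parabolic relationship: y = x^2
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

# sqrt(y) restores a straight line: the differences between successive
# transformed values become constant (a constant slope = linear)
sqrt_y = [math.sqrt(y) for y in ys]
diffs = [b - a for a, b in zip(sqrt_y, sqrt_y[1:])]
print(diffs)
```

All differences come out equal, so sqrt(Y) is linear in x; the untransformed differences (3, 5, 7, 9) keep growing, which is the bend.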

If a model is good, errors will be random


Discrepancies = the differences between what you would expect to find had the distribution been normal, and what you actually see in the dataset

Shapiro-Wilk test = W: ranges between 0 and 1; W = 1 means perfect normality. W tells you how (un)likely it is that the sample comes from a normal
distribution. Null hypothesis: the population is normally distributed. If the p-value is below 0.05, the null hypothesis is rejected = not a normal distribution.

Homoscedasticity = homogeneity (of variances) = equal variance


Heteroscedasticity = heterogeneity (of variance) = unequal variance

Levene's test = mainly for groups: detects whether the error variance in one group differs from another. Null hypothesis: equal variances.
Disadvantage: in large samples even small differences become significant, while unequal variances hardly affect the s.e. estimates in larger samples anyway.
Breusch-Pagan test = for linear models: tests whether the residuals are associated with one or more variables. If there is an association = no homogeneity.
Null hypothesis: homogeneity. p-value above 0.05 = equal variance in the residuals.

Residual = extent to which a point lies away from the estimated line/model. Leverage = outlier on the independent variable (x).
Influence = extent to which the slope of the line is affected by a data point (= high residual + high leverage)

Cook's distance: bigger than 1 = problematic (the point falls outside the lines in the diagnostic plot).
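Residual, leverage and Cook's distance can be computed by hand for a simple one-predictor regression. A sketch on made-up data, where the last point is deliberately an outlier on x (high leverage):

```python
# Made-up data; the last point has high leverage (outlier on x)
xs = [1, 2, 3, 4, 10]
ys = [1.1, 1.9, 3.2, 3.9, 12.0]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)

# OLS slope and intercept
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
b0 = my - b1 * mx

# Residual = distance of each point from the estimated line
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
p = 2                                        # number of estimated coefficients
s2 = sum(e ** 2 for e in resid) / (n - p)    # residual variance

# Leverage = how extreme the point is on x
leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]

# Cook's distance combines residual and leverage; D > 1 would be flagged
cooks = [(e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
         for e, h in zip(resid, leverage)]
print(cooks)
```

The leverages sum to the number of coefficients (2 here), and the x-outlier gets by far the largest leverage, which is exactly the "outlier on the independent" idea above.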


Deterministic bivariate = Y is fully explained by one variable. Probabilistic = Y is NOT fully explained by one variable.
Calculate N = critical_value^2 * p * (1 - p) / margin_of_error^2; if you don't know p, take 0.5 (this maximizes p * (1 - p))
SE = sqrt( p * (1 - p) / sample size )
CI = estimate ± margin of error; margin of error = 2 x sd of the population, or 2 x SE of the sample

1) They give moe:

p =
moe =
nanswer = (1.96^2 * p * (1 - p)) / moe^2

2) They don't give moe:

nquestion =
p =

Don't touch this part:

se = sqrt(p * (1 - p)) / sqrt(nquestion)
lower = p - 1.96 * se
upper = p + 1.96 * se
moe = 1.96 * se   # margin of error = z * SE, i.e. half the width of the CI
nanswer = (1.96^2 * p * (1 - p)) / moe^2
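A worked example of the sample-size arithmetic above, with assumed values: p = 0.5 (the conservative choice when p is unknown) and a desired margin of error of 0.03 at 95% confidence (z = 1.96):

```python
import math

# Assumed inputs for illustration
z, p, moe = 1.96, 0.5, 0.03

# n = z^2 * p * (1 - p) / moe^2
n = (z ** 2 * p * (1 - p)) / moe ** 2
n_needed = math.ceil(n)        # round up to a whole respondent
print(n_needed)                # 1068

# The other direction: SE and CI from a given sample size
n_question = 1000
se = math.sqrt(p * (1 - p) / n_question)
lower, upper = p - z * se, p + z * se
print(round(se, 4), round(lower, 4), round(upper, 4))
```

So roughly a thousand respondents buy you a ±3-point margin, which is why national polls cluster around that size.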

Non parametric tests:


Kruskal-Wallis test: three or more independent groups -> determine whether there is a significant difference between the medians of the groups
Mann-Whitney-Wilcoxon test = Wilcoxon rank-sum test: two independent groups with ordinal or continuous data
Wilcoxon signed-rank test: two paired groups with ordinal or continuous data
Sign test: used to determine whether the median of a sample differs from a known hypothesized value
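The sign test is simple enough to compute by hand: count how many observations fall above the hypothesized median and compare against a binomial with P = 0.5. A minimal sketch on made-up data:

```python
from math import comb

# Made-up sample; H0: the median equals 5.0
data = [5.1, 4.8, 6.2, 5.9, 6.4, 5.3, 6.8, 6.1, 5.7, 6.5]
hypothesized_median = 5.0

# Drop exact ties, keep only the signs of the differences
signs = [x - hypothesized_median for x in data if x != hypothesized_median]
n = len(signs)
k = sum(1 for s in signs if s > 0)   # number of positive signs

# Two-sided exact binomial p-value under H0: P(positive sign) = 0.5
tail = min(k, n - k)
p_value = min(2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n, 1.0)
print(k, n, round(p_value, 4))
```

With 9 of 10 signs positive the exact two-sided p-value is 22/1024 ≈ 0.0215, so the hypothesized median of 5.0 would be rejected at the 0.05 level.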

R:
# adding residuals and predicted values
dataname$res1 <- model$residuals
dataname$pred1 <- model$fitted.values
# residuals vs predicted: either plot(model, 1) or
dataname %>%
  ggplot(aes(x = pred1, y = res1)) + geom_...

R packages: tidyverse, broom, modelr, car, lmtest, haven, dplyr
