AMDA Cheat Sheet Spring FINAL3

LECTURE 1: Moderation. If WHETHER X has an effect on Y depends on something else (diff. for diff. groups), it's moderation. Moderators qualify the relationship btwn X and Y. Symmetrical: Z moderates the X-Y relationship, but X also moderates the Z-Y relationship. The predictors (X and Z) are allowed to correlate w/ each other. 4 cases of moderation w/ diff. measurement lvls of the vars (NOTE: in this course, Y is always interval):

1. X and Z both interval → regression analysis. A standard regression model is a main-effects-only model. Add the product term to the regression equation and we get: Yhat = (b0 + b2·Z) + (b1 + b3·Z)·X. For every value of Z a diff. slope for X → non-parallel regression lines. Testing b3 = testing for interaction. Graphing the interaction: draw Y-X regression lines for diff. values of Z, chosen w/in your observed range → usually ppl use MeanZ, MeanZ − 1SD, and MeanZ + 1SD. Interpreting: when XZ is included in the regression, the meaning of Z on its own changes → effect of Z when X = 0 (and vice versa). If X and Z are centered, the coefficient of Z means the effect of Z on Y when X is at its mean → approx. the average effect of Z on Y. Don't interpret X and Z once you have XZ, but still include them. Centering variables: center X and Z prior to fitting the model. Why center? → solves the problem of the value 0 not being in the actual range of the data; Z = 0 is usually unnatural in context. Also, XZ is usually highly correlated w/ X and Z — this is collinearity, which is bad; centering lowers the corrs. of XZ w/ X and Z → good. STEPS IN MODERATION: 1. Compute means (for centering) and SDs (for computing separate regression lines). 2. Compute centered vars and the interaction term: subtract meanX from varX and meanZ from varZ, then multiply centered varX * centered varZ. 3. Hierarchical regression analysis w/ varY as dep. var., centered varX and varZ as predictors in block 1; add the interaction term XZ in block 2. Look at "Sig. F Change" in the SPSS output to see if the interaction adds sig. explained variance. 4. Analyze model 1 (no int. effect). Unstandardized beta weight: for a 1-unit increase in X (or Z), there's a … unit increase in Y. 5. Analyze model 2 (w/ int. effect). If the beta for the int. effect is sig., there's a sig. int. effect. Don't interpret the effects of X and Z in model 2. 6. Create regression lines for three lvls of Z → average (Z = 0), one SD below average, and one SD above average. Now you have 3 regr. equations, all w/ diff. y-intercepts and slopes.

2. Nominal X, interval Z: diffs. on Y btwn groups (X = groups) are diff. for diff. values of Z. STEPS: 1. standard ANCOVA w/ only main effects of factor and covariate. 2. add the covariate × factor interaction (include but don't interpret the main effects). 3. if sig. int., compute/draw separate regression lines for the diff. groups.

3. Interval X, nominal Z: regression slopes of Y on X are diff. in the diff. groups (Z = groups). For both case 2 and 3, use ANCOVA w/ the covariate × factor interaction.

4. Nominal X and Z: interaction in ANOVA; group diffs. on Y are diff. for diff. lvls of the moderator. Interpretation exactly as in the 1st case.

Power in moderation: power for testing interactions is low → need high N for testing moderation, more ppl than you would for a normal regression analysis.
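The centering → product term → R²-change → simple-slopes routine above can be sketched end-to-end in plain Python (simulated data; the tiny OLS solver and all numbers are illustrative assumptions, not course material):

```python
import random

def ols(X, y):
    """Least squares via the normal equations (X'X)b = X'y, Gaussian elimination."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    v = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for col in range(k):                               # forward elimination w/ pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv], v[col], v[piv] = A[piv], A[col], v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
            v[r] -= f * v[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):                     # back substitution
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

def r2(X, y, beta):
    ybar = sum(y) / len(y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return 1 - ss_res / sum((yi - ybar) ** 2 for yi in y)

random.seed(1)
n = 500
x = [random.gauss(10, 2) for _ in range(n)]   # note: 0 is far outside the observed range
z = [random.gauss(5, 1) for _ in range(n)]
y = [0.5 * xi + 0.3 * zi + 0.4 * xi * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

# Steps 1-2: center X and Z, then build the product term from the centered vars
mx, mz = sum(x) / n, sum(z) / n
xc = [xi - mx for xi in x]
zc = [zi - mz for zi in z]
xz = [a * b for a, b in zip(xc, zc)]

# Steps 3-5: block 1 = main effects only; block 2 adds XZ; check the R^2 change
X1 = [[1.0, a, b] for a, b in zip(xc, zc)]
X2 = [[1.0, a, b, c] for a, b, c in zip(xc, zc, xz)]
b1 = ols(X1, y)
b2 = ols(X2, y)
print("R^2 change when adding XZ:", round(r2(X2, y, b2) - r2(X1, y, b1), 3))

# Step 6: simple slopes of X at MeanZ and MeanZ +/- 1 SD (centered scale: 0 and +/- SDz)
sd_z = (sum(v ** 2 for v in zc) / (n - 1)) ** 0.5
for zval in (-sd_z, 0.0, sd_z):
    print("slope of X at centered Z =", round(zval, 2), "->", round(b2[1] + b2[3] * zval, 2))
```

W/ centered predictors, b2[1] is the effect of X at the mean of Z, and the three printed slopes correspond to the three regression lines of step 6.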
LECTURE 2: Mediation. Causal effect chain. Correlational data cannot say Yes to a causal model, but it can say No. Notation: bYX.Z = regression weight for prediction of Y from X w/ Z as the other predictor; bYX = Y predicted from X alone. Two cases of mediation.

Case 1: X is interval (often done in non-experimental studies) → series of regression analyses. Causal steps approach: 1. Total effect (c): is X related to Y (ignoring Z)? W/out this relationship, nothing to mediate. To estimate → regression predicting Y from X → find beta. Necessary step: technically no — direct and indirect effects may be of opposite sign and neutralize each other (inconsistent mediation). 2. Indirect effect (a): is X related to Z (ignoring Y)? If X does not cause Z, then no mediation. To estimate → regression predicting Z from X. Necessary step: yes. 3. Indirect effect (b): is Z related to Y (controlling for X)? If Z does not cause Y → no mediation. Why include X in this one? → Z and Y could be only spuriously correlated (if both Y and Z are caused by X); we want to find that there is a unique effect of Z on Y. Necessary step: yes. 4. Direct effect (c'): is X related to Y (controlling for Z)? If X is still related, there is at best partial mediation; if X is no longer related, there may be complete mediation. To estimate → regression weight predicting Y from X (w/ Z as the other predictor). Necessary step: kinda — the complete/partial distinction can only be made in theory, bc a nonsignificant b does not imply b = 0. Complete mediation: all influence of X on Y goes through the mediator Z (no direct effect of X on Y). Partial mediation: some influence of X on Y goes through the mediator Z.

3 REGRESSION ANALYSES NEEDED TO TEST: 1. predicting Y from X; 2. predicting Z from X; 3. predicting Y from X and Z (do this last one simultaneously for X and Z). c = total effect: effect of the independent var X on Y w/out considering Z. c' = direct effect: effect of X on Y accounting for Z. ab = indirect effect: the effect of X on Y that is explained by X's relationship w/ Z, so a*b. In linear regression only, c = c' + a*b. INTERPRETATION: unstandardized: 1 extra unit of X gives c extra units of Y (total effect), of which c' extra units via the direct effect and a*b extra units via the indirect effect. Standardized: the same w/ SDs instead of units. Quantifying the indirect effect: c = c' + ab can be rewritten as c − c' = ab → either the difference of coefficients (c − c') or the product of coefficients (a*b).
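A minimal sketch of the three regressions and the c = c' + a·b identity, in plain Python w/ simulated data (the path coefficients 0.6, 0.3, 0.5 are made up for illustration; the Frisch-Waugh partialling trick stands in for a multiple-regression routine):

```python
import random

def sdot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def simple_slope(x, y):
    """b in y = const + b*x (OLS)."""
    xc, yc = center(x), center(y)
    return sdot(xc, yc) / sdot(xc, xc)

def partial_slope(x, z, y):
    """Coefficient of x in y = const + b1*x + b2*z (via partialling z out of x)."""
    xc, zc, yc = center(x), center(z), center(y)
    g = sdot(xc, zc) / sdot(zc, zc)
    ex = [a - g * b for a, b in zip(xc, zc)]   # x w/ z partialled out
    return sdot(ex, yc) / sdot(ex, ex)

random.seed(7)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.6 * xi + random.gauss(0, 1) for xi in x]                           # a path: X -> Z
y = [0.3 * xi + 0.5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]    # c' and b paths

c = simple_slope(x, y)            # regression 1: total effect, Y from X alone
a = simple_slope(x, z)            # regression 2: a path, Z from X
b = partial_slope(z, x, y)        # regression 3: b path, Y from Z controlling for X
c_prime = partial_slope(x, z, y)  # regression 3: direct effect, Y from X controlling for Z

print(f"c = {c:.3f}, c' = {c_prime:.3f}, a*b = {a * b:.3f}")
print(f"c' + a*b = {c_prime + a * b:.3f}  (equals c — holds exactly in linear regression)")
```

The identity c = c' + a·b holds exactly for the OLS estimates, not just in expectation, which is why the difference (c − c') and product (a·b) definitions of the indirect effect coincide here.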
Baron & Kenny: there's a significant indirect effect only when a, b, and c are all significant — but this approach is low in power, bc it requires 3 sig. tests and applies a cutoff to each of them.

TESTING THE INDIRECT EFFECT: Method 1. Sobel test: get a sense of the magnitude of the indirect effect by converting it into a Z score: z = ab / SEab, w/ SEab = √(b²·s²a + a²·s²b + s²a·s²b). In large samples the Z value follows a std. normal dist., so if |z| > 1.96, p < .05; use a Z-dist. table to find whether the associated p-value is sig. Problems w/ Sobel: it's a large-sample test — it doesn't work well w/ small samples, where ab doesn't follow a normal dist. → higher chance of a type II error (sig. effect but you don't detect it). Method 2. Bootstrap: use when N isn't big enough to assume a std. normal dist. Sampling w/ replacement from our data frame and rerunning the analysis (5000 resamples is the std. minimum). Steps: 1. resample 5000 times (pseudo-populations); 2. compute the statistic ab* for each sample; 3. compute a confidence interval (CI): line up the ab* values and find the border values (2.5% and 97.5%). If 0 (or whatever your null-hypothesis number was) doesn't fall w/in the CI, the effect differs from 0 (sig. at the .05 lvl). Two types of bootstrap: 1. percentile bootstrap: middle 95% of scores. 2. bias-corrected bootstrap: there's bias if the mean of the resampled estimates differs from the observed ab in the mother sample; common practice nowadays (but slight increase in type I errors). Notes: bootstrap CIs are asymmetric — the lower and upper borders don't have the same distance to the observed ab — and the bootstrap doesn't assume a normal dist. If Sobel and joint significance give conflicting info, follow the bootstrap. Method 3. Joint significance test: only looks at the significance of a and b, not c: the indirect effect ab is sig. only if both a and b are sig. (ignoring c). Does not give a specific p-value for the indirect effect, as Sobel's test does. Pros: good control over type I errors and good power; less liberal than the percentile bootstrap.

MEASURING EFFECT SIZE FOR MEDIATION: how much of the total effect of X on Y is mediated through Z? 1. Proportion mediated: amount of mediation = reduction of the regression weight due to the mediator: bYX − bYX.Z, where bYX = total effect and bYX.Z = direct effect (take the total relationship and subtract what's left when you control for Z). Once you have the difference, you can divide it over the total effect to see how much is due to the direct effect and how much to the indirect. Issue: Pmed on its own is not really an ES measure, bc its meaning depends on the size of the total effect. 2. Completely standardized effect (easiest and best): this is just ab when X, Y, and Z are all standardized: abcs = βZX · βYZ.X. Interpretation: one extra SD on X gives abcs extra SDs on Y via the indirect effect. 3. Proportion of variance explained (shouldn't use): total effect (R²tot): the total proportion of Y variance explained by X is the squared zero-order correlation: R²tot = r²XY. R² = any variance in Y that is explained by any part of our model; R²tot = variance in Y explained by both X and Z; R²dir = variance in Y explained directly by X, after accounting for Z. Issue: R²med can be negative when the direct and indirect effects are of opposite signs (suppression).

MULTIPLE MEDIATION: new measure added: predict Y from all mediators: X, Z1, Z2, Z3 → Y (total indirect effect). Interpretation: does the CI include 0 or not? Still have total effect c and direct effect c'. Add as many a and b paths as there are mediators, and repeat the same tests for each mediator. Sobel's test becomes a Sobel test per specific indirect effect — the same test, repeated per variable. Want to know: does any one of these mediators produce a significant indirect effect? Parallel multiple mediator model: assumes the mediators have no causal relationship w/ each other. Serial multiple mediator model: used if the mediators have a causal relationship w/ each other.

Case 2: X is nominal. Steps: 1. ANOVA for X → Y (X = factor, Y = dependent); 2. ANOVA for X → Z (X = factor, Z = dependent); 3. ANCOVA for X, Z → Y (X = factor, Z = covariate, Y = dependent). Notes: for interpretation, it's not a 1-unit change in X — when X "increases" it's going from group 1 to group 2, e.g.; remember to look at the group means. Regression weight of the covariate → ask for parameter estimates.

LIMITATIONS OF MEDIATION: it assumes a causal order of the variables, and the effects are estimated based on it; the results can't prove that the assumed order is correct. Solution: fit a mediation model for each ordering and choose the model whose results look most plausible. Confounding: all effects should be due to the causal relations of X, Y, and Z, and not be confounded by relationships w/ other variables (C). Possible confounders (not exhaustive) in image. W/ randomization you can exclude C1 and C3 as confounders, by removing the paths from C1 and C3 to X — but randomization of X doesn't make C2 confounders disappear, and the causal direction of the Z-Y relationship may not follow from the design. Power issues: the test for the indirect effect has more power than the test for a direct or total effect of the same size. Cross-sectional data → no mediation conclusions unless there's randomization.
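The percentile bootstrap for a·b might look like this in plain Python (simulated data; 2000 resamples here instead of the recommended 5000 minimum, purely to keep the sketch fast):

```python
import random

def ab_estimate(xs, zs, ys):
    """a*b from the two regressions: Z~X (a path) and Y~X+Z (b path)."""
    def center(v):
        m = sum(v) / len(v)
        return [u - m for u in v]
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    xc, zc, yc = center(xs), center(zs), center(ys)
    a = dot(xc, zc) / dot(xc, xc)                    # a path
    g = dot(zc, xc) / dot(xc, xc)
    ez = [p - g * q for p, q in zip(zc, xc)]         # z w/ x partialled out
    b = dot(ez, yc) / dot(ez, ez)                    # b path (controls for X)
    return a * b

random.seed(3)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.4 * zi + 0.2 * xi + random.gauss(0, 1) for zi, xi in zip(z, x)]
data = list(zip(x, y, z))

boots = []
for _ in range(2000):
    sample = [random.choice(data) for _ in range(n)]   # resample w/ replacement
    xs, ys, zs = zip(*sample)
    boots.append(ab_estimate(xs, zs, ys))              # ab* for each pseudo-population
boots.sort()
lo = boots[int(0.025 * len(boots))]                    # 2.5% border value
hi = boots[int(0.975 * len(boots)) - 1]                # 97.5% border value
print(f"95% percentile CI for a*b: [{lo:.3f}, {hi:.3f}]")
```

If 0 falls outside [lo, hi], the indirect effect differs from 0 at the .05 lvl; note the interval need not be symmetric around the observed a·b.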
LECTURE 3: Predictive regression. Explanatory regression relies heavily on assumptions, and the regression weights should be unbiased. Predictive regression is focused on predicting Y based on X. The explanatory and the predictive model will not be the same best model if, e.g., you have a lot of predictors. Explanatory regression: are the beta values significantly diff. from 0 / what are their effect sizes? More theory-focused than predictive. Predictive regression: if we have new X's, what will be the predicted value of Y and how accurate is it? Don't care how important each predictor is. The predictive regression model may be sparser than the one you'd get for the same data w/ explanatory regression. If the model we created predicts the outcomes well, it's a good model, regardless of assumptions.

Training and test data: splitting the data into training and test data is cross-validation. Training phase: create a prediction model → look for the beta values for your predictors, e.g. ŷ = 2 + 0.5·X1i + 1.5·X2i. Testing phase: focuses on the accuracy of ŷ. We get new observations, e.g. X1 = 2 and X2 = 3 from participant A, and apply the prediction rule we got in the training phase to the new participant's scores. The difference btwn the prediction we make (ŷ) and the observed value for that participant is the prediction error. Note: training and testing are done on diff. data sets to get out-of-sample prediction accuracy.

EVALUATE QUALITY OF THE MODEL: mean squared prediction error (PE): PE(f̂(x0)) = E[(y0 − f̂(x0))²] — simply the diff. btwn the observed Y and the predicted Y for a participant, squared, then averaged over all participants. Prediction error in 3 parts: Bias: the diff. btwn the estimated f̂ and the true f; less bias → more variance. Can come in the form of: 1. making the model too simple (e.g. making it linear); 2. using only a few predictors rather than them all. Variance: the variability of the estimated f̂; too much variance = the model is overfitted to the training set. Irreducible term: even if we found the perfect true function, this will always be there — don't focus on it.

BIAS-VARIANCE TRADEOFF: a simple (e.g. linear) model: large bias (bc large distance btwn the estimated and true f's) but small variance (bc the models are very stable, not much diff. btwn them). Pro: generalization to another data set would not be too much off. A complex model: smaller bias, larger variance btwn fitted models. Pro: more flexibility for each individual data set. Con: might not generalize to another data set — overfitted. BALANCING BIAS AND VARIANCE: w/ a lot of included predictors → bias very low, variance very high. In statistical learning (modern statistics): accept biased parameters as long as the variance decreases more than the squared bias increases. It may not be the optimal model — more bias → more PE — but the model gets more stable (less variance) and better suited to a new data set, so we accept it. TRAINING AND TEST SETS: randomly split; training set = 50-75% of the participants. Estimate the regression model on the training set → in the test set, compare the predicted y values w/ the observed ones. W/ competing models: fit each model on the training set and evaluate each on the test set → which is better?
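A hypothetical train/test split w/ out-of-sample PE, in plain Python (all numbers simulated; the intercept-only model is the no-prediction baseline):

```python
import random

random.seed(11)
xs = [random.gauss(0, 1) for _ in range(300)]
data = [(x, 2 + 1.5 * x + random.gauss(0, 1)) for x in xs]
random.shuffle(data)
train, test = data[:200], data[200:]          # 2/3 training, 1/3 test

# training phase: fit y = b0 + b1*x on the training set only
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
b1 = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
b0 = my - b1 * mx

# testing phase: out-of-sample mean squared prediction error (PE)
pe = sum((y - (b0 + b1 * x)) ** 2 for x, y in test) / len(test)
pe_intercept_only = sum((y - my) ** 2 for _, y in test) / len(test)
print(f"PE (model) = {pe:.2f}, PE (intercept only) = {pe_intercept_only:.2f}")
```

The model's PE should approach the irreducible noise variance, while the intercept-only baseline also absorbs all the variance the predictor would have explained.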
CROSS-VALIDATION: disadvantage of a single split: inefficient — you need 2× as many data points. Improve this by recycling the training/test sets. K-FOLD CROSS-VALIDATION: use one fold as the test set and the other K−1 folds as the training set (90% of the data is for training if there are 10 folds). Repeat until the model has been tested on 100% of the data (so 10 times w/ 10 folds). To get the final PE: for each iteration, calculate the PE and average over all K folds. What to choose for K? → often we use 5 or 10, but you can also choose K = # of participants: w/ 200 participants, each fold trains the model on 199 participants, repeated 200 times. The estimated PE from K folds is variable, bc the random fold assignment differs each time. Solution: use repeated (e.g. 100×) cross-validation and calculate the mean PE over all repetitions to make the PE more stable. MODEL SELECTION: if you have a set of models: split the data into training/test, do K folds, and figure out which model is better; calculate the PE using the test set. The more models you try → the more likely one of them will have a very low PE by chance. ONLY calculate the PE for the best model. ROOT MEAN SQUARED ERROR: for ease of interpretation, calculate the RMSE, which is on the same scale as the outcome variable: if the RMSE is 1, then on average you're 1 off w/ your predictions. Evaluate the size of the RMSE by comparing it to the SD of the outcome variable: if it's lower than the SD, then the model is at least better than a model w/ no prediction at all (only an intercept), and you have some predictive power. Error for dichotomous outcomes: classification error rate (CER): predictions are btwn 0 and 1; if the absolute difference btwn the observed outcome and the prediction is larger than .5, the predicted value is wrong. Score this (0 or 1) for every participant and take the average (e.g. 0.3). The CER can be evaluated by comparing it to random guessing (.5 error rate) or to the error rate obtained when assigning all subjects to the largest class (think fake bank notes).
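The K-fold loop plus the RMSE-vs-SD check can be sketched as follows (simulated data, plain Python; fold assignment by striding is one simple option):

```python
import random

def fit(train):
    """Simple-regression fit y = b0 + b1*x."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    b1 = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return my - b1 * mx, b1

random.seed(5)
xs = [random.gauss(0, 1) for _ in range(200)]
data = [(x, 1 + 0.8 * x + random.gauss(0, 1)) for x in xs]
random.shuffle(data)

K = 10
folds = [data[i::K] for i in range(K)]           # 10 folds of 20 participants
fold_pes = []
for k in range(K):
    test = folds[k]                               # one fold = test set
    train = [row for j, f in enumerate(folds) if j != k for row in f]   # other K-1 folds
    b0, b1 = fit(train)
    fold_pes.append(sum((y - (b0 + b1 * x)) ** 2 for x, y in test) / len(test))
pe = sum(fold_pes) / K                            # final PE: average over the K folds
rmse = pe ** 0.5

ybar = sum(y for _, y in data) / len(data)
sd_y = (sum((y - ybar) ** 2 for _, y in data) / (len(data) - 1)) ** 0.5
print(f"10-fold PE = {pe:.2f}, RMSE = {rmse:.2f}, SD(y) = {sd_y:.2f}")
```

RMSE below SD(y) means the model beats the intercept-only baseline; repeating the whole loop over several reshuffles and averaging would stabilize the PE estimate.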
LECTURE 4: Predictive regression pt. 2. LASSO regression will balance btwn bias and variance; polynomial models will decrease bias but increase variance. The fewer participants you have → the more risk of overfitting. MORE FLEXIBLE MODELS (POLYNOMIAL, NONLINEAR): more hypothetical, not used in psych practice. Linear regression yields unbiased estimates of the beta coefficients — but it only gives you the best-fitting straight line, so it's still biased when the true function isn't linear. The higher the order → the more flexibly the line follows the data. We want the simplest model, bc we don't want it overfitted to our specific data set: higher order → more likely to be overfitted; lower order → more bias, smaller variance. The order of the polynomial is a tuning parameter — we tune it using cross-validation.

LESS FLEXIBLE REGRESSION MODELS: "Univariate linear regression gives unbiased estimates for a linear model. But can we get better predictions w/ biased estimates?" → Do this by making the beta weight smaller or getting rid of it. More relevant/popular than polynomial regression; used for data sets w/ many variables per participant. Shrinking the beta value: split the data into training and test sets; on the training set, fit a linear regression model, giving an estimated regression coefficient b̂. Multiply b̂ by a value smaller than 1 to shrink it by shrinkage factor s (btwn 0 and 1), and use this new value to compute predictions for the test set: ŷ = s·b̂·x. We bias the beta on purpose and see what the result is when applying the biased model to the test set. Sometimes shrinkage makes prediction better — not always, and too much shrinkage will increase the PE. See the graph for which shrinkage is best (depends on the model; more bias on the left side of the graph). WHY DOES SHRINKAGE WORK? It decreases variance, even though bias increases; if the variance decreases more than the bias increases, we're good. There's an optimal shrinkage factor for every model.

LASSO REGRESSION: the more common way to apply a shrinkage factor; use when you have many predictors. Purpose: preventing overfitting w/ complex models → a simpler model that generalizes better to new data. Same as normal regression, but w/ a penalty on the inclusion of predictors: you get shrunken beta weights, and some predictors are excluded. Including a predictor will only pay off when its predicted performance (explained variance) is higher than the penalty it brings w/ it; predictors w/ high beta values add a higher penalty. Penalty size = λ (lambda); if λ = 0 → normal regression. Deciding the λ value → use cross-validation; there is only one λ for all predictors. When λ is small: less bias, more variance. When λ is large: more bias, less variance. LASSO VARIABLE TRACE PLOT: left side: λ high; right side: λ low. Top = # of predictors still left after the penalty is applied. By decreasing λ, some variables immediately become important. You can't see which model is best from this plot. LASSO CROSS-VALIDATION PLOT: find which model is best w/ cross-validation — the lower the mean squared error, the better. Low penalty on the left side; the first dotted line will be over the lowest dot, meaning the best model (the one that generalizes best to the test data). INTERPRETATION: not the main point of predictive regression and not that reliable. Be cautious w/ interpretation, bc if two highly correlated variables are available, the model only needs one of them. Don't report lambda — it depends on the scale of the variables and isn't interpretable.
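The shrinkage idea (ŷ = s·b̂·x evaluated on a held-out test set) can be sketched as a toy simulation — w/ a small, noisy sample the best s is sample-dependent, so no particular winner is claimed here:

```python
import random

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(60)]
data = [(x, 0.5 * x + random.gauss(0, 2)) for x in xs]   # weak effect, noisy, small n
train, test = data[:30], data[30:]

# fit the slope b_hat on the training half
mx = sum(x for x, _ in train) / len(train)
my = sum(y for _, y in train) / len(train)
b_hat = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)

# apply shrunken slopes s * b_hat to the test half and track the PE for each s
pes = {}
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    b = s * b_hat
    pes[s] = sum((y - (my + b * (x - mx))) ** 2 for x, y in test) / len(test)
    print(f"s = {s:.2f} -> PE = {pes[s]:.2f}")
```

Plotting PE against s gives the shrinkage curve the lecture's graph refers to: too much shrinkage (s near 0) throws away real signal, too little keeps all the sampling noise in b̂.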
LECTURE 5: Multilvl introduction. Just a type of model you can make, a bit more complex than linear regression. Two types of hierarchical (multilvl) data: 1. (explicit) multi-stage sampling: e.g. selecting schools, then students; 2. (implicit) ppl belong to "clusters" (children in schools, or employees in firms). lvl 1 variable: lowest-lvl variable (students, or scores at diff. times in longitudinal data). lvl 2 variable: higher variable (e.g. # of children at each school, or the person [if longitudinal]). Clusters are always lvl 2. Anything that changes over time → lvl 1; doesn't? → lvl 2. CROSS-CLASSIFIED DATA: ppl belong to classification 1 (e.g. a certain school) and classification 2 (e.g. a specific neighborhood); these aren't nested, bc 2 ppl could live in the same neighborhood but attend diff. schools. lvls are units sampled at random from a wider population.

CORRELATED DATA: an assumption of linear regression (uncorrelated residuals) is violated. In longitudinal data, measurements w/in subjects are correlated; in non-longitudinal data, measures w/in clusters are correlated. The total variance is a combo of w/in- and btwn-cluster variation. W/in clusters: variance at the lower lvl (variation in scores btwn children from the same family). Btwn clusters: variance at the higher lvl (variation in avg. math score btwn families). Solution to correlated clusters: multilvl. When there is a high intraclass correlation → high dependency (you can estimate one member's score based on another member of the same cluster). Low ICC? → still use linear regression. INTRACLASS CORRELATION: the ratio of btwn-cluster variance to total variance; large btwn-cluster variance → large ICC. The ICC is the 1st test to see if there's dependency and we need multilvl.

WHY USE MULTIlvl: it distinguishes btwn- and w/in-cluster variance (composite residual), gives correct SEs and p-values for the regression coeffs (fixed effects), lets you ask a richer set of questions, and deals better w/ missing data. MULTIlvl VS REPEATED MEASURES ANOVA: limitations of rmANOVA: it only captures some of the dependency btwn observations (equivalent to a random-intercepts model); it can't handle unbalanced or missing data (subjects w/ missing data are removed) or non-normally dist. data; everyone needs to be measured at the same timepoints. WIDE VS LONG FORMAT DATA: wide = person-lvl: one row per child w/ score 1, score 2, score 3. Long = person-period: each row is a combo of person and observation (child 1, measurement 1, score; beneath it, child 1, measurement 2, score). Use long format for R. WHY CENTER A VARIABLE: you automatically know whether someone scores below or above avg. based on their score. Centering a variable changes the intercept but not the slope, bc the relationship stays the same.

What are we interested in in multilvl? 1. intra-individual differences: individual trajectories (lvl 1, w/in person); 2. inter-individual differences: the effect of covariates on individual trajectories (lvl 2, btwn persons). EXPLORATIVE ANALYSIS: plots: 1st plot: colored spaghetti plot — the darker line is the mean score and tells us the average trajectory; groups on the x-axis. 2nd plot: individual trajectories (separate regressions for each person) — visually see if most participants' trajectories are linear. Yes? → linear model. You get intercepts, slopes, and an R² value for each person; look at histograms of these. Intercept-vs-slope graph: shows whether intercepts and slopes are related (their correlation). Subject-specific intercept and slope graphs as a function of group: can see if intercepts (or slopes) tend to be higher in certain groups.
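The ICC as the ratio of btwn-cluster variance to total variance can be estimated from simulated clustered data (the variance components are made up so the true ICC is 0.5; the moment-based estimator below is one simple option, not necessarily the course's):

```python
import random

random.seed(9)
J, n_per = 40, 5                          # 40 clusters (e.g. families), 5 members each
between_sd, within_sd = 1.0, 1.0          # true components -> true ICC = 0.5
clusters = []
for _ in range(J):
    mu_j = random.gauss(0, between_sd)    # cluster-lvl deviation
    clusters.append([mu_j + random.gauss(0, within_sd) for _ in range(n_per)])

means = [sum(c) / n_per for c in clusters]
grand = sum(means) / J

# w/in-cluster variance: average of the per-cluster sample variances
var_within = sum(
    sum((x - sum(c) / n_per) ** 2 for x in c) / (n_per - 1) for c in clusters
) / J
# btwn-cluster variance: variance of cluster means, minus the w/in part they carry
var_means = sum((m - grand) ** 2 for m in means) / (J - 1)
var_between = max(var_means - var_within / n_per, 0.0)

icc = var_between / (var_between + var_within)
print(f"estimated ICC = {icc:.2f} (true value 0.5)")
```

A large estimated ICC signals dependency w/in clusters — the cue to move from ordinary linear regression to a multilvl model.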
LECTURE 6: Multilvl pt. 2. Notation: i: subscript for observations/measurements (lvl-1 units); j: subscript for subjects (lvl-2 units, clusters); J: total # of subjects; Yij: value of the outcome variable for measurement i from subject j. Variables at 2 lvls: lvl 1 (time-varying): X1ij = variable 1 for observation i in cluster j; lvl 2 (time-invariant): Z1j = variable 1 for cluster j — no i needed, bc lvl 2 doesn't vary over i.

MULTIlvl STEPS (LONGITUDINAL): what is the influence of (insert lvl 2 variable) on the slope and intercept of (insert lvl 1 variable)? The gammas (γ's) answer these questions. 1. Regression model for lvl 1 (evolution over time: intra-individual differences, w/in-subject), per subject: make a regression model for every subject j using their measurements i: Yij = b0j + b1j·X1ij + eij. Intercept b0j and slope b1j are allowed to vary btwn subjects. 2. Regression models for lvl 2 (w/ residual) for all lvl-1 regression coeffs (btwn-subjects diffs): how do the subject-specific intercepts/slopes change depending on lvl 2 variables? Each person has a diff. intercept and slope; diffs in intercepts/slopes = intercept/slope variance, which we want to explain w/ predictors. Intercept: b0j = γ00 + γ01·(lvl2var)j + u0j — the lvl 2 regression model for the intercepts; in this case the outcome (b0j) is the starting point. γ00: mean intercept (the intercept value for the avg. person from the population; a person's intercept when the lvl 2 var in question is 0, or at its mean if you centered it). γ01: effect of the lvl 2 variable on subjects' intercepts — the amount your starting point (intercept) increases if the lvl 2 predictor in question increases by 1; it has the function of a "slope", bc b0j is the outcome. For each subject there is a diff. u0j value, assumed to come from a normal dist. Slope: same idea; each subject has a diff. slope: b1j = γ10 + γ11·(lvl2var)j + u1j — the outcome in this case is the slopes themselves: a higher (or lower, depending on polarity) rate of change for a higher lvl of the lvl 2 variable. γ10: mean slope (the slope for the avg. person from the population; a person's slope when the lvl 2 var is 0, or at its mean if centered). γ11: effect of the lvl 2 variable on subjects' rate of change (slope). For each subject there is a diff. u1j value, assumed normal.

Adding predictors: adding a lvl-2 predictor (time-invariant) will decrease the unexplained lvl-2 variance in slopes/intercepts (it explains variance at lvl 2). Adding a lvl-1 predictor (time-varying) will decrease the unexplained lvl-1 variance and will change the unexplained intercept/slope variance at lvl 2. Assumptions for the residuals at lvl 1 and lvl 2: lvl 1: the residuals eij are assumed normally distributed w/ mean 0 and variance σ²e; σ²e is the same for each subject (iid = independent and identically distributed), and the eij (lvl 1 residuals) are independent of u0j and u1j (the lvl 2 residuals).

OVERVIEW AND FULL MODEL: Yij = b0j + b1j·X1ij + eij, w/ b0j = γ00 + γ01·(lvl2var)j + u0j and b1j = γ10 + γ11·(lvl2var)j + u1j. Combine and reorder, e.g.: (alcuse)ij = γ00 + γ01·(peer)j + γ10·(age14)ij + γ11·(peer)j·(age14)ij + u0j + u1j·(age14)ij + eij. FIXED EFFECTS (OVERALL TRENDS): these components show the trajectory of an avg. individual w/ a score of peerj. γ00: overall mean intercept — the avg. lvl of e.g. alcohol use when the predictors are at their reference lvl (0 or the mean). γ01·(peer)j: effect of the lvl 2 predictor "peer" on the intercept — the change in the intercept of alcohol use for a 1-unit increase in the peer variable; (peer)j = the value of the peer variable for group j. γ10: avg. change in alcohol use for a 1-unit increase in age14. γ11·(peer)j·(age14)ij: interaction effect btwn the lvl 2 predictor peer and the lvl 1 predictor age14 — γ11 is how the effect of age14 (lvl 1) on alcohol use changes depending on the value of peer (lvl 2). RANDOM EFFECTS (GROUP-SPECIFIC DEVIATIONS): at lvl 2, these represent the deviation of each subject's intercept/slope from the intercept/slope of an avg. individual w/ the same peerj score. u0j: random intercept for group j — the deviation of group j's intercept from the overall mean intercept γ00; group-specific effect on the intercept. u1j·(age14)ij: deviation of the slope of age14 for group j from the overall mean slope γ10; group-specific effect on the slope of age14. eij: residual error for individual i in group j — the individual-specific deviation from the predicted alcohol use (variability not accounted for by the fixed and random effects). σ²0 (the variance of u0j) and σ²1 (the variance of u1j) tell how large the btwn-subjects diffs are; the residuals eij at lvl 1 describe deviations of a subject's predicted scores (i.e., the subject-specific regression line) from their data. A multilvl model will not directly estimate the u0j's and u1j's (and eij) but only their variances (σ²0, σ²1, σ²e) and covariance (σ01).
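Plugging hypothetical gamma values into the combined model shows how the fixed part gives the average trajectory and the random part a subject's deviation from it (all numbers invented for illustration, not estimates from real data):

```python
# hypothetical gammas for the alcuse model, purely illustrative
g00, g01, g10, g11 = 0.3, 0.6, 0.25, 0.2

def fixed_part(peer, age14):
    """Trajectory of an average individual with this peer score (fixed effects only)."""
    return g00 + g01 * peer + g10 * age14 + g11 * peer * age14

def subject_part(peer, age14, u0, u1):
    """Add subject j's random deviations in intercept (u0) and slope (u1)."""
    return fixed_part(peer, age14) + u0 + u1 * age14

# average trajectory vs. one hypothetical subject (u0 = 0.4, u1 = -0.1), age14 = age - 14
for age14 in (0, 1, 2):
    avg = fixed_part(peer=1.0, age14=age14)
    subj = subject_part(peer=1.0, age14=age14, u0=0.4, u1=-0.1)
    print(f"age14 = {age14}: average = {avg:.2f}, subject j = {subj:.2f}")
```

The gap btwn the two printed trajectories at age14 = 0 is exactly u0, and it grows/shrinks by u1 per unit of age14 — which is why only the variances σ²0 and σ²1 (not the individual u's) need to be estimated to describe btwn-subject spread.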
variance of u1j) tell how large btwn-subjects diffs are. Residuals eij at lvl 1 describe deviations of predicted cases that don’t have total data. Default in R and spss. Advantages: correct SEs and p-values. Disadvantages:
scores for a subject (i.e., subject-specific regression line) from their data. Multilvl model will not directly wasteful. Same data, diff N. Okay under MCAR (will guarantee unbiased results), biased under MAR and partly
estimate the u0j’s and u1j’s (and eij) but only their variances (σ 2 0 , σ 2 1 and σ 2 e ) and covariance (σ01). NMAR. Pairwise deletion: only be applied in certain cases. Calculate correlation btwn each variable pair, don’t
BUILDING MULTILVL MODELS FOR CHANGE: start simple, build → complex. 1. Unconditional means use cases where there is missing data in either data pair. Advantage: uses more information than listwise deletion.
model: (alcuse)ij = b0j + eij. Outcome is lvl1 predictor value. Group level avg. b0j + individual variability eij. Disadvantages: only works under MCAR. Computational problems: negative variances, etc., so avoid. Mean
Model w/ no predictors. Know it’s not correct, but still estimate bc can use to compute intraclass corr (ICC): substitution: everywhere there's a missing value, fill in the mean of that variable. Biased under all 3
proportion of variance in outcome explained by cluster structure (btwn-cluster variation). For each person, only missingnesses. Underestimates the variance. Disturbs distributions. Multiple imputation: filling in an estimate
intercept, no slope. 2. Unconditional growth model I: assumes that the only parameter affecting the outcome for the missing data multiple times. Use statistical model appropriate for your data type, several times over
variable is time. Random intercepts only. b0j = γ00 + u0j. Outcome is intercept of subject j, combo of intercept Gives several results for same analysis, slightly diff. from each other. In pooling step, combine all results of
for avg. person + the individual j’s deviation from that avg. intercept. 3. Unconditional growth model II: Gives several results for same analysis, slightly diff. from each other. In pooling step, combine all results of
composite model. allows individual variability in growth trajectories. (alcuse)ij = γ00 + u0j + eij. eij independent analysis. Most frequently used is MICE: flexible, gives small bias, appropriate coverage, and can work under
of u0j. (intercept) output in R: so σ²₀: what is happening btwn subjects. Residual output in R: σ²ₑ: what is extreme MAR conditions. HOW DOES MULTIPLE IMPUTATION WORK? imputation under linear
happening w/in subjects. These models help establish if there’s systematic variation in outcome measure and regression: use for datasets where only 1 variable has missing values. Numeric data: use regression model to get
where it is (w/in subjects or btwn subjects). LECTURE 7: MULTILEVEL PART 3.: an estimate for the missing data on the one variable. Repeat multiple times estimating missing data (using spread
of the data to guide estimations). Predictive mean matching: starts w/ regression model. But only uses linear
γ00: 1st number always indicates what you’re predicting: 0 = subject specific intercepts, 1 = subject specific regression model to look for cases w/ similar expected values for the outcome. So you use real observed outcome
slopes. 2nd number refers to position of coeff in regression equation. 0 = intercept in regression equation, 1= dots to use for imputed values for the missing data outcome. Keeps non-linear relations intact.
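Predictive mean matching as described above can be sketched roughly as follows. This is a simplified single-predictor illustration with invented data and function names; real MI software (e.g. MICE) additionally draws the regression parameters to reflect their uncertainty:

```python
import random

def pmm_impute(x_obs, y_obs, x_mis, k=3, rng=random):
    """Simplified predictive mean matching with one predictor.
    Fit a least-squares line on the observed cases; for each missing
    case, find the k observed cases with the closest predicted value
    and impute the REAL observed y of a randomly chosen donor, so
    imputed values are always actually-observed values."""
    n = len(x_obs)
    mx, my = sum(x_obs) / n, sum(y_obs) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs)) / \
         sum((x - mx) ** 2 for x in x_obs)
    b0 = my - b1 * mx
    yhat_obs = [b0 + b1 * x for x in x_obs]          # predictions for observed cases
    imputed = []
    for xm in x_mis:
        yhat_m = b0 + b1 * xm                        # prediction for the missing case
        donors = sorted(range(n), key=lambda i: abs(yhat_obs[i] - yhat_m))[:k]
        imputed.append(y_obs[rng.choice(donors)])    # take a real observed value
    return imputed

x_obs = [1, 2, 3, 4, 5]
y_obs = [2.1, 3.9, 6.2, 8.1, 9.8]
print(pmm_impute(x_obs, y_obs, [2.5, 4.2]))  # two imputed values drawn from y_obs
```

Because donors are real cases, the imputations stay inside the observed range, which is why PMM keeps non-linear relations intact.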
slope in regression equation. γ00: overall intercept. γ01: effect of lvl2 predictor on the intercept b0j. γ10 = Overall MULTIVARIATE IMPUTATION: use when there’s >1 variable w/ missing data. Joint modeling: outdated.
slope for the level-1 predictor Xij (value of bij when lvl2 pred = 0). γ11 = Effect of the level-2 predictor Wj on the Fully conditional specification: another name for MICE. For each incomplete variable, specify model where the
slope bij. All γ’s are fixed-effects. u0 tells you that a participant can have diff intercept than avg. u1 tells you that variable w/ missing data is the outcome, and the other variables are the predictors which predict the missing data
a participant can have diff slope than avg. IMPORTANT RESIDUALS NOTE!!: u0j and u1j are the diffs btwn in the outcome variable. Do this for each variable w/ missing data. Start by filling in starting values (randomly)
the avg. intercept or slope and group j’s own intercept or slope, but eijs are the difference btwn what we predicted for the individual vs. for imputations, then iterate. Reestimate until imputed values are far enough removed from the starting values
what they actually got. MAXIMUM LIKELIHOOD: estimation method for multilevel: estimating the best (i.e. until the imputations stabilize). Once you’ve finished the iterations, you have 1 complete version of your
values for all the parameters. Robust against mild violations of normality. Produces good estimates. By: incomplete data set, have to repeat until you have some number (M) of complete versions of your incomplete
maximizing the likelihood function. What’s in ML? All gammas and σ². Subject specific estimates (u) are dataset. Now we have to combine several imputed datasets so we can analyze. Complete data estimates: point
estimated later. Uses iterative procedure. Two methods: Full ML: regression weights and variance components estimate: regression coefficients. Variance estimate: squared std error. So you have several estimates of the
are included. Restricted maximum likelihood (REML): default. Only variance components included. REML point estimates and variance estimates → need to be one. → Pooled point estimate (regression coeff): average
more correct (less biased) for estimates of variances and covariances of random effects. What to use? FML regression coeffs of all imputed datasets. Q = (coeff1 + coeff2 +coeff3)/3. Pooled variance estimate: average to
unless only testing random effects. NON-CONVERGENCE: try other starting values. Variance might be close get w/in imputation variance: U = (SE₁² + SE₂² + SE₃²)/3. Get between imputation variance: B = Log-likelihood: how well model explains observed data. Higher values (closer to 0) better fit. Deviance: lower is better.
to 0: remove it. Or model is too complex: too many random effects. Or Assumptions are not correct. ((coeff1 − avgcoeff)² + (coeff2 − avgcoeff)² + (coeff3 − avgcoeff)²)/(M − 1). Where M = # imputed datasets. Then you get LECTURE 8: NON-LONGITUDINAL DATA: no natural ordering of lvl1 units. Main interest now is in the
SIGNIFICANCE TESTING FOR FIXED EFFECTS: Wald test (Z-test): z = estimate/SE, so the total variance: T = U + (1 + 1/M)B (w/ M = 3: (1 + ⅓)B). Then special t-value and CI give you if Q is significant. You have to effect of the conditions. Note: At level1 you want to have linearity btwn predictor and the outcome (also in
γhat00/SEhat(γhat00), with z being N(0,1). Problem: only valid in large samples. APPROXIMATE T-TEST: incorporate variation of imputed values in the analysis so that it represents uncertainty about the data. longitudinal). Longitudinal data: separate regression model for each person (lvl2 unit). Non-longitudinal data:
estimate/SE, with t following t-dist. w/ degrees of freedom. LIKELIHOOD RATIO TEST (lrt): better. Advantages: correct statistical analysis, good general method. Disadvantages: more work, pooling step. NOTE: separate regression model for each ward (level 2). Start w/ unconditional means model: not realistic, only for
Reduced model: extended model, only fixed effect of interest is removed. FML should be used when models are graphs: you want red dots (imputed data) to fall into blue dots (observed) on graph. LECTURE 10 MISSING ICC: “Is most of the variance within wards or between wards?” UMM will give us straight lines of intercepts for
compared that differ in their fixed effects. How does it work?: only for comparing nested models. Hypothesis: DATA PT 2.: multiple imputation pooling: pooling is only available for t- and z-tests. Some analyses require the each group with residuals as well. u0: diff btwn avg. value for avg. ward vs avg stress level for that person’s
null: γ10 = 0, the reduced model is true. LRT follows χ²-distribution with DF = diff in # of estimated parameters testing of several parameters at a time, like: testing R2 for sign. in multiple regression, testing a large specific ward. eij is diff btwn avg. level for that person’s ward and that person’s specific level.
btwn both models. # parameters: # parameters equal # fixed effects (γ) AND # of estimated variances and multiple regression model vs. smaller model, ANOVA, ANCOVA. POOLING TECHNIQUES FOR Avg level intercept in avg ward: γ00.
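Rubin’s pooling rules above (pooled estimate Q, within-imputation variance U, between-imputation variance B, total variance T = U + (1 + 1/M)B) can be sketched as follows; the coefficient and SE values are hypothetical:

```python
def pool(coeffs, ses):
    """Rubin's rules for one parameter across M imputed data sets."""
    M = len(coeffs)
    Q = sum(coeffs) / M                                # pooled point estimate
    U = sum(se ** 2 for se in ses) / M                 # within-imputation variance
    B = sum((c - Q) ** 2 for c in coeffs) / (M - 1)    # between-imputation variance
    T = U + (1 + 1 / M) * B                            # total variance
    return Q, T

# hypothetical estimates from M = 3 imputed data sets
Q, T = pool([0.50, 0.55, 0.45], [0.10, 0.11, 0.09])
print(Q, T ** 0.5)  # pooled coefficient and its pooled standard error
```

The (1 + 1/M) factor inflates the between-imputation variance to represent the extra uncertainty from imputing a finite number of data sets.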
covariances of the random effects. RESULTS OF LRT: if probability(Chisq) is >.05, reduced model fits equally MULTIVARIATE ESTIMATES: set of parameter estimates. Each parameter estimate (e.g. each regression
well as extended model. Comparing non-nested models: no test, but indices like deviance and complexity (# coefficient) is averaged across imputed data sets. Pooling SE (extension of univariate): if you want to test
estimated parameters). DEFINING COMPLEXITY (AIC AND BIC): choose model w/ lowest AIC or BIC. multiple parameters, not 1 SE anymore, but covariance matrix of the parameter estimates. (co)variance is [figure captions: moderation; mediation (Sobel test)]
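A rough sketch of the likelihood-ratio test logic above: the statistic is the deviance difference between nested models, compared to a χ² critical value. The deviances below are invented, and critical values are hardcoded for α = .05 from a standard χ² table:

```python
def likelihood_ratio_test(deviance_reduced, deviance_extended, df_diff):
    """LRT for nested models: statistic = deviance difference, compared
    to a chi-square critical value with df = difference in # of
    estimated parameters (alpha = .05 critical values hardcoded)."""
    crit = {1: 3.841, 2: 5.991, 3: 7.815}[df_diff]
    lrt = deviance_reduced - deviance_extended
    return lrt, lrt > crit   # True -> reduced model fits significantly worse

# hypothetical deviances for a reduced vs. extended model
stat, significant = likelihood_ratio_test(612.7, 606.2, df_diff=1)
print(stat, significant)
```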
CHECKING ASSUMPTIONS FOR FINAL MODEL: 1. Check functional form (linearity) at both levels: averaged across imputed datasets. Between-imputation covariance matrix: covariance matrix of the parameter
lvl1: for each subject, a plot of Time against the outcome variable to check if trajectory of each subject evolves estimates across the M imputed data sets is computed. Total covariance matrix: from U and B one total
linearly over time. Lvl2: do a regression per subject and plot OLS estimates of intercepts and slopes against covariance matrix T is derived. This gives pooled F-value: F = (Qbar′ T⁻¹ Qbar)/p. Where p = # predictors, Qbar is
time-invariant predictors, to check for linear relation between OLS estimates of intercepts/slopes and lvl2 pooled estimate of Q. Important for multivariate: regression coeffs of each dataset are needed. Regression coeffs
predictors. 2. Check normality: inspect residuals. Residuals at level 1 (eij) and level 2 (u0j and u1j) should be have a covariance matrix from which SEs can be derived. Pooled regression coeffs are averages across M
normally distributed. Check QQ plots of raw residuals, should follow straight line. Plots of stdized residuals → imputed datasets. Pooled covariance matrix derived from M imputed datasets using specific formula. Covariance
box plot and plots of subject/cluster # against stdized residuals. 3. Check homoscedasticity: plot of raw residuals matrix and parameter estimates used for F value computing. APPLICATIONS: F-test for significance of R2 in
against predictor(s): for each value of predictor(s), residual variability should be equal. lvl1: plot estimated linear regression. Take your (e.g. 3) regression coeffs. → these are now Qbar. Then you get 4 diff covariance
residuals eij against age 14. lvl2: plot estimated u0j and u1j against coa and peer (and also gender when this matrices (SEs) for parameter estimates. Derive one total covariance matrix and its inverse (T-1). Using this
variable is included in the model). BOOTSTRAP PROCEDURE (IF DOUBTING ABOUT formula, calculate F-value. F-test for sign. of change in R2 of large regression model compared to smaller
ASSUMPTIONS): serious doubts? Don’t use multilevel model. Step: go back to std linear regression BUT dataset. Do the same thing as before, but this time only use the coefficients of the predictors that were added in [figure notes: multiple mediators; mean squared PE / root mean squared PE]
correct SEs to take the dependency in the data into account. Bootstrap used to estimate the SEs and calculate CI. dataset. Do the same thing as before, but this time only use the coefficients of the predictors that were added in
parametric bootstrap: assumes parameter of interest is normally distributed. Get estimate of SEs for Beta1, the larger model, don’t use the predictor(s) that was already there in the smaller model. Only use covariance
Beta2, etc. Compute bounds of confidence interval: Beta hat − 1.96·SE, Beta hat + 1.96·SE. Where betahat is
estimate of regression weight in original sample. If that CI doesn’t contain 0, reject the null that beta = 0. predictors you used (only in the large model) just like always. If F-value is sig → larger model sig increases
Nonparametric bootstrap: don’t have to assume normality of parameters. Don’t use 1.96. Don’t estimate SE, go prediction of outcome compared to smaller model. F-Test for analysis of variance (ANOVA): one way ANOVA.
directly to computing CI. To get CI: order all bootstrap estimates for the parameter of interest from small to large, Does outcome differ across the groups? We need to formulate the ANOVA model as a regression model. How?
to obtain CI bounds, take away 2.5% smallest and 2.5% largest values. If bootstrap percentile interval doesn’t
contain 0, reject null that beta = 0. CLUSTERED BOOTSTRAPPING (USED FOR MULTILVL): need to
change it to include dependence (normal bootstrapping assumes independence). Need to keep dependency in the eij. Where 𝛍 is constant, in anova, overall mean. aj is group effect of group j (diff. btwn mean of group j and
data intact. In clustered bootstrapping, you sample entire clusters (e.g., schools, hospitals) with replacement
rather than individual observations. All observations within a selected cluster are included in the bootstrap overall mean). Eij is error of person i in group j. Turn into regression model → Yij = β0 + Σ(j=1..p) βj·Xij + εij where
sample. Each bootstrap sample consists of clusters that may appear more than once and some clusters that may be
omitted. Maintains the hierarchical structure by treating clusters as the units of resampling. Advantages: works Xij = indicator variable of group membership of group j. P = # of groups -1. B0 is constant. Bj is group effect of
w/ time-varying covariates, unbalanced data, and if we have missing value for a subject, don't have to throw away group j: diff btwn mean of group j and constant B0. This comes pretty close to 1-way ANOVA model. 3rd group
all the data (due to long matrix data format). in dummy variable represented by two 0s. Differences btwn ANOVA and regression for multiple imputation:
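The clustered bootstrap and the nonparametric percentile interval described above might be sketched like this; the ward data and function names are invented for illustration:

```python
import random

def clustered_bootstrap(clusters, n_boot, statistic, seed=0):
    """Resample whole clusters with replacement (keeping the dependency
    within each cluster intact) and return the sorted bootstrap
    distribution of `statistic`, a function of a flat list of values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sampled = [rng.choice(clusters) for _ in clusters]   # clusters, not subjects
        flat = [v for cluster in sampled for v in cluster]
        estimates.append(statistic(flat))
    return sorted(estimates)

def percentile_ci(sorted_estimates, alpha=0.05):
    """Nonparametric percentile CI: drop the alpha/2 smallest and largest."""
    n = len(sorted_estimates)
    return sorted_estimates[int(n * alpha / 2)], sorted_estimates[int(n * (1 - alpha / 2)) - 1]

# hypothetical data: 4 wards with a few subjects each
clusters = [[5.1, 4.9, 5.3], [6.0, 6.2], [4.0, 4.4, 4.1], [5.5, 5.6]]
boot = clustered_bootstrap(clusters, 1000, lambda xs: sum(xs) / len(xs))
print(percentile_ci(boot))  # 95% percentile interval for the mean
```

If that interval doesn’t contain 0 (for a regression weight), the null that beta = 0 is rejected, exactly as in the nonparametric bootstrap above.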
F-test of constant: anova — tests if overall mean differs from 0. Regression w/ dummies — tests if mean of ref. Rather be close every time than correct on avg.: choose straight lines (more bias).
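Dummy-coding a categorical predictor as described above (p = #groups − 1 indicator columns, reference group coded as all zeros) can be sketched as:

```python
def dummy_code(groups, reference):
    """Dummy-code a categorical predictor: one 0/1 indicator per group
    except the reference category (p = #groups - 1 columns); the
    reference group is represented by all zeros."""
    levels = [g for g in sorted(set(groups)) if g != reference]
    return [[1 if g == lvl else 0 for lvl in levels] for g in groups]

groups = ["a", "a", "b", "c", "c"]
print(dummy_code(groups, reference="a"))
# each row: indicators for groups "b" and "c"; the "a" rows are [0, 0]
```

Regressing the outcome on these indicator columns is equivalent to a one-way ANOVA on the original grouping variable.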