Cheat Sheet AMDA Fall
CFA: testing a predetermined factor structure (i.e. a division of items/variables into groups).
It can happen that a structure found with PCA is not theoretically useful; this is more likely when the correlations between variables are weak.
Component: a linear combination of items. PCA finds the component that explains the largest possible variance (highest Cronbach's alpha). The first C explains the most, every next C less. The maximum number of Cs is the number of variables, but in that case there is no data reduction. Cs are independent of each other.
Guidelines item inclusion: c < .3 no, c > .3 maybe, c > .5 yes if highest, c > .8 yes for clinical use. Guidelines Cronbach's alpha: > .7 new scale, > .8 standardized scale, > .9 use in practice.
Choosing a C solution: explained variance, parallel criterion, scree criterion, eigenvalue > 1 (only with a small number of items), theory, parsimony.
Percentage explained variance of a C: eigenvalue / number of items (so the eigenvalue indicates C quality).
Parallel analysis/criterion: results of PCAs on simulated random data without any correlation structure, with the same dimensions as the original dataset. Select all components with an eigenvalue higher than what you would get with a random dataset (sketch below).
Communality: the variance of an item explained by the selected Cs; indicates how well an item works in the C structure. Low communality -> revising/removal.
Component loading: the correlation of a variable with a component / the amount of component variance explained by the variable.
Rotation: changing the angle from which we LOOK at the data; helps with interpretation and is useful in scale construction. Done after the number of Cs is chosen. The variables stay the same, but the information is distributed differently over the components (different C eigenvalues compared to the unrotated solution).
Orthogonal rotation: uncorrelated subscales (e.g. 90-degree angle between components in the plot), for data reduction; very strict.
Oblique rotation: correlated subscales, for scale construction. If the component correlations are approx. < .33, go on to orthogonal (because there is less than 10% shared variance between the Cs). The weaker the component correlations, the smaller the difference between oblique and orthogonal rotation.
Biplot properties: axes represent variables (fixed in the plot), points represent observations and give readings on the axes. The value of a point is determined by orthogonal projection onto a variable: draw a line from the observation point to a variable axis (at a 90-degree angle to the axis) and see where on the axis it ends up -> the score of that observation on that variable.
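A minimal R sketch of this PCA workflow, assuming the psych package and a data frame called items that holds the questionnaire items (both names are placeholders):

    library(psych)
    fa.parallel(items, fa = "pc")             # parallel analysis: compare eigenvalues with random data + scree plot
    pc <- principal(items, nfactors = 2, rotate = "oblimin")   # PCA with 2 components, oblique rotation
    pc$values                                 # eigenvalues (eigenvalue / number of items = proportion explained)
    pc$loadings                               # component loadings
    pc$communality                            # communality per item
    alpha(items[, 1:4])                       # Cronbach's alpha for a (hypothetical) subscale of items 1-4

Switch to rotate = "varimax" for an orthogonal solution when the component correlations are weak (approx. < .33).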
CFA goal: comparing the observed and the predicted covariance matrix. Interdependence technique: not really a predictor and an outcome (only sort of). Regression equation with a manifest response variable (the items/variables) and two latent predictors (the factors). Correlations between variables should be explained by the common factors. Many double arrows -> maybe a correlation between factors is being ignored.
PCA vs CFA: in PCA all Cs are related to some extent to all items (low factor loadings). In CFA this is not the case (no loadings for some items on some factors) -> CFA is more restricted because of fewer item coefficients. Exploratory vs confirmatory, no test vs test, theory used afterwards vs before.
Factors: theoretical constructs that are being empirically validated; components: empirically determined constructs that might be meaningful.
Number of model parts/parameters (add up these numbers): variances = number of variables; loadings = number of variables - number of factors; factor covariances = number of factors * (number of factors - 1) * 0.5; factor variances = number of factors. Item covariances: number of variables * (number of variables + 1) * 0.5. Degrees of freedom: item covariances - number of model parts. Sample size: at least 100 and more than the number of variables.
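A quick worked example of these counting rules (the numbers are made up): with 6 items and 2 factors, variances = 6, loadings = 6 - 2 = 4, factor covariances = 2*1*0.5 = 1, factor variances = 2, so 6 + 4 + 1 + 2 = 13 model parts; item covariances = 6*7*0.5 = 21; degrees of freedom = 21 - 13 = 8.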
Statistics and interpretation: chi square with df (difference between the predicted and the observed matrix), CFI > .95 (good fit), sRMR < .08 (small residuals), RMSEA < .06 with a CI that should be non-significant (small residuals), AIC as low as possible when comparing models.
CFA model evaluation steps (see the lavaan sketch below):
1. Model fit.
2. Check the residuals: as few extremes as possible and an approximately normal, symmetric distribution; we want under- and overestimations to be similar! (Example: if the chi square is significant, maybe it is because of skewedness.)
3. Chi square: should not be significant! It tests the difference between the observed and the predicted matrix, which should be as similar as possible.
4. Baseline chi square: the difference between the estimated model and a model without any factor structure, so we want it to be significant! It is not a measure of model fit itself.
5. sRMR: average standardized misfit; if it is < .08 there is not much misfit and the model is reasonable.
6. RMSEA: if not < .06, the model is not a strongly fitting model.
7. Then check with re-specification: the modification indices suggest additions, only to improve model fit.
Ideal structure: high factor loadings on only 1 factor for all variables. Loading on more than 1 factor is not bad, but should be explained.
The residual distribution should be approximately normal and centered around zero. chisq: difference between the predicted and the observed matrix (these should be as similar as possible, so ideally not significant); baseline.chisq: difference between the model and a model without any factor structure (significant is good). Modification indices suggest additions to the model (possible double loadings that would increase fit). Comparing regression coefficients in summary(model.fit): for each factor 1 variable has a fixed coefficient, and the other coefficients are relative to that first fixed one; a larger number means a higher correlation with the factor compared to the fixed one.
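A minimal lavaan sketch of this CFA workflow; the factor names (F1, F2), item names (it1-it6) and the data frame dat are placeholders:

    library(lavaan)
    model <- '
      F1 =~ it1 + it2 + it3
      F2 =~ it4 + it5 + it6
    '
    fit <- cfa(model, data = dat)
    summary(fit, fit.measures = TRUE, standardized = TRUE)    # loadings, chisq, CFI, RMSEA, SRMR
    fitMeasures(fit, c("chisq", "df", "pvalue", "baseline.chisq", "cfi", "srmr", "rmsea", "aic"))
    residuals(fit, type = "standardized")                     # residual check (step 2)
    modindices(fit)                                           # re-specification suggestions (step 7)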
Hierarchical clustering: exploration (can different groups be discerned?), confirmation (can we replicate/generalize an existing grouping/classification?), simplification (can we divide people's test scores into several score profiles?). Simpson's paradox: the overall regression line in the sample has the opposite direction to the regression lines in the subgroups. Agglomeration of clusters (HC): merging clusters together into bigger clusters; the opposite of division. Clustering finds groups of participants with high within-group and low between-group proximities.
Distances. Euclidian: shortest possible distance, preserves the original scale, not additive over variables. If the variation of the differences is large, Euclidian is more influenced by the larger differences; if it is small, the result is similar to city-block. City-block/Manhattan: if Euclidian is the diagonal of a triangle, this is the length of the other two sides summed; preserves the original scale of the variables and is additive over variables. Squared Euclidian: technically preserves the original scale but has a more complex interpretation, is additive over variables, and distinguishes well between large and very large differences. Standardized (Euclidian on standardized values): corrects for scale (differences on variables with a large dispersion weigh more than differences on variables with a small dispersion). Gower: use for mixed measurement levels. Mahalanobis: accounts for correlation between variables.
Linkages. Single: distance between clusters = distance between the most similar individuals (smallest possible distance between clusters). Easy to merge clusters, often results in chaining: adding one/a few individuals at a time instead of finding subgroups and merging those together. Leads to less homogeneity (because extreme cases will also be merged just because some individuals were very similar to the other cluster). Complete: distance between clusters = distance between the least similar individuals (largest possible distance between clusters). Conservative, as all pairs should be related before the clusters merge. Average: unweighted mean. Ward's: weighted mean (takes into account the volume of the cluster you are already in). Average and Ward's take the average distance between clusters. Single and complete: ordinal; Ward's: numerical (see the hclust sketch below).
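A minimal base-R sketch of the distance and linkage choices above; mydata is a placeholder data frame of numeric variables:

    d  <- dist(scale(mydata), method = "euclidean")   # standardized Euclidian; "manhattan" gives city-block
    hc <- hclust(d, method = "ward.D2")               # Ward's linkage; also "single", "complete", "average"
    plot(hc)                                          # dendrogram: inspect the fusion levels
    groups <- cutree(hc, k = 3)                       # cut the tree at partitioning level K = 3
    table(groups)                                     # cluster sizes
    # cluster::daisy(mydata, metric = "gower")        # Gower distance for mixed measurement levels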
Deciding on partitioning level K. GAP criterion: choose the K / fusion level for which the relative increase is maximal; calculate the % change from each fusion level to the next and choose the level where the largest or 2nd-largest change occurs (sketch below). Slope criterion: choose the K / fusion level for which the relative rate of increase is maximal; uses the silhouette: how close each point in one cluster is to the points in the neighbouring cluster. Slope > Gap.
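A small sketch of the GAP-style calculation, reusing hc, d and groups from the sketch above; exactly how the course operationalizes the % change is an assumption here:

    h <- rev(hc$height)                              # fusion heights, from the last merge downwards
    rel_increase <- (h[-length(h)] - h[-1]) / h[-1]  # relative increase between successive fusion levels
    round(100 * rel_increase[1:9], 1)                # % change for the first fusion levels; look for the largest jump
    # cluster::silhouette(groups, d)                 # silhouette widths, as used by the slope criterion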
K-means clustering: finds clusters (based on distances) that fulfil the homogeneity criterion (based on SSE, the within-group variability, and SSB, the between-group variability) by alternating the calculation of centroids with reallocations of objects. Centroid: a point in multidimensional space found by averaging the values on each variable over the individuals/objects within a cluster; indicates the location of a cluster. Seed points: any set of points chosen to initialize the cluster means. In KM, the cluster partitioning is flexible (reallocation). The algorithm starts with any random location of the seed points, computes the Euclidian distance from each object to all centroids, puts each object in the cluster of the centroid it is closest to, recalculates the centroids, and repeats until the SSE no longer changes.
Selecting K: SSE automatically diminishes (and SSB correspondingly grows) with higher K -> not useful on their own. CH-index: the ratio of between- and within-cluster variability, corrected for K; choose the K that maximizes CH(K) = (SSB/(K-1)) / (SSE/(n-K)). Adjusted Rand Index: agreement between two cluster solutions (basically Cohen's kappa, ranges from -1 to 1).
Local minimum problem: depending on the starting values used in KM, the process may converge to a local minimum instead of the global minimum (the best solution). Remedy: run several analyses with different starting values and choose the best one. Multiple starts is best; starting from the Ward clustering solution is almost always equally good. In KM clusters aren't necessarily nested (in HC they are). Crosstabs of different cluster solutions can be used to evaluate this. Nestedness is present when a newly added cluster only receives participants from one source cluster.
KM vs HC: divisive vs agglomerative; requires a value for K vs does not; solutions with different numbers of clusters K can be very different vs are similar; subjects can switch between clusters vs stay in their own cluster that gets merged; uses distances of subjects to the cluster center vs distances between subjects (see the kmeans sketch below).
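A minimal kmeans sketch matching the points above; mydata, hc and groups come from the earlier sketches, and 25 starts is an arbitrary choice:

    set.seed(1)
    km3 <- kmeans(scale(mydata), centers = 3, nstart = 25)   # multiple random starts against the local minimum problem
    km3$tot.withinss                                         # SSE: within-cluster variability
    km3$betweenss                                            # SSB: between-cluster variability
    table(km3$cluster, groups)                               # crosstab against the hierarchical solution (nestedness check)
    # mclust::adjustedRandIndex(km3$cluster, groups)         # ARI, if the mclust package is installed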
Logistic regression: a transformation of the predictor variable such that it best matches the probability of the (binary) response variable. Logistic function: p = e^x / (1 + e^x); for logistic regression the linear equation b0 + b1X is inserted in place of x, giving p = e^(b0+b1X) / (1 + e^(b0+b1X)).
Odds: the probability of something happening (p) divided by the probability of it not happening (1-p), so odds = p / (1-p). Odds ratio: how much the odds change with a one-unit increase of the predictor (e.g. going from 0 to 1); it plays the role of the slope in a regular linear function. Logarithms change multiplication into addition and division into subtraction.
LogR vs LinR: predict the probability of the outcome vs the outcome itself; binary vs non-binary outcome; both have a linearity assumption and use the linear equation in some way. Difference in model evaluation: in linear modelling the baseline model only contains a constant; in logistic regression the baseline model predicts the most frequent value for everyone.
Logit function: the logarithm of the odds, logit(p) = log(odds). The relation between a predictor and the logit is linear IF the logistic model is an accurate model of the probability. So plot the data beforehand to make sure there is an S shape in the data and the logit function is appropriate.
Link function: a function that transforms p so that it has a linear relationship with the predictor (e.g. the logit function). e^b0 = odds (when the predictor is 0), e^b1 = odds ratio (numeric illustration below). Interval predictor -> bin it; categorical predictor -> can be split into several dummy variables.
Likelihood: the chance that this particular model generated these data (calculated by multiplying the probability of the observed outcome for each participant). It is a very small number; to make it more manageable the log-likelihood is used (always negative; a good model has LL close to zero).
Fitting the model: start with an educated guess of b0 and b1, calculate the LL, adjust the model, calculate the LL again and adjust accordingly, and repeat until the LL no longer improves (maximum likelihood estimation; MLE). -2*LL follows a standard chi-square distribution that can be used to get p-values. Linearity assumption of logistic regression: the probability of success is an approximately linear function of the predictor.
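A small numeric illustration of the odds/odds-ratio/logit relations above (the coefficient values are made up):

    b0 <- -1.5; b1 <- 0.8; x <- 2                    # made-up intercept, slope and predictor value
    p <- exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))   # logistic function: predicted probability (about .52)
    odds  <- p / (1 - p)                             # equals exp(b0 + b1 * x)
    logit <- log(odds)                               # equals b0 + b1 * x = 0.1
    exp(b0)                                          # odds when the predictor is 0
    exp(b1)                                          # odds ratio: factor by which the odds change per unit increase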
Multicollinearity indicates correlation between predictors and makes the parameter estimates unstable. Indicators: VIF > 1 (possible bias), VIF > 10 a problem.
There is no good equivalent of R2 from linear regression in logistic regression, because logistic regression does not model variance.
Hosmer & Lemeshow's test: tests whether the data deviate significantly from the model; should not be significant! Exp(B) for the variable in question = odds ratio. Likelihood ratio test: X2 = deviance(baseline) - deviance(new) = 2LL(new) - 2LL(baseline) = 2*log(likelihood_new / likelihood_baseline). The LRT can also compare against another, nested model instead of the baseline.
Deviance: -2 * log-likelihood -> indicates how much the model is unlike the perfect model; the bigger the deviance, the worse the model. Not useful for calculating an effect size, as it gets larger when there is more data. The casewise list table can be used to detect outliers/influential points; the SResid column indicates the change in model deviance if that case were removed. Indicators of influential data points: Cook's distance > 1, leverage > 2 or 3 times (k+1)/N, DFBeta > 1.
Advantages of LR over a crosstab: it can handle numerical variables, can perform covariate corrections and therefore describe unique effects, can compare the effects of predictors (a crosstab describes the overall association only), and can actually predict individual probabilities (glm sketch below).
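A minimal glm sketch for the checks above; the data frame d with binary outcome y (0/1) and predictors x1, x2 is a placeholder:

    fit0 <- glm(y ~ 1,       data = d, family = binomial)   # baseline model (constant only)
    fit1 <- glm(y ~ x1 + x2, data = d, family = binomial)
    summary(fit1)                                            # coefficients on the logit scale
    exp(coef(fit1))                                          # odds ratios (Exp(B))
    anova(fit0, fit1, test = "Chisq")                        # likelihood ratio test via the change in deviance
    deviance(fit1); logLik(fit1)                             # deviance = -2 * log-likelihood for 0/1 data
    # car::vif(fit1)                                         # multicollinearity check, if car is installed
    # cooks.distance(fit1); hatvalues(fit1); dfbeta(fit1)    # influence diagnostics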
IRT: provides values of ability for persons and values of difficulty for items by fitting a logistic model to a dataset of zeroes and ones, with 2 predictors (ability and difficulty). The probability of answering an item correctly increases with more ability; the proportion of people who answer an item correctly decreases with more difficulty. The response of subject s to item i is determined by ability (theta, a characteristic of the subject: the higher the ability, the more likely a correct answer) and difficulty (beta, a characteristic of the item: the higher the difficulty, the less likely a correct answer).
1PL: only one item characteristic (the difficulty level beta). The difficulty of an item is the ability level at which there is a 50% chance of a correct answer; it influences the location of the ICC. In the 1PL calculation, alpha is kept constant.
2PL: two item characteristics, difficulty (beta) and discrimination (alpha). Discrimination: the extent to which the item can distinguish between subjects with a low and a high trait level / the extent to which the item is representative of the underlying construct; it influences the slope of the ICC. Discrimination is highest at the item's difficulty level (so at the 50% chance of a correct answer). In the 1PL and 2PL, item difficulty and discrimination do not depend on a reference group.
Precision of measurement depends on the ability of the measured subject: a test/item says most about a subject with an ability level close to the test's/item's difficulty level. The higher the item discrimination, the higher the maximum information. Item information is highest at the difficulty level of the item (since the discrimination is also highest there). Test information is the sum of all item informations. The SE in IRT depends on the test information: the higher the test information for a specific ability level, the more precise the measurement and the narrower the CI.
In the 1PL there is no difference in discrimination -> everyone with the same total score has the same ability level; the relation between test score and ability is nonlinear and s-shaped. In the 2PL the discrimination of the items answered correctly determines the ability: depending on which items are answered correctly there are multiple ways to get the same ability level, and more items correct does not automatically mean a higher ability level.
Assumptions of IRT: local independence (controlling for trait level, no relations between items) and unidimensionality (only 1 factor underlying the responses). If LI is good, then the dimensionality is too. Real-world data are never strictly unidimensional; IRT is robust to minor violations of the assumptions, especially if the dimensions are highly correlated (IRT sketch below).
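A minimal IRT sketch, assuming the ltm package and a 0/1 response matrix resp (both are placeholders):

    library(ltm)
    fit1pl <- rasch(resp)                  # 1PL: a difficulty per item, one common discrimination
    fit2pl <- ltm(resp ~ z1)               # 2PL: difficulty and discrimination per item
    coef(fit2pl)                           # item difficulties (Dffclt) and discriminations (Dscrmn)
    plot(fit2pl, type = "ICC")             # item characteristic curves
    plot(fit2pl, type = "IIC")             # item information curves
    plot(fit2pl, type = "IIC", items = 0)  # test information curve
    factor.scores(fit2pl)                  # ability estimates per response pattern
    anova(fit1pl, fit2pl)                  # LRT: does the 2PL improve on the 1PL?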
Meta-analysis: a statistical technique that computes summarizing statistics from individual study statistics. Goals: estimate the true population effect, estimate variability, explain variability (moderation). Additional benefits compared to