Cheat Sheet AMDA Fall
CFA: testing a predetermined factor structure (i.e. a division of items/variables into groups).
It can happen that a structure found with PCA is not theoretically useful; this is more likely when the correlations between variables are weak.
Component: a linear combination of items. PCA finds the component that explains the largest possible variance (highest Cronbach's alpha). The first C explains the most, every next C less. The maximum number of Cs is the number of variables, but in that case there is no data reduction. Cs are independent of each other.
Guidelines item inclusion: c < .3 no, c > .3 maybe, c > .5 yes if highest, c > .8 yes for clinical use. Guidelines Cronbach's alpha: > .7 new scale, > .8 standardized scale, > .9 use in practice.
Choosing a C solution: explained variance, parallel criterion, scree criterion, eigenvalue > 1 (only with a small number of items), theory, parsimony.
Percentage explained variance of a C: eigenvalue / number of items (so the eigenvalue indicates C quality).
Parallel analysis/criterion: results of PCAs on simulated random data without any correlation structure, with the same dimensions as the original dataset. Select all components with an eigenvalue higher than what you would get with a random dataset (sketch below).
Communality: the variance of an item explained by the selected Cs; indicates how well an item works in the C structure. Low communality -> revising/removal.
Component loading: the correlation of a variable with a component / the amount of component variance explained by the variable.
Rotation: changing the angle from which we LOOK at the data; helps with interpretation and is useful in scale construction. Done after the number of Cs is chosen. The variables stay the same, but the information is distributed differently over the components (different C eigenvalues compared to the unrotated solution).
Orthogonal rotation: uncorrelated subscales (e.g. 90-degree angle between components in the plot), for data reduction; very strict.
Oblique rotation: correlated subscales, for scale construction. If the component correlations are approx. < .33, go on to orthogonal (because there is less than 10% shared variance between the Cs). The weaker the component correlations, the smaller the difference between oblique and orthogonal rotation.
Biplot properties: axes represent variables (fixed in the plot), points represent observations and give readings on the axes. The value of a point is determined by orthogonal projection onto a variable: draw a line from the observation point to a variable axis (at a 90-degree angle to the axis) and see where on the axis it ends up -> the score of that observation on that variable.
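A minimal R sketch of this PCA workflow, assuming the psych package and a data frame called items that holds the questionnaire items (both names are placeholders):

    library(psych)
    fa.parallel(items, fa = "pc")             # parallel analysis: compare eigenvalues with random data + scree plot
    pc <- principal(items, nfactors = 2, rotate = "oblimin")   # PCA with 2 components, oblique rotation
    pc$values                                 # eigenvalues (eigenvalue / number of items = proportion explained)
    pc$loadings                               # component loadings
    pc$communality                            # communality per item
    alpha(items[, 1:4])                       # Cronbach's alpha for a (hypothetical) subscale of items 1-4

Switch to rotate = "varimax" for an orthogonal solution when the component correlations are weak (approx. < .33).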
CFA goal: comparing the observed and the predicted covariance matrix. Interdependence technique: not really a predictor and an outcome (only sort of). Regression equation with a manifest response variable (the items/variables) and two latent predictors (the factors). Correlations between variables should be explained by the common factors. Many double arrows -> maybe a correlation between factors is being ignored.
PCA vs CFA: in PCA all Cs are related to some extent to all items (low factor loadings). In CFA this is not the case (no loadings for some items on some factors) -> CFA is more restricted because of fewer item coefficients. Exploratory vs confirmatory, no test vs test, theory used afterwards vs before.
Factors: theoretical constructs that are being empirically validated; components: empirically determined constructs that might be meaningful.
Number of model parts/parameters (add up these numbers): variances = number of variables; loadings = number of variables - number of factors; factor covariances = number of factors * (number of factors - 1) * 0.5; factor variances = number of factors. Item covariances: number of variables * (number of variables + 1) * 0.5. Degrees of freedom: item covariances - number of model parts. Sample size: at least 100 and more than the number of variables.
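A quick worked example of these counting rules (the numbers are made up): with 6 items and 2 factors, variances = 6, loadings = 6 - 2 = 4, factor covariances = 2*1*0.5 = 1, factor variances = 2, so 6 + 4 + 1 + 2 = 13 model parts; item covariances = 6*7*0.5 = 21; degrees of freedom = 21 - 13 = 8.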
Statistics and interpretation: chi square with df (difference between the predicted and the observed matrix), CFI > .95 (good fit), sRMR < .08 (small residuals), RMSEA < .06 with a CI that should be non-significant (small residuals), AIC as low as possible when comparing models.
CFA model evaluation steps (see the lavaan sketch below):
1. Model fit.
2. Check the residuals: as few extremes as possible and an approximately normal, symmetric distribution; we want under- and overestimations to be similar! (Example: if the chi square is significant, maybe it is because of skewedness.)
3. Chi square: should not be significant! It tests the difference between the observed and the predicted matrix, which should be as similar as possible.
4. Baseline chi square: the difference between the estimated model and a model without any factor structure, so we want it to be significant! It is not a measure of model fit itself.
5. sRMR: average standardized misfit; if it is < .08 there is not much misfit and the model is reasonable.
6. RMSEA: if not < .06, the model is not a strongly fitting model.
7. Then check with re-specification: the modification indices suggest additions, only to improve model fit.
Ideal structure: high factor loadings on only 1 factor for all variables. Loading on more than 1 factor is not bad, but should be explained.
The residual distribution should be approximately normal and centered around zero. chisq: difference between the predicted and the observed matrix (these should be as similar as possible, so ideally not significant); baseline.chisq: difference between the model and a model without any factor structure (significant is good). Modification indices suggest additions to the model (possible double loadings that would increase fit). Comparing regression coefficients in summary(model.fit): for each factor 1 variable has a fixed coefficient, and the other coefficients are relative to that first fixed one; a larger number means a higher correlation with the factor compared to the fixed one.
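A minimal lavaan sketch of this CFA workflow; the factor names (F1, F2), item names (it1-it6) and the data frame dat are placeholders:

    library(lavaan)
    model <- '
      F1 =~ it1 + it2 + it3
      F2 =~ it4 + it5 + it6
    '
    fit <- cfa(model, data = dat)
    summary(fit, fit.measures = TRUE, standardized = TRUE)    # loadings, chisq, CFI, RMSEA, SRMR
    fitMeasures(fit, c("chisq", "df", "pvalue", "baseline.chisq", "cfi", "srmr", "rmsea", "aic"))
    residuals(fit, type = "standardized")                     # residual check (step 2)
    modindices(fit)                                           # re-specification suggestions (step 7)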
Hierarchical clustering: exploration (can different groups be discerned?), confirmation (can we replicate/generalize an existing grouping/classification?), simplification (can we divide people's test scores into several score profiles?). Simpson's paradox: the overall regression line in the sample has the opposite direction to the regression lines in the subgroups. Agglomeration of clusters (HC): merging clusters together into bigger clusters; the opposite of division. Clustering finds groups of participants with high within-group and low between-group proximities.
Distances. Euclidian: shortest possible distance, preserves the original scale, not additive over variables. If the variation of the differences is large, Euclidian is more influenced by the larger differences; if it is small, the result is similar to city-block. City-block/Manhattan: if Euclidian is the diagonal of a triangle, this is the length of the other two sides summed; preserves the original scale of the variables and is additive over variables. Squared Euclidian: technically preserves the original scale but has a more complex interpretation, is additive over variables, and distinguishes well between large and very large differences. Standardized (Euclidian on standardized values): corrects for scale (differences on variables with a large dispersion weigh more than differences on variables with a small dispersion). Gower: use for mixed measurement levels. Mahalanobis: accounts for correlation between variables.
Linkages. Single: distance between clusters = distance between the most similar individuals (smallest possible distance between clusters). Easy to merge clusters, often results in chaining: adding one/a few individuals at a time instead of finding subgroups and merging those together. Leads to less homogeneity (because extreme cases will also be merged just because some individuals were very similar to the other cluster). Complete: distance between clusters = distance between the least similar individuals (largest possible distance between clusters). Conservative, as all pairs should be related before the clusters merge. Average: unweighted mean. Ward's: weighted mean (takes into account the volume of the cluster you are already in). Average and Ward's take the average distance between clusters. Single and complete: ordinal; Ward's: numerical (see the hclust sketch below).
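A minimal base-R sketch of the distance and linkage choices above; mydata is a placeholder data frame of numeric variables:

    d  <- dist(scale(mydata), method = "euclidean")   # standardized Euclidian; "manhattan" gives city-block
    hc <- hclust(d, method = "ward.D2")               # Ward's linkage; also "single", "complete", "average"
    plot(hc)                                          # dendrogram: inspect the fusion levels
    groups <- cutree(hc, k = 3)                       # cut the tree at partitioning level K = 3
    table(groups)                                     # cluster sizes
    # cluster::daisy(mydata, metric = "gower")        # Gower distance for mixed measurement levels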
Deciding on partitioning level K. GAP criterion: choose the K / fusion level for which the relative increase is maximal; calculate the % change from each fusion level to the next and choose the level where the largest or 2nd-largest change occurs (sketch below). Slope criterion: choose the K / fusion level for which the relative rate of increase is maximal; uses the silhouette: how close each point in one cluster is to the points in the neighbouring cluster. Slope > Gap.
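A small sketch of the GAP-style calculation, reusing hc, d and groups from the sketch above; exactly how the course operationalizes the % change is an assumption here:

    h <- rev(hc$height)                              # fusion heights, from the last merge downwards
    rel_increase <- (h[-length(h)] - h[-1]) / h[-1]  # relative increase between successive fusion levels
    round(100 * rel_increase[1:9], 1)                # % change for the first fusion levels; look for the largest jump
    # cluster::silhouette(groups, d)                 # silhouette widths, as used by the slope criterion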
K-means clustering: finds clusters (based on distances) that fulfil the homogeneity criterion (based on SSE, the within-group variability, and SSB, the between-group variability) by alternating the calculation of centroids with reallocations of objects. Centroid: a point in multidimensional space found by averaging the values on each variable over the individuals/objects within a cluster; indicates the location of a cluster. Seed points: any set of points chosen to initialize the cluster means. In KM, the cluster partitioning is flexible (reallocation). The algorithm starts with any random location of the seed points, computes the Euclidian distance from each object to all centroids, puts each object in the cluster of the centroid it is closest to, recalculates the centroids, and repeats until the SSE no longer changes.
Selecting K: SSE automatically diminishes (and SSB correspondingly grows) with higher K -> not useful on their own. CH-index: the ratio of between- and within-cluster variability, corrected for K; choose the K that maximizes CH(K) = (SSB/(K-1)) / (SSE/(n-K)). Adjusted Rand Index: agreement between two cluster solutions (basically Cohen's kappa, ranges from -1 to 1).
Local minimum problem: depending on the starting values used in KM, the process may converge to a local minimum instead of the global minimum (the best solution). Remedy: run several analyses with different starting values and choose the best one. Multiple starts is best; starting from the Ward clustering solution is almost always equally good. In KM clusters aren't necessarily nested (in HC they are). Crosstabs of different cluster solutions can be used to evaluate this. Nestedness is present when a newly added cluster only receives participants from one source cluster.
KM vs HC: divisive vs agglomerative; requires a value for K vs does not; solutions with different numbers of clusters K can be very different vs are similar; subjects can switch between clusters vs stay in their own cluster that gets merged; uses distances of subjects to the cluster center vs distances between subjects (see the kmeans sketch below).
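A minimal kmeans sketch matching the points above; mydata, hc and groups come from the earlier sketches, and 25 starts is an arbitrary choice:

    set.seed(1)
    km3 <- kmeans(scale(mydata), centers = 3, nstart = 25)   # multiple random starts against the local minimum problem
    km3$tot.withinss                                         # SSE: within-cluster variability
    km3$betweenss                                            # SSB: between-cluster variability
    table(km3$cluster, groups)                               # crosstab against the hierarchical solution (nestedness check)
    # mclust::adjustedRandIndex(km3$cluster, groups)         # ARI, if the mclust package is installed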
Logistic regression: a transformation of the predictor variable such that it best matches the probability of the (binary) response variable. Logistic function: p = e^x / (1 + e^x); for logistic regression the linear equation b0 + b1X is inserted in place of x, giving p = e^(b0+b1X) / (1 + e^(b0+b1X)).
Odds: the probability of something happening (p) divided by the probability of it not happening (1-p), so odds = p / (1-p). Odds ratio: how much the odds change with a one-unit increase of the predictor (e.g. going from 0 to 1); it plays the role of the slope in a regular linear function. Logarithms change multiplication into addition and division into subtraction.
LogR vs LinR: predict the probability of the outcome vs the outcome itself; binary vs non-binary outcome; both have a linearity assumption and use the linear equation in some way. Difference in model evaluation: in linear modelling the baseline model only contains a constant; in logistic regression the baseline model predicts the most frequent value for everyone.
Logit function: the logarithm of the odds, logit(p) = log(odds). The relation between a predictor and the logit is linear IF the logistic model is an accurate model of the probability. So plot the data beforehand to make sure there is an S shape in the data and the logit function is appropriate.
Link function: a function that transforms p so that it has a linear relationship with the predictor (e.g. the logit function). e^b0 = odds (when the predictor is 0), e^b1 = odds ratio (numeric illustration below). Interval predictor -> bin it; categorical predictor -> can be split into several dummy variables.
Likelihood: the chance that this particular model generated these data (calculated by multiplying the probability of the observed outcome for each participant). It is a very small number; to make it more manageable the log-likelihood is used (always negative; a good model has LL close to zero).
Fitting the model: start with an educated guess of b0 and b1, calculate the LL, adjust the model, calculate the LL again and adjust accordingly, and repeat until the LL no longer improves (maximum likelihood estimation; MLE). -2*LL follows a standard chi-square distribution that can be used to get p-values. Linearity assumption of logistic regression: the probability of success is an approximately linear function of the predictor.
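A small numeric illustration of the odds/odds-ratio/logit relations above (the coefficient values are made up):

    b0 <- -1.5; b1 <- 0.8; x <- 2                    # made-up intercept, slope and predictor value
    p <- exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))   # logistic function: predicted probability (about .52)
    odds  <- p / (1 - p)                             # equals exp(b0 + b1 * x)
    logit <- log(odds)                               # equals b0 + b1 * x = 0.1
    exp(b0)                                          # odds when the predictor is 0
    exp(b1)                                          # odds ratio: factor by which the odds change per unit increase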
Multicollinearity indicates correlation between predictors and makes the parameter estimates unstable. Indicators: VIF > 1 (possible bias), VIF > 10 a problem.
There is no good equivalent of R2 from linear regression in logistic regression, because logistic regression does not model variance.
Hosmer & Lemeshow's test: tests whether the data deviate significantly from the model; should not be significant! Exp(B) for the variable in question = odds ratio. Likelihood ratio test: X2 = deviance(baseline) - deviance(new) = 2LL(new) - 2LL(baseline) = 2*log(likelihood_new / likelihood_baseline). The LRT can also compare against another, nested model instead of the baseline.
Deviance: -2 * log-likelihood -> indicates how much the model is unlike the perfect model; the bigger the deviance, the worse the model. Not useful for calculating an effect size, as it gets larger when there is more data. The casewise list table can be used to detect outliers/influential points; the SResid column indicates the change in model deviance if that case were removed. Indicators of influential data points: Cook's distance > 1, leverage > 2 or 3 times (k+1)/N, DFBeta > 1.
Advantages of LR over a crosstab: it can handle numerical variables, can perform covariate corrections and therefore describe unique effects, can compare the effects of predictors (a crosstab describes the overall association only), and can actually predict individual probabilities (glm sketch below).
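A minimal glm sketch for the checks above; the data frame d with binary outcome y (0/1) and predictors x1, x2 is a placeholder:

    fit0 <- glm(y ~ 1,       data = d, family = binomial)   # baseline model (constant only)
    fit1 <- glm(y ~ x1 + x2, data = d, family = binomial)
    summary(fit1)                                            # coefficients on the logit scale
    exp(coef(fit1))                                          # odds ratios (Exp(B))
    anova(fit0, fit1, test = "Chisq")                        # likelihood ratio test via the change in deviance
    deviance(fit1); logLik(fit1)                             # deviance = -2 * log-likelihood for 0/1 data
    # car::vif(fit1)                                         # multicollinearity check, if car is installed
    # cooks.distance(fit1); hatvalues(fit1); dfbeta(fit1)    # influence diagnostics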
IRT: provides values of ability for persons and values of difficulty for items by fitting a logistic model to a dataset of zeroes and ones, with 2 predictors (ability and difficulty). The probability of answering an item correctly increases with more ability; the proportion of people who answer an item correctly decreases with more difficulty. The response of subject s to item i is determined by ability (theta, a characteristic of the subject: the higher the ability, the more likely a correct answer) and difficulty (beta, a characteristic of the item: the higher the difficulty, the less likely a correct answer).
1PL: only one item characteristic (the difficulty level beta). The difficulty of an item is the ability level at which there is a 50% chance of a correct answer; it influences the location of the ICC. In the 1PL calculation, alpha is kept constant.
2PL: two item characteristics, difficulty (beta) and discrimination (alpha). Discrimination: the extent to which the item can distinguish between subjects with a low and a high trait level / the extent to which the item is representative of the underlying construct; it influences the slope of the ICC. Discrimination is highest at the item's difficulty level (so at the 50% chance of a correct answer). In the 1PL and 2PL, item difficulty and discrimination do not depend on a reference group.
Precision of measurement depends on the ability of the measured subject: a test/item says most about a subject with an ability level close to the test's/item's difficulty level. The higher the item discrimination, the higher the maximum information. Item information is highest at the difficulty level of the item (since the discrimination is also highest there). Test information is the sum of all item informations. The SE in IRT depends on the test information: the higher the test information for a specific ability level, the more precise the measurement and the narrower the CI.
In the 1PL there is no difference in discrimination -> everyone with the same total score has the same ability level; the relation between test score and ability is nonlinear and s-shaped. In the 2PL the discrimination of the items answered correctly determines the ability: depending on which items are answered correctly there are multiple ways to get the same ability level, and more items correct does not automatically mean a higher ability level.
Assumptions of IRT: local independence (controlling for trait level, no relations between items) and unidimensionality (only 1 factor underlying the responses). If LI is good, then the dimensionality is too. Real-world data are never strictly unidimensional; IRT is robust to minor violations of the assumptions, especially if the dimensions are highly correlated (IRT sketch below).
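A minimal IRT sketch, assuming the ltm package and a 0/1 response matrix resp (both are placeholders):

    library(ltm)
    fit1pl <- rasch(resp)                  # 1PL: a difficulty per item, one common discrimination
    fit2pl <- ltm(resp ~ z1)               # 2PL: difficulty and discrimination per item
    coef(fit2pl)                           # item difficulties (Dffclt) and discriminations (Dscrmn)
    plot(fit2pl, type = "ICC")             # item characteristic curves
    plot(fit2pl, type = "IIC")             # item information curves
    plot(fit2pl, type = "IIC", items = 0)  # test information curve
    factor.scores(fit2pl)                  # ability estimates per response pattern
    anova(fit1pl, fit2pl)                  # LRT: does the 2PL improve on the 1PL?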
Meta-analysis: a statistical technique that computes summarizing statistics from individual study statistics. Goals: estimate the true population effect, estimate variability, explain variability (moderation). Additional benefits compared to