This document provides an overview of various statistical analysis techniques that can be performed in SPSS, including descriptive analysis, reliability analysis, factor analysis, cluster analysis, t-tests, contingency tables, and regression. Descriptive analysis involves frequencies, charts, and measures of central tendency and dispersion. Reliability analysis assesses the internal consistency of scales. Factor analysis is used to reduce variables and identify underlying factors. Cluster analysis groups cases into clusters based on similarities. Contingency tables examine relationships between categorical variables using chi-square tests. T-tests assess whether a sample mean differs from a population mean. Regression identifies relationships between variables and how well models fit data.


CATEGORICAL: nominal (e.g., male/female) + ordinal (e.g., 1 to 5)
METRIC (SCALE): interval (real numbers, e.g., blood pressure) + ratio (real numbers, e.g., income)


1. Descriptive Analysis and Histograms
Analyze -> Descriptive Statistics -> Frequencies -> Charts: Histograms
Skewness: > 0: skewed right (mean > median) / = 0: symmetrical / < 0: skewed left (mean < median)
Kurtosis: > 0: peaked / = 0: normally distributed / < 0: flat
Report (descriptive analysis and histograms): The difference between the mean and the median shows that some of the distributions are not symmetrical. Skewness and kurtosis indicate, too, that there are deviations from a normal distribution. Some of the distributions appear to be skewed left, others skewed right; some peaked, some flat.
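A minimal syntax sketch of this step (the variable names V01 to V03 are placeholders):
FREQUENCIES VARIABLES=V01 V02 V03
  /STATISTICS=MEAN MEDIAN STDDEV SKEWNESS KURTOSIS
  /HISTOGRAM NORMAL.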
1.1 Recode
Transform -> Recode into Different Variables
To reverse-code an item (change its polarity) in SPSS: RECODE V01 (1=4)(2=3)(3=2)(4=1) INTO V01_r.

1.2 Select Cases & Split File
a. USE ALL. COMPUTE filter_$ = (condition). FILTER BY filter_$. EXECUTE. => run Frequencies => filter off: FILTER OFF. USE ALL. EXECUTE.
b. SORT CASES BY grouping variable. SPLIT FILE SEPARATE BY grouping variable. ... SPLIT FILE OFF.
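A hedged end-to-end sketch of both steps, assuming a grouping variable gender coded 1/2:
USE ALL.
COMPUTE filter_$ = (gender = 1).
FILTER BY filter_$.
EXECUTE.
FREQUENCIES VARIABLES=V01_r.
FILTER OFF.
USE ALL.
EXECUTE.
SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.
FREQUENCIES VARIABLES=V01_r.
SPLIT FILE OFF.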
2. Reliability
Analyze -> Scale -> Reliability Analysis; Statistics: Item, Scale if item deleted
Case Processing Summary => excluded %: this percentage of the cases is excluded because of missing values. OK

Report (reliability analysis):
Reliability Statistics: Cronbach's alpha > .80 (.60 or .70 acceptable) => the reliability of the scale is high.
Item Statistics: for a (mean) scale from x to y, the average difficulty is (x+y)/2 (low p values = difficult items: p around .5 medium, p < .1 very difficult, p > .9 very easy).
Item-Total Statistics: items with a "corrected item-total correlation" below .30 are discarded or revised.
Above .30: each of the items reflects sufficiently what the scale as a whole measures.
Cronbach's alpha if item deleted: the overall alpha should be higher than all of these values -> OK.
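A syntax sketch (item and scale names are placeholders):
RELIABILITY
  /VARIABLES=V01 V02 V03 V04
  /SCALE('MyScale') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL.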
2.1 Factor
1. Analyze -> Correlate -> Bivariate; Options: exclude cases listwise (correlation table)
2. Analyze -> Dimension Reduction -> Factor; check: Descriptives (initial solution, coefficients, significance levels, inverse, KMO), Extraction (correlation matrix, based on eigenvalue, scree plot), Rotation (varimax, rotated solution, loading plots), Scores (save as variables: regression), Options (sorted by size)

Inverse of Correlation Matrix: when the off-diagonal values are significantly smaller than the values on the diagonal -> the correlation structure is well suited for a factor analysis.

Bartlett's Test and KMO Value (evaluate the suitability of the data):
Bartlett's test shows that the variables in the dataset are not totally uncorrelated (χ² = …, df = …, p = .000) => the data are suitable for factor analysis.
The KMO value is, according to the script, classified as respectable; there are no grounds to terminate the analysis. (.9-1 marvellous, .8-.89 meritorious, .7-.79 middling, .6-.69 mediocre, .5-.59 miserable, .0-.49 unacceptable.)

Report Factor (rule of thumb: at least 10 items per factor):
Extracting the Factors: most of the communalities are around .5, which means that for these variables around half of the variance can be explained through the extracted factors.
Scree Plot: the scree plot indicates 3 factors, since the elbow is at 4.

Kaiser Criterion: 'Total Variance Explained': retain factors with eigenvalues > 1.

Determine the number of factors based on the rotated component matrix:
Factor loadings > .20 (otherwise discard); at least 4 loadings > .30 or .40, or at least 10 loadings > .40.

Evaluating the Cross-Loadings: an item with two or more factor loadings > .3 or .4: if the difference between them is large (> .2), OK; if it is lower than .2, exclude the item.
Rotated Component Matrix: sorted by size.
Calculating Sum Scales or Factor Scores:
COMPUTE X_new = y + z + r.
VARIABLE LABELS X_new 'label it'.
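A sketch of the factor syntax matching the dialog options above (the item list is a placeholder):
FACTOR
  /VARIABLES V01 V02 V03 V04 V05
  /PRINT INITIAL CORRELATION SIG INV KMO EXTRACTION ROTATION
  /FORMAT SORT
  /PLOT EIGEN ROTATION
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION VARIMAX
  /SAVE REG(ALL).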

3.1 Cluster
Analyze -> Classify -> Hierarchical Cluster; check: agglomeration schedule, proximity matrix, dendrogram (vertical), standardize: z-scores, squared Euclidean distance (under Statistics, Plots, Method, Save)
Scatterplot: Graphs -> Chart Builder -> grouped scatter (second icon, upper left); drag in the grouping variable and point ID; check Grouping and Point ID label
**SORT CASES BY clus_1. SPLIT FILE SEPARATE BY clus_1. (Run the Frequencies analysis per cluster.) SPLIT FILE OFF.

Report Cluster (between-groups linkage is for outliers; Ward's method for the final solution):
Dendrogram: for the number of clusters, the largest increase in heterogeneity is found when stepping from a two-cluster to a one-cluster solution; this suggests a two-cluster solution.
Run the analysis again and enter the number of clusters under Save (single solution!).
Scatter plot: even with more than 2 variables, use the normal scatter plot first, without any clusters; then repeat it with the known cluster number (but still with x and y axes only).
Descriptive analysis -> z variables (standardize the variables first).
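A hedged syntax sketch (x1 and x2 are placeholders; DESCRIPTIVES /SAVE creates their z-standardized versions Zx1 and Zx2, and /SAVE CLUSTER(2) stores the two-cluster membership):
DESCRIPTIVES VARIABLES=x1 x2 /SAVE.
CLUSTER Zx1 Zx2
  /METHOD WARD
  /MEASURE=SEUCLID
  /PRINT SCHEDULE
  /PLOT DENDROGRAM
  /SAVE CLUSTER(2).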


4. One-Sample t-Test
Analyze -> Compare Means -> One-Sample T Test -> put the population value in the Test Value box.
If sig => the mean of the sample is higher/lower than that of the population.
Confidence interval with SPSS:
Analyze -> Descriptive Statistics -> Explore; Descriptives with 95%.
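A one-line syntax sketch, assuming a test variable score and a population value of 100:
T-TEST /TESTVAL=100 /VARIABLES=score.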
5. Contingency Tables (nominal or ordinal scales) => rows: metric, columns: ordinal, layer: nominal
1. Frequency analysis with histograms; report min, max, mean, median.
2. Analyze -> Descriptive Statistics -> Crosstabs (relationship between characteristics, e.g., lung cancer and smoking)
3. Analyze -> Nonparametric Tests -> Legacy Dialogs -> Chi-square (tests whether two distributions are the same)
Chi-square tests: when none of the cells has an expected count of less than 5 cases and the degrees of freedom are larger than 1 => report the Pearson chi-square (χ² = …, df = …, p = …). p < 0.05 => significant => there is a relationship between x and y.
If 20 < n < 50 -> Continuity Correction (Yates)
If n < 20 or expected frequencies < 5 -> Fisher's exact test

Report Contingency:
Symmetric Measures: Cramér's V (for non-square tables) and the contingency coefficient (for tables larger than 2x2), if significant. Value < 0.3 => the relationship is not very strong. (Phi is for 2x2 tables.)
Directional Measures: Lambda quantifies the strength of the relationship between two variables by assessing how much the prediction of one variable improves when a second variable is used to improve this prediction. A value of zero does not necessarily mean there is no relationship between the two variables at all. Report: knowledge about the row variable ("Dependent") reduces the error in predicting the other by x%; knowledge about the other variable does the same for the prediction of the first (y%).
!! The problem with the calculation of significance levels in the case of Lambda is due to the large sample size.
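A syntax sketch of the crosstab step, assuming the variables smoke and cancer from the example:
CROSSTABS
  /TABLES=smoke BY cancer
  /STATISTICS=CHISQ PHI CC LAMBDA
  /CELLS=COUNT EXPECTED.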

6. Regression (2 metric scale variables)
1. Without splitting by groups:
GRAPH /SCATTERPLOT(BIVAR)=educ WITH wage /MISSING=LISTWISE. (x axis = independent, y axis = dependent)
Analyze -> Regression -> Linear; check: Estimates, Model fit + Plots: Y: *ZRESID, X: *ZPRED + Include constant
2. Split by groups:
• RECODE educ (Lowest thru 11=1) (12=2) (13 thru Highest=3) INTO educ_r.
• CROSSTABS educ BY educ_r.
• SORT CASES BY educ_r. SPLIT FILE SEPARATE BY educ_r.
• GRAPH /SCATTERPLOT(BIVAR)=educ WITH wage /MISSING=LISTWISE.
• Run REGRESSION with /METHOD=ENTER educ.
• SPLIT FILE OFF.

3. Optimize the model:
To optimize the model when the dependent variable is skewed to the right: COMPUTE wage_1 = ln(wage).
When the residual plot is U-shaped and the relationship is not linear, add x²: COMPUTE x_r = x*x -> redo the regression.

Report Regression (y = B0 + B1*X1 + U):
ANOVA F: p is significant => there is a linear relationship between the X and Y variables.
Coefficients: the constant (which may be non-significant in a bivariate regression) and the significance of the variable. If the independent variable increases by 1 unit, the dependent variable changes by B units (in a log model, by roughly B·100%).
Confidence interval: when it does not include 0, the regression coefficient is significantly different from 0.
Unstandardized coefficients (B) show the absolute change of the dependent variable if the independent variable increases by one unit.
The beta coefficients are the standardized coefficients of the regression; their magnitudes reflect their relative importance in predicting.
Model Summary: R square = the model explains …% of the variance of Y; the higher, the better the fit.
Coefficients t-test: both coefficients are significant.
Prerequisites of Gauss-Markov: 1. Linearity in the coefficients (linear regression model). 2. Random sample. 3. Zero conditional mean of the error term (the mean values of the residuals do not differ visibly from 0 across the range of standardized estimated values; check the scatterplot). 4. Sample variation in the explanatory variables (the scatter plot shows variance). 5. Homoscedasticity: when the residuals are trumpet-shaped, they do not have constant variance and there is heteroscedasticity (in that case the model should be improved and not interpreted further). If the residuals show no wave-like pattern, there is independence of errors; normal distribution of the errors and residuals should be checked as well.
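A sketch of the regression step with the residual plot, using the educ/wage example:
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT wage
  /METHOD=ENTER educ
  /SCATTERPLOT=(*ZRESID ,*ZPRED).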

7. Multiple Linear Regression (syntaxes)
(To identify the effect of a categorical variable, transform it into dummies; the number of dummies is one less than the number of categories. Switch all variables to SCALE in Variable View for the scatter plot.)
1. RECODE gender (1=0)(2=1) INTO female.
RECODE grade_n (2=0) (3=1) (4=0) (5=0) (6=0) INTO dgrade_3.

2. CROSSTABS sex BY female.
CROSSTABS dgrade_3 BY grade_n.

3. **Scatterplot: Graphs -> Chart Builder -> scatter plot; Group ID -> grouping/stacking variable
GRAPH /SCATTERPLOT(MATRIX)=earnings grade_n expr expr_sq BY female /MISSING=LISTWISE.

4. Analyze -> Regression -> Linear; additionally check under Statistics: Collinearity diagnostics
Dependent: metric; independent: metric and categorical (coded as dummies)
• Enter: all variables entered at the same time
• Stepwise: each variable individually tested for entry and removal
• Forward: variables entered sequentially
• Backward: all variables entered into the model and removed sequentially

5. Correlation analysis (multicollinearity) (LOG):
CORRELATIONS /VARIABLES=expr expr_sq /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.
Report Multiple Regression (Enter or Stepwise method):
Dependent variable: metric; independent: metric and categorical (dummy)
ANOVA: p < 0.05 => significant model.
Coefficients: t-test and sig. levels (a dummy shows up as: switching it lowers/increases the dependent variable)
Report the adjusted R square.
Report: Y is predicted from factors x (B), z, u; the model explains …% of the variance of monthly income. The variable with the highest beta has the greatest impact on the predicted variable.
• Multicollinearity (high standard errors when it is high): Tolerance < .10 means multicollinear and not independent; not OK! The correlation coefficient between the pair of variables is too high and the variables should not be used in the same model => one should be eliminated. Then run a regression with both variables; the variable with the significant constant is chosen.
• Perfect collinearity = a variable is a linear combination of the other variables (a mistake).
Model Summary: adjusted R square < 1.
Prerequisites of Gauss-Markov: 1. Linearity in the coefficients (linear regression model). 2. Random sample. 3. Zero conditional mean of the error term (analysis of residuals). 4. Sample variation in the explanatory variables. 5. Homoscedasticity (independence of errors, normal distribution of errors).
=> Y is predicted from factor 1 (beta = …) and factor 2 (beta 2 = …) (F(df regression, df residual) = …, p = …); the model explains …% of the variance of the predicted variable. The highest beta has the greatest impact on the predicted variable.
Optimize the model (see section 6):
• COMPUTE a new dependent variable = ln(old dependent variable) => regression with the new variable.
• Run regressions on the two multicollinear variables (once without x, once without x² (y)); compare the results.
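A sketch of the multiple regression with collinearity diagnostics, using the earnings example above:
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF CI(95) R ANOVA COLLIN TOL
  /DEPENDENT earnings
  /METHOD=ENTER grade_n expr expr_sq female.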
8. Logistic Regression (a probability)
If the dependent variable is ordinal or nominal => the relationship is non-linear.
1. Boxplot: Graphs -> Legacy Dialogs -> Boxplot -> Summaries for groups of cases (do it separately for each independent variable)
2. Analyze -> Regression -> Binary Logistic; Options: classification plots, iteration history, CI for Exp(B), Hosmer-Lemeshow goodness of fit
• Binary: dependent = nominal with 2 categories
• Multinomial: dependent = nominal with more than 2 categories (no drugs, drugs, hard drugs)
• Ordinal: dependent is ordinal (small, medium, large, extra large)
Report Logistic Regression (use stepwise Forward LR for exploration and optimization, consider the last step for interpretation, and compare it with the Enter method, which is used on theoretical grounds):
Omnibus Tests of Model Coefficients: model = step = block as a whole is significant.
Model Summary: Nagelkerke R² > .2 is OK! The model fits the data.
Classification Table: overall percentage: the model predicted …% correctly.
-2LL (how well the model fits): -2LL of the base model should be larger than that of the model (base model − model = improvement).
Variables in the Equation: Exp(B): an increase of x by one unit alters the relative probability of getting Y by a factor of Exp(B); this means it changes by (Exp(B) − 1)·100%.
The confidence intervals of Exp(B) also support the notion that all three variables have an impact (they do not include 1).
Interpretation of coefficients: if the entire CI for Exp(B) is > 1 => positive association between xi and Y (if the independent variable rises by 1 unit, the relative probability of Y rises by (Exp(B) − 1)·100%).
If the interval contains 1 => no association, no impact.
Interpretation of the constant:
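A syntax sketch, assuming a binary dependent variable outcome and predictors x1 to x3 (Forward LR, as recommended above):
LOGISTIC REGRESSION VARIABLES outcome
  /METHOD=FSTEP(LR) x1 x2 x3
  /PRINT=GOODFIT ITER(1) CI(95)
  /CLASSPLOT.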
9. One-Way ANOVA
Dependent variable: metric; independent variables: categorical (part of them may be metric)
*Boxplot with table: Analyze -> Descriptive Statistics -> Explore
Analyze -> General Linear Model -> Univariate; Options: descriptive statistics, estimates of effect size, parameter estimates.
Post hoc: Bonferroni
Box plot + Levene's test (as Levene's test is not significant, equal variances can be assumed)
Report One-Way ANOVA:
1. Tests of Between-Subjects Effects: Corrected Model = sig => the model as a whole is significant; constant = intercept.
• When it is sig: there is a main effect of the independent variable on the dependent variable. Adjusted R square × 100% shows what share of the variance in the dependent variable around the grand mean can be predicted by the (categorical) model.
• When not sig: holding all other variables constant, there is no significant difference between X and the non-significant one.
2. Partial eta squared (the higher, the better): the categorical variable explains x% of the previously unexplained variation. For the grand mean (intercept), partial η² is not interesting and will therefore not be interpreted.
3. Multiple Comparisons: post hoc comparisons (whether the groups differ significantly). Groups x and y show a significant difference; if not => the non-significant ones form one group.
4. Profile Plot: profile plots are of particular interest if you want a quick overview of the relative differences between the mean values, or if several factors are involved (and hence interactions). If the line is not horizontal => there is a main effect of the independent variable on the dependent variable.
Interaction = there is a dependency between the two (no parallel lines).
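A sketch of the one-way ANOVA syntax, assuming a metric dependent variable score and a factor group:
UNIANOVA score BY group
  /POSTHOC=group(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ PARAMETER
  /DESIGN=group.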

9.1. Two-Way ANOVA
**Boxplot: Graphs -> Legacy Dialogs -> Boxplot -> Clustered.
Analyze -> General Linear Model -> Univariate -> Plots: horizontal axis and separate lines: add both orders (x*y, y*x)

Report Two-Way ANOVA:
Tests of Between-Subjects Effects: (as above) + a significant interaction means that the effect of one variable partially depends on the value of the other.
When the interaction is sig => there is an interaction of z*x on Y; the interaction term explains partial eta squared % of the previously unexplained variance.
Multiple Comparisons: post hoc comparisons only on the significant explanatory variables.
Box plot interpretation:
• When the lines are not horizontal and both have a similar orientation => there is an effect
• Crossing, non-parallel lines => there is an interaction
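A sketch with profile plots for both factor orders (score, factorA, and factorB are placeholders):
UNIANOVA score BY factorA factorB
  /PLOT=PROFILE(factorA*factorB factorB*factorA)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=factorA factorB factorA*factorB.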

10. ANCOVA (whether groups differ on a variable when controlling for another variable)
1. Box plot:
EXAMINE VARIABLES=depend BY indep
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.
2. Testing homogeneity of regression: Univariate -> Model -> interaction (of the 2) + main effects
3. Analyze -> General Linear Model -> Univariate; check Options: descriptive statistics, estimates of effect size, parameter estimates, homogeneity tests
4. Post hoc test: Options -> Display means for -> "Compare main effects" and "Bonferroni"

Report ANCOVA:
1. Homogeneity of regression (model with interaction):
• The interaction term of the main factor (x*y) with the covariate was not significant => the assumption of homogeneity of regression coefficients is not violated; the covariate can be introduced into the model as a control variable. (Conclusion) OK
• Levene's test: as Levene's test of equality of error variances is not significant, homogeneous error variances can be assumed; report F. OK
2. Tests of Between-Subjects Effects (main result): Corrected Model = sig => the overall model is significant; the model predicts …% of the variation around the grand mean (adjusted R squared). The intercept is not significant, but this is not problematic in ANOVA. There is a main effect of the independent variable on the dependent variable; partial η² = …% of the previously unexplained variance can be predicted by the treatment. The covariate is also significant.
3. Conclusion: the ANCOVA indicates that there is an impact of the main factor on the dependent variable. The different methods result in different values of the dependent variable when applied to the same level of the covariate.
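A hedged sketch of the main ANCOVA model (depend, factor, and covariate are placeholders):
UNIANOVA depend BY factor WITH covariate
  /PRINT=DESCRIPTIVE PARAMETER ETASQ HOMOGENEITY
  /EMMEANS=TABLES(factor) COMPARE ADJ(BONFERRONI)
  /DESIGN=factor covariate.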
10.1. Repeated Measures ANOVA
Boxplot:
• Select "Simple" and "Summaries of separate variables"; insert several variables into the field "Boxes Represent" (additionally insert the main factor into the field "Rows").
• Or select "Clustered" and "Summaries of separate variables"; insert the variables into the box "Boxes Represent" and insert the factor into the box "Category Axis".
Analyze -> General Linear Model -> Repeated Measures
Report Repeated Measures:
1. Homogeneity: Levene's test should be non-significant => homogeneity of variances is assumed.
2. Sphericity: as Mauchly's test is not significant, sphericity is met. OK!
(If Mauchly is sig => epsilon GG < .75: correct per Greenhouse-Geisser; if > .75 -> correct per Huynh-Feldt.)
This means we do not need to apply a correction but can use the row "Sphericity Assumed" in the table "Tests of Within-Subjects Effects". =>
• The within-subjects factor has a significant effect on X, partial η² = …; the time of the measurement explains partial eta squared % of the previously unexplained variance.
• Interaction *: when present (significant), the increase/decrease of X with Y varies depending on the treatment applied.
3. Tests of Between-Subjects Effects: main factor: there is an effect of the main factor on the level of …
4. Post hoc tests: if the levels of the main factor differ with regard to their effects, post hoc tests reveal which of them differ, but only when they are significant.
Tests of Within-Subjects Contrasts: compare partial eta squared with the within-subjects table; if it is higher (explains more of the variance) => there is a significant linear or quadratic trend of the within-subjects factor; a linear relationship seems more appropriate.
5. Conclusion / Profile Plot: as none of the post hoc tests is significant, the conclusion can be drawn that the three techniques do not have different impacts on the creativity level.
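A sketch of the repeated-measures syntax, assuming three measurements t1 to t3 and a between-subjects factor treatment:
GLM t1 t2 t3 BY treatment
  /WSFACTOR=time 3 Polynomial
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /WSDESIGN=time
  /DESIGN=treatment.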
11. Non-parametric Tests
Descriptive analysis -> Q-Q plots

! 2 independent samples -> WRS: Wilcoxon rank-sum test (Mann-Whitney U test)
Analyze -> Nonparametric Tests -> Legacy Dialogs -> 2 Independent Samples (descriptives, define groups)
Test statistics: for samples with n > 30, use the Asymp. Sig. for p (normal approximation); for n < 30, use the exact sig. for p.

! K independent samples -> Kruskal-Wallis test
Analyze -> Nonparametric Tests -> Legacy Dialogs -> K Independent Samples (add grouping variable, define range)
Test statistics: compare the chi-square value with the test statistic.

! 2 related samples -> WSR: Wilcoxon signed-rank test
Analyze -> Nonparametric Tests -> Legacy Dialogs -> 2 Related Samples (Wilcoxon, descriptives)
Test statistics: report whether x increased relative to y.

! K related samples -> Friedman test
Analyze -> Nonparametric Tests -> Legacy Dialogs -> K Related Samples (add all tests there, descriptives)
Test statistics: p sig -> the treatments have a different impact.

Analyze -> Correlate -> Bivariate: Spearman rank correlation coefficient
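A hedged set of syntax sketches for the four legacy tests and the Spearman correlation (all variable names and group codes are placeholders):
NPAR TESTS /M-W= score BY group(1 2).
NPAR TESTS /K-W= score BY group(1 3).
NPAR TESTS /WILCOXON= pre WITH post (PAIRED).
NPAR TESTS /FRIEDMAN= t1 t2 t3.
NONPAR CORR /VARIABLES=x y /PRINT=SPEARMAN TWOTAIL NOSIG.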


Report Non-parametric:
Post hoc test (Kruskal-Wallis test), K independent samples:
Analyze -> Nonparametric Tests -> Independent Samples
• Does not allow defining the comparison groups
• Double-click the output -> pairwise comparisons -> copy -> when p is sig, report the groups that differ
Post hoc test (Friedman test), K related samples:
• Go to Fields, take all groups, run pairwise comparisons, and report the significant p values.
