SPSS Notes
Introduction to SPSS
SPSS (Statistical Package for the Social Sciences) is a software program that
has played a significant role in the field of social sciences and data analysis since its
inception. Developed by Norman H. Nie, C. Hadlai "Tex" Hull, and Dale H. Bent at
Stanford University in the late 1960s, SPSS has revolutionized statistical analysis and
research methodologies, becoming one of the most widely used software packages in
the social sciences.
The history of SPSS dates back to 1968 when Nie and his colleagues set out
to develop a system to facilitate data analysis in social science research. Initially
called the Statistical Package for the Social Sciences, the software was developed for
use on mainframe computers. It was designed to be user-friendly, allowing
researchers without extensive statistical training to analyze data effectively.
The early versions of SPSS were command-line driven, requiring users to input
commands to perform various statistical analyses. Despite its command-based
interface, SPSS quickly gained popularity due to its powerful features and ease of use.
As the demand for the software grew, SPSS Inc. was founded in 1975 to further
develop and market the software.
Over the years, SPSS continued to evolve and expand its capabilities. In the 1980s,
SPSS introduced a graphical user interface (GUI), making it more accessible to a
broader audience. The GUI allowed users to interact with the software using menus
and icons, simplifying the process of data analysis. This user-friendly interface
contributed to SPSS's widespread adoption across academic, business, and
government sectors. With each new version, SPSS incorporated additional statistical
procedures and data management tools, solidifying its position as a comprehensive
statistical analysis software. Its extensive range of features enabled researchers to
conduct a wide variety of analyses, including descriptive statistics, hypothesis testing,
regression analysis, factor analysis, and more.
In 2009, SPSS Inc. was acquired by IBM Corporation, and the software was
rebranded as IBM SPSS Statistics. The acquisition further enhanced the resources
available for the development and support of the software, ensuring its continued
growth and improvement. SPSS has continued to evolve in recent years, keeping up
with advances in technology and data analysis methodologies. The software now
offers integration with other data analysis tools and programming languages, such as
Python and R, expanding its capabilities and providing flexibility to users with
diverse needs.
Today, SPSS is widely recognized as a leading statistical analysis software package,
trusted by researchers, analysts, and data scientists worldwide. Its user-friendly
interface, extensive range of statistical procedures, and powerful data management
tools have made it a staple in various fields, including social sciences, market research,
healthcare, and education. In conclusion, the history of SPSS is a story of innovation
and adaptation. From its humble beginnings as a mainframe-based statistical package
to its current iteration as IBM SPSS Statistics, the software has consistently provided
researchers with powerful tools to analyze and interpret data. Its ease of use,
comprehensive features, and continuous development have contributed to its enduring
popularity and its status as a cornerstone in the field of social sciences and data
analysis.
Regression
Coefficients(a)

                           Unstandardized Coefficients   Standardized Coefficients
Model                      B           Std. Error        Beta                        t        Sig.
1    (Constant)            2.311       1.589                                         1.454    .168
     Single_Strength       .000        .004              -.020                       -.076    .941

a. Dependent Variable: Elongation
Answer:
The table shows the results of a linear regression model, which was conducted to
determine whether there is a significant relationship between Single_Strength and
Elongation.
The B column shows the unstandardized coefficients, which are the estimates of the
slopes of the regression line. The Std. Error column shows the standard errors of the
coefficients, which measure the uncertainty of the estimates. The Beta column shows the
standardized coefficients, which express the slope in standard-deviation units so that
predictors measured on different scales can be compared. The t column shows the t-
statistics, which are used to test the significance of the coefficients. The Sig. column
shows the p-values, which are the probability of obtaining a t-statistic as extreme or
more extreme than the one observed, assuming that the null hypothesis is true.
In this case, the B coefficient for Single_Strength is 0.000 (to three decimal places),
meaning that the estimated slope is essentially zero. The p-value for Single_Strength is
0.941, which is greater than 0.05, so we do not reject the null hypothesis that the
coefficient is equal to 0. This means that there is not enough evidence to conclude that
Single_Strength has a significant effect on Elongation.
The Beta coefficient for Single_Strength is -0.020, which means that a one-standard-deviation
increase in Single_Strength is associated with a 0.020-standard-deviation decrease in
Elongation. However, the p-value for Single_Strength is 0.941, so we do not reject the null
hypothesis that the coefficient is equal to 0. This means we cannot be confident that this
small negative association reflects a real relationship in the population.
In conclusion, the results of the linear regression model do not provide enough
evidence to conclude that there is a significant relationship between Single_Strength
and Elongation.
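For illustration, a comparable simple linear regression can be run outside SPSS. The sketch below uses Python's statsmodels with hypothetical values standing in for Single_Strength and Elongation (the raw data behind the table above are not shown), and prints a coefficient table analogous to the one interpreted here.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical values standing in for the predictor and the outcome.
    single_strength = np.array([410, 395, 430, 415, 400, 425, 405, 420])
    elongation = np.array([2.4, 2.1, 2.5, 2.2, 2.6, 2.3, 2.0, 2.5])

    X = sm.add_constant(single_strength)   # adds the intercept ("Constant") term
    model = sm.OLS(elongation, X).fit()
    print(model.summary())                 # B, std. errors, t values and p-values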
Answer:
A t-test is a statistical hypothesis test used to determine if there is a significant
difference between the means of two groups. It helps researchers and analysts assess
whether the observed difference between the sample means is likely to be due to
chance or if it represents a true difference in the population means.
There are different types of t-tests, but the most common ones are:
1. Independent samples t-test: This test is used when comparing the means of
two independent groups or samples. For example, it could be used to compare
the average test scores of students from two different schools to see if there is
a significant difference in their academic performance.
2. Paired samples t-test: Also known as a dependent samples t-test, this test is
used when comparing the means of two related or paired samples. The data is
collected from the same individuals or objects under two different conditions
or time points. For instance, it could be used to assess whether there is a
significant difference in the weight of individuals before and after a weight
loss intervention.
The t-test relies on assumptions, such as the data being approximately normally
distributed and having equal variances in both groups (for the independent samples t-
test). When these assumptions are met, the t-test produces reliable results. However, if
the data violates these assumptions, alternative tests like non-parametric tests may be
more appropriate.
The t-test calculates a t-statistic, which measures the difference between the sample
means relative to the variability within the groups. Based on the t-statistic and the
degrees of freedom (related to sample size), the test determines the probability (p-
value) of observing the data if there were no true difference between the groups. A
small p-value (typically less than a predetermined significance level, often 0.05)
indicates that the observed difference is unlikely to be due to chance, and we can
reject the null hypothesis that there is no difference between the population means. On
the other hand, a large p-value suggests that the observed difference is likely due to
chance, and we fail to reject the null hypothesis.
In summary, the t-test is a widely used statistical tool for comparing means of two
groups and determining whether the observed differences are statistically significant.
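As a rough illustration of both types of t-test, the sketch below uses Python's scipy.stats on small made-up samples; the variable names (school_a, school_b, before, after) are hypothetical and simply mirror the examples given above.

    from scipy import stats

    # Independent samples: hypothetical test scores from two different schools.
    school_a = [72, 85, 78, 90, 66, 81, 77]
    school_b = [68, 74, 70, 82, 64, 73, 71]
    t_ind, p_ind = stats.ttest_ind(school_a, school_b, equal_var=True)

    # Paired samples: hypothetical weights before and after an intervention.
    before = [82, 90, 77, 95, 88]
    after = [80, 87, 76, 91, 85]
    t_rel, p_rel = stats.ttest_rel(before, after)

    print(f"independent: t = {t_ind:.3f}, p = {p_ind:.3f}")
    print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.3f}")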
Question:
How to apply t-test using SPSS
Answer:
Independent t-test
Analyze → Compare Means → Independent-Samples T Test
Select the dependent variable(s) and transfer them to the box labelled Test Variable(s)
Select the independent variable and transfer it to the box labelled Grouping Variable
The Define Groups button will become active; click on it and define the two groups
Question:
Interpret the following table
One-Sample Test (Test Value = 80)

                                                           95% Confidence Interval of the Difference
          t       df    Sig. (2-tailed)   Mean Difference   Lower        Upper
Marks     -.642   14    .531              -1.66667          -7.2308      3.8975
Answer:
The table shows the results of a one-sample t-test, which was conducted to determine
whether the mean marks of a sample of students are significantly different from a
hypothesized value of 80.
The t-statistic is -0.642, which is not significant at the 0.05 level (p-value = 0.531).
This means that there is not enough evidence to conclude that the mean marks of the
sample are significantly different from 80.
The 95% confidence interval for the mean difference is -7.2308 to 3.8975. This means
that we can be 95% confident that the true mean difference between the sample mean
and the hypothesized mean of 80 is between -7.2308 and 3.8975.
In conclusion, the results of the one-sample t-test do not provide enough evidence to
conclude that the mean marks of the sample are significantly different from 80.
Test Value: The hypothesized value of the mean marks in the population.
t: The t-statistic, which is a measure of how far the sample mean is from the
hypothesized mean.
df: The degrees of freedom, which is a function of the sample size.
Sig. (2-tailed): The p-value, which is the probability of obtaining a t-statistic
as extreme or more extreme than the one observed, assuming that the null
hypothesis is true. A p-value of 0.531 is not significant at the 0.05 level, so we
do not reject the null hypothesis.
Mean Difference: The difference between the sample mean and the
hypothesized mean.
95% Confidence Interval of the Difference: The range of values that we can
be 95% confident contains the true difference between the sample mean and
the hypothesized mean.
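The same kind of one-sample test can be sketched in Python with scipy.stats. The marks below are hypothetical (the raw data behind the table are not shown); the test value of 80 matches the table above, and the confidence_interval() call assumes a recent SciPy version (1.10 or later).

    from scipy import stats

    # Hypothetical marks for 15 students, tested against a hypothesized mean of 80.
    marks = [70, 85, 78, 92, 66, 81, 77, 88, 74, 79, 83, 69, 90, 76, 87]
    result = stats.ttest_1samp(marks, popmean=80)
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")

    # Recent SciPy versions also give a 95% CI for the population mean;
    # subtracting 80 from its bounds gives the CI of the difference shown by SPSS.
    print(result.confidence_interval(confidence_level=0.95))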
Answer:
The ANOVA test yields an F-statistic, which represents the ratio of the variation
between groups to the variation within groups. Based on this F-statistic and the
degrees of freedom, the test calculates a p-value, which indicates the probability of
obtaining the observed data if the null hypothesis (that all group means are equal)
were true. A small p-value (usually less than a chosen significance level, such as 0.05)
suggests that there are significant differences between at least two groups, and we
reject the null hypothesis. Conversely, a large p-value indicates that the observed
differences are likely due to chance, and we fail to reject the null hypothesis.
ANOVA is a powerful and widely used tool for comparing means across multiple
groups and is a fundamental analysis in many research fields, including psychology,
biology, social sciences, and more.
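The logic of the F-test can be illustrated outside SPSS as well; the sketch below runs scipy.stats.f_oneway on three small hypothetical groups of marks.

    from scipy import stats

    # Hypothetical marks for three independent groups.
    group1 = [78, 82, 75, 80, 79]
    group2 = [74, 77, 73, 76, 75]
    group3 = [81, 85, 79, 83, 84]

    f_stat, p_value = stats.f_oneway(group1, group2, group3)
    print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
    # A p-value below 0.05 would suggest that at least one group mean differs.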
Question:
How to apply ANOVA using SPSS
Answer:
One-way ANOVA
Analyze → Compare Means → One-Way ANOVA
Transfer the outcome variable of interest (DV) to the Dependent List
Transfer the grouping variable to the Factor box
Click on the Options box, choose 'Descriptive', and then click Continue
Click on Post Hoc, select the desired test, and then click Continue
Click OK
Two-way ANOVA
Analyze → General Linear Model → Univariate
In the main dialog box, place the outcome measure in the Dependent Variable box
Transfer your two grouping factors to the Fixed Factor(s) box
Select any covariate and transfer it to the box labelled Covariate(s)
Click on EM Means for post-hoc comparisons
Transfer the factor(s) to the box labelled 'Display Means for'
Tick 'Compare main effects'
Choose the desired test under 'Confidence interval adjustment'
Click Continue
Click on the Options box to request descriptive statistics
Click OK
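For comparison, a two-way ANOVA with a covariate can be sketched in Python using statsmodels. The data frame below is entirely hypothetical, and anova_lm is called with Type II sums of squares (note that SPSS's GLM procedure reports Type III by default).

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical data: two grouping factors, one covariate and one outcome.
    df = pd.DataFrame({
        "outcome":   [12, 14, 11, 15, 18, 17, 20, 22, 19, 13, 16, 21],
        "factor_a":  ["x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y"],
        "factor_b":  ["p", "p", "q", "q", "r", "r", "p", "p", "q", "q", "r", "r"],
        "covariate": [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6, 1.2, 0.9],
    })

    # Main effects, interaction and covariate, mirroring the GLM > Univariate setup above.
    model = smf.ols("outcome ~ C(factor_a) * C(factor_b) + covariate", data=df).fit()
    print(anova_lm(model, typ=2))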
ANOVA: Marks

                  Sum of Squares    df    Mean Square    F       Sig.
Between Groups    182.933            2    91.467         .892    .435
Within Groups     1230.400          12    102.533
Total             1413.333          14
Answer:
The ANOVA table shows the results of a one-way ANOVA, which was conducted to
determine whether there is a significant difference in the mean marks of three groups.
The Sum of Squares (SS) column shows the variation attributable to each source
(between groups, within groups, and total). The df column shows the degrees of
freedom associated with each source of variation. The Mean Square (MS) column is the
sum of squares divided by its degrees of freedom. The F column shows the F-statistic,
which is the ratio of the between-groups mean square to the within-groups mean
square. The Sig. column shows the p-value, which is the probability of obtaining an
F-statistic as extreme or more extreme than the one observed, assuming that the null
hypothesis is true.
In this case, the F-statistic is 0.892, which is not significant at the 0.05 level (p-value
= 0.435). This means that there is not enough evidence to conclude that there is a
significant difference in the mean marks of the three groups.
In other words, the data does not provide enough evidence to reject the null
hypothesis that the mean marks of the three groups are equal.
Sum of Squares Between Groups: This is the variation between the mean
marks of the three groups.
The chi-square test of independence is a nonparametric test, which means that it does
not make any assumptions about the distribution of the data. This makes the test
relatively robust to violations of assumptions.
The chi-square test of independence is a powerful tool for analyzing categorical data.
It can be used to test a wide variety of hypotheses, including the following:
Do the levels of one variable differ across the levels of another variable?
Is there a relationship between two categorical variables?
Is there a difference in the distribution of a categorical variable across two or
more groups?
The chi-square test of independence is a versatile tool that can be used to answer a
variety of questions about categorical data. It is a valuable tool for researchers and
analysts who work with categorical data.
Answer:
The cross-tabulation table shows the relationship between gender and grades. The
table shows that there are 12 males and 13 females in the sample. Of the males, 3
received an A, 6 received a B, and 3 received a C. Of the females, 7 received an A, 2
received a B, and 4 received a C.
Descriptively, the grade distributions differ somewhat between the two groups: a larger
share of females received an A, while a larger share of males received a B. The
cross-tabulation on its own only describes the sample; whether the association between
gender and grades is statistically significant is assessed with the chi-square test
reported below.
However, it is important to note that the sample size is relatively small, so it is
possible that these results may not be generalizable to the population as a
whole.
Chi-Square Tests

                        Value     df    Asymptotic Significance (2-sided)
Pearson Chi-Square      3.709a     2    .157
Likelihood Ratio        3.842      2    .146
N of Valid Cases        25

a. 5 cells (83.3%) have expected count less than 5. The minimum expected count is 3.36.
Answer:
The Chi-Square table shows the results of a Chi-Square test, which was conducted to
determine whether there is a significant association between two categorical variables.
The Value column shows the Chi-Square statistic, which is a measure of the
difference between the observed and expected counts.
In this case, the Chi-Square statistic is 3.709, which is not significant at the 0.05 level
(p-value = 0.157). This means that there is not enough evidence to conclude that there
is a significant association between the two categorical variables.
The footnote ('5 cells (83.3%) have expected count less than 5. The minimum expected
count is 3.36.') indicates that five of the six cells in the contingency table have
expected counts below 5. This means that the chi-square test may not be reliable here,
as it is sensitive to small expected frequencies.
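Using the observed counts from the cross-tabulation described above, the same chi-square test can be reproduced in Python with scipy.stats; this is only an illustrative sketch.

    from scipy.stats import chi2_contingency

    # Observed counts from the cross-tabulation above (rows: males, females; columns: A, B, C).
    observed = [[3, 6, 3],
                [7, 2, 4]]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")   # about 3.709, 2, 0.157
    print("expected counts:")
    print(expected)   # shows the cells with expected counts below 5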
Answer:
Chi-Square (χ2)
Analyze → Descriptive Statistics → Crosstabs
Enter one of the variables of interest in the box labelled Row(s)
Select the other variable of interest and transfer it to the box labelled Column(s)
Select the Statistics box and click on Chi-square
Click continue
Select Cells box and click on Observed and Expected under Counts, and Row,
Column and Total under Percentages
Click Continue
Click OK
Answer:
In statistics, nonparametric tests are methods of statistical analysis that do not make
any assumptions about the underlying distribution of the data. This makes them a
valuable tool for analyzing data that is not normally distributed or when the
distribution of the data is unknown.
There are many different types of nonparametric tests, each of which is designed to
answer a specific question. Two of the most common, both covered below, are the
Mann-Whitney U test and Friedman's ANOVA.
Nonparametric tests are a valuable tool for analyzing data that does not meet the
assumptions of parametric tests. They are also relatively easy to understand and
implement, making them a good choice for researchers and analysts who are less
familiar with statistical theory.
They are distribution-free, which means that they do not make any assumptions
about the underlying distribution of the data.
However, nonparametric tests also have some limitations:
They are less powerful than parametric tests when the data is normally distributed.
They may not be as sensitive to small differences between groups.
They may not be as widely available as parametric tests.
Overall, nonparametric tests are a valuable tool for analyzing data that does not meet
the assumptions of parametric tests. They are relatively easy to understand and
implement, and they are often more powerful than parametric tests when the data is
not normally distributed.
Mann-Whitney U
Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples
Select the dependent variable(s) of interest and transfer them to the box labelled Test
Variable List
Select the independent or grouping variable and transfer it to the box labelled
Grouping Variable
Click on the Define Groups box and type in the numeric codes you gave to the two
groups
Click on the Options box to get descriptive statistics and quartiles
Click Continue
Click OK
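A Mann-Whitney U test on two hypothetical independent groups can also be sketched in Python with scipy.stats:

    from scipy.stats import mannwhitneyu

    # Hypothetical scores for two independent groups (coded 1 and 2 in SPSS).
    group1 = [12, 15, 11, 19, 14, 13]
    group2 = [18, 22, 17, 25, 20, 21]

    u_stat, p_value = mannwhitneyu(group1, group2, alternative="two-sided")
    print(f"U = {u_stat:.1f}, p = {p_value:.4f}")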
Friedman’s ANOVA
Analyze → Nonparametric Tests → K Related Samples
Select the dependent variables of interest and transfer them to the box labelled Test
Variables
Select Statistics to obtain descriptives and quartiles
Click Continue
Click OK
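Friedman's test can likewise be sketched in Python; the three conditions below are hypothetical repeated measurements on the same subjects.

    from scipy.stats import friedmanchisquare

    # Hypothetical measurements for the same subjects under three conditions.
    condition1 = [10, 12, 11, 14, 9, 13]
    condition2 = [12, 14, 12, 15, 11, 14]
    condition3 = [13, 15, 14, 17, 12, 16]

    chi2_stat, p_value = friedmanchisquare(condition1, condition2, condition3)
    print(f"chi-square = {chi2_stat:.3f}, p = {p_value:.4f}")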