
Introduction to Statistical Packages for Social Sciences


Dr. Maqsood Ahmad
Assistant Professor
Department of Statistics
University of Okara

Introduction to SPSS

SPSS (Statistical Package for the Social Sciences) is a software program that
has played a significant role in the field of social sciences and data analysis since its
inception. Developed by Norman H. Nie, C. Hadlai "Tex" Hull, and Dale H. Bent at
Stanford University in the late 1960s, SPSS has revolutionized statistical analysis and
research methodologies, becoming one of the most widely used software packages in
the social sciences.
The history of SPSS dates back to 1968 when Nie and his colleagues set out
to develop a system to facilitate data analysis in social science research. Initially
called the Statistical Package for the Social Sciences, the software was developed for
use on mainframe computers. It was designed to be user-friendly, allowing
researchers without extensive statistical training to analyze data effectively.
The early versions of SPSS were command-line driven, requiring users to input
commands to perform various statistical analyses. Despite its command-based
interface, SPSS quickly gained popularity due to its powerful features and ease of use.
As the demand for the software grew, SPSS Inc. was founded in 1975 to further
develop and market the software.
Over the years, SPSS continued to evolve and expand its capabilities. In the 1980s,
SPSS was brought to personal computers with SPSS/PC+, and in the early 1990s SPSS for
Windows introduced a graphical user interface (GUI), making the software more accessible
to a broader audience. The GUI allowed users to interact with the software using menus
and icons, simplifying the process of data analysis. This user-friendly interface
contributed to SPSS's widespread adoption across academic, business, and
government sectors. With each new version, SPSS incorporated additional statistical
procedures and data management tools, solidifying its position as a comprehensive
statistical analysis software. Its extensive range of features enabled researchers to
conduct a wide variety of analyses, including descriptive statistics, hypothesis testing,
regression analysis, factor analysis, and more.

In 2009, SPSS Inc. was acquired by IBM Corporation, and the software was
rebranded as IBM SPSS Statistics. The acquisition further enhanced the resources
available for the development and support of the software, ensuring its continued
growth and improvement. SPSS has continued to evolve in recent years, keeping up
with advances in technology and data analysis methodologies. The software now
offers integration with other data analysis tools and programming languages, such as
Python and R, expanding its capabilities and providing flexibility to users with
diverse needs.
Today, SPSS is widely recognized as a leading statistical analysis software package,
trusted by researchers, analysts, and data scientists worldwide. Its user-friendly
interface, extensive range of statistical procedures, and powerful data management
tools have made it a staple in various fields, including social sciences, market research,
healthcare, and education. In conclusion, the history of SPSS is a story of innovation
and adaptation. From its humble beginnings as a mainframe-based statistical package
to its current iteration as IBM SPSS Statistics, the software has consistently provided
researchers with powerful tools to analyze and interpret data. Its ease of use,
comprehensive features, and continuous development have contributed to its enduring
popularity and its status as a cornerstone in the field of social sciences and data
analysis.

Regression

Question: Interpret the following table

Coefficients (a. Dependent Variable: Elongation)

Model 1             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)          2.311              1.589                            1.454   .168
Single_Strength     .000               .004         -.020               -.076   .941

Answer:

The table shows the results of a linear regression model, which was conducted to
determine whether there is a significant relationship between Single_Strength and
Elongation.

The B column shows the unstandardized coefficients, which are the estimated intercept and
slope of the regression line in the original units of the variables. The Std. Error column
shows the standard errors of these estimates, which measure their uncertainty. The Beta
column shows the standardized coefficient, which expresses the slope in standard deviation
units so that predictors measured on different scales can be compared. The t column shows
the t-statistics used to test whether each coefficient differs from zero. The Sig. column
shows the p-values, i.e. the probability of obtaining a t-statistic as extreme as, or more
extreme than, the one observed if the null hypothesis were true.

In this case, the B coefficient for Single_Strength is 0.000 (to three decimal places), so
the estimated slope is essentially zero. The p-value for Single_Strength is 0.941, which is
greater than 0.05, so we do not reject the null hypothesis that the coefficient is equal
to 0. This means that there is not enough evidence to conclude that Single_Strength has a
significant effect on Elongation.

The Beta coefficient for Single_Strength is -0.020, meaning that a one standard deviation
increase in Single_Strength is associated with a decrease of only 0.020 standard deviations
in Elongation, a negligible effect. Because the p-value is 0.941, we do not reject the null
hypothesis that the coefficient is equal to 0, so this small negative association cannot be
distinguished from chance.

In conclusion, the results of the linear regression model do not provide enough
evidence to conclude that there is a significant relationship between Single_Strength
and Elongation.
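The regression that produced this table can also be run from a syntax window (equivalently,
the menu path Analyze → Regression → Linear and the Paste button). A minimal sketch, assuming
the variable names Elongation and Single_Strength shown in the table:

    * Simple linear regression of Elongation on Single_Strength.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT Elongation
      /METHOD=ENTER Single_Strength.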

Question: What is a t-test in Statistics

Answer:
A t-test is a statistical hypothesis test used to determine if there is a significant
difference between the means of two groups. It helps researchers and analysts assess
whether the observed difference between the sample means is likely to be due to
chance or if it represents a true difference in the population means.
There are different types of t-tests, but the most common ones are:
1. Independent samples t-test: This test is used when comparing the means of
two independent groups or samples. For example, it could be used to compare
the average test scores of students from two different schools to see if there is
a significant difference in their academic performance.
2. Paired samples t-test: Also known as a dependent samples t-test, this test is
used when comparing the means of two related or paired samples. The data is
collected from the same individuals or objects under two different conditions
or time points. For instance, it could be used to assess whether there is a
significant difference in the weight of individuals before and after a weight
loss intervention.
The t-test relies on assumptions, such as the data being approximately normally
distributed and having equal variances in both groups (for the independent samples t-
test). When these assumptions are met, the t-test produces reliable results. However, if
the data violates these assumptions, alternative tests like non-parametric tests may be
more appropriate.
The t-test calculates a t-statistic, which measures the difference between the sample
means relative to the variability within the groups. Based on the t-statistic and the
degrees of freedom (related to sample size), the test determines the probability (p-
value) of observing the data if there were no true difference between the groups. A
small p-value (typically less than a predetermined significance level, often 0.05)
indicates that the observed difference is unlikely to be due to chance, and we can
reject the null hypothesis that there is no difference between the population means. On
the other hand, a large p-value suggests that the observed difference is likely due to
chance, and we fail to reject the null hypothesis.
In summary, the t-test is a widely used statistical tool for comparing means of two
groups and determining whether the observed differences are statistically significant.
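For the independent-samples case with equal variances assumed, the t-statistic and degrees
of freedom described above take the standard textbook form:

    t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
    s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
    df = n_1 + n_2 - 2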

Question: How to apply a t-test using SPSS

Answer:

Independent t-test
Analyze → Compare Means → Independent-Samples T Test
Select the dependent variable(s) and transfer it to the box labelled Test Variable(s)
Select the independent variable and transfer it to the box labelled Grouping Variable
The Define Groups button will become active → click it, enter the numeric codes for the two
groups, then click Continue and OK
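The same test can be run from a syntax window. A minimal sketch, assuming a hypothetical
test variable named score and a grouping variable named group coded 1 and 2:

    * Independent-samples t-test of score by group (codes 1 and 2).
    T-TEST GROUPS=group(1 2)
      /VARIABLES=score
      /CRITERIA=CI(.95).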

Dependent (paired) t-test:


Analyze → Compare Means → Paired-Samples T Test
Select the 2 variables from the list
Transfer them to the box labelled Paired Variables
Click OK
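
Equivalent syntax for the paired test, assuming two hypothetical variables named before and
after measured on the same cases:

    * Paired-samples t-test comparing before with after.
    T-TEST PAIRS=before WITH after (PAIRED)
      /CRITERIA=CI(.95).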

Question: Interpret the following table

One-Sample Test (Test Value = 80)

          t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
                                                            Lower       Upper
Marks     -.642   14   .531              -1.66667          -7.2308     3.8975

Answer:

The table shows the results of a one-sample t-test, which was conducted to determine
whether the mean marks of a sample of students are significantly different from a
hypothesized value of 80.

The t-statistic is -0.642, which is not significant at the 0.05 level (p-value = 0.531).
This means that there is not enough evidence to conclude that the mean marks of the
sample are significantly different from 80.

The 95% confidence interval for the mean difference is -7.2308 to 3.8975. This means
that we can be 95% confident that the true mean difference between the sample mean
and the hypothesized mean of 80 is between -7.2308 and 3.8975.

In conclusion, the results of the one-sample t-test do not provide enough evidence to
conclude that the mean marks of the sample are significantly different from 80.

Here is a more detailed explanation of the table:

 Test Value: The hypothesized value of the mean marks in the population.
 t: The t-statistic, which is a measure of how far the sample mean is from the
hypothesized mean.
 df: The degrees of freedom, which is a function of the sample size.
 Sig. (2-tailed): The p-value, which is the probability of obtaining a t-statistic
as extreme or more extreme than the one observed, assuming that the null
hypothesis is true. A p-value of 0.531 is not significant at the 0.05 level, so we
do not reject the null hypothesis.
 Mean Difference: The difference between the sample mean and the
hypothesized mean.

 95% Confidence Interval of the Difference: The range of values that we can
be 95% confident contains the true difference between the sample mean and
the hypothesized mean.
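
This output comes from a one-sample t-test (menu path Analyze → Compare Means → One-Sample
T Test). A syntax sketch that would reproduce it, assuming the variable is named Marks:

    * One-sample t-test of Marks against a test value of 80.
    T-TEST
      /TESTVAL=80
      /VARIABLES=Marks
      /CRITERIA=CI(.95).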

Question: What is ANOVA in Statistics

Answer:

ANOVA, which stands for Analysis of Variance, is a statistical method used to
compare the means of three or more groups or treatments simultaneously. It allows
researchers to determine whether there are any statistically significant differences
among the means of the different groups. ANOVA is an extension of the t-test, which
is used to compare means between two groups.
The key idea behind ANOVA is to partition the total variation observed in the data
into two components: variation between groups and variation within groups. If the
variation between groups is significantly larger than the variation within groups, it
suggests that there are meaningful differences among the groups.
ANOVA can be used in various scenarios, including but not limited to:
1. One-way ANOVA: This is used when there is one categorical independent
variable (also called a factor) with three or more levels, and a continuous
dependent variable is measured for each level. For example, we could use one-
way ANOVA to compare the test scores of students from three different
schools.
2. Two-way ANOVA: This is an extension of one-way ANOVA when there are
two independent variables (factors) with multiple levels each, and a
continuous dependent variable. It allows us to examine the interaction effects
between the two factors. For example, we could use two-way ANOVA to
study the effects of both gender and educational background on exam
performance.
3. Repeated measures ANOVA: This is used when measurements are taken from
the same subjects at multiple time points or under different conditions. It
accounts for the within-subject correlation and is useful in analyzing data from
longitudinal or repeated measures studies.

The ANOVA test yields an F-statistic, which represents the ratio of the variation
between groups to the variation within groups. Based on this F-statistic and the
degrees of freedom, the test calculates a p-value, which indicates the probability of
obtaining the observed data if the null hypothesis (that all group means are equal)
were true. A small p-value (usually less than a chosen significance level, such as 0.05)
suggests that there are significant differences between at least two groups, and we
reject the null hypothesis. Conversely, a large p-value indicates that the observed
differences are likely due to chance, and we fail to reject the null hypothesis.
ANOVA is a powerful and widely used tool for comparing means across multiple
groups and is a fundamental analysis in many research fields, including psychology,
biology, social sciences, and more.

Question: How to apply ANOVA using SPSS

Answer:

One-way ANOVA
Analyze → Compare Means → One-Way ANOVA
Transfer the outcome variable of interest (DV) to the Dependent List
Transfer the grouping variable to the Factor box
Options → click on the Options box, choose ‘Descriptive’ and then Continue
Post Hoc → click on the test chosen and then Continue
Click OK
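
These menu steps correspond to the ONEWAY command. A minimal sketch, assuming a hypothetical
dependent variable named Marks, a factor named group, and Tukey chosen as the post hoc test:

    * One-way ANOVA of Marks by group, with descriptives and Tukey post hoc tests.
    ONEWAY Marks BY group
      /STATISTICS DESCRIPTIVES
      /POSTHOC=TUKEY ALPHA(0.05).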

Two-way ANOVA
Analyze → General Linear Model → Univariate
In the main dialog box, place the outcome measure in the Dependent Variable box
Transfer your two grouping factors to the Fixed Factor(s) box
If you have a covariate, select it and transfer it to the box labelled Covariate(s)
Click on EM Means for post-hoc
Transfer factor(s) to the box ‘Display Means for’
Click on ‘Compare main effects’
Click on chosen test for ‘Confidence interval adjustment’
Click Continue
Select Options box for descriptive statistics
Click OK
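
A syntax sketch of the same two-way model (the optional covariate step is omitted here),
assuming hypothetical factors named factorA and factorB, a dependent variable named score,
and Bonferroni-adjusted comparisons of the estimated marginal means:

    * Two-way (full factorial) ANOVA of score by factorA and factorB.
    UNIANOVA score BY factorA factorB
      /EMMEANS=TABLES(factorA) COMPARE ADJ(BONFERRONI)
      /EMMEANS=TABLES(factorB) COMPARE ADJ(BONFERRONI)
      /PRINT=DESCRIPTIVE
      /DESIGN=factorA factorB factorA*factorB.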

Question: Interpret the following output of ANOVA

ANOVA (Dependent Variable: Marks)

                  Sum of Squares   df   Mean Square   F      Sig.
Between Groups    182.933          2    91.467        .892   .435
Within Groups     1230.400         12   102.533
Total             1413.333         14

Answer:

The ANOVA table shows the results of a one-way ANOVA, which was conducted to
determine whether there is a significant difference in the mean marks of three groups.

The Sum of Squares (SS) column partitions the total variation in the data into a
between-groups component and a within-groups component. The df column shows the degrees of
freedom associated with each source of variation. The Mean Square (MS) column is the sum of
squares divided by its degrees of freedom. The F column shows the F-statistic, which is the
ratio of the between-groups mean square to the within-groups mean square. The Sig. column
shows the p-value, which is the probability of obtaining an F-statistic as extreme as, or
more extreme than, the one observed, assuming that the null hypothesis is true.

In this case, the F-statistic is 0.892, which is not significant at the 0.05 level (p-value
= 0.435). This means that there is not enough evidence to conclude that there is a
significant difference in the mean marks of the three groups.

In other words, the data does not provide enough evidence to reject the null
hypothesis that the mean marks of the three groups are equal.

Here is a more detailed explanation of the ANOVA table:

 Sum of Squares Between Groups: This is the variation between the mean marks of the
three groups.
 df Between Groups: This is the number of degrees of freedom between groups, which is
equal to the number of groups minus 1.
 Mean Square Between Groups: This is the between-groups sum of squares divided by its
degrees of freedom.
 F: This is the F-statistic, the ratio of the between-groups mean square to the
within-groups mean square.
 Sig.: This is the p-value, the probability of obtaining an F-statistic as extreme as,
or more extreme than, the one observed, assuming that the null hypothesis is true.
 Sum of Squares Within Groups: This is the variation within each of the three groups.
 df Within Groups: This is the number of degrees of freedom within groups, which is
equal to the total number of observations minus the number of groups.
 Mean Square Within Groups: This is the within-groups sum of squares divided by its
degrees of freedom.
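
As a quick arithmetic check on the table above, each mean square is its sum of squares
divided by its degrees of freedom, and F is the ratio of the two mean squares:

    MS_{\text{between}} = \frac{182.933}{2} = 91.467, \qquad
    MS_{\text{within}} = \frac{1230.400}{12} = 102.533, \qquad
    F = \frac{91.467}{102.533} \approx 0.892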

One-way Repeated Measures ANOVA


Analyze → General Linear Model → Repeated Measures
In the ‘Repeated Measures Define Factor(s)’ dialog box, enter the number of levels of your
within-subjects factor
Click Add and then Define
In the main dialog box, transfer the variables under ‘Within-Subjects Variables’
Click on the Options box to obtain descriptive statistics
Click on EM Means for post-hoc
Transfer factor1 to the box ‘Display Means for’
Click on ‘Compare main effects’
Click on chosen test for ‘Confidence interval adjustment’
Click Continue
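
Equivalent syntax, assuming three hypothetical repeated measurements named time1, time2 and
time3 that form a single within-subjects factor with three levels:

    * One-way repeated measures ANOVA over time1, time2 and time3.
    GLM time1 time2 time3
      /WSFACTOR=time 3 Polynomial
      /EMMEANS=TABLES(time) COMPARE ADJ(BONFERRONI)
      /PRINT=DESCRIPTIVE
      /WSDESIGN=time.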

Analysis of Covariance (ANCOVA)


Analyze → General Linear Model → Univariate
In the main dialog box, place the outcome measure in the Dependent Variable box
Transfer your grouping factor to the Fixed Factor(s) box
Select covariate and transfer it to the box labelled Covariate(s)
Click on EM Means for post-hoc
Transfer factor1 to the box ‘Display Means for’
Click on ‘Compare main effects’
Click on chosen test for ‘Confidence interval adjustment’
Click Continue
Select Options box for descriptive statistics
Click OK
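
A syntax sketch of the ANCOVA, assuming a hypothetical dependent variable named score, a
grouping factor named group and a covariate named age:

    * One-way ANCOVA: score by group, adjusting for the covariate age.
    UNIANOVA score BY group WITH age
      /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
      /PRINT=DESCRIPTIVE
      /DESIGN=age group.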

Chi Square Test

The chi-square test of independence is a statistical hypothesis test used to determine
whether there is a significant association between two categorical variables. The test
is based on the idea that if there is no association between the two variables, then the
observed counts in the contingency table will be close to the expected counts.
However, if there is an association between the two variables, then the observed
counts will be different from the expected counts.

The chi-square test of independence is a nonparametric test, which means that it does
not make any assumptions about the distribution of the data. This makes the test
relatively robust to violations of assumptions.

The chi-square test of independence is a powerful tool for analyzing categorical data.
It can be used to test a wide variety of hypotheses, including the following:

 Do the levels of one variable differ across the levels of another variable?
 Is there a relationship between two categorical variables?
 Is there a difference in the distribution of a categorical variable across two or
more groups?

The chi-square test of independence is a versatile tool that can be used to answer a
variety of questions about categorical data. It is a valuable tool for researchers and
analysts who work with categorical data.

Here are some additional details about the chi-square test of independence:

 The chi-square statistic is calculated by comparing the observed counts in the
contingency table to the expected counts.
 The expected counts are calculated assuming that there is no association
between the two variables.
 The p-value is the probability of obtaining a chi-square statistic as extreme or
more extreme than the one observed, assuming that the null hypothesis is true.
 If the p-value is less than the significance level, then we reject the null
hypothesis and conclude that there is a significant association between the two
variables.
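
In symbols, with O and E denoting the observed and expected counts, the expected count for
the cell in row i and column j and the test statistic are (standard formulas, not specific
to SPSS):

    E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}, \qquad
    \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}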

Question: Interpret the following table

Gender * Grades Crosstabulation (Count)

                    Grades
                    A      B      C      Total
Gender   Male       3      6      3      12
         Female     7      2      4      13
Total               10     8      7      25

Answer:

The cross-tabulation table shows the relationship between gender and grades. There are 12
males and 13 females in the sample. Of the males, 3 received an A, 6 received a B, and 3
received a C. Of the females, 7 received an A, 2 received a B, and 4 received a C.

Descriptively, the two distributions are not identical: a larger share of females received
an A (7 of 13) than males (3 of 12), while more males received a B. On its own, however, a
cross-tabulation cannot tell us whether this difference is statistically significant; that
question is addressed by the chi-square test reported below.

Here is a more detailed explanation of the cross-tabulation table:

 Grades: This is the variable of interest. It has three levels: A, B, and C.
 Gender: This is the independent variable. It has two levels: male and female.
 Count: This is the number of observations in each cell of the table.

The cross-tabulation table can be used to answer the following questions:

 Is there a difference in the distribution of grades between males and females?
 What is the proportion of males and females who received each grade?
 Are there any other patterns in the data?

In this case, the chi-square test reported below does not find a statistically significant
association between gender and grades. It is also important to note that the sample size is
small (25 students), so these results may not generalize to the population as a whole.

Question: Interpret the following table

Chi-Square Tests

                      Value    df   Asymptotic Significance (2-sided)
Pearson Chi-Square    3.709a   2    .157
Likelihood Ratio      3.842    2    .146
N of Valid Cases      25
a. 5 cells (83.3%) have expected count less than 5. The minimum expected count is 3.36.

Answer:

The Chi-Square table shows the results of a Chi-Square test, which was conducted to
determine whether there is a significant association between two categorical variables.

The Value column shows the Chi-Square statistic, which is a measure of the
difference between the observed and expected counts.

 The df column shows the degrees of freedom, which is a function of the number of rows
and columns in the contingency table.
 The Asymptotic Significance (2-sided) column shows the p-value, which is
the probability of obtaining a Chi-Square statistic as extreme or more extreme
than the one observed, assuming that the null hypothesis is true.

In this case, the Chi-Square statistic is 3.709, which is not significant at the 0.05 level
(p-value = 0.157). This means that there is not enough evidence to conclude that there
is a significant association between the two categorical variables.

Footnote a states that 5 cells (83.3%) have an expected count of less than 5, with a
minimum expected count of 3.36. Because the chi-square approximation becomes unreliable
when expected counts are small (a common guideline is that no more than 20% of cells should
have expected counts below 5), the result of this test should be interpreted with caution.
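
As a check on footnote a, the smallest expected count comes from the cell that pairs the
smaller row total (Male, 12) with the smallest column total (grade C, 7):

    E_{\text{Male, C}} = \frac{12 \times 7}{25} = 3.36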

Question: How to apply a chi-square test in SPSS

Answer:

Chi-Square (χ2)
Analyze → Descriptive Statistics → Crosstabs
Enter one of the variables of interest in the box labelled Row(s)
Select the other variable of interest and transfer it to the box labelled Column(s)
Select the Statistics box and click on Chi-square
Click Continue
Select Cells box and click on Observed and Expected under Counts, and Row,
Column and Total under Percentages
Click Continue
Click OK
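
The same analysis in syntax form, assuming the variables are named Gender and Grades as in
the tables above:

    * Chi-square test of independence for Gender by Grades.
    CROSSTABS
      /TABLES=Gender BY Grades
      /STATISTICS=CHISQ
      /CELLS=COUNT EXPECTED ROW COLUMN TOTAL.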

Question: What is meant by Non-Parametric Tests in Statistics

Answer:

In statistics, nonparametric tests are methods of statistical analysis that do not require
the data to follow a particular distribution (such as the normal distribution). This makes
them a valuable tool for analyzing data that are not normally distributed or whose
distribution is unknown.

There are many different types of nonparametric tests, each of which is designed to
answer a specific question. Some of the most common nonparametric tests include:

 The Mann-Whitney U test: This test is used to compare two independent groups.
 The Wilcoxon signed-rank test: This test is used to compare two dependent
groups.
 The Kruskal-Wallis test: This test is used to compare more than two
independent groups.
 The Spearman rank correlation test: This test is used to measure the
correlation between two ranked variables.
 The Chi-square test of independence: This test is used to determine whether
there is a significant association between two categorical variables.

Nonparametric tests are a valuable tool for analyzing data that do not meet the assumptions
of parametric tests. They are also relatively easy to understand and implement, making them
a good choice for researchers and analysts who are not familiar with statistical theory.

Here are some of the advantages of using nonparametric tests:

 They are distribution-free, which means that they do not make any
assumptions about the underlying distribution of the data.

 They are relatively easy to understand and implement.
 They are often more powerful than parametric tests when the data are not normally
distributed.

Here are some of the disadvantages of using nonparametric tests:

 They are less powerful than parametric tests when the data is normally
distributed.
 They may not be as sensitive to small differences between groups.
 They may not be as widely available as parametric tests.

Overall, nonparametric tests are a valuable tool for analyzing data that does not meet
the assumptions of parametric tests. They are relatively easy to understand and
implement, and they are often more powerful than parametric tests when the data is
not normally distributed.

Question: How to apply nonparametric tests in SPSS


Answer:

Mann-Whitney U
Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples
Select the dependent variable(s) of interest and transfer them to the box labelled Test
Variable List
Select the independent or grouping variable and transfer it to the box labelled
Grouping Variable
Click on the Define Groups box and type in the numeric codes you gave to the two
groups
Click on the Options box to get descriptive statistics and quartiles
Click Continue
Click OK
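
Equivalent syntax, assuming a hypothetical test variable named score and a grouping
variable named group coded 1 and 2:

    * Mann-Whitney U test of score by group.
    NPAR TESTS
      /M-W=score BY group(1 2)
      /STATISTICS=DESCRIPTIVES QUARTILES.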

Wilcoxon Signed-rank Test


Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples
Select the 2 variables of interest and transfer them to the box labelled Test Pair(s) List
(you can select more than 1 pair)
Leave the Wilcoxon test selected under Test Type
Click on the Options box to get descriptive statistics and quartiles
Click Continue
Click OK
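
Equivalent syntax, assuming two hypothetical paired variables named before and after:

    * Wilcoxon signed-rank test comparing before with after.
    NPAR TESTS
      /WILCOXON=before WITH after (PAIRED)
      /STATISTICS=DESCRIPTIVES QUARTILES.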

The Kruskal-Wallis Test


Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples
Select the dependent variable of interest and transfer it to the box labelled Test
Variable List
Select the independent or grouping variable and transfer it to the box labelled
Grouping Variable
Click on the Define Range box and type in the minimum and maximum numeric codes that define
the groups
Click Options to obtain descriptive statistics and quartiles
Click Continue
Click OK
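
Equivalent syntax, assuming a hypothetical test variable named score and a grouping
variable named group whose codes range from 1 to 3:

    * Kruskal-Wallis test of score across groups coded 1 to 3.
    NPAR TESTS
      /K-W=score BY group(1 3)
      /STATISTICS=DESCRIPTIVES QUARTILES.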

Friedman’s ANOVA
Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples
Select the dependent variables of interest and transfer them to the box labelled Test
Variables
Select Statistics to obtain descriptives and quartiles
Click Continue
Click OK
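
Equivalent syntax, assuming three hypothetical related measurements named time1, time2 and
time3:

    * Friedman test across time1, time2 and time3.
    NPAR TESTS
      /FRIEDMAN=time1 time2 time3
      /STATISTICS=DESCRIPTIVES QUARTILES.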
