Chapter - I 1. Introduction To SPSS: Software Package Interactive Batched Statistical Analysis SPSS Inc. IBM
1. INTRODUCTION TO SPSS
1.1. OVERVIEW OF SPSS:
SPSS is a widely used program for statistical analysis in social science. It is also used by market
researchers, health researchers, survey companies, government, education researchers, marketing
organizations, data miners, and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been
described as one of "sociology's most influential books" for allowing ordinary researchers to do their own
statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping,
creating derived data) and data documentation are features of the base software.
SPSS (the Statistical Package for the Social Sciences) was developed by three students at Stanford
University (Norman H. Nie, C. Hadlai (Tex) Hull and Dale H. Bent). After graduation, Nie moved to the
University of Chicago, where he was joined by Hull (National Opinion Research Center). Initially not
meant for distribution outside their home university, SPSS became widely known and used after the
publication of the first manual. It was initially developed for IBM mainframe computers; versions for most
other important mainframe brands (Univac, CDC, Honeywell, ...) and later for so-called minicomputers
followed. SPSS Inc. was founded in 1975. In 2009 IBM acquired SPSS; it is now fully integrated into the IBM
Corporation Business Analytics Software portfolio.
The software was released in its first version in 1968 as the Statistical Package for the Social
Sciences (SPSS) after being developed by Norman H. Nie, Dale H. Bent, and C. Hadlai Hull. Those
principals incorporated as SPSS Inc. in 1975. Early versions of SPSS Statistics were written in Fortran and
designed for batch processing on mainframes, including for example IBM and ICL versions, originally
using punched cards for data and program input. A processing run read a command file of SPSS commands
and either a raw input file of fixed format data with a single record type, or a 'getfile' of data saved by a
previous run.
To save precious computer time an 'edit' run could be done to check command syntax without
analysing the data. From version 10 (SPSS-X) in 1983, data files could contain multiple record types. Prior
to SPSS 16.0, different versions of SPSS were available for Windows, Mac OS X and UNIX. SPSS
Statistics version 13.0 for Mac OS X was not compatible with Intel-based Macintosh computers, due to
the Rosetta emulation software causing errors in calculations. SPSS Statistics 15.0 for Windows needed a
downloadable hotfix to be installed in order to be compatible with Windows Vista.
From version 16.0 the same version runs under Windows, Mac, and Linux. The graphical user
interface is written in Java. The Mac OS version is provided as a Universal binary, making it fully
compatible with both PowerPC and Intel-based Mac hardware.
SPSS offers four programs that assist researchers with their complex data analysis needs.
Statistics Program: It furnishes a plethora of basic statistical functions like frequencies and cross
tabulation.
Modeler Program: It enables researchers to build and validate predictive models using advanced
statistical procedures.
Text Analytics for Surveys Program: It helps survey administrators uncover powerful insights.
Visualization Designer: It allows researchers to use their data to create a wide variety of visuals
like density charts and radial box plots very easily.
SPSS is an extremely powerful tool for manipulating and deciphering survey data.
It makes the process of pulling, manipulating and analyzing data clean and easy.
1.4. LIMITATIONS OF SPSS:
The major limitation of SPSS is that it cannot analyze very large data sets. In fields such as insurance,
where researchers often work with large data sets, they generally use SAS or R instead of SPSS to
analyze the data.
2. OPENING OF SPSS
Fig.1.1: OPENING OF SPSS
3. DETAILS OF MENU
STEPS TO OPEN FILE MENU:
Fig.1.2: FILE MENU- NEW OPTION
Fig.1.3: FILE MENU- OPEN OPTION
4. DETAILS OF VIEW
5. DETAILS OF EDIT
screenshot
6. PREPARATIONS OF QUESTIONNAIRE
Although questionnaires are often designed for statistical analysis of the responses, this is not
always the case.
Questionnaires have advantages over some other types of surveys in that they are cheap, do not
require as much effort from the questioner as verbal or telephone surveys, and often have standardized
answers that make it simple to compile data. However, such standardized answers may frustrate users.
Questionnaires are also sharply limited by the fact that respondents must be able to read the questions and
respond to them. Thus, for some demographic groups conducting a survey by questionnaire may not be
feasible.
A questionnaire should deal with an important or significant topic to create interest among respondents.
It should seek only data that cannot be obtained from other sources.
It should be attractive.
It should be presented in good psychological order, proceeding from general to more specific
responses.
Putting two questions in one should be avoided. Every question should seek to obtain
only one specific piece of information.
It should be designed to collect information which can be used subsequently as data for analysis.
Fig.1.6: QUESTIONNAIRE
7. DATA COLLECTION
Data collection is the process of gathering and measuring information on targeted variables in an
established system, which then enables one to answer relevant questions and evaluate outcomes. Data
collection is a component of research in all fields of study including physical and social
sciences, humanities, and business. While methods vary by discipline, the emphasis on ensuring accurate
and honest collection remains the same. The goal for all data collection is to capture quality evidence that
allows analysis to lead to the formulation of convincing and credible answers to the questions that have
been posed.
1) Primary Data refers to the data that the investigator collects for the very first time. This type of data
has not been collected before, either by this or by any other investigator. Primary data provide the
investigator with the most reliable first-hand information about the respondents. The investigator has
a clear idea about the terminologies used, the statistical units employed, the research methodology and
the size of the sample. Primary data may be either internal or external to the organization.
2) Secondary Data refers to the data that the investigator collects from another source. Past investigators
or agents collected the data for their own study, so the investigator is not the first researcher or
statistician to collect this data. Moreover, the investigator does not have a clear idea about the intricacies
of the data. There may be ambiguity in terms of the sample size and sampling technique. There may also be
unreliability with respect to the accuracy of the data.
appropriate information for the research. Thus, this method of data collection ensures first-hand
information because the interviewers can cross-question for the right and appropriate information.
c) Mailed Questionnaire
Consists of mailing a set or series of questions related to the research. The respondent answers the
questionnaire and forwards it back to the investigator after marking his/her responses. This method of
collection of data has proven to be time-saving. It is also a very cost-efficient manner of collecting the
required data. An investigator who has the access to the internet and an email account can undertake this
method of data collection. The researcher can only investigate those respondents who also have access to
the internet and an email account. This remains the only major restriction of this method.
d) Schedules
Scheduling involves a face to face situation with the respondents. In this method of collecting data,
the interviewer questions the respondent according to the questions mentioned in a form. This form is
known as a schedule. A schedule differs from a questionnaire: a questionnaire is filled in personally by the
respondents, and the interviewer may or may not be physically present, whereas a schedule is filled in by
the enumerator or interviewer after asking the respondent for his/her answer to each question. In the
scheduling method of collecting data, the interviewer or enumerator is always physically present.
e) Local agencies
In this method, the information is not collected directly or indirectly by either the interviewer or the
enumerator. Instead, the interviewer hires or employs a local agency to work for him/her and help in
gathering appropriate information. These local agents are often known as correspondents as well.
Correspondents are only responsible for gathering accurate and reliable information. They work according
to their preference and adopt different methods to do so.
a) Published Sources
There are many national organizations, international agencies and official publications that collect various
statistical data. They collect data related to business, commerce, trade, prices, economy, productions,
services, industries, currency and foreign affairs. They also collect information related to various (internal
and external) socio-economic phenomena and publish them. These publications contain statistical reports of
various kinds. Central Government Official Publication, Publications of Research Institutions, Committee
Reports and International Publications are some published sources of secondary data.
b) Unpublished Sources
Some statistical data are not part of any publication. Such data are stored by institutions and private
firms. Researchers often make use of these unpublished data to make their research all the more
original.
FIG: 1.7: DATA VIEW
CHAPTER -II
ANALYSIS OF DATA
Analysis of data plays an important role in the fulfillment of research objectives. Data is
summarized and observed to find patterns or relationships. Data is analyzed using various statistical
techniques that require substantive theoretical as well as practical knowledge. A researcher should first
acquire this knowledge and only then proceed to analyze the real data collected. The
techniques vary depending on the nature of the research (qualitative/quantitative study). This step of
the research process also includes the interpretation of findings and writing down the results and
conclusions.
1. Frequency Distribution.
3. Arithmetic Mean.
4. Median.
5. Mode.
6. T-test.
2.1. FREQUENCY DISTRIBUTION
Frequency distribution is a method of displaying the frequency (the number of times a particular value
of a variable repeats in the data) of the different values of a categorical/nominal variable in a dataset. It
represents the counts of all outcomes of a variable in a sample. The frequency distribution of a variable
can be represented in tabular as well as graphical forms (bar charts, pie charts, etc.).
Frequency distribution is a very common and important method for analyzing the nominal
(categorical) and ordinal (ranking) variables in a dataset.
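As a rough illustration outside SPSS, the same kind of frequency table can be sketched in Python. The data below are rebuilt from the gender counts reported in Table 2.1 (14 male, 11 female); the function name frequency_table is hypothetical and chosen only for this sketch.

```python
from collections import Counter

# Responses rebuilt from Table 2.1: 14 male and 11 female respondents.
responses = ["Male"] * 14 + ["Female"] * 11


def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


table = frequency_table(responses)
for category, count, percent in table:
    # One row per category: label, number of respondents, percentage.
    print(f"{category:<8}{count:>4}{percent:>8.1f}")
```

The percentages produced this way should agree with the "Percent" column that the SPSS Frequencies procedure reports for the same variable.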
Step 2: Next, transfer the variable ‘Education’ to the ‘Variable(s)’ window and click ‘Charts’ as shown in
figure 5.3.
Step 3: Next, select the type of chart (e.g. bar charts) as shown in figure 5.4.
Step 4: Finally, click ‘Continue’ and then ‘OK’. The final SPSS output in tabular form is shown in table 5.4.
Education
TABLE NO.-2.1 GENDER OF THE RESPONDENTS
S.NO.   GENDER   NO. OF RESPONDENTS   PERCENTAGE
1.      MALE     14                   56.0
2.      FEMALE   11                   44.0
        TOTAL    25                   100.0
INTERPRETATION:
In the above chart, 56% are male respondents and 44% are female.
2. Satisfied 7 28.0
3. Neutral 3 12.0
TOTAL 25 100.0
INTERPRETATION:
In the above chart 40% respondents are highly satisfied, 28% are satisfied, 12% are neutral, 12% are not
satisfied, 8% are highly not satisfied.
TOTAL 25 100.0
CHART NO.-2.3
INTERPRETATION:
RESPONDENTS
2. Favourable 4 16.0
3. Neutral 8 32.0
TOTAL 25 100.0
INTERPRETATION :
In the above chart, 20% respondents are highly favourable, 16% are favourable, 32% are neutral, 20% are
not favourable, 12% are highly not favourable regarding brand image.
TABLE NO.-2.5 ADVERTISING IN MARKET
2. Important 8 32.0
TOTAL 25 100.0
INTERPRETATION:
From the above chart, 20% of respondents think advertising is very important, 32% think it is important,
16% think it is less important and 32% think it is not important.
S.NO.   MEDIA        NO. OF RESPONDENTS   PERCENTAGE
1.      Television   5                    20.0
2.      Newspaper    7                    28.0
3.      Internet     6                    24.0
4.      Hoardings    7                    28.0
        TOTAL        25                   100.0
CHART NO.2.6
INTERPRETATION:
From the above chart, impact of advertising is 20% from television, 28% from newspaper, 24% from
internet, 28% from hoardings.
S.NO.   AWARENESS        NO. OF RESPONDENTS   PERCENTAGE
1.      Newspaper        9                    36.0
2.      Friends          8                    32.0
3.      Advertisements   5                    20.0
4.      Relatives        3                    12.0
        TOTAL            25                   100.0
CHART NO.-2.7
INTERPRETATION:
From the above chart, awareness of Aachi Masala came 36% from newspapers, 32% from friends, 20% from
advertisements and 12% from relatives.
S.NO. ACTIVITIES NO.OF PERCENTAGE
RESPONDENTS
1. Offers 2 8.0
2. Discounts 11 44.0
3. Coupons 7 28.0
TOTAL 25 100.0
INTERPRETATION:
From the above chart, we conclude that promotional activities which attract the respondents most are offers
8%, discounts 44%, coupons 28%, buy one get one 20% .
RESPONDENTS
1. Television 5 20.0
2. Newspaper 7 28.0
3. Internet 6 24.0
4. Hoardings 7 28.0
TOTAL 25 100.0
INTERPRETATION:
From the above chart, the activities which created long term impact on respondents are 20% from
television, 28% from newspaper, 24% from internet, 28% from hoardings.
1. Tele-marketing 6 24.0
2. E-mail 8 32.0
4. Others 3 12.0
TOTAL 25 100.0
CHART NO.-2.10
INTERPRETATION:
From the above chart, tele-marketing accounts for 24%, E-mail for 32%, personal selling for 32% and
others for 12%.
We conclude that the media most effective for direct marketing are E-mail and personal selling.
S.NO. COMPANY NO. OF PERCENTAGE
RESPONDENTS
TOTAL 25 100.0
INTERPRETATION:
From the above chart, Aachi Masala has 24%, Sakthi Masala 44%, MDH Masala 8% and
Everest Masala 20%.
The companies that received the most publicity among respondents are Sakthi Masala and Aachi Masala.
Cross tabulation is one of the most popular methods of representing the joint frequency distribution of the
cases of two or more nominal variables in the dataset. For example, in the dataset given in the previous
section, the cross tabulation of the variables “gender” and “religion” can be analyzed as given below:
The chi-square test is one of the most popular non-parametric tests. It is used in two cases, which are as follows:
The chi-square test compares the actual observed frequencies with the calculated expected
frequencies of different combinations of nominal variables. The difference between observed and expected
frequencies indicates a possible association between categorical variables. The chi-squared statistic
compares the observed count in each table cell to the count that would be expected under the assumption
of no association between the row and column classifications.
Question:-
Fig.2.4: OPENING OF CROSSTAB
Step 2: Transfer ‘Educational background’ to the ‘Row(s)’ window and ‘Familiarity with the internet’ to
the ‘Column(s)’ window. Click ‘Statistics’.
Step 3: Click ‘Continue’.
Step 4: Click ‘Cells’ and select ‘Observed’ and ‘Expected’. Click ‘Continue’.
Fig.2.7: COMMAND FOR CROSSTAB
Step 5: Finally, select ‘OK’. The chi-square test results will appear.
                                                Familiarity
S.No  Education Background                   Low           Medium        High          Total
                                             Familiarity   Familiarity   Familiarity
1.    Humanities            Count            1             5             7             13
                            Expected Count   4.4           4.7           3.9           13.0
2.    Management            Count            6             4             4             14
                            Expected Count   4.8           5.0           4.2           14.0
3.    Technology            Count            5             8             3             16
                            Expected Count   5.4           5.8           4.8           16.0
4.    IT                    Count            5             1             1             7
                            Expected Count   2.4           2.5           2.1           7.0
      Total                 Count            17            18            15            50
                            Expected Count   17.0          18.0          15.0          50.0

Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square             11.638 (a)   6    .071
Likelihood Ratio               12.101       6    .060
Linear-by-Linear Association   7.034        1    .008
N of Valid Cases               50
a. 9 cells (75.0%) have expected count less than 5. The minimum expected count is 2.10.
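To make the link between the observed counts, the expected counts and the Pearson chi-square value concrete, the calculation can be sketched in plain Python. This is only an illustrative sketch, not SPSS's own implementation: the function name pearson_chi_square is hypothetical, the observed counts are copied from the crosstab above, and the significance value is not computed because it needs the chi-square distribution.

```python
# Observed counts from the crosstab above (rows: Humanities, Management,
# Technology, IT; columns: low, medium, high familiarity).
observed = [
    [1, 5, 7],
    [6, 4, 4],
    [5, 8, 3],
    [5, 1, 1],
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under no association: row total * column total / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


chi2, df, expected = pearson_chi_square(observed)
# Number of cells whose expected count is below 5 (see footnote a).
small_cells = sum(1 for row in expected for e in row if e < 5)
print(f"chi-square = {chi2:.3f}, df = {df}, cells with expected count < 5: {small_cells}")
```

The statistic, the degrees of freedom and the count of small expected cells obtained this way can be checked against the Pearson Chi-Square row and footnote a of the SPSS output.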
Arithmetic Mean: The arithmetic mean is commonly known as the average. The average of a given set of
numbers is called the arithmetic mean, or simply the mean, of the given numbers.
2.4. MEDIAN
The middle number; found by ordering all data points and picking out the one in the middle (or if there are
two middle numbers, taking the mean of those two numbers).
Example: The median of 4, 1, and 7 is 4 because when the numbers are put in order (1, 4, 7), the
number 4 is in the middle.
2.5. MODE
The most frequent number—that is, the number that occurs the highest number of times.
STEPS:
Step 1: Click ‘Analyze’ > ‘Descriptive Statistics’ > ‘Frequencies’.
Step 2: Next, transfer the variable to the ‘Variable(s)’ window and click ‘Statistics’.
Step 3: Select the options ‘Mean’, ‘Median’, ‘Mode’ and ‘Quartiles’. Next click ‘Continue’ and then
‘OK’.
Statistics
education background
N Valid 50
Missing 1
Mean 2.3400
Median 2.0000
Mode 3.00
Percentiles 25 1.0000
50 2.0000
75 3.0000
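The figures in this output can be cross-checked with Python's standard statistics module. The sketch below rests on an assumption not stated in the output: that education background is coded 1 = Humanities, 2 = Management, 3 = Technology and 4 = IT, with the group sizes taken from the row totals of the earlier crosstab (13, 14, 16 and 7). Python's default quartile method is also not guaranteed to match SPSS's on other data.

```python
import statistics

# Assumption: codes 1..4 follow the order of the crosstab rows
# (Humanities, Management, Technology, IT); sizes are its row totals.
group_sizes = {1: 13, 2: 14, 3: 16, 4: 7}
education = [code for code, n in group_sizes.items() for _ in range(n)]

mean = statistics.mean(education)      # sum of values / number of values
median = statistics.median(education)  # middle value of the ordered data
mode = statistics.mode(education)      # most frequent value
quartiles = statistics.quantiles(education, n=4)  # 25th, 50th, 75th percentiles

print(f"N = {len(education)}, mean = {mean:.4f}, median = {median}, mode = {mode}")
print("quartiles:", quartiles)
```

Under that coding assumption, the mean, median, mode and quartiles should agree with the SPSS values listed above for the 50 valid cases.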
2.6. One - Sample T-Test
In many situations, we come across claims made by marketers about their products. For example, a car
manufacturer may claim that the average mileage of a car is, say, 19.9 kmpl, or a business school may
claim that the average package offered to its students is Rs. 12 lakh per annum. A researcher may be
interested in testing the truthfulness of these claims. For this analysis, the researcher needs to randomly
pick a small sample from the population and compare its mean with the claimed population mean. The sample
mean and the population mean may be different from each other. In order to test whether this difference is
statistically significant, we should apply the one-sample t-test.
“H0: there is no significant difference between sample mean and population mean.”
Step 2: Next, transfer the test variable ‘weight_lost’ to the ‘Test variable(s)’ window and click ‘OK’ as shown in
figure 7.2:
One-Sample Statistics
                                           N    Mean     Std. Deviation   Std. Error Mean
how satisfied are you with aachi masala?   25   2.2000   1.32288          .26458

One-Sample Test (Test Value = 0)
                                                                                          95% Confidence Interval
                                                                                          of the Difference
                                           t       df   Sig. (2-tailed)   Mean Difference   Lower    Upper
how satisfied are you with aachi masala?   8.315   24   .000              2.20000           1.6539   2.7461
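The t value and confidence interval in this output follow from the sample mean and standard error, which can be sketched in plain Python. Two assumptions are made here and are not taken from the SPSS output itself: satisfaction is coded from 1 (highly satisfied) to 5 (highly not satisfied), and the individual responses are rebuilt from the percentages reported in the satisfaction interpretation (40%, 28%, 12%, 12% and 8% of 25 respondents). The function name one_sample_t is hypothetical, and the critical value is hard-coded from a standard t table.

```python
import math
import statistics

# Assumption: codes 1..5 run from "highly satisfied" to "highly not satisfied";
# counts are rebuilt from the reported percentages of 25 respondents.
counts = {1: 10, 2: 7, 3: 3, 4: 3, 5: 2}
scores = [code for code, n in counts.items() for _ in range(n)]


def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom, mean difference, std. error)."""
    n = len(sample)
    mean_diff = statistics.mean(sample) - test_value
    std_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return mean_diff / std_error, n - 1, mean_diff, std_error


t, df, mean_diff, std_error = one_sample_t(scores, test_value=0)

# Two-sided 95% critical value of Student's t for 24 degrees of freedom,
# taken from a standard t table to keep the sketch free of extra libraries.
T_CRIT_24 = 2.064
lower = mean_diff - T_CRIT_24 * std_error
upper = mean_diff + T_CRIT_24 * std_error

print(f"t = {t:.3f}, df = {df}, mean difference = {mean_diff:.4f}")
print(f"95% CI of the difference: {lower:.4f} to {upper:.4f}")
```

Note that the test value here is 0, as in the SPSS output; to test a claimed population mean, the claimed value would be passed as test_value instead.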