Data Preparation & Analysis
Analysis with SPSS
Data Preparation Process
Check Questionnaire
Edit
Code
Transcribe
Clean Data
Data Analysis
Questionnaire Checking
A questionnaire returned from the field may be
unacceptable for several reasons.
Editing
Coding Questions
• Fixed field codes, which mean that the number of records for each
respondent is the same and the same data appear in the same
column(s) for all respondents, are highly desirable.
• Only a few (10% or less) of the responses should fall into the “other”
category.
• column number
• record number
• variable number
• variable name
• question number
• instructions for coding
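The codebook fields listed above, and the guideline that only about 10% of responses should fall into "other", can be sketched in Python. This is a minimal illustration, not the slides' own procedure: the entry values, the variable name `brand_pref`, the code 9 for "other", and the response list are all hypothetical.

```python
from collections import Counter

# Hypothetical codebook entry: the keys mirror the fields listed above,
# the values are invented for illustration only.
codebook_entry = {
    "column_number": 12,
    "record_number": 1,
    "variable_number": 7,
    "variable_name": "brand_pref",
    "question_number": "Q5",
    "coding_instructions": "1 = Brand A, 2 = Brand B, 3 = Brand C, 9 = Other",
}


def other_share(responses, other_code=9):
    """Return the fraction of coded responses that carry the 'other' code."""
    counts = Counter(responses)
    return counts[other_code] / len(responses)


# Hypothetical coded answers to one question (20 respondents).
responses = [1, 2, 1, 3, 9, 2, 1, 1, 3, 2, 1, 2, 3, 1, 2, 1, 3, 2, 1, 3]
share = other_share(responses)
print(f"'Other' share: {share:.0%}")
if share > 0.10:
    print("Consider adding explicit codes for frequent 'other' answers.")
```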
Data Cleaning
Consistency Checks
Dependence Technique
Interdependence Technique
               Ho (True)           Ho (False)
Reject Ho      Type I Error (α)    Correct Decision (1 - β)
Type I error example: The machine is working accurately, but it is assumed to be working erroneously; hence filling will be stopped and a mechanic is called.
23 April 2018
Hypothesis Testing
• Level of Significance (α): The risk a researcher is willing to take of rejecting the null hypothesis when it is true. It is the probability of making a Type I error. The higher the significance level, the higher the probability of rejecting a null hypothesis that is true.
• Critical Region: The rejection region. If the value of the test statistic (for example, the sample mean) falls within this region, the null hypothesis is rejected.
• Critical Value: The value of the test statistic beyond which the null hypothesis is rejected.
• Power of Test (1 - β): The ability of a test to reject a false null hypothesis, i.e. the probability of supporting an alternative hypothesis that is true. A value of 1 - β near 1 means the test almost always rejects the null hypothesis when it is false.
• One-Tailed Test: The null hypothesis is rejected only for values of the test statistic falling into one specified tail of its sampling distribution.
• Two-Tailed Test: The null hypothesis is rejected for values of the test statistic falling into either tail of its sampling distribution, so a large enough deviation in either direction leads to rejection. Normally α is split into α/2 in each tail.
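The link between α, the critical value, and one-tailed versus two-tailed tests can be sketched for a z test, using only Python's standard library. The function name `critical_values` and the choice of a standard-normal test statistic are assumptions made for illustration; a t test would use the t distribution instead.

```python
from statistics import NormalDist


def critical_values(alpha, tails="two"):
    """Return the standard-normal critical value(s) for a z test.

    tails="two"   -> (-c, +c), with alpha/2 in each tail
    tails="left"  -> (c,), with all of alpha in the lower tail
    tails="right" -> (c,), with all of alpha in the upper tail
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tails == "two":
        cut = z.inv_cdf(1 - alpha / 2)
        return (-cut, cut)
    if tails == "left":
        return (z.inv_cdf(alpha),)
    if tails == "right":
        return (z.inv_cdf(1 - alpha),)
    raise ValueError("tails must be 'two', 'left', or 'right'")


for alpha in (0.01, 0.05, 0.10):
    two = critical_values(alpha, "two")
    right = critical_values(alpha, "right")
    print(f"alpha={alpha}: two-tailed ±{two[1]:.3f}, right-tailed {right[0]:.3f}")
```

At α = 0.05 this gives the familiar cut-offs of about ±1.96 (two-tailed) and 1.645 (one-tailed).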
One Tailed & Two Tailed Test
• A light bulb manufacturer wants to produce bulbs with a mean life of 1000 hours. If the lifetime is shorter, he will lose customers to competitors; if the lifetime is longer, he will incur a very high production cost because the filaments will be very thick. Determine the type of test.
• A wholesaler buys bulbs in large lots and does not want to accept bulbs unless their mean life is at least 1000 hours. Determine the type of test.
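One way to read the two cases: the manufacturer cares about deviations in both directions (Ho: μ = 1000 against Ha: μ ≠ 1000, two-tailed), while the wholesaler only cares about bulbs that fall short (Ho: μ ≥ 1000 against Ha: μ < 1000, one-tailed, lower tail). The sketch below applies both decision rules to the same sample. It assumes a z test with a known population standard deviation, and the sample figures (mean 982, σ = 60, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesised mean) / standard error."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def reject_two_tailed(z, alpha):
    """Manufacturer's rule: reject Ho if z is extreme in either tail."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def reject_left_tailed(z, alpha):
    """Wholesaler's rule: reject Ho only if z is far enough into the lower tail."""
    return z < NormalDist().inv_cdf(alpha)


# Hypothetical sample: 36 bulbs, mean life 982 hours, known sigma of 60 hours.
z = z_statistic(sample_mean=982, mu0=1000, sigma=60, n=36)
print(f"z = {z:.2f}")
print("Two-tailed test rejects Ho: ", reject_two_tailed(z, alpha=0.05))
print("Left-tailed test rejects Ho:", reject_left_tailed(z, alpha=0.05))
```

With these hypothetical numbers the same sample is rejected by the one-tailed rule but not by the two-tailed rule, because the one-tailed test places all of α in the lower tail.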
One Tailed Test
Two Tailed Test
Univariate
Data
Analysis
t-tests (Cases)
One-sample t-test: To test whether the mean of a distribution differs significantly from some preset value.
For the given marks.sav file, find out whether the final marks scored by students differ significantly from the Professor's goal of a class average of 60. State the hypotheses and test them.
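The calculation behind the one-sample t-test can be sketched as follows, with Ho: μ = 60 and Ha: μ ≠ 60. The contents of marks.sav are not shown here, so the `marks` list is hypothetical, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom at a two-tailed α of 0.05.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for Ho: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical final marks for ten students (not the real marks.sav data).
marks = [62, 58, 71, 65, 55, 60, 68, 59, 64, 66]
t, df = one_sample_t(marks, mu0=60)

T_CRITICAL_DF9 = 2.262  # t-table value: two-tailed, alpha = 0.05, df = 9
print(f"t = {t:.3f}, df = {df}")
print("Reject Ho" if abs(t) > T_CRITICAL_DF9 else "Fail to reject Ho")
```

For a different sample size the critical value has to be looked up again for the new degrees of freedom.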
t-tests (Cases)
Independent-samples t-test: To test whether the means of two independent samples differ significantly from each other.
Suppose 15 customers of our brand in Mumbai and 15 in Delhi are asked to rate our brand on a 7-point scale, where 1 = most disliked and 7 = most liked.
The ratings by these 30 customers from the two cities are mentioned next.
Develop a hypothesis to test whether the ratings from the two cities differ, and test it.
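A minimal sketch of the independent-samples calculation, with Ho: μ(Mumbai) = μ(Delhi) and Ha: the two means differ. It assumes equal variances in the two cities (the pooled-variance form of the test). The ratings below are hypothetical stand-ins, not the data from the slides, and 2.048 is the standard t-table value for 28 degrees of freedom at a two-tailed α of 0.05.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical 7-point ratings from 15 customers in each city.
mumbai = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6, 4, 5, 6]
delhi = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3, 4, 5, 3, 4, 4]

t, df = independent_t(mumbai, delhi)
T_CRITICAL_DF28 = 2.048  # t-table value: two-tailed, alpha = 0.05, df = 28
print(f"t = {t:.3f}, df = {df}")
print("Reject Ho" if abs(t) > T_CRITICAL_DF28 else "Fail to reject Ho")
```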
t-tests (Cases)
Paired-samples t-test: To test whether two measurements on the same sample differ significantly.
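The paired test reduces to a one-sample t-test on the differences between the two measurements, with Ho: mean difference = 0. A minimal sketch, using hypothetical before and after scores for eight respondents:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for Ho: the mean of (second - first) is zero."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical scores for the same eight respondents, measured twice.
before = [12, 15, 11, 14, 13, 16, 10, 12]
after = [14, 16, 13, 15, 15, 17, 12, 13]

t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is compared with the t-table value for n - 1 degrees of freedom, exactly as in the one-sample case.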
If, however, we wish to see whether any of the five different ethnic groups’ scores
differ significantly from each other on the same quiz, it would require one way analysis
of variance to accomplish it.
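One-way analysis of variance compares the variation between group means with the variation within groups through the F ratio. The sketch below computes it from first principles. The five groups and their quiz scores are hypothetical, and the function name `one_way_anova_f` is chosen for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical quiz scores for five groups of four students each.
scores = [
    [7, 8, 6, 9],
    [5, 6, 7, 6],
    [8, 9, 9, 10],
    [4, 5, 6, 5],
    [7, 7, 8, 6],
]
f_ratio, df_b, df_w = one_way_anova_f(scores)
print(f"F({df_b}, {df_w}) = {f_ratio:.3f}")
```

A large F relative to the F-table value for (df_between, df_within) suggests that at least one group mean differs from the others.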
Two (Three) way ANOVA means: Exactly one dependent variable &
exactly two (three) independent variables
Education Background:
• B.Com. (1)
• B.E. (2)
• B.Sc. (3)
• B.B.A. (4)
• B.A. (5)
Grade Codes:
• A (1)
• B (2)
• C (3)
Ho: Graduation background of MBA students does not influence their performance in terms of grade.
Ha: Graduation background of MBA students influences their performance in terms of grade.
Correlation (r)
• Degree of association between two sets of quantitative data e.g. how crop
production is correlated with rainfall?
• r varies from -1 to +1; r=0 (no correlation); r= (+/-)1 (perfect correlation)
• File # REGRESSION.sav
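The correlation coefficient r can be computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. In this sketch the rainfall and production figures are hypothetical and are not taken from REGRESSION.sav.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical rainfall (cm) and crop production (quintals) for six regions.
rainfall = [60, 75, 90, 110, 125, 140]
production = [18, 22, 25, 31, 33, 38]

r = pearson_r(rainfall, production)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 indicate a strong linear association; values near 0 indicate little or no linear association.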
It is desired to study the effect that six different conditions (independent variables) have on yield per hectare for a crop of wheat. The research was conducted by accumulating data from fifteen major states in India.
The six independent variables are:
X1= Rainfall (in cm)
X2= Soil type (1, low quality to 5, high quality)
X3= Quantity of fertilizer (in quintals per sq. km of land)
X4= Percentage of land irrigated by the State Agri. Deptt.
X5= Seed quality (1, low quality to 5, high quality)
X6= Percentage of automation in the cultivation process
Dependent variable is Y= yield per hectare in quintals
Regression
We need to determine:
4. Regression Equation
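For the wheat-yield problem, the regression equation would take the usual multiple linear regression form shown below. The symbols b0 to b6 (intercept and coefficients, estimated by least squares) and e (residual) are notation assumed here, not taken from the slides.

```latex
Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5 + b_6 X_6 + e
```

Each coefficient b_i is read as the expected change in yield per hectare for a one-unit change in X_i, holding the other five variables constant.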