Data Preparation & Analysis

The document discusses various steps involved in data preparation and analysis using SPSS. It covers questionnaire checking, editing, coding, creating a codebook, data cleaning, and selecting appropriate univariate and multivariate statistical techniques based on the characteristics and properties of the data.

Uploaded by

Vardaan Bhaik

Data Preparation & Analysis with SPSS
Data Preparation Process
1. Check Questionnaire
2. Edit
3. Code
4. Transcribe
5. Clean Data
6. Data Analysis
Questionnaire Checking
A questionnaire returned from the field may be
unacceptable for several reasons.
Editing
Editing the questionnaires involves identifying illegible, incomplete, inconsistent or ambiguous responses.

Treatment of Unsatisfactory Results
– Returning to the Field
– Assigning Missing Values
– Discarding Unsatisfactory Respondents

Coding
Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of:
– the column position (field), which holds a single item of data, e.g. the sex of a respondent
– the data record, which groups related fields such as sex, marital status, age and income

Coding Questions

• Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable.

• If possible, standard codes should be used for missing data. Coding of structured questions is relatively simple, since the response options are predetermined.

• In questions that permit a large number of responses, each possible response option should be assigned a separate column.
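The coding rules above can be sketched in a few lines of Python. This is only an illustration: the question names, the response options, the brand list and the missing-data code 9 are all assumptions, not part of the original material.

```python
# Hypothetical codebook for two structured questions; the question names,
# response options and the missing-data code 9 are illustrative assumptions.
MISSING = 9

SEX_CODES = {"male": 1, "female": 2}
BRANDS = ["Brand A", "Brand B", "Brand C"]  # options of a multiple-response question


def code_respondent(sex, brands_used):
    """Return a fixed-field record: one column for sex, one 0/1 column per brand."""
    sex_code = SEX_CODES.get(sex.strip().lower(), MISSING) if sex else MISSING
    if brands_used is None:
        # Question skipped entirely: every brand column gets the missing code.
        brand_codes = [MISSING] * len(BRANDS)
    else:
        brand_codes = [1 if b in brands_used else 0 for b in BRANDS]
    return [sex_code] + brand_codes


print(code_respondent("Female", {"Brand A", "Brand C"}))
print(code_respondent("", None))
```

Every respondent yields a record of the same length, with the same item in the same position, which is the fixed-field property described above.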
Coding
Guidelines for coding unstructured questions:

• Category codes should be mutually exclusive and collectively exhaustive.

• Only a few (10% or less) of the responses should fall into the “other” category.

• Category codes should be assigned for critical issues even if no one has mentioned them.

• Data should be coded to retain as much detail as possible.

Coding: Codebook
A codebook contains coding instructions and the necessary information about variables in the data set. A codebook generally contains the following information:

• column number
• record number
• variable number
• variable name
• question number
• instructions for coding
Data Cleaning
Consistency Checks

Consistency checks identify data that are out of range, logically inconsistent, or have extreme values.
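A minimal sketch of such checks, assuming coded records held as Python dictionaries. The field names, the valid ranges and the car-ownership rule are hypothetical and serve only to show one range check and one logical check.

```python
# Hypothetical coded records; field names and valid ranges are assumptions
# made for illustration only.
VALID_RANGES = {"sex": (1, 2), "age": (18, 99), "rating": (1, 7)}

records = [
    {"id": 1, "sex": 1, "age": 34, "rating": 5, "owns_car": 0, "car_brand": 0},
    {"id": 2, "sex": 3, "age": 29, "rating": 6, "owns_car": 1, "car_brand": 2},
    {"id": 3, "sex": 2, "age": 41, "rating": 8, "owns_car": 0, "car_brand": 4},
]


def consistency_problems(record):
    """List out-of-range values and one logical inconsistency for a record."""
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        if not low <= record[field] <= high:
            problems.append(f"{field} out of range: {record[field]}")
    # Logical check: a respondent who owns no car cannot name a car brand.
    if record["owns_car"] == 0 and record["car_brand"] != 0:
        problems.append("car_brand given although owns_car is 0")
    return problems


for rec in records:
    print(rec["id"], consistency_problems(rec))
```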
Selecting a Data Analysis Strategy
The data analysis strategy is selected on the basis of:
– Earlier Steps of the Research Process
– Known Characteristics of the Data
– Properties of Statistical Techniques
– Background and Philosophy of the Researcher

• Metric Data- Data that are on an interval or ratio scale
• Non-metric Data- Data that are on a nominal or ordinal scale
• Univariate Techniques- Statistical techniques appropriate for analysing data when there is a single measurement of each element in the sample.
• Multivariate Techniques- Statistical techniques appropriate for analysing data when there are two or more measurements on each element in the sample. They examine simultaneous relationships among two or more phenomena.
– Dependence Techniques- Used when one or more of the variables can be identified as dependent variables & the remaining as independent variables.
– Interdependence Techniques- Techniques that attempt to group data based on underlying similarity. No distinction is made as to which variables are dependent or independent.
A Classification of Univariate Techniques
Univariate Techniques

• Metric Data
– One Sample: t test, Z test
– Two or More Samples, Independent: Two-Group t test, Z test, One-Way ANOVA
– Two or More Samples, Related: Paired t test

• Non-metric Data
– One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
– Two or More Samples, Independent: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
– Two or More Samples, Related: Sign, Wilcoxon, McNemar, Chi-Square
A Classification of Multivariate Techniques
Multivariate Techniques

• Dependence Technique
– One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, Conjoint Analysis
– More Than One Dependent Variable: Multivariate Analysis of Variance and Covariance, Canonical Correlation, Multiple Discriminant Analysis

• Interdependence Technique
– Variable Interdependence: Factor Analysis
– Interobject Similarity: Cluster Analysis, Multidimensional Scaling
Type I Error & Type II Error
• A Type I error (α) is the mistake of rejecting the null hypothesis when it is true.
• A Type II error (β) is the mistake of failing to reject the null hypothesis when it is false.

             Ho (True)                  Ho (False)
Accept Ho    Correct Decision (1- α)    Type II Error (β)
Reject Ho    Type I Error (α)           Correct Decision (1- β)

Example, taking Ho as "the filling machine is working accurately":
– Type I error: The machine is working accurately, but it is assumed to be working erroneously, and hence filling will be stopped & a mechanic is called.
– Type II error: The machine is working erroneously, but it is assumed to be working accurately, and hence it will fill wrongly, causing loss to the company & customers.

23 April 2018
Hypotheses Testing
• Level of Significance (α): The risk that a researcher is willing to take of rejecting the null hypothesis when it happens to be true. It is the probability of making a Type I error. The higher the significance level, the higher the probability of rejecting a null hypothesis when it is true.

• Critical Region: It is the rejection region. If the value of the test statistic (for example, one computed from the sample mean) falls within this region, the null hypothesis is rejected.

• Critical Value: The value of a test statistic beyond which the null hypothesis can be rejected.

• Power of Test (1- β): It is the ability of a test to reject a false null hypothesis, i.e. the probability of supporting an alternative hypothesis that is true. A high value of 1- β (near 1) means the test is working well: it is rejecting a null hypothesis when it is false.

• One-Tailed Test: The null hypothesis is rejected only for values of the test statistic falling into one specified tail of its sampling distribution.

• Two-Tailed Test: The null hypothesis is rejected for values of the test statistic falling into either tail of its sampling distribution. A deviation in either direction would reject the null hypothesis. Normally α is divided into α/2 on one side and α/2 on the other.
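To make the critical value and the α/2 split concrete, the sketch below computes critical values for a Z test using Python's standard library. The function name z_critical is an assumption made for illustration. For a t test the same idea applies, but the values come from the t distribution with the appropriate degrees of freedom.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Critical z value(s) for a standard normal test statistic."""
    z = NormalDist()  # mean 0, standard deviation 1
    if tails == 2:
        # alpha is split into alpha/2 in each tail.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    # One-tailed (upper tail shown; the lower-tail value is its negative).
    return (z.inv_cdf(1 - alpha),)


print([round(v, 3) for v in z_critical(0.05, tails=2)])
print([round(v, 3) for v in z_critical(0.05, tails=1)])
```

At α = 0.05 the two-tailed cut-offs lie further from zero than the one-tailed cut-off, because each tail holds only α/2.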
One Tailed & Two Tailed Test
• A manufacturer of a light bulb wants to produce bulbs with a mean life of 1000
hours. If the lifetime is shorter, he will lose customers to the competitors; if the
lifetime is longer, he will have a very high production cost because the filaments will
be very thick. Determine the type of test.

• The wholesaler buys bulbs in large lots & does not want to accept bulbs unless
their mean life is at least 1000 hours. Determine the type of test.
One Tailed Test
Two Tailed Test
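One possible way to write the hypotheses for the two bulb cases above, using μ for the mean bulb life in hours (the symbol is an assumption; the slides do not name it):

```latex
% Manufacturer: a deviation in either direction is costly, so the test is two-tailed.
\[ H_0: \mu = 1000 \qquad H_1: \mu \neq 1000 \]

% Wholesaler: only a mean life below 1000 hours leads to rejecting the lot,
% so the test is one-tailed (lower tail).
\[ H_0: \mu \geq 1000 \qquad H_1: \mu < 1000 \]
```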
Univariate Data Analysis
t-tests (Cases)
One sample t-test: To test if the mean of a distribution differs significantly from some preset value

For the given marks.sav file, find if the final marks scored by students differ significantly from the Professor’s goal of a class average of 60. Design a hypothesis & test it.
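Outside SPSS, the test statistic for this case can be sketched as follows. The marks listed are hypothetical placeholders, not the contents of marks.sav, and the function name is an assumption. The resulting t value is compared with the critical t value for n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n-1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical final marks, NOT the contents of marks.sav.
final_marks = [52, 61, 58, 70, 66, 49, 63, 57, 68, 66]
t, df = one_sample_t(final_marks, mu0=60)
print(f"t = {t:.3f}, df = {df}")
```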
t-tests (Cases)
Independent sample t-test: To test if the means of two independent samples differ significantly from each other

There are 15 customers of our brand each in Mumbai & Delhi, and they are asked to rate our brand on a 7 point scale (1 = most disliked, 7 = most liked).
The ratings by these 30 customers from the two cities are mentioned next. Develop a hypothesis to test if ratings by the two cities are different. Also test the hypothesis.
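A minimal sketch of the pooled-variance (equal variances assumed) version of this test. The ratings below are hypothetical, since the slide data are not reproduced here, and the function name is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance t statistic and df for H0: the two population means are equal."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance assumes both groups share a common population variance.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical 7-point ratings, 15 per city; not the data from the slides.
mumbai = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6, 4, 5, 6]
delhi = [4, 3, 5, 4, 6, 3, 4, 5, 4, 3, 5, 4, 4, 3, 5]
t, df = independent_t(mumbai, delhi)
print(f"t = {t:.3f}, df = {df}")
```

With 15 respondents per city the test has 15 + 15 - 2 = 28 degrees of freedom.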
t-tests (Cases)
Paired sample t-test: To test if two measurements on the same sample differ significantly

There are 18 customers of the Passion brand of garments. This set of customers is to be monitored for their attitude towards the Passion brand before and after the release of an advertising campaign. The attitude is to be measured on a 10 point scale (1 = highly disliked, 10 = highly liked).

The ratings by these 18 customers before and after the advertising campaign are mentioned next. Develop a hypothesis to test if these ratings by customers are different. Also test the hypothesis.
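The paired test works on the difference between each customer's two ratings. A minimal sketch, with hypothetical before/after ratings standing in for the slide data and an assumed function name:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and df for H0: the mean of the paired differences is zero."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical 10-point attitude ratings for 18 customers; not the slide data.
before = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4, 5, 7, 6, 5, 4, 6, 5, 5]
after = [6, 7, 5, 7, 6, 8, 4, 6, 6, 5, 7, 8, 6, 6, 5, 7, 6, 6]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```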
ANOVA
• Whereas t-tests compare only two distributions, analysis of variance is able to compare many. E.g. in the case of the MARKS file, if we want to see whether Quiz1 scores of men and women are different, i.e. who (men or women) scores higher in the quiz, a t-test is appropriate.

If, however, we wish to see whether any of the five different ethnic groups’ scores differ significantly from each other on the same quiz, it would require one-way analysis of variance to accomplish it.

One-way ANOVA means:

Exactly one dependent variable (continuous), e.g. quiz1 scores here
Exactly one independent variable (categorical), e.g. ethnicity here, with 5 levels

Two (three) way ANOVA means: Exactly one dependent variable & exactly two (three) independent variables

MANOVA: Multiple dependent variables & one or more independent variables

One-Way ANOVA
• File # MARKS.sav
Dependent variable – Quiz 1 scores
Independent variable – Ethnicity (with 5 levels)

– Ho: There is no difference among students with different ethnicities as far as the quiz1 marks scored by them are concerned.

– H1: There is a significant difference among students with different ethnicities as far as the quiz1 marks scored by them are concerned.
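The F statistic behind this test can be sketched as the ratio of between-group to within-group variability. The scores below are hypothetical and are not taken from MARKS.sav. The function name is an assumption.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical quiz1 scores for five groups; not the contents of MARKS.sav.
quiz1_by_group = [
    [7, 8, 6, 9],
    [5, 6, 7, 6],
    [8, 9, 9, 7],
    [6, 5, 7, 6],
    [7, 7, 8, 6],
]
f, df_b, df_w = one_way_anova(quiz1_by_group)
print(f"F({df_b}, {df_w}) = {f:.3f}")
```

A large F relative to the critical value for (k - 1, n - k) degrees of freedom leads to rejecting Ho.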
Chi-Square Test
Graduation background of MBA students & their performance in terms of grade is given below:

Education Background:
• B.Com. (1)
• B.E. (2)
• B.Sc. (3)
• B.B.A. (4)
• B.A. (5)

Grade Codes:
• A (1)
• B (2)
• C (3)

Ho: Graduation background of MBA students does not influence their performance in terms of grade.
Ha: Graduation background of MBA students influences their performance in terms of grade.
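A minimal sketch of the chi-square test of independence for a background-by-grade table. The counts are hypothetical, since the original data table is not reproduced here, and the function name is an assumption.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if background and grade were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical counts: rows are backgrounds 1-5, columns are grades A, B, C.
table = [
    [8, 12, 5],
    [10, 9, 6],
    [6, 11, 8],
    [7, 10, 8],
    [5, 9, 11],
]
stat, df = chi_square(table)
print(f"chi-square = {stat:.3f}, df = {df}")
```

With 5 backgrounds and 3 grades the test has (5 - 1) × (3 - 1) = 8 degrees of freedom.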
Correlation (r)
• Degree of association between two sets of quantitative data, e.g. how is crop production correlated with rainfall?
• r varies from -1 to +1; r=0 (no linear correlation); r= (+/-)1 (perfect correlation)

Bivariate Correlation: Correlation between two variables

• File # MARKS.sav
• To produce a correlation matrix of gender, gpa & final

Partial Correlation: Process of finding the correlation between two variables after the influence of other variables has been controlled for.
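Both ideas can be sketched with the standard formulas for Pearson's r and the first-order partial correlation (one control variable). The data values and the control variable "hours" are hypothetical and are not taken from MARKS.sav.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values; not the contents of MARKS.sav.
gpa = [2.8, 3.1, 3.6, 2.5, 3.9, 3.3]
final = [58, 64, 71, 52, 80, 66]
hours = [5, 6, 9, 4, 10, 7]  # assumed control variable, for illustration
print(round(pearson_r(gpa, final), 3))
print(round(partial_r(gpa, final, hours), 3))
```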
Regression
• Regression explains variation in one variable (dependent variable) based on the variation in one or more other variables (independent variables)
• Simple regression: one dependent & one independent variable
• Multiple regression: one dependent & more than one independent variable

• File # REGRESSION.sav
It is desired to study the effect that six different conditions (independent variables) have on yield per hectare for a crop of wheat. The research was conducted by accumulating data from fifteen major states in India.
The six independent variables are:
X1= Rainfall (in cm)
X2= Soil type (1, low quality to 5, high quality)
X3= Quantity of fertilizer (in quintals/ sq. km of land)
X4= Percentage of land being irrigated by State Agri. Deptt.
X5= Seed quality (1, low quality to 5, high quality)
X6= Percentage of automation in cultivation process
Dependent variable is Y= yield per hectare in quintals
Regression
We need to determine:

1. Is the model a good fit? From the ANOVA table (F-value)

2. What % of variation in the dependent variable is explained by the independent variables? From the Model Summary (Adjusted R square)

3. Which independent variables are good explanatory variables of the dependent variable? From the Coefficients table (t-values)

4. Regression Equation
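The link between R square, Adjusted R square and the overall F-value can be sketched with the standard formulas. Here n = 15 and k = 6 follow the wheat example above, while the R square value is a made-up placeholder rather than an actual SPSS output. The function names are assumptions.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R square for n observations and k independent variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


def overall_f(r_square, n, k):
    """F statistic of the overall regression, with (k, n - k - 1) degrees of freedom."""
    return (r_square / k) / ((1 - r_square) / (n - k - 1))


# n = 15 states and k = 6 independent variables come from the wheat example;
# the R square value is a made-up placeholder, not an SPSS result.
n, k, r_square = 15, 6, 0.90
print(round(adjusted_r_square(r_square, n, k), 3))
print(round(overall_f(r_square, n, k), 2))
```

With only 15 observations and 6 predictors, the adjustment is noticeable, which is why the Adjusted R square is the figure to read from the Model Summary.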
