Advanced Biostatistics Using SPSS_Revised
• Solution
• Outcome variable
• Predictor variable
Example 2
• Global burden of non-communicable diseases (NCDs) and
risk factors: NCDs are by far the leading cause of
death in the Region, representing 30% of all annual
deaths.
• NCD risk factors include:
– Tobacco
– Harmful use of Alcohol
– Sedentary behavior and physical inactivity
– Obesity
– Unhealthy diet.
Schematic presentation
Speed and risk of car accident
[Figure: schematic linking Data, Information, Knowledge, Intelligence and Wisdom. Recoverable labels: captured/stored symbol; processed/analyzed; restructuring/mental processing; reconstructed picture; conceptualized; constructed; sound judgment; appropriate execution; understanding universal truth.]
Example 1
Example 2
Introduction to
SPSS for Windows
Introduction
• SPSS is a multi-purpose statistical package that helps you explore,
summarize and analyze datasets
• A dataset is a collection of several pieces of information called
variables (usually arranged in columns)
• A variable can have one or several values (information for one
or several cases)
• Other statistical packages include Stata, SAS and R
• SPSS is widely used in social science research and is the most used
statistical software on campus.
Data Management Software
• Several data management software packages are available
• The most common are Stata, SPSS, SAS and R
• They all have different features
Window Name
1. Data editor window
–It is the view where we see our data
– Column: a single variable’s values across all study subjects
– Row: a single study subject’s information
Displaying values
• You can display the values of your categorical variables
– as the numeric codes entered (e.g. 1’s and 2’s for gender), or
• A variable is defined by writing its name, type,
label and values
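The layout described above (rows as study subjects, columns as variables, numeric codes with optional value labels) can be sketched in a few lines of Python. This is only an illustration outside SPSS: the data, the helper names `display_value` and `column`, and the coding 1 = male, 2 = female are all assumptions, not taken from the slides.

```python
# Each row is one study subject (case); each key is one variable (column).
# All values below are made up for illustration.
dataset = [
    {"id": 1, "gender": 1, "age": 34},
    {"id": 2, "gender": 2, "age": 51},
    {"id": 3, "gender": 2, "age": 27},
]

# Value labels: numeric code -> text shown in place of the code.
# The coding 1 = male, 2 = female is an assumption for this sketch.
value_labels = {"gender": {1: "male", 2: "female"}}


def display_value(variable, code, show_labels=True):
    """Return the label for a coded value, or the raw code if labels are off."""
    if show_labels and variable in value_labels:
        return value_labels[variable].get(code, code)
    return code


def column(variable):
    """One variable's values across all study subjects."""
    return [row[variable] for row in dataset]


# Same column, shown first as numeric codes and then as labels.
print(column("gender"))
print([display_value("gender", code) for code in column("gender")])
```

Switching `show_labels` off returns the numeric codes as entered, which mirrors the choice between showing codes and showing labels.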
SPSS/Windows
b. Variable view
Here you simply observe
1. Names of variables
2. Type of variables
3. Width/ Decimals
education vs educationalstatus
educational status
3.14159265
Labeling value
• Write the value first
–Statistical results,
–Tables, and
–Charts
–Commands…… etc.
Program with
a command
• Once syntax is written, we can execute it.
SPSS Viewer
The second important feature of SPSS is its use of
pull-down menu items and tool bars.
[Figure: pull-down menu items]
The tool bar provides a quick, easy method of
accessing commonly required tasks.
[Figure: tool bar]
The pull-down menu and tool bar items change from one type of window to
another.
[Figure: different windows have different pull-down menu items and different tool bars]
Pull-down menu items important for discussion
when a person wants to use SPSS:
• Compare means
• Correlate
• Regression
• Scale
• Nonparametric
• Survival
Three Steps of Data Analysis
• Univariate analysis
– Step 1: Examine the distribution of each individual variable
• Bivariate analysis
– Step 2: Describe association between pairs of variables
(only two variables)
• Multivariate analysis
– Step 3: Use a statistical model called Regression (Linear or logistic) to
examine the relationship between multiple independent variables & a
dependent variable
– This is done to gain insight into causal relationships (cause & effect)
I. Univariate Analysis
1. Analyze > Descriptive Statistics > Frequencies
– Standard deviation
– Skewness
Skewness statistic: tells us how the distribution is skewed; the nearer it is to 0,
the more symmetric the distribution and the closer it is to a normal shape
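As a rough illustration of the statistics named above, the following Python sketch computes the mean, standard deviation and a simple moment-based skewness. The formula g1 = m3 / m2^1.5 is an assumption chosen for brevity; SPSS may report a bias-adjusted variant, so values can differ slightly. The data and function names are hypothetical.

```python
import statistics


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; 0 for perfectly symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def describe(values):
    """Summary statistics for one variable (univariate analysis)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "skewness": skewness(values),
    }


symmetric = [1, 2, 3, 4, 5]            # hypothetical data, skewness 0
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data, long right tail

print(describe(symmetric))
print(describe(right_skewed))
```

A positive value indicates a longer right tail and a negative value a longer left tail.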
[Figure: histograms with normal curves, frequency on the vertical axis. Verbal fluency - animal naming score: Mean = 15.5, Std. Dev = 5.63, N = 1441. Educational level: Mean = 2.9, Std. Dev = 1.36, N = 1366.]
The histogram and the curve show us how the data are distributed.
2. Testing for normality using Explore
Analyze > Descriptive Statistics > Explore > Plots
Under Explore
– Click ‘Plots’ and select “Normality plots with tests”
Result is found by
– Tests of Normality table: if the test is significant, the variable is not normally distributed
– Q-Q plot
[Figure: normal Q-Q plots, Expected Normal plotted against Observed Value]
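To make the idea behind a normal Q-Q plot concrete, here is a minimal Python sketch that pairs each sorted observation with the z-score expected under normality. The plotting position (i - 0.5) / n is an assumed convention and may not match the one SPSS uses; the function names and data are hypothetical.

```python
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with its expected standard normal z-score."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (assumed convention)
        points.append((observed, std_normal.inv_cdf(p)))
    return points


def pearson_r(points):
    """Correlation between observed values and expected normal z-scores."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical right-skewed scores; the points bend away from a straight line.
scores = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15]
for observed, expected in qq_points(scores):
    print(observed, round(expected, 2))
print("straightness (r):", round(pearson_r(qq_points(scores)), 3))
```

Points that fall close to a straight line suggest approximate normality; the correlation here is only an informal summary of straightness, not a formal test.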
OUTPUT
[Figure: boxplots of age in years and verbal fluency, N = 1441 each, with outlying cases marked by case number]
• Combination between:
1. Two qualitative variables
Analyze > Descriptive Statistics > Crosstabs
Under Crosstabs
– Put the dependent variable in “Column(s)” and the
independent variables in “Row(s)”.
– Click ‘Statistics’ and mark ‘Chi-square’ and
‘Risk’.
– Click ‘Cells’ and mark ‘Row’ under
Percentages.
Output
gender * depression diagnosis Crosstabulation

                                  depression diagnosis
gender                            non-case   depression case   Total
female   Count                    497        358               855
         % within gender          58.1%      41.9%             100.0%
male     Count                    420        160               580
         % within gender          72.4%      27.6%             100.0%
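The counts in the crosstabulation above are enough to reproduce a Pearson chi-square test and simple risk estimates by hand. The sketch below is a plain Python illustration, not SPSS output: the p-value shortcut holds only for 1 degree of freedom (a 2x2 table), and the orientation of the ratios (female relative to male, for being a depression case) is an assumption.

```python
import math

# Counts from the gender * depression diagnosis crosstabulation.
table = {
    "female": {"non-case": 497, "case": 358},
    "male": {"non-case": 420, "case": 160},
}


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_totals.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


chi2 = pearson_chi_square(table)
# For 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Risk of being a depression case within each gender (row percentages).
risk_female = table["female"]["case"] / sum(table["female"].values())
risk_male = table["male"]["case"] / sum(table["male"].values())
risk_ratio = risk_female / risk_male
odds_ratio = (table["female"]["case"] * table["male"]["non-case"]) / (
    table["female"]["non-case"] * table["male"]["case"]
)

print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
print(f"risk ratio = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```

The row risks match the "% within gender" figures in the table, which is a quick check that the rows and columns were entered in the right orientation.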
• Recall: the Poisson distribution assumes that its mean and variance are
equal.
• However, count data in practice often violate this assumption as a result of
unobserved heterogeneity.
• Usually, the sample variance is higher than the sample mean, a
phenomenon referred to as overdispersion.
• Hence, there is a need for models with more parameters than the Poisson
distribution.
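A quick informal check for overdispersion is to compare the sample variance with the sample mean. The Python sketch below does this for made-up counts; the function name `dispersion_index` and the data are hypothetical, and a ratio well above 1 is only a hint, not a formal test.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical count data (for example, number of events per subject).
counts = [0, 0, 1, 1, 2, 3, 5, 8, 12, 0]

ratio = dispersion_index(counts)
print(f"mean = {mean(counts):.2f}, variance = {variance(counts):.2f}")
print(f"variance / mean = {ratio:.2f}")
if ratio > 1:
    print("Variance exceeds the mean: possible overdispersion.")
```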
Cox regression
3. Parametric method
Generalized Estimating Equations (GEE)
What are Generalized Estimating Equations (GEE)?
• An extension of the Generalized Linear Model (GZLM), which is itself an
extension of the General Linear Model (GLM)
– GLM analyzes models with normally distributed dependent variables (DVs) that are
linearly linked to predictors
– GZLM extends GLM to analyze non-normally distributed DVs
that may be non-linearly linked to predictors
   • Easily handles interactions between discrete and continuous
   independent variables (IVs)
   • Cannot analyze correlated, non-independent, clustered,
   nested, repeated measures, within-subjects data
– GEE extends GZLM and analyzes correlated data with
   • Normal and non-normal DVs
   • DVs that are linearly or non-linearly linked to IVs
How Can We Conduct a GEE?
Conducting a GEE: First Step
A. How data usually look
B. How data need to look for GEE
Getting from A to B: Restructuring Your Data
Restructuring Your Data
Restructured Data
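The screenshots for this step did not survive, so the following Python sketch rests on an assumption: that layout A is "wide" data (one row per subject, one column per repeated measurement) and layout B is "long" data (one row per subject per measurement occasion). The variable names, values and the helper `wide_to_long` are hypothetical.

```python
# A. Wide format: one row per subject, one column per repeated measurement.
wide = [
    {"id": 1, "group": "A", "score_t1": 10, "score_t2": 12, "score_t3": 15},
    {"id": 2, "group": "B", "score_t1": 8, "score_t2": 9, "score_t3": 9},
]


def wide_to_long(rows, id_vars, measure_vars, index_name="time", value_name="score"):
    """Turn each repeated-measure column into its own row, keeping the id variables."""
    long_rows = []
    for row in rows:
        for index, var in enumerate(measure_vars, start=1):
            new_row = {key: row[key] for key in id_vars}
            new_row[index_name] = index      # which occasion this row describes
            new_row[value_name] = row[var]   # the measurement at that occasion
            long_rows.append(new_row)
    return long_rows


# B. Long format: one row per subject per occasion.
long_data = wide_to_long(wide, ["id", "group"], ["score_t1", "score_t2", "score_t3"])
for row in long_data:
    print(row)
```

In the long layout the subject identifier repeats across rows, which is what lets a repeated-measures model group the correlated observations belonging to one subject.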
Conducting a GEE Analysis
Selecting the Model Type
• Dozens of model combinations with GEE
– DV can be discrete, follow any of several distributions, and be nonlinearly linked to IVs
• Must select the distribution of the DV and the link function
Response Variable
• Also known as outcome variable, DV
• Category order is for multinomial DVs
• For binary outcomes, can specify the reference category
Predictors
• Options for factors allow specification of the reference category and how to handle missing data
Model
• Full factorial is a few clicks away
Estimation and Statistics
EM Means
• Several options for controlling for family-wise error
• Several options for contrasts, including
– Simple
– Pairwise
– Deviation
– Difference
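Controlling family-wise error means adjusting p-values when several contrasts are tested at once. As a hedged illustration, the Python sketch below applies two common adjustments, Bonferroni and the sequential (Holm) Bonferroni; choosing these two is an assumption, and the p-values are made up rather than taken from any analysis in these slides.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Sequential Bonferroni: step through p-values from smallest to largest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values from three pairwise contrasts.
raw = [0.01, 0.04, 0.03]
print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
```

The sequential version is never more conservative than plain Bonferroni, which is why both are often offered side by side.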
Save, Export, and Cross Your Fingers
Thank You so Much all!!!