Advanced Biostatistics Using SPSS_Revised

The document provides an introduction to advanced biostatistics using SPSS, detailing the concepts of variables, data, and information. It explains the functionality of SPSS, including its three main windows (Data Editor, Syntax Editor, and Viewer) and the types of analyses that can be conducted, such as univariate, bivariate, and multivariate analyses. Additionally, it emphasizes the importance of understanding data types and the process of converting raw data into meaningful information for decision-making.

Uploaded by bekele
Copyright © All Rights Reserved

Advanced Biostatistics Using SPSS

Bekele Belayihun, PhD in Biostatistics and Epidemiology


[email protected]
+2510968599931
General Introduction

• Introduction to SPSS with the concepts of statistics
References
• Bland M. An Introduction to Medical Statistics
• Colton T. Statistics in Medicine
• Daniel WW. Biostatistics: A Foundation for Analysis in the Health Sciences
• Kirkwood BR. Essentials of Medical Statistics
• Knapp RG, Miller MC. Clinical Epidemiology and Biostatistics. Baltimore: Williams and Wilkins; 1992
• Armitage P, Berry G. Statistical Methods in Medical Research
• Pagano M, Gauvreau K. Principles of Biostatistics


Introduction
It is good to know (or refresh) what data are, how they are turned into information, and why.
Variable, Data and Information
• Variable: a characteristic that takes different values
• Quantitative variables
– Discrete, e.g. number of children in a family, ...
– Continuous, e.g. weight, height, BP, VL, ...
• Qualitative variables
– Nominal, e.g. marital status, religion
– Ordinal, e.g. education status, patient satisfaction
Types of Variable
• Variables can be classified into different types
Example 1
In a study to determine whether surgery or chemotherapy
results in higher survival rates for a certain type of cancer,
whether or not the patient survived is one variable, and whether
they received surgery or chemotherapy is the other.
• Identify the outcome variable
• Identify the predictor variable

• Solution
• Outcome variable: whether or not the patient survived
• Predictor variable: the treatment received (surgery or chemotherapy)
Example 2
• Global burden of noncommunicable diseases (NCDs) and their risk factors: NCDs are by far the leading cause of death in the Region, representing 30% of all annual deaths.
• NCD risk factors include:
– Tobacco
– Harmful use of Alcohol
– Sedentary behavior and physical inactivity
– Obesity
– Unhealthy diet.
Schematic presentation: speed and risk of car accident
Data

• Data: a measurement (observation) taken on a variable
• A collection of data is often called a dataset
• Data can be quantitative or qualitative
• However, data are raw: the required evidence cannot easily be obtained from them directly.
Illustration
• Data is raw, unorganized facts that need to be
processed.
• When data is processed, organized, structured
or presented in a given context so as to make
it useful, it is called information.

Relation of data and information


Information
Information: Data that is
– specific and organized for a purpose
– presented within a context that gives it
meaning and relevance and
– can lead to an increase in understanding and
decrease in uncertainty
• Biostatistics/statistics is the tool that converts data into information.
Information
• Relevant meaning/implication, input for decision or action
• Realized the opportunity/problem solving
• Context or perspective of action
• Meaning of human intention
DIKIW Interrelationship model
[Figure: data, information, knowledge, intelligence and wisdom as successive levels, with labels including: captured/stored symbol; processed/analyzed; constructed; conceptualized; understanding; restructuring/reconstructed picture; mental processing; sound judgment; appropriate execution; universal truth]
Example 1
Example 2
Introduction to
SPSS for windows
Introduction
• It is a multi-purpose statistical package to help you explore,
summarize and analyze datasets
• A dataset is a collection of several pieces of information called
variables (usually arranged by columns)
• A variable can have one or several values (information for one
or several cases)
• Other statistical packages are Stata, SAS and R
• SPSS is widely used in social science research and the most used
statistical software on campus.
Data Management Software
• There are several data management software packages
• The most common are Stata, SPSS, SAS and R
• They all have different features

For this course, we will focus on SPSS


SPSS/ Windows
The user interface of SPSS for Windows is built around three primary and distinct windows:
• the Data Editor window
• the Syntax Editor window
• and the Viewer window


SPSS/Windows
It is important to know in which window one
is working because each window supports a
specific function.
SPSS/Windows
The type of window
1. Data Editor, 2. Syntax Editor, 3. Viewer

– The type of window is visible in the top-left corner of each window.

Window Name
1. Data editor window
– It is the view where we see our data
– It is used to see and manipulate the data
– Data files have the *.sav extension
– It has two views:
1. Data view and 2. Variable view
– Switch between the views by clicking the tab you want at the lower-left corner of the Data Editor
SPSS/Windows
Data editor view

Data view Variable view


SPSS/Windows
a. Data view
Here you simply observe
– the names of the variables
– the values of the variables
• We can edit the values of variables directly
• Data can be entered when SPSS is in Data view
SPSS/Windows
Rows and Column
• Each row holds the values of all variables for a single study subject (case)
• Each column holds the values of a single variable across many study subjects
[Figure: a column is a single variable's values across all study subjects; a row is a single study subject's information]
Displaying values
• You can display the values of your categorical variables
– as the numeric codes entered (e.g. 1's and 2's for gender), or
– as the value labels which you have defined in Variable view (e.g. male and female; see 1.3)
• To switch, go to View on the menu bar and choose Value Labels, or
• use the luggage-label (Value Labels) button on the toolbar
SPSS/Windows
b. Variable view
Here you simply observe
• It is used to create new variables in SPSS
• This is done by writing the name of the variable, its type, its label and its value labels
SPSS/Windows
b. Variable view
Here you simply observe

1. Names of variables

2. Type of variables

3. Width/ Decimals

4. Label of the variables

5. Labels of values of variable


Cont….
• Name of variables
– Variable names are usually short codes
– They must be a continuous string of characters without interruption (no spaces between the letters)
– Example
Agegroup (valid) vs age group (invalid)
education or educationalstatus (valid) vs educational status (invalid)
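The naming rule above can be checked mechanically. The sketch below is a hypothetical checker (`is_valid_spss_name` is not an SPSS function) that assumes a simplified subset of SPSS's rules: start with a letter, then only letters, digits, underscores or periods (so no spaces), no trailing period, at most 64 characters, and not one of SPSS's reserved keywords. The SPSS documentation lists the complete rules.

```python
import re

# Reserved keywords that SPSS does not accept as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}

# Simplified rule (assumption): a letter first, then letters, digits, "_" or ".".
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_valid_spss_name(name: str) -> bool:
    """Return True if `name` passes this simplified set of SPSS naming rules."""
    if len(name) > 64 or name.endswith("."):
        return False
    if name.upper() in RESERVED:
        return False
    # fullmatch fails on any space, so "age group" is rejected
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Agegroup", "age group", "educationalstatus", "educational status"]:
    print(f"{candidate!r}: {is_valid_spss_name(candidate)}")
```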
2. Type of variables
Cont…
• There are different types of variables
• The options are displayed when you click the small button at the right-hand side of a cell in the Type column
1. Numeric: for countable (quantitative) values; accepts only numbers (qualitative variables can be stored as numeric codes)
2. Date: can use different date styles
3. String: for qualitative data, usually when we are interested in words (text)
Cont…
3. Width/ Decimals
• Width and Decimals set the number of characters allowed for a value of a single variable
• For a numeric variable, you choose the width and the number of decimals (by default the width is 8 and the decimals 2)
• For a date variable, you choose the number of characters of the date format
• For a qualitative variable stored as words (string), you choose the number of characters you want to allow
Cont.…
• Decimals
– Number of decimal places displayed
– It has to be less than or equal to 16
– For date or string variables, SPSS does not ask for decimals
– Example value: 3.14159265
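The width and decimals settings control how a value is displayed, not how precisely it is stored. The sketch below mimics a numeric display with a given width and number of decimals (the defaults of 8 and 2 mentioned above) using Python's format specification. `show_numeric` is a hypothetical helper, and SPSS's handling of values too wide for the column (for example switching to scientific notation) is not reproduced.

```python
def show_numeric(value: float, width: int = 8, decimals: int = 2) -> str:
    """Right-align `value` in `width` characters with `decimals` decimal places."""
    return f"{value:{width}.{decimals}f}"


pi_value = 3.14159265  # the stored value keeps all of its digits
print(repr(show_numeric(pi_value)))          # '    3.14'   (width 8, decimals 2)
print(repr(show_numeric(pi_value, 10, 4)))   # '    3.1416' (width 10, decimals 4)
```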
4. Label of the variables
Cont….
• The label of a variable is a detailed description of the variable name
– You can specify the details of the variable
– You can write text with spaces, up to 256 characters
Cont…
5. Labels of values of variable
• Value labels describe the values of qualitative (categorical) variables that are coded as numbers
• They are used for variables whose values are named categories
• E.g. 'Sex': 1 = male, 2 = female
'Residence': 1 = urban, 2 = rural, etc.
• Continuous variables do not need value labels

Defining the value labels
• Click the cell in the Values column
• For the value and the label, you can enter up to 60 characters.
• After defining the values, click Add and then click OK.

Labeling values
• Write the value first
• Write its meaning (the label)
• Click Add to pass it to the list
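Value labels are essentially a lookup table from codes to meanings, kept separately from the data. A minimal sketch of that idea, assuming hypothetical dictionaries for the 'Sex' and 'Residence' examples, with a switch that mimics turning value labels on and off in Data view:

```python
# Hypothetical value-label dictionaries, one per coded variable.
VALUE_LABELS = {
    "sex": {1: "male", 2: "female"},
    "residence": {1: "urban", 2: "rural"},
}


def display(values, variable, show_labels=True):
    """Show codes or labels, like toggling value labels in Data view."""
    if not show_labels:
        return list(values)
    labels = VALUE_LABELS.get(variable, {})
    # Codes without a defined label are shown unchanged.
    return [labels.get(v, v) for v in values]


sex_codes = [1, 2, 2, 1]
print(display(sex_codes, "sex", show_labels=False))  # [1, 2, 2, 1]
print(display(sex_codes, "sex"))                     # ['male', 'female', 'female', 'male']
```

The stored codes never change; only what is shown does, which is why qualitative variables can be coded as numbers without losing their meaning.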


2. The Viewer window
It is displayed after any data manipulation or analysis
• Analysis results and commands are displayed in the Viewer window
• Graphs are also edited from this window
SPSS/Windows
The Viewer window displays all
– statistical results,
– tables,
– charts,
– commands, etc.
• Output files have the *.spv extension


The Viewer window
[Figure: menu bar and toolbar buttons in the output window; output in outline and output in detail]
3. The Syntax Editor
• It is the window in which SPSS commands can be typed and submitted for processing.
• Commands saved in files can be opened in a Syntax Editor window for processing.
• Syntax files have the *.sps extension

SPSS Syntax Editor


SPSS/Windows
[Figure: clicking in the program can produce a syntax that can be reproduced later]
• A syntax file can be created in two ways:
1. Manual writing (for programmers)
2. By clicking 'Paste' in any recoding, transforming or analysis dialog
• Recent versions also keep the syntax in the Viewer window
SPSS/Windows
Syntax development from any function
• First run the function through the menus, e.g.
Analysis → Descriptive statistics → Frequency
The Frequencies dialog will appear
After entering the variables, click 'Paste'

SPSS/Windows

• A Syntax Editor window containing the pasted program will appear
[Figure: Syntax Editor menu; program with a command]
SPSS/Windows
• Once a syntax is written, we can execute it.
• We can run the whole program or only a selected part of it
• To run the whole syntax, choose Run from the pull-down menu of the Syntax Editor and select All
• To run part of the syntax, highlight it and run the selection

SPSS/Windows

The Viewer window displays all statistical results, tables, and charts.
[Figure: SPSS Viewer]
SPSS/Windows
The second important feature is its use of
Pull-down menu items and tool bars.

Pull-down
Menu Items
SPSS/Windows
The toolbar provides a quick, easy way of accessing commonly required tasks.

Tool Bar
SPSS/Windows
The pull-down menu and toolbar items change from one type of window to another.
Different Windows,
Different pull down Menu
Items
Different Windows,
Different Tool bars
SPSS/Windows
Pull-down menu items most important to discuss when using SPSS

Pull-down
Menu Items

Data, Transform, Analysis, Graphs


Data processing using DATA menu
Analysis Using SPSS for windows
Analysis menu
• Descriptive Stat
• Compare means
• Correlate
• Regression
• Scale
• Nonparametric
• Survival
Three Steps of Data Analysis
• Univariate analysis
– Step 1: Examine the distribution of each individual variable

• Bivariate analysis
– Step 2: Describe association between pairs of variables
(only two variables)

• Multivariate analysis
– Step 3: Use a statistical model called Regression (Linear or logistic) to
examine the relationship between multiple independent variables & a
dependent variable
– This is done to gain insight into possible causal relationships (cause & effect)
I. Univariate Analysis
Univariate Analysis

• Univariate analysis is the process of describing the sample by examining and summarizing the distribution of each individual variable.
• It is also useful for familiarizing yourself with your data
– Percentages
– Frequency tables
– Description of continuous variables (mean, SD, symmetry)
Univariate analysis using SPSS for Windows
Analysis → Descriptive statistics → Frequency
Cont…
• Select a variable in the variable list
• Click the arrow button to pass it to the Variables list
• Click OK to run the analysis on the selected variables
Output….
The Statistics table tells us the number of valid and missing values of each variable.

marital status
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Never married                         74      5.1            5.5                 5.5
         currently married or cohabiting      759     52.7           56.4                61.9
         separated or divorced                 58      4.0            4.3                66.2
         widowed                              427     29.6           31.7                97.9
         not known                             28      1.9            2.1               100.0
         Total                               1346     93.4          100.0
Missing  System                                95      6.6
Total                                        1441    100.0

• Percent takes missing values into consideration (denominator 1441)
• Valid Percent excludes missing values (denominator 1346)
• Cumulative Percent may sometimes be useful when deciding on recoding
In practice we usually report the valid percent, but we should indicate 'n' as the valid total.
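The three percentage columns differ only in their denominators. A minimal Python sketch (an illustration, not SPSS's own code) that rebuilds the marital status table from the frequencies in the output; `frequency_table` is a hypothetical helper, and missing values are assumed to be coded as `None`.

```python
from collections import Counter


def frequency_table(values, order):
    """Rows of (category, frequency, percent, valid percent, cumulative percent).

    Missing values are represented by None (a simplification: SPSS
    distinguishes system-missing and user-missing values).
    """
    n_total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for category in order:
        freq = counts[category]
        percent = 100 * freq / n_total        # denominator includes missing
        valid_percent = 100 * freq / n_valid  # denominator excludes missing
        cumulative += valid_percent           # cumulates the valid percent
        rows.append((category, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, n_valid, n_total - n_valid


# Rebuild the marital status data from the frequencies in the output.
order = ["never married", "currently married or cohabiting",
         "separated or divorced", "widowed", "not known"]
frequencies = [74, 759, 58, 427, 28]
data = [c for c, f in zip(order, frequencies) for _ in range(f)] + [None] * 95

rows, n_valid, n_missing = frequency_table(data, order)
for row in rows:
    print(row)
print("valid n =", n_valid, "missing =", n_missing)
```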
Continuous variables
Looking for Assumptions
• In SPSS, as in any statistical analysis, many methods rely on assumptions
• Continuous dependent variables should be checked against these assumptions
• In particular, continuous variables should be tested for a symmetrical (normal) distribution
• If they are not symmetrical, many (parametric) methods of analysis are not appropriate, and non-parametric analysis should be used instead

• There are two ways to assess the symmetry of a continuous variable
1. Assess symmetry using Frequency
Analysis → Descriptive statistics → Frequency
Under Frequency
– Click 'Statistics' and tick mean, standard deviation and 'skewness'
– Under 'Charts', choose histogram and tick 'normal curve'
1. Analysis → Descriptive statistics → Frequency
From 'Statistics', tick
– Mean and median
– Standard deviation
– Skewness
From 'Charts', choose 'Histogram' and tick 'Normal curve'.


OUTPUT

Skewness
The skewness statistic tells us how skewed the distribution is: the nearer it is to 0, the more symmetric the distribution, as expected for normally distributed data.

[Figure: histograms with normal curves. Verbal fluency - animal naming score: mean = 15.5, SD = 5.63, N = 1441. Educational level: mean = 2.9, SD = 1.36, N = 1366.]

The histograms and normal curves show us how the data are distributed.
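To see what the skewness value measures, the sketch below computes it from the second and third central moments. It assumes the adjusted Fisher-Pearson coefficient and the usual standard-error formula, which is what SPSS is generally understood to report; compare with your own output before relying on it. The function name is hypothetical.

```python
import math


def skewness_with_se(values):
    """Return (skewness, standard error of skewness)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    g1 = m3 / m2 ** 1.5                            # moment coefficient
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se


print(skewness_with_se([1, 2, 3, 4, 5]))   # symmetric: skewness 0
print(skewness_with_se([1, 1, 1, 2, 10]))  # long right tail: positive skewness
```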
2. Testing for normality using Explore
Analysis → Descriptive statistics → Explore
Under Explore
– Click 'Plots' and select 'Normality plots with tests'
The results are shown as
– a Tests of Normality table and normal Q-Q plots
Analysis → Descriptive statistics → Explore → Plots
Under Plots, tick 'Normality plots with tests'


OUTPUT
Tests of Normality: if the test is significant (p < 0.05), the variable is not normally distributed.
Normal Q-Q plot: if the data are normally distributed, the (red) dots should lie on the straight diagonal line.
[Figure: normal Q-Q plots of verbal fluency - animal naming score and of age in years; x-axis Observed Value, y-axis Expected Normal]
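A normal Q-Q plot pairs each sorted observation with the value expected under a normal distribution. The sketch below computes those pairs; the use of Blom's plotting positions is an assumption, so the expected values may differ slightly from those SPSS plots, and `normal_qq_points` is a hypothetical name. If the data are normal, plotting the pairs gives roughly a straight line.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pairs (observed value, expected normal z-score), sorted by value.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4).
    """
    n = len(values)
    z = NormalDist()  # standard normal distribution
    ordered = sorted(values)
    return [(x, z.inv_cdf((i - 0.375) / (n + 0.25)))
            for i, x in enumerate(ordered, start=1)]


for observed, expected in normal_qq_points([12, 15, 9, 20, 14]):
    print(f"{observed:>4} {expected:6.3f}")
```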
OUTPUT
[Figure: box plots of age in years (N = 1441) and verbal fluency - animal naming score (N = 1441), with many outlying cases labelled by case number]
The box plots also show many outliers, suggesting the data are not normally distributed.
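SPSS box plots flag cases lying more than 1.5 box lengths (interquartile ranges) beyond the box as outliers and more than 3 box lengths beyond it as extremes. The sketch below applies that rule to hypothetical data. It uses Python's `statistics.quantiles` with the "inclusive" method for the quartiles, which is an assumption: SPSS uses Tukey's hinges, so its quartiles can differ slightly for some sample sizes.

```python
from statistics import quantiles


def boxplot_flags(values):
    """Return (Q1, Q3, flags); flags maps 1-based case numbers to
    'outlier' (1.5 to 3 box lengths beyond the box) or 'extreme' (more than 3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box = q3 - q1  # interquartile range, the "box length"
    flags = {}
    for case, x in enumerate(values, start=1):
        if x < q1 - 3 * box or x > q3 + 3 * box:
            flags[case] = "extreme"
        elif x < q1 - 1.5 * box or x > q3 + 1.5 * box:
            flags[case] = "outlier"
    return q1, q3, flags


print(boxplot_flags([10, 11, 12, 13, 14, 15, 16, 24, 100]))
# (12.0, 16.0, {8: 'outlier', 9: 'extreme'})
```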
II Bivariate Analysis
Bivariate Analysis
• Bivariate analysis is the second step in analysis
• It tests for the presence of a relationship between two variables
• It describes the presence of an association between two variables
• It answers the question: is there a relationship between these two variables?
• It is the initial step in hypothesis testing

Possible combination
• There are three possible combinations of variable types
• Combination between:
1. Two qualitative variables
2. Two quantitative variables
3. A quantitative and a qualitative variable

1. Two qualitative variables
• This is when both the dependent and the independent variables are categorical
• The statistics can be done
– manually,
– with StatCalc of Epi Info,
– with Crosstabs and logistic regression in SPSS.
• Chi-square is the usual test statistic

SPSS for Windows
1. Analysis Descriptive statistics Crosstab

Under crosstabs
– Put dependent variable to “column” and the
independent variables to “Rows”.
– By Clicking the ‘statistics’ mark the ‘Chi square’,
‘risk’.
– By clicking the ‘Cells’, mark ‘rows’ from the
percents.

NB: If a Case-control study, better to click the cells


and mark column
Analysis Descriptive statistics Crosstab
Analysis Descriptive statistics Crosstab

Put the independent variables to “Rows”


(One or more categorical variables)

The dependent variable to “column”

Under ‘statistics’

‘Chi square’,

‘risk’.
Analysis Descriptive statistics Crosstab

Under ‘Cells’,

‘rows’ .
Output
gender * depression diagnosis Crosstabulation

                                        depression diagnosis
                                        non-case      case     Total
gender   female   Count                      497       358       855
                  % within gender          58.1%     41.9%    100.0%
         male     Count                      420       160       580
                  % within gender          72.4%     27.6%    100.0%
Total             Count                      917       518      1435
                  % within gender          63.9%     36.1%    100.0%

• One category is considered as the referent
• Compare the percentages between the different exposure statuses

Chi-Square Tests
                                   Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                  (2-sided)    (2-sided)    (1-sided)
Pearson Chi-Square (b)             30.571    1      .000
Continuity Correction (a)          29.955    1      .000
Likelihood Ratio                   31.089    1      .000
Fisher's Exact Test                                             .000         .000
Linear-by-Linear Association       30.550    1      .000
N of Valid Cases                   1435
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 209.37.
• If the table is 2x2, take the chi-square under Continuity Correction
• If it is 2x(>2), take the chi-square under Pearson Chi-Square
• If any cell in the table has an expected count < 5, choose the likelihood ratio or Fisher's exact test
• If the dependent variable is ordinal, choose Linear-by-Linear Association.
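To see where the numbers in the Chi-Square Tests table come from, the sketch below recomputes them for the gender by depression table using the standard textbook formulas. It is a minimal illustration, not SPSS's code; the function names are hypothetical and Fisher's exact test is omitted. With these counts it reproduces the SPSS values to three decimals.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistics for the 2x2 table [[a, b], [c, d]] (rows x columns)."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    denominator = row1 * row2 * col1 * col2
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / denominator
    # Yates' continuity correction: shrink |ad - bc| by n/2, never below 0.
    yates = n * max(0.0, diff - n / 2) ** 2 / denominator
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    likelihood_ratio = 2 * sum(o * math.log(o / e)
                               for o, e in zip(observed, expected) if o > 0)
    linear_by_linear = (n - 1) / n * pearson  # equals (n - 1) r^2 for a 2x2 table
    return {
        "pearson": pearson,
        "continuity_correction": yates,
        "likelihood_ratio": likelihood_ratio,
        "linear_by_linear": linear_by_linear,
        "min_expected": min(expected),
    }


def p_value_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# gender (rows: female, male) by depression diagnosis (columns: non-case, case)
results = chi_square_2x2(497, 358, 420, 160)
for name, value in results.items():
    print(f"{name:22s} {value:9.3f}")
print("p (continuity corrected) =", p_value_1df(results["continuity_correction"]))
```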
Additional points
• Regression analysis (continuous and categorical data)
• General Linear Models
• Generalized Linear Models (GLMs)
If you're interested, please refer to the listed references.
I'm happy to provide further support and details if needed. I've also highlighted some additional important points below for your review. These highlights may pique your interest and encourage you to explore further. However, if they don't appeal to you, feel free to disregard them, no worries!
Poisson Regression Model
The Poisson Regression Model
• Count data are very common in many applications.
• Examples include:
– number of patients visiting a certain hospital per day,
– CD4 counts,
– number of live births in a given district per year, etc.
• Count data are commonly analyzed using the Poisson regression model.
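In Poisson regression the log of the expected count is modelled as a linear function of the covariates, log(μ) = β0 + β1x, so exp(β1) is a rate ratio. Below is a minimal sketch of fitting a one-covariate model by maximum likelihood with Newton-Raphson. The data and the function name are hypothetical, and statistical packages (including SPSS's Generalized Linear Models) use their own, more robust routines and also report standard errors.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by maximum likelihood using Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall mean
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix I = X' diag(mu) X
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: b <- b + I^{-1} U (2x2 inverse written out)
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: daily patient visits on ordinary days (x = 0) and market days (x = 1)
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 0, 1, 3, 4, 2, 3]
b0, b1 = fit_poisson(x, y)
print(f"b0 = {b0:.4f}, rate ratio exp(b1) = {math.exp(b1):.4f}")
```

With a single binary covariate the fitted rates equal the group means (1 and 3 visits per day here), so the rate ratio is 3.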
Negative-Binomial Regression I

• Recall: the Poisson distribution assumes that its mean and variance are equal.
• However, count data in practice often violate this assumption as a result of unobserved heterogeneity.
• Usually, the sample variance is higher than the sample mean, a phenomenon referred to as overdispersion.
• The overdispersion in the data cannot be fully captured by the observed covariates.
• Hence, we need models with more parameters than the Poisson distribution.
• The negative binomial is an important extension of the Poisson in this regard.
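A quick first check for overdispersion is to compare the sample variance with the sample mean. The sketch below does that for a hypothetical set of counts and also gives a rough moment estimate of the negative-binomial dispersion parameter α, assuming the common parameterization Var(Y) = μ + αμ². A formal assessment would compare fitted Poisson and negative-binomial models that include the covariates.

```python
from statistics import mean, variance


def dispersion_check(counts):
    """Return (mean, variance, variance/mean, rough moment estimate of alpha).

    Under a Poisson model variance/mean is about 1; alpha assumes the
    negative-binomial variance mu + alpha * mu**2.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero")
    v = variance(counts)  # sample variance (n - 1 denominator)
    alpha = max(0.0, (v - m) / m ** 2)
    return m, v, v / m, alpha


# Hypothetical counts, e.g. clinic visits per household
visits = [0, 0, 1, 1, 2, 3, 5, 8, 0, 10]
m, v, ratio, alpha = dispersion_check(visits)
print(f"mean={m:.2f} variance={v:.2f} ratio={ratio:.2f} alpha={alpha:.2f}")
```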


Zero-Inflated Models I
• Many count data are characterized by more zeros than the common count distributions can predict.
• This is often because of heterogeneity between subjects
or omission of important covariates.
• Not accounting for this feature appropriately leads to biased estimates
• and hence to wrong conclusions.
• Hence, models that extend the count distributions to allow for excess zeros are needed.
Zero-Inflated Models II
• Commonly used models in this regard:
– Zero-inflated Poisson model (ZIP).

– Zero-inflated negative-binomial (ZINB).


• ZINB is considered when data are characterized by both
– overdispersion and zero inflation.
Zero-Inflated Models III
• The zero-inflated Poisson (ZIP) model is commonly used in this regard.
• This model allows for overdispersion by assuming that there are two different types of individuals in the data:
– (1) those who have a zero count with a probability of 1 (always-0 group), and
– (2) those whose counts are predicted by the standard Poisson (not always-0 group)
• An observed zero could come from either group; if it comes from the always-0 group, the observation has no chance of a positive outcome
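The two-group description above corresponds to the ZIP probability function P(Y = 0) = π + (1 - π)e^(-λ) and P(Y = k) = (1 - π)e^(-λ)λ^k/k! for k ≥ 1, where π is the probability of belonging to the always-0 group and λ the Poisson mean of the other group. The sketch below evaluates it with illustrative parameter values (the names are hypothetical) and shows that zero inflation both raises P(Y = 0) and makes the variance, (1 - π)λ(1 + πλ), exceed the mean, (1 - π)λ.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    pi  = probability of belonging to the always-0 group
    lam = Poisson mean for the not always-0 group
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # a zero can come from either group
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


lam, pi = 2.0, 0.3
print("P(Y=0): ZIP", round(zip_pmf(0, lam, pi), 4), "plain Poisson", round(math.exp(-lam), 4))

ks = range(60)  # the tail beyond 60 is negligible for lam = 2
mean_zip = sum(k * zip_pmf(k, lam, pi) for k in ks)
var_zip = sum(k * k * zip_pmf(k, lam, pi) for k in ks) - mean_zip ** 2
print(f"mean = {mean_zip:.3f}, variance = {var_zip:.3f}")  # mean = 1.400, variance = 2.240
```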
Zero-Inflated Poisson for Data
• Consider 'deaths' as the outcome.
• 'Place' and 'family income' are covariates in both the positive-count and the zero-inflation parts.
• The option 'vuong' gives a test for zero inflation.


Survival Analysis
Introduction
• Survival analysis refers to statistical methods for analyzing survival data.
• Survival data could be derived from laboratory, clinical, epidemiological studies, etc.
• The response of interest is the time from an initial observation until the occurrence of a subsequent event.
Why survival analysis?
• Censoring (time of
event not observed)
• Unequal follow-up time
What is time?
What is the origin of time?
In epidemiology:
•Age (birth as time 0) ?
•Calendar time since a
baseline survey ?
Types of survival analysis
1. Non-parametric method: Kaplan-Meier analysis
2. Semi-parametric method: Cox regression
3. Parametric method
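Kaplan-Meier analysis handles censoring by multiplying, at each event time, the proportion of those still at risk who survive: S(t) = Π (1 - d_i/n_i), where d_i is the number of events and n_i the number at risk at time t_i. Below is a minimal product-limit sketch with hypothetical follow-up data; statistical packages also report standard errors, which are not computed here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, number at risk, events, survival) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # group all subjects with the same time
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            table.append((t, at_risk, deaths, survival))
        # censored subjects count as at risk at their own time, then leave
        at_risk -= deaths + censored
    return table


# Hypothetical follow-up (months) for six patients; 0 = censored
for row in kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1]):
    print("t=%d at risk=%d events=%d S(t)=%.3f" % row)
```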
Generalized Estimating Equations (GEE)
What are Generalized Estimating Equations (GEE)?
• Extension of the Generalized Linear Model (GZLM), which is an
extension of the General Linear Model (GLM)
– GLM analyzes models with normally distributed DVs that are
linearly linked to predictors
– GZLM extends GLM to analyze non-normally distributed DVs
that may be non-linearly linked to predictors
• Easily handles interactions between discrete and continuous
IVs
• Cannot analyze correlated, non-independent, clustered,
nested, repeated measures, within-subjects data
– GEE extends GZLM and analyzes correlated data with
• Normal and non-normal DVs
• DVs that are linearly or non-linearly linked to IVs
How to Conduct a GEE
Conducting a GEE: First Step

• Arrange your data in “long form”

A. How data usually look B. How data need to look for GEE
Getting from A to B: Restructuring Your Data
Restructuring Your Data
Restructured Data
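Going from wide form (one row per subject, one column per repeated measurement) to long form (one row per subject per measurement) is what the restructuring step does. In SPSS this is done through the Restructure option on the Data menu; the sketch below only illustrates the logic, and the variable names (`id`, `score1` to `score3`, `time`, `score`) are hypothetical.

```python
def wide_to_long(rows, id_var, repeated_vars, value_name="score", index_name="time"):
    """Restructure wide data (one row per subject) into long form
    (one row per subject per repeated measurement)."""
    long_rows = []
    for row in rows:
        for index, var in enumerate(repeated_vars, start=1):
            long_rows.append({id_var: row[id_var], index_name: index, value_name: row[var]})
    return long_rows


# Hypothetical wide data: one row per patient, three repeated measurements
wide = [
    {"id": 1, "score1": 10, "score2": 12, "score3": 15},
    {"id": 2, "score1": 8, "score2": 9, "score3": 11},
]
for r in wide_to_long(wide, "id", ["score1", "score2", "score3"]):
    print(r)
```

Each subject's identifier is repeated on every row, which is what lets a GEE recognize which measurements are correlated.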
Conducting a GEE Analysis
Selecting the Model Type
• Dozens of model
combinations with GEE
– DV can be discrete, any
of several distributions,
and nonlinearly linked to
IVs
• Must select distribution
of DV and link function
Response Variable
• Also known as outcome
variable, DV
• Category order is for
multinomial DVs
• For binary outcomes,
can specify reference
category
Predictors
• Options for factors
allows specification of
reference category and
how to handle missing
data
Model
• Full factorial is a few
clicks away
Estimation and Statistics
EM Means
• Several options for
controlling for family-
wise error
• Several options for
contrasts, including
– Simple
– Pairwise
– Deviation
– Difference
Save, Export, and Cross Your Fingers
Thank You so Much all!!!
