Research Methods Chapter 5 (1)
Dereje Teferi
Actual Data Collection
• One should carefully plan the data collection, as this
is the point of departure for executing the research
• Pre-data collection
– Training of Data Collectors might be crucial
– Supporting letters might be necessary
• Post-data collection
– Editing of returned questionnaires
• The data you have collected may be presented using
– Tabular methods
– Graphical methods
Tabular methods of data presentation
• Tabulated data can be more easily understood than
raw, unorganized facts
• They help facilitate statistical treatment of data
• When data are tabulated, all unnecessary details and
repetitions are avoided.
• Type of tables
– Simple (one way) table: shows one characteristic
– Two-way table: shows two characteristics
– Higher order table: shows three or more
characteristics
Tabular methods of data presentation
(Frequency distributions)
• Steps
– Begin by arranging the data from smallest to largest
– Count values that repeat by making tallies
– Group observations with comparable magnitude
– Stop the classification when you are sure that the first and
the last classes respectively contain the smallest and largest
values
– Indicate how many values are included in each class
Note:
If the number of classes k has been fixed, then class width
may be computed as w = range/k
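The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` and the sample values are hypothetical, and the sketch assumes equal-width classes with the data not all identical (so that w is not zero).

```python
from collections import Counter

def frequency_distribution(values, k):
    """Group values into k equal-width classes and count each class."""
    data = sorted(values)                  # arrange from smallest to largest
    low, high = data[0], data[-1]
    width = (high - low) / k               # class width w = range / k
    counts = Counter()
    for x in data:
        # The largest value belongs to the last class, not to a (k+1)-th one.
        index = min(int((x - low) / width), k - 1)
        counts[index] += 1
    # Each row: (lower boundary, upper boundary, frequency)
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical observations, grouped into k = 4 classes
observations = [2, 4, 5, 7, 8, 10, 11, 12, 15, 18, 20, 22]
for lower, upper, frequency in frequency_distribution(observations, 4):
    print(f"{lower:5.1f} to {upper:5.1f}: {frequency}")
```

The first class starts at the smallest value and the last class ends at the largest, as the fourth step requires.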
Graphical methods of data presentation
• Data in a frequency distribution can be presented
graphically or diagrammatically
• Graphs are the natural choice to represent
continuous data
• For discrete or qualitative data, we have
– Pie chart (multiply relative frequency by 360° to get the angle of each sector)
– Pictogram (use of pictures)
– Bar graph (class limit and Abs. frequency)
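As a small illustration of the pie chart rule, the sector angle of each category is its relative frequency times 360°. The function name `pie_angles` and the category counts below are hypothetical.

```python
def pie_angles(frequencies):
    """Map each category to its pie-chart sector angle in degrees."""
    total = sum(frequencies.values())
    # relative frequency = count / total; angle = relative frequency * 360
    return {label: count / total * 360 for label, count in frequencies.items()}

# Hypothetical counts for a qualitative variable
counts = {"Agree": 10, "Neutral": 30, "Disagree": 20}
for label, angle in pie_angles(counts).items():
    print(f"{label}: {angle:.1f} degrees")
```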
Graphical methods … cont’d
• For continuous data, we have
– Histogram (class boundary and abs. freq.)
– Frequency polygon (Class mark and abs freq.)
– Cumulative frequency graph
Summarizing data numerically
• Measures of Central Tendency
– One of the objectives of statistical analysis is to determine
the various numerical measures which describe the
characteristics of a frequency distribution
• We can calculate the
– Mean, Median, and Mode for ungrouped data
– Mean, Median, and Mode for grouped data
• We need to determine how representative the
average is as a description of a given set of data.
– We need to calculate:
• Range (very simple), Quartile Deviations
• Quartile Deviation QD = (Q3 - Q1)/2
• Q1 = value of the ((n+1)/4)th item and Q3 = value of the (3(n+1)/4)th item
after arranging the items in ascending order
• Standard Deviations
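A minimal sketch of these measures for ungrouped data, using Python's standard `statistics` module. The scores are hypothetical, and linear interpolation for fractional item positions is an assumption of this sketch, since the slide does not say how to handle them.

```python
import statistics

def item_at(sorted_data, position):
    """Value of the position-th item (1-based); fractional positions are
    linearly interpolated between neighbours (an assumption of this sketch)."""
    n = len(sorted_data)
    position = min(max(position, 1), n)    # clamp to valid positions
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    return sorted_data[lower - 1] + fraction * (sorted_data[lower] - sorted_data[lower - 1])

def quartile_deviation(values):
    """Return (Q1, Q3, QD) with Q1 at (n+1)/4 and Q3 at 3(n+1)/4."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2

scores = [12, 15, 11, 18, 15, 20, 14]      # hypothetical ungrouped data
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))
print("range:", max(scores) - min(scores))
print("Q1, Q3, QD:", quartile_deviation(scores))
print("sample standard deviation:", statistics.stdev(scores))
```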
Estimation and Hypothesis Testing
• Two of the very important problems of
statistical inference are:
• Estimation of parameters (point estimation and interval
estimation): estimating a single value versus estimating
a range of values
• Test of hypothesis
• Estimation of parameters
– Parameter estimation refers to producing a sound
and reasonable substitute for the unknown
parameters of a population (mean, variance,
correlation coefficient, etc.)
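To make the difference between a point estimate and an interval estimate concrete, here is a hedged sketch for a population mean. It assumes a normal approximation for the sampling distribution, which is a simplification for small samples; the function name and the sample are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation interval for the mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)

sample = [48, 52, 50, 49, 51]              # hypothetical measurements
estimate, (low, high) = mean_confidence_interval(sample)
print("point estimate:", estimate)
print("interval estimate:", (round(low, 2), round(high, 2)))
```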
Test of Hypothesis
• Hypothesis testing deals with a claim made about the
value of an unknown parameter of a distribution.
• We can test a hypothesis based on sample data.
• Procedures to follow in Tests of Hypothesis
– The assumption about the parameter is called the null
hypothesis, and is denoted by H0
– The counter hypothesis is known as the alternative hypothesis
and is denoted by H1
– There should always be a level of significance α in testing a
hypothesis (the probability of rejecting the null hypothesis when
it is actually true)
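The procedure can be illustrated with a two-sided test of H0: mean = mu0 against H1: mean ≠ mu0. This sketch uses a z statistic with the sample standard deviation, which is an assumption made for brevity; the function name `z_test` and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def z_test(sample, mu0, alpha=0.05):
    """Two-sided test of H0: mean == mu0 using a normal approximation."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - mu0) / standard_error
    # p-value: probability of a statistic at least this extreme under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_h0 = p_value < alpha            # alpha is the level of significance
    return z, p_value, reject_h0

sample = [48, 52, 50, 49, 51]              # hypothetical measurements
print(z_test(sample, mu0=50))              # sample mean equals mu0
print(z_test(sample, mu0=40))              # sample mean far from mu0
```

H0 is rejected only when the p-value falls below α, so α caps the probability of rejecting a true null hypothesis.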
Univariate vs. multivariate analysis
• When we are solely interested in an isolated characteristic
of a set of objects, irrespective of other variable
characteristics possessed by the objects, we are involved
in a univariate analysis
– Achievement test scores of students
– White blood cell counts of patients
• Multivariate analysis is concerned with the simultaneous
investigation of two or more variable characteristics which
are measured over a set of objects
• Terms like related/correlated, dependent, covariance,
correlation, etc. come into the picture in multivariate analysis
Examples of multivariate analysis
• Relations between career performance,
academic achievement and aptitude test
scores
• Relations between income, consumption,
number of families, etc.
• Relations between product sales, price,
advertising levels.
Covariance and Correlation
Covariance is a measure of how changes in
one variable are associated with changes in a
second variable.
Covariance measures the degree to which two
variables are linearly associated.
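Correlation r is standardized covariance
\[ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1} \]
Correlation r
\[ r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y} \]
where s_x and s_y are the sample standard deviations of x and y.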
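The sample covariance and the correlation r can be computed directly from their definitions. A minimal sketch follows; the function names and the small data set are hypothetical.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r = cov(x, y) / (s_x * s_y); note that cov(x, x) is the variance of x."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4]                           # hypothetical paired observations
y = [2, 4, 6, 8]
print("cov:", covariance(x, y))
print("r:", correlation(x, y))
```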
Correlation …
A positive correlation indicates a positive
association between the variables (increasing
values in one variable correspond to increasing
values in the other variable).
A negative correlation indicates a negative
association between the variables (increasing
values in one variable correspond to decreasing
values in the other variable).
A correlation value close to 0 indicates no
linear association between the variables.
Correlation …
The formula for calculating the correlation
coefficient standardizes the variables.
Thus, changes in scale or units of measurement
will not affect its value.
For this reason, the correlation coefficient
is often more useful than a graphical depiction
in determining the strength of the
association between two variables.
Example
The next slide is:
The hypothetical systolic BP and age of
twenty children in a sample at the no-city
hospital.
The hypothetical weight and age of twenty
children in a sample at the no-city hospital.
Computing the correlation,
Is there a relationship between SBP and age,
as well as weight and age in this sample data?
What do you see in the scatter plot?
What is the interpretation of your finding?
BP and Age of Children with CP
SBP Age Weight (kg) Age
90 12.5 38 12.5
88 12.1 45 12.1
100 13.6 35 13.6
70 10.0 50 10.0
80 11.2 60 11.2
90 12.0 45 12.0
100 13.4 30 13.4
102 13.8 51 13.8
120 16.8 53 16.8
110 15.6 40 15.6
89 12.3 43 12.3
80 12.0 39 12.0
90 12.7 41 12.7
100 13.7 40 13.7
87 12.0 50 12.0
93 12.8 56 12.8
82 11.6 52 111.6
102 14.0 62 14.0
93 13.0 39 13.0
86 11.9 44 11.9
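The correlation between SBP and age can be computed from the first two columns of the table above. The sketch below uses only those two columns; the function name `pearson_r` is hypothetical. It also re-expresses age in months to illustrate the earlier point that a change of units does not affect r.

```python
import math

# SBP and age columns copied from the table above (twenty children)
sbp = [90, 88, 100, 70, 80, 90, 100, 102, 120, 110,
       89, 80, 90, 100, 87, 93, 82, 102, 93, 86]
age = [12.5, 12.1, 13.6, 10.0, 11.2, 12.0, 13.4, 13.8, 16.8, 15.6,
       12.3, 12.0, 12.7, 13.7, 12.0, 12.8, 11.6, 14.0, 13.0, 11.9]

def pearson_r(x, y):
    """Correlation coefficient from sums of products of deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_years = pearson_r(sbp, age)
r_months = pearson_r(sbp, [a * 12 for a in age])   # same ages in months
print("r (age in years):", round(r_years, 3))
print("r (age in months):", round(r_months, 3))
```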
Correlation r – basic assumptions
No distinction between explanatory (x) and response
(y) variable.
The hypothesis test asks whether r is significantly
different from zero (0); the null hypothesis is that the
population correlation is zero.
Requires both variables to be quantitative (continuous)
variables.
Both variables must be normally distributed. If one or
both are not, either transform the variables to near
normality or use the non-parametric alternative,
the Spearman Rank Correlation.
Use the Spearman correlation coefficient when the
shape of the distribution is not assumed, i.e. when the
method must be distribution-free.
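One common way to obtain the Spearman rank correlation is to replace each observation by its rank and then apply the ordinary correlation formula to the ranks. The sketch below follows that approach, giving tied values the average of their ranks; the function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]                        # hypothetical data
y = [1, 8, 27, 64, 125]                    # increasing but not linear in x
print("Pearson r:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

Because ranks depend only on ordering, any strictly increasing relationship gives a Spearman value of 1 even when the Pearson r is below 1.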
Correlation r – basic assumptions
No categorical or nominal (qualitative)
variables
r does not change when we change the
units of measurement.
For example, from kg to pounds for weight.
This is because r uses standardized values of
the observations.
r does not measure curved or non-linear
association no matter how strong.
Like the mean and SD, r is not resistant
to outliers.
r is strongly affected by outliers or outlying observations.
[Figure: two scatter plots. SBP (x-axis, 70 to 120) against Age (y-axis, 10 to 18); Weight (x-axis, 30 to 60) against Age (y-axis, 0 to 100).]
Interpretation of Children Data
In a sample of 66 children, there is no significant
relationship between age of the children and systolic
BP, r = 0.02, p = 0.90.
Assuming non-normal distribution of either one of
the variables, a non-parametric test was used (Spearman
Rank correlation), r = 0.025, p = 0.84.
In either test, there is no linear relationship between
age and the SBP of these patients.
However, the absence of a linear association does
not rule out a non-linear relationship between the
age of these patients and their SBP.
Correlation r - Interpretation
Positive r indicates positive linear association between
x and y, and negative r indicates negative linear
relationship
r is always between -1 and +1
The strength increases as r moves away from zero
toward either -1 or +1
The extreme values +1 and -1 indicate perfect linear
relationship (points lie exactly along a straight line)
Graded interpretation: |r| of 0.1-0.3 = weak; 0.4-0.7 =
moderate; and 0.8-1.0 = strong correlation
Scatter Plots of Example Data with Various
Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +1, r = +0.3 and r = 0]
Linear Vs non-linear Correlation
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Strong Vs Weak Correlation
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
No Correlation
[Figure: scatter plot against X showing no relationship]
Regression
Sir Francis Galton was the first to apply the word regression to biological and
psychological data. Specifically, Galton observed the heights of children versus the heights
of their parents. He discovered that taller than average parents tended to have
children who were also taller than average, but not as tall as their parents. Galton
characterized this as regression toward mediocrity.
The correlation coefficient is also attributed to Francis Galton.
Regression
Regression models (Figure 9.1):
– 1 explanatory variable: simple regression
– 2+ explanatory variables: multiple regression
Scatter Plots and Correlation
A scatter plot (or scatter diagram) is used to
show the relationship between two variables
Correlation analysis is used to measure the
strength of the association (linear relationship)
between two variables
Only concerned with strength of
the relationship
No causal effect is implied
Purpose of Regression Analysis
The purpose of regression analysis is to analyze
relationships among variables.
The analysis is carried out through the estimation
of a relationship and the results serve the following
two purposes:
1. Answer the question of how much y changes with
changes in each of the x's (x1, x2,...,xk),
Y is the dependent variable
Figure 9.3
Simple Linear Regression
Finding the Best-Fitting Regression Line
Two possible lines are shown below.
Line A is clearly a better fit to the data.
We want to determine the best regression
line.
\[ \hat{Y} = b_0 + b_1 X \]
where
b_0 is the intercept
b_1 is the slope
Figure 9.4
Least Squares Line
• The most widely used criterion for measuring the
goodness of fit of a line is the sum of squared
vertical deviations of the data points from the line
• The line that gives the best fit to the data is the one
that minimizes this sum; it is called the least
squares line or sample regression line.
• The slope of a regression line
represents the rate of change
in y as x changes. Because y
is dependent on x, the slope
describes how the predicted value
of y changes with x.
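For simple linear regression, the least squares line has a closed-form solution: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. A minimal sketch, with hypothetical function names and data:

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                         # slope
    b0 = mean_y - b1 * mean_x              # intercept
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared deviations of observed y from the line b0 + b1 * x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]                        # hypothetical data
y = [2.2, 4.1, 6.2, 7.9, 10.1]
b0, b1 = least_squares_line(x, y)
print("intercept b0:", b0, "slope b1:", b1)
print("sum of squared errors:", sum_squared_errors(x, y, b0, b1))
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared errors on the same data.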
Simple Linear Regression
Using statistical tools to Find the Best Regression
Line
Market value = 32673 + 35.036(area in
square feet)
The regression model
explains variation in
market value due to
size of the home.
It provides better estimates
of market value than
simply using the average.
Figure 9.5
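The fitted equation above can be used directly for prediction. In this sketch the intercept and slope are the values from the slide, while the function name and the example areas are hypothetical.

```python
INTERCEPT = 32673.0    # b0 from the fitted line on the slide
SLOPE = 35.036         # b1: change in market value per extra square foot

def predicted_market_value(area_sq_ft):
    """Predicted market value for a home of the given area."""
    return INTERCEPT + SLOPE * area_sq_ft

for area in (1500, 2000):                  # hypothetical home sizes
    print(area, "sq ft ->", round(predicted_market_value(area), 2))
```

The slope is the predicted change in market value for each additional square foot; the intercept is the value of the line at an area of zero and is not meaningful on its own for real homes.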
Three Important Questions
To examine how useful or effective the line is in
summarizing the relationship between x and y, we
consider the following three questions.
1. Is a line an appropriate way to summarize the
relationship between the two variables?
2. Are there any unusual aspects of the data set
that we need to consider before proceeding to
use the regression line to make predictions?
3. If we decide that it is reasonable to use the
regression line as a basis for prediction, how
accurate can we expect predictions based on the
regression line to be?
Data Analysis (Quantitative Vs Qualitative)
Quantitative research techniques generate a mass of
numbers that need to be summarized, described and
analyzed.
Characteristics of the data may be described and explored
by drawing graphs and charts, doing cross tabulations and
calculating means and standard deviations.
On the other hand, in qualitative research the mass of
words generated by interviews or observational data
needs to be described and summarized.
The question may require the researchers to seek
relationships between various themes that have been
identified, or to relate behavior or ideas to biographical
characteristics of respondents such as age or gender.
Implications for policy or practice may be derived from the
data, or interpretation sought of puzzling findings from
previous studies.
Qualitative data analysis
Analysis of qualitative data usually goes
through some or all of the following stages
(though the order may vary)
Familiarization with the data through review, reading,
listening etc.
Transcription of tape recorded material
Organization and indexing of data for easy retrieval and
identification
Anonymizing of sensitive data
Coding (may be called indexing)
Identification of themes
Re-coding
Qualitative data analysis …
Development of provisional categories
Exploration of relationships between categories
Refinement of themes and categories
Development of theory and incorporation of pre-existing
knowledge
Testing of theory against the data
Report writing, including excerpts from original data if
appropriate (e.g. quotes from interviews)
It isn’t always necessary to go through all the
stages above, just as it isn’t always necessary
to use multivariate modelling in statistics!
Theories and methods in qualitative data analysis
Qualitative data refers to non-numeric information
such as interview transcripts, notes, video and audio
recordings, images and text documents.
There is no one right way to analyze qualitative
data, and there are several approaches available.
However, there are particular ‘schools of
thought’, or theoretical approaches to qualitative
analysis, which it is important to be familiar with,
both for designing your own research and for critically
evaluating qualitative research evidence.
The particular approach you choose for any given
study will depend on many factors, such as the
research question, the time you have available and
Methods of qualitative data analysis
Qualitative data analysis can be divided into
the following five categories:
1. Content analysis. This refers to the
process of categorizing verbal or behavioral
data to classify, summarize and tabulate the
data.
2. Narrative analysis. This method involves
the reformulation of stories presented by
respondents, taking into account the context of each
case and the different experiences of each respondent.
In other words, narrative analysis is the
revision of primary qualitative data by re-
Methods of qualitative data analysis …
3. Discourse analysis. A method of analysis of
naturally occurring talk and all types of written
text.
4. Framework analysis. This is a more advanced
method that consists of several stages
such as familiarization, identifying a thematic
framework, coding, charting, mapping and
interpretation.
5. Grounded theory. This method of qualitative
data analysis starts with an analysis of a
single case to formulate a theory. Then, additional
cases are examined to see if they con-
Framework Analysis
A second, more recent approach to qualitative analysis
that is gaining popularity, especially in health-related
research, is Framework Analysis (Ritchie and
Spencer, 1994).
In contrast to grounded theory, Framework Analysis
was explicitly developed in the context of applied
policy research.
Applied research aims to meet specific information
needs and provide outcomes or recommendations,
often within a short timescale.
The benefit of Framework Analysis is that it provides
systematic and visible stages to the analysis process,
so that funders and others can be clear about the
stages by which the results have been obtained from
the data.
Key stages of Framework Analysis
· Familiarization
· Identifying a thematic framework
· Indexing
· Charting
· Mapping and Interpretation
Stages in Framework Analysis…
Familiarization: whole or partial transcription
and reading of the data.
Identifying a thematic framework: this is
the initial coding framework which is developed
both from a priori issues and from
emerging issues from the familiarization
stage.
This thematic framework should be developed
and refined during subsequent stages.
Stages in Framework Analysis…
Indexing: the process of applying the thematic
framework to the data, using numerical
or textual codes to identify specific pieces of
data which correspond to differing themes
(this is more commonly called coding in
other qualitative analysis approaches).
Charting: using headings from the thematic
framework to create charts of your data so
that you can easily read across the whole
dataset.
Charts can be either thematic for each theme
across all respondents (cases) or by case for
each respondent across all themes.
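Indexing and charting can be pictured as building a small lookup structure from coded excerpts. The sketch below is only an illustration of that idea, not part of the Framework Analysis method itself: the respondents, theme codes, excerpts and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical indexed data: (respondent, theme code, excerpt)
indexed_excerpts = [
    ("R1", "access", "The clinic is two hours away."),
    ("R1", "cost", "I cannot afford the medication."),
    ("R2", "access", "Transport is only available on market days."),
    ("R2", "trust", "I prefer the traditional healer."),
]

def build_chart(excerpts):
    """Chart by case: {respondent: {theme: [excerpts]}}."""
    chart = defaultdict(lambda: defaultdict(list))
    for respondent, theme, excerpt in excerpts:
        chart[respondent][theme].append(excerpt)
    return chart

def thematic_view(chart, theme):
    """Thematic chart: one theme read across all respondents."""
    return {respondent: themes.get(theme, []) for respondent, themes in chart.items()}

chart = build_chart(indexed_excerpts)
print(dict(chart["R1"]))                   # by case: all themes for R1
print(thematic_view(chart, "access"))      # thematic: "access" for every case
```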
Stages in Framework Analysis…
Mapping and Interpretation: this means
searching for patterns, associations, concepts,
and explanations in your data, aided by
visual displays and plots.
Ritchie and Spencer (1994) suggest that at
this stage, the qualitative analyst might be
aiming to define concepts, map the range and
nature of phenomena, create typologies, find
associations within the data, provide explanations
or develop strategies.
Grounded Theory
Grounded Theory is a methodology; in other words, it is a
way of thinking about conceptualizing data.
Grounded Theory evolved out of research by sociologists
Glaser and Strauss (1967).
Glaser and Strauss were concerned to outline an inductive
method of qualitative research which would allow social
theory to be generated systematically from data.
That is, theories would be ‘grounded’ in rigorous empirical
research, rather than produced in the abstract.
Here theory means a generalizable idea or concept,
gleaned from data.
Grounded Theory analysis is inductive, in that the resulting
theory ‘emerges’ from the data through a process of
rigorous and structured analysis.
Grounded Theory …
The researcher typically goes through several
procedures.
These procedures are not linear stages; rather,
the process of grounded theory is cumulative and
can involve frequent revisiting of data in light of
the new analytical ideas that emerge as data collection
and analysis progress.
Grounded Theory analysis requires ‘theoretical
sensitivity’.
This is described as an ability ‘to see the research
situation and its associated data in new ways, and
to explore the data’s potential for developing theory’
Steps in Grounded Theory Analysis
Open coding (initial familiarization with the data)
delineation of emergent concepts
conceptual coding (using emergent concepts)
refinement of conceptual coding schemes
clustering of concepts to form analytical categories
searching for core categories
core categories lead to identification of core theory
testing of emerging theory by reference to other
research and to social/cultural/economic factors
that affect the area of study.