Reliability and Validity

The document discusses the importance of reliability and validity in questionnaire design, highlighting that the main goals are to gather relevant information and ensure consistent measurements. It outlines various methods to assess reliability, including test-retest, alternate-form, and internal consistency, as well as different forms of validity such as face, content, criterion, and construct validity. The document emphasizes that reliability and validity are crucial for ensuring that survey instruments accurately measure what they intend to measure.

Reliability & Validity
Goals in questionnaire design

Warwick and Lininger (1975) point out that there are two basic
goals in questionnaire design:

- To obtain information relevant to the purpose of the survey
- To collect this information with maximal RELIABILITY and
  VALIDITY

How can a researcher be sure that the data-gathering instrument
being used will measure what it is supposed to measure, and will
do so in a consistent manner?
Reliability

- The degree of stability exhibited when a measurement is
  repeated under identical conditions
- Reliability is the consistency of your measurement, or the
  degree to which an instrument measures the same way each time
  it is used under the same conditions with the same subjects.
  In short, it is the repeatability of your measurement. A
  measure is considered reliable if a person's score on the same
  test given twice is similar.
Assessment of reliability

Reliability is assessed in three forms:
- Test-retest reliability
- Alternate-form reliability
- Internal consistency reliability
Test-retest reliability

- Measured by having the same respondents complete a survey at
  two different points in time to see how stable the responses
  are
- Usually quantified with a correlation coefficient (r value)
- If the two administrations of the instrument (questionnaire)
  give the same results, the reliability coefficient will be one
- Normally, the correlation of measurements across time will be
  less than perfect because of the different experiences and
  attitudes respondents have encountered since the first test
- In general, r values are considered good if r >= 0.70
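As a sketch of how this is computed, the Pearson correlation between two administrations of the same test can be calculated directly from its definition. The respondent scores below are hypothetical illustration data, not taken from these slides:

```python
# Pearson correlation between two administrations of the same test.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

first = [12, 15, 9, 20, 17]    # scores at time 1 (hypothetical)
second = [13, 14, 10, 19, 18]  # same respondents at time 2 (hypothetical)
r = pearson_r(first, second)
print(round(r, 4))  # values of r >= 0.70 are considered good
```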
Test-retest reliability

- You can test-retest specific questions or the entire survey
  instrument
- Be careful about test-retest with items or scales that measure
  variables likely to change over a short period of time, such
  as energy, happiness, or anxiety
- If you do it, make sure that you test-retest over very short
  periods of time
Problems with test-retest reliability

- A potential problem with test-retest is the practice effect
  – Individuals become familiar with the items and simply answer
    based on their memory of the last answer
- Researchers may not have the resources for multiple
  administrations
- What effect does this have on your reliability estimates?
  It inflates the reliability estimate
Alternate-form reliability

- Unlike the retest method, alternate-form reliability requires
  two similar tests given to the same respondents, instead of
  the same test given twice
- Use differently worded forms to measure the same attribute
- Questions or responses are reworded, or their order is
  changed, to produce two items that are similar but not
  identical
Alternate-form vs. retest

- The alternate-form method is viewed as superior to the retest
  method because a respondent's memory of test items is not as
  likely to play a role in the data received
- In practice, however, it is difficult to develop similar test
  items that are consistent in the measurement of a specific
  phenomenon
Alternate-form reliability

- Be sure that the two items address the same aspect of behavior
  with the same vocabulary and the same level of difficulty
  – Items should differ in wording only
- It is common to simply change the order of the response
  alternatives
  – This forces respondents to read the response alternatives
    carefully and thus reduces the practice effect
Example: Assessment of depression

Circle one item.

Version A:
During the past 4 weeks, I have felt downhearted:
  Every day    1
  Some days    2
  Never        3

Version B:
During the past 4 weeks, I have felt downhearted:
  Never        1
  Some days    2
  Every day    3
Alternate-form reliability

- You could also change the actual wording of the question
  – Be careful to make sure that the two items are equivalent
  – Items with different degrees of difficulty do not measure
    the same attribute
  – What might they measure instead? Reading comprehension or
    cognitive function
Example: Assessment of loneliness

Version A:
How often in the past month have you felt alone in the world?
  Every day
  Some days
  Occasionally
  Never

Version B:
During the past 4 weeks, how often have you felt a sense of
loneliness?
  All of the time
  Sometimes
  From time to time
Example of nonequivalent item rewording

Version A:
When your boss blames you for something you did not do, how
often do you stick up for yourself?
  All the time
  Some of the time
  None of the time

Version B:
When presented with difficult professional situations where a
superior censures you for an act for which you are not
responsible, how frequently do you respond in an assertive way?
  All of the time
  Some of the time
  None of the time
Internal consistency (split-half method)

- In this method the total number of items is divided into two
  halves and the correlation is taken between the two halves
- The Spearman-Brown prophecy formula then estimates the
  full-test reliability:

    P_XX = 2r / (1 + r)

  where r is the correlation between the two halves
PROBLEM

I am a graduate student who is conducting a research project for
my thesis. I can't wait to graduate! I would like to find out
whether my instrument is reliable in order to proceed with my
experiment. I have heard about using alternate forms and
test-retest to estimate reliability. But due to lack of
resources, I cannot afford to write two tests or administer the
same test at two different times. With only one test result,
what should I do to evaluate the reliability of my measurement
tool?
Split-half method

- Split-half can be viewed as a one-test equivalent to the
  alternate-form and test-retest methods, which use two tests
- In split-half, you treat one single test as two tests by
  dividing the items into two subsets
- Reliability is estimated by computing the correlation between
  the two subsets
- For example, assume that you calculate the subtotal scores of
  all even-numbered items and the subtotal of all odd-numbered
  items, then calculate the correlation of these two sets of
  scores to check the internal consistency
- If the correlation of the two sets of scores is low, it
  implies that some people received high scores on odd items but
  low scores on even items, while other people received high
  scores on even items but low scores on odd items. In other
  words, the response pattern is inconsistent
Example: Calculate split-half reliability for the following data
of 5 students on 4 test items

  Student   Q1   Q2   Q3   Q4   X (Q1+Q2)   Y (Q3+Q4)
  1          2    1    1    3        3           4
  2          6    4    5    6       10          11
  3          3    2    1    1        5           2
  4          6    3    3    3        9           6
  5          6    4    4    3       10           7

  Split into halves   Correlation coefficient   Split-half reliability
  1,2 and 3,4                0.7863                    0.8763
  1,3 and 2,4                0.9011                    0.9278
  1,4 and 2,3                0.9423                    0.9691

For the first split (X = Q1+Q2, Y = Q3+Q4):

    r = S_XY / sqrt(S²_X · S²_Y) = 8.5 / sqrt((10.3)(11.5)) = 0.7863

    P_XX = 2(0.7863) / (1 + 0.7863) = 0.8763

Average of the split-half reliability coefficients = 0.9244
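As a minimal sketch, the first split can be reproduced in code using the slide's data. Note that because of rounding in the slide's intermediate figures, the values computed here (r ≈ 0.781, reliability ≈ 0.877) differ slightly from the printed 0.7863 and 0.8763:

```python
# Split-half reliability for the slide's data: halves X = Q1+Q2 and
# Y = Q3+Q4, sample (n-1) variances/covariance, then Spearman-Brown.
scores = [  # Q1, Q2, Q3, Q4 for each of the 5 students
    [2, 1, 1, 3],
    [6, 4, 5, 6],
    [3, 2, 1, 1],
    [6, 3, 3, 3],
    [6, 4, 4, 3],
]
X = [row[0] + row[1] for row in scores]  # first-half totals
Y = [row[2] + row[3] for row in scores]  # second-half totals

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / (n - 1)  # S_XY = 8.5
sx2 = sum((a - mx) ** 2 for a in X) / (n - 1)                   # S²_X = 10.3
sy2 = sum((b - my) ** 2 for b in Y) / (n - 1)                   # S²_Y = 11.5
r = sxy / (sx2 * sy2) ** 0.5   # correlation between the two halves
rel = 2 * r / (1 + r)          # Spearman-Brown full-test reliability
print(round(r, 4), round(rel, 4))
```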
Methods to split items into two halves

- Assign the odd-numbered items to one half and the
  even-numbered items to the other half of the test
- Divide the items at the center (discarding the center item, if
  necessary) into two halves

Drawback:
  The correlation between the two halves depends on the method
  used to divide the items
Solution:
  Calculate correlation coefficients between every possible
  division of the test into two halves and average these
  correlation coefficients
Problem:
  With a large number of test items it is difficult to calculate
  the correlation for every possible split of the test items
  into two halves
Solution:
  Calculate Cronbach's alpha
Internal consistency (Cronbach's alpha)

- The most common internal consistency measure is Cronbach's
  alpha, which is usually interpreted as the mean of all
  possible split-half coefficients
- Cronbach's alpha is a generalization of an earlier form of
  estimating internal consistency, the Kuder-Richardson Formula
  20 (for test items with only two possible outcomes, e.g.
  Yes/True or No/False)
  – Interpret it like a correlation coefficient (>= 0.70 is good)
Example: Calculate Cronbach's alpha for the following data of
5 students on 4 test items

  Student   Q1   Q2   Q3   Q4   TOTAL
  1          2    1    1    3      7
  2          6    4    5    6     21
  3          3    2    1    1      7
  4          6    3    3    3     15
  5          6    4    4    3     17

Per-item sums of squared deviations, Σ(x − x̄)²:
  Q1: 15.2   Q2: 6.8   Q3: 12.8   Q4: 12.8
Per-item variances S_i²:
  Q1: 3.04   Q2: 1.36   Q3: 2.56   Q4: 2.56   (ΣS_i² = 9.52)
Variance of the totals: S²_total = 31.04

    α = [k / (k − 1)] · (1 − ΣS_i² / S²_total)
      = (4 / 3) · (1 − 9.52 / 31.04) = 0.9244
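The same calculation can be sketched in a few lines of Python; population (divide-by-n) variances are used here, matching the slide's worked figures:

```python
# Cronbach's alpha for the slide's 5-student, 4-item data.
scores = [  # Q1, Q2, Q3, Q4 for each of the 5 students
    [2, 1, 1, 3],
    [6, 4, 5, 6],
    [3, 2, 1, 1],
    [6, 3, 3, 3],
    [6, 4, 4, 3],
]

def pvar(xs):
    """Population variance (divide by n, as on the slide)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # per-item score lists
totals = [sum(row) for row in scores]  # per-student totals
alpha = (k / (k - 1)) * (1 - sum(pvar(i) for i in items) / pvar(totals))
print(round(alpha, 4))  # 0.9244, matching the slide
```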
Example: Calculate Cronbach's alpha / KR-20 for the following
data of 5 students on 4 test items (TRUE/FALSE)

  Student   Q1   Q2   Q3   Q4   TOTAL
  1          1    1    0    0      2
  2          1    0    0    0      1
  3          1    1    1    1      4
  4          1    1    1    1      4
  5          0    1    1    0      2

Per-item sums of squared deviations, Σ(x − x̄)²:
  Q1: 0.8   Q2: 0.8   Q3: 1.2   Q4: 1.2
Per-item variances S_i²:
  Q1: 0.16   Q2: 0.16   Q3: 0.24   Q4: 0.24   (ΣS_i² = 0.8)
Variance of the totals: S²_total = 1.44

    α = [k / (k − 1)] · (1 − ΣS_i² / S²_total)
      = (4 / 3) · (1 − 0.8 / 1.44) = 0.5926

For binary items, S_i² = p_i·q_i, where p is the proportion
answering the item correctly and q = 1 − p:

  p:    0.8    0.8    0.6    0.4
  q:    0.2    0.2    0.4    0.6
  pq:   0.16   0.16   0.24   0.24   (Σp_i·q_i = 0.8)

    KR-20 = [k / (k − 1)] · (1 − Σp_i·q_i / S²_total)
          = (4 / 3) · (1 − 0.8 / 1.44) = 0.5926

As the value of the reliability coefficient is less than the
recommended standard of 0.70, the test is not reliable.
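A sketch of the KR-20 computation for this binary data; since each 0/1 item's population variance equals p·q, KR-20 and Cronbach's alpha coincide here:

```python
# KR-20 for the slide's true/false (0/1) data. p is the proportion
# answering each item correctly, q = 1 - p.
scores = [  # Q1, Q2, Q3, Q4 for each of the 5 students
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
n = len(scores)
k = len(scores[0])
totals = [sum(row) for row in scores]
mean_t = sum(totals) / n
s2_total = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
pq = [(sum(item) / n) * (1 - sum(item) / n) for item in zip(*scores)]
kr20 = (k / (k - 1)) * (1 - sum(pq) / s2_total)
print(round(kr20, 4))  # 0.5926 -- below 0.70, so the test is not reliable
```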
Validity

Definition

- How well a survey measures what it sets out to measure
Assessment of validity

Validity is measured in four forms:
- Face validity
- Content validity
- Criterion validity
- Construct validity
Face validity

- A cursory review of survey items by untrained judges
  – e.g. showing the survey to untrained individuals to see
    whether they think the items look okay
  – Very casual, soft
  – Many don't really consider this a measure of validity at all
Content validity

- A subjective measure of how appropriate the items seem to a
  set of reviewers who have some knowledge of the subject matter
  – Usually consists of an organized review of the survey's
    contents to ensure that it contains everything it should and
    doesn't include anything that it shouldn't
  – Still very qualitative
Content validity (2)

- Who might you include as reviewers?
- How would you incorporate these two assessments of validity
  (face and content) into your survey instrument design process?
Criterion validity

- A measure of how well one instrument stacks up against another
  instrument or predictor
  – Concurrent: assess your instrument against a "gold standard"
  – Predictive: assess the ability of your instrument to
    forecast future events, behavior, attitudes, or outcomes
  – Assess with a correlation coefficient
Construct validity

- The most valuable and most difficult measure of validity
- Basically, it is a measure of how meaningful the scale or
  instrument is when it is in practical use
Construct validity (2)

- Convergent: implies that several different methods for
  obtaining the same information about a given trait or concept
  produce similar results
  – Evaluation is analogous to alternate-form reliability,
    except that it is more theoretical and requires a great deal
    of work, usually by multiple investigators with different
    approaches
Construct validity (3)

- Divergent: the ability of a measure to estimate the underlying
  truth in a given area; the measure must be shown not to
  correlate too closely with similar but distinct concepts or
  traits
