Presentation Outline

This document outlines the key aspects of ensuring the validity and reliability of measurement instruments. It discusses three types of validity: content validity, criterion-related validity (which includes concurrent and predictive validity), and construct validity. It also describes several aspects of reliability, including test-retest reliability, parallel-form reliability, internal consistency reliability (measured by Cronbach's alpha and split-half reliability), and inter-rater reliability. Establishing the validity and reliability of measures is important to ensure the instrument is accurately measuring the intended concept.


PRESENTATION OUTLINE

I – OVERVIEW: GOODNESS OF MEASURES

II – RELIABILITY
a. Stability of Measures
Test-Retest Reliability
Parallel Form Reliability
b. Internal Consistency of Measures
Interitem Consistency Reliability
Split-Half Reliability

III – VALIDITY
a. Content Validity
Face Validity
b. Criterion-Related Validity
Concurrent Validity
Predictive Validity
c. Construct Validity
Convergent Validity
Discriminant Validity

SOURCE MATERIAL:
“Research Methods for Business: A Skill Building Approach 4th Edition” by Uma Sekaran
GOODNESS OF MEASURES

It is important to make sure that the instrument that we develop to measure a particular concept is indeed
accurately measuring the variable, and that in fact, we are actually measuring the concept that we set out
to measure. This ensures that in operationally defining perceptual and attitudinal variables, we have not
overlooked some important dimensions and elements or included some irrelevant ones. The scales
developed could often be imperfect, and errors are prone to occur in the measurement of attitudinal
variables. The use of better instruments will ensure more accuracy in results, which in turn, will enhance
the scientific quality of the research. Hence, in some way, we need to assess the "goodness" of the
measures developed. That is, we need to be reasonably sure that the instruments we use in our research do
indeed measure the variables they are supposed to, and that they measure them accurately.

Let us now examine how we can ensure that the measures developed are reasonably good. First, an item
analysis of the responses to the questions tapping the variable is done, and then the reliability and validity
of the measures are established, as described below.

ITEM ANALYSIS is done to see if the items in the instrument belong there or not. Each item is
examined for its ability to discriminate between those subjects whose total scores are high, and those with
low scores. In item analysis, the means between the high-score group and the low-score group are tested
to detect significant differences through the t-values. The items with high t-values (i.e., the highly
discriminating items) are then included in the instrument. Thereafter,
tests for the reliability of the instrument are done and the validity of the measure is established.
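A minimal sketch of this item analysis, assuming item responses are held in a pandas DataFrame with one column per item; the top/bottom 27% grouping and the Welch t-test are common conventions, not prescribed by the source:

import pandas as pd
from scipy import stats

def item_discrimination(responses: pd.DataFrame, tail: float = 0.27) -> pd.Series:
    # Total score per respondent, then form high- and low-scoring groups.
    total = responses.sum(axis=1)
    high = responses[total >= total.quantile(1 - tail)]
    low = responses[total <= total.quantile(tail)]
    # t-value of each item comparing the two groups; larger values mean the
    # item discriminates better between high and low scorers.
    t_values = {
        item: stats.ttest_ind(high[item], low[item], equal_var=False).statistic
        for item in responses.columns
    }
    return pd.Series(t_values).sort_values(ascending=False)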

Very briefly, reliability tests how consistently a measuring instrument measures whatever concept it is
measuring. Validity tests how well an instrument that is developed measures the particular concept it is
intended to measure. In other words, validity is concerned with whether we measure the right concept,
and reliability with stability and consistency of measurement. Validity and reliability of the measure attest
to the scientific rigor that has gone into the research study. These two criteria will now be discussed.

RELIABILITY

The reliability of a measure indicates the extent to which it is without bias (error free) and hence ensures
consistent measurement across time and across the various items in the instrument. In other words, the
reliability of a measure is an indication of the stability and consistency with which the instrument
measures the concept and helps to assess the "goodness" of a measure.

Stability of Measures
The ability of a measure to remain the same over time—despite uncontrollable testing conditions or the
state of the respondents themselves—is indicative of its stability and low vulnerability to changes in the
situation. This attests to its "goodness" because the concept is stably measured, no matter when it is done.
Two tests of stability are test–retest reliability and parallel-form reliability.

Test–Retest Reliability

The reliability coefficient obtained with a repetition of the same measure on a second occasion is
called test–retest reliability. That is, when a questionnaire containing some items that are
supposed to measure a concept is administered to a set of respondents now, and again to the same
respondents, say several weeks to 6 months later, then the correlation between the scores obtained
at the two different times from one and the same set of respondents is called the test–retest
coefficient. The higher it is, the better the test–retest reliability, and consequently, the stability of
the measure across time.

Parallel-Form Reliability
When responses on two comparable sets of measures tapping the same construct are highly
correlated, we have parallel-form reliability. Both forms have similar items and the same
response format, the only changes being the wordings and the order or sequence of the questions.
What we try to establish here is the error variability resulting from wording and ordering of the
questions. If two such comparable forms are highly correlated (say .8 and above), we may be
fairly certain that the measures are reasonably reliable, with minimal error variance caused by
wording, ordering, or other factors.
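Both tests of stability come down to correlating two score vectors obtained from the same respondents: scores from two administrations of the same form (test–retest) or from two comparable forms (parallel-form). A minimal sketch, with made-up scores for illustration:

import numpy as np

def stability_coefficient(scores_1: np.ndarray, scores_2: np.ndarray) -> float:
    # Pearson correlation between the two sets of scores; the closer to 1,
    # the more stable the measure.
    return float(np.corrcoef(scores_1, scores_2)[0, 1])

# Hypothetical scores of five respondents at time 1 and time 2 (or on form A and form B).
time_1 = np.array([12, 18, 25, 30, 22])
time_2 = np.array([14, 17, 27, 29, 20])
print(stability_coefficient(time_1, time_2))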

Internal Consistency of Measures

The internal consistency of measures is indicative of the homogeneity of the items in the measure that tap
the construct. In other words, the items should "hang together" as a set, and be capable of independently
measuring the same concept so that the respondents attach the same overall meaning to each of the items.
This can be seen by examining if the items and the subsets of items in the measuring instrument are
correlated highly. Consistency can be examined through the inter-item consistency reliability and split-
half reliability tests.

Interitem Consistency Reliability

This is a test of the consistency of respondents' answers to all the items in a measure. To the
degree that items are independent measures of the same concept, they will be correlated with one
another. The most popular test of interitem consistency reliability is Cronbach's coefficient
alpha (Cronbach's alpha; Cronbach, 1946), which is used for multipoint-scaled items, and the
Kuder–Richardson formulas (Kuder & Richardson, 1937), used for dichotomous items. The
higher the coefficients, the better the measuring instrument.
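Cronbach's alpha can be computed directly from the item scores using its standard formula. A minimal sketch, assuming a two-dimensional array with one row per respondent and one column per multipoint-scaled item:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    # Standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)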

Split-Half Reliability

Split-half reliability reflects the correlations between two halves of an instrument. The estimates
vary depending on how the items in the measure are split into two halves. Split-half
reliabilities can be higher than Cronbach's alpha only when there is more than one underlying
response dimension tapped by the measure and certain other conditions are met as well (for
complete details, refer to Campbell, 1976). Hence, in almost all cases, Cronbach's alpha can be
considered a perfectly adequate index of interitem consistency reliability.
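A split-half estimate correlates respondents' scores on two halves of the instrument. The odd/even split below is only one of the possible splits, and the Spearman-Brown step-up used to project the half-length correlation to the full instrument is a standard convention not discussed in the text:

import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    items = np.asarray(items, dtype=float)
    half_1 = items[:, 0::2].sum(axis=1)   # odd-numbered items
    half_2 = items[:, 1::2].sum(axis=1)   # even-numbered items
    r = np.corrcoef(half_1, half_2)[0, 1]
    # Spearman-Brown correction estimates the reliability of the full-length instrument.
    return 2 * r / (1 + r)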

It should be noted that the consistency of the judgment of several raters on how they view a
phenomenon or interpret some responses is termed interrater reliability, and should not be
confused with the reliability of a measuring instrument. As we had noted earlier, interrater
reliability is especially relevant when the data are obtained through observations, projective tests,
or unstructured interviews, all of which are liable to be subjectively interpreted.
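The text does not prescribe a statistic for interrater reliability; one common choice for two raters giving categorical judgments is Cohen's kappa, which corrects raw agreement for chance. A small illustration with hypothetical codings:

from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical judgments by two raters on the same five observations.
rater_1 = ["agree", "agree", "neutral", "disagree", "agree"]
rater_2 = ["agree", "neutral", "neutral", "disagree", "agree"]

# Kappa near 1 indicates strong agreement beyond chance; near 0, agreement no better than chance.
print(cohen_kappa_score(rater_1, rater_2))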

It is important to note that reliability is a necessary but not sufficient condition of the test of
goodness of a measure. For example, one could measure a concept very reliably, establishing high
stability and consistency, but it may not be the concept that one had set out to measure. Validity
ensures the ability of a scale to measure the intended concept. We will now discuss the concept of
validity.

VALIDITY

We are now going to examine the validity of the measuring instrument itself. That is, when we ask a set
of questions (i.e., develop a measuring instrument) with the hope that we are tapping the concept, how
can we be reasonably certain that we are indeed measuring the concept we set out to measure and not
something else? This can be determined by applying certain validity tests. Several types of validity tests
are used to test the goodness of measures, and writers use different terms to denote them. For the sake of
clarity, we may group validity tests under three broad headings: content validity, criterion-related validity,
and construct validity.
Content Validity

Content validity ensures that the measure includes an adequate and representative set of items
that tap the concept. The more the scale items represent the domain or universe of the concept
being measured, the greater the content validity. To put it differently, content validity is a
function of how well the dimensions and elements of a concept have been delineated.

A panel of judges can attest to the content validity of the instrument. Kidder and Judd (1986) cite
the example where a test designed to measure degrees of speech impairment can be considered as
having validity if it is so evaluated by a group of expert judges (i.e., professional speech
therapists).

Face validity is considered by some as a basic and very minimum index of content validity.
Face validity indicates that the items that are intended to measure a concept do, on the face of it,
look like they measure the concept. Some researchers do not see fit to treat face validity as a
valid component of content validity.

Criterion-Related Validity

Criterion-related validity is established when the measure differentiates individuals on a criterion
it is expected to predict. This can be done by establishing concurrent validity or predictive
validity, as explained below.

Concurrent validity is established when the scale discriminates individuals who are known to be
different; that is, they should score differently on the instrument as in the example that follows.

If a measure of work ethic is developed and administered to a group of welfare recipients, the
scale should differentiate those who are enthusiastic about accepting a job and glad of an
opportunity to be off welfare, from those who would not want to work even when offered a job.
Obviously, those with high work ethic values would not want to be on welfare and would yearn
for employment to be on their own. Those who are low on work ethic values, on the other hand,
might exploit the opportunity to survive on welfare for as long as possible, deeming work to be a
drudgery. If both types of individuals have the same score on the work ethic scale, then the test
would not be a measure of work ethic, but of something else.

Predictive validity indicates the ability of the measuring instrument to differentiate among
individuals with reference to a future criterion. For example, if an aptitude or ability test
administered to employees at the time of recruitment is to differentiate individuals on the basis of
their future job performance, then those who score low on the test should be poor performers and
those with high scores good performers.
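Predictive validity of such an aptitude test could be checked by correlating test scores at recruitment with a later measure of job performance; the figures below are hypothetical:

import numpy as np
from scipy import stats

aptitude_at_hire = np.array([55, 62, 70, 48, 81, 66])          # hypothetical test scores at recruitment
performance_later = np.array([3.1, 3.4, 4.0, 2.8, 4.5, 3.6])   # hypothetical performance ratings later on

# A substantial positive correlation supports predictive validity.
r, p = stats.pearsonr(aptitude_at_hire, performance_later)
print(f"r = {r:.2f}, p = {p:.3f}")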
Construct Validity

Construct validity testifies to how well the results obtained from the use of the measure fit the
theories around which the test is designed. This is assessed through convergent and discriminant
validity, which are explained below.

Convergent validity is established when the scores obtained with two different instruments
measuring the same concept are highly correlated.

Discriminant validity is established when, based on theory, two variables are predicted to be
uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.

Validity can thus be established in different ways. Published measures for various concepts
usually report the kinds of validity that have been established for the instrument, so that the user
or reader can judge the "goodness" of the measure. Table 9.1 summarizes the kinds of validity
discussed here.

Some of the ways in which the above forms of validity can be established are through (1)
correlational analysis (as in the case of establishing concurrent and predictive validity or
convergent and discriminant validity), (2) factor analysis, a multivariate technique that would
confirm the dimensions of the concept that have been operationally defined, as well as indicate
which of the items are most appropriate for each dimension (establishing construct validity), and
(3) the multitrait-multimethod matrix of correlations derived from measuring concepts by
different forms and different methods, additionally establishing the robustness of the measure.
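As an illustration of the correlational approach in (1), convergent validity would show up as a high correlation between two instruments measuring the same concept, and discriminant validity as a near-zero correlation with a theoretically unrelated variable. The score vectors below are hypothetical:

import numpy as np

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

scale_a = np.array([10, 14, 18, 22, 9, 16])    # instrument A, concept X (hypothetical)
scale_b = np.array([11, 15, 17, 23, 10, 15])   # instrument B, same concept X (hypothetical)
unrelated = np.array([3, 9, 2, 7, 8, 4])       # theoretically unrelated variable (hypothetical)

print("convergent:", corr(scale_a, scale_b))      # expected to be high
print("discriminant:", corr(scale_a, unrelated))  # expected to be near zero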

In sum, the goodness of measures is established through the different kinds of validity and
reliability depicted in Figure 9.1. The results of any research can only be as good as the measures
that tap the concepts in the theoretical framework. We need to use well-validated and reliable
measures to ensure that our research is scientific. Fortunately, measures have been developed for
many important concepts in organizational research and their psychometric properties (i.e., the
reliability and validity) established by the developers. Thus, researchers can use the instruments
already reputed to be "good", rather than laboriously develop their own measures. When using
these measures, however, researchers should cite the source (i.e., the author and reference) so that
the reader can seek more information if necessary.

It is not unusual that two or more equally good measures are developed for the same concept. For
example, there are several different instruments for measuring the concept of job satisfaction.
One of the most frequently used scales for the purpose, however, is the Job Descriptive Index
(JDI) developed by Smith, Kendall, and Hulin (1969). When more than one scale exists for any
variable, it is preferable to use the measure that has better reliability and validity and is also more
frequently used.

At times, we may also have to adapt an established measure to suit the setting. For example, a
scale that is used to measure job performance, job characteristics, or job satisfaction in the
manufacturing industry may have to be modified slightly to suit a utility company or a health care
organization. The work environment in each case is different and the wordings in the instrument
may have to be suitably adapted. However, in doing this, we are tampering with an established
scale, and it would be advisable to test it for the adequacy of the validity and reliability afresh.

A sample of a few measures used to tap some frequently researched concepts in the management
and marketing areas is provided in the Appendix to this chapter.
Scenario A.

Product Ratings
                A      B          C
Respondent 1    Fair   Excellent  Poor
Respondent 2    Fair   Fair       Poor
Respondent 3    Fair   Excellent  Poor
Respondent 4    Poor   Excellent  Poor
Respondent 5    Fair   Excellent  Poor

Scenario B.

Product Ratings
                A          B          C
Respondent 1    Poor       Excellent  Excellent
Respondent 2    Excellent  Fair       Fair
Respondent 3    Fair       Poor       Fair
Respondent 4    Fair       Poor       Poor
Respondent 5    Poor       Excellent  Poor
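
The source does not explain what the two scenarios are meant to show. Assuming they contrast respondents who rate the products consistently (Scenario A) with respondents who disagree (Scenario B), one way to quantify the contrast is the average pairwise correlation between respondents' rating profiles, after coding Poor/Fair/Excellent as 1/2/3:

import numpy as np
from itertools import combinations

coding = {"Poor": 1, "Fair": 2, "Excellent": 3}

scenario_a = [["Fair", "Excellent", "Poor"],
              ["Fair", "Fair", "Poor"],
              ["Fair", "Excellent", "Poor"],
              ["Poor", "Excellent", "Poor"],
              ["Fair", "Excellent", "Poor"]]

scenario_b = [["Poor", "Excellent", "Excellent"],
              ["Excellent", "Fair", "Fair"],
              ["Fair", "Poor", "Fair"],
              ["Fair", "Poor", "Poor"],
              ["Poor", "Excellent", "Poor"]]

def mean_pairwise_correlation(ratings):
    # Convert the verbal ratings to numbers and correlate every pair of respondents.
    scores = np.array([[coding[r] for r in row] for row in ratings], dtype=float)
    pairs = [np.corrcoef(scores[i], scores[j])[0, 1]
             for i, j in combinations(range(len(scores)), 2)]
    return float(np.mean(pairs))

print("Scenario A:", mean_pairwise_correlation(scenario_a))   # close to 1: respondents largely agree
print("Scenario B:", mean_pairwise_correlation(scenario_b))   # much lower: respondents disagree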
