Validity and Reliability: Lesson 3

VALIDITY AND RELIABILITY

Commonly used terms…

“She has a valid point”

“My car is unreliable”

…in science…
“The conclusion of the study was not valid”

“The findings of the study were not reliable.”


• What’s validity?
Some definitions…
• Validity: the soundness or appropriateness of a test or instrument in measuring what it is designed to measure; it denotes the extent to which an instrument is measuring what it is supposed to measure.
• "Validation is inquiry into the soundness of
the interpretations proposed for scores from
a test" Cronbach (1990, p. 145)
• The tool is valid for a particular purpose and
for certain situations.
• For example, if the purpose of a measurement is to determine intelligence, you cannot use this tool to determine anxiety level or a personality disorder.
• So it cannot be used for any other purpose!
• While the definition of validity seems simple, there are several different types of validity that are relevant in the social sciences.
• Each of these types of validity takes a somewhat different approach in assessing the extent to which a measure measures what it purports to.
• Validity was traditionally subdivided into three
categories: criterion-related, content and
construct validity (see Brown 1996, pp. 231-
249).
• 1) Criterion-related validity
   – Concurrent validity
   – Predictive validity
• 2) Content validity
• 3) Construct validity
Criterion-Related Validity
• Criterion validity (or criterion-
related validity) measures how well
one measure predicts an outcome
for another measure.
• Validity is usually determined by comparing two instruments’ ability to predict a similar outcome with a single variable being measured.
• There are two major types of criterion validity: concurrent and predictive.
• 1) Concurrent criterion validity
• 2) Predictive validity
• Concurrent criterion validity is used when the
two instruments are used to measure the
same event at the same time.
• Example: surveys during an election can be said to have concurrent criterion validity if they predict outcomes similar to the actual election results.
CONCURRENT VALIDITY
• The extent to which a procedure correlates with the current behavior of subjects
• Concurrent Validity
– Implies that the test produces similar results to a previously validated test
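A minimal sketch of how this comparison might be computed: scores from a new instrument are correlated with scores from a previously validated one, both administered to the same subjects at the same time. All scores below are invented for illustration.

```python
# Concurrent validity sketch: correlate a new test with an established one.
from scipy.stats import pearsonr

new_test       = [12, 18, 25, 31, 22, 15, 28, 20]  # scores on the new instrument
validated_test = [14, 20, 27, 30, 21, 13, 29, 22]  # scores on the validated instrument

r, p = pearsonr(new_test, validated_test)
print(f"Concurrent validity coefficient: r = {r:.2f} (p = {p:.3f})")
# A strong positive r suggests the new test produces results similar to the
# previously validated test, as described above.
```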
• 2) Predictive validity is used when the instrument is administered, time is allowed to pass, and the scores are then measured against another outcome.
• A test has predictive validity if it accurately predicts what it is supposed to predict.
• For example, TOEFL exhibits predictive validity
for performance in English
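A companion sketch for predictive validity, again with invented numbers: scores from the first administration are correlated with an outcome measured after time has passed.

```python
# Predictive validity sketch: earlier test scores vs. a later outcome.
from scipy.stats import pearsonr

test_scores_t1 = [80, 95, 60, 72, 88, 55, 91, 67]  # instrument administered first
later_outcome  = [75, 92, 58, 70, 85, 60, 89, 64]  # outcome measured after a delay

r, _ = pearsonr(test_scores_t1, later_outcome)
print(f"Predictive validity coefficient: r = {r:.2f}")
```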
2) CONTENT VALIDITY
• A second basic type of validity is content
validity.
• This type of validity has played a major role in
the development and assessment of various
types of tests used in psychology and
especially education.
• Fundamentally, content validity depends on the extent to which an empirical measurement reflects a specific domain of content.
• In psychometrics, content validity
refers to the extent to which a
measure represents all facets of a
given construct.
• For example, a test in arithmetical operations would not be content valid if the test problems focused only on addition, thus neglecting subtraction, multiplication and division.
• For example, a depression scale may lack
content validity if it only assesses the affective
dimension of depression but fails to take into
account the behavioral dimension.
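Content validity is ultimately a matter of expert judgment, but a toy sketch can illustrate the idea of checking whether test items cover every facet of the domain. This is not a procedure from the slides; the facet names and item mapping are hypothetical.

```python
# Toy facet-coverage check for the arithmetic example above.
domain_facets = {"addition", "subtraction", "multiplication", "division"}

# Facet each test item is judged to measure (expert judgment in practice)
item_facets = ["addition", "addition", "addition", "subtraction"]

missing = domain_facets - set(item_facets)
if missing:
    print(f"Content validity concern: uncovered facets: {sorted(missing)}")
# -> Content validity concern: uncovered facets: ['division', 'multiplication']
# A test that samples only some facets under-represents the domain.
```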
3) Construct validity
• To understand the traditional definition of
construct validity, it is first necessary to
understand what a construct is.
• A construct, or psychological construct as it is also called, is an attribute, proficiency, ability, or skill that resides in the human mind and is defined by established theories.
In psychology, a construct is a skill, attribute, or ability (such as intelligence, self-concept, motivation, or aggression) that is based on one or more established theories and can be observed by some type of instrument.
• Construct validity refers to how well a test or
tool measures the constructs that it was
designed to measure.
• In other words, to what extent are the statements we use in the scale to measure depression actually measuring it?
• Construct validity is woven into the theoretical
fabric of the social sciences, and is thus central to
the measurement of abstract theoretical
concepts.
• Indeed, construct validation must be considered within a theoretical context.
• Rosenberg’s self-esteem scale:
theoretically, Rosenberg (1965) has argued that
a student’s level of self-esteem is positively
related to participation in school activities.
• Thus, the theoretical prediction is that the higher the level of self-esteem, the more active the student will be in school-related activities.
• A teacher can administer Rosenberg’s self-
esteem scale to a group of students and can
determine the extent of their involvement in
school activities.

• If the correlation is positive, then this evidence supports the construct validity of Rosenberg’s self-esteem scale.
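A minimal sketch of the correlation step in this example, with fabricated scores: a positive correlation between self-esteem totals and activity counts is consistent with the theoretical prediction.

```python
# Construct validity sketch for the Rosenberg example.
from scipy.stats import pearsonr

self_esteem = [22, 30, 18, 27, 35, 15, 29, 24]  # Rosenberg scale totals
activities  = [ 2,  4,  1,  3,  5,  1,  4,  2]  # school activities per student

r, _ = pearsonr(self_esteem, activities)
print(f"r = {r:.2f}")  # a positive r supports the construct-validity argument
```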
• Another example;
• Someone with depression could report being fatigued, anxious, hopeless, etc., and these symptoms could point towards depression or anxiety, but we cannot determine the severity of depression just by hearing the symptoms, because we need a tool to measure it as closely as possible. Here we are trying to measure the construct called depression.
FACTORS AFFECTING VALIDITY
1. Test-related factors
2. The criterion to which you compare
your instrument may not be well
enough established
3. Intervening events
4. Reliability
• Reliability?
• The consistency of measurements
A RELIABLE TEST
Produces similar scores across
various conditions and situations,
including different evaluators and
testing environments.
RELIABILITY COEFFICIENTS
• The statistic for expressing reliability.
• Expresses the degree of consistency
in the measurement of test scores.
• Denoted by the letter r with two identical subscripts (rxx)
• Reliability coefficients: 0 to 1
• The closer the value is to one (1.00), the higher the assumed reliability.
• 1) Test-retest reliability
• 2) Split-half reliability
• 3) Interrater reliability
1) TEST-RETEST RELIABILITY
• Suggests that subjects tend to obtain the same score when tested at different times.
• Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.
• Example:  A test designed to assess student
learning in psychology could be given to a
group of students twice, with the second
administration perhaps coming a week after
the first.  The obtained correlation coefficient
would indicate the stability of the scores.
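A minimal sketch of this procedure, using invented scores for the two administrations:

```python
# Test-retest reliability sketch: correlate the same test given twice.
import numpy as np

time1 = np.array([70, 85, 60, 92, 77, 66, 88, 73])  # first administration
time2 = np.array([72, 83, 63, 90, 75, 69, 86, 74])  # same test, one week later

r_xx = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r_xx = {r_xx:.2f}")  # close to 1 = stable over time
```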
Split-Half Reliability
• Sometimes referred to as internal
consistency
• Indicates that subjects’ scores on
some trials consistently match
their scores on other trials
– Split-half testing measures reliability. In split-half reliability, a test for a single knowledge area is split into two parts, and then both parts are given to one group of students at the same time. The scores from both parts of the test are correlated.
• Stages of split-half reliability:
• Administer the test to a large group of students (ideally, over about 30).
• Randomly divide the test questions into two parts. For example, separate even questions from odd questions.
• Score each half of the test for each student.
• Find the correlation coefficient for the two halves.
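A sketch of these stages with invented item responses. The final Spearman-Brown step is a standard correction for the halved test length, although the slides do not mention it.

```python
# Split-half reliability sketch: odd/even split, then correlate the halves.
import numpy as np

# rows = students, columns = items (1 = correct, 0 = incorrect)
responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half  = responses[:, 0::2].sum(axis=1)  # items 1, 3, 5, 7
even_half = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)          # Spearman-Brown correction
print(f"Half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```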
INTERRATER RELIABILITY
Involves having two raters independently
observe and record specified behaviors,
such as hitting, crying, yelling, and getting
out of the seat, during the same time period
• Inter-rater reliability is a measure of reliability
used to assess the degree to which different judges
or raters agree in their assessment decisions. 
Inter-rater reliability is useful because human
observers will not necessarily interpret answers
the same way; raters may disagree as to how well
certain responses or material demonstrate
knowledge of the construct or skill being assessed. 
• Example:  Inter-rater reliability might be employed
when different judges are evaluating the degree to
which art portfolios meet certain standards.  Inter-
rater reliability is especially useful when judgments
can be considered relatively subjective.  Thus, the
use of this type of reliability would probably be
more likely when evaluating artwork as opposed to
math problems.
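A sketch of quantifying inter-rater agreement with invented ratings. Percent agreement follows directly from the definition above; Cohen's kappa, a common chance-corrected statistic not named in the slides, is also computed.

```python
# Inter-rater reliability sketch: two raters score the same portfolios.
def cohen_kappa(a, b):
    """Chance-corrected agreement between two raters' label lists."""
    labels = sorted(set(a) | set(b))
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                    # observed agreement
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels) # chance agreement
    return (p_o - p_e) / (1 - p_e)

rater_a = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "pass", "fail", "fail", "pass", "fail", "pass"]

agreement = sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)
print(f"Percent agreement = {agreement:.0%}, kappa = {cohen_kappa(rater_a, rater_b):.2f}")
```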
FACTORS AFFECTING RELIABILITY

1. Test length
2. Test-retest interval
3. Variability of scores
4. Guessing
5. Variation within the test situation
Threats to Reliability
• Fatigue

            8 am               9 am               10 am
Subject 1   60 ml·kg⁻¹·min⁻¹   55 ml·kg⁻¹·min⁻¹   50 ml·kg⁻¹·min⁻¹

Therefore, solution = increase time between tests.


Threats to Reliability
• Habituation

            8 am               9 am               10 am
Subject 1   60 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹   70 ml·kg⁻¹·min⁻¹

Therefore, solution = familiarise prior to test.


Threats to Reliability
• Standardisation of Procedures
– Control of extraneous variables
Measurement Errors
• Ultimately, reliability is dependent on the
degree of measurement error in a given study

• The overall error in any measurement is composed of both systematic and random error.
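A toy simulation (an illustration under assumed numbers, not from the slides) of why this distinction matters for reliability: a purely systematic error shifts every score equally and leaves the correlation between administrations untouched, while random error attenuates it.

```python
# Systematic vs. random error sketch with simulated scores.
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(100, 15, size=200)        # latent "true" scores

systematic = true_scores + 5                       # constant bias only
random_err = true_scores + rng.normal(0, 10, 200)  # random error only

print(f"r with systematic error: {np.corrcoef(true_scores, systematic)[0, 1]:.2f}")  # ~1.00
print(f"r with random error:     {np.corrcoef(true_scores, random_err)[0, 1]:.2f}")  # < 1
```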
Relationship between reliability
and validity
• If data are valid, they must be reliable. If people
receive very different scores on a test every
time they take it, the test is not likely to predict
anything.
• However, if a test is reliable, that
does not mean that it is valid.
• Reliability is a necessary, but not sufficient,
condition for validity
