Reliability vs. Validity in Research - Difference, Types and Examples
A reliable measurement is not always valid: the results might be reproducible, but they’re not
necessarily correct.
A valid measurement is generally reliable: if a test produces accurate results, they should be
reproducible.
What is reliability?
Reliability refers to how consistently a method measures something. If the same result can be
consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable.
You measure the temperature of a liquid sample several times under identical
conditions. The thermometer displays the same temperature every time, so the results
are reliable.
A doctor uses a symptom questionnaire to diagnose a patient with a long-term
medical condition. Several different doctors use the same questionnaire with the same
patient but give different diagnoses. This indicates that the questionnaire has low
reliability as a measure of the condition.
What is validity?
Validity refers to how accurately a method measures what it is intended to measure. If
research has high validity, that means it produces results that correspond to real properties,
characteristics, and variations in the physical or social world.
High reliability is one indicator that a measurement is valid. If a method is not reliable, it
probably isn’t valid.
If the thermometer shows different temperatures each time, even though you have
carefully controlled conditions to ensure the sample’s temperature stays the same, the
thermometer is probably malfunctioning, and therefore its measurements are not valid.
However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it
may not accurately reflect the real situation.
The thermometer that you used to test the sample gives reliable results. However, the
thermometer has not been calibrated properly, so the result is 2 degrees lower than
the true value. Therefore, the measurement is not valid.
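The calibration example can be made concrete with a small sketch. The readings and the true temperature below are hypothetical values chosen for illustration: a small spread across repeated readings suggests reliability, while a consistent offset from the true value suggests low validity.

```python
from statistics import mean, stdev

# Hypothetical data: five repeated readings of a sample whose true
# temperature is assumed to be 25.0 degrees.
TRUE_TEMPERATURE = 25.0
readings = [23.0, 23.1, 22.9, 23.0, 23.0]


def spread(values):
    """Sample standard deviation: a small value suggests reliable readings."""
    return stdev(values)


def bias(values, true_value):
    """Mean reading minus the true value: far from zero suggests low validity."""
    return mean(values) - true_value


print(f"spread: {spread(readings):.2f}")
print(f"bias:   {bias(readings, TRUE_TEMPERATURE):.2f}")
```

In this sketch the readings cluster tightly, yet they sit about 2 degrees below the assumed true value, which mirrors the miscalibrated thermometer.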
A group of participants take a test designed to measure working memory. The results
are reliable, but participants’ scores correlate strongly with their level of reading
comprehension. This indicates that the method might have low validity: the test may
be measuring participants’ reading comprehension instead of their working memory.
Validity is harder to assess than reliability, but it is even more important. To obtain useful
results, the methods you use to collect data must be valid: the research must be measuring
what it claims to measure. This ensures that your discussion of the data and the conclusions
you draw are also valid.
Types of reliability
Different types of reliability can be estimated through various statistical methods.
Test-retest reliability
The consistency of a measure across time: do you get the same results when you repeat the
measurement?
A group of participants complete a questionnaire designed to measure personality traits. If they repeat
the questionnaire days, weeks or months apart and give the same answers, this indicates high test-
retest reliability.
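One common way to quantify test-retest reliability is the Pearson correlation between scores from the two sessions. The sketch below is a minimal illustration with made-up scores; the function name `pearson_r` and the data are assumptions, not part of any particular study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical trait scores for six participants, measured two weeks apart.
first_session = [12, 18, 25, 31, 22, 15]
second_session = [13, 17, 26, 30, 21, 16]

print(f"test-retest r = {pearson_r(first_session, second_session):.3f}")
```

A coefficient close to 1 would suggest that participants keep roughly the same relative standing across sessions.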
Interrater reliability
The consistency of a measure across raters or observers: do you get the same results when different
people conduct the same measurement?
Based on an assessment criteria checklist, five examiners submit substantially different results for the
same student project. This indicates that the assessment checklist has low interrater reliability (for
example, because the criteria are too subjective).
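Agreement between raters is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below covers the simplified case of two raters assigning categorical grades; the grades are hypothetical, and designs with more than two raters (such as the five examiners above) would usually call for a different statistic such as Fleiss' kappa.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail grades from two examiners for ten projects.
examiner_1 = ["pass", "pass", "fail", "pass", "fail",
              "pass", "pass", "fail", "pass", "fail"]
examiner_2 = ["pass", "fail", "fail", "pass", "fail",
              "pass", "pass", "pass", "pass", "fail"]

print(f"kappa = {cohens_kappa(examiner_1, examiner_2):.2f}")
```

Here the examiners agree on 8 of 10 projects, but because some agreement would occur by chance, kappa comes out noticeably lower than the raw 80% figure.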
Internal consistency
The consistency of the measurement itself: do you get the same results from different parts of a test
that are designed to measure the same thing?
You design a questionnaire to measure self-esteem. If you randomly split the results into two halves,
there should be a strong correlation between the two sets of results. If the two results are very
different, this indicates low internal consistency.
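The split-half check described above can be sketched as follows. The responses are hypothetical (six participants, six items on a 1 to 5 scale), and the Spearman-Brown step is an addition not mentioned in the text: it is a standard correction that estimates the reliability of the full-length questionnaire from the correlation between its halves.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all totals are identical")
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses, seed=0):
    """Randomly split the items in two, then correlate the half-test totals.

    responses: one list of item scores per participant.
    Returns (raw half-test correlation, Spearman-Brown corrected estimate).
    """
    n_items = len(responses[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    half_a, half_b = items[: n_items // 2], items[n_items // 2:]
    totals_a = [sum(person[i] for i in half_a) for person in responses]
    totals_b = [sum(person[i] for i in half_b) for person in responses]
    r = pearson_r(totals_a, totals_b)
    return r, 2 * r / (1 + r)


# Hypothetical self-esteem questionnaire responses.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 1, 1],
    [3, 4, 3, 3, 4, 4],
]

r, corrected = split_half_reliability(responses)
print(f"half-test r = {r:.2f}, Spearman-Brown estimate = {corrected:.2f}")
```

Because the result depends on which items land in each half, it can be worth repeating the split with several seeds and comparing the estimates.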
Types of validity
The validity of a measurement can be estimated based on three main types of evidence.
Each type can be evaluated through expert judgement or statistical methods.
Construct validity
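The adherence of a measure to existing theory and knowledge of the concept being measured.
Content validity
The extent to which the measurement covers all aspects of the concept being measured.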
A test that aims to measure a class of students’ level of Spanish contains reading, writing and speaking
components, but no listening component. Experts agree that listening comprehension is an essential
aspect of language ability, so the test lacks content validity for measuring the overall level of ability in
Spanish.
Criterion validity
The extent to which the result of a measure corresponds to other valid measures of the same
concept.
A survey is conducted to measure the political opinions of voters in a region. If the results accurately
predict the later outcome of an election in that region, this indicates that the survey has high criterion
validity.
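For the election example, one simple way to compare the survey with its criterion is the average gap between predicted and actual vote shares. The party names and percentages below are invented for illustration, and mean absolute error is only one of several reasonable summary measures.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap (in percentage points) across the same parties."""
    if predicted.keys() != actual.keys():
        raise ValueError("both results must cover the same parties")
    return sum(abs(predicted[p] - actual[p]) for p in predicted) / len(predicted)


def winner(shares):
    """Party with the largest vote share."""
    return max(shares, key=shares.get)


# Hypothetical vote shares in percent.
survey_shares = {"Party A": 42.0, "Party B": 35.0, "Party C": 23.0}
election_shares = {"Party A": 40.5, "Party B": 36.0, "Party C": 23.5}

error = mean_absolute_error(survey_shares, election_shares)
print(f"mean absolute error: {error:.1f} percentage points")
print(f"winner predicted correctly: {winner(survey_shares) == winner(election_shares)}")
```

A small error together with a correctly predicted winner would count as evidence of criterion validity for the survey in this sketch.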
To assess the validity of a cause-and-effect relationship, you also need to consider internal
validity (the design of the experiment) and external validity (the generalizability of the
results).
How to ensure validity and reliability in your research
The reliability and validity of your results depend on creating a strong research design,
choosing appropriate methods and samples, and conducting the research carefully and
consistently.
Ensuring validity
If you use scores or ratings to measure variations in something (such as psychological traits,
levels of ability or physical properties), it’s important that your results reflect the real
variations as accurately as possible. Validity should be considered in the very earliest stages
of your research, when you decide how you will collect your data.
Ensure that your method and measurement technique are high quality and targeted to
measure exactly what you want to know. They should be thoroughly researched and based
on existing knowledge.
For example, to collect data on a personality trait, you could use a standardized
questionnaire that is considered reliable and valid. If you develop your own questionnaire, it
should be based on established theory or findings of previous studies, and the questions
should be carefully and precisely worded.
To produce valid and generalizable results, clearly define the population you are researching
(e.g., people from a specific age range, geographical location, or profession). Ensure that
you have enough participants and that they are representative of the population. Failing to
do so can lead to sampling bias and selection bias.
Ensuring reliability
Reliability should be considered throughout the data collection process. When you use a tool
or technique to collect data, it’s important that the results are precise, stable, and
reproducible.
For example, if you are conducting interviews or observations, clearly define how specific
behaviors or responses will be counted, and make sure questions are phrased the same way
each time. Failing to do so can lead to errors such as omitted variable bias or information
bias.
When you collect your data, keep the circumstances as consistent as possible to reduce the
influence of external factors that might create variation in the results.
For example, in an experimental setup, make sure all participants are given the same
information and tested under the same conditions, preferably in a properly randomized
setting. Failing to do so can lead to a placebo effect, Hawthorne effect, or other demand
characteristics. If participants can guess the aims or objectives of a study, they may attempt
to act in more socially desirable ways.
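Random assignment to conditions can be sketched in a few lines. The function name `randomize_groups`, the group labels, and the participant IDs are all hypothetical; a real study would follow whatever randomization protocol its design specifies.

```python
import random


def randomize_groups(participant_ids, groups=("treatment", "control"), seed=None):
    """Shuffle participants and deal them into groups of near-equal size."""
    rng = random.Random(seed)  # pass a seed to make the assignment repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment


# Hypothetical study with 20 participants numbered 1 to 20.
assignment = randomize_groups(range(1, 21), seed=42)
for group, members in assignment.items():
    print(group, sorted(members))
```

Recording the seed alongside the data makes it possible to reproduce and audit the assignment later.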
Literature review
What have other researchers done to devise and improve methods that are reliable and valid?
Methodology
How did you plan your research to ensure reliability and validity of the measures used? This includes
the chosen sample set and size, sample preparation, external conditions and measuring techniques.