LU 4 Methods of Reliability Testing Concepts

The document discusses the importance of reliability in psychological research and explores various methods used for testing reliability, including test-retest reliability, parallel forms reliability, internal consistency reliability, and inter-rater reliability. Examples are provided to illustrate each reliability testing technique.

Nueva Ecija University of Science and Technology

College of Arts and Sciences


MATHEMATICS AND SCIENCES DEPARTMENT

Psy 102: Psychological Statistics

Lecture prepared by: JAYNELLE G. DOMINGO, MSc. MathEd


LESSON OBJECTIVES:

At the end of the lesson, the students are expected to:

• Understand the concept of reliability and its importance in psychological research.
• Explore various methods used for testing reliability.
• Analyze the strengths and limitations of different reliability testing techniques.
• Apply appropriate reliability testing methods to ensure the reliability of research instruments.
Introduction

In research, ensuring the reliability of instruments is crucial for obtaining accurate and trustworthy results.
Psychological research often deals with complex constructs such
as personality traits, cognitive abilities, and mental health variables.
These constructs are typically not directly observable and can be
influenced by various factors, including measurement error.
Measurement error refers to any discrepancy between the true score of
a participant on a given construct and the score obtained from a
measurement instrument. Reliability testing helps to quantify and
minimize measurement error, thereby increasing the accuracy and
precision of the measurements.
Introduction

Reliability is especially crucial in fields like psychology where researchers aim to make generalizations about human behavior or mental
processes based on empirical evidence. Without reliable measurement
instruments, researchers cannot confidently draw conclusions or make
meaningful comparisons across different studies or populations. Furthermore,
unreliable instruments can lead to wasted time, resources, and effort. Imagine
a scenario where researchers invest significant resources in conducting a large-
scale study, only to discover later that the measurement tool they used was not
reliable. Not only does this undermine the validity of the study's findings, but it
also squanders valuable resources that could have been allocated to more
fruitful research endeavors.
Introduction

Moreover, in applied settings such as clinical psychology or educational assessment, the consequences of using unreliable instruments can
be particularly severe. For instance, inaccurate assessments of psychological
disorders or academic abilities can result in misdiagnosis, inappropriate
interventions, or ineffective educational programs, ultimately impacting the
well-being and success of individuals.
The reliability of instruments forms the foundation of sound research
or data gathering. By rigorously testing and ensuring the reliability of
measurement instruments, researchers can enhance the validity, credibility,
and impact of their research findings, ultimately advancing our understanding
of human behavior and mental processes.
Understanding Reliability

Reliability refers to the consistency, stability, and dependability of measurements obtained from a research instrument.

• A reliable instrument produces consistent results when administered repeatedly under similar conditions.
• Ensuring reliability is essential because it provides researchers
with confidence that the observed scores accurately reflect the
underlying construct of interest rather than random
measurement error or fluctuations.
Types of Reliability

1. Test-Retest Reliability
This type of reliability assesses the consistency of scores
obtained from the same participants when they are tested on two
separate occasions with the same instrument. For example, if a test
yields similar scores for the same individuals when administered at two
different time points, it indicates good test-retest reliability. However,
factors such as practice effects or changes in participants' conditions
between the two administrations can affect the reliability of this
method.

Test Analysis: Correlation (Pearson r)


Types of Reliability

1. Test-Retest Reliability
Strengths:
• Provides a straightforward assessment of stability over time.
• Easy to administer and interpret.
• Useful for measuring relatively stable constructs.

Weaknesses:
• Susceptible to practice effects, where participants may remember previous responses.
• Not suitable for measuring constructs prone to change over short periods.
• External factors (e.g., environmental changes) may influence results between administrations.
Types of Reliability

1. Test-Retest Reliability

Example:
A researcher is studying the effectiveness of a mindfulness-based stress
reduction program. To assess participants' stress levels, the researcher
administers a stress questionnaire twice, with a two-week interval
between administrations. By correlating participants' scores from the
two administrations, the researcher can determine the test-retest
reliability of the questionnaire.
Types of Reliability

2. Parallel Forms Reliability


Also known as alternate forms reliability, this method entails
administering two equivalent forms of the same test to the same group
of participants and then correlating the scores obtained from both
forms. This method helps mitigate the effects of practice or memory on
test-retest reliability.

Test Analysis: Correlation (Pearson r)


Types of Reliability

2. Parallel Forms Reliability


Strengths:
• Minimizes the impact of practice effects compared to test-retest reliability.
• Useful when repeated administrations of the same instrument are impractical.
• Provides a more robust assessment of reliability by using alternate forms.

Weaknesses:
• Creating truly equivalent alternate forms can be challenging.
• Requires additional time and resources to develop and validate alternate forms.
• Differences in content or difficulty between forms may affect reliability estimates.
Types of Reliability

2. Parallel Forms Reliability

Example:
A teacher wants to ensure consistency in grading across multiple versions of a midterm exam. The teacher creates two equivalent versions of the exam, administers them to the same group of students, and correlates each student's scores on the two versions.
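A brief sketch of that correlation step with hypothetical exam scores:

import numpy as np
from scipy.stats import pearsonr

form_a = np.array([78, 85, 62, 90, 74, 88, 69, 81])  # hypothetical scores on version A
form_b = np.array([75, 88, 60, 92, 70, 85, 72, 79])  # same students on version B

r, _ = pearsonr(form_a, form_b)
print(f"Parallel forms reliability (Pearson r) = {r:.2f}")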
Types of Reliability

3. Internal Consistency Reliability


This method assesses the consistency of responses within a
single administration of a test. Common measures of internal
consistency include Cronbach's alpha and split-half reliability.
Cronbach's alpha estimates the average correlation between all possible
combinations of items within a test, with higher values indicating
greater internal consistency. Split-half reliability involves splitting the
test into two halves and correlating the scores obtained from each half.

Test Analysis: Cronbach Alpha (𝛼) / Correlation (Pearson r)
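As a minimal sketch of how Cronbach's alpha is obtained from a participants-by-items score matrix, using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the Likert responses below are hypothetical:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    # items: 2-D array of shape (participants, items)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item across participants
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of participants' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 6 participants answering a 4-item Likert scale
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # about 0.93 for these data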


Types of Reliability

3. Internal Consistency Reliability


Strengths:
• Provides a measure of the extent to which items within a scale are interrelated.
• Allows for the assessment of reliability within a single administration.
• Useful for evaluating the homogeneity of a scale or instrument.

Weaknesses:
• Assumes that all items are measuring the same underlying construct.
• Cronbach's alpha may be influenced by the number of items in the scale.
• Doesn't account for systematic errors that may affect individual items.
Types of Reliability

3. Internal Consistency Reliability


Example:
A researcher is developing a questionnaire to assess depression symptoms. The
researcher administers the questionnaire to a sample of participants and calculates
Cronbach's alpha to assess internal consistency.

A psychologist is developing a questionnaire to measure job satisfaction in employees. The questionnaire consists of 20 items. To assess the split-half reliability of the questionnaire, the psychologist administers it to a sample of employees from the same company and computes the correlation between the total scores obtained from the odd-numbered items and the total scores obtained from the even-numbered items across all participants.
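A minimal sketch of that odd/even split; the responses below are simulated so that all 20 items reflect a single underlying trait, and the half-test correlation is commonly adjusted upward with the Spearman-Brown formula to estimate the reliability of the full-length scale:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
trait = rng.normal(0, 1, size=(30, 1))                   # each employee's underlying job satisfaction
responses = trait + rng.normal(0, 0.5, size=(30, 20))    # 20 items reflecting that trait plus noise

odd_total = responses[:, 0::2].sum(axis=1)    # items 1, 3, 5, ... (odd-numbered)
even_total = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ... (even-numbered)

r_half, _ = pearsonr(odd_total, even_total)
full_length = (2 * r_half) / (1 + r_half)     # Spearman-Brown correction
print(f"Split-half r = {r_half:.2f}, corrected = {full_length:.2f}")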
Types of Reliability

4. Inter-Rater Reliability
This method assesses the consistency of ratings or judgments
made by different raters or observers. It is particularly relevant in
research involving observational data or subjective judgments.

Test Analysis: Cohen's kappa coefficient for categorical data and intraclass correlation coefficient (ICC) for continuous data
Types of Reliability

4. Inter-Rater Reliability
Strengths:
• Provides a measure of consistency across different observers or raters.
• Useful for observational studies or studies involving subjective judgments.
• Helps ensure the reliability and validity of data collected through observation.

Weaknesses:
• Reliability may be influenced by differences in observer training or judgment criteria.
• More time-consuming and resource-intensive compared to other reliability methods.
• Requires careful definition and operationalization of behaviors or criteria being observed.
Types of Reliability

4. Inter-Rater Reliability
Example:
A team of researchers is conducting a study on nonverbal communication in job interviews. They
have developed a coding scheme to analyze specific nonverbal behaviors displayed by job
applicants, such as eye contact, posture, and facial expressions. To assess inter-rater reliability, the
researchers recruit three trained observers who will independently code videos of job interviews
according to the established coding scheme. Each observer watches the same set of video
recordings of job interviews and records the frequency and duration of specific nonverbal
behaviors displayed by the job applicants. They then enter their coding data into a spreadsheet or
coding software. After coding all the videos, the researchers calculate inter-rater reliability using
appropriate statistical measures such as Cohen's kappa coefficient or intraclass correlation
coefficient (ICC). These measures quantify the degree of agreement among the observers in their
coding of nonverbal behaviors.
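For a categorical code (e.g., whether eye contact was maintained in a given interview segment), agreement between a pair of observers can be summarized with Cohen's kappa, as in the hypothetical sketch below; with three observers, kappa is typically computed for each pair (or Fleiss' kappa is used), and the ICC is applied instead to continuous measures such as duration.

from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned by two of the observers to ten interview segments
observer_1 = ["maintained", "avoided", "maintained", "maintained", "avoided",
              "maintained", "avoided", "maintained", "maintained", "avoided"]
observer_2 = ["maintained", "avoided", "maintained", "avoided", "avoided",
              "maintained", "avoided", "maintained", "maintained", "maintained"]

kappa = cohen_kappa_score(observer_1, observer_2)
print(f"Cohen's kappa = {kappa:.2f}")  # interpreted with the Kappa/ICC benchmarks below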
Reliability Coefficient

• Correlation¹ and Cronbach's Alpha²


Coefficient        Interpretation
≥ 0.9              excellent reliability
0.8 to < 0.9       good reliability
0.7 to < 0.8       acceptable reliability
0.6 to < 0.7       questionable reliability
0.5 to < 0.6       poor reliability
< 0.5              unacceptable reliability
¹ Calmorin, L., & Calmorin, M. (2007). Research Methods and Thesis Writing (2nd ed.). Manila: Rex Bookstore.
² George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference. 11.0 update (4th ed.). Boston, MA: Allyn & Bacon.
Reliability Coefficient

• Kappa Value/Intraclass Correlation Coefficient*


Coefficient Interpretation
0.81-1.00 Near complete agreement
0.61-0.80 Strong agreement
0.41-0.60 Moderate agreement
0.21-0.40 Fair agreement
0.00-0.20 Poor agreement

* Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.
NOTE

Part 2 of this discussion (illustrating the different types of reliability in SPSS) will be continued after the Midterm Exam.
References

Calmorin, L., & Calmorin, M. (2007). Research Methods and Thesis Writing (2nd ed.). Manila: Rex Bookstore.

George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference. 11.0 update (4th ed.). Boston, MA: Allyn & Bacon.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.
