W2 - Reliability in ESL Research

RELIABILITY IN ESL RESEARCH
Validity and Reliability: making a research proposal trustworthy and replicable
Instructor: Dr. Nguyen Huu Cuong
Presenter: Le Do Ngoc Hang
Contents
1. Definition of Reliability
2. True Score Theory
3. Measurement Error
4. Reliability vs Validity
5. Types of Reliability
6. Ensuring Reliability
DEFINITION OF RELIABILITY
1. "Reliability refers to the consistency or stability of measurement results or research
findings over time and across different conditions or settings."
- Citation: Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate
research in education (8th ed.). McGraw-Hill.

2. "Reliability in ESL research is the extent to which the data collection methods and
instruments used produce consistent, dependable, and replicable results."
- Citation: Dörnyei, Z. (2007). Research methods in applied linguistics: Quantitative,
qualitative, and mixed methodologies. Oxford University Press.

3. "Reliability in ESL research entails the extent to which assessments, tests, or research
procedures yield accurate and consistent results, unaffected by measurement error or
external factors."
- Citation: Brown, J. D. (2004). Research methods in applied linguistics: A practical
resource. Cambridge University Press.
4. "Reliability in ESL research refers to the stability and consistency of measurement or
assessment outcomes, ensuring that the results are not influenced by random errors but
reflect true performance or characteristics."
- Citation: Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Sage
Publications.

5. "Reliability in ESL research is the extent to which a measurement or assessment tool provides consistent results when applied to different groups or individuals under similar conditions."
- Citation: Dörnyei, Z. (2007). Research methods in applied linguistics: Quantitative, qualitative, and mixed methodologies. Oxford University Press.

6. "Reliability in ESL research refers to the degree of consistency and dependability of research procedures and findings, minimizing the influence of measurement errors or fluctuations in data collection."
- Citation: Mackey, A., & Gass, S. M. (2012). Research methods in second language acquisition: A practical guide. Wiley-Blackwell.
Generally, reliability can be understood as the consistency or repeatability of observations of behaviours, performance and/or psychological attributes.

CONSISTENCY
TRUE SCORE THEORY (Psychometrics)

True score theory (Classical Test Theory) is a theory about measurement.
1. Every measurement has an error component.
2. True score theory is the foundation of reliability.
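The core relation of true score theory can be written out explicitly (a standard Classical Test Theory formulation; the symbols are the conventional ones, not taken from the slides):

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Here X is the observed score, T the true score, and E the error component; reliability is the proportion of observed-score variance attributable to true scores.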
Measurement error
Random Error is caused
by any factors that
randomly affect
measurement of the
variable across the sample.

Systematic Error is
caused by any factors that
systematically affect
measurement of the
variable across the sample.
Random error
• Variability in Participant Responses
• Measurement Instrument Fluctuations
• Sampling Variability
Systematic error
• Bias in Test Items
• Rater Bias
• Measurement Instrument Biases
Reliability Coefficient

The reliability of a test or research instrument is commonly expressed as a value between 0 and 1.

A reliability coefficient of 0 indicates that the test or instrument does not measure the target construct consistently (i.e., it is 0% reliable).

A reliability coefficient of 1 means that the test or research instrument is perfectly precise, with no measurement error (i.e., it is 100% reliable or consistent).
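The 0-to-1 coefficient can be made concrete with a small simulation under true score theory (an illustrative sketch; the score distributions are assumptions, not from the slides):

```python
import random

random.seed(42)

# Simulate 1,000 test-takers: each observed score = true score + random error.
true_scores = [random.gauss(50, 10) for _ in range(1000)]  # true ability
errors = [random.gauss(0, 5) for _ in range(1000)]         # random measurement error
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true-score variance / observed-score variance.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # close to 10**2 / (10**2 + 5**2) = 0.8
```

With an error standard deviation half the true-score standard deviation, the variance ratio lands near 0.8, i.e., roughly 80% of the observed variance reflects true differences between test-takers.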
RELIABILITY VS VALIDITY

Reliability: Consistency. Validity: Accuracy.

What does it tell you?
- Reliability: the extent to which the results can be reproduced when the research is repeated under the same conditions.
- Validity: the extent to which the results really measure what they are supposed to measure.

How is it assessed?
- Reliability: by checking the consistency of results across time, across different observers, and across parts of the test itself.
- Validity: by checking how well the results correspond to established theories and other measures of the same concept.

How do they relate?
- A reliable measurement is not always valid: the results might be reproducible, but they're not necessarily correct.
- A valid measurement is generally reliable: if a test produces accurate results, they should be reproducible.
Source: https://conjointly.com/kb/reliability-and-validity/
Types of reliability

Test-retest reliability
- What it assesses: the consistency of a measure across time: do you get the same results when you repeat the measurement?
- Example: A group of participants complete a questionnaire designed to measure personality traits. If they repeat the questionnaire days, weeks or months apart and give the same answers, this indicates high test-retest reliability.

Interrater reliability
- What it assesses: the consistency of a measure across raters or observers: do you get the same results when different people conduct the same measurement?
- Example: Based on an assessment criteria checklist, five examiners submit substantially different results for the same student project. This indicates that the assessment checklist has low inter-rater reliability (for example, because the criteria are too subjective).

Internal consistency
- What it assesses: the consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
- Example: You design a questionnaire to measure self-esteem. If you randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency.
Types of reliability: statistics

Test-retest reliability (the consistency of a measure across time)
Researchers administer the same instrument to the same group of participants on two separate occasions. The scores or measurements obtained from both administrations are then compared using a correlation coefficient (e.g., Pearson's correlation) to determine the stability or reliability of the instrument. A higher correlation indicates greater test-retest reliability.

Interrater reliability (the consistency of a measure across raters or observers)
Researchers typically provide a set of guidelines or criteria to the raters to ensure uniformity. The agreement between raters is calculated using statistical measures such as Cohen's kappa or the intraclass correlation coefficient (ICC). A higher value indicates greater inter-rater reliability.

Internal consistency (the consistency of the measurement itself)
Researchers administer a single instrument to a group of participants and analyze the responses to calculate measures such as Cronbach's alpha or split-half reliability. These measures assess the degree of correlation or agreement among the individual items within the instrument. A higher value indicates greater internal consistency.
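The test-retest procedure described above can be sketched in a few lines: administer twice, then correlate the two sets of scores (hypothetical scores; Pearson's r computed from its definition):

```python
# Test-retest reliability sketch: Pearson's correlation between two
# administrations of the same instrument (made-up scores, for illustration).
time1 = [12, 15, 9, 20, 17, 14, 11, 18]   # scores at first administration
time2 = [13, 14, 10, 19, 18, 13, 12, 17]  # scores at second administration

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)
print(round(r, 2))  # an r close to 1 indicates high test-retest reliability
```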
Types of reliability: common statistics

Test-retest reliability (across time)
- Quantitative: Intraclass Correlation Coefficient (ICC)

Interrater reliability (across raters or observers)
- Quantitative: Intraclass Correlation Coefficient (ICC); Bland and Altman method (fidelity between two raters); Spearman-Brown prophecy
- Qualitative (inter-coder): Cohen's kappa or percentage agreement

Internal consistency (of the measurement itself)
- Quantitative: Cronbach's alpha; Spearman-Brown prophecy
- Qualitative (participants): member checking; triangulation; peer debriefing
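As a concrete illustration of the internal-consistency statistics listed above, Cronbach's alpha can be computed from an item-by-respondent score matrix (made-up Likert-scale data; a sketch, not a production implementation):

```python
# Cronbach's alpha sketch: rows = respondents, columns = questionnaire items
# (made-up 1-5 Likert scores, for illustration only).
scores = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

k = len(scores[0])                                        # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])        # variance of total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # alpha above ~0.7 is conventionally taken as acceptable
```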
Ensuring reliability

Improving test-retest reliability
01 Minimize Practice Effects: randomize test orders to reduce familiarity bias.
02 Control Environmental Variables: ensure consistent testing conditions across sessions.
03 Address Participant Variability: consider individual differences that may impact performance.
Enhancing inter-rater reliability
01 Standardized Training: ensure all raters understand scoring criteria and procedures.
02 Pilot Testing: test instruments with a sample to identify and address ambiguities.
03 Calibration Sessions: regular meetings to discuss discrepancies and refine scoring.
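Agreement after calibration can be quantified with Cohen's kappa, mentioned earlier as an inter-coder statistic (a minimal sketch with hypothetical pass/fail ratings, not from the slides):

```python
# Cohen's kappa sketch: two raters score the same ten student projects
# as "pass" or "fail" (made-up ratings for illustration).
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

# Chance agreement: probability both raters pick the same label at random,
# based on each rater's own label proportions.
labels = set(rater_a) | set(rater_b)
expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)

# Kappa corrects raw agreement for the agreement expected by chance.
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))
```

Kappa is preferred over raw percentage agreement because it discounts the agreement two raters would reach by guessing alone.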
Ensuring internal consistency reliability
01 Use Reliable Instruments: select validated measures with established reliability coefficients.
02 Conduct Factor Analysis: assess the underlying structure of measurement tools.
03 Consider Item Analysis: evaluate individual items for consistency and coherence.
Proposal - dissertation

Section: what to discuss
- Literature review: What have other researchers done to devise and improve methods that are reliable and valid?
- Methodology: How did you plan your research to ensure the reliability and validity of the measures used? This includes the chosen sample set and size, sample preparation, external conditions and measuring techniques.
- Results: If you calculate reliability and validity, state these values alongside your main results.
- Discussion: This is the moment to talk about how reliable and valid your results actually were. Were they consistent, and did they reflect true values? If not, why not?
- Conclusion: If reliability and validity were a big problem for your findings, it might be helpful to mention this here.
An example
Adams et al. (2011) discussed their scoring and
coding procedure as follows: ‘The oral tests were
scored by two of the researchers; the few
discrepancies were discussed until 100 percent
agreement was reached. The written tests were
scored by an independent rater and then the scores
were reviewed by two of the researchers. Interrater
reliability was calculated to be 98 percent.’
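The 98 percent figure in this example is a percentage-agreement statistic: the share of items on which the raters assign the same score (a sketch with hypothetical ratings):

```python
# Percentage agreement sketch (hypothetical ratings): the proportion of
# items on which two raters assign the same score.
rater_1 = [3, 4, 4, 2, 5, 3, 4, 1, 2, 5]
rater_2 = [3, 4, 3, 2, 5, 3, 4, 1, 2, 5]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"{agreement:.0%}")  # prints "90%"
```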
Challenges in ESL research
- Language Variability: diverse linguistic backgrounds among participants.
- Cultural Differences: varied cultural interpretations of language constructs.
- Contextual Factors: influence of contextual factors on language usage and understanding.