Psych Assess Lec Prelims Notes
CONTENT
Subject matter; what the test is trying to measure
Focus may vary (i.e., two tests measuring the same variable may differ)
o Ex. There are different intelligence tests, each based on a different theoretical concept of intelligence

FORMAT
Form, plan, structure, arrangement, and layout of test items
o This first bullet refers to the psych test itself
Computerized, pen-and-paper, or some other form
o Ex. A performance test: you ask people to perform a behavior
Form of the software (for digital tests)

Tests differ in their administration procedures
o Digital vs. pen-and-paper

Tests differ in scoring and interpretation procedures
o Score - code or summary statement, often numerical in nature
    (units or quota: a quota of items)
    score: 1-100
o Scoring - process of assigning such evaluative codes or statements to performance
    Ex. Within average
    Ex. Assigning whether the score is high or low; mild, moderate, or severe
    Ex. IQ test: you get a score (65 on the scale); scoring is then stating which range the score falls in (55-70 would be considered mild; 25-55, low)
o Cut score - reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications
    Ex. Grading system: the cut-off score (75%) is the reference for whether you pass or fail
    It is what differentiates a low or below-par performance

Tests differ with respect to their psychometric soundness (technical quality ~ psychometry)
o Validity
    Is it measuring what it is supposed to measure?
    What theory of intelligence did you use?
    Is it really measuring intelligence?
o Reliability
    Consistency of results
    Ex. Is the intelligence score consistent through time?
o Utility - usefulness of a test for a particular purpose

3. PORTFOLIO
Compilation of works done by the individual being assessed
o Not only your current work or best work (except for job hiring)
o It shows your progression or development
o Ex. Photography, art-related works, job hiring in a company

4. CASE HISTORY
Refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
Case study
o A report or illustrative account concerning a person or an event that was compiled on the basis of case history data.

5. BEHAVIORAL OBSERVATION
Monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
Example: naturalistic observation
o Observing an individual in his/her natural habitat (in their home or school)
o Asking the person to report what he/she sees in himself/herself
    It is impractical to always observe the client

6. ROLE PLAY TEST
Improvised or partially improvised
o Ex. Hiring for jobs: you are simulating how a person would behave
o Ex. How would the person respond to the customer? A stress test for the employee: check how the person handles an annoying customer

Scoring may be done on-site (local processing) or at a central location (central processing).
o Ex. The recent psychometrician board exam: a push for computer testing, but on-site
o Central processing: answers are sent to a central site, which then computes the scores

Reports may come in the form of a simple scoring report (numerical), extended scoring report, interpretive report, consultative report, or integrative report.
    Consultative and integrative reports are given to experts

OTHER TOOLS
Tools traditionally associated with medical health.
Thermometers to measure body temperature. Gauges to measure blood pressure.
Biofeedback equipment.
o How it influences the person (heart rate or palpitations)
Penile plethysmograph.
o Phallometry is the measurement of blood flow to the penis
o Typically used as a proxy for measurement of sexual arousal
o Used with pedophiles

WHO, WHAT, WHY, HOW, AND WHERE

Who are the parties?
Test developer
o Creates tests for research studies, publications (as commercially available instruments), or as modifications of existing tests.
    Psych tests are products: test developers try to sell them to us
    It is important to conduct a third-party evaluation of the test
Test user
o Clinicians, counselors, school psychologists, human resources personnel, consumer psychologists, experimental psychologists, social psychologists, etc.
o Must be QUALIFIED test users (the person buying, using, and scoring the test), and also PERMITTED TO PURCHASE
o Different levels of testing:
    LEVEL A - teacher-made test (prelims test)
    LEVEL B - usually psychometrician/psychologist (requires technical knowledge of tests); can be done in groups
    LEVEL C - highest; psychologist (requires supporting psych theories; administered individually); ex. intelligence test
The test taker: the person taking the test
o Issues to consider
    Test anxiety
    Understanding of and agreement with the rationale of the assessment
    Capacity and willingness to cooperate with the examiner
        whether the person is taking the test voluntarily
    Physical pain and emotional distress experienced; physical discomfort; alertness and wakefulness during assessment
        test fatigue: multiple tests taken in a day
    Test acquiescence; social desirability issues
        social desirability: the client lies to you (tries to appear okay)
    Prior coaching received
        security of the tests and test protocols
        some paint themselves as okay or good so that the psych test results come out good
Society at large (creates needs for new variables to measure)
Other parties
o Court cases (making decisions; forensics)

In what types of settings are assessments conducted, and why?

1. Educational Settings
Achievement tests (measuring amount of learning)
o Ex. Prelims exam
Diagnostic tests (tools of assessment used to help narrow down and identify areas of deficit to be targeted for interventions)
    Diagnostic test: you are placed at the right level

2. Clinical Settings
Hospitals, in-patient and out-patient clinics, private practice consulting rooms, schools, other institutions
Intelligence tests, personality tests, neuropsychological tests, and other specialized instruments
Mostly individual assessment, with group testing usually for screening (i.e., determining which individuals need further diagnostic evaluation)

3. Counselling Settings
Schools, prisons, and governmental or privately owned institutions.
Measures of social and academic skills and measures of personality, interest, attitudes, and values are among the many types of tests that a counselor might administer to a client.
Referral questions range from "How can this child better focus on tasks?" to "For what career is the client best suited?" to "What activities are recommended for retirement?"

4. Geriatric Settings
Care of older patients in a variety of settings
One focus of assessment is the extent to which assessees are enjoying as good a quality of life as possible (perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and other social support)
Screening for cognitive decline and dementia

5. Business and Military Settings
Decision making about the careers of personnel.
A wide range of achievement, aptitude, interest, motivational, and other tests may be employed in the decision to hire as well as in related decisions regarding promotions, transfer, job satisfaction, and eligibility for further training
Psychologists working in the area of marketing help "diagnose" what is wrong (and right) about brands, products, and campaigns.

6. Governmental and organizational credentialing
Before they are legally entitled to practice medicine, physicians must pass an examination.
Law school graduates cannot present themselves to the public as attorneys until they pass their state's bar examination.
Psychologists, too, must pass an examination before adopting the official title "psychologist."

7. Academic Research Settings
Any academician who ever hopes to publish research should ideally have a sound knowledge of measurement principles and tools of assessment.

HOW ARE ASSESSMENTS CONDUCTED?
Protocol
Typically refers to the form, sheet, or booklet on which a test taker's responses are entered. The term may also be used to refer to a description of a set of test- or assessment-related procedures (steps to take)
o Ex. Test booklet, answer sheet
Rapport
May be defined as a working relationship between the examiner and the examinee
o Without it, he/she might respond randomly to statements
Group vs. individually administered tests
o Group - more practical and economical, but allows little observation
o Individual - more tedious, but you can focus more closely on the test taker
Score, interpret, integrate, and report (and feedback)

ASSESSMENT OF PEOPLE WITH DISABILITIES

1. Accommodation
Adaptation of a test, procedure, or situation; substitution of one test for another
o If one is available
Needed to make the assessment more suitable for an assessee with exceptional needs
2. Alternate assessment
Evaluative/diagnostic procedure or process
Varies from the usual, customary, standardized way of deriving a measurement
Special accommodation or alternative methods designed to measure the same variable

WHERE TO GO FOR AUTHORITATIVE INFORMATION: REFERENCE SOURCES
Test catalogues
o Most readily accessible source of information
o This source of test information can be tapped by a simple telephone call, e-mail, or note
o Gives only a brief description of the test
o The catalogue's objective is to sell the test
Test manuals
o Detailed information concerning the development of a particular test and technical information
o Usually can be purchased from the test publisher
Reference volumes
o One authoritative compilation of test reviews is currently updated about every three years
o Another volume, which is also updated periodically, provides detailed information for each test listed, including test publisher, test author, test purpose, intended test population, and test administration time.
Journal articles
o Reviews of the test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either a research or an applied context
Online databases
o One of the most widely used bibliographic databases for test-related publications is that maintained by the Educational Resources Information Center (ERIC).
o The American Psychological Association (APA) maintains a number of databases useful in locating psychology-related information in journal articles, book chapters, and doctoral dissertations.

HISTORICAL, LEGAL, ETHICAL CONSIDERATIONS

A HISTORICAL PERSPECTIVE

Darwin's interest in individual differences led his half cousin, Francis Galton, to devise a number of measures for psychological variables.
o Eugenics is the scientifically erroneous and immoral theory of "racial improvement" and "planned breeding" (i.e., not wanting certain groups to multiply)
In Germany, Wilhelm Max Wundt
o Started the first experimental psychology laboratory
o Measured variables such as reaction time, perception, and attention span.
o Focused on how people were similar
o Saw individual differences as a frustrating source of error in experimentation
Charles Spearman
o Student of Wundt in Leipzig
o Credited with originating the concept of test reliability as well as building the mathematical framework for the statistical technique of factor analysis
James McKeen Cattell - coined the term "mental test" in 1890 and was responsible for introducing mental testing in America.
The twentieth century brought the first tests of abilities such as intelligence.
In 1905, Binet (father of intelligence or IQ testing) and Simon developed the first intelligence test to identify intellectually disabled Paris schoolchildren.
World Wars One and Two brought the need for large-scale testing of the intellectual ability of new recruits.
After World War Two, psychologists increasingly used the tests in government and civilian applications
o Civilian application: large corporations and private organizations.
By the late 1930s, about 4,000 different psychological tests were in print.

Robert Woodworth
The Woodworth Psychoneurotic Inventory (PERSONAL DATA SHEET (PDS)) was the first widely used self-report personality test.
Advantage of the self-report personality test:
o Respondents are arguably the best-qualified people to provide answers about themselves.
Disadvantages of the self-report personality test:
o Respondents may have poor insight into themselves
o People might honestly believe some things about themselves that in reality are not true.
o Respondents may be unwilling to reveal anything about themselves that is very personal or that paints them in a negative light.

Projective tests, such as the Rorschach Inkblot Test, are tests in which an individual is assumed to "project" onto some ambiguous stimulus his or her own unique needs, fears, hopes, and motivation. (Projection theory)
Psychological assessment has proceeded along two distinct threads: the academic and the applied.
Academic tradition: Researchers at universities throughout the world use the tools of assessment to help advance knowledge and understanding of human and animal behavior.
o For research purposes, to get to know more
In the applied tradition, the goal is to help select applicants for various positions on the basis of merit.
o Putting those findings into action

CULTURE AND ASSESSMENT

Culture: The socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people (Cohen, 1994).
Professionals in assessment have shown increasing sensitivity to cultural issues in every aspect of test development and use.

HENRY GODDARD
Early psychological testing of immigrant populations by Henry Goddard was controversial.
o He found that the majority of immigrant populations were feebleminded.
o Goddard's findings were largely the result of using a translated Binet test that overestimated mental deficiency in native English-speaking populations, let alone immigrant populations.
o Goddard's research sparked a nature-nurture debate
    Were IQ results indicative of some underlying native ability or of the extent to which knowledge and skills had been acquired?

Early developers of IQ tests devised culture-specific tests and clarified that the tests were designed for people from one culture but not from another.
Today, developers of intelligence tests take precautions against bias.

1. Verbal communication.
Certain nuances of meaning may be lost in translation.
Some interpreters may not be familiar with mental health issues, and pre-training may be necessary.
In interviews, language deficits may be detected by trained examiners but may go undetected in written tests.
Assessments need to be evaluated in terms of the language proficiency required and the language level of the test taker.

2. Nonverbal communication and behavior.
Nonverbal signs or body language may vary from one culture to another.
Psychoanalysis pays particular attention to the symbolic meaning of nonverbal behavior.
Differences in the pace of life across cultures may detract from or enhance scores on timed tests.

3. Standards of evaluation.
Judgments related to certain psychological traits can be culturally relative.
Cultures differ with regard to gender roles and views of psychopathology.
Cultures also vary in terms of collectivist versus individualist values.
Collectivist cultures value traits such as conformity, cooperation, interdependence, and striving toward group goals.
Individualist cultures place value on traits such as self-reliance, autonomy, independence, uniqueness, and competitiveness.

LEGAL AND ETHICAL ISSUES (Historical, Cultural, and Ethical Considerations)

Test user qualifications: In 1950, an APA Committee on Ethical Standards for Psychology published a report called Ethical Standards for the Distribution of Psychological Tests and Diagnostic Aids
Level A: Tests or aids that can adequately be administered, scored, and interpreted with the aid of the manual.
Level B: Tests or aids that require some technical knowledge of test construction and of supporting psychological and educational fields. (May be in groups)
Level C: Tests and aids that require substantial understanding of testing and supporting psychological fields together with supervised experience in the use of these devices. (In the Philippines, psychologists only)
Some challenges in testing people with disabilities may include:
o Transforming the test into a form that can be taken by the test taker. (Ex. Braille or oral)
o Transforming the responses of the test taker so that they are scorable.
o Meaningfully interpreting the test data.

THE RIGHTS OF TEST TAKERS
Test takers have a right to know why they are being evaluated, how the test data will be used, and what (if any) information will be released to whom.

1. Informed consent.
Test takers should give their informed consent only with full knowledge of such information.
o Information needed for consent must be in language the test taker can understand.
o Some groups (example: people with dementia, bipolar disorder, or schizophrenia) may not have the capacity, or competency, to provide informed consent.
o If the person is not competent to give consent, it may be obtained from a parent or a legal representative.

COMPONENTS OF COMPETENCY INCLUDE:
Being able to evidence a choice as to whether one wants to participate.
Demonstrating a factual understanding of the issues.
Being able to reason about the facts of a study, treatment, or whatever it is to which consent is sought.
Appreciating the nature of the situation.

2. The right to be informed of test findings
Test takers have a right to know about test findings and recommendations.
Test users should sensitively inform test takers of the purpose of the test, the meaning of the score relative to those of other test takers, and the possible limitations and margins of error of the test.
2.1 Relevancy

3. The right to privacy and confidentiality.
In most states, information provided by clients to psychologists is considered privileged information.
Privilege is not absolute
o The duty to warn authorities or the involved party
    Ex. A client says, "I plan to kill so-and-so"
Another ethical mandate regarding confidentiality pertains to safeguarding test data.

4. The right to the least stigmatizing label:
The least stigmatizing labels should always be assigned when reporting test results.
o Ex. "Feeble-minded" or "mentally retarded" is replaced with "intellectual disability"

LESSON 3 or 4 OF TESTS AND TESTING

ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING
Psychological traits and states exist.
Traits and states can be quantified and measured.
Test-related behavior predicts non-test-related behavior.
All tests have limits and imperfections.
Various sources of error are part of the assessment process.
Unfair and biased assessment procedures can be identified and reformed.
Testing and assessment benefit society.

Assumption 1: Psychological traits and states exist.
A trait has been defined as "any distinguishable, relatively enduring way in which one individual varies from another" (Guilford, 1959, p. 6).
States also distinguish one person from another but are relatively less enduring (Chaplin et al., 1988).
o May vary from time to time
o Temporary
Psychological traits exist as constructs: informed, scientific concepts developed or constructed to describe or explain behavior.
o Trait - ex. intelligence; items would not specify a time frame because it is considered relatively stable
o State - ex. anxiety (can also be a trait); items often specify "for the past couple of days" or "currently feeling"

Assumption 2: Traits and states can be quantified and measured.
Different test developers may define and measure constructs in different ways.
Once a construct is defined, test developers turn to types of item content and to gauging the strength of a trait in a test taker.
A scoring system and a way to interpret results need to be devised.
o Test developers interpret a construct differently depending on their scoring (ex. optimistic, very optimistic, pessimistic)

Assumption 3: Test-related behavior predicts non-test-related behavior.
Responses on tests are thought to predict real-world behavior; the obtained sample of behavior is expected to predict future behavior.
Assumption 4: All tests have limits and imperfections.
Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.
o Ex. How could you say that the items you put in your test are associated with intelligence?

Assumption 5: Various sources of error are part of the assessment process.
ERROR refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
o Random error - out of the blue
    Ex. Only one person's test paper was misprinted (that person got 90 items; everyone else got 100)
o Systematic error - applies to everyone, so it would not affect your reliability
    Ex. Your exact weight is 55 kg but the weighing scale is off by 5 kg, so it reads 60 kg
    Ex. A test has 100 items but only 90 were printed on the test paper; if all of you answered the test, would anybody get 100?
o Environmental - the lights in the room can affect the test taker
o Procedural error - the instruction should be "shade A if the statement is true and shade B if the statement is false," but the instruction given was wrong: shade A (false), shade B (true)
ERROR VARIANCE: The component of a test score attributable to sources other than the trait or ability measured.
o Influences the actual score
o Variance - things that contribute to your performance
o Error variance - ex. Weighing a block of gold on an accurate scale gives 1 kg. Is the block pure gold? No.
    The impurity is the error variance
o How much of the 1 kg is impurity?
o Ex. The score obtained is not really the person's score because something is influencing it
o Variance is inherent
Both the assessee and assessor are sources of error variance.

Assumption 6: Unfair and biased assessment procedures can be identified and reformed.
All major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.
Problems related to test fairness:
Arise if the test is used with people for whom it was not intended.
Some are more political than psychometric in nature.
o some tests have norms not

Assumption 7: Testing and assessment benefit society.
There is a great need for tests, especially good tests, considering the many areas of our lives that they benefit.

CHARACTERISTICS OF A PSYCHOLOGICAL TEST
Objectivity
Two different scorers should arrive at the same score for the same test
Ex. Prelims exams: there is only one correct answer
Eliminates judgment or bias
Subjectivity: essays may be affected by the judgment of the person checking (personal biases)
Standardization
A test is made uniform
There is a particular standard that you have to follow
Reliability
Consistency of results
Validity
Measuring what it is supposed to measure
Utility
How useful the test is
The practical value of a test as it aids in the decision-making process; efficiency

NORM

Norm-referenced testing and assessment:
A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to scores of a group of test takers.

The meaning of an individual test score is understood relative to other scores on the same test.

Book recommendation: The End of Average by Todd Rose

Norms
Test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.

Normative sample
The reference group to which the performance of test takers is compared.

SAMPLING TO DEVELOP NORMS

• Standardization
The process of administering a test to a representative sample of test takers for the purpose of establishing norms.
• Sampling
Test developers select a population for which the test is intended.
• Test takers should have at least one common, observable characteristic.
• Stratified sampling
Sampling that includes different subgroups, or strata, from the population.
Each subgroup's percentage in the sample matches its percentage in the population
Ex. Divide a sample of adults into subgroups by age, like 18-29, 30-39, 40-49, 50-59, and 60 and above.
• Stratified-random sampling
Every member of the population has an equal opportunity of being included in a sample.
All of them would have an equal chance (fishbowl method)
• Purposive sample
Arbitrarily selecting a sample that is believed to be representative of the population.
Ex. Specific: you want your sample to have certain characteristics
• Incidental/convenience sample:
A sample that is convenient or available for use; it may not be representative of the population.
Generalization of findings from convenience samples must be made with caution.
• Non-probability sampling: whoever is available
• Ex. Survey on fast food
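The idea behind stratified sampling in the sampling notes (each subgroup's share of the sample matches its share of the population, with random selection inside each subgroup) can be sketched in a few lines of Python. This is only an illustration: the function names, the population of 1,000 adults, and the age-band sizes are made-up assumptions, not data from the lecture.

```python
import random
from collections import Counter, defaultdict


def age_band(age):
    """Map an adult's age to the age bands used in the notes."""
    for low, high in [(18, 29), (30, 39), (40, 49), (50, 59)]:
        if low <= age <= high:
            return f"{low}-{high}"
    return "60+"


def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportional stratified random sample (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the population into strata (subgroups).
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # The stratum's quota mirrors its share of the population.
        # Rounding can make quotas miss sample_size by a little in general.
        quota = round(sample_size * len(members) / len(population))
        # Inside a stratum, every member has an equal chance of being drawn.
        sample.extend(rng.sample(members, quota))
    return sample


# Made-up population: 1,000 adults spread over five age bands.
band_sizes = {(18, 29): 300, (30, 39): 250, (40, 49): 200, (50, 59): 150, (60, 80): 100}
population = [
    {"age": low + i % (high - low + 1)}
    for (low, high), size in band_sizes.items()
    for i in range(size)
]

sample = stratified_sample(population, lambda p: age_band(p["age"]), sample_size=100)
counts = Counter(age_band(p["age"]) for p in sample)
print(counts)  # 30, 25, 20, 15, and 10 people across the five bands
```

A convenience sample, by contrast, would simply take whoever is available, so nothing guarantees that the band proportions match the population.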
DEVELOPING NORMS FOR A STANDARDIZED TEST
Having obtained a sample, test developers:
o Administer the test with a standard set of instructions.
o Recommend a setting for administering the test.
o Collect and analyze data.
o Summarize data using descriptive statistics, including measures of central tendency and variability.
o Provide a detailed description of the standardization sample.

1.2 Ordinal Scale
Identifies the stage reached by a child in the development of specific behavior functions
Age norms
2.1 Percentiles
An expression of the percentage of people whose score on a test or measure falls below a particular raw score.
Different from percentage correct
2.2 Standard Scores
Derived scores that use as their unit the SD of the population upon which the test was standardized
2.3 Deviation IQ
A standard score for intelligence tests, typically with a mean of 100 and an SD of 15
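To make the difference between a percentile and percentage correct concrete, here is a small Python sketch. The norm-group scores, the 50-item test length, and the function names are hypothetical; the deviation IQ scale (mean 100, SD 15) is the common convention and is used here as an assumption.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of the normative sample whose score falls below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def percentage_correct(raw_score, n_items):
    """Share of items answered correctly; says nothing about other test takers."""
    return 100 * raw_score / n_items


def z_score(score, norm_scores):
    """Standard score: distance from the norm-group mean in SD units."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


def deviation_iq(score, norm_scores, iq_mean=100, iq_sd=15):
    """Rescale the z score to an IQ-style metric (assumed mean 100, SD 15)."""
    return iq_mean + iq_sd * z_score(score, norm_scores)


# Hypothetical raw scores of a 10-person normative sample on a 50-item test.
norm_scores = [22, 25, 27, 30, 30, 31, 33, 35, 36, 41]

raw = 36
print(percentile_rank(raw, norm_scores))         # 80.0 (8 of 10 scored lower)
print(percentage_correct(raw, 50))               # 72.0 (36 of 50 items)
print(round(z_score(raw, norm_scores), 2))       # 0.94
print(round(deviation_iq(raw, norm_scores), 1))  # 114.2
```

The same raw score of 36 is 72% correct but sits at the 80th percentile, because the percentile depends on how the norm group performed.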
ADDITIONAL NORMS
NATIONAL NORMS
Norms derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
NATIONAL ANCHOR NORMS
Equivalency table for scores on two nationally standardized tests designed to measure the same thing.
Provides some stability to test scores by anchoring (comparing) them to other test scores
SUBGROUP NORMS
Segmented normative samples
LOCAL NORMS
Local population's performance on the test

• Fixed reference group scoring systems
The distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations of the test.
The SAT employs this method.

NORM-REFERENCED VERSUS CRITERION-REFERENCED EVALUATION
Norm-referenced tests involve comparing individuals to the normative group
In criterion-referenced tests and assessments, test takers are evaluated as to whether they meet a set standard (example: a driving exam).
Criterion
A standard on which a judgment or decision may be based

5: RELIABILITY
Observed score = true score plus error.
Error refers to the difference between the observed score and the true score.
X = T + E
o Your performance on a test might not necessarily be your true score
o It might be your performance at that moment only

VARIANCE AND MEASUREMENT ERROR
Variance: standard deviation squared.
o Measure of dispersion
o Something that contributes to why you got a score
o Ex. Why did you get 70 points? What made it 70?
    70 - the attribute you truly possess
Variance equals true variance plus error variance.
    $\sigma$ = SD
    $\sigma^2$ = variance
    $\sigma^2 = \sigma_T^2 + \sigma_E^2$
We can never know the true variance exactly because error variance exists
Reliability is the proportion of the total variance attributed to true variance.
o Variable - anything that varies

MEASUREMENT ERROR
Refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes.
RANDOM ERROR
Consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process (i.e., noise).
Does not apply to everyone (so it becomes noticeable)
SYSTEMATIC ERROR
Typically proportionate to what is presumed to be the true value of the variable being measured.
Experienced by, or applicable to, everyone (all scores are off)
EXAMPLES:
Error variance:
Ex. T1 - 71, T2 - 75 (the person remembered the answers from T1), T3 - 65 (the person was anxious)
If a test could measure completely, without error, the scores would be constant

SOURCES OF ERROR VARIANCE
Test construction
Variation may exist within items in a test or between tests (i.e., item sampling or content sampling).
o This is putting items or questions on your test that are not relevant to what you intend to measure
o Might cause the person's score to increase or decrease
o Ex. A test about intelligence with K-pop content
o Some test takers are knowledgeable about K-pop while some are not
Test administration
Sources of error variance may stem from the test environment.
    Ex. The administration keeps changing: first administration (no instructions), 2nd administration (correct instructions), 3rd administration (wrong instructions)
o Test taker variables include: pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication
o Examiner-related variables include: physical appearance and demeanor may play a role.
    The way the proctor administers the test
Test scoring and interpretation
Computer testing reduces error in test scoring, but many tests still require expert interpretation.
Example: tests of personality, tests of creativity, various behavioral measures, etc.
o Different experts might have different opinions

RELIABILITY ESTIMATES

Test-retest reliability
An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
o Pre-test and post-test: time sampling reliability
Most appropriate for variables that should be stable over time, such as personality, and not appropriate for variables expected to change over time.
o Not the best way to get reliability
As time passes, the correlation between the scores obtained on each testing decreases.
o With intervals greater than 6 months, the estimate of test-retest reliability is called the coefficient of stability.
    How stable the scores are through time

COEFFICIENT OF EQUIVALENCE
Measures the degree of the relationship between various forms of a test by means of alternate forms or parallel forms.
PARALLEL FORMS
For each form of the test, the means and the variances of observed test scores are equal.
Ex. Each item in Form A should have equivalent statistics with items in Form B
ALTERNATE FORMS
Typically designed to be equivalent with respect to variables such as content and level of difficulty.
Example: giving 2 different forms of a test (Test A and Test B)
• Obtaining estimates of alternate-forms reliability and parallel-forms reliability is similar to obtaining an estimate of test-retest reliability.

SPLIT-HALF RELIABILITY
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
• Step 1: Divide the test into equivalent halves.
• Step 2: Calculate a Pearson r between scores on the two halves of the test.
• Step 3: Adjust the half-test reliability using the Spearman-Brown formula.
• The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
• Ex. Odd-even split: the score on the odd-numbered items and the score on the even-numbered items should be about the same

OTHER METHODS OF ESTIMATING INTERNAL CONSISTENCY
Inter-item consistency
The degree of correlation among all the items on a scale.
Coefficient alpha
Mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
Values typically range from 0 to 1.

DYNAMIC VS STATIC CHARACTERISTICS
Would test-retest be a good measure of reliability for dynamic characteristics? No
What are examples of static characteristics?
DYNAMIC
Trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
Ex. A measure of anxiety (the characteristic) may change from hour to hour.
A test-retest measure would be of little help in gauging the reliability of the measuring instrument
STATIC
A trait, state, or ability presumed to be relatively unchanging
Example: intelligence
The obtained measurement would not be expected to vary significantly as a function of time
Either the test-retest or the alternate-forms method would be appropriate

RESTRICTION OF RANGE VS INFLATION OF RANGE
RESTRICTION
The variance of either variable in a correlational analysis is restricted by the sampling procedure used
The resulting correlation coefficient = lower.
Limiting the statistics that you have obtained
INFLATION
The variance of either variable in a correlational analysis is inflated by the sampling procedure
The resulting correlation coefficient = higher.
Comparing your statistics to a larger group

EXAMPLE:
Scenario: the USTET (an entrance exam). You get the statistics of those who took the USTET, the items of the test, and the range of scores of the test
o Sample: College of Science
o Range of scores of takers: 700-800
Restriction of range: only getting the measure of those who passed
o For passers the range of scores is 600-900 (restricted)
o University-wide passers: 100-900
Inflation of range: comparing your score to the university-wide range instead of focusing only on the College of Science range of scores
o Passing score: 700-800
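The model X = T + E, reliability as the share of true variance, and the effect of restriction of range can be tried out with a small simulation. This is a rough sketch under stated assumptions, not a procedure from the lecture: the score scale (true scores with mean 500 and SD 100, errors with SD 50), the cut at 600, and all variable names are made up. With those settings, theory puts reliability at 100^2 / (100^2 + 50^2) = 0.8, so the simulated values should land close to that.

```python
import random
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

true_scores = [rng.gauss(500, 100) for _ in range(n)]        # T
errors_1 = [rng.gauss(0, 50) for _ in range(n)]              # E, first administration
errors_2 = [rng.gauss(0, 50) for _ in range(n)]              # E, second administration
observed_1 = [t + e for t, e in zip(true_scores, errors_1)]  # X = T + E
observed_2 = [t + e for t, e in zip(true_scores, errors_2)]

# Reliability: proportion of total (observed) variance that is true variance.
reliability = pvariance(true_scores) / pvariance(observed_1)

# Test-retest estimate: correlate the two administrations for everyone.
r_full = pearson_r(observed_1, observed_2)

# Restriction of range: keep only "passers" (first score at or above the cut).
passers = [(a, b) for a, b in zip(observed_1, observed_2) if a >= 600]
r_restricted = pearson_r([a for a, _ in passers], [b for _, b in passers])

print(f"reliability (true/total variance): {reliability:.2f}")
print(f"test-retest r, all takers:         {r_full:.2f}")
print(f"test-retest r, passers only:       {r_restricted:.2f}")
```

The passers-only correlation comes out clearly lower than the full-sample one even though the test itself has not changed, which is the point of the restriction-of-range warning. Inflation of range is not simulated here.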
MEASURES OF INTER-SCORER RELIABILITY
Inter-scorer reliability: The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
Often used when coding nonverbal behavior.
Guards against biases or idiosyncrasies in scoring.
• Coefficient of inter-scorer reliability: A correlation coefficient used to determine the degree of consistency among scorers.
(check reliability docs for table)

SPEED TEST VS POWER TEST
SPEED
Limited time
Uniform level of difficulty (typically uniformly low)
Test takers should be able to complete all the test items correctly (because the items are easy)
POWER
Enough time
Some items are so difficult that no test taker is able to obtain a perfect score
Allows test takers to attempt all items
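The internal-consistency estimates covered in this lesson (the three split-half steps with the Spearman-Brown adjustment, and coefficient alpha) can be worked through on a tiny data set. The sketch below is illustrative only: the five test takers, their ratings on four items, and the function names are made up. Alpha is computed with the usual variance-based formula, k/(k-1) * (1 - sum of item variances / variance of total scores), rather than by averaging every possible split.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def spearman_brown(r_half, n=2):
    """Estimated reliability of a test n times as long as the half-test."""
    return n * r_half / (1 + (n - 1) * r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per test taker, one column per item."""
    # Step 1: split into halves (odd-numbered vs even-numbered items).
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    # Step 2: Pearson r between the half scores.
    r_half = pearson_r(odd, even)
    # Step 3: Spearman-Brown adjustment to full test length.
    return spearman_brown(r_half)


def coefficient_alpha(item_scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-5 ratings: five test takers (rows) by four items (columns).
item_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

split_half = split_half_reliability(item_scores)
alpha = coefficient_alpha(item_scores)
print(round(split_half, 3), round(alpha, 3))
```

For this made-up data the split-half estimate is about 0.95 and alpha about 0.96. Both are high because the four items rank the five test takers in nearly the same order.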
THE NATURE OF TESTS