6 - Reliability-1
Why is it important?
◦ We usually get one shot to measure the variables we want to, so it’s
important to make that shot count
Error
There are a lot of sources of unsystematic error
◦ People: employees may perform differently in some situations than others;
performance fluctuates
◦ Tests: the wording of items and the item content may influence
measurements
◦ Conditions: room temperature, noises outside, etc. may affect attention or
focus
What are “systematic” errors?
◦ These are not true errors per se; they are meaningful changes in a variable
that arise from experience, training, or other events
Reliability vs accuracy
[Figure: panels 1–4 illustrating different combinations of reliability and accuracy]
The coefficient of determination
r and r²
◦ You will often see correlations reported as r_xy; this is a correlation between two different variables (x and y)
◦ The correlation between a test and itself (i.e., its reliability) is often expressed as r_xx
◦ Theoretically, this is the average correlation between infinite administrations of the same test
◦ The square of this correlation, r², can be thought of as the percentage of variance (fluctuations in scores) explained by differences in the trait being measured vs error
◦ Example: if r² = .6, 60% of the differences in people's scores are due to their differences on the trait we're measuring (systematic); 40% of the differences we observe are due to error from various sources (unsystematic)
Classical Test Theory
The whole idea of reliability relies on the assumption that X = T + e
◦ X is the observed score of a person (the actual measurement value)
◦ T is their “true” score (an individual’s real standing on a trait)
◦ e is error (some crap that we don’t want)
In all types of reliability, rxx > .80 is good, rxx > .70 is acceptable
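A small simulation can make the X = T + e idea concrete. This is only a sketch with made-up numbers (not from the slides): true scores and unsystematic error are generated separately, and the reliability works out to the share of observed-score variance that comes from true scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True scores T: each person's real standing on the trait
T = rng.normal(loc=50, scale=8, size=n)
# Unsystematic error e: fluctuations we don't want
e = rng.normal(loc=0, scale=6, size=n)
# Observed scores X = T + e
X = T + e

# Reliability = proportion of observed-score variance due to true scores
reliability = T.var() / X.var()
print(round(reliability, 2))  # roughly 8^2 / (8^2 + 6^2) = 0.64
```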
Interrater reliability
In cases of subjective scoring (e.g., performance appraisal) the raters
or examiners may be an additional source of error
◦ Interrater agreement: the exact agreement between raters
◦ Interclass correlation: relative agreement; used when raters are rating
multiple objects or individuals
◦ Intraclass correlation: estimates how much of the differences between raters
are due to rater errors vs true differences in the ratees
These aren’t exactly reliability
◦ Correlations between raters are often pretty low (.1 to .3)
◦ This doesn’t mean error accounts for 90 to 99% of ratings, only that error
accounts for no more than that percentage of the ratings (still not reassuring)
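As a rough illustration (hypothetical ratings, not data from the slides), the simplest interrater estimates are just the exact agreement rate and the correlation between two raters' scores for the same set of ratees:

```python
import numpy as np

# Hypothetical performance ratings of the same 8 employees by two raters
rater_a = np.array([3, 5, 4, 2, 5, 3, 4, 1])
rater_b = np.array([4, 5, 3, 2, 4, 2, 5, 2])

# Interrater agreement: proportion of identical ratings
agreement = np.mean(rater_a == rater_b)
# Interrater correlation: relative agreement between the two raters
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(round(agreement, 2), round(r, 2))
```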
Interpreting reliability
There are many reasons why a test might have a low reliability estimate
◦ Range restriction: correlations are underestimated when the range of scores is narrow; the wider the range of scores you have, the better your estimates of correlations will be
◦ Assessing multiple dimensions in the same test often leads to unreliability if dimensions are
unrelated
◦ Reliability estimates are influenced by the # of items
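To sketch how the number of items matters, the Spearman-Brown prophecy formula (a standard result; the numbers below are made up) predicts reliability when a test is lengthened by a factor k:

```python
def spearman_brown(r_xx: float, k: float) -> float:
    """Predicted reliability when a test is made k times longer."""
    return (k * r_xx) / (1 + (k - 1) * r_xx)

# A 10-item test with r_xx = .60, doubled to 20 items
print(round(spearman_brown(0.60, 2), 2))  # about 0.75
```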
Interpreting reliability
Standard error of measurement
◦ From a reliability coefficient, we can determine the amount of error expected in
each individual’s score
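A sketch of that calculation (example numbers are made up): the standard error of measurement is SEM = SD × sqrt(1 − r_xx), and roughly 95% of a person's observed scores should fall within about ±2 SEM of their true score.

```python
import math

def sem(sd: float, r_xx: float) -> float:
    """Standard error of measurement from the test SD and its reliability."""
    return sd * math.sqrt(1 - r_xx)

sd, r_xx, observed = 10.0, 0.84, 72.0
s = sem(sd, r_xx)                       # 10 * sqrt(0.16) = 4.0
band = (observed - 2 * s, observed + 2 * s)
print(round(s, 1), band)                # 4.0 (64.0, 80.0)
```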
Criterion-related validity
Predictive
◦ Predictors are collected first; criteria are collected at a later time
Concurrent
◦ Criteria data are collected at the same time as predictors
◦ This is usually done by administering a new set of predictors to job
incumbents (i.e., current employees)
Factors affecting criterion-related validity (CRV)
Range enhancement
◦ Including some people a selection battery was NOT designed for in your
analysis can artificially inflate validity
Range restriction
◦ Direct: criterion data is unavailable for applicants who have been screened
out by predictors
◦ Indirect: applicants are screened out by the previous predictors, which are
related to the new predictors
Various statistical methods exist to correct for range restriction
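One common correction is Thorndike's Case 2 formula for direct range restriction; this is a sketch with made-up numbers, adjusting the restricted correlation by the ratio of the unrestricted to restricted predictor standard deviations:

```python
import math

def correct_direct_restriction(r: float, sd_unrestricted: float, sd_restricted: float) -> float:
    """Thorndike Case 2 correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Validity of .25 observed among hires, whose predictor SD is half that of the applicant pool
print(round(correct_direct_restriction(0.25, sd_unrestricted=10, sd_restricted=5), 2))
```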
Types of validity evidence
Construct-related validity
◦ What is the meaning of the construct; how should measures of the construct
relate to measures of other constructs?
◦ What would be the characteristics of individuals who are high or low on that
construct?
Examining the relationship of the construct to these other characteristics
is how we provide evidence of construct validity
◦ Convergent validity: tests should be related to other tests of the same or
similar constructs
◦ Discriminant validity: tests should be unrelated to tests of dissimilar
constructs
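As a hedged sketch (hypothetical scales and simulated data, not from the slides), convergent and discriminant evidence boils down to comparing correlations: a new extraversion scale should correlate strongly with an established extraversion measure and weakly with a conceptually unrelated measure such as spatial ability.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical scores: a new extraversion scale, an established extraversion
# scale (similar construct), and a spatial-ability test (dissimilar construct)
new_extraversion = rng.normal(size=n)
old_extraversion = 0.8 * new_extraversion + 0.6 * rng.normal(size=n)
spatial_ability = rng.normal(size=n)

convergent = np.corrcoef(new_extraversion, old_extraversion)[0, 1]   # should be high
discriminant = np.corrcoef(new_extraversion, spatial_ability)[0, 1]  # should be near zero
print(round(convergent, 2), round(discriminant, 2))
```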
Validity
Review from last class
Types of validity evidence
◦ Content
◦ Example: did Exam 1 adequately cover all the material from the first three
weeks of class?
◦ Assessment: panel of subject matter experts (SMEs)
◦ Construct
◦ Example: exactly which construct(s) did my exam measure?
◦ Assessment: examine relationships between constructs
Review from last class
Types of validity evidence
◦ Criterion-related
◦ Example: does being extraverted relate to increased sales performance?
◦ Assessment
◦ Predictive: collect predictor data from applicants, collect criterion data
later on from employees
◦ Concurrent: collect predictor and criterion data from job incumbents
Cross-validation
Cross-validation: the extent to which validity in one instance applies
equally well in other instances
◦ This is also called generalization (how much a specific property applies to
general situations)
Construct and content validity generalize somewhat readily
◦ A possible exception might be transfers across cultures in which conceptions
of a construct may change
On the other hand, criterion-related validity can change greatly from
the sample with which it was established to other samples
Predictors and Regression
We know that our predictors have varying degrees of relationships with our
criterion
◦ Multiple regression allows us to determine the relative importance of each of the predictors
in predicting a criterion; this “relative importance” is quantified in regression weights
◦ Example: consider how your various classes factor into your GPA
◦ Classes with more credits factor into your GPA more than classes with fewer credits
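A minimal sketch of how multiple regression assigns relative importance to predictors (fabricated data, not the slides' GPA example): the fitted weights b1 and b2 show how much each predictor contributes to the predicted criterion.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Two hypothetical predictors (e.g., a cognitive test and a structured interview)
cognitive = rng.normal(size=n)
interview = rng.normal(size=n)
# Simulated criterion: cognitive ability matters more than the interview here
performance = 0.6 * cognitive + 0.3 * interview + rng.normal(scale=0.7, size=n)

# Ordinary least squares: solve for the regression weights
X = np.column_stack([np.ones(n), cognitive, interview])
b0, b1, b2 = np.linalg.lstsq(X, performance, rcond=None)[0]
print(round(b1, 2), round(b2, 2))  # b1 should come out larger than b2
```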
[Figure: scatterplots illustrating positive validity and zero validity]
Differential validity
A valid predictor that results in
adverse impact (AI)
◦ Predicts equally for both groups
◦ This can be legal, but you will need
evidence that:
◦ The criterion is relevant and important and
not biased
◦ There aren’t any equivalent criteria that
aren’t biased
◦ A third factor didn’t cause the group
differences
Differential validity
A predictor that is valid for the
entire group, but not each group
separately (AI)
◦ In this case, the predictor is
basically a crude grouping variable
◦ This would be an obvious attempt
to use selection measures to
discriminate
Differential validity
Equal validity for both groups and
valid overall, but unequal predictor
means (AI)
◦ Both groups are equally likely to
succeed on the job, but the minority
group members are much less likely to
be hired due to lower predictor scores
◦ A solution to this problem is to use
separate cut scores for the groups
◦ In most cases, this “solution” is illegal
unless these corrections have been
approved as part of affirmative action
Differential validity
Equal validity for both groups and
valid overall, but unequal criterion
means (no AI)
◦ Both groups are equally likely to be
hired, but nonminority group
members are much more likely to
succeed on the job
◦ This sort of difference in validity could
reinforce negative stereotypes
Differential validity
Equal predictor means, but only
valid for the nonminority group (no
AI)
◦ Members of groups are selected at
equal rates, but nonminorities are
more likely to succeed
◦ Again, this sort of difference in validity
could reinforce negative stereotypes
Differential validity
Differential validity
◦ The validity (i.e., correlation between predictor and criterion) for one or both
subgroups is significant
◦ There is a significant difference in validity between the subgroups
Single-group validity
◦ The validity is significant for one group but not the other
◦ There is no significant difference in validity between the subgroups
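A sketch of how you might test whether two subgroup validities differ significantly, which is the key question separating differential validity from single-group validity. This uses the standard Fisher r-to-z test; the subgroup correlations and sample sizes are made up.

```python
import math

def fisher_z_diff(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher r-to-z transform
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical: validity of .40 for one subgroup (n = 200), .15 for the other (n = 120)
z = fisher_z_diff(0.40, 200, 0.15, 120)
print(round(z, 2))  # |z| > 1.96 would suggest differential validity
```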
[Scatterplot: Criterion Ratings plotted against Predictor Score]
Differential prediction
◦ If we add a grouping variable (e.g., ethnicity or sex), we can determine if there are differences
between the subgroups in their average scores on the criterion
◦ Example: ŷ = b1(IQ test) + b2(group)
[Plot: Criterion Ratings vs IQ Test Score, with separate regression lines for Women, Men, and Overall showing different intercepts]
Differential prediction
◦ If we add the interaction between a group and a predictor, we can determine if the
relationships between the predictors differ as a function of the subgroups (moderation)
◦ Example: ŷ = b1(IQ test) + b2(group) + b3(IQ test × group)
[Plot: Criterion Ratings vs IQ Test Score, with Women, Men, and Overall lines showing different slopes]
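A hedged sketch of how both differential-prediction checks could be run (hypothetical column names and simulated data; assumes pandas and statsmodels are available): first add the group term to test for intercept differences, then add the interaction to test for slope differences (moderation).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300

# Hypothetical data: IQ test scores, a 0/1 group code, and criterion ratings
df = pd.DataFrame({
    "iq": rng.normal(100, 15, size=n),
    "group": rng.integers(0, 2, size=n),
})
df["criterion"] = 0.05 * df["iq"] + 1.0 * df["group"] + rng.normal(size=n)

# Intercept differences: criterion = b1(iq) + b2(group)
intercept_model = smf.ols("criterion ~ iq + group", data=df).fit()
# Slope differences (moderation): add the iq x group interaction
slope_model = smf.ols("criterion ~ iq + group + iq:group", data=df).fit()

print(intercept_model.params["group"])   # intercept difference between groups
print(slope_model.params["iq:group"])    # difference in slopes between groups
```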
Differential prediction
Similar to differential validity, group differences in slopes or
intercepts (averages) are rare
When these differences do occur, they almost always favor minority
groups
◦ Minorities tend to have overprediction of criteria, whereas nonminorities
tend to have underprediction
Cognitive ability tests are most likely to display group differences,
especially for different ethnic groups
Differential prediction
◦ When using the “Overall” regression line (as required by law) in the presence of differential validity:
◦ Predicted scores for the group for whom the predictions are less valid tend to be inflated
◦ Predicted scores for the group for whom the predictions are more valid tend to be deflated
[Plot repeated: Criterion Ratings vs IQ Test Score with Women, Men, and Overall lines]
Bias in IQ tests
We always hope that our tests never show bias, but they do in rare cases
Could be real differences
◦ Differences in educational opportunities, poverty, neighborhoods, home life, etc.
Culturally-influenced questions
◦ “What would you do if a child much smaller than you tried to pick a fight with you?”
Reducing adverse impact
1. Improve recruitment of minorities
2. Use a combination of cognitive and noncognitive predictors
3. Use measures of specific (rather than general) cognitive ability
4. Use multiple regression to combine predictors into a composite
5. Use differential weighting for criterion facets
6. Consider alternative modes for presenting test materials
7. Enhance face validity
8. Implement banding to select among applicants
Banding
Typically, we select applicants in a top-down fashion: those with the
highest scores are selected first, then the next highest, and so on
down the list
◦ This method gives the greatest utility of our measures, but may lead to
adverse impact due to subgroup differences in test scores
However, all of our predictors have a certain amount of error when
they measure each person
◦ The differences between two scores that are close together may be a result of
unsystematic (random) error
Banding
Banding involves using this error (the standard error of
measurement) in order to create ranges or “bands” of scores that
are not statistically different from one another
◦ Example: the SEM might show that those in the 94th percentile and up are not
statistically different than the top scorer (100th percentile)
◦ Within this band, all applicants are treated equally in terms of that predictor
or criterion; you can then select individuals within this band using other
predictors or criteria (e.g., diversity needs)
0 100
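A sketch of SEM-based banding (a common approach; the 1.96 multiplier and the numbers below are illustrative assumptions): scores within roughly 1.96 standard errors of the difference (SED = SEM × sqrt(2)) of the top score are treated as not statistically different from it.

```python
import math

def band_floor(top_score: float, sd: float, r_xx: float, z: float = 1.96) -> float:
    """Lowest score that is not statistically different from the top score."""
    sem = sd * math.sqrt(1 - r_xx)   # standard error of measurement
    sed = sem * math.sqrt(2)         # standard error of the difference between two scores
    return top_score - z * sed

scores = [98, 96, 95, 93, 90, 88, 85]
floor = band_floor(top_score=max(scores), sd=10, r_xx=0.90)
in_band = [s for s in scores if s >= floor]
print(round(floor, 1), in_band)  # everyone in the band is treated equally on this predictor
```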
Fairness
Applicants may examine the fairness of a selection system on three
dimensions:
◦ Distributive: do the outcomes of the system (i.e., who gets hired) seem fair?
◦ Procedural: do the tests and processes used to make decisions seem fair?
◦ Interpersonal: was I treated well during my interactions with members of the
organization?
As you'd expect, individual perceptions of distributive fairness are largely based on whether or not the outcome was favorable to him or her
Fairness
Although tests may be technically fair and lack bias, the process of
testing and making decisions can be such that applicants perceive
unfairness
◦ These perceptions are bad for both the organization and the test taker (e.g.,
lower self-efficacy)
Fair and equitable treatment of test takers involves providing
information about:
◦ The nature of the tests
◦ The intended use of test scores
◦ The confidentiality of the results
Law and HR
Discrimination
Unequal/disparate treatment: Intentional
discrimination
◦ Direct evidence
◦ Open expressions of hatred or inequality; exclusionary policies
◦ Circumstantial evidence
◦ Often established through statistics
◦ Mixed-motive
◦ Both direct evidence of intentional discrimination and evidence that the
stated basis for an employment decision is merely a pretext
Discrimination
Adverse/disparate impact: Unintentional discrimination
◦ Identical standards and procedures are applied to everyone, but
1. The outcome is substantially different for members of a particular group
2. The standards or procedures are unrelated to success on the job
◦ Example: height requirement of 5’8” for police cadets
◦ May have adverse impact on Asians, Hispanics, and women
Employment Laws and Civil Rights
Thirteenth Amendment
◦ No slavery or involuntary servitude
Fourteenth Amendment
◦ Equal protection under the law
Seniority systems
◦ Legal so long as the differences are not intentionally discriminatory
Preemployment Inquiries
◦ Legal so long as they are not used as a basis for discrimination
Provisions to Title VII
Testing
◦ Any professionally developed ability test may be used as long as it does
not discriminate based on protected traits
Preferential treatment
◦ It is illegal to give preferential treatment to a group due to existing
imbalances
Veterans
◦ Veterans may be given preference in spite of likely adverse impact
National Security
◦ Discrimination is permissible when necessary to protect national security
Employment Laws and Civil Rights
Age Discrimination in Employment Act of 1967
◦ Requires EEO based on age; protects those 40 and over
◦ However, older employees can waive their rights to sue
What classes are protected under Title VII of the Civil Rights Act of 1964?
[Diagram: the court system, from the Court of Appeals up to the Supreme Court]
However, case law is not statutory law, and decisions may be reversed in light of new social circumstances or scientific evidence
Case Law: Testing
All professionally developed tests are fine as long as they don’t discriminate
◦ 4/5 rule: if the selection rate for one group is less than 4/5 (80%) of the rate for the group with the highest selection rate, adverse impact is indicated (see the sketch after this list)
Employees with a complaint must identify the specific practice that violates
their rights
◦ Burden of proof shifts to the employer to show that the practice is job relevant and
unbiased
◦ Plaintiff can then show that an alternative test exists that is not biased
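A sketch of the 4/5 (80%) rule computation referenced above (hypothetical hiring numbers): each group's selection rate is compared with the rate of the group with the highest rate, and a ratio below 0.80 flags potential adverse impact.

```python
def adverse_impact_ratio(hired_a: int, applied_a: int, hired_b: int, applied_b: int) -> float:
    """Ratio of the lower selection rate to the higher one (4/5 rule check)."""
    rate_a = hired_a / applied_a
    rate_b = hired_b / applied_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical: 40 of 100 nonminority applicants hired vs 12 of 50 minority applicants
ratio = adverse_impact_ratio(40, 100, 12, 50)
print(round(ratio, 2), "adverse impact indicated" if ratio < 0.80 else "no adverse impact indicated")
```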