Eps 400 Eps 310

Uploaded by

ondiekiderrick949

EPS 400: EDUCATIONAL STATISTICS AND EVALUATION/

EPS 310: EDUCATIONAL MEASUREMENT AND EVALUATION

PART ONE: EDUCATIONAL STATISTICS
CHAPTER ONE: INTRODUCTION
This chapter introduces you to the basic concepts in educational statistics and evaluation.

1.2 KEY CONCEPTS
A. Statistics- the science of conducting studies to collect, organize, summarize, analyse, and interpret information/data.
Types of statistics
i) Descriptive statistics- techniques that organize, summarize, and present a set of data in order to describe a situation.
Elements of descriptive statistics
a. The population or sample of interest
b. One or more variables to be investigated
c. Tables, graphs, or numerical summary tools
d. Identification of patterns in the data

ii) Inferential statistics- techniques that use sample data to make estimates, decisions, predictions, or other generalizations about the population from which the sample was drawn.

Important concepts in inferential statistics

a) Population- the entire group of individuals that a researcher wishes to study.

b) Sample- a small group selected from a population to participate in a research study.
Reasons why we sample.
i. It would take too much time to study the entire population.
ii. It would cost too much money to study the entire population.
iii. It might not be possible to identify and reach all members of the population.
iv. Where measuring uses up or destroys what is measured, studying the entire population would leave nothing behind.
c) Parameter- a numerical value that describes a population.
d) Statistic- a numerical value that describes a sample.
e) Sampling error- the discrepancy between a statistic and a parameter.


0759474478
CONGRESSLADY SCHOOL OF EDUCATION
f) Hypothesis- A guess about what is likely to happen.
g) Hypothesis testing-A decision making process for evaluating claims about a
population based on information obtained from samples.
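The relationship between a parameter, a statistic, and sampling error can be illustrated with a small sketch. The population and sample values below are hypothetical and chosen only for illustration; they are not from these notes.

```python
from statistics import mean

# Hypothetical population: test scores of all 10 students in a small class.
population = [52, 61, 47, 70, 66, 58, 49, 73, 64, 60]

# Hypothetical sample: the scores of 4 students selected from that class.
sample = [61, 70, 49, 64]

parameter = mean(population)            # numerical value describing the population
statistic = mean(sample)                # numerical value describing the sample
sampling_error = statistic - parameter  # discrepancy between the two

print("Parameter (population mean):", parameter)
print("Statistic (sample mean):", statistic)
print("Sampling error:", sampling_error)
```

A different sample from the same population would usually give a different statistic, and therefore a different sampling error.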
B. Data- the values (measurements or observations) that variables can assume.
i) Data set- a collection of data values.
ii) Data array- an ordered data set.
iii) Datum/data value- each value in the data set.
iv) Raw data/raw scores- data collected in their original form, e.g. the scores obtained from tests.
Data can be either quantitative or qualitative. Quantitative data consist of measurements that are recorded using numbers, while qualitative data cannot be measured on a natural numerical scale and can only be classified into categories.
C. Variable- a characteristic that can change or assume different values, e.g. height, weight, intelligence, age, motivation, identity, etc.
Types of variables.
i) Random variables- variables whose values are determined by chance.
ii) Physical variables- those measured using tangible/physical instruments, e.g. time, length, height, temperature, etc.
iii) Psychological variables- intangible variables that are located in the individual’s brain and are measured indirectly.
iv) Qualitative variables- variables that differ in distinct quality, characteristic, or attribute, e.g. gender, religion, country, race, tribe, etc.
v) Quantitative variables- variables that are numerical in nature and can be ranked or ordered according to their values, e.g. height, weight, age, length, body temperature, scores in a test, etc.
Types of quantitative variables.
a) Discrete variable- a variable that exists in indivisible units. It can only be counted and expressed using whole numbers, e.g. number of children in a family, number of telephone calls received, number of students in a class, etc.
b) Continuous variable- a variable that can be divided into smaller units without limit. It can assume an infinite number of values between any two points on a measurement scale. It is obtained by measuring and often includes fractions and decimals, e.g. temperature, time, length, etc.
vi) Independent variable- in an experiment, the variable that is manipulated by the researcher in order to study its effects on another variable.
vii) Dependent variable- the variable that is observed or measured in an experiment in order to record the effects of the independent variable.
D. Constant- a characteristic that does not change, i.e. one that assumes only one value.
E. Measurement- the process of assigning numbers to individuals or objects in a systematic way to represent their properties.
F. Evaluation- the process of making value judgements based on measurements.
G. Test- a systematic procedure for observing a sample of behaviour/psychological variable; a set of questions to which students are expected to supply answers.
H. Test item- a specific stimulus to which a person responds overtly; a specific question to which a person supplies an answer in a test.
I. Examination- a collection of tests which measure different traits of the individual in order to facilitate decision making.

Why teachers need to have a basic knowledge of statistics.

1. Statistics enables teachers to be intelligent consumers of the different kinds of information they come across in their teaching practice.
2. Knowledge of statistics enables teachers to plan and use appropriate procedures for conducting and interpreting their own research.
3. Knowledge of statistics enables teachers to describe, classify, organize, summarize, and present data collected in their practice in ways that are easy to comprehend, e.g. using graphs, tables, and charts to present test scores.
4. Statistical knowledge helps teachers to construct and standardize tests in education, as well as to use them properly.
5. Statistics helps teachers to use statistical reasoning as they conduct evaluation and measurement in education.
6. Knowledge of statistics enables teachers to understand individual differences among students, to compare the suitability of one method or technique with another, and to make predictions about the future performance of students.

LEVELS/SCALES OF MEASUREMENT.
Scales of measurement refer to the ways in which variables/numbers are defined, categorized, counted, or measured. They provide the set of rules upon which measurement is based.

The four types of measurement scales are: nominal, ordinal, interval, and ratio.

1. Nominal scale of measurement

This classifies data into mutually exclusive (non-overlapping), exhaustive categories which cannot be arranged in any meaningful order. Numbers in this scale serve only as labels: they can be used for identifying, naming, or classifying, and the only meaningful operation is counting how many cases fall in each category.

Examples.

Numbers on the back of football players, e.g. 1, 2, 10, 32, etc.
Gender, e.g. 1= male; 2= female.
Ethnicity, e.g. 1= White; 2= Indian; 3= African.
Your college admission number.
Areas of study.
Nationality.

2. Ordinal scale of measurement

Data measured at this level can be put into categories, and these categories can be ordered or ranked. In this scale, numbers distinguish between individuals and indicate merit (rank). We can count the numbers and show the direction of their difference using the concept of ‘greater than’ or ‘less than’, or by giving an ordered series.

However, the numbers are ordered without regard for differences in the distance between the scores. I.e. an ordinal scale can tell us who is first, second, and third, but it cannot tell us whether the distance between the first- and second-ranked scores is the same as the distance between the second- and third-ranked scores.

Examples.

Letters used in the grading system.
Classifying people according to their body size, e.g. small, medium, or large.
The rating scale that uses descriptive words: poor, good, excellent, etc.

3. Interval level of measurement.

This ranks data and uses equal intervals between any two units of measurement. We can therefore tell precise differences between scores. However, zero is an arbitrary point, i.e. it only represents another point of measurement on this scale and is therefore not a meaningful/true zero. It does not represent absence of the quality being measured or the absolute lowest point on the scale. It is just a point on the scale, and there are numbers above and below it.

With the numbers obtained on this scale, we can: count; use the concept of greater than or less than; and indicate the precise difference between any two measurements. However, because zero is arbitrary, we cannot make ratio statements (e.g. ‘twice as much’) about the scores on an interval scale.

Examples

Measurement of:
- Scores on an IQ test.
- Temperature.
- Elevation relative to sea level. E.g. the Dead Sea is the lowest point on land, at about 400 metres below sea level.

4. Ratio level of measurement.

It is similar to the interval scale in that it also represents quantity and has equal intervals between units of measurement. In addition, this scale has an absolute/true/meaningful zero. Therefore, you cannot have negative scores on this scale. You can also form meaningful ratios/comparisons (fractions) between the scores. E.g. Karo (6kg) is twice as heavy as James (3kg).

NB: Ratio scales are not common in Psychology because many psychological variables do not permit the measurement of an absolute zero.

Examples.

Height, weight, time, GPA, age, number of children, salary, number of customers served, etc.
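The difference between interval and ratio scales can be checked numerically. The sketch below assumes temperature is recorded in degrees Celsius (an interval scale with an arbitrary zero) and converts it to kelvin (a scale with a true zero); the two readings are hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin, which has a true zero."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius.
warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale: the gap is the same
# 10 units whichever zero point is used.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20/10 suggests "twice as hot", but the ratio changes
# once the readings are expressed on a scale with a true zero.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print("Difference:", diff_c, "vs", round(diff_k, 2))
print("Ratio:", ratio_c, "vs", round(ratio_k, 3))
```

Weight behaves differently: 6 kg divided by 3 kg is 2 in any unit of weight, which is why the statement that Karo is twice as heavy as James is valid.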

What scale of measurement is used in the following?

a. Labelling people as confident or shy?

b. Classifying a sample of students into two groups, males and females?
c. Classifying sportsmen according to high, medium, and low self-esteem?
d. Your current emotional expression?

1.3 Important Statistical Notation.

Notation refers to the use of letters and symbols to represent statistical concepts and ideas.

Below is the common statistical notation:

The letter X identifies individual scores for a particular variable. When we have more than one score per subject, we use X and Y. When used to head a column, X represents a set of scores.

It is important in statistics to specify how many scores are in a set. We use N to represent the total number of scores for a population and n to represent the number of scores in a sample.

Many of our operations on numbers in statistics involve adding a set of scores. Summing a set of values has its own notation, called summation notation. The Greek letter sigma, Σ, is used to indicate ‘the sum of’. The expression ΣX is read ‘the sum of the scores’, and it means add all the scores for variable X. ΣY means add all the scores for variable Y.

1.4 Order of mathematical operations.

To use and interpret summation notation, you must follow the basic order of operations required for all mathematical calculations. Below is a list showing the correct order for performing mathematical operations.

1. Any calculation contained within parentheses is done first.
2. Squaring or raising to other exponents is done second.
3. Multiplying and dividing are done third, and should be completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order from left to right.

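Example 1.1: Given the following scores: 8, 9, 4, 3, and 1; compute ΣX, ΣX², (ΣX)², Σ(X-1), and Σ(X-1)².

We can use a computational table to help us demonstrate the calculations:

X        X²         (X-1)        (X-1)²
8        64         7            49
9        81         8            64
4        16         3            9
3        9          2            4
1        1          0            0
ΣX=25    ΣX²=171    Σ(X-1)=20    Σ(X-1)²=126

From the first column, (ΣX)² = 25² = 625. Note that this differs from ΣX² = 171: in (ΣX)² the parentheses are evaluated first, so the scores are summed and the total is then squared, whereas in ΣX² each score is squared before summing.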
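The same calculations can be reproduced in a few lines of Python as a check on the order of operations. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
scores = [8, 9, 4, 3, 1]

sum_x = sum(scores)                                  # ΣX: add the scores
sum_x_squared = sum(x ** 2 for x in scores)          # ΣX²: square each score, then add
square_of_sum = sum(scores) ** 2                     # (ΣX)²: add first, then square
sum_x_minus_1 = sum(x - 1 for x in scores)           # Σ(X-1): subtract inside, then add
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in scores) # Σ(X-1)²: subtract, square, then add

print(sum_x, sum_x_squared, square_of_sum, sum_x_minus_1, sum_x_minus_1_sq)
# prints: 25 171 625 20 126
```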


TABULATION AND GRAPHICAL REPRESENTATION OF DATA.

A. FREQUENCY DISTRIBUTIONS

A distribution is just a set of scores.

A frequency distribution is an organized tabulation of raw data using the scores and frequencies.

We have two types of frequency distributions:

a. Ungrouped frequency distribution

It is used for categorical data (nominal or ordinal data) or when we have a very small range of data.

The regular frequency distribution comprises two columns. The first column lists the scores (X values) from the highest to the lowest. Beside each X value, in a second column, is the frequency: the number of times that particular score occurred in the data set.

Steps in constructing an ungrouped frequency distribution

i. Make a table with the following four columns: (A) Class, (B) Tally, (C) Frequency, and (D) Percent.

Class    Tally    Frequency    Percent

ii. List the scores in column A, from the highest to the lowest.
iii. Tally/sort the data and place the results in column B.
iv. Count the tallies and place the numerical frequency (f) in column C. Always indicate Σf at the end of this column. Note: the sum of the frequencies is the same as the total number of scores in the data set. Thus Σf=N.
v. Find the percentage of values in each class by using the formula: %= f/n x 100%, where f= frequency of the class and n= total number of values.

NB: Sometimes we may be required to compute ΣX for a set of scores that have been organized into a frequency distribution (FD). To do that, we multiply each X by its respective f and then add these products. This ensures that we capture all the scores in the data set. Thus for a FD, ΣX= Σ(Xf).

As a general rule, a frequency distribution should have a maximum of 10-15 rows to keep it simple. When the scores cover a range wider than this, it is advisable to use a grouped FD.
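The steps for an ungrouped frequency distribution can be sketched in Python as follows. The ten scores are hypothetical, and the tally column is replaced by a direct count.

```python
from collections import Counter

# Hypothetical raw scores for ten students.
scores = [5, 4, 4, 3, 5, 4, 2, 3, 4, 1]

counts = Counter(scores)   # frequency (f) of each distinct score
n = len(scores)            # total number of values

# One row per score, highest first: (X, f, percent).
table = [(x, counts[x], 100 * counts[x] / n)
         for x in sorted(counts, reverse=True)]

sum_f = sum(f for _, f, _ in table)        # Σf, which should equal N
sum_xf = sum(x * f for x, f, _ in table)   # Σ(Xf), which should equal ΣX

print("X  f  %")
for x, f, pct in table:
    print(x, f, pct)
print("Σf =", sum_f, " Σ(Xf) =", sum_xf)
```

Checking that Σf equals the number of scores and that Σ(Xf) equals the plain sum of the raw scores is a quick way to confirm that no score was lost while tallying.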

b. Grouped frequency distribution

We can get a relatively simple organization of the data by listing groups of scores in the table rather than individual scores. The class intervals are listed in one column from the highest to the lowest, and their respective frequencies are given in an adjacent column.

Major concepts related to grouped FD.

I. Range= highest score - lowest score.
II. Class interval or class (X)= a group of scores.
III. Apparent limits/class limits= the smallest and largest values that can be included in a class. The lower class limit is the smallest data value that can be included in a class, while the upper class limit is the largest data value that can be included in a class. The class limits have gaps between them. E.g. if one class ends at 30 and the next begins at 31, there is a gap between 30 and 31.
IV. Real limits/class boundaries= numbers with an additional decimal place ending in 5, which are used to separate the classes so that there are no gaps in the FD. You get the lower boundary (lower real limit) by subtracting .5 from the lower class limit, and the upper boundary by adding .5 to the upper class limit.
V. Class width= the lower (or upper) class limit of a class minus the lower (or upper) class limit of the class below it.
VI. Mid-point= the score that is at the centre of a class.
VII. Cumulative frequencies (CF)= the total number of data values accumulated up to and including a specific class. CFB= the total number of scores in a particular class and all the classes below it. CFA= the total number of scores in a particular class and all the classes above it.
Basic rules for construction of a grouped FD.
I. There should be between 5 and 20 classes.
II. It is preferable to use an odd number as the class width. This ensures that the mid-point of each class is a whole number, like the raw scores.
III. The classes must be mutually exclusive, i.e. non-overlapping.
IV. The classes must be continuous. There should be no gaps in the FD. Even when a class has no data values, it must be included.
V. The classes must be exhaustive, i.e. all the data values in the distribution must be captured by the classes.
VI. The classes must have equal width.
VII. The lower limit of each class must be a multiple of the class interval (CI), i.e. of the class width.
VIII. It must have a clear title.

Steps in the construction of a grouped FD.

a. Determine the range of scores.
Range= H-L
Inclusive range= H-L+1 (or real upper limit of the highest score - real lower limit of the lowest score).
Find the range for the following scores:
62, 50, 36, 30, 60, 48, 60, 75, 50, 25, 40, 90, 45, 54, 60, 78, 85, 32, 36, 80, 51, 53, 54.
b. Determine the possible class interval (CI) to use.
Lowest possible class interval (LPCI)= R/15 (round it up). Highest possible class interval (HPCI)= R/12 (round it down). Always decide on an odd-numbered CI.
c. Identify the starting point (the highest class in your distribution). It must contain the highest value. NB: For all classes, the lower limit (the number on the left-hand side) should be a multiple of the CI. Then list the other classes. NB: The lowest class must contain the lowest data value.
d. Give the real limits and mid-points.
e. Tally the data, distributing each score into its class.
f. Find the numerical frequencies from the tallies. NB: Always give the sum of the frequencies at the bottom of this column.
g. Compute the cumulative frequencies (CFB and CFA). Your grouped FD should have the following columns:

Class    Real Limits    Mid-Point    Tally    Frequency    CFB    CFA
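As a rough sketch, the steps above can be applied to the 23 scores listed in step a. The code follows the rules as stated in these notes (R/15 rounded up, R/12 rounded down, an odd CI, lower limits that are multiples of the CI); the variable names are illustrative, and the tally column is replaced by a direct count.

```python
import math
from collections import Counter
from itertools import accumulate

scores = [62, 50, 36, 30, 60, 48, 60, 75, 50, 25, 40, 90, 45, 54, 60,
          78, 85, 32, 36, 80, 51, 53, 54]

# Step a: range of the scores (R = H - L).
high, low = max(scores), min(scores)
score_range = high - low

# Step b: lowest and highest possible class interval, then pick an odd CI.
lpci = math.ceil(score_range / 15)    # round up
hpci = math.floor(score_range / 12)   # round down
ci = next(w for w in range(lpci, hpci + 1) if w % 2 == 1)

# Step c: lower class limits are multiples of the CI, highest class first.
top_lower = (high // ci) * ci
bottom_lower = (low // ci) * ci
lowers = list(range(top_lower, bottom_lower - 1, -ci))

# Steps e-f: frequency of each class (empty classes keep a frequency of 0).
counts = Counter((x // ci) * ci for x in scores)
freqs = [counts[lower] for lower in lowers]

# Step g: cumulative frequencies from above (CFA) and from below (CFB).
cfa = list(accumulate(freqs))
cfb = list(accumulate(reversed(freqs)))[::-1]

rows = []
for lower, f, below, above in zip(lowers, freqs, cfb, cfa):
    upper = lower + ci - 1
    rows.append({
        "class": (lower, upper),
        "real_limits": (lower - 0.5, upper + 0.5),   # step d
        "midpoint": (lower + upper) / 2,             # step d
        "f": f, "cfb": below, "cfa": above,
    })

for row in rows:
    print(row)
print("Σf =", sum(freqs))
```

For these scores the range is 65, both possible class intervals come out as 5, and the table runs from the class 90-94 down to 25-29, with empty classes such as 65-69 still listed.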

Why we construct FDs.

a. To organize and summarize data in a meaningful, intelligible way.
b. To enable the reader to determine the shape or nature of the distribution.
c. To facilitate the computation of measures of central tendency and variability.
d. To enable the researcher to draw charts and graphs for data presentation.
e. To enable the reader to compare different data sets.

B. Graphical methods for describing data

Reasons why we use graphs and tables in statistics

a. A graph presents statistical data in a quick, easy-to-read-and-interpret pictorial format.
b. To capture the audience’s attention in public speaking or during a presentation.
c. To discuss an issue, reinforce a point, or summarize a situation.
d. To describe and analyze data.
e. To give a visual representation of the data that is more difficult to obtain from a table or a complete listing of the data.
f. To reveal a trend or pattern in a situation over a period of time.

Common graphical procedures

i) Bar graph- a graph that displays data using bars of different heights. The height of
each bar corresponds to the frequency of the respective score or class. In this graph, a
space is left between adjacent bars. It is used for scores measured on a nominal or
ordinal scale.

The undergraduate student population of Kenyatta University may be summarized thus:

Class          Number of students
Freshers       12000
Sophomore      11000
Third year     10000
Fourth year    9000

[Figure: bar graph of the number of students (y axis, 0 to 14000) for the first-, second-, third-, and fourth-year classes (x axis).]

ii) Histogram- a graph that represents frequency distribution data by means of contiguous rectangular bars whose widths represent the class boundaries and whose heights correspond to the class frequencies. Contiguous means that there are no gaps between adjacent bars in a histogram. We use histograms to represent continuous (interval or ratio) data.

How to construct a histogram.

1. Step 1
Draw and label the x and y axes.
2. Step 2
Represent the frequency on the y axis and the class boundaries on the x axis.
3. Step 3
Using the frequencies as the heights, draw vertical bars for each class.

iii) Frequency polygon- a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.
Steps in the construction of a frequency polygon
1. Step 1:
Find the midpoint of each class.
2. Step 2:
Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a suitable scale on the y axis for the frequencies.
3. Step 3:
Using the midpoints for the x values and the frequencies as the y values, plot the points.
4. Step 4:
Join the points with straight lines and close the polygon by drawing a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located.
iv) Cumulative frequency polygon/the smooth curve/the ogive- a graph that represents the cumulative frequencies for the classes in a frequency distribution. We can construct either a ‘less than’ or a ‘greater than’ ogive.
Steps in constructing an ogive
1. Step 1:
Find the cumulative frequency for each class.
2. Step 2:
Draw the x and y axes. Label the x axis with the class boundaries. NB: For the less than ogive, plot each upper class boundary against the corresponding cumulative frequency (since the CFB represents the number of data values accumulated up to the upper boundary of each class). For the greater than ogive, plot the lower class boundaries against the CFA. Use an appropriate scale for the y axis to represent the cumulative frequencies. Do not label the y axis with the numbers in the cumulative frequency column.
3. Step 3:
Starting with the first class boundary, connect adjacent points with line segments. Always close the graph to the class boundary on the x axis at the beginning and at the end of the graph.

Class    Frequency
10-14    4
15-19    7
20-24    12
25-29    17
30-34    10
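As an illustration, the points needed to draw a frequency polygon and a ‘less than’ ogive for this table can be computed as below. This is only a sketch of the plotting coordinates; the drawing itself is left to graph paper or a plotting tool, and the variable names are chosen here for illustration.

```python
from itertools import accumulate

# (lower class limit, upper class limit, frequency), lowest class first.
classes = [(10, 14, 4), (15, 19, 7), (20, 24, 12), (25, 29, 17), (30, 34, 10)]
width = 5  # class width

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
cumulative = list(accumulate(freqs))  # scores in each class and all classes below it

# Frequency polygon: (midpoint, frequency) points, closed on the x axis one
# class width before the first midpoint and one after the last.
polygon_points = ([(midpoints[0] - width, 0)]
                  + list(zip(midpoints, freqs))
                  + [(midpoints[-1] + width, 0)])

# 'Less than' ogive: (upper class boundary, cumulative frequency) points,
# starting from zero at the lower boundary of the first class.
ogive_points = ([(classes[0][0] - 0.5, 0)]
                + [(hi + 0.5, cf) for (_, hi, _), cf in zip(classes, cumulative)])

print("Midpoints:", midpoints)
print("Cumulative frequencies:", cumulative)
print("Polygon points:", polygon_points)
print("Ogive points:", ogive_points)
```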

Distribution Shapes

Importance of distribution shape

The shape of a distribution helps us to describe data.
It also determines the appropriate statistical methods to use when analyzing data.

How to analyse distribution shape.

i. By drawing a histogram or a frequency polygon.
ii. By computing statistics that describe the shape of a distribution.

The most common shapes

The normal distribution- this is a bell-shaped distribution which has a single peak and tapers off at either end. It is approximately symmetric, i.e. it is roughly the same on both sides of a line running through the centre. It is obtained in a heterogeneous class, i.e. one where only a few students are very weak, a few are very bright, and most are average. In this distribution, the scores are evenly distributed on both sides of the mean.

Deviations from the normal distribution.

These are caused by:

a. Having a poor test
b. When the class itself is not normal

They include:

a) Skewness- this is deviation from the normal distribution in terms of symmetry. A skewed distribution has a ‘tail’ at one end or the other.
Types of skewness
i) Positively skewed distribution
A distribution whose peak is to the left and whose data values taper/tail off to the right. To normalize it, the bulk of the scores must shift to the right, towards higher marks. This distribution implies that the majority of the students got low marks. This may arise when:
- The students are academically very weak.
- The given test is very hard.
- The content is above the level of understanding of the students.
ii) Negatively skewed distribution
A distribution whose peak is to the right and whose data values taper/tail off to the left. NB: To normalize this distribution, the bulk of the scores must shift to the left, towards lower marks. This distribution indicates that the majority of the learners got high marks in the test. This may arise when:
- The given test was very simple.
- There was cheating.
- The learners are very bright, e.g. Alliance Boys.
b) Bimodal distributions
A bimodal distribution has two peaks of the same height. This can be found in classes with students from different backgrounds.
c) Kurtosis
This refers to the peakedness of a graph.
Types of kurtosis
i) Mesokurtic- this is a moderately peaked distribution, i.e. there is a gradual increase followed by a gradual decrease of frequencies (broad range of scores). This is the shape of a normal distribution.
ii) Leptokurtic- this is a distribution that is sharply peaked. The range of scores is very narrow. There is a sudden increase followed by a sudden decrease of the frequencies, i.e. the majority of the scores are near the mean. It is found in a homogeneous class.
iii) Platykurtic- this is a distribution that is basically flat. In the extreme (rectangular or uniform) case, the frequencies of all the scores are the same. This could arise if the class is organized in groups (group work).

MEASURES OF CENTRAL TENDENCY

3.1 Measures of central tendency

Central tendency is a statistical measure that determines a single value which acts as the most typical score or the best representative score for the entire data set.

The single value is used by researchers to:

a) summarize or describe a large set of data,
b) compare two or more data sets.

The three commonly used measures of central tendency (MOCT) are: mean, median, and mode.

a) The mode

This is the most frequent score in a data set.

- In a frequency distribution, the mode is the score corresponding to the peak or the high point of the distribution.
- For a grouped frequency distribution, the mode is the mid-point of the class with the highest frequency.

NB:
- The mode can be determined for data measured on any scale of measurement: nominal, ordinal, interval, or ratio.
Possible scenarios:

i. When one value in the distribution has the highest frequency, that value is the mode. E.g. 8, 11, 16, 18, 11, 12, 11: mode= 11.
ii. When two adjacent scores have the same frequency and the two have the highest frequency in the distribution. E.g. 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5. (Plot the data.) The mode is found by averaging the two adjacent scores with the highest frequency. Thus mode= (2+3)/2= 2.5. NB: The peak is somewhere between the two scores.
iii. When in a group of scores, two non-adjacent scores have the same frequency, and this common frequency is greater than that of any other score in the distribution, then the distribution has two modes. Such a distribution is called a bimodal distribution. E.g. 10, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 15, 17. (Plot the data.)
iv. When more than two non-adjacent scores occur more frequently than any other score in the distribution and with the same frequency, the distribution is said to be multimodal.
v. When all the scores in a distribution occur with the same frequency, e.g. 1, 2, 3, 4, 5, 6, the distribution is said to have no mode. NB: You do not say that the mode is 0, since in some data sets 0 is an actual value. You simply say that the distribution has no mode.
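The five scenarios can be sketched as a small Python function. The function name and the labels it returns are hypothetical; the logic follows the conventions stated in these notes, including averaging two adjacent modes, which is a convention of these notes and not a universal rule.

```python
from collections import Counter

def describe_mode(scores):
    """Return (label, modes) following the five scenarios above."""
    counts = Counter(scores)
    distinct = sorted(counts)
    top = max(counts.values())
    modes = [x for x in distinct if counts[x] == top]

    # Scenario v: every score occurs equally often, so there is no mode.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return "no mode", []
    # Scenario i: a single most frequent score.
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        first = distinct.index(modes[0])
        # Scenario ii: the two modes are neighbouring scores, so average them.
        if distinct[first + 1] == modes[1]:
            return "adjacent", [(modes[0] + modes[1]) / 2]
        # Scenario iii: two non-adjacent modes.
        return "bimodal", modes
    # Scenario iv: more than two modes.
    return "multimodal", modes

print(describe_mode([8, 11, 16, 18, 11, 12, 11]))
print(describe_mode([0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]))
print(describe_mode([10, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 15, 17]))
print(describe_mode([1, 2, 3, 4, 5, 6]))
```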

Limitations of the mode.

i. It is not always unique. A data set can have more than one mode, or no mode at all.
ii. The mode is greatly affected by chance and has little or no mathematical usefulness.

Strengths of the mode.

i. It is the only measure of central tendency that can be used for data measured on a nominal scale.
ii. It is used when the most typical score is desired. Hence it is very useful in analyzing qualitative data.
iii. It is not affected by outliers.
iv. It is very easy to compute.

b) Median (MD)

The middle point in a data set that has been ordered. It is only computed for data that can be ordered in ranks, i.e. ordinal, interval, and ratio data. We usually find the median by a simple counting procedure:

I. Median for ungrouped data

a. When N is odd, the median will be the middle value. Steps: a) list the values in order, and b) select the middle value as the median. Example: the median for the data set 10, 20, 23, 24, 25, 28, 30 is 24.
b. When N is even, the median will fall between two data values. Steps: a) list the values in order, b) locate the middle two scores, c) add the two values and divide by 2 to obtain the median.
Example: in the data 10, 20, 23, 24, 25, 28, 30, 33, the median is between the scores 24 and 25. To obtain it, we average the two scores: (24+25)/2= 49/2= 24.5.
c. If several scores have the same value as the middle score, e.g. 6, 7, 8, 11, 11, 11, 12, 13, 14 (here N=9), circle the repeated value. If it separates the data into two equal halves, then that value is the median. If it does not separate the data into two equal parts, e.g. 6, 7, 8, 11, 11, 11, 12, 13, 14, 15, then we should find the median using the exact median method/frequency distribution data method.
II. Median for frequency distribution data.

a. One of the simplest methods for finding the median is to draw a histogram. The goal is to draw a vertical line through the distribution so that exactly half of the boxes are on either side of the line. The median is the score at the vertical line.

You could also use the formula for the exact median:

$$MD = L + ci\left(\frac{\frac{N}{2} - cfb}{fw}\right)$$

Where: L= the lower real limit of the class containing the median score.
N= total number of scores.
cfb= cumulative frequency for the class below the one containing the median score.
fw= frequency for the class containing the median score.
ci= class interval.
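A minimal sketch of both procedures follows. The ungrouped examples use Python's `statistics.median`; the grouped example applies the exact median formula to the five-class table (10-14 to 30-34) given earlier in these notes. The function name `exact_median` is chosen here for illustration.

```python
from statistics import median

# Ungrouped data: the two worked examples (odd N and even N).
print(median([10, 20, 23, 24, 25, 28, 30]))       # prints: 24
print(median([10, 20, 23, 24, 25, 28, 30, 33]))   # prints: 24.5

def exact_median(classes, ci):
    """Exact median of a grouped FD.

    classes: (lower class limit, frequency) pairs, lowest class first.
    ci: class interval (class width).
    """
    n = sum(f for _, f in classes)
    cfb = 0  # cumulative frequency below the current class
    for lower_limit, fw in classes:
        if cfb + fw >= n / 2:          # this class contains the median score
            real_lower = lower_limit - 0.5
            return real_lower + ci * (n / 2 - cfb) / fw
        cfb += fw

table = [(10, 4), (15, 7), (20, 12), (25, 17), (30, 10)]
md = exact_median(table, ci=5)
print(round(md, 2))
```

Here N= 50, so the median score is the 25th; it falls in the class 25-29 (L= 24.5, cfb= 23, fw= 17), giving MD= 24.5 + 5(25 - 23)/17, which is about 25.09.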

Advantages of the median

a. It is relatively unaffected by outliers. It is the best MOCT for data containing outliers.
b. The median tends to stay at the centre of the distribution even when the distribution is ‘skewed’. It is the best MOCT for skewed distributions.
c. The median is unique: a distribution can only have one median.

Disadvantage.

a. It cannot be used with nominal data.

c) The Mean

- The arithmetic average of all the scores in a data set. It is calculated for numerical values, usually measured on an interval or ratio scale. Unlike the mode and median, it involves all the scores in a distribution.

To obtain the mean, sum up all the scores and then divide by the total number of scores in the set (N). The formula for the mean is as follows:

M= ΣX/N

Where:
M= mean of the scores
ΣX= sum of the scores
N= number of scores

Example: find the mean of the following seven scores: 2, 3, 6, 8, 9, 10, 12.

M= (2+3+6+8+9+10+12)/7= 50/7= 7.14.

NB: The mean is the score that each individual would receive if the total were divided equally among all N individuals.

Mean for frequency distribution data

Ungrouped:

$$\bar{X} = \frac{\sum fx}{\sum f}$$

Grouped:

$$\bar{X} = \frac{\sum fx}{\sum f}$$

where x is the mid-point.

Mean of means

$$\bar{X} = \frac{(\bar{x}_a \times n_a) + (\bar{x}_b \times n_b) + (\bar{x}_c \times n_c)}{n_a + n_b + n_c}$$
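The three versions of the mean can be checked with a short sketch. The seven raw scores come from the worked example above and the grouped table is the five-class table given earlier in these notes; the three group means and sizes used for the mean of means are hypothetical.

```python
# Mean of raw scores: M = ΣX / N.
scores = [2, 3, 6, 8, 9, 10, 12]
simple_mean = sum(scores) / len(scores)

# Mean of a grouped FD: Σfx / Σf, where x is the mid-point of each class.
classes = [(10, 14, 4), (15, 19, 7), (20, 24, 12), (25, 29, 17), (30, 34, 10)]
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_f = sum(f for _, _, f in classes)
grouped_mean = sum_fx / sum_f

# Mean of means: each group mean is weighted by its group size.
groups = [(60, 10), (70, 20), (50, 10)]   # hypothetical (group mean, group size)
mean_of_means = (sum(m * n for m, n in groups)
                 / sum(n for _, n in groups))

print(round(simple_mean, 2))   # prints: 7.14
print(grouped_mean)
print(mean_of_means)           # prints: 62.5
```

Note that the mean of means is not the plain average of the three group means (which would be 60 here), because the groups differ in size.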
Properties of the Mean

a) A distribution can only have one mean.

b) Its computation involves all the scores in a data set.

S
ES
c) The mean serves as the balance point of the distribution because the sum of the

distances below the mean is exactly equal to the sum of the distances above the mean.

N
This makes the mean to be most commonly preferred measure of central tendency.

PI
This property makes the mean to have the following four important characteristics.
AP
a. Changing the value of any score in a data set, will always change the mean.

I. If we add a constant to one value in the data set.

E.g. 9,3,8,7,and5. N=5. ∑X=32. M=6.4. Suppose X=3 is changed to 6. New mean will
AH

be M=7.

New mean= old mean + constant/N.

New mean=6.4+ 3/5= 6.4 +0.6= 7.

D
N

II. If we add a constant to all the values in the data set

e.g. If we add 3 to all the values in our example data, we would get:

X        X+3
9        12
3        6
8        11
7        10
5        8
∑X=32    ∑(X+3)=47
X̄=6.4    X̄=9.4

Note: The new mean is 9.4 which is the same as 6.4 +3.

Therefore: New mean = old mean + constant.

NB: The sum of squared deviations from the mean is always less than the sum of squared deviations from any other point in the distribution.
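
These properties can be verified numerically. The sketch below reuses the five example scores; the helper name `sum_sq_dev` and the comparison points 5, 6, 7, and 8 are assumptions made for illustration.

```python
scores = [9, 3, 8, 7, 5]
n = len(scores)
old_mean = sum(scores) / n                 # 32 / 5

# Adding a constant (3) to one score: X = 3 becomes 6
changed = [9, 6, 8, 7, 5]
mean_one_changed = sum(changed) / n        # old mean + 3/N

# Adding the same constant (3) to every score
shifted = [x + 3 for x in scores]
mean_all_shifted = sum(shifted) / n        # old mean + 3

def sum_sq_dev(data, point):
    """Sum of squared deviations of the data from a given point."""
    return sum((x - point) ** 2 for x in data)

ss_about_mean = sum_sq_dev(scores, old_mean)
ss_about_others = [sum_sq_dev(scores, p) for p in (5, 6, 7, 8)]

print(old_mean, mean_one_changed, mean_all_shifted)
print(ss_about_mean, ss_about_others)
```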

Advantages of the mean

● Easy to compute.

● Easy to work with and use in further analysis.

● It is the most representative MOCT.

Disadvantages of the mean

a) It is extremely sensitive to outliers (a few scores that deviate extremely from the other scores in a set). Outliers tend to pull the mean to the extremes of the distribution. This is why the mean is located near the tails of skewed distributions.

b) It is not meaningful for scores measured on a nominal or ordinal scale.

Describing distribution shapes using the MOCT.

The three measures of central tendency systematically relate to one another in a distribution.

a. For normal distributions, the three MOCT are equal and therefore occur at the centre.

b. For positively skewed data, the mean is the largest MOCT, followed by the median and then the mode.

c. For negatively skewed data, the mode is the largest MOCT, followed by the median, and the mean is the smallest.

3.2 Measures of dispersion/ measures of variability/ measures of spread

These are statistical measures that tell us how the scores differ from one another. They indicate

how much the scores are spread out or clustered around the central value.

Significance of measures of dispersion

a. They describe the set of scores.

b. They determine how well the average represents the other scores. NB: When a measure of variability (MOV) is small, it indicates that the scores are clustered closely (and therefore the mean is representative of the data); when it is large, it indicates that the mean is not representative of the data.

c. To facilitate comparison.

d. To facilitate the use of other statistical measures.

The most frequently used measures of dispersion are range, interquartile range, variance, and standard deviation.

a. Range.

The range is the difference between the highest and the lowest scores in the data set. The formula is:

R = H - L

Where R = range
H = highest score
L = lowest score

Range for Frequency Distribution Data:

a. Ungrouped: highest value-lowest value.

b. Grouped:

i) Upper real limit of the highest class minus lower real limit of the lowest class.

ii) Mid-point of highest class minus mid-point of lowest class.

Merits of Range

a) Easy to calculate.

b) Easy to understand.

c) Captures the entire distribution.

Demerits of Range

a) Only involves two values and therefore can be highly influenced by outliers.

b) Does not indicate the direction of variability i.e. not an accurate indicator of variability.

c) Cannot be determined for open ended distributions.

d) Cannot be used for calculating other statistical functions of the data.

QUARTILE DEVIATION/ SEMI-INTERQUARTILE RANGE

It is half the difference between the third quartile and the first quartile.

QD is related to the median, i.e. it is half the range of the middle 50% of the data.

What are quartiles?

Quartiles are the values that divide an ordered list of numbers into four equal groups: Q1, Q2, and Q3.

NB: Q1 is the value below which 25% of N fall; Q2, 50% of N; Q3, 75% of N.

For ungrouped data, e.g. 15, 13, 6, 5, 12, 50, 22, 18

To get QD, you must first order the data set.

Steps:
● Arrange data from highest to lowest.
● Find the median of the values (This is Q2).
● Find the median of the values below Q2. (This is Q1).
● Find the median of the values above Q2. (This is Q3).

Substitute in the formula:

$$QD = \frac{Q_3 - Q_1}{2}$$
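
The steps above can be sketched in Python for the eight example values. The code sorts in ascending order (the medians are the same whichever direction is used) and applies the median-of-halves method described in the steps; other quartile conventions, such as the one used by `statistics.quantiles`, can give slightly different values.

```python
from statistics import median

data = [15, 13, 6, 5, 12, 50, 22, 18]
ordered = sorted(data)                     # 5, 6, 12, 13, 15, 18, 22, 50

half = len(ordered) // 2
lower = ordered[:half]                     # values below Q2
upper = ordered[half + len(ordered) % 2:]  # values above Q2 (skips the middle value if N is odd)

q2 = median(ordered)   # median of all the values
q1 = median(lower)     # median of the lower half
q3 = median(upper)     # median of the upper half
qd = (q3 - q1) / 2     # quartile deviation

print(q1, q2, q3, qd)
```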

For grouped data.

$$Q_1 = L + \left(\frac{N/4 - cf_b}{f_w}\right) \times c$$

$$Q_3 = L + \left(\frac{3N/4 - cf_b}{f_w}\right) \times c$$

Advantages of SIQ

- Easy to calculate.

- Not influenced by outliers.

- Can be illustrated graphically.

- Can be used for open-ended distributions.

Disadvantages

- Does not represent all the data, i.e. the upper and lower 25% of the data are not used in the calculation.

- Does not support any other statistical analysis of the data.

Mean Deviation (MD).

Deviation score (D) = the distance of a score from the mean, i.e. $X - \bar{X}$. NB: ∑D = 0.

Mean deviation (MD) = the average of the absolute deviation scores, i.e.

$$MD = \frac{\sum |X - \bar{X}|}{N}$$

Advantages of the MD

- Easy to understand.

Disadvantages of the MD

- Does not support more advanced statistical procedures.

Variance.
Variance (S²) indicates the average squared distance from the mean. The greater the distance, the greater the variance. When two sets of scores have the same mean but different variances, one has a greater spread of scores than the other.

Variance is calculated using any of the following equivalent formulas:

$$S^2 = \frac{\sum (X - \bar{X})^2}{N}$$

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

$$S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Where S² = variance of the scores
N = number of scores
ΣX² = sum of the squared scores
(ΣX)² = square of the sum of the scores

The steps involved in working out the variance using the above formula include:

i. Find the sum of the values (ΣX). Example: For the following eight scores 11, 12, 12, 13, 14, 15, 16, 17, the ΣX is 110.

ii. Square each value and find the sum (ΣX²). Thus the ΣX² for our scores would be: 121+144+144+169+196+225+256+289 = 1544

iii. Substitute in the formula:

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \quad \text{or} \quad S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

S² = 1544/8 − (110/8)² = 193 − 189.0625 = 3.9375 ≈ 3.94.

Deviation score method:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n}$$

x̄ = 13.75

X     x − x̄     (x − x̄)²
11    -2.75     7.5625
12    -1.75     3.0625
12    -1.75     3.0625
13    -0.75     0.5625
14    0.25      0.0625
15    1.25      1.5625
16    2.25      5.0625
17    3.25      10.5625

∑(x − x̄)² = sum of squares (SS) = 31.5

Variance = SS/n = 31.5/8 = 3.9375 ≈ 3.94.
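
Both methods can be cross-checked with a short Python sketch using the same eight scores. The variable names are illustrative; `statistics.pvariance` is the standard-library population variance (dividing by N, as in these notes).

```python
from statistics import pvariance

scores = [11, 12, 12, 13, 14, 15, 16, 17]
n = len(scores)

# Raw score method: S² = ΣX²/N − (ΣX/N)²
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
var_raw = sum_x2 / n - (sum_x / n) ** 2

# Deviation score method: S² = Σ(x − x̄)²/n
mean = sum_x / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares
var_dev = ss / n

# Standard deviation is the square root of the variance
sd = var_raw ** 0.5

print(sum_x, sum_x2, var_raw, var_dev, pvariance(scores), round(sd, 2))
```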

For grouped data:

$$S^2 = \frac{\sum X^2 f}{N} - \left(\frac{\sum X f}{N}\right)^2$$

Steps:

1. Make a table with the following columns:

Table 1

A        B            C           D                        E     F
Class    Frequency    Midpoint    Midpoint squared (X²)    Xf    X²f

2. Square the midpoint for each class and place the results in column D.

3. Multiply frequency by midpoint for each class, and place the products in column E.

4. Multiply the frequency by the square of the midpoint, and place the products in column F.

5. Substitute in the formula and solve to get the variance:

$$S^2 = \frac{\sum X^2 f}{N} - \left(\frac{\sum X f}{N}\right)^2$$

6. Take the square root to get the standard deviation:

$$S = \sqrt{\frac{\sum X^2 f}{N} - \left(\frac{\sum X f}{N}\right)^2}$$
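
The table-based steps can be sketched in Python as follows. The class limits and frequencies are hypothetical (they are not taken from these notes); each list corresponds to a column of Table 1.

```python
from math import sqrt

# Hypothetical grouped data: (lower limit, upper limit) and frequency
classes = [(10, 19), (20, 29), (30, 39)]        # column A
freqs = [2, 5, 3]                               # column B

midpoints = [(lo + hi) / 2 for lo, hi in classes]           # column C
mid_sq = [x ** 2 for x in midpoints]                        # column D
xf = [x * f for x, f in zip(midpoints, freqs)]              # column E
x2f = [x2 * f for x2, f in zip(mid_sq, freqs)]              # column F

n = sum(freqs)                                  # N = Σf
variance = sum(x2f) / n - (sum(xf) / n) ** 2    # S² = ΣX²f/N − (ΣXf/N)²
sd = sqrt(variance)                             # S

print(sum(xf), sum(x2f), variance, sd)
```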

Advantages of variance.

Weaknesses of the variance

- It does not give a good descriptive measure of dispersion since it is expressed in squared deviations.

C. Standard Deviation

The standard deviation (SD), sometimes represented by the Greek letter σ (sigma), is the square root of the variance. It indicates the extent to which scores vary from the mean. It is a very common measure in the field of testing and measurement. It is also used in calculating the Deviation IQ.

$$SD = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$

In the example for variance, we can compute the standard deviation by finding the square root of the obtained variance. Since S² = 3.94, then SD = √3.94 = 1.98.

Note: The greater the standard deviation, the more the scores tend to spread out from the mean. The smaller the standard deviation, the less the scores tend to vary from the mean, i.e. the more homogeneous the class was.

The standard deviation considers all scores in the distribution and it facilitates calculations of combined standard deviation, which helps to compare two or more distributions of scores. It also facilitates other statistical computations like correlation and skewness. The standard deviation provides a unit of measurement for the normal distribution. However, the standard deviation is very much affected by extreme scores, it is difficult to compute and compare, and it cannot be calculated for distributions with open classes.

MEASURES OF RELATIONSHIP
Relationship – tendency for two variables to change consistently.

THE CONCEPT OF CORRELATION

It refers to the relationship between two variables.

Questions answered through correlation.

a. Is there a relationship between the two variables?

b. What is the direction of the relationship?
c. What is the magnitude or the strength of the relationship?

Ways of establishing relationships between two variables:

a. Logical examination

This involves examining/observing a pair of data to pick out any pattern of change in the
values.

E.g. (A). 1,2,4,6,7,8. (B). 3,4,6,8,9

However, the relationship between variables is not always easily picked out by mere
observation of the data.

b. By drawing scatter plots/scatter diagrams

A scatterplot is a graph that describes the nature of the relationship between two variables by plotting them against each other.

Scatter Diagram Procedure

i Collect pairs of data where a relationship is suspected.
ii Draw a graph with the independent variable on the horizontal axis and the dependent variable on the vertical axis. For each pair of data, put a dot or a symbol where the x-axis value intersects the y-axis value.
iii Look at the pattern of points to see if a relationship is obvious (NB. If the data clearly form a line or a curve, then the variables are correlated).

Types of relationships that can be shown using scatter plots.
i. Positive relationship- this is present when the scatter plot indicates that as the values of
one variable increase, the values of the other variable also increase.

[Figure: scatter plot illustrating a positive relationship]

ii. Negative relationship- this is present when the scatter plot indicates that as the values of one variable increase, the values of the other variable decrease.

[Figure: scatter plot illustrating a negative relationship]

iii. No relationship
This is when the scatter plot does not indicate any distinct pattern in the points of the two variables.

Purposes of scatter plots
- Show the direction of the relationship.
- Show the strength of the relationship, i.e. the amount of scatter or dispersion in the points. The strongest possible relationship is a perfect relationship: a relationship where the actual data points fit perfectly on a straight line.
- Show linearity.
- Linear relationship- a relationship between two variables where the points on a scatter plot are best fitted/summarized by a straight line.
- Non-linear relationship- the points on the scatter plot are best fitted by a curve.
- Homoscedasticity/heteroscedasticity- whether the scatter of the points around the line is even (homoscedasticity) or uneven (heteroscedasticity).
- Presence of outliers- points far away from the rest.

c. Correlation coefficient (r)

This is a quantitative measure of the relationship between two variables. It ranges from -1
to +1.

Interpreting r

You consider two parameters:

i. Its sign- it can either be positive or negative.

A Positive correlation coefficient- indicates a positive relationship between the two

variables. i.e. an increase in one variable is associated with an increase in another
variable.

A negative correlation coefficient- indicates a negative relationship between the two

variables. i.e. an increase in one variable is associated with a decrease in another.

ii. Its size/ magnitude- indicates the strength of the relationship between two variables.

How to describe the correlation.

You use words together with numbers.

0 = no relationship
1 = a perfect correlation
<0.2 = weak
0.3-0.5 = moderate
0.6-0.7 = strong
0.8-0.99 = very strong

Common correlation coefficients

1. Pearson’s product moment correlation coefficient (rxy)
Definitional formula
a. 60,50,54,59,75
b. 71,40,53,59,72

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

E.g.
Table 2
X        Y        (x − x̄)   (y − ȳ)   (x − x̄)²       (y − ȳ)²       (x − x̄)(y − ȳ)
10       9        -6        -4        36             16             24
20       16       4         3         16             9              12
15       13       -1        0         1              0              0
19       17       3         4         9              16             12
16       10       0         -3        0              9              0
∑x=80    ∑y=65                        ∑(x − x̄)²=62   ∑(y − ȳ)²=50   ∑(x − x̄)(y − ȳ)=48
M=16     M=13

rxy = 48/√(62 × 50) = 48/55.68 = 0.86

Computational Formula

$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$

Table 3

X        Y        X²          Y²         XY
10       9        100         81         90
20       16       400         256        320
15       13       225         169        195
19       17       361         289        323
16       10       256         100        160
∑X=80    ∑Y=65    ∑X²=1342    ∑Y²=895    ∑XY=1088

$$r_{xy} = \frac{5440 - 5200}{\sqrt{[6710 - 6400][4475 - 4225]}} = \frac{240}{278.39} = 0.86$$
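
As a cross-check, the sketch below computes r with both the definitional and the computational formula for the five (X, Y) pairs in the tables. Variable names are illustrative.

```python
from math import sqrt

x = [10, 20, 15, 19, 16]
y = [9, 16, 13, 17, 10]
n = len(x)

# Definitional formula: deviations from the means
mean_x, mean_y = sum(x) / n, sum(y) / n
sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))   # Σ(x − x̄)(y − ȳ)
ss_x = sum((a - mean_x) ** 2 for a in x)                      # Σ(x − x̄)²
ss_y = sum((b - mean_y) ** 2 for b in y)                      # Σ(y − ȳ)²
r_definitional = sp / sqrt(ss_x * ss_y)

# Computational formula: raw sums only
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))
r_computational = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r_definitional, 2), round(r_computational, 2))
```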

ASSUMPTIONS OF PEARSON'S CORRELATION COEFFICIENT.

There are five assumptions that should be met when using Pearson's correlation:

i. The variables are measured at either interval or ratio scales of measurement.
ii. The variables are approximately normally distributed.
iii. There is a linear relationship between the two variables.
iv. There is homoscedasticity of the data, i.e. homogeneity of variance (equal variance of the x and y distributions).
v. There are no outliers in the data.

THE SPEARMAN RANK ORDER CORRELATION COEFFICIENT (rs or rho)

A statistic that measures the degree of association between two ordinal variables. It is the equivalent of rxy that is usually used when the assumption of normality is not met by the data.

It involves ranking each set of data. Then the differences between the ranks are found, and rs is computed using these differences. It establishes whether the ranks of the two sets of data are correlated. When both sets have the same ranks, rs = +1. When the two sets have exactly opposite ranks, rs = -1. If there is no relationship between the rankings, rs will be very near 0.

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

Steps.

1. Rank the scores in variable x and variable y.
2. Subtract the rankings (rank x - rank y) to get D.
3. Square the differences.
4. Find the sum of the squares.
5. Substitute in the formula.

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
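
The steps can be sketched in Python. For illustration only, the sketch reuses the five (X, Y) pairs from the Pearson example; the helper `ranks` is a hypothetical name and assumes there are no tied scores (ties would need averaged ranks, which this sketch does not handle).

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [10, 20, 15, 19, 16]
y = [9, 16, 13, 17, 10]
n = len(x)

rank_x = ranks(x)                              # step 1
rank_y = ranks(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # step 2: D = rank x − rank y
sum_d2 = sum(di ** 2 for di in d)              # steps 3 and 4: ΣD²
rs = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))     # step 5

print(rank_x, rank_y, sum_d2, rs)
```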

ASSUMPTIONS OF THE SPEARMAN RANK ORDER CORRELATION COEFFICIENT.

i. The two variables are measured on an ordinal, interval, or ratio scale.
ii. There is a monotonic relationship between the two variables. A monotonic
relationship exists when either the variables increase in value together, or as one
variable value increases, the other variable value decreases.
iii. The variables are not normally distributed.

CAUTIONS WHEN INTERPRETING CORRELATION COEFFICIENTS.

1. A correlation is a simple number and must not be reported as a percentage.

E.g. r = 0.5 does not mean that there is a 50% relationship between the two variables.

2. Correlation does not imply causation.

Why?

1. A correlation coefficient is only a measure of relationship. It does not show whether one variable is causing the scores of the other to be as they are.
2. The two variables could be consequences of a common cause but do not cause each other, i.e. the third variable problem. The third variable is called a lurking variable: a hidden variable that causes both X and Y. E.g.
a. The number of churches in Nairobi and alcoholism.
b. The number of photocopiers in KM and retakes in EPS 400.
3. The two variables could have cyclic causation: X causes Y, and Y causes X.
4. There could be indirect causation. E.g. X causes A, and A causes Y.
5. There is no connection between X and Y, and the correlation could be coincidental.

MEASURES OF POSITION/RELATIVE STANDING

These are scores used to locate the relative position of a score in the data set. They mainly compare a student’s performance with that of a reference group, usually one’s classmates.

Common Measures of Position

a. Standard scores
b. Percentiles
c. Deciles
d. Quartiles

Standard scores- the transformation of raw scores to a desired scale with a predetermined
mean and standard deviation.
Types of standard scores

1. Z-scores

A Z-score indicates the relative position of a student in a group by showing how far the raw
score is above or below the mean using the sd as a unit of measurement.

NB: Comparing raw scores directly may not be possible since the exams, markers, schools, etc. may be different.

A z-score indicates the number of standard deviations a raw score is above or below the mean in a distribution.

Formula: $z = \frac{x - \bar{x}}{sd}$

Reasons for computing z-scores

1. A z-score value specifies the exact location of a raw score within a distribution.

2. Transforming raw scores into z-scores standardizes a distribution, making it possible to compare different distributions.

NB: Standardization- the process of putting scores on a uniform distribution with the same mean and standard deviation for the purposes of comparing them.

Z-scores and Location.

A raw score does not tell us how that particular score compares with other values in the distribution.

If the raw score is transformed into a z-score, the z-score value indicates exactly where a score is located relative to all the other scores in the distribution.

When scores are converted to z-scores, we end up with numbers with two important properties:

a. Either a + or – sign. + indicates that the X value is located above the mean. – indicates that the X value is located below the mean.

b. The numerical value. This shows the number of standard deviations between the raw score and the mean.

We should be able to visualize z-scores as locations in a distribution.

Advantages of Z-scores

- Make two or more different distributions comparable.
- Enable us to directly compare individual scores from different distributions.
- Allow us to calculate the probability of a score occurring within our normal distribution.

Disadvantages of Z-scores
- The values of z-scores are normally very small. Most of them range from -3 to +3.
- Burdensome since they involve many decimal values and negative numbers. In case one
forgets putting the – sign it would change the entire meaning of the score.
- The z-score distribution has a mean of 0. This is quite difficult for laymen to compute.

To overcome these challenges, we convert raw scores into other standard scores.

Raw scores are transformed into standard scores using the following procedure:

i Convert raw score into z-score.

ii Then use the formula:

Standard score = mean of the standard score + (standard deviation of the standard score
multiplied by the z-score).

i.e. Std. score = X̄ + SD(z-score), where X̄ and SD are the mean and standard deviation of the standard score scale.

The following are the common standard scores and their respective means and standard deviations:

     Standard score    Mean    Standard deviation
1    T-score           50      10
2    Stanines          5       2
3    IQ                100     15
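
A minimal Python sketch of the conversion procedure, using the means and standard deviations from the table. The raw score (69), class mean (75), and class standard deviation (7.5) are assumed values for illustration, and `to_standard` is a hypothetical helper name.

```python
def to_standard(raw, mean, sd, new_mean, new_sd):
    """Convert a raw score to a standard score on a scale with the given mean and sd."""
    z = (raw - mean) / sd            # step i: raw score to z-score
    return new_mean + new_sd * z     # step ii: std. score = mean + SD(z)

raw, class_mean, class_sd = 69, 75, 7.5   # assumed example values

z_score = (raw - class_mean) / class_sd
t_score = to_standard(raw, class_mean, class_sd, 50, 10)
stanine = to_standard(raw, class_mean, class_sd, 5, 2)
iq_scale = to_standard(raw, class_mean, class_sd, 100, 15)

print(z_score, t_score, stanine, iq_scale)
```

In practice stanines are reported as whole numbers from 1 to 9, so the value computed here would be rounded.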
Normalized standard scores- are standard scores based on distributions that were not originally normal, but were transformed into normal distributions.

NB: Normalization is the process of trying to make a distribution of scores as close as possible to a normal distribution. This is attained by smoothing the distribution, stretching it, or condensing irregularities/departures from normality in the raw score distribution.

Examples of Normalized standard scores

Stanines. Mean=5; SD=2.

To transform raw scores into stanines, we use the following procedure:

i. Convert the raw score into a z-score.

ii. Then use the formula: Stanine = 5 + 2(z-score).

NB: Stanines have a range of 1-9.

Z-scores and standardized distributions

When an entire data set is transformed into z-scores, the resulting distribution of z-scores will always have a mean of 0 and an sd of 1. If the original scores were normally distributed, it will be a standard normal distribution.

NB: Standardization is a form of linear transformation of the scores. In a linear transformation the shape of the original distribution and the location of any individual score relative to others in the distribution are not affected.

NB: All normally distributed variables can be transformed into standard normal distributions using the formula for the z-scores.

We need to know its properties and how to locate scores on it.

a) Properties.

NB: we must know its properties in order for us to solve problems involving distributions that
are approximately normal.

Can you remember the properties of the standard normal distribution?

● Bell-shaped
● Mean, median, and mode located at the centre.
● Unimodal
● Symmetric
● The curve is continuous
● The curve never touches the x axis (it is asymptotic to it).

Statistical Properties
● Has a mean of 0 and sd of 1.
● The total area under the curve is 1 or 100%.
● 50% or 0.5 of the area lies above the mean and 50% below the mean.
● 34.13% of the total area under the curve lies between the mean and one SD below or above the mean.
● 47.72% of the total area under the curve lies between the mean and two SD below or above the mean.
● 49.87% of the total area under the curve lies between the mean and three SD below or above the mean.
● About 68.26% of the total area under the curve lies within one SD of the mean (i.e. 34.13 × 2).
● About 95.44% of the total area under the curve lies within two SD of the mean (i.e. 47.72 × 2).
● About 99.74% of the total area under the curve lies within three SD of the mean (i.e. 49.87 × 2).
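
These areas can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are illustrative, and the results may differ from table values in the last decimal place because of rounding.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal: mean 0, sd 1

def area_mean_to(z):
    """Area under the curve between the mean and z SDs above (or below) it."""
    return std_normal.cdf(z) - 0.5

def area_within(z):
    """Area under the curve within z SDs on either side of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 2, 3):
    print(z, round(area_mean_to(z) * 100, 2), round(area_within(z) * 100, 2))
```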

b) Locating scores on the standard Normal Distribution

We locate scores on the standard normal distribution by finding areas under the standard
normal distribution curve.

Steps

i Draw a picture
ii Shade the desired area
iii Get the area from the table.

Possible areas under the normal distribution curve.

● Between 0 and any z score value: look up the z score value to get the area.
● In any tail: Look up the Z score value to get the area. Then subtract area from 0.50.
● Between two Z-score values on the same side of the mean. Look up both z score
values to get the areas, then subtract the smaller area from the larger area.
● Between two z score values on opposite sides of the mean, look up both z-score
values to get the areas, then add the areas.

● To the left of any z value where z is greater than the mean: look up the z value to get the area, then add 0.50.
● To the right of any z value, where z is less than the mean, look up the value in the table to get the area, then add 0.50 to the area.
● In any two tails. Look up the z values in the table to get the areas. Subtract both areas from 0.50, then add the answers.

Example: In a class of 100 students the mean of an English test is 75 and the standard deviation 7.5. How many students scored below a score of 69?

Procedure:

i Convert the score to a z-score.
ii Look up the area under the normal distribution curve.
iii Subtract the area from .50.
iv Multiply the answer by N.
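
The four-step procedure can be followed in Python, assuming the scores are normally distributed. Here `statistics.NormalDist` stands in for the printed table, so the area may differ from a table lookup in the last decimal place; variable names are illustrative.

```python
from statistics import NormalDist

n_students = 100
mean, sd = 75, 7.5
score = 69

# i. Convert the score to a z-score
z = (score - mean) / sd

# ii. Area between the mean and the z-score (what the table gives for |z|)
area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5

# iii. Subtract the area from .50 to get the area in the tail below the score
proportion_below = 0.5 - area_mean_to_z

# iv. Multiply by N to get the number of students
students_below = proportion_below * n_students

print(z, round(area_mean_to_z, 4), round(proportion_below, 4), round(students_below))
```

Under these assumptions, roughly 21 of the 100 students scored below 69.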

How to convert a z-score to the corresponding raw score.

E.g. What is the raw score corresponding to a z-score of 1.96 in the English test?

Procedure.

i Determine whether the z-score is above or below the mean. Recall: Z(sd) is the distance from the mean. From the formula Z = (X − X̄)/sd, we find that X = Z(sd) + X̄.
ii The score is 1.96(7.5) above the mean of 75,
i.e. X = 1.96(7.5) + 75.
X = 89.7.

How to find a z-score when we know the proportion under a normal curve.

E.g. What score must a student get in the English test, to be among the top 5% of the class?

The top 5% leaves an area of .50 − .05 = .45 between the mean and the cut-off score.

Therefore we look up the z-score value that gives us the proportion .45.

Z = 1.65.

Then work out the raw score.

If Z = (X − X̄)/sd,

then X = Z(sd) + X̄.

X = 1.65(7.5) + 75 = 87.38.
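
A short Python sketch of both conversions, assuming a normal distribution with mean 75 and sd 7.5. `NormalDist.inv_cdf` returns the exact z-value for a cumulative proportion (about 1.645 for the top 5%), so the cut-off it gives differs slightly from the table-based 87.38, which uses z = 1.65.

```python
from statistics import NormalDist

mean, sd = 75, 7.5

# z-score to raw score: X = Z(sd) + mean
raw_for_z_196 = 1.96 * sd + mean

# Top 5%: find the z-value with 95% of the area below it
z_top5 = NormalDist().inv_cdf(0.95)
cutoff_exact = z_top5 * sd + mean     # using the exact z-value
cutoff_table = 1.65 * sd + mean       # using the rounded table value z = 1.65

print(round(raw_for_z_196, 1), round(z_top5, 3), round(cutoff_exact, 2), cutoff_table)
```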

Question.

In a class of 240 students, the mean for the end of year Agriculture test was 70 and the standard deviation was 9. Assuming that the scores were normally distributed:

a) How many students scored between 50 and 75?
b) How many students scored above a score of 69?
c) How many students scored below a score of 40?
d) What were the cut-off scores for the middle 50% of the class?
e) Due to limited facilities, the teacher can only promote the top 85% of the students to the next class. What is the minimum score a student had to obtain to be selected for the next class?

C. Percentiles

Percentile- position in hundredths that a score holds in the distribution.

Percentiles indicate what percentage of scores fall below a particular score in a distribution.

Percentile rank- this is the percentage of individual scores in a data set that fall at or below a given score. It is mainly used in norm-referenced scores.

Percentile point- A score below which a certain percentage of scores fall in a given distribution.

When we know the mean and sd of a distribution, we can find percentiles using standard
normal distribution tables.
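
A minimal Python sketch of both ideas. The data set is the eight scores used earlier for the variance example, and the second calculation assumes a normal distribution with mean 75 and sd 7.5 (the English test example); `percentile_rank` is a hypothetical helper name.

```python
from statistics import NormalDist

def percentile_rank(data, score):
    """Percentage of scores in the data set that fall at or below the given score."""
    at_or_below = sum(1 for x in data if x <= score)
    return 100 * at_or_below / len(data)

scores = [11, 12, 12, 13, 14, 15, 16, 17]
pr_of_14 = percentile_rank(scores, 14)

# With a known mean and sd, the normal curve gives the percentage below a score
pr_normal_69 = 100 * NormalDist(mu=75, sigma=7.5).cdf(69)

print(pr_of_14, round(pr_normal_69, 2))
```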

PART TWO: EDUCATIONAL EVALUATION

DEFINITIONS
A. Test- A device or procedure in which a sample of an individual’s behaviour is obtained, evaluated, and scored using standardized procedures.
B. Examination- A collection of tests, which measure different traits of the individual in
order to facilitate decision-making.
C. Measurement- A process of assigning numbers to represent objects, traits, attributes,
or behaviours following a set of rules.
- An educational test is a measurement device and therefore involves rules for
assigning numbers that represent an individual student’s performance.
D. Assessment is the systematic process of collecting and integrating information in a
manner that promotes understanding of students’ characteristics and decision making
in the education process. It is accomplished by use of tests and other techniques.
E. Testing- this is the process of administering, scoring, and interpreting, an instrument.
Testing is a component of the assessment process.
F. Psychometrics- the science of psychological measurement.

G. Evaluation is the process of making value judgements about students and the learning process based on the information gathered through measurements.
There are two major types of evaluation.
Formative Evaluation
This type of evaluation is done during the teaching-learning process with the goal of monitoring students’ learning and providing ongoing, specific feedback to them. Formative evaluations are generally low stakes, which means that they have low or no point value.
Examples (students may be asked to):
- Draw a concept map in class to represent their understanding of a topic.
- Submit one or two sentences identifying the main point of a lecture.
- Turn in a research proposal for early feedback.
Specific uses of formative evaluation in education.
1. It monitors how well the instructional goals and objectives are being met (for the teacher, learners, and curriculum designers).
2. It determines students’ strengths and weaknesses, i.e. areas mastered and those not mastered. This informs the learning interventions that can be put in place.
3. It facilitates learning by allowing learners to master the required skills and knowledge.
4. It motivates students.
5. It facilitates the analysis of the effectiveness of the teacher and learning resources.

Summative evaluation
Done at the end of an instructional period. It involves comparing students’ performance against some standard or benchmark. It typically involves summarizing student performance, especially through a numerical or letter grade, e.g. A, B, C, D. Summative evaluations are often high stakes. High-stakes tests are those that are used in ways that have important consequences for the student. Such tests affect decisions regarding whether the student will be promoted, admitted, or allowed to graduate.
Examples
i. A midterm exam
ii. Final project
iii. A journal paper
iv. End of Sem Exams
Specific uses of Summative Evaluation
a. It guides students’ efforts and activities in subsequent courses (formative use).
b. Provides a summary of student’s achievement/progress to parents.
c. It shows the value or quality of learning outcomes.
d. Enhances accountability.
Participants in the Assessment Process

a. Test developers- people who develop the tests. NB: Not all developers are
professionally trained. They have professional and ethical responsibilities.
b. Test users- people who use the tests. Those who select, administer, score, interpret
and use the results. E.g. teachers, psychologists, counsellors, employers, professional
licensing boards, parents, etc. NB: Not all users are professionally trained. They also
have professional and ethical responsibilities.
c. Test takers- people who take/do/write the tests. They have their rights.
d. Test marketers - Those who market assessment products and services.
e. Those who teach others about the assessment process.
CRITERIA FOR CLASSIFYING ASSESSMENT PROCEDURES
There are four basic ways of classification:
1. Nature of assessment

a. Maximum performance- determines what an individual can do when performing at his/her best. E.g. aptitude tests; achievement tests.
b. Typical performance- determines what an individual will do under normal conditions. E.g. attitude and interest inventories.
2. Form of assessment
a. Fixed choice- the student selects the response to questions from available options. E.g. MCQs.
b. Constructed response- the student constructs an extended response to a complex task. E.g. essays.
c. Performance assessment- assessment which requires the student to demonstrate knowledge or skill through activities that are mostly direct, active, and hands-on, such as giving a speech, performing a task, or producing an artistic product.

3. Uses in the classroom

a. Placement assessment- used to determine student performance at the beginning of the teaching-learning process.
b. Diagnostic assessment- assessment carried out to find out the underlying causes of a student’s learning difficulties.
c. Formative vs summative

4. Method of interpreting results

Norm-referenced- tests that measure performance interpretable in terms of a student’s relative standing in some known group.

Criterion-referenced- tests that measure performance interpretable in terms of a clearly defined and specific domain of learning.

Similarities
A. Both require specification of the achievement domain to be covered.
B. They use same type of questions.
C. Both require a relevant and representative sample of test items.
D. Both use the same rules of item writing.
E. They are both judged by the same qualities of goodness.
F. They are both useful in educational assessment.

Norm-referenced | Criterion-referenced

Typically cover a large domain of learning with a few items measuring each specific task. | Typically focus on a specific domain of learning tasks with a relatively large number of items measuring each task.
Emphasize discrimination among learners in terms of relative levels of learning. | Emphasize description of the learning tasks learners can or cannot perform.
Interpretation requires a clearly defined group. | Interpretation requires a clearly defined achievement domain.
Prefer items of average difficulty and typically omit very easy and very hard items. | Match item difficulty to learning tasks, without omitting very hard or very easy items.

Other terms used to describe tests

Individual vs group
Speed vs power
Objective vs subjective

5. Alternative assessment- Non-traditional ways of gathering information about the student. This may include: portfolios, observations, samples of client’s, or group projects.

Needs assessment- Inquiry into the current state of knowledge, resources, or practice with the intention of taking action, making a decision, or providing a service based on the results.

Self-assessment- Personal rating of ability according to specified criteria.

Criteria for classifying tests

(Check Module pg. 106)
Purpose and role of evaluation in Education
i. Student evaluation- enables educators to monitor students’ progress and provide
constructive feedback to them, e.g. by assigning students grades that reflect their
academic progress or achievement.
ii. Classroom instruction- provides information that helps teachers to modify and
improve their teaching practices.
● Assessment clarifies the nature of intended learning outcomes.
● Provides short-term goals to work toward.
● Provides feedback concerning learners’ mastery.
● Helps teachers determine what to teach, how to teach, how effective their teaching
was, and how to provide feedback to their students.
● Formative assessment- monitors learners’ progress and helps teachers ‘size up’ their
students.
● Placement assessment- carried out at the beginning of instruction (measures entry behaviour).
● Diagnostic assessment- identifies learners’ strengths and weaknesses, and the causes of
learning problems. It gives teachers information about how to overcome learning
difficulties.
● Summative assessment- measures end-of-course achievement.
iii. Selection, placement, and classification decisions- provides information that helps educators
to select, place, and classify students.
● Selection decisions involve some individuals being accepted while others are
rejected, e.g. admission to universities, high schools, etc.
● Placement decisions involve situations in which students are assigned to
various categories that represent different educational tracks or levels that are
ordered in some way.
● Classification decisions refer to situations in which students are assigned to
different categories that are not ordered, e.g. classification of
special education students as VI, HI, emotionally disturbed, etc.
iv. Policy decisions- many administrative decisions at different levels. They involve
evaluating the curriculum, instructional practices, levels of funding, employee recognition
and benefits, as well as accountability.
v. Counselling and guidance decisions- school counsellors promote the self-understanding
of students and help students plan for their future.
PREPARATION OF EDUCATIONAL MEASUREMENT OBJECTIVES
Educational objectives- also known as instructional or learning objectives. They state
educational goals, i.e. what you hope the students will learn or accomplish.
Educational objectives are the foundations of assessment.

ROLE OF OBJECTIVES IN EDUCATION AND EVALUATION

Objectives:
i. Direct the instructional process- by describing the intended results of instruction.
ii. Communicate the intent of instruction to others (students, parents, school personnel,
public, ministry, etc).
iii. Provide a basis for assessing student learning- by describing the performance to be
measured.
iv. Make it easier to develop fair, valid, and comprehensive tests.
v. Enhance quality of teaching and learning.
vi. Guide students’ self-assessment of their learning as well as self-management of learning
opportunities.
Characteristics of Educational Objectives:
Educational objectives possess three prominent characteristics:
a. Scope- how specific or broad the objective is. There should be a balance between
broad and narrow objectives. How? Use an intermediate level of specificity, or write
broad objectives and then break them down into specific ones.
b. Domain- cognitive, affective, or psychomotor domain.
c. Format- behavioural or non-behavioural.
Behavioural objectives specify activities that are observable and measurable.
Non-behavioural objectives specify activities that are unobservable and not directly measurable.

BLOOM’S TAXONOMY

Taxonomy- system of classification.

Bloom’s taxonomy- a system of classification of educational objectives that was developed
by Benjamin Bloom. It gives the cognitive behaviours that can relate to the development of
academic competence among learners.
Learning objective domains
There are three learning objective domains:
1. Cognitive (knowing): Knowledge, reasoning, problem solving.
2. Psychomotor (doing): Communicating, documenting, and assessing.
3. Affective (feeling): Shows sensitivity.
Considering Bloom’s Taxonomy when writing learning objectives ensures that they:
- Are demonstrable in a tangible way
- Are achievable in the time frame
- Describe essential learning
- Are appropriate for the level of learners’ competence.
Criteria for selecting appropriate objectives:

- They should include all the important outcomes of the course.
- They must be consistent with the general goals of education.
- They should be consistent with the sound principles of learning.
- They should be realistic in terms of students’ abilities, time, and facilities.
- They should be specific and measurable.
The Cognitive Domain. Six basic objectives are listed in Bloom’s taxonomy of thinking or
cognitive domain.

1. Knowledge: Remembering or recognizing something without necessarily
understanding, using, or changing it.
2. Comprehension: Understanding the material being communicated without
necessarily relating it to anything else.
3. Application: Using a general concept to solve a particular problem.
4. Analysis: Breaking something down into its parts.
5. Synthesis: Creating something new by combining different ideas.
6. Evaluation: Judging the value of materials or methods as they might be applied in
a particular situation.

Table of Specifications (TOS), a.k.a. Test Blueprint

This is a two-way grid/chart which defines the scope and emphasis of a test by relating the
course content to the instructional objectives.

Its columns list the performance objectives at each level of the cognitive domain as described
in Bloom’s Taxonomy. Its rows list the key concepts/content measured by the test.

The TOS should be prepared before the test is constructed, preferably before the actual teaching.

Steps followed in constructing a TOS

a. Choose measurement goals and content domains to be covered.
b. Break down domains into key/fairly independent parts.
c. Construct the two-way chart. Place the content classification on the left-hand side of
the chart and the cognitive domains across the top of the chart.
d. Find the totals or percentages for each content category (placed at the right-hand side
of the chart) and for each cognitive domain (placed at the bottom of the chart).

Cognitive Domain
Content       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Climate       1          2              3            1         4          2           13
Environment   2          3              1            1         2          3           12
Total         3          5              4            2         6          5           25
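
Step d (finding totals and percentages) can be checked mechanically. The sketch below is only an illustration: it copies the cell counts from the example grid, and the function names tos_totals and as_percentages are made up for this example.

```python
# Cell counts copied from the example grid: rows are content areas,
# columns are the six cognitive levels of Bloom's taxonomy.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

tos = {
    "Climate":     [1, 2, 3, 1, 4, 2],
    "Environment": [2, 3, 1, 1, 2, 3],
}


def tos_totals(grid):
    """Return (row_totals, column_totals, grand_total) for a TOS grid."""
    row_totals = {content: sum(cells) for content, cells in grid.items()}
    column_totals = [sum(column) for column in zip(*grid.values())]
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


def as_percentages(totals, grand_total):
    """Express each total as a percentage of all items on the test."""
    return [round(100 * t / grand_total, 1) for t in totals]


row_totals, column_totals, grand_total = tos_totals(tos)
print("Items per content area:", row_totals)
print("Items per cognitive level:", dict(zip(LEVELS, column_totals)))
print("Total items:", grand_total)
print("Percent per level:", as_percentages(column_totals, grand_total))
```

The row totals and the column totals must add up to the same grand total, which is a quick way to catch an arithmetic slip in a hand-drawn grid.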

Importance of the Table of Specifications.

1 It ensures that the test content matches the curriculum content.

2 It helps the teacher to be more organized in teaching especially in the selection of
educational objectives, assessment procedures, and resources.

3 It encourages the teacher to use items of varying difficulty.
4 It enhances the content validity of a test- guides test developers not to under test or
over test any area, or include irrelevant concepts.
5 It informs students of what to expect in tests, thus improving their learning and
revision for tests.

4.1 Qualities of a Good Test

- Reliable
- Valid
- Fair
- Practical value

4.1.1 Reliability
4.1.2 Meaning
- The ability of a test to consistently measure what it is measuring.
- The degree to which test scores are consistent, dependable, and repeatable.
4.1.3 Methods of estimating reliability.
The major methods of determining reliability focus on different types of consistency:
- consistency over a period of time,
- over different forms of the assessment,
- within the assessment itself, and
- over different raters.
The major methods of estimating reliability include:
Test-retest reliability- It is the extent to which a test yields the same scores when it is
administered to the same group of test-takers on two different occasions. Procedure: Give the
same test to the same group of test-takers with a time interval between the two sittings (usually
a short period of two weeks to a month) and then correlate the two sets of scores.
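
The correlation step in the test-retest procedure can be sketched as follows. This is a minimal illustration, assuming Pearson's r is the correlation coefficient used; the scores are invented and pearson_r is a helper written for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six students on the same test, two weeks apart.
first_sitting = [45, 60, 52, 70, 38, 65]
second_sitting = [48, 62, 50, 73, 40, 63]

test_retest_r = pearson_r(first_sitting, second_sitting)
print(f"Test-retest reliability estimate: r = {test_retest_r:.2f}")
```

A coefficient close to +1 suggests that students kept roughly the same rank order on both occasions, i.e. the scores are stable over time.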
Equivalent/alternate/parallel forms- this is a measure of equivalence. It involves
administering two forms of a test to the same group of individuals at almost the same time
and then correlating the scores. A high correlation coefficient indicates that the two forms are
measuring the same type of performance. A low correlation coefficient would indicate that
the two versions are not measuring the same thing or that they differ in degree of difficulty.

Test-retest with equivalent forms- This method is a measure of stability and equivalence. It
involves giving two forms of the test to the same group with an increased time interval
between the two forms and then correlating the two sets of scores.
Split-half reliability- this is estimated by administering a test once and then dividing it into two
equivalent parts that are scored independently (e.g. odd numbered items vs even numbered
items). The scores on the two parts are then correlated. The correlation coefficient
indicates the degree to which consistent results are obtained from the two parts of the test and
therefore the adequacy of content sampling.
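
The odd/even split can be sketched as below. This is an illustration only: the responses are invented, Pearson's r is assumed as the correlation coefficient, and split_half_scores and pearson_r are helper names made up for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_scores(item_scores):
    """Split each student's item scores into odd- and even-numbered halves.

    item_scores holds one list per student, with 1 for a correct item and
    0 for an incorrect one. Items are numbered from 1, so list index 0 is
    item 1 (odd).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return odd_totals, even_totals


# Hypothetical responses of five students to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]

odd_half, even_half = split_half_scores(responses)
split_half_r = pearson_r(odd_half, even_half)
print("Odd-item totals: ", odd_half)
print("Even-item totals:", even_half)
print(f"Correlation between halves: r = {split_half_r:.2f}")
```

Only one administration of the test is needed, which is why this method is convenient for classroom tests.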
Inter-rater reliability- this measures the consistency of scorers’ judgements when scoring a test.
It is estimated by administering the test once and having two or more individuals
independently score each student’s responses in the test. The scores obtained by the scorers
are then correlated. This method only reflects differences due to the individuals scoring the
test. It is highly applicable for essay tests. A major limitation of the approach is that it is
affected by differences due to raters or scorers.
How to improve the reliability of a test.
i. Standardize the conditions under which the test is taken.
ii. Use a sufficient number of items or tasks in a test.
iii. Clearly explain requirements for responding to test items.
iv. Identify specific criteria in advance for scoring students’ essay responses. Score all
students’ responses to a particular item before moving to a second one.
v. Avoid being influenced by your expectations of your students and score tests
anonymously.
Factors influencing reliability coefficients
i. Test length- A longer test is more reliable than a short one.
ii. Speed- Speed tests are not as reliable as power tests. In a speed test not every student
is able to complete all of the items within the given time. In a power test every student
is able to complete all the items.
iii. Group homogeneity- The more heterogeneous the group of students who take the
test, the more reliable the scores are likely to be.
iv. Item difficulty- very easy or very difficult tests have little reliability.
v. Objectivity- Objectively scored tests show higher reliability than subjective tests.
vi. Variation with the testing situation- Errors in the testing situation (e.g., students
misunderstanding or misreading test directions, noise level, distractions, and sickness)
can cause test scores to vary.
vii. Student’s internal factors
Health
Motivation
Anxiety
Validity
The degree to which a test actually measures what it is supposed to measure.

Types
Face validity- whether a test appears superficially to measure what it is supposed to measure.
Content validity- a test’s ability to accurately sample the content taught in class and
measure the extent to which learners understand it.
Predictive validity- a test’s ability to gauge future performance.
Construct validity- the extent to which a test accurately measures a characteristic that is
not directly observable.

How to improve validity of a test

- Specify the instructional domains to be used to measure student achievement.
- Select a representative sample of relevant learning tasks.
- Control for extraneous variables.
- Use standardized instructions.
- Arrange test items properly.
- Let the test be of moderate difficulty.
- Have similar scoring procedures.

ITEM FORMAT FOR WRITTEN TESTS

Item format- the form, plan, structure, arrangement, and layout of individual test items.
Two broad types of item formats:
a. Selected response format- test items that require students to pick the correct answer
from available options, e.g. multiple choice questions, true/false, and matching tests.
b. Constructed response format- test items that ask students to create an answer by
writing out information, e.g. fill in the blank spaces, essays, etc.
FACTORS TO CONSIDER WHEN SELECTING TEST FORMAT
i. Purpose of the test.
ii. Time available to prepare and mark the test.
iii. Number of students to be tested.
iv. Physical facilities available for reproducing the test.
v. Teacher’s skill at writing different types of tests.
vi. The subject being tested.
vii. The level of examinees.
1. Selected response items
Multiple choice tests (MCTs) are the most popular. A multiple choice item has two parts: the
stem and the alternatives.
Stem- a question or an incomplete statement.
Alternatives- these are possible answers.
Two types of alternatives:
Answer- the correct alternative
Distracters- incorrect alternatives (they serve to ‘distract’ students who actually do not know
the answer).
Forms of MCT
a. Direct-question type- the stem is a direct question. E.g. Which is the longest river in
Africa?
b. Incomplete sentence type- the stem is an incomplete statement. E.g. The longest river in
Africa is ------------------------------
Guidelines for writing MCT
a. The item should contain all the information necessary to understand the problem/question.
b. Provide between three and five alternatives. This reduces the chance of guessing the
correct answer.
c. Keep the alternatives brief and arrange them in an order that promotes efficient scanning.
d. Avoid negatively stated stems as much as possible.
e. There should be only one correct alternative.
f. Stems should not have information that suggests the answer.
g. All alternatives should be grammatically correct relative to the stem.
h. No item should reveal the answer to another item.
i. All distracters should appear plausible.
j. Randomly place the correct answer in different positions.
k. Minimize use of ‘none of the above’, and avoid using ‘all of the above’.
l. Limit use of ‘always’ and ‘never’ in the alternatives.
m. Avoid using the exact textbook wording.
n. Organize the test in a logical manner.
o. Carefully consider the number of items in your test.
Advantages of multiple-choice items.
a. Can be used to assess achievement in a wide range of areas.
b. Marking is easy, objective, and reliable.
c. They efficiently sample a wide range of content (wide content coverage).
d. Distracters provide diagnostic information.
e. Can assess both simple and complex learning outcomes.
Disadvantages of multiple-choice items
a. Time consuming to construct.
b. Not effective for measuring some educational objectives, e.g. creativity, organization,
problem solving, or verbal skills.
c. Scores can be influenced by reading ability.
d. May encourage rote learning and guesswork.
e. Do not allow students freedom of expression.
2. Constructed-response items.
Short answer and essay questions are the most common.
Essay items- test items that pose a question for the student to respond to in a written
format.
Types of essay items
a. Restricted-response items- highly structured and clearly specify the form and scope
of the student’s response. They typically require students to list, define, describe, or give
reasons. They may specify the time and length of the response. E.g. List the three types of
muscle tissue and state four functions of each.
b. Extended-response items- provide more freedom and flexibility in how students can
respond to the item. They do not limit the response in terms of form and scope. They
typically require students to compose, summarize, formulate, compare, interpret, etc.
E.g. Summarize the major forms of validity.
They provide less structure and this promotes creativity, integration, organization,
analysis and synthesis of the information.
Advantages of essay tests
i. They take less time to prepare.
ii. Effective for measuring higher-level cognitive skills.
iii. They largely eliminate guessing.
iv. They effectively assess knowledge of content, grammar, and writing ability and
style.
Disadvantages
i. Tedious to mark and the essays are difficult to score in a reliable way.
ii. Scoring of essays may be influenced by irrelevant characteristics like students’
academic history, handwriting, grammatical errors, bluffing, etc.
iii. May only sample limited content due to the time required to answer.
Guidelines for writing effective essay questions
a. Ask questions in a simple and straightforward manner.
b. Consider carefully the amount of time students will need to answer the essay
questions.
c. Let students respond to the same set of items.
d. Use more restricted-response items than extended-response items.
e. Structure and clarify the task.
f. Limit the use of essay to educational objectives that cannot be assessed using
selected-response items.

ITEM ANALYSIS

This is a set of statistics that can be computed for each test item to assess its quality. The
choice of item analysis method depends on the purpose of the test and the person designing
the analysis.
Purpose of item analysis.
i. Item analysis data provides a basis for class discussion of the test results.
ii. Item analysis data provides a basis for remedial work focused on students’ areas
of weakness.
iii. It can help improve classroom instruction.
iv. It increases teacher’s understanding and skills in test construction
v. It helps us understand why a test has specific levels of reliability and validity and
why test scores can be used to predict some criteria and not others.
vi. It may also suggest ways of improving the measurement characteristics of a test.

Basic questions in item analysis.
a. How many people chose each response? Recall that there is only one correct answer.
The incorrect responses are called distracters. Examining the total pattern
of responses to each item of a test is therefore referred to as distracter analysis.
b. How many people answered the item correctly? Were the items of appropriate
difficulty? To answer this question, we conduct an analysis of item difficulty.
c. Are responses to the item related to the responses to other items in the test?
Answering this question entails an analysis of item discrimination.
Item difficulty
In achievement tests, item difficulty, also called the p value, is commonly measured as the
percentage or proportion of students who answered an item correctly. It can be expressed
as a decimal, percentage, or fraction. It ranges from 0 to 1 (for a decimal or fraction) or from 0 to
100% (for a percentage).
Formula for p.
p value = number of students answering an item correctly / number of test-takers.

$$p = \left(\frac{R_u + R_l}{N_u + N_l}\right) \times 100$$

Where: $R_u$ = number of students in the upper group who got the item right
$R_l$ = number of students in the lower group who got the item right
$N_u$ = number of students in the upper group who actually attempted the question
$N_l$ = number of students in the lower group who actually attempted the question.
How to interpret Item difficulty of Test Scores.
P=0 indicates that all students chose the wrong answers.
P=1 indicates that everyone got the correct answer.
An item with a p value of 0 or 1 is undesirable since it does not show individual
differences among the learners and is of no value from a measurement perspective.
The optimal difficulty index is 0.50. Any items with p ranging from .30 to .70 are
considered good in a MCT.
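
The p formula and its interpretation can be sketched in a few lines. The counts are hypothetical, and the function names item_difficulty and difficulty_comment are made up for this illustration; the wording of the last comment is an assumption, since the notes do not label values outside the .30 to .70 range.

```python
def item_difficulty(right_upper, right_lower, attempted_upper, attempted_lower):
    """Item difficulty p as a proportion between 0 and 1.

    Follows p = (Ru + Rl) / (Nu + Nl), where Nu and Nl count only the
    students who attempted the item.
    """
    attempted = attempted_upper + attempted_lower
    if attempted == 0:
        raise ValueError("nobody attempted the item")
    return (right_upper + right_lower) / attempted


def difficulty_comment(p):
    """Describe p using the ranges given in the notes."""
    if p == 0 or p == 1:
        return "undesirable: shows no individual differences"
    if 0.30 <= p <= 0.70:
        return "good for a multiple choice test"
    return "outside the .30 to .70 range"


# Hypothetical item: 20 of 25 upper-group and 10 of 25 lower-group
# students answered correctly, and everyone attempted it.
p = item_difficulty(20, 10, 25, 25)
print(f"p = {p:.2f} ({p * 100:.0f}%): {difficulty_comment(p)}")
```

Multiplying the proportion by 100 gives the percentage form of p used in the formula.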
The item discrimination index
Item discrimination is abbreviated as D. It is the difference between the proportion of
high achieving students who got an item right and the proportion of low achieving students who
got the item right. We compute the discrimination index by subtracting the proportion of
students who got the item correct in the lower group (Rl/Nl) from the proportion of students
who got the item correct in the upper group (Ru/Nu).
Thus:

$$D = \frac{R_u}{N_u} - \frac{R_l}{N_l}$$

How to interpret D.
D is usually expressed as a decimal and it can range from -1.0 to +1.0.
If it is positive, the item has positive discrimination, i.e. a larger proportion of the more
knowledgeable students got the item right than of the poor students.
If it is 0, the item has zero discrimination. This is possible when the test item is too difficult, too
easy, or ambiguous.
If it is negative, the item has negative discrimination, i.e. a larger proportion of the poor students got
the item right than of the more knowledgeable students.
We can interpret the quality of test items following these guidelines:

D                   Quality of item
0.40 and above      excellent
0.30-0.39           good
0.20-0.29           fair
0.00-0.19           poor
Negative values     poor

Therefore, for items to be considered of good quality, they must have a D of at least 0.30.
However, items with D of 0.20 and above are acceptable in classroom tests for various reasons.
Items with negative discrimination values should be reviewed. A negative discrimination
value, like a low difficulty value, may occur as a result of several possible causes: a miskeyed
item, or an item that is ambiguous.
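
The D formula and the quality guidelines can be combined in a short sketch. The counts are hypothetical and the function names discrimination_index and item_quality are made up for this illustration. Treating each row of the guideline table as a lower cut-off (so that a value such as 0.195 counts as poor) is an assumption.

```python
def discrimination_index(right_upper, attempted_upper, right_lower, attempted_lower):
    """D = Ru/Nu - Rl/Nl, a value between -1.0 and +1.0."""
    if attempted_upper == 0 or attempted_lower == 0:
        raise ValueError("each group needs at least one student who attempted the item")
    return right_upper / attempted_upper - right_lower / attempted_lower


def item_quality(d):
    """Map D to the quality labels in the guideline table."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.20:
        return "fair"
    return "poor"  # covers 0.00-0.19 and all negative values


# Hypothetical item: 18 of 25 upper-group and 10 of 25 lower-group
# students answered correctly.
d = discrimination_index(18, 25, 10, 25)
print(f"D = {d:.2f}: {item_quality(d)}")
```

Items that come out as poor, and especially those with a negative D, are the ones to review for a wrong key or ambiguous wording.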
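Example:
A group of 140 examinees responded as shown below in a test item:

              A    B    C    D    Omit   Total
Upper group   3    57   6    4    0      70
Lower group   12   34   14   4    6      70

Find p and D.

The calculation takes B as the correct answer and counts only the students who attempted the
item, so Nu = 70 and Nl = 70 - 6 = 64.

$$p = \frac{57 + 34}{70 + 64} \times 100 = \frac{91}{134} \times 100 = 67.91\%$$

$$D = \frac{57}{70} - \frac{34}{64} = 0.814 - 0.531 = 0.283$$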

Evaluating the effectiveness of distracters

We determine how well the distracters function by inspection. Generally, a good
distracter attracts more students from the lower group than the upper group.
Poor distracters attract more students from the upper group than the lower group. This
could be due to ambiguity in the statement of the item.
Distracters that attract no one (unpopular distracters) are also ineffective/poor.
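
The inspection rules above can be written out as a small check. The sketch below applies them to the counts from the worked example, assuming B is the keyed answer. The function name evaluate_distracters is made up for this illustration, and the label for a distracter that attracts both groups equally is an assumption, since the notes do not cover that case.

```python
def evaluate_distracters(upper, lower, key):
    """Label each incorrect option by which group it attracted more.

    upper and lower map option letters to the number of students in the
    upper and lower groups who chose that option; key is the correct option.
    """
    verdicts = {}
    for option in upper:
        if option == key:
            continue
        if upper[option] == 0 and lower[option] == 0:
            verdicts[option] = "poor: attracts no one"
        elif lower[option] > upper[option]:
            verdicts[option] = "good: attracts more of the lower group"
        elif upper[option] > lower[option]:
            verdicts[option] = "poor: attracts more of the upper group"
        else:
            verdicts[option] = "unclear: attracts both groups equally"
    return verdicts


# Counts from the worked example, assuming B is the keyed answer.
upper_group = {"A": 3, "B": 57, "C": 6, "D": 4}
lower_group = {"A": 12, "B": 34, "C": 14, "D": 4}

for option, verdict in evaluate_distracters(upper_group, lower_group, "B").items():
    print(option, "->", verdict)
```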
Report cards
The report card is a summary of teachers’ professional judgements about a student’s
achievement which acts as a standard method of communicating students’ academic
progress and grades to parents.
Functions of a report card
a. Details achievement of curriculum expectations for each semester or term.
b. Reports the student’s overall development of learning skills and work habits.
c. Includes next steps for future student learning.
d. Provides an opportunity for parents/guardians to comment on student
achievement, student goals, and shared responsibilities of home and school to
support improved student learning.
e. Gives students descriptive feedback in comments.
f. Provides guidance to help students improve their learning.
Types of judgements on report cards
i. Letter grades e.g. A, B, C, D, E, F (sometimes allowing pluses and minuses).
ii. Numerical scores per subject e.g. 91 Mathematics; 70 English etc.
iii. Pass/fail category in one or more subjects.
iv. Checklists indicating skills and objectives that students have attained (mainly
used for kindergartens and nursery schools).
v. Categories for affective characteristics such as effort, cooperation, and other
appropriate and inappropriate behaviours.
Writing effective report card comments.
i. Effective report card comments provide specific details about a student’s
achievement of the overall curriculum expectations.
ii. Report card comments should provide students and parents with personalized, clear,
precise, and meaningful feedback.
iii. Effective comments are written in clear and simple language, using vocabulary that is
easily understood by both students and parents rather than educational terminology
taken directly from the curriculum documents, and they convey a positive tone.
Comments should be error-free and avoid slang or colloquial language.