Eps 400 Eps 310
1.2 KEY CONCEPTS
A. Statistics- is the science of conducting studies to collect, organize, summarize, analyse,
and interpret information/data.
Types of statistics
i) Descriptive statistics- techniques that organize, summarize, and present a set
of data in an attempt to describe a situation.
Elements of Descriptive stats.
a. The population or sample of interest
b. One or more variables to be investigated
Page 1 of 48
0759474478
CONGRESSLADY SCHOOL OF EDUCATION
f) Hypothesis- A guess about what is likely to happen.
g) Hypothesis testing-A decision making process for evaluating claims about a
population based on information obtained from samples.
B. Data- are the values (measurements or observations) that variables can assume.
i) Data set- A collection of data values.
ii) Data array- an ordered data set.
iii) Datum/ data value- each value in the data set.
iv) Raw data/ raw scores- data collected in their original form/the scores
obtained from tests.
Data can either be quantitative or qualitative. Quantitative data consist of
measurements that are recorded using numbers while qualitative data are
measurements that cannot be measured on a natural numerical scale but can only be
classified into categories.
C. Variable- is a characteristic that can change or assume different values. E.g. height,
weight, intelligence, age, motivation, identity, etc.
Types of variables.
i) Random variables- Variables whose values are determined by chance.
iii) Psychological Variables- These are intangible variables that are located in the
points on a measurement scale. It is obtained by measuring and often
includes fractions and decimals. E.g. Temperature, time, length etc.
VI) Independent variable- in an experiment, it is the variable that is manipulated by the
researcher in order to study its effects on another variable.
VII) Dependent variable- it is the variable that is observed or measured in an experiment
in order to record the effects of the independent variable.
D. Constant- A variable that does not change or that assumes only one value.
E. Measurement- This is the process of assigning numbers to individuals or objects in a
systematic way to represent their properties.
F. Evaluation- This is the process of making value judgements based on measurements.
G. Test- A systematic procedure for observing a sample of behaviour/psychological
variable. A set of questions to which students are expected to supply answers.
H. Test Item- a specific stimulus to which a person responds overtly; a specific question to
which a person supplies an answer in a test.
I. Examination- A collection of tests which measure different traits of the individual in
order to facilitate decision making.
2. Knowledge of statistics enables teachers to plan and use appropriate procedures for
LEVELS/SCALES OF MEASUREMENT.
Scales of measurement refer to the ways in which variables/numbers are defined, categorized,
counted, or measured. They present the set of rules upon which measurement is based.
The four types of measurement scales are: Nominal, ordinal, interval, and ratio.
represent properties and can only be used for identification, naming or classifying
variables. You can only count the numbers at this level.
Examples.
Numbers on the back of football players. 1, 2, 10, 32 etc.
Areas of study.
Nationality.
Data measured at this level can be put into categories and these categories can be ordered,
or ranked. In this scale, numbers distinguish between individuals and give merit. We can
count the numbers and show the direction of their difference using the concept of ‘greater’
or ‘less’ than or by giving an ordered series.
However, the numbers are ordered without regard for differences in the distance between
the scores. i.e. An ordinal scale can tell us who is the first, second, and third. However, it
cannot tell us whether the distance between the first- and the second ranked scores is the
same as the distance between the second- and third- ranked scores.
Examples.
Classifying people according to their body size e.g. small, medium, or large.
The rating scale that uses descriptive words, poor, good, excellent etc.
This ranks data and uses equal intervals between any two units of measurement. We can
therefore tell precise differences between scores. However, zero is an arbitrary point. i.e. it
only represents another point of measurement in this scale and it is therefore not a
meaningful/true zero. It does not represent absence of the quality being measured or the
absolute lowest point in the scale. It is just a point on the scale and there are numbers
above and below it.
With the numbers obtained on this scale, we can: count, use the concept of greater or less
than; and indicate the precise difference between any two measurements. However, we
cannot make ratio statements (e.g. 'twice as much') about the scores/numbers on an interval scale.
Examples
Measurement of:
- Scores on an IQ test.
- Temperature.
- Sea level. E.g. The Dead Sea is the lowest point on earth at about 400 metres below sea level.
It is similar to the interval scale in that it also represents quantity and has equal intervals
between units of measurement. In addition, this scale has an absolute/true/meaningful
zero. Therefore, you cannot have negative scores on this scale. You can also form
meaningful ratios/comparisons (fractions) between the scores. E.g. Karo (6kg) is twice as
heavy as James (3kg).
NB: Ratio scales are not common in Psychology because many psychological variables
do not permit the measurement of an absolute zero.
Examples.
Height, Weight, Time, GPA, age, number of children, salary, number of customers
served, etc.
1.3 Important Statistical Notation.
Notation refers to the use of letters and symbols to represent statistical concepts and
ideas.
Below is the common statistical notation
The letter X- identifies individual scores for a particular variable. When we have multiple
scores per subject, we use X and Y.
When used to head a column, it represents a set of scores.
N represents the total number of scores for a population and n represents the number
of scores in a sample.
Summing a set of values has its own notation called summation notation. The Greek
letter sigma, Σ, is used to indicate ‘the sum of’. The expression ΣX is read ‘the sum of
scores’ and it means add all the scores for variable X. ΣY means add all the scores for
variable Y.
To use and interpret summation notation, you must follow the basic order of operations
required for all mathematical calculation. Below is a list showing the correct order for
performing mathematical operations.
2. Squaring or raising to other exponents is done second.
3. Multiplying, and dividing are done third, and should be completed in order from left
to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order
from left to right.
Example 1.1: Given the following scores, 8, 9, 4, 3, and 1; compute ΣX; ΣX²; (ΣX)²;
Σ(X-1) and Σ(X-1)²
We can use a computational table to help us demonstrate the calculations:
X    X²    (X-1)    (X-1)²
8    64    7        49
9    81    8        64
4    16    3        9
3    9     2        4
1    1     0        0
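The column totals for Example 1.1 can be checked with a short Python sketch. The variable names are illustrative only; the point is the order of operations, i.e. whether squaring happens before or after summing.

```python
# Scores from Example 1.1
X = [8, 9, 4, 3, 1]

sum_x = sum(X)                                    # ΣX: add the scores
sum_x_sq = sum(x ** 2 for x in X)                 # ΣX²: square each score, then add
sq_sum_x = sum(X) ** 2                            # (ΣX)²: add first, then square the total
sum_x_minus_1 = sum(x - 1 for x in X)             # Σ(X-1): subtract inside the parentheses, then add
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in X)   # Σ(X-1)²: subtract, square, then add

print(sum_x, sum_x_sq, sq_sum_x, sum_x_minus_1, sum_x_minus_1_sq)
```

Note that ΣX² (171) and (ΣX)² (625) differ, which is why the order of operations matters.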
A frequency distribution is an organized tabulation of raw data using the scores and
frequencies.
It is used for categorical data (nominal or ordinal data) or when we have a very small
range of data.
The regular frequency distribution comprises two columns. The first column lists the
scores (X values) from the highest to the lowest score. Beside each X
value, in a second column, is the frequency- the number of times that particular score
occurred in the data set.
ii. List the scores in column A starting with the highest to the lowest.
iii. Tally/sort the data and place the results in column B.
iv. Count the tallies and place the numerical frequency (f) in column C. Always
indicate the Σf at the end of this column. Note: the sum of the frequencies is the same
as the total number of scores in a data set. Thus Σf=N.
v. Find the percentage of values in each class by using the formula:
%=f/n x 100%. Where f=frequency of the class, n= total number of values.
NB: sometimes, we may be required to compute ΣX for a set of scores that have
been organized into a FD. To do that we multiply each X by its respective f and
then add these values. This ensures that we capture all the scores in a data set.
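The steps above can be sketched in Python as follows. The scores are hypothetical (they are not taken from these notes), and `Counter` simply stands in for the manual tally.

```python
from collections import Counter

# Hypothetical raw scores, used only for illustration
scores = [5, 4, 4, 3, 5, 2, 4, 3, 4, 1]

freq = Counter(scores)        # tally: score -> frequency (f)
n = sum(freq.values())        # Σf, which equals N

# One row per score, highest score first: (X, f, percentage)
table = [(x, freq[x], 100 * freq[x] / n) for x in sorted(freq, reverse=True)]
for x, f, pct in table:
    print(f"{x}\t{f}\t{pct:.0f}%")

# ΣX recovered from the FD: multiply each X by its f, then add
sum_fx = sum(x * f for x, f in freq.items())
print("Σf =", n, " ΣX =", sum_fx)
```

The value of ΣX obtained from the table matches the sum of the raw scores, which is the check described in the NB above.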
As a general rule, a frequency distribution should have a maximum of 10-15 rows to keep
it simple. When scores cover a range wider than this, then it is advisable to use a grouped
FD.
We can get a relatively simple organization of the data by listing groups of scores in the table
rather than individual scores. The class intervals are indicated in one column from the highest
to the lowest and their respective frequencies are given in an adjacent column.
III. Apparent limits/class limits= the smallest and largest values that can be included in a
class. The lower class limit is the smallest data value that can be included in a class
while the upper class limit is the largest data value that can be included in a class.
The class limits have gaps between them. E.g. There is a gap between 30 and 31.
IV. Real limits/class boundaries- numbers with an additional decimal value ending in 5
which are used to separate the classes so that there are no gaps in the FD. You get the
lower boundary (real limit) by subtracting .5 from the lower class limit, and the upper
boundary by adding .5 to the upper class limit.
V. Class width= lower (or upper) class limit of a class minus the lower (upper) class limit
of the class below it.
VI. Mid-point= the score that is at the centre of a class.
VII. Cumulative frequencies (CF)= the total data values accumulated to and including a
specific class. CFB= the total scores up to and below a particular class. CFA= the
total scores up to and above a particular class.
Basic rules for construction of a grouped FD.
I. There should be between 5 and 20 classes.
II. It is preferable to use an odd number as the class width. This ensures that the mid-point has the
IV. The classes must be continuous. There should be no gaps in the FD. Even when a
V. The classes must be exhaustive. i.e. all the data values in the distribution must be
captured by the classes.
62,50,36,30,60,48,60,75,50,25,40,90,45,54,60,78,85,32,36,80, 51,53,54.
b. Determine the possible class interval to use.
Lowest possible class interval= R/15 (round it up). HPCI= R/12. (Round it down).
Always decide on an odd numbered CI.
c. Identify the starting point (the highest class in your distribution). NB: For all classes,
the number on your left-hand side should be a multiple of the CI. It must contain the
highest value. Give the other classes. NB: The lowest interval must contain the least
data value. Distribute the data into the classes.
d. Give the real limits and mid-points.
e. Tally the data.
f. Find the numerical frequencies from the tallies. NB: Always find the sum of the frequencies
at the bottom of this column.
g. Compute the cumulative frequencies (CFB and CFA). Your GFD should have the
following columns:
Class    Real Limits    Mid-Point    Tally    Frequency    CFB    CFA
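A minimal Python sketch of this procedure, applied to the 23 scores listed above, is given below. It assumes a class interval of 5 (R/15 rounded up and R/12 rounded down both give 5 for these data) and lower class limits that are multiples of the interval; the dictionary keys are illustrative names, not part of the notes.

```python
import math

scores = [62, 50, 36, 30, 60, 48, 60, 75, 50, 25, 40, 90,
          45, 54, 60, 78, 85, 32, 36, 80, 51, 53, 54]

r = max(scores) - min(scores)        # range R = H - L
lowest_ci = math.ceil(r / 15)        # lowest possible class interval (rounded up)
highest_ci = math.floor(r / 12)      # highest possible class interval (rounded down)
width = 5                            # odd interval chosen between the two bounds

# Lower limits are multiples of the interval; the top class must hold the
# highest score and the bottom class the lowest score.
top = (max(scores) // width) * width
bottom = (min(scores) // width) * width

rows = []
for lower in range(top, bottom - 1, -width):     # highest class first
    upper = lower + width - 1
    rows.append({
        "class": f"{lower}-{upper}",
        "real_limits": (lower - 0.5, upper + 0.5),
        "midpoint": (lower + upper) / 2,
        "f": sum(lower <= s <= upper for s in scores),
    })

# CFB accumulates from the lowest class upwards, CFA from the highest downwards
running = 0
for row in reversed(rows):
    running += row["f"]
    row["cfb"] = running
running = 0
for row in rows:
    running += row["f"]
    row["cfa"] = running

for row in rows:
    print(row["class"], row["real_limits"], row["midpoint"], row["f"], row["cfb"], row["cfa"])
```

Classes with a frequency of 0 (for example 65-69) are kept in the table so that the classes remain continuous.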
d. To enable the researcher to draw charts and graphs for data presentation.
e. To enable the reader to compare different data sets.
Common graphical procedures
i) Bar graph- a graph that displays data using bars of different heights. The height of
each bar corresponds to the frequency of the respective score or class. In this graph, a
space is left between adjacent bars. It is used for scores measured on a nominal or
ordinal scale.
Class          Number of students
Freshers       12000
Sophomore      11000
Third year     10000
Fourth year    9000
[Figure: bar graph of number of students (y axis) by class (x axis): First year, Second year, Third year, Fourth year.]
3. Step 3
Using the frequencies as the heights, draw vertical bars for each class.
iii) Frequency polygon- a graph that displays the data by using lines that connect points
plotted for the frequencies at the midpoints of the classes. The frequencies are
represented by the heights of the points.
Steps in the construction of frequency polygon
1. Step 1:
Find the midpoint of each class.
2. Step 2:
Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.
3. Step 3:
Using the midpoints for the x values and the frequencies as the y values, plot the points.
4. Step 4:
Join the points with line segments and close the polygon by drawing a line back to the x axis at the
beginning and end of the graph, at the same distance that the previous and next midpoints
would be located.
iv) Cumulative frequency polygon/ The smooth curve/ the ogive- This is a graph that
represents the cumulative frequencies for the classes in a frequency distribution. We
2. Step 2:
Draw the x and y axes. Label the x axis with the class boundaries. NB: For the
'less than' ogive, plot each upper class boundary and the corresponding
cumulative frequency (since the CFB represents the number of data values
accumulated up to the upper boundary of each class). For the 'greater than' ogive,
plot the lower class boundaries.
Use an appropriate scale for the y axis to represent the cumulative
frequencies. Do not label the y axis with the numbers in the cumulative frequency
column.
3. Starting with the first class boundary, connect adjacent points with a line segment.
Always close the graph to the class boundary on the x axis at the beginning and at the
end of the graph.
Class    Frequency
10-14    4
15-19    7
20-24    12
25-29    17
30-34    10
Distribution Shapes
The shape of a distribution helps us to describe data.
It also determines the appropriate statistical methods to use when analyzing data.
How to analyse distribution shape.
The normal distribution- this is a bell-shaped distribution which has a single peak and
tapers off at either end. It is approximately symmetric; i.e., it is roughly the same on both
sides of a line running through the center. It is obtained in a heterogeneous class. i.e. where
only a few students are very weak, a few very bright, and most of them average. In this
distribution, the scores are evenly distributed on both sides of the mean.
They include:
A distribution whose peak is to the left and the data values taper/tail off to the
right. To normalize it, you push the tail to the right. This distribution implies that
the majority of the students got low marks. This may arise when:
A distribution whose peak is to the right and the data values taper off/tail off to the
left. NB: to normalize this distribution, you push the tail to the left. This
distribution indicates that the majority of the learners got high marks in the test.
This may arise when:
- Cheating.
- The learners are very bright e.g. Alliance Boys.
b) Bimodal distributions
A bimodal distribution has two peaks of the same height. This can be found in
classes with students from different backgrounds.
c) Kurtosis
This refers to the peakedness of a graph.
Types of kurtosis
i) Mesokurtic- this is a cone-shaped distribution. i.e. there is a gradual
decrease of the frequencies. i.e. The majority of the scores were near the mean.
It’s found in a homogeneous class.
flat or rectangular. In it, the frequencies of the scores are the same. This
could arise if the class is organized in groups (group work).
Central tendency is a statistical measure that determines a single value which acts as the most
typical score or the best representative score for the entire data set.
b) compare two or more data sets.
The three commonly used measures of central tendency are: mean, median, and mode.
a) The mode
- In a frequency distribution, the mode is the score corresponding to the peak or the high
- For a grouped frequency distribution, the mode is the mid-point of the class with the highest
frequency.
NB:
- Can be determined for data measured on any scale of measurement: nominal, ordinal,
interval, or ratio.
Possible scenarios:
i. When one value in the distribution has the highest frequency, that value is the mode.
ii. When two adjacent scores have the same frequency and the two have the highest
Mode is found by finding the average of the two adjacent scores with the highest
iii. When in a group of scores, two non-adjacent scores have the same frequency, and this
common frequency is greater than that of any other score in the distribution, then the
iv. When more than two non-adjacent scores occur more frequently than any other score
in the distribution and with the same frequency, the distribution is said to be
multimodal.
v. When all the scores in a distribution occur with the same frequency, e.g. 1,2,3,4,5,6, the
distribution is said to have no mode. NB: You don’t say that the mode is 0, since in
some data sets 0 is an actual value. You simply say that the distribution has no mode.
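The scenarios above can be illustrated with a small Python sketch. The function name `modes` is hypothetical, and the sketch simply reports every value that shares the highest frequency; it does not apply the averaging convention for two adjacent modal scores described in scenario ii.

```python
from collections import Counter

def modes(scores):
    """Return the modal value(s) as a sorted list; an empty list means no mode."""
    freq = Counter(scores)
    top = max(freq.values())
    # Every distinct score equally frequent (and more than one distinct score): no mode
    if len(freq) > 1 and all(f == top for f in freq.values()):
        return []
    return sorted(x for x, f in freq.items() if f == top)

print(modes([2, 3, 3, 4]))          # one mode
print(modes([1, 1, 2, 5, 5]))       # two non-adjacent scores share the top frequency
print(modes([1, 2, 3, 4, 5, 6]))    # no mode
```

Returning an empty list rather than 0 mirrors the NB above: 0 could be a real score, so "no mode" needs its own representation.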
Limitations of the mode.
i. It is not always unique. A data set can have more than one mode, or have no mode at all.
ii. Mode is greatly affected by chance and has little or no mathematical usefulness.
Strengths of the mode.
i. It is the only measure of central tendency that can be used for data measured on a nominal
scale.
ii. It is used when the most typical score is desired. Hence it is very useful in analyzing
qualitative data.
b) Median (MD)
The middle point in a data set that has been ordered. It is only computed for data that can be
ordered in ranks. i.e. ordinal, interval, and ratio data. We usually find the median by a simple
counting procedure:
a. When N is odd, the median will be the middle value. Steps: a) List the values in
order, and b) select the middle value as median. Example: the median for this data
b. When N is even, the median will fall between two data values. Steps a) List the
values in order, b) Locate the middle two scores, c) add the two values and divide
and 25. To obtain it, we just get the average of the two scores thus: (24+25)/2 = 49/2 = 24.5.
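The counting procedure for odd and even N can be sketched in Python as follows. The function name and the sample data are illustrative assumptions; only the middle pair 24 and 25 comes from the example above.

```python
def median(scores):
    """Median by the simple counting procedure: order, then pick or average the middle."""
    ordered = sorted(scores)          # step a) list the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # N odd: the middle value is the median
        return ordered[mid]
    # N even: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))              # odd N
print(median([20, 24, 25, 30]))       # even N, middle scores 24 and 25
```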
E.g. 6,7,8,11,11,11,12,13,14. Here, N=9. You should circle the repeated value. If it
separates the data into two equal halves, then the value is the median. If it does not
separate the data into two equal parts, e.g. 6,7,8,11,11,11,12,13,14,15, then we
should find the median using the exact median method/frequency distribution data
method.
II. Median for frequency distribution data.
a. One of the simplest methods for finding the median is to draw a histogram. The
goal is to draw a vertical line through the distribution so that exactly half of the boxes
are on either side of the line. The median is the score at the vertical line.
$$MD = L + \left(\frac{\frac{N}{2} - cfb}{fw}\right)$$
Where: L= the lower real limit of the class containing the median score.
Cfb= cumulative frequency for the class below the one containing the median score.
a. It is relatively unaffected by outliers. It’s the best MOCT for data containing outliers.
b. The median tends to stay at the centre of the distribution even when the distribution is
Disadvantage.
- The arithmetic average of all the scores in a data set. It is calculated for numerical values,
usually measured on an interval or ratio scale. Unlike mode and median, it involves all the
scores in a distribution.
To obtain the mean, sum up all scores and then divide by the total number of scores in the set
Where:
M= mean of the scores
NB: The mean is the score that each individual receives when the total is divided equally among
all N individuals.
Ungrouped:
$$\bar{X} = \frac{\sum fx}{\sum f}$$
Grouped:
$$\bar{X} = \frac{\sum fx}{\sum f}$$
where x is the mid-point.
Mean of means
$$\bar{X} = \frac{(\bar{x}_a \times n_a) + (\bar{x}_b \times n_b) + (\bar{x}_c \times n_c)}{n_a + n_b + n_c}$$
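As a rough illustration of the mean of means, the sketch below weights each group mean by its group size. The function name and the three groups are hypothetical values chosen for easy arithmetic.

```python
def mean_of_means(groups):
    """groups: list of (group mean, group size) pairs."""
    weighted_total = sum(mean * n for mean, n in groups)   # Σ(x̄ × n)
    total_n = sum(n for _, n in groups)                    # Σn
    return weighted_total / total_n

# Hypothetical groups a, b and c: (mean, number of students)
combined = mean_of_means([(60, 20), (70, 30), (50, 50)])
print(combined)
```

Here the combined mean is pulled towards group c because it has the most students; simply averaging 60, 70 and 50 would ignore the group sizes.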
Properties of the Mean
c) The mean serves as the balance point of the distribution because the sum of the
distances below the mean is exactly equal to the sum of the distances above the mean.
This makes the mean the most commonly preferred measure of central tendency.
This property gives the mean the following four important characteristics.
a. Changing the value of any score in a data set will always change the mean.
E.g. 9, 3, 8, 7, and 5. N=5. ∑X=32. M=6.4. Suppose X=3 is changed to 6. The new mean will
be M=7.
e.g. If we add 3 to all the values in our example data, we would get:
X        X+3
9        12
3        6
8        11
7        10
5        8
∑X=32    ∑(X+3)=47
X̄=6.4    X̄=9.4
Note: The new mean is 9.4 which is the same as 6.4 +3.
NB: The sum of squared deviations from the mean is always less than the sum of squared deviations from any
● Easy to compute.
● Easy to work with and use in further analysis.
Disadvantages of the mean
a) It is extremely sensitive to outliers (a few scores that deviate extremely from the other
scores in a set). Outliers tend to pull the mean to the extremes of the distribution. This
is why the mean is located near the tails of skewed distributions.
The three measures of central tendency systematically relate to one another in a distribution.
a. For normal distributions, the three MOCT are equal and therefore occur at the centre.
b. For positively skewed data, the mean is the largest MOCT, followed by the median then the mode.
c. For negatively skewed data, the mode is the greatest MOCT, followed by the median, and then
the mean.
These are statistical measures that tell us how the scores differ from one another. They indicate
how much the scores are spread out or clustered around the central value.
b. They determine how well the average represents the other scores. NB: When a MOV is
small, it indicates that the scores are clustered closely (and therefore the mean is
representative of the data); when it is large it indicates that the mean is not representative
of the data.
c. To facilitate comparison.
The most frequently used measures of dispersion are range, interquartile range, variance, and
standard deviation.
a. Range.
The range is the difference between the highest and the lowest scores in the data set. The
formula is:
R=H-L
Where R=range
H= highest score
L= Lowest score
b. Grouped:
i) Upper real limit of the highest class minus lower real limit of the lowest class.
Merits of Range
a) Easy to calculate.
b) Easy to understand.
Demerits of Range
a) Only involves two values and therefore can be highly influenced by outliers.
b) Does not indicate the direction of variability i.e. not an accurate indicator of variability.
c) Cannot be determined for open-ended distributions.
It is half the difference between the third quartile and the first quartile.
QD is related to the median. i.e. It is based on the range of the middle 50% of the data.
Quartiles are the values that divide an ordered list of numbers into four equal groups: Q1, Q2, and Q3.
NB: Q1= 25% of N; Q2= 50% of N; Q3= 75% of N;
To get QD, you must first order the data set.
Steps:
● Arrange data from highest to lowest.
● Find the median of the values (This is Q2).
● Find the median of the values below Q2. (This is Q1).
● Find the median of the values above Q2. (This is Q3).
$$Q = \frac{Q_3 - Q_1}{2}$$
$$Q_1 = L + \left(\frac{\frac{N}{4} - cfb}{fw}\right) \times c$$
$$Q_3 = L + \left(\frac{\frac{3N}{4} - cfb}{fw}\right) \times c$$
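The median-of-halves steps for ungrouped data can be sketched in Python as below. The scores are hypothetical, and the sketch assumes that when N is odd the median itself is left out of both halves; other textbooks use slightly different conventions, so treat this as one possible reading of the steps.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartile_deviation(scores):
    """Return (Q1, Q3, QD), where QD = (Q3 - Q1) / 2."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                    # values below Q2
    upper = ordered[half + 1:] if n % 2 else ordered[half:]   # values above Q2
    q1, q3 = median(lower), median(upper)
    return q1, q3, (q3 - q1) / 2

# Hypothetical scores
q1, q3, qd = quartile_deviation([2, 4, 6, 8, 10, 12, 14])
print(q1, q3, qd)
```

Sorting in ascending rather than descending order does not change the quartiles; only the direction of the list differs.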
Advantages of SIQ
- Easy to calculate.
Disadvantages
- Does not represent all the data. i.e. the upper and lower 25% of the data are not used in the
calculation.
Deviation score (D)= The distance of a score from the mean. i.e. X − X̄. NB: ∑D= 0.
Mean deviation (MD)= the average of the absolute deviation scores. i.e.
$$MD = \frac{\sum |X - \bar{X}|}{N}$$
Advantages of the MD
- Easy to understand.
Disadvantages of the MD
- Does not support more advanced statistical procedures.
Variance.
Variance (S²) indicates the average squared distance from the mean. The greater the
distance, the greater the variance. When two sets of scores have the same mean but different
variances, it means that one has a greater spread of scores than the other.
$$S^2 = \frac{\sum (X - \bar{X})^2}{N}$$
$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
$$S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$
The steps involved in working out the variance using the above formula include:
i. Find the sum of the values (ΣX). Example: For the following eight scores
ii. Square each value and find the sum (ΣX²). Thus the ΣX² for our scores
$$S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$
S² = 1544/8 - (110/8)² = 193 - 189.0625 = 3.9375 ≈ 3.94.
Deviation score method:
$$S^2 = \frac{\sum (x - \bar{x})^2}{n}$$
x̄ = 13.75
X     x − x̄    (x − x̄)²
11    -2.75    7.5625
12    -1.75    3.0625
12    -1.75    3.0625
13    -0.75    0.5625
14    0.25     0.0625
15    1.25     1.5625
16    2.25     5.0625
17    3.25     10.5625
∑(x − x̄)² = Sum of squares = 31.5
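The two routes to the variance can be compared with a short Python sketch using the eight scores from the table above. Variable names are illustrative; both calculations divide by N, as in the formulas in these notes.

```python
# The eight scores from the deviation score table
scores = [11, 12, 12, 13, 14, 15, 16, 17]
n = len(scores)
mean = sum(scores) / n                                   # 110 / 8

# Deviation score method: Σ(x - x̄)² / n
sum_of_squares = sum((x - mean) ** 2 for x in scores)
var_deviation = sum_of_squares / n

# Raw score method: ΣX²/N - (ΣX/N)²
var_raw = sum(x ** 2 for x in scores) / n - (sum(scores) / n) ** 2

print(mean, sum_of_squares, var_deviation, var_raw)
```

Both methods give the same variance, so the raw score formula is just a computational shortcut that avoids working out each deviation.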
$$S^2 = \frac{\sum X^2 f}{N} - \left(\frac{\sum Xf}{N}\right)^2$$
Steps:
Table 1
A        B            C           D                        E     F
Class    Frequency    Midpoint    Midpoint squared (X²)    Xf    X²f
2. Square the midpoint for each class and place the results in column D.
3. Multiply frequency by midpoint for each class, and place the products in column E.
4. Multiply the frequency by square of the midpoint, and place the products in column F.
$$S = \sqrt{\frac{\sum X^2 f}{N} - \left(\frac{\sum Xf}{N}\right)^2}$$
Advantages of variance.
- It does not give a good descriptive measure of dispersion since it is expressed in squared
deviations.
C. Standard Deviation
The standard deviation (SD), sometimes represented by the Greek letter σ (sigma), is the square
root of the variance. It indicates the extent to which scores vary from the mean. It is a
very common measure in the field of testing and measurement. It is also used in calculating the
Deviation IQ.
$$SD = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$
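A quick check of this formula in Python, reusing the eight scores from the variance example (ΣX = 110, ΣX² = 1544), might look like the sketch below. The comparison with `statistics.pstdev` is included only as a sanity check, since that function also divides by N.

```python
import math
import statistics

scores = [11, 12, 12, 13, 14, 15, 16, 17]
n = len(scores)

sum_x = sum(scores)                        # ΣX
sum_x_sq = sum(x ** 2 for x in scores)     # ΣX²

# SD = sqrt((ΣX² - (ΣX)²/N) / N)
sd = math.sqrt((sum_x_sq - sum_x ** 2 / n) / n)

print(round(sd, 2))
print(round(statistics.pstdev(scores), 2))   # population SD from the standard library
```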
In the example for variance, we can compute the standard deviation by finding the square root
Note: The greater the standard deviation, the more the scores tend to spread out from the
mean; the smaller the standard deviation, the less the scores tend to vary from the mean.
The standard deviation considers all scores in the distribution and it facilitates calculations of
combined standard deviation, which helps to compare two or more distributions of scores. It
also facilitates other statistical computations like correlation and skewness. The standard
deviation provides a unit of measurement for the normal distribution. However, the standard
deviation is very much affected by extreme scores, it is difficult to compute and compare, and
MEASURES OF RELATIONSHIP
Relationship – tendency for two variables to change consistently.
a. Logical examination
This involves examining/observing a pair of data to pick out any pattern of change in the
values.
E.g. (A). 1,2,4,6,7,8. (B). 3,4,6,8,9
However, the relationship between variables is not always easily picked out by mere
observation of the data.
Scatterplot is a graph that describes the nature of the relationship between two variables by
plotting them against each other.
i Collect pairs of data where a relationship is suspected.
ii Draw a graph with the independent variable on the horizontal axis and the dependent
variable on the vertical axis. For each pair of data, put a dot or a symbol where the x-
axis value intersects the y-axis value.
iii Look at the pattern of points to see if a relationship is obvious (NB. If the data clearly
form a line or a curve, then the variables are correlated).
Types of relationships that can be shown using scatter plots.
i. Positive relationship- this is present when the scatter plot indicates that as the values of
one variable increase, the values of the other variable also increase.
ii. Negative relationship- this is present when the scatter plot indicates that as the
values of one variable increase, the values of the other variable decrease.
iii. No relationship
This is when the scatter plot does not indicate any distinct pattern in the points of the
two variables.
Purposes of scatter plots
- Show direction of relationship
- Show strength of relationship- the amount of scatter or dispersion in the points. The
strongest possible relationship is a perfect relationship- a relationship where the actual
data points perfectly fit on a straight line.
- Show linearity
- Linear relationship- a relationship between two variables where the points on a scatter plot are best fitted/
summarized by a straight line.
- Non-linear relationship- a relationship whose scatter plot is best fitted by a curve.
This is a quantitative measure of the relationship between two variables. It ranges from -1
to +1.
Interpreting r
ii. Its size/ magnitude- indicates the strength of the relationship between two variables.
Common correlation coefficients
1. Pearson’s product moment correlation coefficient (rxy)
Definitional formula
a. 60,50,54,59,75
b. 71,40,53,59,72
$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
E.g.
Table 2
X     Y     (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
10    9     -6        -4        36         16         24
20    16    4         3         16         9          12
15    13    -1        0         1          0          0
19    17    3         4         9          16         12
16    10    0         -3        0          9          0
∑(x − x̄)² = 62    ∑(y − ȳ)² = 50    ∑(x − x̄)(y − ȳ) = 48
rxy = 48/√(62 × 50) = 48/55.68 = 0.86
Computational Formula
$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
Table 3
X        Y        X²          Y²         XY
10       9        100         81         90
20       16       400         256        320
15       13       225         169        195
19       17       361         289        323
16       10       256         100        160
∑X=80    ∑Y=65    ∑X²=1342    ∑Y²=895    ∑XY=1088
$$r_{xy} = \frac{5440 - 5200}{\sqrt{[6710 - 6400][4475 - 4225]}} = \frac{240}{278.39} = 0.86$$
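Both Pearson formulas can be verified against the worked tables with a short Python sketch. The variable names are illustrative; the data are the five X and Y pairs used in Tables 2 and 3.

```python
import math

X = [10, 20, 15, 19, 16]
Y = [9, 16, 13, 17, 10]
n = len(X)

# Definitional formula: deviations from the means
mean_x, mean_y = sum(X) / n, sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))   # Σ(x - x̄)(y - ȳ)
ss_x = sum((x - mean_x) ** 2 for x in X)                      # Σ(x - x̄)²
ss_y = sum((y - mean_y) ** 2 for y in Y)                      # Σ(y - ȳ)²
r_definitional = sp / math.sqrt(ss_x * ss_y)

# Computational formula: raw sums only
sum_xy = sum(x * y for x, y in zip(X, Y))
numerator = n * sum_xy - sum(X) * sum(Y)
denominator = math.sqrt((n * sum(x ** 2 for x in X) - sum(X) ** 2) *
                        (n * sum(y ** 2 for y in Y) - sum(Y) ** 2))
r_computational = numerator / denominator

print(round(r_definitional, 2), round(r_computational, 2))
```

The two formulas are algebraically equivalent, so any difference between the results would point to an arithmetic slip in one of the tables.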
There are five assumptions that should be met when using Pearson's correlation:
i. The variables are measured at either interval or ratio scales of measurement.
ii. The variables are approximately normally distributed.
iii. There is a linear relationship between the two variables.
iv. There is homoscedasticity of the data. i.e. Homogeneity of variance (equal variance of
the x and y distributions).
v. There are no outliers in the data.
It involves ranking each set of data. Then the differences between the ranks are found, and rs is
computed using these differences. It establishes whether the ranks of the two sets of data are
correlated. When both sets have the same ranks, rs = +1. When the two sets have exactly
opposite ranks, rs = -1. If there is no relationship between the rankings, rs will be very
near 0.
$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
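The rank-difference formula can be sketched in Python as below. The function names are hypothetical, the sketch assumes there are no tied scores (ties would need averaged ranks, which are not handled here), and the data are the X and Y pairs from the Pearson example, reused purely for illustration.

```python
def ranks(values):
    """Rank 1 = highest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    n = len(x)
    # D = difference between the two ranks of each pair
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))

X = [10, 20, 15, 19, 16]
Y = [9, 16, 13, 17, 10]
rs = spearman_rs(X, Y)
print(round(rs, 2))
```

For these data ΣD² = 4, so rs = 1 - 24/120 = 0.8. Identical rankings give +1 and exactly reversed rankings give -1, as stated above.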
Steps.
i. The two variables are measured at least on an ordinal, interval or ratio scale.
ii. There is a monotonic relationship between the two variables. A monotonic
relationship exists when either the variables increase in value together, or as one
variable value increases, the other variable value decreases.
iii. The variables are not normally distributed.
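The rank-difference formula for rs can be illustrated with a short Python sketch. The data are the five pairs of scores from the Pearson example, reused here purely for illustration; the helper names are hypothetical, and the simple ranking function assumes there are no tied scores.

```python
def ranks(values):
    """Rank from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """rs = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), where D is the difference in ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

x = [10, 20, 15, 19, 16]
y = [9, 16, 13, 17, 10]
print(ranks(x))        # [1, 5, 2, 4, 3]
print(ranks(y))        # [1, 4, 3, 5, 2]
print(spearman(x, y))  # sum of D^2 is 4, so rs = 1 - 24/120
```

With tied scores the ranks would need to be averaged, which this sketch does not attempt.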
E.g. r = 0.5 does not mean that there is a 50% relationship between the two variables.
Why?
1. A correlation coefficient is only a measure of relationship. It does not show whether
one variable is causing the scores of the other to be as they are.
2. The two variables could be consequences of a common cause but do not cause each
other, i.e. the third variable problem. The third variable is called a lurking
variable: a hidden variable that causes both X and Y.
a. The number of churches in Nairobi and alcoholism.
b. The number of photocopiers in KM and retakes in EPS 400.
3. The two variables could have cyclic causation: X causes Y, and Y causes X.
4. There could be indirect causation. E.g. X causes A, and A causes Y.
5. There is no connection between X and Y, and the correlation could be coincidental.
compare a student's performance with that of a reference group, usually one's classmates.
a. Standard scores
b. Percentiles
b. Percentiles
c. Deciles
d. Quartiles
Standard scores- the transformation of raw scores to a desired scale with a predetermined
mean and standard deviation.
Types of standard scores
1. Z-scores
A Z-score indicates the relative position of a student in a group by showing how far the raw
score is above or below the mean using the sd as a unit of measurement.
NB: Comparing raw scores may not be possible since the exams, markers, schools, etc. may be
different.
1. This is a score that indicates the number of standard deviations a raw score is above or
below the mean in a distribution.
Formula:
$$z = \frac{x - \bar{x}}{sd}$$
different distributions can be made comparable.
NB: Standardization- the process of putting scores on a uniform distribution with the same
mean and std deviation for the purposes of comparing them.
Z-scores and Location.
A raw score does not tell us how that particular score compares with other values in the
distribution.
If the raw score is transformed into a z score, the z-score value indicates exactly where a
score is located relative to all the other scores in the distribution.
When scores are converted to z-scores, we end up with numbers with two important
properties:
a. Either a + or – sign. + indicates that the X value is located above the mean. – indicates
that the X value is located below the mean.
b. The numerical value. This shows the number of std. deviations between the raw score
and the mean.
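A small Python sketch shows how the sign and the size of a z-score are read. The mean of 75, sd of 7.5 and raw score of 69 are illustrative values chosen for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(69, 75, 7.5)
# The sign gives the direction; the absolute value gives the distance in sd units
direction = "above" if z > 0 else "below" if z < 0 else "at"
print(f"z = {z:+.2f}: {abs(z):.2f} standard deviations {direction} the mean")
```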
Advantages of Z-scores
Disadvantages of Z-scores
- The values of z-scores are normally very small. Most of them range from -3 to +3.
- Burdensome since they involve many decimal values and negative numbers. In case one
forgets putting the – sign it would change the entire meaning of the score.
- The z-score distribution has a mean of 0. This is quite difficult for laymen to compute.
To overcome these challenges, we convert raw scores into other standard scores.
Raw scores are transformed into other standard scores using the following procedure:
Standard score = mean of the standard score + (standard deviation of the standard score
multiplied by the z-score).
The following are the common standard scores and their respective means and standard
deviations:
     Standard score    Mean    Standard deviation
1    T-score           50      10
2    Stanines          5       2
3    IQ                100     15
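The conversion rule (standard score = new mean + new standard deviation × z) can be sketched in Python as follows. The means and standard deviations are those listed in the table; the z-score of 1.2 and the names used in the code are hypothetical.

```python
# (mean, standard deviation) of each scale, as listed in the table
SCALES = {"T-score": (50, 10), "Stanine": (5, 2), "IQ": (100, 15)}

def standard_score(z, scale):
    """Standard score = mean of the scale + (sd of the scale * z)."""
    mean, sd = SCALES[scale]
    return mean + sd * z

z = 1.2  # hypothetical z-score
for name in SCALES:
    print(name, round(standard_score(z, name), 1))
```

In practice stanines are reported as whole numbers from 1 to 9, so the stanine value would be rounded.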
Normalized standard scores- are standard scores based on distributions that were not
originally normal, but were transformed into normal distributions.
When an entire data set is transformed into z-scores, the resulting distribution of z-scores will
always have a mean of 0 and a sd of 1. If the original scores are normally distributed, it will
be a standard normal distribution.
NB: All normally distributed variables can be transformed into standard normal distributions
using the formula for the z- scores.
We need to know its properties and how to locate scores on it.
a) Properties.
NB: we must know its properties in order for us to solve problems involving distributions that
are approximately normal.
● Bell-shaped
● Mean, median, and mode located at the centre.
● Unimodal
● Symmetric
● The curve is continuous
● The curve never touches the x axis (it is asymptotic to the x axis).
Statistical Properties
● Has a mean of 0 and sd of 1.
● The total area under the curve is 1 or 100%.
● 50% or 0.5 of the area lies above the mean and 50% below the mean.
● 34.13% of the total area under the curve lies between the mean and one SD below
the mean, and the same area lies between the mean and one SD above the mean.
● 47.72% of the total area under the curve lies between the mean and two SD below the
mean, and the same area lies between the mean and two SD above the mean.
● About 95.44% of the total area under the curve lies within two SD of the mean (i.e.
47.72 x 2).
● About 99.74% of the total area under the curve lies within three SD of the mean (i.e.
49.87 x 2).
We locate scores on the standard normal distribution by finding areas under the standard
normal distribution curve.
Steps
i Draw picture
ii Shade the desired area
iii Get the area from the table.
● Between 0 and any z score value: look up the z score value to get the area.
● In any tail: Look up the Z score value to get the area. Then subtract area from 0.50.
● Between two Z-score values on the same side of the mean. Look up both z score
values to get the areas, then subtract the smaller area from the larger area.
● Between two z score values on opposite sides of the mean, look up both z-score
values to get the areas, then add the areas.
● To the left of any z value where z is greater than the mean: look up the z value to get
the area, then add 0.50.
● To the right of any z value, where z is less than the mean, look up the value in the
table to get the area, then add 0.50 to the area.
● In any two tails. Look up the z values in the table to get the areas. Subtract both areas
from 0.50, then add the answers.
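The lookup rules above can be mirrored in a short Python sketch. Instead of a printed table it uses `statistics.NormalDist` from the standard library (Python 3.8 or later) to obtain the area between the mean and z; the function names are illustrative only.

```python
from statistics import NormalDist

def table_area(z):
    """Area between the mean (z = 0) and z, as a standard normal table lists it."""
    return NormalDist().cdf(abs(z)) - 0.5

def tail_area(z):
    """Area in the tail beyond z: subtract the table area from 0.50."""
    return 0.5 - table_area(z)

def area_between(z1, z2):
    """Area between two z-scores."""
    a1, a2 = table_area(z1), table_area(z2)
    if z1 * z2 >= 0:        # same side of the mean: subtract smaller from larger
        return abs(a1 - a2)
    return a1 + a2          # opposite sides of the mean: add the areas

def area_left_of(z):
    """Area to the left of z: add 0.50 when z is above the mean, else it is a tail."""
    return 0.5 + table_area(z) if z > 0 else tail_area(z)

print(round(table_area(1.0), 4))       # 0.3413
print(round(tail_area(1.96), 4))       # 0.025
print(round(area_between(-1, 1), 4))   # 0.6827
print(round(area_between(1, 2), 4))    # 0.1359
print(round(area_left_of(1.0), 4))     # 0.8413
```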
In a class of 100 students the mean of an English test is 75 and the standard deviation 7.5.
How many students scored below a score of 69?
Procedure:
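One way to work this out is sketched below in Python. It is a reconstruction that follows the lookup rules for a tail area and is not necessarily the original worked solution; `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

n_students, mean, sd = 100, 75, 7.5

z = (69 - mean) / sd                              # z = -0.8, i.e. below the mean
area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5   # table area for z = 0.8, about 0.2881
tail = 0.5 - area_mean_to_z                       # area below 69, about 0.2119
students_below = round(n_students * tail)

print(round(z, 2), round(area_mean_to_z, 4), round(tail, 4), students_below)
```

So roughly 21 of the 100 students would be expected to score below 69, assuming the scores are normally distributed.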
E.g. What is the raw score corresponding to a z-score of 1.96 in the English test?
Procedure.
i Determine whether the z-score is above or below the mean. Recall: Z(sd) is the
distance from the mean. From the formula Z = (X − X̄)/sd,
we find that X = Z(sd) + Mean.
ii The score is 1.96(7.5) above the mean of 75,
i.e. X = 1.96(7.5) + 75.
X = 89.7.
How to find a z-score when we know the proportion under a normal curve.
E.g. What score must a student get in the English test, to be among the top 5% of the class?
The top 5% of the class is a tail area of 0.05, so the area between the mean and the cut-off
score is 0.50 − 0.05 = 0.45. Therefore we look up the z-score value that gives us the
proportion .45.
Z = 1.65.
Then Z(sd) + X̄ = X.
Then 1.65(7.5) + 75 = X.
X = 87.38.
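The same reverse lookup can be sketched in Python. `NormalDist().inv_cdf` from the standard library plays the role of reading the table backwards; it returns a slightly more precise z than the table value of 1.65, so the cut-off differs in the second decimal place.

```python
from statistics import NormalDist

mean, sd = 75, 7.5

# Top 5% of the class: 95% of the area lies below the cut-off score
z_cut = NormalDist().inv_cdf(0.95)   # about 1.645 (tables usually give 1.65)
x_cut = z_cut * sd + mean            # X = Z(sd) + mean

print(round(z_cut, 3), round(x_cut, 2))
```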
Question.
In a class of 240 students, the mean for the end of year Agriculture test was 70 and the
standard deviation was 9. Assuming that the scores were normally distributed:
a) How many students scored between 50 and 75?
b) How many students scored above a score of 69?
c) How many students scored below a score of 40?
d) What were the cut off scores for the middle 50% of the class?
e) Due to limited facilities, the teacher can only promote the top 85% of the students to the
next class. What is the minimum score a student had to obtain to be selected for the
next class?
C. Percentiles
Percentiles indicate what percentage of scores fall below a particular score in a distribution.
Percentile rank- this is the percentage of individual scores in a data set that fall at or below a
given score. It is mainly used in norm-referenced scores.
Percentile point- A score below which a certain percentage of scores fall in a given
distribution.
When we know the mean and sd of a distribution, we can find percentiles using standard
normal distribution tables.
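As a sketch, the two ideas can be expressed in Python with `statistics.NormalDist`, assuming the scores are normally distributed. The mean of 75 and sd of 7.5 are illustrative values, and the function names are hypothetical.

```python
from statistics import NormalDist

def percentile_rank(score, mean, sd):
    """Percentage of scores at or below `score`, assuming a normal distribution."""
    return 100 * NormalDist(mean, sd).cdf(score)

def percentile_point(percent, mean, sd):
    """Score below which `percent` of the scores fall, assuming a normal distribution."""
    return NormalDist(mean, sd).inv_cdf(percent / 100)

print(round(percentile_rank(82.5, 75, 7.5), 2))  # one sd above the mean: about 84.13
print(round(percentile_point(50, 75, 7.5), 2))   # the 50th percentile is the mean
```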
DEFINITIONS
A. Test- A device or procedure in which a sample of an individual's behaviour is obtained,
evaluated, and scored using standardized procedures.
B. Examination- A collection of tests, which measure different traits of the individual in
order to facilitate decision-making.
C. Measurement- A process of assigning numbers to represent objects, traits, attributes,
or behaviours following a set of rules.
- An educational test is a measurement device and therefore involves rules for
assigning numbers that represent an individual student’s performance.
D. Assessment is the systematic process of collecting and integrating information in a
manner that promotes understanding of students’ characteristics and decision making
in the education process. It is accomplished by use of tests and other techniques.
E. Testing- this is the process of administering, scoring, and interpreting an instrument.
Testing is a component of the assessment process.
F. Psychometrics- the science of psychological measurement.
G. Evaluation is the process of making value judgements about students and the learning
process based on the information gathered through measurements.
There are two major types of evaluation.
Formative Evaluation
This type of evaluation is done during the teaching-learning process with the goal of
monitoring students' learning and providing ongoing, specific feedback to them.
Formative evaluations are generally low stakes, which means that they have low or no point value.
Examples:
- Draw a concept map in class to represent their understanding of a topic.
- Submit one or two sentences identifying the main point of a lecture.
- Turn in a research proposal for early feedback.
Specific uses of formative evaluation in education.
1. It monitors how well the instructional goals and objectives are being met (for teachers,
learners, and curriculum designers);
2. It determines students' strengths and weaknesses, i.e. areas mastered and those not
mastered. This informs proper learning interventions which can be put in place.
3. It facilitates learning by allowing learners to master the required skills and knowledge;
4. It motivates students.
often high stakes. High stakes tests are those tests that are used in ways that have important
consequences for the student. Such tests affect decisions regarding whether the student will
be promoted, admitted, or allowed to graduate.
Examples
i. A midterm exam
ii. Final project
iii. A journal paper
iv. End of Sem Exams
Specific uses of Summative Evaluation
a. It guides students’ efforts and activities in subsequent courses (formative use).
b. Provides a summary of students' achievement/progress to parents.
c. It shows the value or quality of learning outcomes.
d. Enhances accountability.
Participants in the Assessment Process
a. Test developers- people who develop the tests. NB: Not all developers are
professionally trained. They have professional and ethical responsibilities.
b. Test users- people who use the tests. Those who select, administer, score, interpret
and use the results. E.g. teachers, psychologists, counsellors, employers, professional
licensing boards, parents, etc. NB: Not all users are professionally trained. They also
have professional and ethical responsibilities.
c. Test takers- people who take/do/write the tests. They have their rights.
d. Test marketers - Those who market assessment products and services.
e. Those who teach others about the assessment process.
CRITERIA FOR CLASSIFYING ASSESSMENT PROCEDURES
There are four basic ways of classification:
1. Nature of assessment
a. Maximum performance- determines what an individual can do when performing
at his/her best. E.g. aptitude tests; achievement tests.
b. Typical performance- determines what an individual will do under normal
conditions. E.g. attitude and interest inventories.
2. Form of assessment-
a. Fixed choice- student selects the response to questions from available options.
E.g. MCQs.
b. Constructed response- student constructs an extended response to a
complex task. E.g. Essays.
c. Performance assessment- Assessment which requires the student to demonstrate
knowledge or skill through activities that are mostly direct, active, and hands-on,
such as giving a speech, performing a task, or producing an artistic product.
of teaching-learning process.
b. Diagnostic assessment- Assessment carried out to find out the underlying causes
of student’s learning difficulties.
c. Formative vs summative-
Similarities
A. Both require specification of the achievement domain to be covered.
B. They use the same types of questions.
C. Both require a relevant and representative sample of test items.
D. Both use the same rules of item writing.
E. They are both judged by the same qualities of goodness.
F. They are both useful in educational assessment.
Norm-referenced tests                              Criterion-referenced tests
Emphasize discrimination among learners in         Emphasize description of the learning
terms of relative levels of learning.              tasks learners can or cannot perform.
Interpretation requires a clearly defined group.   Interpretation requires a clearly defined
                                                   achievement domain.
Prefer items of average difficulty and typically   Match item difficulty to learning tasks,
omit very easy and very hard items.                without omitting very hard or very easy
                                                   items.
Individual vs group
Speed vs power
Objective vs subjective
5. Alternative assessment- Non-traditional ways of gathering information about the
student. This may include: portfolios, observations, samples of client’s, or group
projects.
Needs assessment- Inquiry into the current state of knowledge, resources, or practice with the
intention of taking action, making a decision, or providing a service based on the results.
Self-assessment- Personal rating of ability according to specified criteria.
constructive feedback to them. E.g. they involve assigning students grades to reflect their
academic progress or achievement.
ii. In classroom instruction- provides information that helps teachers to modify and
● Helps teachers determine what to teach, how to teach, and how effective their teaching
● Classification decisions- refer to situations in which students are assigned to
different categories that are not ordered. E.g. classification of
special education students as VI, HI, emotionally disturbed, etc.
iv. Policy decisions- many administrative decisions are made at different levels. They involve
evaluating the curriculum, instructional practices, levels of funding, employee recognition
and benefits, as well as accountability.
v. Counselling and guidance decisions- school counsellors promote the self-understanding
of students and help students plan for their future.
PREPARATION OF EDUCATIONAL MEASUREMENT OBJECTIVES
Educational objectives- A.k.a Instructional / learning objectives.
Educational goals i.e. what you hope the students will learn or accomplish.
Educational objectives are the foundations of assessment.
ROLE OF OBJECTIVES IN EDUCATION AND EVALUATION
Objectives:
i. Direct the instructional process- by describing the intended results of instruction.
ii. Communicate the intent of instruction to others (students, parents, school personnel,
public, ministry, etc).
iii. Provide a basis for assessing student learning- by describing the performance to be
measured.
iv. Make it easier to develop fair, valid, and comprehensive tests.
v. Enhance quality of teaching and learning.
vi. Guide students' self-assessment of their learning as well as self-management of learning
opportunities.
Characteristics of Educational Objectives:
Educational objectives possess three prominent characteristics:
a. Scope- how specific or broad the objective is. There should be a balance between the
broad and narrow objectives. How? Use an intermediate level of specificity or write
broad objectives then break them down into specific ones.
b. Domain - cognitive, affective, or psychomotor domain.
Non-behavioural- specify activities that are unobservable and not directly measurable.
BLOOM'S TAXONOMY
- They should include all-important outcomes of the course.
- They must be consistent with the general goals of education.
- They should be consistent with the sound principles of learning.
- They should be realistic in terms of students’ abilities, time, and facilities.
- They should be specific and measurable.
The Cognitive Domain. Six basic objectives are listed in Bloom’s taxonomy of thinking or
cognitive domain.
3. Application: Using a general concept to solve a particular problem.
4. Analysis: Breaking something down into its parts.
5. Synthesis: Creating something new by combining different ideas.
6. Evaluation: Judging the value of materials or methods as they might be applied in
a particular situation.
Table of Specifications (TOS), a.k.a. Test Blueprint
This is a two-way grid/chart which defines the scope and emphasis of a test by relating the
course content to the instructional objectives.
Its columns list the performance objectives at each level of the cognitive domain as described
in Bloom's Taxonomy. Its rows list the key concepts/ content measured by the test.
TOS should be prepared before the test is constructed, preferably before the actual teaching.
c. Construct the two-way chart. Place the content classification on the left-hand side of
the chart and the cognitive domains across the top of the chart.
d. Find the totals or percentages for each content category (placed at the right-hand side
of the chart) and for each cognitive domain (placed at the bottom of the chart).
                                        Cognitive Domain
Content       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Climate       1          2              3            1         4          2           13
Environment   2          3              1            1         2          3           12
Total         3          5              4            2         6          5           25
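The row and column totals of a table of specifications can be computed as in the Python sketch below. The item counts are those in the table; the variable names and the percentage weights are illustrative additions.

```python
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

# Number of items per content area at each cognitive level (same order as `levels`)
items = {
    "Climate":     [1, 2, 3, 1, 4, 2],
    "Environment": [2, 3, 1, 1, 2, 3],
}

row_totals = {content: sum(counts) for content, counts in items.items()}
column_totals = [sum(col) for col in zip(*items.values())]
grand_total = sum(row_totals.values())

# Percentage weight of each content area in the whole test
weights = {content: round(100 * total / grand_total)
           for content, total in row_totals.items()}

print(row_totals)      # Climate: 13, Environment: 12
print(dict(zip(levels, column_totals)))
print(grand_total)     # 25
print(weights)
```

Checking that the row totals and the column totals add up to the same grand total is a quick way of catching arithmetic slips in the grid.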
3 It encourages the teacher to use items of varying difficulty.
4 It enhances the content validity of a test- guides test developers not to under test or
over test any area, or include irrelevant concepts.
5 It informs students of what to expect in tests, thus improving their learning and
revision for tests.
- Practical value
4.1.1 Reliability
4.1.2 Meaning
- The ability of a test to consistently measure what it is measuring.
- The degree to which test scores are consistent, dependable, and repeatable.
4.1.3 Methods of estimating reliability.
The major methods of determining reliability focus on different types of consistency:
- consistency over a period of time,
- over different forms of the assessment,
- within the assessment itself, and
- over different raters.
The major methods of estimating reliability include:
Test-retest reliability- It is the extent to which a test yields the same scores when it is
administered to the same group of test-takers on two different occasions. Procedure: Give the
same test to the same group of test-takers with a time interval between the tests (usually a
short period of two weeks to a month) and then correlate the two sets of scores.
administering two forms of a test to the same group of individuals at almost the same time
and then correlating the scores. A high correlation coefficient indicates that the two forms are
measuring the same type of performance. A low correlation coefficient would indicate that
the two versions are not measuring the same thing or that they differ in degree of difficulty.
Test-retest with equivalent forms- This method is a measure of stability and equivalence. It
involves giving two forms of the test to the same group with an increased time interval
between the two forms and then correlating the two sets of scores.
Split-half reliability- this is estimated by administering a test and then dividing it into two
equivalent parts that are scored independently (e.g. odd numbered items vs even numbered
items). The results of the two parts are then scored and correlated. The correlation coefficient
indicates the degree to which consistent results are obtained from the two parts of the test and
therefore adequacy of content sampling.
Inter-rater reliability- this measures consistency of scorers’ judgements when scoring a test.
It is estimated by administering the test once and having two or more individuals
independently score each student’s responses in the test. The scores obtained by two scorers
are then correlated. This method only reflects differences due to the individuals scoring the
test. It is highly applicable for essay tests. A major limitation of the approach is that it is
affected by differences due to raters or scorers.
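Each of these methods ends by correlating two sets of scores. As one illustration, the split-half procedure (odd-numbered items vs even-numbered items) can be sketched in Python. The item scores below are invented purely for illustration, and the helper name `pearson` is hypothetical.

```python
import math

# Hypothetical item scores (1 = correct, 0 = wrong): one row per student, six items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

def pearson(x, y):
    """Pearson correlation between two lists of scores (definitional formula)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Score the two halves independently, then correlate them
odd_half = [sum(row[0::2]) for row in responses]    # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6
r_halves = pearson(odd_half, even_half)

print(odd_half, even_half, round(r_halves, 2))
```

The same `pearson` function could be applied to test and retest scores, to scores on two equivalent forms, or to the marks awarded by two raters.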
How to improve the reliability of a test.
i. Standardize the conditions under which the test is taken.
ii. Use sufficient number of items or tasks in a test.
iii. Clearly explain requirements for responding to test items.
iv. Identify specific criteria in advance for scoring student’s essay items. Score all
students’ responses to a particular item before moving to a second one.
v. Avoid being influenced by your expectations of your students and score tests
anonymously.
Factors influencing reliability coefficients
i. Test length. A longer test is more reliable than a short one.
ii. Speed- Speed tests are not as reliable as power tests. In a speed test not every student
is able to complete all of the items within the given time. In a power test every student
is able to complete all the items.
iii. Group homogeneity- The more heterogeneous the group of students who take the
test, the more reliable the scores are likely to be.
iv. Item difficulty- very easy or very difficult tests have little reliability.
v. Objectivity- Objectively scored tests show higher reliability than subjective tests.
vi. Variation with the testing situation. Errors in the testing situation (e.g., students
misunderstanding or misreading test directions, noise level, distractions, and sickness)
can cause test scores to vary.
vii. Student's internal factors
Health
Motivation
Anxiety
Validity
Construct validity- the extent to which a test accurately measures a characteristic that is
b. Constructed response format- test items that ask students to create an answer by
writing out information, e.g. fill in the blank spaces, essays, etc.
FACTORS TO CONSIDER WHEN SELECTING TEST FORMAT
i. Purpose of the test.
ii. Time available to prepare and mark the test.
iii. Number of students to be tested.
iv. Physical facilities available for reproducing the test.
v. Teacher’s skill at writing different types of tests.
vi. The subject being tested.
vii. The level of examinees.
1. Selected response items
Multiple choice tests (MCTs) are the most popular. A multiple choice item has two parts: the
stem and the alternatives.
Stem- a question or an incomplete statement.
Alternatives- these are possible answers.
Two types of alternatives:
Answer- the correct alternative
Distracters- incorrect alternatives (they serve to ‘distract’ students who actually do not know
the answer).
Forms of MCT
a. Direct-question type- the stem is a direct question. E.g. Which is the longest river in
Africa?
b. Incomplete sentence type- the stem is an incomplete statement. E.g. The longest river in
Africa is ______.
Guidelines for writing MCT
a. The item should contain all the information necessary to understand the problem/question
b. Provide between three and five alternatives. This reduces the chance for guessing the
correct answer.
c. Keep the alternatives brief and arrange them in an order that promotes efficient scanning.
d. Avoid negatively stated stems as much as possible.
b. Not effective for measuring some educational objectives. E.g. creativity, organization,
problem solving, or verbal skills.
c. Scores can be influenced by reading ability
d. May encourage rote learning and guesswork
e. Does not allow students freedom of expression
2. Constructed-response items.
Short answer and essay questions are the most common.
Essay items- test items that pose a question for the student to respond to in a written
format.
Types of essay items
a. Restricted-response items- highly structured and clearly specify the form and scope
of the student's response. They typically require students to list, define, describe, or give
reasons. They may specify time and length of the response. E.g. List the three types of
muscle tissue and state four functions of each.
b. Extended-response items- provide more freedom and flexibility in how students can
respond to the item. They do not limit the response in terms of form and scope. They
typically require students to compose, summarize, formulate, compare, interpret, etc.
E.g. Summarize the major forms of validity.
They provide less structure and this promotes creativity, integration, organization,
analysis and synthesis of the information.
Advantages of essay tests
i. They take less time to prepare.
ii. Effective for measuring higher-level cognitive skills.
iii. They largely eliminate guessing.
iv. They effectively assess knowledge of content, grammar, and writing ability and
style.
Disadvantages
i. Tedious to mark and the essays are difficult to score in a reliable way.
ii. Scoring of essays may be influenced by irrelevant characteristics like students’
questions.
c. Let students respond to the same set of items.
d. Use more restricted-response items than extended-response items.
e. Structure and clarify the task.
f. Limit the use of essay to educational objectives that cannot be assessed using
selected-response items.
ITEM ANALYSIS
This is a set of statistics that can be computed for each test item to assess its quality. The
choice of item analysis method depends on the purpose of the test and the person designing
the analysis.
Purpose of item analysis.
i. Item analysis data provides a basis for class discussion of the test results.
ii. Item analysis data provides a basis for remedial work focused on students’ areas
of weakness.
iii. It can help improve classroom instruction.
iv. It increases teacher’s understanding and skills in test construction
v. It helps us understand why a test has specific levels of reliability and validity and
why test scores can be used to predict some criteria and not others.
vi. It may also suggest ways of improving the measurement characteristics of a test.
Basic questions in item analysis.
a. How many people chose each response? Recall that there is only one correct answer.
The incorrect responses are called distracters. Therefore, examining the total pattern
of responses to each item of a test is referred to as distracter analysis.
b. How many people answered the item correctly, i.e. were the items of appropriate
difficulty? To answer this question, we conduct an analysis of item difficulty.
c. Are responses to the item related to the responses to other items in the test?
Answering this question entails an analysis of item discrimination.
Item difficulty
In achievement tests, item difficulty, also called the p value, is commonly measured as the
percentage or proportion of students who answered an item correctly. It can be expressed
as a decimal, percentage, or fraction. It ranges from 0-1 (for decimal or fraction) or 0-100%
(for percentage).
Formula for p:
$$p = \frac{R_u + R_l}{N_u + N_l}$$
Where: Ru= number of students in the upper group who got the item right
Rl= number of students in the lower group who got the item right
Nu= number of students in the upper group who actually attempted the
question
Nl= number of students in the lower group who actually attempted the
question.
How to interpret Item difficulty of Test Scores.
P=0 indicates that all students chose the wrong answers.
P=1 indicates that everyone got the correct answer.
An item with a p value of 0 or 1 is undesirable since it would not show individual
differences among the learners and is of no value from a measurement perspective.
The optimal difficulty index is 0.50. Any items with p ranging from .30 to .70 are
considered good in a MCT.
The item discrimination index
Item discrimination is abbreviated as D. It is simply the difference between the proportion of
high achieving students who got an item right and the proportion of low achieving students who
got the item right. We can compute the discrimination index by subtracting the proportion of
students who got the item correct in the lower group (RL/NL) from the proportion of students
who got the item correct in the upper group (RU/NU).
Thus:
$$D = \frac{R_u}{N_u} - \frac{R_l}{N_l}$$
How to interpret D.
D is usually expressed as a decimal and it can range from -1.0 to +1.0.
If it is positive the item has positive discrimination, i.e. a larger proportion of the more
knowledgeable students got the item right than of the poor students.
If it is 0 the item has zero discrimination. This is possible when the test item is too difficult, too
easy, or ambiguous.
If it is negative the item has negative discrimination, i.e. a larger proportion of the poor students got
the item right than of the more knowledgeable students.
We can interpret the quality of test items following these guidelines:
D                  Quality of item
0.40 and above     excellent
0.30-0.39          good
0.20-0.29          fair
0.00-0.19          poor
Negative values    poor
Therefore, for items to be considered of good quality, they must have at least a D of 0.30.
However, items with D of 0.20 and above are acceptable in classroom tests for various reasons.
Items with negative discrimination values should be reviewed. A negative discrimination
value, like a low difficulty value, may occur as a result of several possible causes, e.g. a miskeyed
item or an item that is ambiguous.
Example:
A group of 140 examinees responded as shown below in a test item:
              A     B     C     D     Omit   Total
Upper group   3     57    6     4     0      70
Lower group   12    34    14    4     6      70
Find p and D.
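A possible way of working this out is sketched below in Python. Two assumptions are made that the example does not state: option B is taken as the keyed (correct) answer because it was chosen most often, and p is taken as (Ru + Rl)/(Nu + Nl), with Nu and Nl counting only the students who actually attempted the item (omits excluded).

```python
# Responses to the item, by group
upper = {"A": 3, "B": 57, "C": 6, "D": 4, "Omit": 0}
lower = {"A": 12, "B": 34, "C": 14, "D": 4, "Omit": 6}

KEY = "B"  # assumption: the keyed answer is not stated in the example

def attempted(group):
    """Number of students in the group who actually attempted the item."""
    return sum(count for option, count in group.items() if option != "Omit")

r_u, r_l = upper[KEY], lower[KEY]
n_u, n_l = attempted(upper), attempted(lower)   # 70 and 64

p = (r_u + r_l) / (n_u + n_l)    # item difficulty
d = r_u / n_u - r_l / n_l        # item discrimination

print(round(p, 2), round(d, 2))  # 0.68 0.28
```

Under these assumptions the item has acceptable difficulty (p between .30 and .70) and fair discrimination. If the six omits were counted as attempts (N = 70 in each group), p would be 91/140 = 0.65 and D would be 23/70, about 0.33.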
d. Provides an opportunity for parents/guardians to comment on student
achievement, student goals, and shared responsibilities of home and school to
support improved student learning
e. It gives students descriptive feedback in comments,
f. It provides guidance to help students improve their learning.
Types of judgements on report cards
i. Letter grades e.g. A,B,C,D,E, F (sometimes allowing pluses and minuses).
ii. Numerical scores per subject e.g. 91 Mathematics; 70 English etc.
iii. Pass/fail category in one or more subjects.
iv. Checklists indicating skills and objectives that students have attained (mainly
used for kindergartens and nursery schools).
v. Categories for affective characteristics such as effort, cooperation, and other
appropriate and inappropriate behaviours.
Writing effective report card comments.
i. Effective Report Card comments provide specific details about a student’s
achievement of the overall curriculum expectations.
ii. Report card comments should provide students and parents with personalized, clear,
precise, and meaningful feedback.
iii. Effective comments are written in clear and simple language, using vocabulary that is
easily understood by both students and parents rather than educational terminology
taken directly from the curriculum documents, and convey a positive tone.
Comments should be error-free, and avoid slang or colloquial language.