
MODULE 3:

ADMINISTRATION OF TESTS AND ORGANIZATION OF TEST RESULTS

Module 3 consists of two (2) lessons covered in the Midterm as follows:

Lesson 6 : Establishing Test Validity and Reliability


Lesson 7 : Organization of Test Data Using Tables and Graphs

Each lesson contains theory, questions, and an activity. A quiz is provided for you to answer as required.

LESSON 6 :

ESTABLISHING TEST VALIDITY AND RELIABILITY

THINK ABOUT THESE EXPECTATIONS:

1. Use procedures and statistical analysis to establish test validity and reliability.
2. Decide whether a test is valid or reliable.
3. Decide which test items are easy and difficult.

Opening

In order to establish the validity and reliability of an assessment tool, you need to know the different
ways of establishing test validity and reliability.

Test Reliability

Reliability is the consistency of responses to a measure under three conditions:


1. when retested on the same persons;
2. when retested using the same measure or its equivalent; and,
3. similarity of responses across items that measure the same characteristic.

In the first condition, consistent responses are expected when the test is given again to the same participants. In the second condition, reliability is attained if the responses to a test are consistent with the responses to the same test or its equivalent. In the third condition, there is reliability when a person responds in the same way, or consistently, across items that measure the same characteristic.

Methods in Testing Reliability

1. Test-Retest Method. You have a test, and you need to administer it at one time to a group of
examinees. Administer it again at another time to the “same group” of examinees. There is a time
interval of not more than 6 months between the first and second administration of tests that
measure stable characteristics, such as standardized aptitude tests. The retest can be given
with a minimum time interval of 30 minutes.

Test-retest is applicable for tests that measure stable variables, such as aptitude and psychomotor
measure (e.g., typing test, task in physical education).

Correlate the test scores from the first and second administration. A significant and positive
correlation indicates that the test has temporal stability. Correlation refers to a statistical
procedure that determines the linear relationship between two variables. You may use the Pearson
Product Moment Correlation or Pearson r because test data are usually in an interval scale.

2. Parallel Forms Method. There are two versions of a test. The items need to measure exactly the
same skill. Each test version is called a “form.” Administer one form at one time and the other form
at another time to the “same” group of participants. The responses on the two forms should be
more or less the same.

Parallel Forms are applicable if there are two versions of the test. This is usually done when the
test is repeatedly used for different groups, such as entrance examinations and licensure
examinations. Different versions of the test are given to a different group of examinees.

Correlate the test results for the first form and the second form. A significant and positive correlation
coefficient is expected. The significant and positive correlation indicates that the two forms are the
same or consistent. Pearson r is usually used for this analysis.

3. Split-Half Method. Administer a test to a group of examinees. The items need to be split into
halves, usually using the odd-even technique. In this technique, get the sum of the points in the
odd-numbered items and correlate it with the sum of points in the even-numbered items. Each
examinee will have two scores coming from the same test. The scores on each set should be close
or consistent.

Split-Half is applicable when the test has a large number of items.

Correlate the two sets of scores using Pearson r. After the correlation, use another formula called
Spearman Brown Coefficient. The correlation coefficient obtained using Pearson r and
Spearman Brown should be significant and positive to mean that the test has internal consistency
reliability.

4. Test of Internal Consistency Using the Kuder-Richardson and Cronbach’s Alpha Methods. This
procedure involves determining if the scores for each item are consistently answered by the
examinees. After administering the test to a group of examinees, it is necessary to determine and
record the scores for each item. The idea here is to see if the responses per item are consistent
with each other.

This technique will work well when the assessment tool has a large number of items. It is also
applicable for scales and inventories (e.g., a Likert scale from “strongly agree” to “strongly
disagree”).

A statistical analysis called Cronbach’s Alpha or the Kuder-Richardson is used to determine the
internal consistency of the items. A Cronbach’s Alpha value of 0.60 and above indicates that the
test items have internal consistency.

5. Inter-Rater Reliability Method. This procedure is used to determine the consistency of multiple
raters when using rating scales and rubrics to judge performance. The reliability here refers to the
similar or consistent ratings provided by more than one rater or judge when they use an
assessment tool.

Inter-Rater is applicable when the assessment requires the use of multiple raters.

A statistical analysis called Kendall’s coefficient of concordance (Kendall’s ω) is used to determine if
the ratings provided by multiple raters agree with each other. A significant Kendall’s ω value
indicates that the raters concur or agree with each other in their ratings.

You will notice that statistical analysis is required to determine the reliability of a measure. The very
basis of statistical analysis to determine reliability is the use of linear regression.

1. Linear Regression.

Linear regression is demonstrated when you have two variables that are measured, such as
two sets of scores in a test taken at two different times by the same participants. When the two
scores are plotted in a graph (with an x-axis and a y-axis), they tend to form a straight line. The
straight line formed by the two sets of scores can produce a linear regression. When a straight
line is formed, you can say that there is a correlation between the two sets of scores. This can be
seen in the graph shown. The graph is called a scatterplot. Each point in the scatterplot is a
respondent with two scores (one for each test). Given points: P(2, 2), M(4, 6), and Q(10, 8).


[Scatterplot: points P(2, 2), M(4, 6), and Q(10, 8) plotted against the x- and y-axes]

2. Computation of Pearson r Correlation


The index of the linear regression is called a correlation coefficient. When the points in a
scatterplot tend to fall close to a straight line, the correlation is said to be strong. When the direction
of the scatterplot is directly proportional, the correlation coefficient will have a positive value. If the
trend is inverse, the correlation will have a negative value. The statistical analysis used to
determine the correlation coefficient is called the Pearson r. How the Pearson r is obtained is
illustrated in the example below.

Example. Suppose that a teacher gave a 20-item spelling test of two-syllable words on Monday
and again on Tuesday. The teacher wanted to determine the reliability of the two sets of scores by
computing the Pearson r.

Formula: r = [N(Σxy) − (Σx)(Σy)] / √{[N(Σx²) − (Σx)²][N(Σy²) − (Σy)²]}

Where: r = Pearson correlation coefficient
x = first variable; y = second variable
Σxy = summation of x times y
(Σx)(Σy) = summation of x times the summation of y
(Σx)² = square of the summation of x
(Σy)² = square of the summation of y
Σx² = sum of the squares of the first variable
Σy² = sum of the squares of the second variable

Monday Test (x)   Tuesday Test (y)   x²    y²    xy
10                20                 100   400   200
9                 15                 81    225   135
6                 12                 36    144   72
10                18                 100   324   180
12                19                 144   361   228
4                 8                  16    64    32
5                 7                  25    49    35
7                 10                 49    100   70
16                17                 256   289   272
8                 13                 64    169   104
Σx = 87           Σy = 139           Σx² = 871   Σy² = 2125   Σxy = 1328

Substitute: r = [10(1328) − 87(139)] / √{[10(871) − (87)²][10(2125) − (139)²]}
r = (13280 − 12093) / √[(8710 − 7569)(21250 − 19321)]
r = 1187 / √[(1141)(1929)]
r = 1187 / √2200989
r = 1187 / 1483.5730; r = 0.80

The value of a correlation coefficient does not exceed 1.00 or –1.00. A value of 1.00 or –1.00
indicates a perfect correlation.
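The worked example above can be verified with a short program. The following Python sketch (the function name pearson_r is ours, not part of the lesson) applies the raw-score formula to the Monday and Tuesday spelling scores:

```python
# Sketch of the Pearson r raw-score formula from the lesson,
# applied to the Monday (x) and Tuesday (y) test scores.

def pearson_r(x, y):
    """Compute r = [N(Sxy) - (Sx)(Sy)] / sqrt{[N(Sx2) - (Sx)^2][N(Sy2) - (Sy)^2]}."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
    return numerator / denominator

monday = [10, 9, 6, 10, 12, 4, 5, 7, 16, 8]
tuesday = [20, 15, 12, 18, 19, 8, 7, 10, 17, 13]

r = pearson_r(monday, tuesday)
print(round(r, 2))  # 0.8
```

The result matches the manual computation, r = 0.80, indicating a strong, positive correlation between the two administrations.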
3. Difference Between a Positive and a Negative Correlation
When the value of the correlation is positive, it means that the higher the scores in x, the higher
the scores in y. This is called a positive correlation.
When the value of the correlation is negative, it means that the higher the scores in x, the lower
the scores in y. This is called a negative correlation.
4. Determining the Strength of a Correlation
The strength of the correlation also indicates the strength of the reliability of the test. This is
indicated by the value of the correlation coefficient. The closer the value to 1.00 or - 1.00, the
stronger is the correlation. Below is the guide:
± 1.00 - Perfect (±) correlation
±0.91 - ± 0.99 - Very strong relationship
±0.71 - ± 0.90 - Strong relationship
±0.41 - ± 0.70 - Moderately strong relationship
±0.21 - ± 0.40 - Low relationship
±0.01 - ± 0.20 - Negligible relationship
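The guide above can be expressed as a small helper function. This Python sketch (the function name interpret_correlation is ours) maps the absolute value of a coefficient to the verbal labels in the guide:

```python
# Sketch of the correlation-strength guide as a lookup function.

def interpret_correlation(r):
    """Return the verbal interpretation of a correlation coefficient r."""
    a = abs(r)
    if a > 1.0:
        raise ValueError("a correlation coefficient cannot exceed 1.00 in absolute value")
    if a == 1.0:
        return "Perfect correlation"
    if a >= 0.91:
        return "Very strong relationship"
    if a >= 0.71:
        return "Strong relationship"
    if a >= 0.41:
        return "Moderately strong relationship"
    if a >= 0.21:
        return "Low relationship"
    return "Negligible relationship"

print(interpret_correlation(0.80))   # Strong relationship
print(interpret_correlation(-0.15))  # Negligible relationship
```

Note that the sign of r does not affect its strength; a value of −0.80 is as strong as +0.80, only in the opposite direction.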
5. Determining the Significance of the Correlation
The correlation obtained between two variables could be due to chance. In order to
determine if the correlation is free of certain errors, it is tested for significance. When a correlation
is significant, it means that the probability that the two variables are related is free of certain
errors.
In order to determine if a correlation coefficient value is significant, it is compared with an
expected probability of correlation coefficient values called a critical value. When the computed
value is greater than the critical value, it means that the two variables have more than a 95%
chance of being correlated, and the correlation is significant.
Another statistical analysis mentioned to determine the internal consistency of a test is
Cronbach’s alpha. Follow the procedure to determine the internal consistency.

Illustration:

Suppose that five students answered a checklist about their hygiene with a scale of 1 to 5,
where the following are the corresponding scores:

5 - always, 4 - often, 3 - sometimes, 2 - rarely, 1 - never

The checklist has five items. The teacher wanted to determine if the items have internal
consistency.

Student      Item 1   Item 2   Item 3   Item 4   Item 5   Total for each   Score -   (Score - Mean)²
                                                          case (x)         Mean
A               5        5        4        4        1          19            2.8          7.84
B               3        4        3        3        2          15           -1.2          1.44
C               2        5        3        3        3          16           -0.2          0.04
D               1        4        2        3        3          13           -3.2         10.24
E               3        3        4        4        4          18            1.8          3.24
Total (Σx)     14       21       16       17       13       x̄ = 16.2               Σ(S - M)² = 22.8
Σx²            48       91       54       59       39
SD²           2.2      0.7      0.7      0.3      1.3      ΣSD² = 5.2

σ² = Σ(S - M)² / (n - 1)
σ² = 22.8 / (5 - 1)
σ² = 22.8 / 4
σ² = 5.7
Formula: Cronbach’s α = [n / (n - 1)] × [(σ² - ΣSD²) / σ²]

Support solution for finding SD²:
Formula: SD² = (Σx² - CF) / (N - 1)
Formula: CF = (Σx)² / N

Solution for Item 1:
a) CF = (14)² / 5            b) SD² = (Σx² - CF) / (N - 1)
   CF = 196 / 5                 SD² = (48 - 39.2) / (5 - 1)
   CF = 39.2                    SD² = 8.8 / 4
                                SD² = 2.2

Substitute:
Cronbach’s α = [5 / (5 - 1)] × [(5.7 - 5.2) / 5.7]
Cronbach’s α = (5 / 4) × (0.5 / 5.7)
Cronbach’s α = 2.5 / 22.8 = 0.109649 or 0.11

The internal consistency of the responses to the hygiene checklist is 0.11,
indicating negligible internal consistency.
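The Cronbach's alpha computation above can also be sketched in Python. The function name cronbach_alpha is ours; the data are the five students' item scores from the illustration:

```python
# Sketch of Cronbach's alpha for the five-item hygiene checklist.
# alpha = [n/(n-1)] * [1 - (sum of item variances) / (variance of totals)]

def cronbach_alpha(scores):
    """scores: one list of item scores per student (rows = students)."""
    n_items = len(scores[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item's column of scores (SD² in the lesson).
    item_variances = [sample_variance([row[i] for row in scores])
                      for i in range(n_items)]
    # Variance of each student's total score (σ² in the lesson).
    total_variance = sample_variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

hygiene = [
    [5, 5, 4, 4, 1],  # Student A
    [3, 4, 3, 3, 2],  # Student B
    [2, 5, 3, 3, 3],  # Student C
    [1, 4, 2, 3, 3],  # Student D
    [3, 3, 4, 4, 4],  # Student E
]

print(round(cronbach_alpha(hygiene), 2))  # 0.11
```

The result, 0.11, matches the manual solution and confirms the negligible internal consistency of the checklist.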

The consistency of ratings can also be obtained using a coefficient of concordance. The
Kendall’s ω coefficient of concordance is used to test the agreement among raters.

Below is a performance task demonstrated by five students and rated by three raters. The
rubric used a scale of 1 to 4, wherein 4 is the highest and 1 is the lowest.

Five              Rater 1   Rater 2   Rater 3   Sum of      D = R - x̄      D²
Demonstrations                                  Ratings
A                    4         4         3        11           2.6         6.76
B                    3         2         3         8          -0.4         0.16
C                    3         4         4        11           2.6         6.76
D                    3         3         2         8          -0.4         0.16
E                    1         1         2         4          -4.4        19.36
Total                                             42                  ΣD² = 33.2
                                                (x̄ = 8.4)

Formula: Kendall’s ω = 12ΣD² / [m²N(N² − 1)]

Where: ω = Kendall’s coefficient of concordance
m = number of raters
N = number of observations
ΣD² = summation of the squared differences
x̄ = mean of the sums of ratings

Substitute: Kendall’s ω = 12(33.2) / [3²(5)(5² − 1)]
Kendall’s ω = 398.4 / [9(5)(24)]
Kendall’s ω = 398.4 / 1080
Kendall’s ω = 0.3688888 or 0.37

A Kendall’s ω coefficient value of 0.37 indicates the degree of agreement of the three raters across
the five demonstrations. There is weak concordance among the three raters because the value is
far from 1.00.
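The Kendall's ω computation can likewise be sketched in a few lines of Python. The function name kendalls_w is ours; the ratings are those of the three raters for the five demonstrations above:

```python
# Sketch of Kendall's coefficient of concordance (ω) for the
# three raters and five demonstrations in the example.
# ω = 12·ΣD² / [m²·N·(N² - 1)]

def kendalls_w(ratings):
    """ratings: one list of rater scores per demonstration (rows = demos)."""
    m = len(ratings[0])                     # number of raters
    n = len(ratings)                        # number of observations
    sums = [sum(row) for row in ratings]    # sum of ratings per demonstration
    mean = sum(sums) / n                    # x-bar = 8.4 in the example
    sum_d2 = sum((s - mean) ** 2 for s in sums)
    return 12 * sum_d2 / (m ** 2 * n * (n ** 2 - 1))

demos = [
    [4, 4, 3],  # Demonstration A
    [3, 2, 3],  # Demonstration B
    [3, 4, 4],  # Demonstration C
    [3, 3, 2],  # Demonstration D
    [1, 1, 2],  # Demonstration E
]

print(round(kendalls_w(demos), 2))  # 0.37
```

The result, 0.37, matches the manual solution: the raters agree only weakly.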

Validity
A measure is valid when it measures what it is supposed to measure. If a quarterly examination is
valid, then the contents should directly measure the objectives of the curriculum. If a scale that measures
personality is composed of five factors, then the scores on the five factors should have items that are highly
correlated. If an entrance examination is valid, it should predict students’ grades after the first semester.

Different Ways to Establish Test Validity

There are different ways to establish test validity:



1. Content Validity. When the items represent the domain being measured. The items are
compared with the objectives of the program. The items need to measure directly the objectives
(for achievement) or definition (for scales). A reviewer conducts the checking.

A coordinator in science is checking the science test paper for grade 4. She asked the grade 4
science teacher to submit the table of specifications containing the objectives of the lesson and the
corresponding items. The coordinator checked whether each item is aligned with the objectives.

2. Face Validity. When the test is presented well, free of errors and administered well. The test
items and layout are reviewed and tried out on a small group of respondents. A manual for
administration can be made as a guide for the test administrator.

The assistant principal browsed the test paper made by the math teacher. She checked if the
contents of the items are about mathematics. She examined if the instructions are clear. She browsed
through the items to see if the grammar is correct and if the vocabulary is within the students’ level of
understanding.

3. Predictive Validity. A measure should predict a future criterion. An example is an entrance exam
predicting the grades of the students after the first semester. A correlation coefficient is obtained
where the x-variable is used as the predictor and the y-variable as the criterion.

The school admissions office developed an entrance examination. The officials wanted to
determine if the results of the entrance examination are accurate in identifying good students. They
took the grades of the students accepted for the first quarter. They correlated the entrance exam
results and the first quarter grades. They found significant and positive correlations between the
entrance examination scores and grades. The entrance examination results predicted the grades
of students after the first quarter. Thus, there was predictive validity.

4. Construct Validity. The components or factors of the test should contain items that are strongly
correlated. The Pearson r can be used to correlate the items for each factor. However, there is a
technique called factor analysis to determine which items are highly correlated to form a factor.

A grade 10 teacher made a science test composed of four domains: matter, living things,
force, and earth and space. There are 10 items under each domain. The teacher wanted to
determine if the 10 items made under each domain really belonged to that domain. The teacher
consulted an expert in test measurement. They conducted a procedure called factor analysis.
Factor analysis is a statistical procedure done to determine if the items written will load under the
domain to which they belong.
5. Concurrent Validity. When two or more measures are present for each examinee that measure
the same characteristic. The scores on the measures should be correlated.

A school guidance counsellor administered a math achievement test to grade 6 students. She also
has a copy of the students’ grades in math. She wanted to verify if the math grades of the students
are measuring the same competencies as the math achievement test. The school counsellor
correlated the math achievement scores and math grades to determine if they are measuring the
same competencies.

6. Convergent Validity. When the components or factors of a test are hypothesized to have a
positive correlation. Correlation is done for the factors of the test.

A math teacher developed a test to be administered at the end of the school year, which measures
number sense, patterns and algebra, measurement, geometry, and statistics. After administering
the test, the scores were separated for each area, and these five domains were intercorrelated
using Pearson r. The positive correlation between number sense and patterns and algebra
indicates that, when number sense scores increase, the patterns and algebra scores also increase.
This suggests that students’ learning of number sense scaffolds their patterns and algebra competencies.


7. Divergent Validity. When the components or factors of a test are hypothesized to have a
negative correlation. An example is to correlate the scores in a test on intrinsic and extrinsic
motivation.

An English teacher taught metacognitive awareness strategy to comprehend a paragraph for grade
11 students. She wanted to determine if the performance of her students in reading comprehension
would reflect well in the reading comprehension test. She administered the same reading
comprehension test to another class which was not taught the metacognitive awareness strategy.
She compared the results using a t-test for independent samples and found that the class that was
taught metacognitive awareness strategy performed significantly better than the other group. The
test has divergent validity.

How to Determine if an Item is Easy or Difficult (Item Analysis)

An item is difficult if the majority of students are unable to provide the correct answer. An item is easy if
the majority of students are able to answer it correctly. An item can discriminate if the examinees who score
high in the test can answer more items correctly than the examinees who got low scores.

The Item Analysis Procedure is as follows:

Step 1. Arrange the test papers from highest to lowest score.

Step 2. Select 27% of the papers from the lower group and 27% from the upper group.
- For smaller classes, such as a group of only 20 students, you may just divide it in half, with 10
test papers (students) belonging to the lower group and 10 test papers (students) belonging to
the upper group.
- In the example (40 high school students), 27% would be 10.8 or 11. You are going to get the
bottom 11 test papers (lower group) and the upper 11 test papers (upper group) and set aside the
middle 18 test papers.

Step 3. Tabulate the number of students in both the upper and lower groups who selected each
alternative.

Example: A tabulation of the number of students who selected each alternative for the first five items of
the given test is shown in Table 6.1.
Table 6.1. Sample Tabulation of Students’ Responses

Item No.   Group (upper           Alternatives           Total   No. of Students who
           and lower 27%)        a    b    c    d                got the item right
1          Upper                 0    0    1   10         11            10
           Lower                 1    0    1    9         11             9
2          Upper                 8    1    1    1         11             8
           Lower                 4    2    2    3         11             4
3          Upper                 8    1    2    0         11             8
           Lower                 5    2    3    1         11             5
4          Upper                 1    0    0   10         11            10
           Lower                 0    1    0   10         11            10
5          Upper                 3    2    1    5         11             5
           Lower                 5    4    2    0         11             0

A. Determining the Difficulty Index

Compute the difficulty index of each item using the formula below:

Item Difficulty = R / T

Where:
R = Number of students who got the item right from both groups.
T = Total number of students from both groups.

Example: Compute for the difficulty index of the first five test items given earlier.

Table 6.2. Difficulty Index of the Sample Test Items

Item No.   No. of Students who Got the       Difficulty   Verbal             Decision
           Item Correct (From both Groups)   Index        Interpretation
1          19                                0.86         Very Easy          Reject/Revise
2          12                                0.55         Ideal Difficulty   Retain
3          13                                0.59         Ideal Difficulty   Retain
4          20                                0.91         Very Easy          Reject/Revise
5          5                                 0.23         Difficult          Retain

Formula: Item Difficulty = R / T
Solutions:

Item No. 1. Item No. 2. Item No. 3. Item No. 4. Item No. 5.
D = 19 / 22 D = 12 / 22 D = 13 / 22 D = 20 / 22 D = 5 / 22
D = 0.86 D = 0.55 D = 0.59 D = 0.91 D = 0.23

Guide in Interpreting the Computed Difficulty Index

1.00 - 0.81 - Very Easy
0.80 - 0.61 - Easy
0.60 - 0.41 - Ideal Difficulty
0.40 - 0.21 - Difficult
0.20 - 0.00 - Very Difficult
B. Determining the Discrimination Index
Compute the Discrimination Index of each item using the formula below:

Discrimination Index = (RU − RL) / (½T)

Where:
RU = Number of students in the upper group who answered the item correctly.
RL = Number of students in the lower group who answered the item correctly.
½T = One half of the total number of students included in the analysis, which is also
equal to the number of students in one of the two groups (lower and upper
group).

Example: Compute for the discrimination index of the first five test items given earlier.


Table 6.3. Discrimination Index of the Sample Test Items

Item No.   Upper   Lower   Difficulty   Discrimination   Verbal Interpretation    Decision
           Group   Group   Index        Index            (Discriminating Index)
1          10       9      0.86         0.09             Poor                     Reject/Revise
2           8       4      0.55         0.36             Good                     Retain
3           8       5      0.59         0.27             Moderate                 Retain
4          10      10      0.91         0                Poor                     Reject/Revise
5           5       0      0.23         0.45             High                     Retain

Formula: Discrimination Index = (RU − RL) / (½T)
Solutions:

Item No. 1. Item No. 2. Item No. 3.


Disc = (10 - 9) / 11 Disc = (8 - 4) / 11 Disc = (8 - 5) / 11
Disc = 1 / 11 Disc = 4 / 11 Disc = 3 / 11
Disc = 0.09 Disc = 0.36 Disc = 0.27

Item No. 4. Item No. 5.


Disc = (10 - 10) / 11 Disc = (5 - 0) / 11
Disc = 0 / 11 Disc = 5 / 11
Disc = 0 Disc = 0.45

Guide in Interpreting the Computed Discrimination Index

0.20 and below - Poor Discriminating Index
0.21 - 0.30 - Moderate Discriminating Index
0.31 - 0.40 - Good Discriminating Index
0.41 - 1.00 - High Discriminating Index
In extreme cases, a negative value for the discriminating index might occur. This would mean that
there are more students in the lower group who got the item correct compared to the upper group. This
could mean that the item is questionable and there might be a high degree of ambiguity in the test item.
Remember, however, that these are assumptions or guesses as to the reasons why it occurred. The data
from item analysis tell us only which specific items are poorly functioning; they do not tell us the reasons
or causes of the poor performance.
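The difficulty and discrimination computations for the five sample items can be sketched in Python. The function names item_difficulty and discrimination_index are ours; the counts come from Tables 6.2 and 6.3:

```python
# Sketch of the item-analysis formulas applied to the five sample items:
# difficulty = R / T, discrimination = (RU - RL) / (T / 2).

def item_difficulty(right_upper, right_lower, total):
    """R / T: proportion of students from both groups who got the item right."""
    return (right_upper + right_lower) / total

def discrimination_index(right_upper, right_lower, total):
    """(RU - RL) / (1/2 T)."""
    return (right_upper - right_lower) / (total / 2)

# (upper-group correct, lower-group correct) per item; 22 students in all.
items = [(10, 9), (8, 4), (8, 5), (10, 10), (5, 0)]

for number, (ru, rl) in enumerate(items, start=1):
    diff = item_difficulty(ru, rl, 22)
    disc = discrimination_index(ru, rl, 22)
    print(f"Item {number}: difficulty {diff:.2f}, discrimination {disc:.2f}")
```

Running this reproduces the values in Tables 6.2 and 6.3: Item 1 (0.86, 0.09) is very easy and discriminates poorly, while Item 5 (0.23, 0.45) is difficult but highly discriminating.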

LESSON 7 :

ORGANIZATION OF TEST DATA USING TABLES AND GRAPHS

THINK ABOUT THESE EXPECTATIONS:

1. Organize data using tables and graphs.


2. Interpret frequency distribution of test data.

Opening

The appropriate statistical tools and procedures to apply for the results of testing are as follows:
For Traditional assessment, the common statistical tools to assess the scores are measures of central
tendency, point measures, and measures of variability.
For Authentic assessment, particularly performance tests, the same statistical tools (measures of central
tendency, point measures, and measures of variability) are still applied to assess the scores.
For Rubric assessment, weighted arithmetic mean is used.
For Investigatory projects, usually mean, t-Test (bivariate experimental design), z-Test (bivariate
descriptive design), F-test or ANOVA (analysis of variance), and many others are employed.
The scores collected from assessments are arranged in a methodical order by grouping them into
classes in the form of a frequency distribution. This Lesson 7 of Module 3 presents frequency distributions,
tallying of scores, and graphical representations like the bar graph, line graph, pictograph, and circle graph.

Frequency Distribution

Frequency distribution is applicable when the number of cases (N) is 30 or more. The scores in Table 7.1
are the results of 50 teacher education students in a 110-item test in Assessment of Learning 2 in a certain State
University in Metro Manila.

Table 7.1. Scores of 50 Teacher Education Students in a 110-item Test in Assessment


of Learning 2 in a certain State University in Metro Manila (Artificial Data)

50 97 96 95 48 55 58 59 51 53

85 80 83 77 70 60 62 63 64 65

90 91 92 93 90 83 82 66 67 68

98 70 71 72 73 74 75 76 77 69

Generally speaking, a frequency distribution is any arrangement of data that shows the frequency of
occurrence of different values of a variable, or the frequency of occurrence of values falling within arbitrarily
defined ranges of the variable known as class intervals.

In arranging the scores in the form of a frequency distribution, the steps are as follows:
Step 1. Find the absolute range. The range is obtained by subtracting the lowest score (LS) from the
highest score (HS).
R = HS - LS; R = 98 - 48; R = 50
Step 2. Find the class interval. In finding the class interval, divide the range by 10 and by 20 so that the
number of class limits is not less than 10 and not more than 20, provided that the classes cover the total
number of scores.

Step 2 can be modified in finding the class interval by using Sturges’ formula to obtain a
common result as follows:

Formula: c = R / k              (2.1)
Formula: k = 1 + 3.32 log N     (2.2)

Where: c = class interval
R = range
k = definite divisor

Computation:
a) Solve for k:

k = 1 + 3.32 (log 50)    (use the log function on your cellphone’s calculator)
k = 1 + 3.32 (1.69897)
k = 1 + 5.64058
k = 6.64

b) Solve for c:

c = 50 / 6.64; c = 7.53 or c = 8 (rounded off)
6.64

Step 3. Set up the classes. Look for a multiple of c that is less than or equal to the lowest
score.
8 x 6 = 48

Step 4. Choose the starting lower class limit. The product (48) serves as the lower limit of the first
class. Add c - 1 (that is, 7) to the lower limit to get the upper limit.

48 - 55
lower limit: 48; upper limit: 55

Step 5. List down the class limits or class intervals and tally the scores for each class interval. The
procedure starts from the lowest class interval, listed in a vertical column going upward.
Table 7.2. Frequency Distribution of Scores in Assessment of Learning 2
Test Taken by 50 Teachers Education Students in a
State University in Metro Manila (Artificial Data)

Class Interval Tally Frequency (f)

96 - 103 IIII 4
88 - 95 IIII – I 6
80 - 87 IIII - III 8
72 - 79 IIII –IIII - II 12
64 - 71 IIII - IIII 10
56 - 63 IIII 5
48 - 55 IIII 5

Total N = 50

The tally must be carefully checked so that the sum of the frequencies of the classes equals the number of cases.
If an unequal tally occurs, tallying must be repeated and rechecked to arrive at an exact tally and frequency. At
the bottom of column 3, the symbol N or Σf is written, which means the number of cases (N) or the “sum of”
frequencies (Σf) equals 50.
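Steps 1 to 4 above can be sketched as a short Python program. This is an illustration only (the function name class_intervals is ours), using HS = 98, LS = 48, and N = 50 from the example:

```python
import math

# Sketch of Steps 1-4: absolute range, Sturges' formula, class interval
# size, and the class limits, using HS = 98, LS = 48, N = 50.

def class_intervals(highest, lowest, n_cases):
    r = highest - lowest                  # Step 1: absolute range
    k = 1 + 3.32 * math.log10(n_cases)    # Sturges' formula for k
    c = round(r / k)                      # Step 2: class interval size
    lower = (lowest // c) * c             # Step 3: multiple of c <= lowest score
    intervals = []
    while lower <= highest:               # Step 4: successive classes of width c
        intervals.append((lower, lower + c - 1))
        lower += c
    return c, intervals

c, intervals = class_intervals(98, 48, 50)
print(c)              # 8
print(intervals[0])   # (48, 55)
print(intervals[-1])  # (96, 103)
```

The program reproduces the manual result: c = 8, with classes running from 48-55 up to 96-103, exactly as in Table 7.2.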


Present Test Data Graphically

There is a saying, “A picture is worth a thousand words.” In a similar manner, “a graph can be worth a
hundred or a thousand numbers.” The use of tables may not be enough to give a clear picture of the
properties of a group of test scores. If numbers presented in tables are transformed into visual models, then
the reader becomes more interested in reading the material.

There are many types of graphs, but the most common methods of graphing a frequency distribution
are the following:

1. Histogram. A histogram is a type of graph appropriate for quantitative data such as test scores.
This graph consists of columns: each has a base that represents one class interval, and its height
represents the number of observations, or simply the frequency, in that class interval. Statistical
software is available to help construct histograms and other forms of graphs. Look
at the graph in Figure 7.1 below.

[Histogram omitted: frequency (y-axis, 0 to 25) against test scores (x-axis, 20.00 to 80.00)]
Figure 7.1. Histogram of Test Scores of College Students

2. Frequency Polygon. This is also used for quantitative data, and it is one of the most commonly
used methods in presenting test scores. It is the line-graph counterpart of the histogram: instead of
bars, it uses lines, which makes it possible to compare sets of test data on the same axes.
Figure 7.2 illustrates a frequency polygon.
[Frequency polygon omitted: frequency (y-axis, 8 to 14) against scores (x-axis, 90 to 150)]
Figure 7.2. Frequency Polygon in Reading Comprehension Test


You can construct a frequency polygon manually using the histogram in Figure 7.1 by following these
simple steps.
Step 1. Locate the midpoint on the top of each bar. Bear in mind that the height of each bar represents
the frequency in each class interval, and the width of the bar is the class interval. As such, the
point in the middle of each bar is actually the midpoint of that class interval. In the histogram in
Figure 7.1, there are two spaces without bars. In such a case, the midpoint falls on the line.
Step 2. Draw a line to connect all the midpoints in consecutive order.
Step 3. The line graph is an estimate of the frequency polygon of the test scores.

Following the above steps, you can draw a frequency polygon using the histogram presented earlier in
Figure 7.1.

[Frequency polygon omitted: same axes as Figure 7.1]
Figure 7.3. Frequency Polygon of Test Scores of College Students


3. Bar Graph. This graph is often used to present frequencies in categories of a qualitative variable.
It looks similar to a histogram and is constructed in the same manner, but spaces are placed in between
consecutive bars. The columns represent the categories, and the height of each bar, as in a
histogram, represents the frequency. Figure 7.4 is shown below.

[Bar graph omitted: frequency (y-axis, 0 to 25) against categories (x-axis)]
Figure 7.4. Bar Graph of Test Scores of College Students



4. Pie Graph. One commonly used method to represent categorical data is the use of a circle graph.
You have learned in basic mathematics that there are 360° in a full circle. As such, the categories
can be represented by slices of the circle that appear like a pie; thus, the name pie graph. The
size of each slice is determined by the percentage of students who belong in each category.

Example. In a class of 100 students, results were categorized according to different levels, as
shown below.

Group           No. of Students   Percentage (%)   Percent Equivalent in the Circle
Above Average         10               10 %              0.10 x 360 = 36°
Average               40               40 %              0.40 x 360 = 144°
Below Average         30               30 %              0.30 x 360 = 108°
Poor                  20               20 %              0.20 x 360 = 72°
Total                100              100 %              360°
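The conversion from percentages to slice angles can be sketched in Python. This is an illustration of the arithmetic in the table (the variable names are ours):

```python
# Sketch of converting category percentages into pie-graph slice angles.
# A full circle has 360 degrees, so each slice gets (percent / 100) x 360.

groups = {"Above Average": 10, "Average": 40, "Below Average": 30, "Poor": 20}

degrees = {name: percent * 360 / 100 for name, percent in groups.items()}
for name, angle in degrees.items():
    print(f"{name}: {angle:.0f} degrees")

print(sum(degrees.values()))  # 360.0
```

The four angles (36°, 144°, 108°, and 72°) always sum to 360°, which is a quick check that no category was missed.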

[Pie graph omitted: Above Average 10%, Average 40%, Below Average 30%, Poor 20%]
Figure 7.5. Students’ Results According to Different Groups

Skewness

Examine the graphs below.

[Figure omitted: a symmetrical bell-shaped curve]
Figure 7.6. Symmetrical Distribution of Test Scores

[Figures omitted: two skewed curves, with tails pointing left and right, respectively]
Figure 7.7. Negatively Skewed Distribution        Figure 7.8. Positively Skewed Distribution


Figure 7.6 is labeled as a normal distribution. Note that half the area of the curve is a mirror reflection
of the other half. It is a symmetrical distribution, which is also referred to as a bell-shaped distribution. The
higher frequencies are concentrated in the middle of the distribution. A number of experiments have shown
that IQ scores, height, and weight of human beings follow a normal distribution.

The graphs in Figure 7.7 and Figure 7.8 are asymmetrical in shape. The degree of asymmetry of a graph
is its skewness. A basic principle of a coordinate system tells you that, as you move toward the right of the
x-axis, the numerical value increases. Likewise, as you move up the y-axis, the scale value becomes higher.
Thus, in a negatively skewed distribution, there are more who get higher scores, and the tail, indicating lower
frequencies of the distribution, points to the left or to the lower scores. In a positively skewed distribution, lower
scores are clustered on the left side. This means that there are more who get lower scores, and the tail
indicates that the lower frequencies are on the right or toward the higher scores.

Kurtosis

Another way of differentiating frequency distributions is shown below. Consider now the graphs of
three frequency distributions in Figure 7.9.

[Figure 7.9 omitted: three frequency distributions labeled x, y, and z plotted against test scores, differing in peakedness]

What is common among the three distributions?


What difference can you observe among the three distributions of test scores?

It is the flatness of the distribution, which is also a consequence of how high or peaked the
distribution is. This property is referred to as kurtosis.

x is the flattest distribution. It has a platykurtic (platy, meaning broad or flat) distribution. y is the
normal distribution, and it is a mesokurtic (meso, meaning intermediate) distribution. z is the steepest or
slimmest and is called a leptokurtic (lepto, meaning narrow) distribution.

What curve has more extreme scores than the normal distribution?

What curve has more scores that are far from the central value (or average) than does the normal
distribution?

For the meantime, the characteristics are simply described visually. The next lesson will connect these
visual characteristics to important statistical measures.
