ITEM BIAS ELIMINATION ANALYSIS IN MATHEMATICS 7:

A MANTEL-HAENSZEL APPROACH

A Dissertation Presented to the Faculty


of the School of Advanced Studies
Pangasinan State University
Urdaneta City

In Partial Fulfillment of the Requirements for the


Degree Doctor of Education
Major in Mathematics

FERDINAND S. AZUELA

JUNE 2023
APPROVAL SHEET

In partial fulfillment of the requirements for the degree of Doctor of Education

major in Mathematics, this dissertation entitled “ITEM BIAS ELIMINATION

ANALYSIS OF ACHIEVEMENT TEST IN MATHEMATICS 7: A MANTEL-

HAENSZEL APPROACH”, prepared and submitted by FERDINAND S. AZUELA, is

recommended for acceptance and approval.

MELODY C. DE VERA, EdD ARLENE N. MENDOZA, PhD


Critic Reader Adviser

Approved by the Committee on Oral Examination

MICHAEL HOWARD D. MORADA, PhD


Chair

CHRISTOPHER J. COCAL, PhD RODELIO M. GARIN, PhD


Member Member

JOSEPH B. CAMPIT, EdD ANA PERLA B. GUZMAN, PhD


Member Member

Accepted and approved in partial fulfillment of the requirements for the degree of

Doctor of Education, Major in Mathematics.

JOEL T. CAYABYAB, EdD PAULO V. CENAS, EdD


Deputy Director for Academics, Executive Director
Research and Extension
ACKNOWLEDGEMENT

The researcher could not have finished this academic endeavor without the

commendable assistance and guidance extended by God through His instruments. Thus, he

sincerely expresses his profound gratitude and indebtedness to:

Dr. Paulo V. Cenas, the Executive Director, for instilling in him the value of

character and service;

Dr. Arlene N. Mendoza, the adviser, for being the key person in inculcating the

learning attitude through her constant supervision and guidance as well as her valuable

comments and suggestions that benefited him much in the completion and success of this

study;

Dr. Melody C. De Vera, the critic reader, for being the backbone of this scholarly

journey as she generously shared her time, wisdom, and expertise in enhancing the

technical aspect of this manuscript;

Dr. Michael Howard D. Morada, the chairperson of the panel, for his brilliant ideas

for the improvement of this study and for his words of encouragement; Dr. Rodelio M.

Garin, Dr. Christopher J. Cocal, Dr. Joseph B. Campit, Dr. Ana Perla B. Guzman, the

distinguished members of the committee for oral examination and Mr. Bobby F. Roaring,

the genius statistician, for imparting their unparalleled expertise, wisdom, untiring

assistance, and immeasurable effort to help the researcher improve and finish his study;

The School Heads of public and private junior high schools in the Municipality of Victoria, for allowing the researcher to float the questionnaires and gather the needed data that served as the building blocks of this study;


Erlinda S. Azuela, Dionisio G. Azuela and Elena B. Duran, his parents and mother-

in-law, for their love, care, and constant prayers for his academic life and for being the

paramount reason for this academic achievement; and

No words of thanks can sum up his gratitude and indebtedness to his beloved and supportive wife, Dr. Angela Francesca D. Azuela, who was always by his side in the times he needed her most and who helped him throughout the accomplishment of this research.

All others, who in one way or another extended a helping hand for the completion

of this research.

Above all, to the Almighty God for all His wonderful blessings, kindness, and

graces, to Him be all the honor and the glory!


DEDICATION

For his Parents…

For his Wife …

For the Learners…

And above all…

For our Almighty God, for the gift of wisdom and for all the blessings

He bestowed on his life.

May this bring back to Him all the glory and honors.

FSA
TABLE OF CONTENTS

TITLE PAGE i

APPROVAL SHEET ii

ACKNOWLEDGEMENT iii

DEDICATION v

TABLE OF CONTENTS ix

LIST OF TABLES xii

LIST OF FIGURES xiii

ABSTRACT x

CHAPTER

1 THE PROBLEM

Background of the study 1

Statement of the Problem 5

Significance of the Study 6

Scope and Delimitation 7

Definition of Terms 7

2 REVIEW OF RELATED LITERATURE AND STUDIES

Related Literature 11

Related Studies 29

Conceptual Framework 34

3 METHODS OF THE STUDY AND SOURCES OF DATA


Research Design 36

Respondents of the Study 37

Data Gathering Procedure 38

Research Instrument 40

Statistical Treatment of Data 45

4 PRESENTATION, ANALYSIS, AND INTERPRETATION OF DATA

Students' Profile 49

Students' Learning Styles 49

School Type 51

Academic Performance in Mathematics 53

Content, Construct, and Concurrent Validity, and Internal 55


Consistency Reliability of Mathematics Test

Item Bias Elimination Using Mantel-Haenszel Chi-Square Analysis 62
Based on Students' Profile

Content, Construct, and Concurrent Validity, and Internal 73


Consistency Reliability of the Mathematics Test after Removing
Biased Items

Comparison of Content, Construct, and Concurrent Validity of the 79


Original and Revised Test Versions

Comparison of Internal Reliability of the Original and Revised 85


Test Versions

5 SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

Summary 87

Salient Findings 88

Conclusions 91

Recommendations 92
BIBLIOGRAPHY 93

APPENDICES

A. Revised Test Version of Mathematics Achievement Test 96

(2nd Quarter)

B. Letter to the Superintendent 101

C. Letter to the School Heads 102

D. Original Test Version of Mathematics Achievement Test 103

E. Table of Specification 112

F. Questionnaire for Learning Styles 114

G. Letter to Experts for Content Validity 116

H. Questionnaire for Content Validation 118

I. Results on Students’ Learning Styles 132

J. Profile of the Students 133

K. Content Validity Results 135

L. Item Difficulty Index Results 137

M. Item Discrimination Index Results 139

N. Construct Validity using PCA of the Original Test Version 141

O. Item Difficulty Index after the Detection of Item Bias 142

P. Item Discrimination Index after the Detection of Item Bias 144

Q. Construct Validity using PCA after the Detection and Elimination

of the Bias Items 146

R. Construct Validity using PCA of the Revised Test Version 147

CURRICULUM VITAE 148


LIST OF TABLES

Table Page

1 Corresponding Number of Experts with Acceptable CVI Values 41
2 Difficulty and Discrimination Index 42
3 KMO Value of Sampling Adequacy 43
4 Cronbach's Coefficient of Reliability 44
5 Academic Performance 45
6 Guilford's Magnitude of Significant Correlation 46
7 Detection Threshold and Effect Size of Mantel-Haenszel Chi-Square Statistics DIF Detection Method 48
8 Profile of Grade 7 Students 49
9 Content Validity of the Grade 7 Mathematics Achievement Test 55
10 Summary of Difficulty Index of Grade 7 Mathematics Achievement Test 56
11 Summary of Discrimination Index of Grade 7 Mathematics Achievement Test 57
12 KMO and Bartlett's Test of Sphericity of Original Test Version 59
13 Concurrent Validity of Grade 7 Mathematics Achievement Test 61
14 Internal Consistency Reliability of the Mathematics 7 Achievement Test 61
15 Detected Biased Items Based on School Type 63
16 Detected Biased Items Based on Learning Styles (Auditory vs Non-Auditory) 66
17 Detected Biased Items Based on Learning Styles (Visual vs Non-Visual) 67
18 Detected Biased Items Based on Learning Styles (Tactile vs Non-Tactile) 70
19 Summary of Detected Biased Items Using Mantel-Haenszel Chi-Square and DIF Analysis 71
20 Content Validity of the Grade 7 Mathematics Achievement Test after the Detection of Biased Items 74
21 Summary of Difficulty Index of Grade 7 Mathematics Achievement Test after the Detection of Biased Items 74
22 Summary of Discrimination Index of Grade 7 Mathematics Achievement Test after the Detection of Biased Items 74
23 Difficulty and Discrimination Index of Grade 7 Mathematics Achievement Test after the Detection of Biased Items 75
24 KMO and Bartlett's Test of Sphericity after the Detection of Biased Items 77
25 Concurrent Validity of Grade 7 Mathematics Achievement Test after the Detection of Biased Items 78
26 Internal Consistency Reliability of Grade 7 Mathematics Achievement Test after the Detection of Biased Items 79
27 Comparison of Content Validity of the Grade 7 Mathematics Achievement Test Original and Revised Test Versions 79
28 Comparison of Difficulty Index of the Grade 7 Mathematics Achievement Test Original and Revised Test Versions 80
29 Comparison of Discrimination Index of the Grade 7 Mathematics Achievement Test Original and Revised Test Versions 80
30 Itemized Difficulty and Discrimination Index of the Revised Test Version 81
31 KMO and Bartlett's Sphericity Test of the Revised Test Version 83
32 Comparison of the Concurrent Validity of the Grade 7 Mathematics Achievement Test Original and Revised Test Versions 84
33 Internal Consistency Reliability of the Original and Revised Test Versions 85
34 Table of Specification of the Revised Test Version of the Achievement Test in Mathematics 86
LIST OF FIGURES

Figure Page

1 Research Paradigm 35

2 Scree Plot of the Original Test Version 60

3 Scree Plot of the Test after the Detection of Bias Items 78

4 Scree Plot of the Revised Test Version 83


ABSTRACT

Dissertation Title: ITEM BIAS ELIMINATION


ANALYSIS IN MATHEMATICS 7: A
MANTEL-HAENSZEL APPROACH

Name of Researcher: FERDINAND S. AZUELA

Adviser: ARLENE N. MENDOZA, PhD

Institution: Pangasinan State University


School of Advanced Studies
Urdaneta City

Degree/Specialization: DOCTOR OF EDUCATION


Major in Mathematics

Date Finished: June 2023

Keywords: Item Bias


Mantel-Haenszel Approach
Test Standardization

This study aimed to analyze and detect biased items in the Grade 7 Mathematics Achievement Test using the Mantel-Haenszel method, based on the students' type of school

and learning styles. It was also designed to strengthen test validity as a basis for test

standardization.

A total of 207 Grade 7 students were randomly selected from all public and private schools in the municipality of Victoria. A 60-item test was constructed and subjected to validation and reliability testing. Experts' judgment and item analysis were used to establish content validity; concurrent validity was assessed by relating scores on the constructed test to the students' second-quarter performance using the Pearson product-moment correlation and linear regression; and Principal Component Analysis (PCA) was used to describe construct validity. A descriptive and developmental research design was used, and the biased items were analyzed using the Mantel-Haenszel approach.

Eighteen test questions were detected as significantly biased and eliminated, their Mantel-Haenszel chi-square statistics exceeding the critical value (3.8415) at the 0.05 level of significance. After the process of validation and removal of the significantly biased questions, the revised test version consisted of thirty-one questions rated essential, with a much higher percentage of questions at the optimum level of the difficulty index and with acceptable discrimination indices. An increase in concurrent validity was also evident, indicating less homogeneous test scores. This indicates that the revised test version was more valid than the original version.

The findings indicate that Mantel-Haenszel chi-square analysis can be used to detect items exhibiting a large amount of DIF, and that its use may strengthen the validity and reliability of a test questionnaire.
Chapter 1

THE PROBLEM

Background of the Study

The need to overcome the difficulty of learning Mathematics is obvious in the

Trends in International Mathematics and Science Study (TIMSS) 2019 Assessment

Framework, which found that Filipino pupils lagged behind other nations in the

international assessment for Grade 4 Mathematics. The country received a score of 297,

which is much lower than the TIMSS scale centerpoint of 500 and the lowest among

the 58 countries. The said assessment was topped by the country’s East Asian neighbors,

such as Singapore, Chinese Taipei, Korea, Japan, and Hong Kong. Meanwhile, only 19

percent of the Grade 4 Filipino participants reached the Low International Benchmark, indicating that they had some fundamental arithmetic understanding, whereas 81 percent did not even reach this level.

Before the release of TIMSS results, the Programme for International Student

Assessment 2018 conducted a triennial survey of 15-year-olds worldwide to determine the

extent to which they acquired essential information and abilities. The evaluation focused

on the core school disciplines of Reading, Mathematics, and Science, as well as the students' competency level in an innovative domain called global competence. The Philippines obtained a score of 353 points in mathematics, much lower than the average score of 489. This reveals that 15-year-old Filipinos trailed behind their peers in other countries in terms of

global competency.
Similarly, the low performance of Filipino learners was also evident in the National

Achievement Test (NAT) of private and public school learners from 2009 to 2014.

Statistics show that the mean percentage score in Mathematics was far from the target of

75% mastery level. The National Achievement Test 2018 results recorded the lowest MPS

in the history of the DepEd standardized exam. More recently, the Philippine Assessment for Learning Loss Solutions (PALLS), administered to Grades 2 to 12 in private schools nationwide during the last quarter of 2022 to measure pandemic-related learning loss, recorded a 47.5% average score in mathematics, much lower than the 60% passing percentage set by DepEd. The assessment consists of 75 multiple-choice items covering the three core subjects of the previous grade level. This indicates that learning loss continued to occur during the pandemic.

These claims based on the results of international and national assessments are credible in the sense that these assessments go through a process of standardization; that is, they consistently measure what they intend to measure. The school-based results from the regional administration of Project All Numerates (PAN) reinforce these claims. The assessment program determines the numeracy level of Grade 1-7 learners using a standardized tool crafted on July 22-24, 2020, and piloted in November 2020, per RM No. 194, s. 2020. Based on the results of the Grade 7 learners of Victoria National High School on the post-test of the PAN administration for school year 2021-2022, 368 (54.68%) of the Grade 7 examinees were non-numerates, while only 62 (9.21%) were numerates. This implies that most learners scored 0-49% on the test. Likewise, 346 (46.95%) of the Grade 7 examinees were non-numerates and only 31 (4.21%) were numerates as they started school year 2022-2023. The results of the program did not reflect the teaching and learning practices carried out during the school year. To make fair and sound decisions based on test results and to select students with ability and interest according to DepEd standards, the ability measured by a test must be evaluated accurately.

This evidence of the low performance of Filipino learners on assessments signifies poor learning outcomes, which may be caused by different factors. Thus, teaching practices must be recalibrated and assessment procedures strengthened to address the issues raised by international and national standardized assessments. Furthermore, this study may provide an innovative strategy for strengthening test quality to address the achievement gap between different types of learners in Mathematics.

Achievement gaps occur when there is a statistically significant difference between the assessment results of two groups of students, with one group outperforming the other. Thus, the results of this study could be used as a basis for improving test validity and strengthening assessments. The effectiveness of teaching is often judged through student assessment results; therefore, tests should be carefully prepared and administered. Tests and testing are also important in helping children develop their capacity to deal with standardized exams. Accordingly, test developers must establish the validity of their instruments to confirm that these are appropriate for the intended students. Instruments should be designed to assess learners' critical thinking while considering individual variability in ability levels, and they should be validated to obtain accurate outcomes for the learners.

With these in mind, this study provides a validated, item-bias-free Mathematics achievement test to help ease the achievement gap. Biased items are those that behave differently for people from two different cultures. Item bias exists if an item proves to be more difficult, or easier, for one group than for the other, assuming that the students' average ability level remains constant. Bias is defined as the presence of a trait in an item that results in differing performance for people of the same ability who come from different ethnic, sex, cultural, or religious groups; if the proportion of correct responses is the same in both groups, the item is considered unbiased (Rudner et al., 1980). Determining the bias of the items in a test is important in increasing test validity and reliability. Kristjansson et al. (2005) pointed out that item bias is an important factor that threatens the validity of measurements and that test and item bias detection methods should be used as much as possible. Test bias distorts results by allowing examinees' characteristics to influence the measurement of the main construct, and item bias is a possible threat to validity (Akcan & Kabasakal, 2019; Clauser & Mazor, 1998). Therefore, research on this matter is important.

In this regard, teachers are likely to use test scores to assess student performance;

therefore, creating bias-free test questions is crucial. This study will be of considerable

assistance to both teachers and students as item developers and users of the instruments,

providing accurate assessments for students and a ready-made assessment tool for teachers.

Since item bias studies in the Philippine setting using different methods to analyze this

kind of bias are insufficient, the study may also provide opportunities, especially in the

educational system, to include this method in validating test items to improve the quality

of teaching through fair assessment procedures.

In conducting the study, the Mantel-Haenszel chi-square technique was utilized to

uncover bias. According to Fontaine (2005), Mantel-Haenszel can be used to compare


groups in which the observed item scores are dichotomous, such as correct-incorrect or yes-no. It is the most extensively used method for detecting item bias. Previous research has also shown that, compared with other procedures, the Mantel-Haenszel approach is effective in detecting item bias (Narayanan, 1995). Moreover, in Skaggs's (1992) research on the consistency of detecting item bias across different test administrations, the Mantel-Haenszel method proved one of the most consistent ways to detect item bias between males and females on a curriculum-based mathematics test. It is theoretically easy to understand, easy to implement, does not require knowledge of Item Response Theory (IRT), and is often used for DIF detection (Holland & Thayer, 1988; Wainer, 2010; Diaz et al., 2021).
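For the reader's reference, the Mantel-Haenszel chi-square statistic referred to in this study is commonly written as follows. This is the standard formulation after Holland and Thayer (1988), supplied here as background rather than quoted from the sources above: examinees are stratified into K total-score levels; at level k, A_k is the number of correct responses in the reference group, n_{Rk} and n_{Fk} are the reference- and focal-group sizes, m_{1k} and m_{0k} are the total numbers of correct and incorrect responses, and T_k is the number of examinees at that level.

\chi^2_{MH} = \frac{\left( \left| \sum_{k} A_k - \sum_{k} E(A_k) \right| - 0.5 \right)^2}{\sum_{k} \operatorname{Var}(A_k)},
\qquad E(A_k) = \frac{n_{Rk}\, m_{1k}}{T_k},
\qquad \operatorname{Var}(A_k) = \frac{n_{Rk}\, n_{Fk}\, m_{1k}\, m_{0k}}{T_k^{2}\,(T_k - 1)}

Under the null hypothesis of no DIF, this statistic follows a chi-square distribution with one degree of freedom, which is the basis for the critical value of 3.84 used later in this study.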

The purpose of this study is to use the Mantel-Haenszel chi-square method to analyze, detect, and remove biased items in the Mathematics achievement test, toward improving mathematical competence among Grade 7 students.

Statement of the Problem

The study aimed to analyze and detect biased items of the Grade 7 Mathematics

Achievement Test using the Mantel-Haenszel method. Specifically, it sought to answer the

following questions:

1. What is the students’ profile in terms of:

1.1 Learning Styles;

1.2 School Type;

1.3 Academic Performance in Mathematics (2nd Quarter)?

2. What is the content, construct and concurrent validity, and internal consistency

reliability of the Mathematics Test?

3. What are the detected biased items based on the students' profile?
4. What is the content, construct and concurrent validity, and internal consistency

reliability of the revised Mathematics Test after the detection and removal of the

biased items?

5. How do the original and the revised test versions compare in terms of:

a. Content validity;

b. Construct validity, and

c. Concurrent validity?

6. How do the original and the revised test versions compare in terms of internal

consistency reliability?

Significance of the Study

The study aimed to analyze and detect biased items in the Grade 7 Mathematics Test using the Mantel-Haenszel chi-square method. This study is significant because it would

benefit the students, teachers, and school personnel/administrators.

Students. Since the results of the study will lead to the production of validated, item-bias-free test questions, it can accurately identify students' difficulties across the different competencies in Mathematics 7. It will also strengthen the critical thinking that students need in answering pen-and-paper assessments. With this, teachers will be able to address students' difficulties specifically and help learners achieve mastery.

Teachers. The study will be of great help to teachers, as assessment developers, in producing validated assessment instruments free from item bias. The resulting instrument will be useful in identifying difficulties among students and in devising interventions in the teaching and learning process. It will serve as a standardized instrument for measuring students' mastery of the lessons.


School administrators and Department of Education personnel. The study will produce validated assessment instruments for Grade 7 Mathematics that can be used by teachers in general. It will also accurately identify students' difficulties in Mathematics, providing a basis for curriculum enhancement.

Researchers. The study will also serve as a basis for future researchers concerned

with providing a validated assessment instrument to increase the level of mastery and

achievement among the students.

Scope and Delimitations

The study aimed to analyze and detect biased items in the Grade 7 Mathematics Achievement Test using the Mantel-Haenszel chi-square method.

Randomly selected students from Grade 7 enrolled in different public and private

schools in Victoria, Tarlac were the respondents of the study.

Teacher-made achievement test items using the second quarter's most essential

learning competencies were administered to the respondents. Content, construct, and

concurrent validity were established. Internal consistency reliability was also tested using the Kuder-Richardson 20 formula. The results of the test were tabulated, and the results for every test item were tallied, organized, and interpreted. Using the Mantel-Haenszel method, biased items were then eliminated.

Definition of Terms

Academic Performance. It refers to the numerical grade of the students in

Mathematics in the second grading period. It is composed of 50% written works and 50% performance tasks in the form of summative tests (DepEd Order No. 031, s. 2020).
Assessment. It is a way of gauging learners' progress. In this study, an achievement test in Grade 7 Mathematics was utilized.

Auditory Learning Style. This refers to the learners who learn through verbal

instructions from themselves or others. Auditory students favor acquiring information

through listening. They interpret meaning through the tone of a sound as well as through

the quickness and accentuation of speech (Mašić et al., 2020; Gilakjani, 2012). It is

recommended that those learners make sure that they can hear well, recite information, and

have conversations for better memorization.

Concurrent Validity. This refers to validity established by associating the respondents' current academic performance with the raw scores they obtained on the test.

Construct Validity. This refers to validity concerning whether the test items measure the intended construct, as examined using Principal Component Analysis.

Content Validity. It refers to the content property or traits of the test items. It will

be done through checking by the experts. Content validity studies pertain to the adequacy

of the test items as a sample from a well-specified content domain and are associated with

judgmental review procedures.

Test Reliability. It refers to the extent to which the test scores are consistent

across one or more sources of inconsistency (Livingston, 2018).

Item Bias. It refers to invalidity or systematic error in how a test item measures a

construct for the members of a particular group (Villas, 2019). It is a statistically significant difference across two or more groups of examinees due to characteristics of the item unrelated to the construct being measured. An item is considered positively or negatively


biased for a group within a population if the average expected item score for that group is

substantially higher or lower than that for the overall population. Item bias describes

irregularities in the instrument at the item level brought on by inadequate translation or

items that are unsuitable in a given context.

Learning Styles. This refers to the styles of the students (i.e., visual, auditory, and tactile) in learning concepts. Learning styles aim to describe individual differences among students by determining their preferred learning methods and adapting instruction to fit those preferences (Antoniuk, 2019). According to Mašić et al. (2020) and Cornett (1983), learning styles are overall patterns that provide direction to learning and teaching.

Mantel-Haenszel Method. It refers to the method used in identifying biased items

on the questionnaire constructed by the teacher. The Mantel-Haenszel statistic (MH) can

be used for comparing two cultural groups when the observed item scores are dichotomous

(correct–incorrect) (Fontaine, 2005).

School Type. It refers to whether the school of the respondents is public or private.

Tactile Learning Style. This refers to the learners who enjoy creating things with their hands and making sense of information through touch. This style suggests learning activities such as writing, highlighting, underlining, labelling, and role-playing to retain information (Mašić et al., 2020; Dunn et al., 2002).

Test Validity. This refers to the concurrent, construct, and content validity of the

achievement test questions in Mathematics 7.

Visual Learning Styles. These refer to the learners who learn by seeing (Mašić et al., 2020) or watching demonstrations. They prefer learning via visual channels. Visual students need the visual stimulation of bulletin boards, videos, and movies, and they may need to write down directions to function well in the classroom.


Chapter 2

REVIEW OF RELATED LITERATURE AND STUDIES

This chapter presents the literature and studies related to this research. These materials provided the researcher with insights, theories, concepts, and ideas that contributed to the conceptualization and formulation of the framework of the study.

Related Literature

Assessment Outcomes of Filipino Learners

The implementation of the K to 12 curriculum in 2012 was the biggest reform in education intended to address the low achievement level of the Philippines in the international numeracy assessment conducted by the Trends in International Mathematics and Science Study (TIMSS) in 2003. The Philippines ranked 34th out of the 38 countries assessed in second-year high school mathematics. In Grade 4 Math, the Philippines ranked

23rd among the participating countries. In 2008, even though only science high schools

participated in the advanced mathematics category, the Philippines ranked at the bottom

(Department of Education, 2010). After the assessment, the country did not participate in

any international assessment, instead relying on the results of the National Achievement

Test (NAT). The overall mean percentage scores on the NAT were low across the years.

Department of Education records showed a declining mathematics achievement level among students in the National Achievement Test (NAT) results from 2009 to 2014 (DepEd, 2014). Statistics show that the mean percentage score (MPS) in mathematics in the NAT was far from its target of 75% mastery level (Austria, 2020). A publication dated September 26, 2019, likewise showed that the Grade 6 NAT scores were at a low mastery level. The 2018
NAT results showed that the national average mean percentage score (MPS) was 37.44,

the lowest in the history of the DepEd standardized exam. In the Grade 6 NAT 2009 overall mastery levels, only 6.82% of the takers from public schools were described as closely approximating mastery, which is nonetheless better than the 0.36% of takers from private schools. A total of 52.16% of the takers from public schools were rated moving towards mastery, compared with only 20.26% of the takers from private schools. On the other hand, a higher percentage of average mastery was tallied among private school takers, at 69.13%, compared with 37.76% of the public school takers. In addition, private school takers tallied the higher proportion of low-mastery students, at 10.25% as against 3.23% (Benito, 2010).

On the other hand, a comparison of the overall achievement levels of second-year high school students in public and private schools was also described. Among public school takers, 0.13% were rated closely approximating mastery, while only 0.01% of private school takers fell under this mastery level. Twelve point thirty-five percent (12.35%) of the public school takers were described as moving towards mastery, compared with 5.04% of the takers in private schools. A total of 71.69% of private school takers were described as average, compared with 67.89% of public school takers. Furthermore, 19.60%, 0.02%, and 0.01% of the public school takers were described as having low mastery, very low mastery, and absolutely no mastery, respectively, while 23.24% and 0.01% of private school takers were described as having low mastery and very low mastery, respectively (Benito, 2010). Surveys of firms and investors showed that the low performance of the country's students and graduates in Mathematics, Science, and English may constrain economic modernization (Valentine, 2019; Sarvi et al., 2015).


With these trends in the achievement level of students in the country, a curriculum reform was pushed through in school year 2012-2013. According to the Department of Education, only the Philippines, Angola, and Djibouti then had a 10-year basic schooling cycle. The K to 12 program aims to bring Philippine education at par with the rest of the world, with 12 years of basic schooling already a global standard. It provides sufficient time for learners to achieve mastery of skills and concepts, develops lifelong learners holistically, and prepares learners for tertiary education, middle-level skills development, employment, and entrepreneurship (Official Gazette).

The salient features of the K to 12 program that contribute to the competency of the learners and to raising the achievement level of the country are: strengthening early childhood education through Universal Kindergarten; making the curriculum relevant to learners through enhancement and contextualization; ensuring integrated and seamless learning through the spiral progression evident in every grade level; building proficiency through language by using mother tongue-based multilingual education in Grades 1 to 3; gearing up for the future by adding two years of high school (senior high school); and nurturing holistically developed Filipinos in line with the Department of Education's vision, mission, and core values.

The mathematics learning standard states that learners should demonstrate appreciation and understanding of key concepts and principles of mathematics as applied, using appropriate technology, in problem-solving, critical thinking, and communicating reasoning; learners should also demonstrate key concepts and principles of mathematics as applied in making connections, representations, and decisions in real life.

Upon completing Grade 10, learners should demonstrate understanding and appreciation
of key skills and concepts involving numbers and number sense through the lessons from

sets and real numbers, measurement using the lesson in the conversion of units and

patterns, and algebra using the lessons on linear equations and inequalities in one and two

variables, linear functions, and a system of linear inequalities and equations in two

variables. Some other lessons in algebra are exponents and radicals, quadratic equations,

inequalities and functions, and lessons from polynomial functions and equations. Learners

should also demonstrate understanding and appreciation of key concepts and skills

involving geometry, such as polygons, the axiomatic structure of geometry, triangle

congruence, inequality and similarity, basic trigonometry, and lessons from statistics and

probability, such as measures of central tendency, variability, and position, as well as in

combinatorics and probability (DepEd K to 12 Curriculum Guide, 2012).

Furthermore, the education reform has not yet satisfied its goals and aims of achieving the learning standards, based on the international assessments conducted by PISA and TIMSS in 2018 and 2019, respectively. According to the OECD PISA 2018 database, the Philippines scored 353 in mathematics, a 136-point difference from the average score of 489. It was shown that 15-year-old girls outperformed boys in mathematics by 12 points, unlike the typical scores across OECD countries, where 15-year-old boys outperformed girls by five points. Only 19% of the students assessed in the Philippines attained Level 2 or higher in mathematics; students at this level can interpret and recognize situations without direct instructions and can represent them mathematically (PISA, 2018). In TIMSS 2019, meanwhile, Grade 4 Filipino students ranked last among the 64 participating countries, with an average scale score of 297. Against the international mathematics achievement centerpoint of 500, Filipino learners were evidently behind by 203 scale-score points. It was also tallied that boys outperformed girls in Grade 4 Mathematics in 27 of the participating countries, while girls outperformed boys in four countries. In the Grade 8 Mathematics results, boys outperformed girls in achievement in six countries, while girls outperformed boys in seven countries (IEA's TIMSS, 2019).

The education reform is not yet vindicated by the results of the international assessments, which still show a low achievement level relative to the other participating countries. Thus, continuous educational reform should be geared toward various innovations and strategies that can sustain students' interests and correspond to their needs. To address the persistently low performance of students, researchers and educators continue to work on determining which curricula and which instructional practices and approaches will bring positive results in a short period. Moreover, instructional practices such as developing a validated and reliable standardized assessment free of bias with respect to students' characteristics are an important matter.

Students’ Learning Styles

Individuals have an innate ability to learn; they naturally find ways to learn as quickly as possible. Learning occurs when one observes a change in learners' behavior resulting from what has been experienced. Some students prefer to acquire information visually, auditorily, or kinesthetically (Smith, 2019). These preferences are crucial in delivering the teaching and learning process. Some studies show that effective learning can be achieved once the delivery of instruction is tailored to students' learning preferences or styles.


In the teaching and learning process, students differ in the way they learn or prefer

to concentrate on, store, and remember new and/or difficult information; this refers to

learning style (Prashnig, 2005). It is not an ability but a preference for the way an individual uses his or her abilities (Sternberg, 1994). There are various learning style

schemes; however, these are often categorized by sensory approaches such as visual, aural,

and kinesthetic approaches (Chick, 2016).

The present study analyzes item bias using the Mantel-Haenszel method and focuses on the students' learning styles, specifically visual, auditory, and tactile, as a variable in detecting biased items. Visual learners are students who learn best through what they see. That is why teachers are advised to use visual aids such as pictures, illustrations, graphs, and films, and why demonstrating the activity is essential for these students to learn. Studies on learning styles noted that the application of visual aids, which generally support learners' processing and retention of information, benefits at least 40% of all students (Mašić et al., 2020; Clarke et al., 2006).

Auditory learners are students who are fond of acquiring information through listening. These students interpret meaning through the tone of a sound as well as through the quickness and accentuation of speech. This learning style suggests that students should hear well, recite information, and have conversations for better memorization (Mašić et al., 2020; Gilakjani, 2012). Tactile learners are students who enjoy creating things with their hands and making sense of information through touch. Writing, highlighting, underlining, labeling, and role-playing help these learners retain information (Mašić et al., 2020).


Learning style is both a student characteristic and an instructional strategy.

Learning style is an indicator of how students like to learn (Keefe, 2011). Banaga (2016)

emphasized that learning style covers individuals' natural or habitual patterns of acquiring

and processing information in any learning situation. Ishak and Awang (2017) describe it

as a motive and strategy that involves the planning and learning system of students to learn

and achieve their aspirations. Moreover, it is also a way in which individuals absorb and

retain new information or skills, regardless of how they are described. The use of learning

styles is required to differentiate instruction. Ideally, this includes all the learning styles so

that students can learn in a way that suits them best for the day.

Assessment in Education

During the global outbreak of a pandemic in 2020, the Department of Education

crafted the Basic Education-Learning Continuity Plan, which seeks to continue the

teaching-learning process that ensures the health, safety, and welfare of the learners,

teachers, and personnel of the department with the use of different learning modalities. It

also specifies the rules, appropriate guidelines, and directives through projects, programs,

and activities. DepEd's Information and Communications Technology Services (ICTS)

cited some challenges that might be faced in the implementation of distance learning

technologies by both teachers and learners as well as stakeholders. These include Internet

connectivity among students and learners, which is the major limiting factor, the capacity

of teachers to use technology in the teaching process on the learning delivery, and early

grade levels, which should be accompanied by parents and guardians in using technology.

Therefore, parents should also be oriented toward this. (DepEd, 2020)


The Basic Education-Learning Continuity Plan focuses on what learners should

learn and accomplish. Communication, materials, learning activities, assignments, and

assessments will be employed in the BE-LCP to identify the methods to be used in learning

delivery. Communication talks about ways to provide feedback on different activities,

direct teaching, discussions, and queries about the lesson and activities. The materials focus

on the learning materials needed and references in constructing content lessons. Learning

activities, such as case studies, group discussions, and presentations, facilitate learning.

Lastly, assessments, which are crucial during the pandemic, are the means of gauging student progress. Through the learning continuity plan, these four aspects of the teaching-learning process remained possible during the time of COVID-19.

Assessment plays a vital role in the teaching-learning process during the pandemic,

as it is defined as a central pillar in the educative process. It is challenging for teachers to

assess learners in distance learning programs and provide feedback and formative guidance

to students. When a teacher fails to provide regular feedback, students may fail to gauge their learning levels and may struggle to build new knowledge and skills in the self-learning phase required by distance learning. The assessment procedure should be

communicated clearly and systematically between students and parents. If examinations

are performed through alternative channels such as computer-based assessment, they

should be analyzed for quality and equity implications. Teachers may also use text

messaging and phone-call interactions to provide feedback to students, as well as parents

giving feedback to teachers.

Another challenge for teachers in new-normal education is the rapid change in the practices of the face-to-face teaching and learning process, such as the giving of daily tasks, responsibilities, and accountabilities. Teachers should develop new alternatives and varied approaches to monitor students' achievement, from assessment to the remediation of learning gaps, including the methods for conducting formative and summative assessments. These pandemic-induced challenges made the teachers' pursuit of the K to 12 curriculum's aim of developing learners holistically a roller-coaster ride.

The BE-LCP also managed to streamline the educational competencies expected of learners, as evident in the Most Essential Learning Competencies (MELCs) produced by the department. The original 14,171 learning competencies in the K to 12 curriculum were trimmed down to 5,689 MELCs; sixty percent of the original competencies were eliminated. In Mathematics, 27% (198) of the competencies in the K to 12 curriculum were removed, and 543 of the original 741 competencies remained. Assessment of these competencies is done through summative tests, such as written works and performance tasks, in any learning delivery modality (DepEd BE-LCP, 2020).

Before the pandemic, assessment in Mathematics was composed of three components:

written work (40%), performance tasks (40%), and quarterly assessments (20%). Written

work includes long tests, unit tests, or any activities that ensure students' written skills in

expressing their ideas. The performance task included a skill presentation and

demonstration; written works may also be included in this component. Finally, quarterly assessments are administered at the end of the quarter (DepEd Order No. 8, s. 2015). The learning continuity plan of DepEd was crafted to sustain the teaching and learning process amid the pandemic. Based on the interim guidelines for grading and assessment in light of the Basic Education Learning Continuity Plan, students' mathematics achievement level is composed of written works (50%) and performance tasks (50%) in whatever form of modality, with summative tests continuing in the form of written works and performance tasks (DepEd Order No. 031, s. 2020).

Standardized assessment continues throughout the implementation of the BE-LCP of the Department of Education. Teacher-made items can be used if they are re-evaluated to determine their difficulty and discrimination indices as well as their reliability, or the teacher may reuse test items that have been stored. Two important characteristics of an item will be of interest to teachers: the item difficulty and discrimination indices. The difficulty of an item is defined as the number of students who answer the item correctly divided by the total number of students. Difficult items tend to discriminate between those who know and those who do not know the answer, whereas easy items cannot discriminate between the two groups of students. We are therefore interested in deriving a measure that tells us whether an item can discriminate between these two groups; this measure is called the discrimination index (Santos, 2007). Items may be used repeatedly over time, and their performance should be examined not only within a test form but across all test forms (Velasco, 2017).

The item difficulty index works such that the higher its value, the easier the question is to answer. It is important for determining whether students have learned the concept being tested. A high difficulty value indicates that a greater proportion of the sample answered the question correctly, while a lower value indicates that a smaller proportion understood the question and answered correctly. The latter may occur because the item was coded incorrectly, is ambiguous, uses confusing language, or has ambiguous response options; such items are candidates for modification or deletion from the item pool (Boateng et al., 2018). If the goal is to maximize item discrimination, the desirable difficulty level is slightly higher than midway between the chance score and a perfect score on the item.
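As a hypothetical illustration of this guideline: for a four-option multiple-choice item, the chance score is 0.25, so the desirable difficulty index would lie slightly above (1.00 + 0.25) / 2 = 0.625, that is, at roughly 0.63 or a little higher.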

Item discrimination refers to the ability of an item to differentiate between students based on how well they know the material being tested. It reflects the degree to which the item and the test as a whole measure a unitary ability or attribute. It is calculated by subtracting the proportion of examinees in the lower group who answered the item correctly (or endorsed it in the expected way) from the corresponding proportion in the upper group. It enables the identification of items that correctly differentiate between those who are knowledgeable about a subject and those who are not (positively discriminating items), items that are poorly designed such that the more knowledgeable get them wrong and the less knowledgeable get them right (negatively discriminating items), and items that fail to differentiate between participants who are knowledgeable about a subject and those who are not (non-discriminating items), according to Boateng (2018). Items with low indices are often ambiguously worded and should be examined; items with negative indices should be examined to determine why a negative value was obtained. Boateng (2018) states how the item discrimination index improves test items: first, items that are too easy, too hard, or ambiguous (non-discriminating items) are removed; second, negatively discriminating items are re-examined and modified; finally, positively discriminating items are retained. In practice, values of the discrimination


index seldom exceed 0.50, because of differing shapes of item and total score distributions

(Glossary of Education Reform).
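Both indices can be computed directly from a matrix of scored responses. The following minimal Python sketch is illustrative only: the function names are the writer's, not the study's, and the 27% upper/lower grouping is one common convention rather than a requirement of the definitions above.

import numpy as np

def item_difficulty(scores):
    # scores: 2-D 0/1 array, rows = examinees, columns = items.
    # Difficulty index = proportion of examinees answering each item correctly.
    return scores.mean(axis=0)

def item_discrimination(scores, group_fraction=0.27):
    # Rank examinees by total score, then take the difference between the
    # proportion correct in the upper group and in the lower group, per item.
    totals = scores.sum(axis=1)
    order = np.argsort(totals)
    n = max(1, int(round(group_fraction * len(totals))))
    lower = scores[order[:n]]    # lowest-scoring examinees
    upper = scores[order[-n:]]   # highest-scoring examinees
    return upper.mean(axis=0) - lower.mean(axis=0)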

Another important factor to consider in the construction and design of test questions

is the reliability of the questionnaire. The reliability of a test refers to the extent to which

it is likely to produce consistent scores. High reliability means that the questions of a test

tend to pull together. This signifies that the relative scores of the students would show

little change when a parallel test was developed using similar items. Low reliability means

that the questions are unrelated to each other in terms of who answers them correctly. The

KR20 reliability analysis was used in this study to test the internal consistency reliability

of the mathematics achievement test. This indicates that the higher the number of test items,

the higher the internal consistency reliability and vice versa. Thus, a test can be made

reliable by increasing the number of test questions (Pedrajita, 2017; Ferguson & Takane,

1989). Moreover, identifying the difficulty and discrimination indices is not sufficient to ensure test fairness among students (Villas, 2019; Gatchalian & Lantajo, 2010).
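For reference, the KR-20 coefficient used in the study can be computed from the same kind of 0/1 score matrix. This is a minimal, hypothetical sketch; the choice of the sample variance (ddof=1) is the writer's assumption.

import numpy as np

def kr20(scores):
    # scores: 2-D 0/1 array, rows = examinees, columns = items.
    k = scores.shape[1]                              # number of items
    p = scores.mean(axis=0)                          # proportion correct per item
    q = 1.0 - p                                      # proportion incorrect
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_variance)

Consistent with the paragraph above, lengthening a test with comparable items tends to raise this coefficient, since the total-score variance grows faster than the sum of the item variances p*q when items correlate positively.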

The study also measured the concurrent validity of the achievement test in

Mathematics 7. Concurrent validity was established by examining the relationship between second-quarter academic performance and scores on the test. Concurrent validity coefficients signify the homogeneity of a group of test scores: the higher the correlation, the less homogeneous the group of scores, which means that the larger the range of scores, the larger the correlation coefficient. The more homogeneous the test scores become, the less valid the test is (Pedrajita, 2017; Ferguson & Takane, 1989), and the lower the correlation, the more homogeneous the group of scores.
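As a minimal, hypothetical sketch of this concurrent-validity check (the data below are invented; scipy.stats.pearsonr is one standard implementation of the Pearson product-moment correlation):

from scipy.stats import pearsonr

test_scores = [34, 41, 28, 45, 38, 30]        # hypothetical raw test scores
quarter_grades = [82, 88, 79, 91, 85, 80]     # hypothetical 2nd-quarter grades
r, p_value = pearsonr(test_scores, quarter_grades)
print(f"Pearson r = {r:.2f}, p = {p_value:.4f}")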


Sutton and Krueger (2002) mentioned in their book "EDThoughts: What We Know About Mathematics Teaching and Learning" that "a successful educational system focuses on

students' outcomes and provides necessary support among the students in achieving them".

Strategies to create equitable classrooms with high-quality content were also provided by

Sutton and Krueger. The accurate identification of students' knowledge and mastery is one

of the strategies enumerated. Diagnosing where students struggle so that appropriate learning instruction can be applied was also cited. Another strategy is to engage all learners

with higher-order thinking skills. All these strategies can be vital in providing a well-

crafted assessment that will surely identify students' difficulties and the need for calibrating

learning instructions. This literature supports the aim of the study, which is to provide validated questions free from item bias. This will surely address students' difficulties and

mastery of specific competencies.

In addition, the construction of classroom tests is an art that must be learned. It is

not automatically derived from knowledge of the subject matter, a formulation of learning

outcomes to be achieved, or a psychological understanding of the pupils' mental processes, although all of these are prerequisites. The ability to construct high-quality test

items requires knowledge of the techniques and principles of test construction and the skills

in their application.

This literature can be of great help in the ongoing study because it provides the

researcher with a guide in constructing test items for the achievement test in mathematics

despite all the challenges of the present situation. This emphasizes the need to provide

high-quality test items, especially amid a pandemic.

Item Bias
Test equity is a challenging task in the development of educational assessment

instruments. It ensures that no individuals are disadvantaged in any way when dealing with

the instruments. This is primarily achieved by ensuring that a test measures only construct-

relevant differences between subpopulations of examinees, such as gender, school types,

and socio-economic status. If test equity is not achieved, a test or test item is biased toward

a particular subpopulation of examinees (Pedrajita, 2017; Kanjee, 2007).

Item bias is described as a systematic error in the measurement process (Osterlind, 1983, as cited by Diaz et al., 2021). It is related to the issue of test fairness between groups and refers to a condition in which characteristics unrelated to the construct being measured affect the test results. An educational assessment test is

considered biased if the results have disadvantages for certain students over others based

on their identified profile, such as students' ethnicity, income backgrounds, gender, and

other variables. It is a possible threat to validity (Akcan & Kabasakal, 2019; Clauser &

Mazor, 1998). Identifying test bias requires test developers and educators to determine why

one group of students tends to perform better or worse than another group on a particular

test (Glossary of Education Reform). Therefore, it is important to research this topic.

The issue of bias, and how to overcome it, has gained significance as public school student populations become more diverse and exams play increasingly vital roles in evaluating individual performance or access to opportunities. Item bias analysis does not

examine if there are general between-group variations in total score, or whether group A

members would be more likely to answer "yes" to X than members of group B. Item bias

studies, on the other hand, look for intergroup disparities at the score level, i.e., whether

group A members with a certain attitude level have the same average score on a given item
as group B members with the same attitude level. Bias is not a mere presence of a score

difference between groups. In test items, bias is the presence of systematic error in the measurement. Items may be judged relatively more or less difficult for a particular

group by comparison with the performance of another group or groups drawn from the

same population (Pedrajita, 2017; Camilli & Shepard, 1994)

Test bias can be categorized into construct, content, and predictive validity bias.

Whether a test accurately assesses what it was intended to measure is referred to as

construct-validity bias. The results of an intelligence test, for instance, may reflect a

student's comparatively poor English-language skills rather than their academic or

intellectual talents, since English-language learners are likely to encounter vocabulary on the

test that they have not learned. Content-validity bias emerges when a test's subject matter

is substantially harder for one group of pupils than for another. This can happen when a

group is scored unfairly (for instance, when answers that make sense in one group's culture

are deemed correct), when questions are worded in ways that are unfamiliar to some

students due to linguistic or cultural differences, or when members of a student subgroup,

such as members of various minority groups, have not been given the same opportunity to

learn the material being tested. A subtype of this bias, known as item selection bias,

deals with the use of certain test items that are better suited to one group's linguistic and

cultural experiences. Predictive validity bias, also known as bias in criterion-related

validity, relates to how effectively a test predicts future performance for a certain student

group. For instance, a test would be regarded as "unbiased" if it accurately predicted future

academic and test performance for all student groups.


The question of test fairness, or whether the social implications of test results result

in unjust advantages or disadvantages for particular student groups, is closely related to

test bias. Given their crucial role in determining admission to higher education institutions,

particularly prestigious schools, and universities, college admission examinations

frequently generate questions regarding both test bias and fairness. For instance, even

though female students often receive higher grades in college, they frequently score lower than male students on admission tests, perhaps due to gender bias in test design, which possibly suggests evidence of predictive-validity bias.

Such bias can be measured by its magnitude at the item level through Differential

Item Functioning (DIF). DIF is an indicator that an item is potentially biased, which will

eventually provide a better understanding of the behavior of subgroups on potentially

biased items (Gatchalian & Lantajo, 2010; Osterlind, 1983), as cited by Villas (2019). DIF may be rooted in item bias, which is harmful DIF (Diaz et al., 2021). DIF analysis is a means of statistically identifying unexpected differences in performance across matched groups of examinees (reference and focal groups).

This study focuses on detecting and removing biased items, which may reflect validity bias, based on the learners' learning styles and the type of school in which they are enrolled. It also describes the magnitude of the bias based on DIF analysis. Thus, the literature cited above helps in recognizing biased items in the achievement test in Mathematics 7.

Mantel-Haenszel Test

Different methods for detecting bias have been developed over the years. Among these methods is the Mantel-Haenszel analysis, which is used in this study to detect and remove bias. The Mantel-Haenszel (MH) method is a common method for detecting differential items. It is seen as a practical means of identifying biased test questions because of its simplicity and ease of use, and it provides an effect size statistic indicating whether the detected DIF is damaging. It is easy to understand and implement and provides a statistical significance test (Diaz et al., 2021; Holland & Thayer, 1988; Millsap & Everson, 1993). It tests whether there is an association between group membership and item response conditional on the total score (Ukanda et al., 2019; Magis et al., 2010). It is widely implemented in detecting DIF, as the procedure has demonstrated external validity and was recommended for matching simultaneously on total score, a categorical variable, and additional background variables such as mathematical ability, educational attainment, and type of community (Villas, 2019; Pedrajita & Talisayon, 2009).

Mantel-Haenszel (MH) is commonly used in studies of Differential Item Functioning because it makes meaningful comparisons of item performance for different groups by comparing examinees of similar proficiency levels instead of comparing the performance of all groups on an item. The MH analysis yields a chi-square statistic with one degree of freedom; if the computed chi-square value exceeds the critical value of 3.84, the item is tagged as biased. The MH procedure is also used to estimate a common odds ratio that yields a measure of effect size for evaluating the magnitude of DIF (Pedrajita, 2017). This ratio is transformed to produce the MH delta (DMH). The delta metric scale of the odds ratio alpha MH, as suggested by Holland and Thayer (1988) and cited by Khalid et al. (2021), is given by the equation Delta MH = -2.35 ln(alpha MH). A positive DMH indicates DIF in favor of the focal group, and a negative value signifies DIF in favor of the reference group.
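To make the computation concrete, a minimal Python sketch is given below. It estimates the MH common odds ratio from 2x2 tables stratified by total score and applies the Holland and Thayer (1988) delta transformation; the function name and data layout are illustrative assumptions, not the actual routine used in this study.

    import numpy as np

    def mantel_haenszel_delta(tables):
        # `tables`: a list of 2x2 tables, one per total-score stratum, each
        # laid out as [[A, B], [C, D]] with A/B the reference group's
        # correct/incorrect counts and C/D the focal group's (assumed layout).
        num, den = 0.0, 0.0
        for (A, B), (C, D) in tables:
            n = A + B + C + D
            if n == 0:
                continue  # skip empty strata
            num += A * D / n  # numerator terms of the MH common odds ratio
            den += B * C / n  # denominator terms
        alpha_mh = num / den  # MH common odds ratio (alpha MH)
        delta_mh = -2.35 * np.log(alpha_mh)  # Holland & Thayer delta metric
        return alpha_mh, delta_mh

Under this layout, an alpha MH above 1 favors the reference group and yields a negative delta, consistent with the interpretation above.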


The classification of the items was based on the magnitude-of-DIF guidelines proposed by Zwick and Ercikan (1989), as cited by Khalid et al. (2021). An item with an absolute MH delta value of less than 1 is a Type A item, signifying negligible DIF; its MH chi-square test is not statistically significant, and the item is considered to function properly. Type B items have an absolute MH delta value between 1 and 1.5 and are tagged as having moderate DIF; among these, items with the lowest MH delta that have no alternative items could still be used. Finally, Type C items signify a large amount of DIF that is statistically significant; these are the items with an absolute MH delta value greater than 1.5. A critical review of these items is necessary, and they should be selected only in exceptional circumstances. In addition, because the MH estimator is consistent even when the sample size per stratum is small, it can be useful in DIF studies even when there is a very fine partition of the ability distribution.
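A minimal sketch of these classification guidelines follows, assuming the delta value and the MH chi-square significance flag for an item have already been computed; the full ETS guidelines also test the delta statistic itself, which is omitted here for brevity.

    def classify_dif(delta_mh, chi_square_significant):
        # Simplified ETS-style categories from |delta MH| and the MH
        # chi-square significance flag, following the rules described above.
        if abs(delta_mh) < 1.0 or not chi_square_significant:
            return "A"  # negligible DIF; the item functions properly
        if abs(delta_mh) <= 1.5:
            return "B"  # moderate DIF; usable if no alternative item exists
        return "C"      # large DIF; select only in exceptional circumstances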

These qualities of the Mantel-Haenszel method were the reason for using it in the present study. Moreover, different studies employing MH analysis assert that its results in detecting biased items are accurate and significant.

Related Studies

Studies on Learning Styles

A study entitled "Assessment of Visual, auditory and kinesthetic learning style

among undergraduate nursing students" was presented in the International Journal of

Advanced Nursing by Ibrahim and Hussein (2014). The primary concern of the research

paper is to assess the Visual, Auditory, and Kinesthetic (VAK) Learning styles of the two

hundred ten (210) nursing students who are enrolled in two Nursing colleges in the
Universities of Mosul and Kirkuk. The results showed that the visual learning style was the most common, preferred by 40% of the 210 nursing students. Auditory and kinesthetic learning styles

accounted for 29.5% and 30.5%, respectively. Based on their sex, females preferred the

auditory learning style (30.3%) more than males (27.3%), while males preferred the

kinesthetic learning style (32.3%) more than females (29.8%).

Apipah et al. (2018) analyzed mathematical connection ability based on students' learning styles in the visualization, auditory, kinesthetic (VAK) learning model with self-

assessment. The research found that among the VIII-grade students of State Junior High

School 9 Semarang, students with a visual learning style had the highest mathematical

connection ability after taking the assessments. Moreover, students with kinesthetic and

auditory learning styles had average and lowest mathematical connection abilities,

respectively.

Sakinah and Avip (2021) conducted another study, entitled "An analysis of students' mathematical literacy skills assessed from their learning style," that aimed to determine students' mathematical literacy skills based on their learning styles. The study showed a result contrary to that of Apipah et al. (2018). It revealed that the mathematical

literacy skills of students with a kinesthetic learning style were better than those of students

with visual and auditory learning styles. The results of the study revealed low literacy skills overall, with only 14% of the students correctly answering the mathematical literacy

question. Furthermore, it was also revealed that visual learners were able to formulate the

given mathematical problems but were lacking in the use of mathematical concepts and

interpreting mathematical problems, while the students who were auditory learners had the

lowest skills in formulating mathematical problems due to incorrect application of the


mathematical concepts. Lastly, kinesthetic learners were able to formulate and use

mathematical concepts correctly but were lacking in interpreting mathematical problems.

Moreover, Karlimah and Risfiani (2017) emphasized the students with auditory

learning styles in their study entitled "Contribution of Auditory Learning Styles to

Students' Mathematical Connection Ability". It was concluded that learning facilities suited

to auditory learning styles are a significant factor in increasing the mathematical

connection ability of students. An increase in mathematical connection ability was evident

after improving the learning materials to suit the learners. The study suggested providing learning facilities suited to the students, as well as giving students with this kind of learning style particular activities in learning mathematics.

Ishartono et al. (2021), in their study on "Visual, Auditory and kinesthetic students: How they solve PISA-Oriented Mathematics Problems", revealed that there was no difference

in ability between students' visual, auditory, and kinesthetic learning styles in learning

PISA-Oriented questions. A total of 23 students (70% of the population) were students

with visual learning styles, and 30% of them were in the high category. Meanwhile, 44%

and 26% were tallied in the medium and low categories, respectively. It was also revealed

that there were only eight students with an auditory learning style and two with a kinesthetic learning style, who tallied within the medium and low categories, respectively.

On the other hand, the study by Mašic et al. (2020), "The Relationship Between Learning Styles, GPA, School Level, and Gender," found that the most preferred learning style among 269 middle and high school students in Sarajevo, Bosnia and Herzegovina was auditory, followed by visual and tactile. Most of the learners were middle school

students. At both levels, the auditory style was the most preferred learning style, while
contrary results were found for the next preferred learning style. The least preferred

learning style in middle school was visual, while in secondary school it was tactile. It was

hypothesized that there would be a significant difference in learning style preferences

based on the school level.

The studies conducted by Ibrahim and Hussein (2014), Apipah et al. (2018), Karlimah and Risfiani (2017), Mašic et al. (2020), Ishartono et al. (2021), and Sakinah and Avip (2021) identify the preferred learning styles of students at different school levels, and their varying results indicate that the tally of students' preferred learning styles in the present study may likewise vary. Thus, no particular learning style may be assumed in advance for the participants of this study.

Test Development and Standardization

Bhat and Prasad (2021) in their study on "Item analysis and optimizing multiple-

choice questions for viable question bank in ophthalmology: A cross-sectional study",

aimed to evaluate the difficulty level, discriminating power with functional distractors of

Multiple-choice questions (MCQs) using item analysis, analyzing the poor items for

writing flaws, and optimization. Items were categorized according to their difficulty index,

discrimination index, and distractor efficiency using simple proportions, standard

deviations, and correlations. Defective items were analyzed for proper construction and

optimization. Seventeen (17) out of 20 defective items were optimized and added to the

question bank; two items were added and modified, and one item was dropped. It was

concluded that item analysis is a valuable tool for detecting poor multiple-choice questions.

Defective items should be optimized rather than dropped.


In the study titled “Development of an Achievement Test to Measure Students'

Competency in General Mathematics" by Mamolo (2021), a valid and reliable 40-item achievement test in General Mathematics was constructed. Eight experts reviewed the test for improvement and refinement. The developed achievement

test was piloted with 425 senior high school students. The items were analyzed and

subjected to a reliability test. It was found that the average item difficulty was 0.40, which

means intermediate, while the average item discrimination was 0.34, which

signifies a good item. Moreover, the test also indicated a reliability coefficient of 0.84,

which means that the internal consistency value was acceptable. The results showed that

the developed achievement test for General Mathematics is an excellent tool for classroom

assessment.

In the study "Correction of Differentially Functioning Items: Basis for Maintaining

and Enhancing Test Validity and Reliability," Pedrajita (2017) examined differentially functioning items in a chemistry achievement test among public and private junior high school students in the Division of City Schools in Quezon City. He found that 22 out of 50 items displayed statistical bias using the Mantel-Haenszel analysis. Ten of these items signified bias against private school examinees, while 12 items indicated

bias against public school examinees. The content validity of the test differed from slightly

to moderately adequate in terms of the number of items retained. The concurrent validity

of the test differed, but all were positive, indicating a moderate relationship between the

examinees' test scores and GPA in Science III. The internal consistency reliability of the

tests differed. The more differentially functioning items were eliminated, the lower the content validity, concurrent validity, and internal consistency reliability of the test became. Eliminating differentially functioning items diminishes the content validity, concurrent validity, and internal consistency reliability of the test, as it decreases the number of items in the test; however, replacing the eliminated DIF items could be a basis for enhancing content validity, concurrent validity, and internal consistency reliability. It was also concluded that the Mantel-Haenszel method had a high degree of correspondence with the other methods used in the study.

A gender-based comparative study on identifying biased items by Villas (2019), entitled "Differential Item Functioning in Grade 8 Math using Logistic Regression, Mantel-Haenszel, and Logical Data Analysis," was conducted to compare DIF methods in determining biased items between male and female, and between low and high English proficiency examinees, in a researcher-constructed achievement test on the Probability and Statistics strand of the K-12 program. Out of 40 multiple-choice items, 25 yielded a significant MH chi-square value and were thus flagged as DIF items, characterized by the delta MH as large DIF favoring highly English-proficient examinees. The same number of items was tallied in detecting DIF items using the examinees' gender: 25 items were significantly different, containing a large amount of DIF in favor of female examinees. MH analysis indicated that

all the detected DIF items were classified as large amounts of DIF. The study recommends

that the practice of DIF analysis should be incorporated into test development to ensure

test validity from a unified perspective, especially in the Philippines, where studies are

scarce regarding the detection of biased items.

A comparative investigation on detecting DIF conducted by Khalid, Shafiq, and

Ahmed (2021) entitled "Detection of Differential Item Functioning Using Mantel-

Haenszel, Standardization, and BILOG-MG Procedures" concluded that MH is more

robust and useful and can be used assertively. From a test development perspective,
Mantel-Haenszel should be used to screen unfair items. It provides a magnitude of DIF and

effect size, in addition to a statistical significance test that facilitates further necessary

actions, especially for item writers and practitioners. This finding supports Michaelides'

(2010) study of the European University Cyprus, entitled "An Illustration of a Mantel-

Haenszel Procedure to Flag Misbehaving Common Items in Test Equating". It was

concluded that MH should be applied in the context of test equating to flag common

items that behave differently across cohorts of examinees. It has the advantage of

conditioning ability when comparing the performance of two groups on an item. There are

guidelines on the effect size that can be used in the decision-making process, whether to

retain those items in the common-item pool or to disregard them.

These enumerated studies will be of great help to the researcher in validating the claims of the present study on detecting and removing item bias in the mathematics achievement test using the Mantel-Haenszel method.

Conceptual Framework

The study aimed to analyze and detect biased items in the Grade 7 Mathematics Achievement Test using the Mantel-Haenszel method. Figure 1 shows the conceptual framework of the study. It started with the construction of the test questions. A 60-item test was used as the main instrument of the study. To ensure the proper distribution of the items, the Most Essential Learning Competencies (MELCs) guided the construction of the table of specifications. For content validity, at least three experts were asked to validate the test questions. Afterward, the administration of the test took place. The respondents of the study were the Grade 7 students of public and private schools in Victoria, Tarlac. The students' profile, such as learning styles, type of school, and academic performance, was gathered through the test questionnaire. A questionnaire for learning styles adopted from O'Brien and the University of California, Merced Student Advising and Learning Center was employed. Answers were checked, tabulated, and organized.

The results of the test were used for item analysis, in which item difficulty and discrimination were sought. The academic performance of the respondents in Mathematics in the previous quarter and their raw scores on the test were compared to find the concurrent validity of the test. Reliability was tested using KR20.

Figure 1. Research Paradigm

Biased items were detected and analyzed after the test administration as well. Each item was described with its corresponding profile (learning styles and school type). SPSS was employed to compute the MH chi-square of each item; an item whose value is less than the critical value of 3.8415 (at the 0.05 alpha level, df = 1) is acceptable and not identified as biased. Lastly, after the elimination of item bias and the validation of the test questions, the results were compared to the original test questions. The validity was described in terms of content, construct, and concurrent validity, as well as reliability.


Chapter 3

RESEARCH METHODOLOGY

This chapter presents the research design, how the population and samples were

determined, the research instrument utilized with the procedures in the validation, how the

data were gathered, and the statistical tools used in the analysis of the gathered data.

Research Design

The study used a combination of descriptive and developmental research designs. According to Sugiyono (2014), the developmental method is used to produce a validated product. Moreover, Gall et al. (2003) stated that this method has two main objectives, which are (1) to develop a product and (2) to validate it. This study sought to develop a 30-item validated and bias-free Mathematics achievement test in Grade 7, covering the second quarter, using the Mantel-Haenszel method.

The developmental method followed these phases: Phase 1, preliminaries and development of the achievement test; Phase 2, implementation of the test administration; and Phase 3, validation and removal of biased items according to learning styles and type of school. The final test version consists of validated and unbiased test items in the Mathematics achievement test for the second quarter.

The development of the achievement test was based on the Most Essential Learning Competencies (MELCs) in the Learning Continuity Plan of the Department of Education amid the pandemic. A table of specifications was made to ensure that the items were properly distributed. The teacher-made test questions were constructed and further validated by experts. Students' learning styles and type of school were also determined for use as factors in detecting and analyzing item bias.

The study also used descriptive research, which seeks to analyze and detect biased items of the Grade 7 Mathematics Achievement Test using the Mantel-Haenszel method. The study likewise describes the validity and reliability of the test. McCombes (2019) defines the aim of descriptive research as "to describe a population, situation or phenomenon accurately and systematically. It can answer what, when, where, and how questions, but not why questions". McCombes also described the descriptive research design as appropriate "when the research aimed to identify characteristics, frequencies, trends, and categories." As descriptive research seeks to describe the situation and how the variables are naturally distributed, the results provide the researcher the data to illustrate the basic relationships and gain a better understanding of the questions asked (Thyer, 2009).

After the teacher-made test questions were constructed, validation was done. It described the content, construct, and concurrent validity of the instrument. The internal consistency of the instrument was also described using the Kuder-Richardson 20. In addition, biased items were described and eliminated using the Mantel-Haenszel method with respect to the respondents' learning styles and the type of school in which they are enrolled.

Respondents of the Study

The respondents of the study were meticulously chosen, with a deliberate emphasis

on the random selection of Junior High School students in the 7th grade who actively

participated in the limited face-to-face modality from the public and private schools in the
Municipality of Victoria. This approach to respondent selection was designed

to yield a balanced and representative sample of students within the chosen municipality,

thereby facilitating the derivation of insightful and meaningful conclusions from the study's

findings.

Data Gathering Procedure

The procedure followed by the researcher in developing, validating, and detecting and removing biased items in the achievement test in Grade 7 Mathematics is as follows:

I. Preliminary and Development Phase

The researcher used the Most Essential Learning Competencies (MELCs) in drafting the table of specifications of the achievement test to make sure that the test items were distributed evenly. The second-quarter Grade 7 mathematics competencies, composed of 10 weeks of competencies, were the focus of the study. A 60-item multiple-choice test was crafted. Three experts, consisting of a head teacher, a master teacher, and a Grade 7 teacher, checked the essentiality of each item, and the items were validated using the content validity index.

II. Implementation phase

To facilitate the implementation, the researcher first secured the necessary permits and letters from the Schools Division Superintendent of Tarlac Province and from the school heads of the different public and private schools in Victoria, Tarlac. The 60-item test (see Appendix D) was given to the respondents in a two-day scheme, since Mathematics is allotted one hour per day, four times a week. On the first day, the students were given an hour to answer a 30-item test and the 15-item learning style questionnaire adopted from O'Brien and the University of California, Merced Student Advising and Learning Center. On the second day, the other 30 items were given. The researcher also provided and discussed a test administration procedure with the teacher-administrators to make the testing efficient and effective. Below are the test administration procedures:

Pre-Administration

1. Count the number of copies of the test questionnaire and answer sheet.

2. Check the number of items in each test questionnaire. Two-day administration

will be conducted. A 30-item test will be taken by the students each day for 60 minutes.

3. Secure the key to correction.

4. Orient the students on taking the test.

Administration of the Test

Discuss with the students the following procedure.

1. Do not forget to write your name and other information.

2. Read and understand each question carefully.

3. Shade the letter of your answer on the answer sheet provided. Shade legibly.

4. Use your MONGOL 2 PENCIL only in shading your answers.

5. No erasures. You may use a clean sheet of paper for your computations.

6. You will take the 30-item test in 60 minutes. (State the time that started and the

time that will end)

7. Honesty is always the best policy.

Post Administration

1. Collect the test questionnaires and answer sheets.

2. Count the number of collected questionnaires and answer sheets.


3. Check the answer sheets and tally the scores. (Checking of the test was done by the teacher-administrators and verified by the researcher.)

III. Validating Phase

Content validity through experts was described using the content validity index before the test administration, followed by item analysis to describe item difficulty and discrimination after the scores of the students were checked and tallied. To describe the construct validity of the test, Principal Component Analysis was used. The concurrent validity of the instrument was tested by describing the relationship between test scores and grades using the Pearson product-moment correlation and linear regression. The internal consistency of the instrument was also described using the Kuder-Richardson 20. The original test version underwent the detection of biased items using the Mantel-Haenszel method with respect to the respondents' learning styles and the type of school in which they are enrolled. Detected biased items were described and eliminated.

Moreover, the validation process was repeated after the elimination of biased items to further describe the effectiveness of bias elimination on the strength of the validity of the achievement test. Lastly, after the combined process of bias elimination and test validation, the revised test version was produced. The validity and reliability of the revised achievement test in Mathematics 7 were then sought and described.

Research Instrument

The main instrument utilized in the conduct of the study was a 60-item teacher-made achievement test in Mathematics 7 anchored on the Most Essential Learning Competencies (MELCs), validated and checked through consultations with experts (a master teacher, a head teacher, and Mathematics teachers) as well as with the researcher's dissertation adviser. Furthermore, the construct, concurrent, and content validity as well as the internal consistency of the questionnaire were determined.

Content Validity

The test questions were evaluated by experts (i.e., Master Teachers in Mathematics, Head Teachers in Mathematics, Math teachers specifically teaching Grade 7 Mathematics, and Math majors). The experts validated whether each test question is essential or not essential to the Most Essential Learning Competencies (MELCs), the subject matter, and the level of the target respondents. The experts also judged the questionnaire in terms of the clarity of each test question (see Appendix H).

Table 1. Corresponding Number of Experts with their Acceptable Content Validity Index (CVI) Values

Number of Experts        Acceptable CVI Values    Source of Recommendation
Two experts              At least 0.80            Davis (1992)
Three to five experts    Should be 1              Polit & Beck (2006); Polit et al. (2007)
At least six experts     At least 0.83            Polit & Beck (2006); Polit et al. (2007)
Six to eight experts     At least 0.83            Lynn (1986)
At least nine experts    At least 0.78            Lynn (1986)

The use of a content validity index of items was employed to describe the validity

of each item of the standard test. Table 1 shows the number of experts and its implication

on the acceptable cut-off score of CVI.

The difficulty and discrimination indices were also utilized by the researcher to analyze the items of the teacher-made test.

The formula for difficulty and discrimination indices are as follows:


Index of Difficulty = (Ru + Rl) / TN

Index of Discrimination = (Ru - Rl) / (N/2)

where:

Ru = number of correct responses in the upper group
Rl = number of correct responses in the lower group
TN = total number of respondents who answered the test
N = total number of respondents who answered the item
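The short sketch below applies both formulas literally, under the assumption that TN and N both refer to the combined number of examinees in the upper and lower groups; the variable names are illustrative.

    def item_indices(r_upper, r_lower, tn):
        # r_upper / r_lower: correct responses in the upper and lower groups;
        # tn: total respondents in both groups combined (TN = N assumed here).
        difficulty = (r_upper + r_lower) / tn * 100  # as a percentage
        discrimination = (r_upper - r_lower) / (tn / 2)
        return difficulty, discrimination

For example, if 40 of 50 upper-group and 22 of 50 lower-group examinees answer an item correctly, item_indices(40, 22, 100) returns (62.0, 0.36): an optimum-difficulty item with an acceptable discrimination index.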

Table 2. Difficulty and Discrimination Index

Difficulty Index: 19.50 & below (Very Hard); 19.51 - 44.50 (Hard); 44.51 - 74.50 (Optimum); 74.51 - 89.50 (Easy); 89.51 & above (Very Easy)
Discrimination Index: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
(Items are plotted in the cross-tabulation of the two indices; the shaded acceptable region covers the optimum difficulty range at a discrimination index of 0.3 and above.)

After computing the difficulty and discrimination indices, items were plotted in

the cross-tabulation shown in the table. Frienberg (1995), as cited by Isip (2013), identified the

adequate discrimination index (D = 0.3 and above) while the difficulty index must be

within the optimum region (44.51-74.50). After plotting the items, those that fall outside

the shaded region were studied and revised or modified.

Table 2 shows the cross-tabulation of the discrimination and difficulty indices of the test questions. Questions falling within 44.51 to 74.50 on the difficulty index and 0.3 up to 1.0 on the discrimination index are acceptable and can be retained; otherwise, questions were changed or reconstructed.

Construct Validity

Since the data were found suitable and eligible based on sampling adequacy, using the Kaiser-Meyer-Olkin (KMO) test, and appropriateness, using Bartlett's test of sphericity, Principal Component Analysis (PCA) was used to measure the construct validity of the test.

The KMO values for sampling adequacy are given in Table 3 (Shrestha, 2020):

Table 3. KMO Value of Samples Adequacy

Value Description

0.80-1.00 Adequate

0.70-0.79 Middling

0.60-0.69 Mediocre

< 0.50 Not adequate

Bartlett's test of sphericity tests whether the correlation matrix of the items is an identity matrix, that is, whether the variables are unrelated and therefore unsuitable for structure detection. Bartlett's test of sphericity provides a chi-square output that must be significant: it indicates that the matrix is not an identity matrix, and accordingly, it should be significant (p < 0.05 alpha level).
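As an illustration, Bartlett's test of sphericity can be sketched from scratch as follows; `scores` is an assumed examinee-by-item matrix, and the sketch stands in for, rather than reproduces, the software routine used in the study.

    import numpy as np
    from scipy.stats import chi2

    def bartlett_sphericity(scores):
        # Tests whether the item correlation matrix is an identity matrix;
        # `scores` is an examinees x items array (assumed layout).
        n, p = scores.shape
        R = np.corrcoef(scores, rowvar=False)
        statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
        df = p * (p - 1) / 2
        return statistic, chi2.sf(statistic, df)  # a small p supports PCA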

Concurrent Validity

The performance of the respondents during the second quarter was tabulated, as well as their test scores. To identify the concurrent validity of the test questions, the relationship between the respondents' performance on the test and their performance during the second quarter was sought. The second-quarter grades were taken from the final grades, composed of 50% written works and 50% performance tasks through summative tests. Linear regression analysis was also used to describe the variance in the students' second-quarter performance explained by their scores on the Mathematics achievement test.

Internal Consistency Reliability

Kuder-Richardson 20 (KR20) was used to measure the internal consistency reliability of the test questions.

Table 4 shows Cronbach's coefficient-of-reliability scale, which was used to interpret the internal consistency of the test questions (KR20 is a special case of Cronbach's alpha for dichotomous items). A coefficient from 0.7 to 0.9 indicates acceptable to good internal consistency, while a coefficient below 0.7 indicates questionable to unacceptable internal consistency.

Table 4. Cronbach’s Coefficient of Reliability

Cronbach’s Alpha Internal Consistency

a≥0.9 Excellent

0.9 > a ≥ 0.8 Good

0.8 > a ≥ 0.7 Acceptable

0.7 > a ≥ 0.6 Questionable

0.6 > a ≥ 0.5 Poor

0.5 > a Unacceptable
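For illustration, the KR20 coefficient can be computed on a matrix of dichotomous responses as sketched below; the data layout is an assumption, and the study itself obtained KR20 through statistical software.

    import numpy as np

    def kr20(scores):
        # `scores`: examinees x items array of dichotomous (0/1) responses.
        k = scores.shape[1]               # number of items
        p = scores.mean(axis=0)           # proportion correct per item
        q = 1 - p                         # proportion incorrect per item
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of totals
        return (k / (k - 1)) * (1 - (p * q).sum() / total_variance)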


For the learning styles, a 15-item questionnaire adopted from O'Brien (1985) and

the University of California, Merced Student Advising and Learning Center was utilized

to identify students' learning styles. The rating scale used is:

3- Often

2-Sometimes

1-Seldom

Statistical Treatment

In describing the learning styles of the respondents, frequency counts and percentages were utilized. The scale below is the basis for describing the overall learning styles of the respondents:

1.00 - 1.66 Seldom

1.67 - 2.33 Sometimes

2.34 - 3.00 Often

Frequency counts and percentages were also used to describe the respondents'

school type.

In describing the performance of the students in Mathematics in the second quarter, the level of progress descriptors prescribed by the Department of Education, shown in Table 5, was adopted.

Table 5. Academic Performance

Grade        Verbal Description
90 - 100     Outstanding
85 - 89      Very Satisfactory
80 - 84      Satisfactory
75 - 79      Fairly Satisfactory
Below 75     Did Not Meet Expectations

To describe the content validity of the test questions, the content validity index (CVI) was used. The rating scale for the essentiality of the test questions to the Most Essential Learning Competencies, the subject matter, and the level of the target respondents was:

1 - not essential

2 - essential
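As an illustration, the item-level CVI reduces to the proportion of experts who rate an item essential (a rating of 2 on the scale above); the sketch below assumes the ratings for one item are stored as a simple list.

    def item_cvi(ratings):
        # `ratings`: one expert rating per element; 2 = essential, 1 = not.
        return sum(1 for r in ratings if r == 2) / len(ratings)

With three validators, an item is retained only when all rate it essential: item_cvi([2, 2, 2]) returns 1.0 (accepted), while item_cvi([2, 2, 1]) returns about 0.67 (rejected).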

For the degree of clarity of the test questions, the rating scale below was used:

1 - the item is not clear
2 - the item needs some revision
3 - the item is clear but needs some minor revision
4 - the item is very clear

Item difficulty and discrimination index were also used for content validity.

To describe the construct validity of the questionnaire, Principal Component

Analysis was utilized. Kaiser's Criterion and Scree Plot Test were used to identify the

optimum number of components (or factors) that can be extracted.
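A brief sketch of Kaiser's Criterion is given below; `scores` is an assumed examinee-by-item matrix, and the eigenvalues the function returns are the same quantities displayed, in descending order, by the scree plot.

    import numpy as np

    def kaiser_components(scores):
        # Kaiser's Criterion: retain components whose eigenvalues of the
        # item correlation matrix exceed 1; `scores` is examinees x items.
        R = np.corrcoef(scores, rowvar=False)
        eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
        return int((eigenvalues > 1).sum()), eigenvalues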

Table 6. Guilford's Magnitude of Significant Correlation

Correlation Coefficient (Absolute Value)    Interpretation
Below 0.20                                  Slight Correlation
0.20 - 0.39                                 Low Correlation
0.40 - 0.69                                 Moderate Correlation
0.70 - 0.89                                 High Correlation
0.90 - 1.00                                 Very High Correlation
At least 0.30                               Practically Significant Relationship

For concurrent validity, to find the relationship between the performance of the learners during the second quarter and their scores on the teacher-made achievement test in mathematics for the second quarter, Pearson product-moment correlation and regression analysis were administered. The level of relationship was described using Table 6.
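For illustration, both computations can be sketched with scipy as follows, assuming `test_scores` and `q2_grades` are parallel arrays of achievement-test scores and second-quarter grades; the squared correlation equals the proportion of variance explained by the simple linear regression.

    from scipy.stats import pearsonr, linregress

    def concurrent_validity(test_scores, q2_grades):
        # Pearson r between test scores and second-quarter grades, plus the
        # r-squared (variance explained) of the simple linear regression.
        r, p_value = pearsonr(test_scores, q2_grades)
        fit = linregress(test_scores, q2_grades)
        return r, p_value, fit.rvalue ** 2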

Kuder-Richardson 20 (KR20) was utilized to describe the internal consistency reliability of the questionnaire.

To analyze bias items, the Mantel-Haenszel chi-square statistic was used. If the

computed absolute value for Mantel-Haenszel chi-square is less than its critical value

(3.8415) at the 0.05 alpha level, with one degree of freedom, the item is acceptable and not identified as a potentially biased item. The critical value serves as a detection threshold

for potentially biased items. On the other hand, the items having a Mantel-Haenszel chi-

square statistic value greater than the critical absolute value were flagged as biased items.

To describe the degree of bias items using Differential Item Functioning analysis, the

Mantel-Haenszel Delta (MHD) was used. A positive MHD indicates that DIF is in favor

of the focal group, and a negative value indicates DIF is in favor of the reference group

(Khalid et.al, 2021). Differential Item Functioning (DIF) analysis is one method for

examining bias at the item level. DIF analysis is a method for statistically detecting

unanticipated performance variations across matched groups of test takers. Pedrajita

(2016), as also cited in the study of Ukanda et al. (2019), categorized biased items (or DIF)

as negligible, moderate, and large amounts of DIF.

Table 7 shows the detection threshold and effect size of the Mantel-Haenszel Chi-

Square statistics DIF Detection Method.


The computations for the processing of data were done through SPSS and Jamovi software.

Table 7. Detection Threshold and Effect Size of the Mantel-Haenszel Chi-Square Statistics DIF Detection Method

Detection Threshold    Effect Size (Absolute Value)    Category      Scale
3.8415                 0.0 - 1.0                       Negligible    Delta
                       1.0 - 1.5                       Moderate
                       > 1.5                           Large
Chapter 4

PRESENTATION, ANALYSIS, AND INTERPRETATION OF DATA

This chapter presents the data gathered, the results of the statistical analysis and the

interpretation of findings based on the objectives of the study. These are presented in tables

following the sequence of the specific research problems propounded.

1. Students' Profile

In this study, the Grade 7 students' learning styles, type of school, and academic performance in Mathematics were described as their profile.

Table 8. Profile of Grade 7 Students (n = 207)

Profile                                 Category    Frequency    Percentage
Learning Styles                         Auditory    67           32.4
                                        Tactile     64           30.9
                                        Visual      76           36.7
School Type                             Private     92           44.4
                                        Public      115          55.6
Academic Performance in Mathematics     75-79       26           12.6
(2nd Quarter)                           80-84       66           31.9
                                        85-89       69           33.3
                                        90-100      46           22.2

1.1 Students' Learning Styles

Learning styles refer to the ability of learners to perceive and process information in learning situations. Understanding the learning styles of students can improve educational outcomes such as an effective teaching and learning process (Ibrahim and Hussein, 2016).
Table 8 shows that students from Grade 7 prefer to learn by seeing visuals such as graphs, pictures, and other visual instructional materials, as this style tallied the highest number of study respondents among the listed learning styles. This implies that students often learn better if they work in a quiet place and easily understand and follow directions written on the board or on paper. Students sometimes visualize the textbook page and where the answer is located during the test.

The table also shows that Grade 7 students often understand how to do something better if it is told to them rather than read by themselves. It was also shown that students often do their best in academic subjects by listening to lectures and tapes. Sometimes, students remember things they hear rather than things they see or read. The results emphasize that academic subjects such as mathematics should be taught face-to-face, in the sense that they should be discussed by the teacher rather than just read by the students in their modules at home.

Grade 7 students often think better when they have the freedom to move. Students often enjoy working with their hands or making things. They sometimes need to see someone else do the instruction before following it. Nevertheless, the tactile learning style tallied the least among Grade 7 students in the limited face-to-face classes in the Municipality of Victoria, as shown in Table 8.

Among the learning styles, the visual learning style had the highest mean. This result is also evident in the research paper entitled "Assessment of visual, auditory and kinesthetic learning style among undergraduate nursing students" presented in the International Journal of Advanced Nursing Studies by Ibrahim and Hussein (2014), wherein visual learners dominated the study, tallying 40% of the 210 respondents from two universities. This might suggest that the respondents of the present study would perform well in mathematics achievement tests, considering the findings of Apipah et al. (2018) on mathematical connection ability based on students' learning styles in the VAK learning model with self-assessment among VIII-grade students of State Junior High School 9 Semarang, where the visual learning style had the highest mathematical connection ability. Moreover, students with kinesthetic and auditory learning styles had average and lowest mathematical connection abilities, respectively.

On the other hand, the study of Sakinah and Avip (2021) contradicts the claim of Apipah et al. (2018). In the study entitled, "An analysis of students' mathematical literacy

skills assessed from students learning style," tactile (kinesthetic) learners had been noted

to be better than visual and auditory learners even though the study was dominated by

visual learners. The skills of students with kinesthetic backgrounds were better than those

of students with auditory and visual learning styles in terms of understanding and using

mathematical concepts. They were also more likely to formulate solutions for these

problems. However, those with auditory learning lack the necessary skills to interpret and

use mathematical concepts. Additionally, those with visual-learning skills were more likely

to mistakenly use basic mathematical concepts.

These contradictory findings regarding the performance of students concerning

mathematical literacy skills and ability might serve as evidence that assessment validity

and reliability are crucial with regard to the learning styles of students. Thus, the

importance of detecting biased items in mathematics assessment based on students’

learning styles is of great help in securing test fairness.

1.2 School Type


Due to the pandemic, guidelines for the pilot implementation of limited face-to-face classes among private and public schools were issued through DepEd Memorandum No. 071, s. 2021. The expansion of limited face-to-face classes was authorized in February 2022. Since the Municipality of Victoria was in a low-risk category for COVID-19 cases, all public and private schools were permitted to join the limited face-to-face classes. A total of 207 students from public and private schools qualified as respondents for the research: 55.6% were from public schools and 44.4% were from private schools, as shown in Table 8. Qualified respondents were Grade 7 students who were present at the time of the research and tallied with a specific learning style.

In the context of assessment, the National Achievement Test (NAT) is administered annually in both private and public schools. It is a standardized test created to identify students' achievement levels and opportunities for improvement in five major academic subjects at the end of the school year. According to Department of Education data (DepEd, 2014), student achievement in mathematics decreased between 2009 and 2014, as measured by the National Achievement Test (NAT). Data indicate that the mean percentage score (MPS) in the NAT was much below the desired competence level of 75% (Austria, 2020, Interactive SIM in Selected Topics in Algebra). In addition, the Grade 6 NAT results were at a low mastery level, as can be seen in the publication released on September 26, 2019. According to the 2018 NAT results, the national average mean percentage score (MPS) was the lowest it has ever been for a DepEd standardized test, at 37.44.

The National Achievement Test results were contributed by students from public and private schools in the country. In the Grade 6 NAT 2009 overall mastery level, only 6.82% of the takers from public schools were described as closely approximating mastery, which is better than the 0.36% of takers from private schools. A total of 52.16% of the takers from public schools tallied as moving towards mastery, compared to only 20.26% of the takers from private schools. On the other hand, a higher percentage of average mastery was tallied among private school takers (69.13%) compared to public school takers (37.76%). In addition, private school takers tallied the higher percentage of low-mastery students, at 10.25% compared to 3.23% (Benito, 2010).

On the other hand, a comparison of the overall achievement level of second-year high school students of public and private schools in 2009 was also described. Of the takers from public schools, 0.13% tallied as closely approximating mastery, while only 0.01% of private school takers fell under this mastery level. Meanwhile, 12.35% of the public school takers were described as moving towards mastery, compared to 5.04% of the takers in private schools. Among private school takers, 71.69% were described as average, compared to 67.89% of public school takers. Furthermore, 19.60%, 0.02%, and 0.01% of the public school takers were described as having low mastery, very low mastery, and absolutely no mastery, respectively. Meanwhile, 23.24% and 0.01% of private school takers were described as having low mastery and very low mastery, respectively (Benito, 2010).

The need to improve the NAT results is evident. Thus, this study may serve as a basis for an additional process in validating standardized tests, regardless of where the students come from. The detection and removal of biased items in any standardized achievement test, such as the National Achievement Test, will greatly help increase the mathematical proficiency of students in whatever type of school they are enrolled.


1.3 Academic Performance in Mathematics (Second Quarter)

The second-quarter grade in Mathematics of the qualified Grade 7 students was sought through documentary analysis. No students with grades below 75 were documented, as shown in Table 8. Of the students, 33.3% tallied very satisfactory, while 31.9% tallied satisfactory. Table 8 also shows that there were 46 outstanding students and 26 fairly satisfactory students.

Prior to the pandemic, the assessment of academic performance in mathematics was composed of three components: written works (40%), performance tasks (40%), and quarterly assessments (20%). Written works include long tests, unit tests, or any activities that assess students' written skills in expressing their ideas. Performance tasks include skill presentations and demonstrations; written works may also be included in this component. Finally, quarterly assessments are administered at the end of the quarter (DepEd Order No. 8, s. 2015). The Learning Continuity Plan of DepEd was crafted to ensure the teaching and learning process amid the pandemic. Based on the interim guidelines for grading and assessment in light of the Basic Education Learning Continuity Plan, students' mathematics achievement level was composed of written works (50%) and performance tasks (50%) in whatever form of modality, with summative tests continuing in the form of written works and performance tasks (DepEd Order No. 031, s. 2020). Thus, the performance of Grade 7 students is based on the different modules answered by modular distance learners and the activities performed by online distance learners. Most students under this modality tallied satisfactory.
The students' academic performance in Mathematics during the second quarter was used to compute the concurrent validity of the achievement test by relating it to their raw scores on the said achievement test.

2. Content, Construct, and Concurrent Validity, and Internal Consistency Reliability of the Mathematics Test

A 60-item test was constructed using the Most Essential Learning Competencies crafted in the DepEd Learning Continuity Plan 2020. To ensure the proper distribution of the items, a table of specifications was used.

Content Validity. Three (3) competent content validators checked the essentiality of each item of the mathematics achievement test in Grade 7. Polit and Beck (2006) and Polit et al. (2007) suggested that the content validity index of a question should be 1 if there are three to five experts who validate the questionnaire. Table 9 shows that only 4 out of 60 items tallied as non-essential.

Table 9
Content Validity of the Grade 7 Mathematics Achievement Test
n=60
Content Validation Frequency Percentage
Accepted (Essential) 56 93.3
Rejected (Non – essential) 4 6.7

Q51, Q57, Q58, and Q60 were non-essential questions based on the Most Essential Learning Competencies, the subject matter, and the respondents. Question 51 approximates the measures of quantities, particularly length, weight/mass, volume, time, angle, temperature, and rate; it should be noted that this item was repeated. Questions 57 and 58 illustrate the linear equation and inequality in one variable, respectively. Question 60 is concerned with finding the solution of a linear equation or inequality in one variable. The validators believed that these questions did not align with the Most Essential Learning Competencies (MELCs). Although the results show that most of the items were accepted, questions 8, 10, 14, 22, 24, and 59 need to be revised based on their degree of clarity.

Content validity of the test items can also be assessed through item analysis, which provides the discrimination and difficulty indices. The difficulty index is the percentage of the total group that responded correctly to the item.

Table 10 shows that 37 questions reached the optimum level of difficulty in the Mathematics 7 achievement test, while question 16 tallied as the easiest question with a value of 76.8%. Among the 22 hard questions, Q50, Q17, and Q32 tallied as the most difficult questions (19.8%, 20.8%, and 24.2%, respectively). Question 16 (Q16) tackles the derivation of the laws of exponents, while question 50 (Q50) tackles solving problems involving equations and inequalities in one variable. Moreover, question 17 (Q17) addresses the illustration of the linear equation and inequality in one variable, and question 32 (Q32) addresses approximating the measurement of quantities, particularly length, mass/weight, volume, time, angle, temperature, and rate.

Table 10
Summary of Difficulty Index of the Grade 7 Mathematics Achievement Test
n=60
Difficulty Index Frequency Percentage
19.51 - 44.50 (Hard) 22 36.7
44.51-74.50 (Optimum) 37 61.7
74.51-89.5 (Easy) 1 1.6

Table 11 summarizes the discrimination index of the Grade 7 mathematics

achievement test. The index of Discrimination is the difference between the percentage of

correct responses in the upper group and the percentage of correct responses in the lower

group. The high-low interpretation of discrimination is similar to the interpretation of


correlational indices. Values near zero indicate little discrimination, positive values

indicate good discrimination, and negative discrimination indicates that the item is easier

for low-scoring respondents.

Table 11
Summary of Discrimination Index of the Grade 7 Mathematics Achievement Test
n=60
Discrimination Index                       Frequency    Percentage
Change/Reconstruct (below 0.30)            22           36.7
Acceptable/Retained (0.30 and above)       38           63.3

Sixty-three and three-tenths percent (63.3%) of the mathematics achievement test questions tallied acceptable discrimination indices of 0.30 and above and were retained. Questions 52, 23, 45, 46, and 15 tallied the highest and most acceptable discrimination indices, respectively. Q52 asked students to translate English phrases into mathematical phrases and English sentences into mathematical sentences, and vice versa. Q23, Q45, and Q46 ask the students to use models and algebraic methods to find the (a) product of two binomials, (b) square of a binomial, (c) product of the sum and difference of two terms, (d) cube of a binomial, and (e) product of a binomial and a trinomial. Q15 deals with the subtraction of polynomials.

Of the 60 questions, 22 need to be revised or changed. Questions 17, 50, and 51 tallied negative discrimination indices, indicating that these questions were easier for low-scoring respondents and need to be removed and changed. Q17 asked students to illustrate a linear equation and inequality in one variable. Q51 approximates the measures of quantities, particularly length, weight/mass, volume, time, angle, temperature, and rate. Q50 asked students to solve problems involving equations and inequalities in one variable. Questions 17 and 50 tallied as both the most difficult and negatively discriminating; thus, removing these items was strongly suggested.

Item difficulty indicates the quality of the test items and the test as a whole. With 61.7% of the items tested at the optimum level of difficulty, the target of 50% of the items was achieved. Q16, which tallied a high difficulty index value, implied that a greater proportion of the students answered the question correctly. Meanwhile, the hard items, at 36.7% of the 60 items, indicated that a smaller proportion of the students understood the question and answered correctly. Based on Boateng et al. (2018), the difficulty of an item may be due to the item being wrongly coded, ambiguity in the item, confusing language, or ambiguity in the response options. The study also suggests that a lower difficulty value requires item modification or deletion from a pool of items. Item difficulty is relevant for determining whether students have learned the concept being tested.

The item discrimination index measures item effectiveness. A positively discriminating item differentiates between those who are knowledgeable about a subject and those who are not (Boateng et al., 2018). The majority of the test questions in the mathematics achievement test, tallying 57 out of 60, were positively discriminating. Boateng et al. (2018) recommended that items with low discrimination be considered for revision, as the differences could be due to the difficulty level of the item. On the other hand, the negatively discriminating items, questions designed so poorly that the more knowledgeable get them wrong and the less knowledgeable get them right, were Q17, Q50, and Q51; these should be re-examined and modified.

The item discrimination index was used to improve the test items. If an item is non-discriminating and fails to discriminate between respondents because it may be too easy, too hard, or ambiguous, it should be removed. Questions that are negatively discriminating should be re-examined and modified, while positively discriminating items should be retained (Boateng et al., 2018). To maximize item discrimination, desirable difficulty levels should be slightly higher than midway between chance and perfect scores for the item. An item will have low discrimination if it is so difficult that almost everyone gets it wrong or guesses the answer, or so easy that almost everyone gets it right (Office of Educational Assessment, University of Washington).

Construct Validity. The Kaiser-Meyer-Olkin (KMO) test yielded an index of 0.727, described as middling adequacy, and Bartlett's test of sphericity was highly significant (χ²(1770) = 3389, p < 0.001), supporting the use of Principal Component Analysis (PCA). PCA showed a 21-factor solution based on Kaiser's Criterion and the Scree Plot Test, as shown in Figure 2, which provides support for the construct validity of the mathematics achievement test.

Table 12
KMO and Bartlett’s Sphericity Test of the Original Test Version

Test Value Description

Kaiser-Meyer-Olkin (KMO) 0.727 Middling Adequacy

Bartlett’s Sphericity p < 0.001 Highly significant

Using Jamovi Software

These factors accounted for 64.86% of the variance in scoring. Factor 1 was highly correlated with Q23, Q24, Q27, Q45, and Q46. Factor 2 was highly correlated with Q5, Q16, and Q34. Factor 3 was highly correlated with Q20, Q30, and Q33. Factor 4 was highly correlated with Q13 and Q37. Factor 5 was highly correlated with Q14 and Q57, while Factor 6 was highly correlated with Q49. Factor 7 was highly correlated with Q58, while Factor 8 was highly correlated with Q8 and Q41. Factor 9 was highly but negatively correlated with Q17, while Q6 was highly correlated with Factor 10.

Figure 2
Scree Plot of the Original Test Version

Factor 11 was highly correlated with Q32, while Factor 12 was highly correlated with Q11 and Q56. Factor 13 was highly correlated with Q10 and Q54, and Factor 14 with Q55. Factors 15 and 16 were highly correlated with Q15 and Q9, respectively. Moreover, three further factors were highly correlated with Q60, Q1, and Q52, respectively, while Factor 19 tallied a negatively high correlation with Q50. No question was highly correlated with at least two factors; thus, multicollinearity does not exist among the questions of the mathematics achievement test.

Concurrent Validity. Concurrent validity was assessed by associating the current performance of the respondents with the raw scores garnered during the test. Using the Pearson product-moment correlation (r-value), the relationship between the achievement test and the second-quarter grade in Mathematics 7 was determined. An r-value of 0.500, as shown in Table 13, indicates a moderate positive correlation between the scores on the mathematics achievement test and the students' second-quarter grades. The scores on the mathematics achievement test account for 25.0% of the variance in the students' second-quarter performance, as revealed by linear regression.

Table 13. Concurrent Validity of the Grade 7 Mathematics Achievement Test

Variable X                                               Variable Y: Score
                                                         r          Sig.
Academic performance in Mathematics (second quarter)    .500**     .000

**Correlation was significant at the 0.01 level (2-tailed).

Pedrajita (2017) emphasized that concurrent validity is high if the test scores obtained by the students are highly correlated with their grade point average, and low if the test scores have a low magnitude of correlation with their grades. This may indicate that the test scores of the Grade 7 students on the mathematics achievement test have an average magnitude of correlation with the students' second-quarter grades in mathematics.

Internal Consistency Reliability. The teacher-made achievement test in Mathematics 7 was given to students with different learning styles who are enrolled in private and public schools in the Municipality of Victoria. Internal consistency reliability examines the consistency of responses across all the individual items derived from a single administration of the test. Kuder-Richardson 20 (KR20) was used to measure the reliability of the achievement test in mathematics.

Table 14
Internal Consistency Reliability of the Mathematics Test
Reliability Test Value Interpretation
Kuder-Richardson 20 0.875 Good
(KR20)
Table 14 shows that the constructed achievement test in mathematics has good reliability, tallying 0.875. According to the University of Washington Office of Educational Assessment, tests with high internal consistency consist of items with mostly positive relationships with total test scores. Thus, the items of the constructed achievement test show a positive relationship with the grade 7 students' total test scores. High reliability indicates that the questions of a test tend to pull together: students who answered a given question correctly were more likely to answer other questions correctly. Low reliability indicates that the questions tend to be unrelated to each other in terms of who answered them correctly.
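The KR20 value in Table 14 follows the standard Kuder-Richardson 20 formula; below is a minimal sketch, assuming a dichotomously scored (examinees x items) response matrix.

import numpy as np

def kr20(responses: np.ndarray) -> float:
    """responses: (examinees x items) matrix of 0/1 scores."""
    k = responses.shape[1]                         # number of items
    p = responses.mean(axis=0)                     # proportion correct per item
    q = 1.0 - p                                    # proportion incorrect
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)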

3. Item Bias Elimination Using Mantel-Haenszel Chi Square Analysis Based on

Students’ Profile

Item bias analysis through Differential Item Functioning (DIF) analysis examines whether the construction of an index from two or more variables results in bias with respect to different criteria. DIF is a popular and effective way to study item bias: a statistically significant difference observed across two or more groups of examinees is attributed to characteristics of the item unrelated to the construct being measured. An item is considered positively or negatively biased for a group within a population if the average expected item score for that group is substantially higher or lower, respectively, than that for the overall population.
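A minimal sketch of the Mantel-Haenszel statistics used throughout this section, under the method's standard formulation and assuming examinees are matched on total test score (the usual matching criterion; the study does not detail its stratification). It returns the continuity-corrected MH chi-square that is compared against the 3.841 critical value below, the common odds ratio, and the ETS delta (MHD = -2.35 ln alpha_MH) used to gauge the amount of DIF.

import numpy as np

def mantel_haenszel(item: np.ndarray, total: np.ndarray, focal: np.ndarray):
    """item: 0/1 responses to one item; total: matching total scores;
    focal: True for focal-group members (e.g., public school)."""
    sum_a = exp_a = var_a = 0.0   # observed/expected count and variance
    num = den = 0.0               # common odds-ratio numerator/denominator
    for score in np.unique(total):
        s = total == score                        # one score stratum
        a = np.sum(s & ~focal & (item == 1))      # reference, correct
        b = np.sum(s & ~focal & (item == 0))      # reference, incorrect
        c = np.sum(s & focal & (item == 1))       # focal, correct
        d = np.sum(s & focal & (item == 0))       # focal, incorrect
        n = a + b + c + d
        if n < 2:
            continue
        sum_a += a
        exp_a += (a + b) * (a + c) / n
        var_a += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    chi2 = (abs(sum_a - exp_a) - 0.5) ** 2 / var_a  # continuity-corrected
    alpha_mh = num / den
    delta_mh = -2.35 * np.log(alpha_mh)  # positive: favors the focal group
    return chi2, alpha_mh, delta_mh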

Bias Detection Based on School Type

The detection of biased items using the Mantel-Haenszel method is based on students' type of school, either public (focal group) or private (reference group), as shown in Table 15. A total of 10 items displayed statistical bias, tallying significant MH chi-square values based on the type of school. These are questions 2, 15, 18, 20, 22, 29, 32, 46, 51, and 59. Q32 tallied the highest amount of Differential Item Functioning (DIF), with an effect size of absolute value 2.49, while Q15 tallied an effect size of absolute value 2.48. A statistically significant chi-square value indicates a large Differential Item Functioning (DIF) effect for the Mantel-Haenszel statistic (Pedrajita, 2017).

Table 15
Detected Biased Items Based on School Type
Item Chi-Square p-value Item Chi-Square p-value
Q1 0.618 0.432 Q31 3.661 0.056
Q2 4.904* 0.027 Q32 8.086* 0.004
Q3 0.822 0.365 Q33 0.375 0.540
Q4 0.055 0.815 Q34 0.131 0.717
Q5 0.171 0.679 Q35 0.979 0.323
Q6 0.073 0.788 Q36 0.000 0.988
Q7 1.218 0.270 Q37 0.012 0.914
Q8 1.134 0.287 Q38 0.070 0.791
Q9 0.050 0.823 Q39 3.831 0.050
Q10 0.002 0.963 Q40 1.105 0.293
Q11 0.411 0.521 Q41 3.259 0.071
Q12 0.002 0.963 Q42 0.006 0.938
Q13 1.506 0.220 Q43 2.222 0.136
Q14 2.748 0.097 Q44 0.269 0.604
Q15 10.56* 0.001 Q45 3.760 0.052
Q16 1.096 0.295 Q46 4.305* 0.038
Q17 0.044 0.834 Q47 0.020 0.889
Q18 5.724* 0.017 Q48 1.079 0.299
Q19 3.760 0.052 Q49 0.676 0.411
Q20 5.798* 0.016 Q50 1.317 0.251
Q21 0.107 0.744 Q51 6.512* 0.011
Q22 7.042* 0.008 Q52 0.495 0.482
Q23 2.271 0.132 Q53 2.473 0.116
Q24 0.543 0.461 Q54 0.629 0.428
Q25 0.575 0.448 Q55 0.137 0.711
Q26 0.689 0.407 Q56 0.000 0.988
Q27 3.090 0.079 Q57 3.095 0.079
Q28 2.241 0.134 Q58 0.089 0.766
Q29 6.259* 0.012 Q59 6.301* 0.012
Q30 1.909 0.167 Q60 2.045 0.153
Note: Items marked with * were statistically biased (MH chi-square value greater than 3.841)

Among the detected biased items, Q15, Q32, Q46, Q51, and Q59 were interpreted as difficult based on the item difficulty analysis. In terms of the discrimination index, the analysis suggests that Q15, Q32, and Q51 should be changed or reconstructed. The results further confirm that these items need revision or modification.

All ten questions displayed statistical bias in favor of the focal group, which consisted of public school learners. Instructional practices and resources may have contributed to these differences in student performance. During the pandemic and post-pandemic periods, more seminars and workshops on the development of learning materials were provided in public schools than in private schools. Public schools in Victoria produced localized learning materials such as learning activity sheets, a compendium of notes, and instructional support (aside from the SLMs given by DepEd) to make teaching mathematics simpler for teachers and learning easier for students. Meanwhile, private schools relied solely on modules crafted by the department and on textbooks and reference books existing prior to the pandemic. The detection of biased items with regard to school type may address test equity issues: eliminating these biased items will ensure accurate evaluation among learners regardless of their school type, which will contribute to recalibrating teaching practices and curriculum development. Thus, the need to conduct bias elimination for test standardization highlights the results of the study.

Bias Detection Based on Learning Styles (Auditory vs Non-Auditory)

Table 16 shows the detected item bias based on students' learning styles, specifically between the auditory (reference group) and non-auditory (focal group) learners. Only three out of 60 items on the achievement test in Mathematics 7 obtained a significant Mantel-Haenszel chi-square value. Q31 had the largest effect size among the items at 1.96, a large amount of DIF. Q57 and Q37 tallied large DIF effect sizes with absolute values of 1.82 and 1.75, respectively. Q31 covers approximating the measures of quantities, particularly length, weight/mass, volume, time, angle, temperature, and rate, and tallied an optimum difficulty index and an acceptable discrimination index.

The validators' assessment indicated that the item is essential to the MELCs, the subject matter, and the level of the target respondents. This may imply that other factors contributed to its identification as a biased item. This is also true for Q37, which evaluates algebraic expressions for given values of the variables. On the other hand, the experts suggested that Q57 is not essential to the MELCs and should be removed. Moreover, the discrimination index of the item suggests changing or reconstructing the question, although it reached the optimum level of the difficulty index.

The significantly biased items Q31, Q37, and Q57 favored the focal group (non-auditory learning styles). These learners were either visual or tactile learners, who tend to learn through visual aids and to solve problems through trial-and-error approaches. The most essential learning competencies of these identified items require students to approximate, evaluate, and illustrate, skills that might be learned through a thorough explanation of the concepts in class discussion with the use of illustrations, board work, and examples. With the learning materials (SLMs, LASs, and other printed materials) given to the learners, it can be assumed that during the pandemic and post-pandemic era, visual learners were favored over auditory learners. This aspect of teaching and learning affects learning outcomes, especially when designing tests that maximize equity and avoid biased items. Thus, the factors emphasizing item bias highlight the need to conduct the study to ensure a meaningful teaching and learning process.

Table 16
Detected Biased Items Based on Learning Style (Auditory vs Non-Auditory)
Item Chi-Square p-value Item Chi-Square p-value
Q1 1.839 0.175 Q31 7.100* 0.008
Q2 0.039 0.843 Q32 0.340 0.560
Q3 3.362 0.067 Q33 0.637 0.425
Q4 0.511 0.475 Q34 0.642 0.423
Q5 0.000 0.998 Q35 0.000 0.987
Q6 1.439 0.230 Q36 0.002 0.969
Q7 0.155 0.694 Q37 5.360* 0.021
Q8 2.302 0.129 Q38 0.081 0.776
Q9 0.622 0.430 Q39 1.974 0.160
Q10 2.785 0.095 Q40 0.029 0.865
Q11 3.363 0.067 Q41 1.295 0.255
Q12 1.067 0.302 Q42 2.994 0.084
Q13 0.390 0.532 Q43 0.013 0.910
Q14 0.260 0.610 Q44 0.010 0.919
Q15 0.023 0.881 Q45 0.863 0.353
Q16 1.083 0.298 Q46 0.659 0.417
Q17 0.890 0.346 Q47 0.569 0.451
Q18 0.238 0.626 Q48 0.878 0.349
Q19 0.723 0.395 Q49 0.066 0.797
Q20 0.071 0.790 Q50 0.687 0.407
Q21 1.914 0.166 Q51 0.084 0.771
Q22 1.105 0.293 Q52 0.012 0.912
Q23 0.107 0.743 Q53 1.274 0.259
Q24 0.818 0.366 Q54 1.846 0.174
Q25 0.371 0.542 Q55 0.001 0.976
Q26 0.043 0.836 Q56 0.477 0.490
Q27 0.669 0.413 Q57 5.564* 0.018
Q28 0.089 0.766 Q58 0.477 0.490
Q29 1.189 0.276 Q59 0.009 0.925
Q30 0.434 0.510 Q60 0.000 0.994
Note: Items marked with * were statistically biased (MH chi-square value greater than 3.841)

Bias Detection Based on Learning Styles (Visual vs Non-Visual)

Between visual (reference group) and non-visual (focal group) learners, six questions were identified as biased items after tallying significant MH chi-square values. As shown in Table 17, Q60 tallied the largest amount of DIF with an effect size of absolute value 2.49, while Q9 and Q6 tallied absolute values of 2.27 and 2.17, respectively. Q11, Q31, and Q48 also tallied effect sizes categorized as large amounts of DIF.

Table 17
Detected Biased Items Based on Learning Style (Visual vs Non-Visual)
Item Chi-Square p-value Item Chi-Square p-value
Q1 0.134 0.714 Q31 7.3298* 0.007
Q2 1.006 0.316 Q32 1.938 0.164
Q3 0.029 0.864 Q33 1.075 0.300
Q4 0.154 0.695 Q34 2.194 0.139
Q5 1.369 0.242 Q35 1.218 0.270
Q6 7.982* 0.005 Q36 0.004 0.951
Q7 3.676 0.055 Q37 0.552 0.457
Q8 0.947 0.330 Q38 0.001 0.979
Q9 6.764* 0.009 Q39 2.262 0.133
Q10 0.011 0.918 Q40 0.642 0.423
Q11 4.782* 0.029 Q41 0.018 0.892
Q12 0.091 0.762 Q42 3.697 0.055
Q13 0.001 0.974 Q43 0.002 0.963
Q14 3.805 0.051 Q44 0.506 0.477
Q15 0.017 0.896 Q45 0.122 0.727
Q16 3.049 0.081 Q46 0.006 0.936
Q17 2.311 0.128 Q47 0.002 0.965
Q18 1.738 0.187 Q48 7.184* 0.007
Q19 0.051 0.821 Q49 0.264 0.607
Q20 1.984 0.159 Q50 0.026 0.872
Q21 0.001 0.972 Q51 2.719 0.099
Q22 2.201 0.138 Q52 1.738 0.187
Q23 0.805 0.369 Q53 0.269 0.604
Q24 0.055 0.814 Q54 0.046 0.830
Q25 0.183 0.669 Q55 0.032 0.859
Q26 0.400 0.527 Q56 0.057 0.811
Q27 0.472 0.492 Q57 2.993 0.084
Q28 0.249 0.618 Q58 0.003 0.959
Q29 0.580 0.446 Q59 0.037 0.847
Q30 0.019 0.890 Q60 7.439* 0.006
Note: Items marked with * were statistically biased (MH chi-square value greater than 3.841)

Question 6 (Q6) was suggested for change or reconstruction based on the discrimination index, but it reached the optimum level of difficulty. The experts believe that Q6 is essential to the MELCs, the subject matter, and the level of the target respondents, with acceptable clarity in the statement of the question. It tackles illustrating and differentiating related terms in algebra: a. aⁿ, where n is a positive integer; b. constants and variables; c. literal coefficients and numerical coefficients; d. algebraic expressions, terms, and polynomials; and e. number of terms, degree of the term, and degree of the polynomial.

The experts suggested that Q60 be removed from the mathematics achievement test because the item was found to be non-essential to the MELCs. The item was also described as hard based on the difficulty index and as needing to be changed and reconstructed in terms of the discrimination index through item analysis. These findings are contrary to those of Q48 and Q31: both items were acceptable in terms of content validity, reached an optimum level on the difficulty index, and had acceptable discrimination indices. Questions 60 and 48 address finding the solution of a linear equation or inequality in one variable, while Q31 addresses approximating measures of quantities, particularly length, mass/weight, volume, time, angle, temperature, and rate.

The experts found that Q9 and Q11, which differentiate algebraic expressions, equations, and inequalities and solve problems involving algebraic expressions, respectively, are essential to the MELCs, the subject matter, and the level of the target respondents. The said items needed to be changed or reconstructed based on their tallied discrimination indices, but only Q11 was tagged as hard based on the difficulty index of the test questions.

Out of the six questions, five items were biased in favor of the reference group, which consisted of visual learners. These were Q6, Q9, Q11, Q31, and Q48, while Q60 was significantly biased toward the focal group, the non-visual learners. The results indicated higher scores among the visual learners on the identified items, suggesting sufficient visual representation in mathematics teaching. Representation, as defined by Mainali (2021) and Goldin (2001), is a sign or combination of signs, characters, diagrams, objects, pictures, or graphs that can be utilized in teaching and learning mathematics through verbal, graphic, algebraic, and numeric forms. Specifically, notational and formal external representations refer to algebraic expressions, systems of numeration, derivatives, programming languages, etc., whereas other representations denote relationships visually or graphically, such as number lines and graphs. The competencies of the items identified as biased in favor of visual learners asked students to find solutions, illustrate, solve, and differentiate algebraic equations, expressions, and inequalities, and to approximate measures of quantities. These instructional practices may contribute to differences in performance with regard to students' learning styles. Thus, test equity may be addressed by eliminating biased items from the original test version.

Bias Detection Based on Learning Styles (Tactile vs Non-Tactile)

Table 18 shows the detected biased items through the Mantel-Haenszel method based on the learning styles of tactile and non-tactile learners. Q18, Q29, and Q60 were tagged as exhibiting differential item functioning. Based on the results in Table 18, among the three questions, Q60 tallied the largest amount of DIF, with an MHD absolute value of 2.172. Q18 and Q29 tallied DIF amounts with absolute values of 1.55 and 1.54, respectively. Question 60 focuses on finding the solution of a linear equation or inequality in one variable. It was interpreted as hard based on the difficulty index, and a change/reconstruct level of the discrimination index was recorded. The experts also agreed to remove the question because it was not essential to the MELCs, the subject matter, and the target respondents of the study.

Differentiating algebraic expressions, equations, and inequalities, and solving problems involving equations and inequalities in one variable, were the focus of questions 18 and 29. Both items reached the optimum difficulty level and had acceptable discrimination indices. The experts believe that the items are essential in terms of the MELCs, the subject matter, and the target respondents of the study. The questions were clearly stated on the achievement test.

Table 18
Detected Biased Items Based on Learning Style (Tactile vs Non-Tactile)
Item Chi-Square p-value Item Chi-Square p-value
Q1 0.699 0.403 Q31 0.001 0.980
Q2 1.956 0.162 Q32 0.471 0.492
Q3 2.291 0.130 Q33 0.015 0.904
Q4 1.646 0.200 Q34 0.338 0.561
Q5 1.120 0.290 Q35 0.934 0.334
Q6 2.493 0.114 Q36 0.031 0.861
Q7 2.088 0.148 Q37 2.013 0.156
Q8 0.131 0.717 Q38 0.012 0.912
Q9 3.054 0.081 Q39 0.000 0.996
Q10 2.049 0.152 Q40 0.257 0.612
Q11 0.070 0.791 Q41 0.702 0.402
Q12 2.284 0.131 Q42 0.011 0.917
Q13 0.561 0.454 Q43 0.000 0.999
Q14 1.866 0.172 Q44 0.237 0.626
Q15 0.030 0.862 Q45 0.181 0.670
Q16 0.348 0.555 Q46 0.345 0.557
Q17 0.199 0.656 Q47 0.323 0.570
Q18 4.080* 0.043 Q48 2.879 0.090
Q19 0.226 0.635 Q49 0.016 0.900
Q20 1.101 0.294 Q50 0.671 0.413
Q21 1.639 0.201 Q51 1.618 0.203
Q22 0.111 0.739 Q52 2.001 0.157
Q23 0.206 0.650 Q53 0.200 0.654
Q24 0.269 0.604 Q54 3.065 0.080
Q25 0.000 0.994 Q55 0.000 0.999
Q26 0.089 0.766 Q56 0.089 0.766
Q27 0.001 0.969 Q57 0.188 0.665
Q28 0.004 0.947 Q58 0.361 0.548
Q29 4.186* 0.041 Q59 0.021 0.886
Q30 0.925 0.336 Q60 7.110* 0.008
Note: Items marked with * were statistically biased (MH chi-square value greater than 3.841)
Q18 and Q29 were statistically biased in favor of the focal group, the non-tactile learners (either auditory or visual learners), while Q60 was statistically biased in favor of the reference group, the tactile learners.


Table 19
Summary of Detected Biased Items Using Mantel-Haenszel Chi Square and DIF Analysis

Question   Biased Item        Content      Item Analysis
Number     in Favor of        Validity     Item Difficulty   Item Discrimination
2          Public             Acceptable   Optimum           Acceptable
6          Visual             Acceptable   Optimum           Cha/Rec
9          Visual             Acceptable   Optimum           Cha/Rec
11         Visual             Acceptable   Hard              Cha/Rec
15         Public             Acceptable   Hard              Cha/Rec
18         Public, NonTact    Acceptable   Optimum           Acceptable
20         Public             Acceptable   Optimum           Acceptable
22         Public             Acceptable   Optimum           Acceptable
29         Public, NonTact    Acceptable   Optimum           Acceptable
31         NonAudi, Visual    Acceptable   Optimum           Acceptable
32         Public             Acceptable   Hard              Cha/Rec
37         NonAudi            Acceptable   Optimum           Acceptable
46         Public             Acceptable   Hard              Acceptable
48         Visual             Acceptable   Optimum           Acceptable
51         Public             Remove       Hard              Cha/Rec
57         NonAudi            Remove       Optimum           Cha/Rec
59         Public             Acceptable   Hard              Acceptable
60         NonVis, Tactile    Remove       Hard              Cha/Rec
Note: NonAudi = Non-Auditory, NonVis = Non-Visual, NonTact = Non-Tactile,
Cha/Rec = Change/Reconstruct

In general, 18 of the 60 questions obtained a significant MH chi-square value, indicating large amounts of Differential Item Functioning (DIF) and displaying statistical bias. The results evidently show that all of the bias contained in the tagged items was large DIF, which is also evident in the study of Villas (2019): 25 of 40 items on Probability and Statistics were tagged in his study as biased against females and highly English-proficient examinees using the same method, the Mantel-Haenszel analysis. This suggests that items flagged with DIF using the MH method mostly carry large DIF, which is harmful to test items. A critical review of large DIF items is necessary, and such items should be selected only in exceptional circumstances (Khalid et al., 2021).

Table 19 summarizes the detected biased items using the Mantel-Haenszel chi-square analysis on the mathematics achievement test of grade 7 students, across school types and learning styles.

Among these 18 questions, 10 obtained a positive value for the delta Mantel-Haenszel statistic (MHD), which implies that the DIF was in favor of the public school students. These are questions 2, 15, 18, 20, 22, 29, 32, 46, 51, and 59. In addition, Q18 and Q29 also obtained positive MHD values between tactile and non-tactile learners, signifying DIF in favor of the non-tactile learners. Q60 tallied a negative MHD value between tactile and non-tactile learners, signifying bias in favor of the tactile learners.

On the other hand, questions 6, 9, 11, 31, 48, and 60 tallied large amounts of DIF between the visual and non-visual learners using the MH chi-square statistic. Questions 6, 9, 11, 31, and 48 obtained negative MHD values, signifying questions in favor of the visual learners, while Q60 obtained a positive MHD value, signifying bias in favor of the non-visual learners. Positive MHD values between auditory and non-auditory learners were obtained for questions 31, 37, and 57, signifying that these biased questions were in favor of the non-auditory learners.
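A short helper summarizing how the MHD values above are read: the sign gives the favored group, and the magnitude is graded here using the common ETS A/B/C convention, which is an assumption; the study simply labels |MHD| values of roughly 1.5 and above as large DIF.

def classify_dif(delta_mh: float, significant: bool) -> str:
    # Sign convention as used in this chapter.
    direction = ("favors the focal group" if delta_mh > 0
                 else "favors the reference group")
    size = abs(delta_mh)
    if not significant or size < 1.0:
        level = "A (negligible)"
    elif size < 1.5:
        level = "B (moderate)"
    else:
        level = "C (large)"
    return f"{level}, {direction}"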

Among the detected DIF items, Q51 and Q60 were unacceptable in terms of content; they are not essential to the Most Essential Learning Competencies (MELCs), the subject matter, and grade 7 students. They are also subject to change or reconstruction after tallying below the 0.30 discrimination index, and they were tagged as hard after tallying below 44.51 on the difficulty index. These results imply that the bias in questions 51 and 60 is content-validity-related. Q57 may imply content-validity- and discrimination-index-related bias. Difficulty- and discrimination-related bias may be implied for Q11, Q15, and Q32, which tallied below 44.51 on the difficulty index and below 0.30 on the discrimination index. Q6 and Q9, with optimum difficulty levels and acceptable content but discrimination indices below 0.30, show discrimination-related bias.

Moreover, even though they were tagged as statistically biased items with significant MH chi-square values, Q2, Q18, Q20, Q22, Q29, Q31, Q37, and Q48 have acceptable content validity as well as acceptable difficulty and discrimination indices. Thus, other factors may have contributed to these items being tagged as biased, such as questions that are not demographically and culturally holistic. It may also be due to linguistic and socioeconomic bias among some of the respondents, as cited in the Glossary of Education Reform. The test format may also contribute to bias, which may favor a specific learning style introduced in the study.

The study implies that aside from validating content and using the indices of difficulty and discrimination, other factors may contribute to bias in developing assessments. A test developer must consider diversity in language, culture, and socio-economic factors, as well as the learning styles of the test takers.

4. Content, Construct, and Concurrent Validity, and Internal Consistency Reliability of the Mathematics Test After Removing Biased Items

Content Validity. Of the 60 test questions in the grade 7 mathematics achievement test, 42 questions (70%) were retained after the detection of biased items using the Mantel-Haenszel chi-square analysis. Only one of the retained questions was suggested for removal by the experts because it was not essential to the MELCs, the subject matter, and the level of the grade 7 respondents, as shown in Table 20. That sole question, Q58, asks students to illustrate linear equations and inequalities in one variable.

Table 20
Content Validity of the Grade 7 Mathematics Achievement Test
After the Detection of Bias Items
n=42
Content Validation Frequency Percentage
Accepted (Essential) 41 97.6
Rejected (Non – essential) 1 2.4

Of the retained questions after the detection of biased items, 26 (61.9%) reached the optimum difficulty level, while 15 were tagged as hard, as shown in Table 21. Only one question (2.4%), Q16, was tagged as easy.

Table 21
Summary of Difficulty Index of the Grade 7 Mathematics Achievement Test
After the Detection of Bias Items
n=42
Difficulty Index Frequency Percentage
19.51 - 44.50 (Hard) 15 35.7
44.51-74.50 (Optimum) 26 61.9
74.51-89.5 (Easy) 1 2.4

Table 22 shows the percentage breakdown of the discrimination index of the retained items after the detection of biased items using the Mantel-Haenszel chi-square analysis.

Table 22
Summary of Discrimination Index of the Grade 7 Mathematics Achievement Test
After the Detection of Bias Items
n=42
Discrimination Index Frequency Percentage
Acceptable/Retained(.30 and Above) 29 69.0
Change/Reconstruct(Below .30) 13 31.0

Table 23
Content Validity in terms of Difficulty and Discrimination Index of the Grade 7
Mathematics Achievement Test After the Detection of Bias Items
n=42
Former Item No.   Difficulty Index (%)   Discrimination Index   New Item No.
Q1*** 37.20% 0.24 Remove
Q3 64.30% 0.43 Q1
Q4 44.90% 0.42 Q2
Q5 69.10% 0.4 Q3
Q7 50.20% 0.48 Q4
Q8*** 33.80% 0.04 Remove
Q10 44.90% 0.34 Q5
Q12 49.30% 0.15 Q6
Q13 53.10% 0.35 Q7
Q14 40.60% 0.41 Q8
Q16 76.80% 0.49 Q9
Q17** 20.80% -0.07 Remove
Q19 48.30% 0.43 Q10
Q21** 30.90% 0.14 Remove
Q23 53.10% 0.54 Q11
Q24 42.50% 0.44 Q12
Q25 69.60% 0.43 Q13
Q26 57.00% 0.37 Q14
Q27 54.10% 0.43 Q15
Q28 41.10% 0.32 Q16
Q30 52.70% 0.32 Q17
Q33 42.00% 0.34 Q18
Q34 58.50% 0.43 Q19
Q35 72.50% 0.43 Q20
Q36 51.70% 0.45 Q21
Q38 56.00% 0.41 Q22
Q39 57.00% 0.52 Q23
Q40 64.30% 0.45 Q24
Q41*** 25.10% 0.19 Remove
Q42 49.80% 0.42 Q25
Q43*** 31.90% 0.25 Remove
Q44 58.00% 0.5 Q26
Q45 48.30% 0.48 Q27
Q47 47.80% 0.24 Q28
Q49 48.30% 0.35 Q29
Q50*** 19.80% -0.05 Remove
Q52 56.50% 0.51 Q30
Q53*** 39.10% 0.18 Remove
Q54 46.40% 0.33 Q31
Q55*** 36.70% 0.19 Remove
Q56*** 43.00% 0.24 Remove
Q58*** 43.00% 0.22 Remove
**Items with poor or negative discrimination indices are considered removed items.
***Items with poor discrimination and unacceptable difficulty indices are considered removed items.

This shows that the majority of the items were acceptable and retained, tallying discrimination indices of 0.30 and above. Only 13 of the retained items were subject to reconstruction or change.


Among these retained questions, Q1, Q8, Q17, Q21, Q41, Q43, Q50, Q53, Q55, Q56, and Q58 will be disregarded, as they tallied poor discrimination indices (below 0.30) and at the same time did not meet the optimum difficulty level (44.51%-74.50%). Questions 17 and 21, with poor and negative discrimination indices but highly acceptable difficulty indices, will likewise be excluded from the final revision of the achievement test in Mathematics 7.

Q12 and Q47 tallied poor discrimination indices but highly acceptable difficulty indices; they are subject to inclusion in the final achievement test but need to be revised. A different action will be taken for questions 14, 16, 24, 28, and 33, whose difficulty indices were tagged as easy/hard but whose discrimination indices are highly acceptable; these items are subject to revision, as shown in Table 23.
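For reference, here is a minimal sketch of the two item-analysis indices used in Tables 21-23: difficulty as the percentage of examinees answering correctly, and discrimination via the upper-lower 27% group method. The discrimination formula is an assumption (the study reports only the resulting values); the cutoffs mirror the tables above.

import numpy as np

def item_analysis(responses: np.ndarray):
    """responses: (examinees x items) matrix of 0/1 scores."""
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = int(round(0.27 * len(totals)))       # upper/lower 27% groups
    lower, upper = order[:n_group], order[-n_group:]
    difficulty = responses.mean(axis=0) * 100      # percent answering correctly
    discrimination = (responses[upper].mean(axis=0)
                      - responses[lower].mean(axis=0))
    for i, (dif, dis) in enumerate(zip(difficulty, discrimination), start=1):
        level = ("Hard" if dif <= 44.50 else
                 "Optimum" if dif <= 74.50 else "Easy")
        action = "Acceptable" if dis >= 0.30 else "Change/Reconstruct"
        print(f"Q{i}: {dif:.1f}% ({level}), D = {dis:.2f} ({action})")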

Construct Validity. The Kaiser-Meyer-Olkin (KMO) test yielded an index of 0.752, described as middling adequacy, and Bartlett's test of sphericity was highly significant (χ²(861) = 1951, p < 0.001), supporting the use of Principal Component Analysis (PCA). PCA showed a 14-factor solution based on Kaiser's criterion and the scree plot test, which provides support for the construct validity of the mathematics achievement test. These factors accounted for 59.8% of the variance in scoring. Factor 1 was highly correlated with Q19, Q23, Q27, Q39, and Q44, and Factor 2 with Q3, Q7, and Q40. Q5, Q16, and Q34 were highly correlated with Factor 3, while Factor 4 was correlated with Q4, Q33, and Q42. Q24 and Q38 were highly correlated with Factor 5, and Q13 with Factor 6. Furthermore, Factor 7 was highly correlated with Q10 and Q54, and Q12 and Q56 were highly correlated with Factor 8.

Table 24
KMO and Bartlett’s Sphericity Test After the Detection of Bias Items
Test Value Description

Kaiser-Meyer-Olkin (KMO) 0.752 Middling Adequacy

Bartlett’s Sphericity p < 0.001 Highly significant

Moreover, Factor 9 was highly correlated with Q8 and Q41. Q1 tallied a negatively high correlation and Q47 a positively high correlation with Factor 10. Q21 was highly correlated with Factor 11, while Q17 was negatively highly correlated with Factor 12. Factor 13 was highly correlated with Q53, while Factor 14 was highly correlated with Q50 and Q58. No question was highly correlated with two or more factors. Thus, multicollinearity does not exist among the questions of the mathematics achievement test.

Figure 3
Scree Plot of the Test After the Detection of Bias Items

Concurrent Validity. An increase in the concurrent validity of the grade 7 mathematics achievement test is shown in Table 25. An r-value of 0.530 indicates a moderate positive correlation between the students' performance during the second quarter and their scores on the achievement test. As revealed by linear regression, the students' scores on the mathematics achievement test account for 28.1% of the variance in their academic performance during the second quarter.

Table 25
Concurrent Validity of the Grade 7 Mathematics Achievement Test
After the Detection of Bias Items
                                                         Variable Y: Score
Variable X                                               r         Sig.
Academic performance in Mathematics (Previous quarter)  .530**    .000
**Correlation was significant at the 0.01 level (2-tailed).

Pedrajita (2017) stated that the larger the concurrent validity, the less homogeneity there is among groups of test scores; the more homogeneous the test becomes, the less valid it becomes. The detection and removal of biased items using the Mantel-Haenszel chi-square contributed to an increase of 0.030 in the concurrent validity value, indicating lower test homogeneity but a more valid set of test questions. Thus, this result signifies that the detection and removal of biased items increases the validity of a standardized test.

Internal Consistency Reliability. The Glossary of Education Reform indicates that tests with more items have higher reliability. Pedrajita (2017) likewise emphasized that KR20 reliability analysis indicates that the greater the length of the test, the higher the internal consistency reliability. After the detection of biased items using the Mantel-Haenszel chi-square analysis, a decrease of 0.031 was computed, with 42 of the 60 items retained.

Table 26
Internal Consistency Reliability of the Mathematics Achievement Test
After the Detection of Bias Items
Reliability Test Value Interpretation
Kuder-Richardson 20 0.844 Good
(KR20)
5. Comparison of Content, Construct, and Concurrent Validity of the Original and Revised Test Versions

The main target of the study was to produce a 30-item validated and reliable achievement test in Mathematics 7, free from bias based on the students' type of school and learning styles. After validating and detecting bias using the Mantel-Haenszel chi-square analysis, a total of 31 items from the pool of 60 items covering the second quarter of the Most Essential Learning Competencies were retained.

Content Validity. Of the four items originally found non-essential to the MELCs, the subject matter, and the level of grade 7 students, none remained: no items were tallied as non-essential by the experts. Thus, 100% of the items, or a total of 31 items, were accepted and retained in the revised achievement test.

Table 27
Comparison of Content Validity of the Grade 7 Mathematics Achievement
Original and Revised Test Versions
Original Test Revised Test
Content Validation (n=60) (n=31)
Frequency Percentage Frequency Percentage
Accepted (Essential) 56 93.3 31 100
Rejected (Non – essential) 4 6.7 0 0

The number of optimum questions was reduced by 11 from the original test after the validity tests and the detection of biased items using the MH chi-square analysis, with 26 optimum questions retained. Only four items tagged as hard on the difficulty index were retained for optimization, as shown in Table 28. The sole easy question, Q16, was likewise retained for optimization.

Table 28
Comparison of Difficulty Index of the Grade 7 Mathematics Achievement
Original and Revised Test Versions
Original Test Revised Test
Difficulty Index (n=60) (n=31)
Frequency Percentage Frequency Percentage
19.51 - 44.50 (Hard) 22 36.7 4 12.9
44.51-74.50 (Optimum) 37 61.7 26 83.87
74.51-89.5 (Easy) 1 1.7 1 3.23

Table 29 shows a reduction of nine acceptable items based on the discrimination index.

Table 29
Comparison of Discrimination Index of the Grade 7 Mathematics Achievement
Original and Revised Test Versions
Original Test Revised Test
Discrimination Index
Frequency Percentage Frequency Percentage
Change/Reconstruct(Below .30) 22 36.7 2 6.45
Acceptable/Retained(.30 and Above) 38 63.3 29 93.55

Two of the 22 questions below the threshold were modified and changed, measuring the same competencies, and were included in the final revision of the test. The final achievement test for Mathematics 7 is now composed of 93.55 percent acceptable items, while 6.45 percent are reconstructed items from the original pool of questions. These two questions are subject to revision and optimization through retesting, as part of the recommendations of the study.

In general, twenty-four of the retained questions, after the detection of biased items, tallied highly acceptable discrimination and difficulty indices. The questions automatically included in the achievement test in Mathematics 7 were Q3, Q4, Q5, Q7, Q10, Q13, Q19, Q23, Q25, Q26, Q27, Q30, Q34, Q35, Q36, Q38, Q39, Q40, Q42, Q44, Q45, Q49, Q52, and Q54.

Table 30
Itemized Difficulty and Discrimination Index of the Revised Version of the Grade 7
Mathematics Achievement Test
Former Item No.   Difficulty Index (%)   Discrimination Index   New Item No.
Q3 64.30% 0.43 Q1
Q4 44.90% 0.42 Q2
Q5 69.10% 0.4 Q3
Q7 50.20% 0.48 Q4
Q10 44.90% 0.34 Q5
Q12** 49.30% 0.15 Q6
Q13 53.10% 0.35 Q7
Q14* 40.60% 0.41 Q8
Q16* 76.80% 0.49 Q9
Q19 48.30% 0.43 Q10
Q23 53.10% 0.54 Q11
Q24* 42.50% 0.44 Q12
Q25 69.60% 0.43 Q13
Q26 57.00% 0.37 Q14
Q27 54.10% 0.43 Q15
Q28* 41.10% 0.32 Q16
Q30 52.70% 0.32 Q17
Q33* 42.00% 0.34 Q18
Q34 58.50% 0.43 Q19
Q35 72.50% 0.43 Q20
Q36 51.70% 0.45 Q21
Q38 56.00% 0.41 Q22
Q39 57.00% 0.52 Q23
Q40 64.30% 0.45 Q24
Q42 49.80% 0.42 Q25
Q44 58.00% 0.5 Q26
Q45 48.30% 0.48 Q27
Q47** 47.80% 0.24 Q28
Q49 48.30% 0.35 Q29
Q52 56.50% 0.51 Q30
Q54 46.40% 0.33 Q31
*Items with hard/easy item difficulty but with an acceptable index of discrimination are considered for inclusion but subject to revision
**Items with poor discrimination index but highly acceptable difficulty index considered for inclusion

It also shows that questions 14, 16, 24, 28, and 33 were among the 16.13 percent of the questions that need revision to achieve their maximum level of acceptability in terms of difficulty. These tasks were differentiating algebraic expressions, equations, and inequalities; deriving the laws of exponents; using models and algebraic methods to find the square of a binomial; finding the solution of a linear equation or inequality in one variable; and converting measurements from one unit to another in both the Metric and English systems, respectively.

Among the 6.45 percent of the items that needed to be reconstructed based on the discrimination index, as shown in Table 30, were Q12 and Q47. These items ask students to evaluate algebraic expressions for given values of the variables and to find the solution of a linear equation or inequality in one variable, respectively.

Construct Validity. After the elimination of items, the revised test version's Kaiser-Meyer-Olkin (KMO) test yielded an index of 0.806, described as adequate, and Bartlett's test of sphericity was highly significant (χ²(496) = 1520, p < 0.001), supporting the use of Principal Component Analysis (PCA). PCA showed only a 10-factor solution based on Kaiser's criterion and the scree plot test, which provides support for the construct validity of the mathematics achievement test, compared to the 21 factors of the original test version.

Table 31
KMO and Bartlett's Sphericity Test of the Revised Test Version
Test                          Value       Description
Kaiser-Meyer-Olkin (KMO)      0.806       Adequate
Bartlett's Sphericity         p < 0.001   Highly significant


These factors accounted for 57. 82% of the variance in scoring. Factor 1 was highly

correlated with Q3, Q5, and Q16. Factor 2 was highly correlated with Q27 and Q45. Factor

3 tallied highly correlated with Q4 and Q38. Q5 was highly correlated with Q13, while

Q28 and Q47 were highly correlated with Factor 6. It was also found that Factor 7 was

highly correlated with Q38 and Q40. Factors 8, 9 and 10 were highly correlated with Q12,

Q54, and Q8, respectively.

Figure 4
Scree Plot of the Revised Test Version

The original test version showed no items loading on more than one construct. In the revised test version, however, Q38 tallied high correlations with two factors, Factor 4 and Factor 7; hence, the item loads on more than one construct. Thus, it is suggested that the item be removed or reconstructed.
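A small sketch of this cross-loading check: flag any item whose loading is high on two or more retained components. The 0.40 threshold is an assumed convention; the study does not state the cutoff used to tag Q38.

import numpy as np

def flag_cross_loadings(loadings: np.ndarray, threshold: float = 0.40):
    """loadings: (items x factors) matrix of component loadings."""
    flagged = []
    for i, row in enumerate(np.abs(loadings), start=1):
        if (row >= threshold).sum() >= 2:     # loads highly on 2+ factors
            flagged.append(f"Q{i}")           # e.g., Q38 in the revised test
    return flagged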


Concurrent Validity. Table 32 shows the increase in the correlation of the revised test version of the Mathematics 7 achievement test, based on the concurrent validity obtained by comparing the test results with the academic performance of the students in the second quarter. An increase of 0.053 in the r-value indicates that, after the exclusion of the items identified as biased through the Mantel-Haenszel chi-square analysis, as well as of items with unacceptable difficulty and discrimination indices, the revised version of the achievement test in Mathematics 7 yields a higher correlation coefficient between the achievement test scores and the students' academic performance in the second quarter.

Table 32
Comparison of Concurrent Validity of Grade 7 Original and Revised Mathematics
Achievement Test Versions
                                         Original Test Score   Revised Test Score
Variable X                               r        Sig.         r        Sig.
Academic performance in Mathematics
(Previous quarter)                       .500**   .000         .553**   .000
**Correlation was significant at the 0.01 level (2-tailed).
As revealed by linear regression, the students' scores on the revised test version of the mathematics achievement test account for 30.6% of the variance in their performance during the second quarter, compared to 25% for the original test version. This also indicates that the group of test scores is less homogeneous on the revised test questions; thus, the test is more valid (Pedrajita, 2016). The result reflects that the validation process and the detection of item bias using the MH chi-square analysis strengthen the validity of a standardized test.

6. Comparison of Internal Consistency Reliability of the Original and Revised Test


Versions
The internal consistency reliability of both test versions was described as good; however, the revised test version decreased by 0.003 based on the KR20 analysis. This type of reliability analysis indicates that the higher the number of items retained (bias-free items), the higher the internal consistency reliability (Pedrajita, 2017; Glossary of Education Reform).

Table 33
Internal Consistency Reliability of the Original and Revised Test Versions
Original Test Revised Test
Reliability Test
Value Interpretation Value Interpretation
Kuder-Richardson 0.875 Good 0.872 Good
20 (KR20)

From the 60 items on the mathematics achievement test, 31 items were retained after testing content, construct, and concurrent validity as well as after the detection of biased items. Pedrajita's (2017) study makes a similar claim regarding the decrease in reliability: from a reliability of 0.71 with 50 items, a decrease of 0.14 in reliability was observed when 28 questions were retained using the Mantel-Haenszel analysis. Thus, a test may be made more reliable by increasing its length (Pedrajita, 2017; Ferguson & Takane, 1989).
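The length-reliability relationship cited here is commonly formalized by the Spearman-Brown prophecy formula; the sketch below illustrates that formula and is not a computation reported in the study.

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by
    length_factor, assuming comparable items."""
    r = reliability
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Example: shortening a 60-item test with KR20 = 0.875 to 31 items
# (length_factor = 31/60) predicts roughly 0.78, so the observed 0.872
# suggests the removed items contributed little to internal consistency.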

The final test version of the achievement test is composed of 31 items distributed across the 10-week Most Essential Learning Competencies (MELCs), as shown in Table 34. A sufficient number of items was tallied for weeks 4, 5, 9, and 10, while there is a need for two items in weeks 1, 7, and 8, and there are excess items for weeks 2, 3, and 6. Thus, the output recommends increasing the number of items in the original test version to meet the target number of items in the final revised test version.

Table 34
Table of Specification of the Revised Test Version of Achievement Test in
Mathematics 7
Week | MELCs | Number of Hours | Number of Items | Learning Levels (Item Distribution): K, C, Ap, An, S, E | TOTAL ITEMS
Approximates the measures of
quantities particularly length, 1
1 4 3 1
mass/weight, volume, time, angle,
temperature and rate.
Converts measurements from one unit
2 to another in both Metric and English 3 18 2
systems. 4 3 4
Solves problems involving conversion
2 20
of units of measurement.
Translates English phrases to
mathematical phrases and English 4
3 30
sentences to mathematical sentences, 21
and vice versa.
Illustrates and differentiates related
terms in algebra:
a. aⁿ where n is a positive integer, b. 4 3 5
constants and variables, c. literal
19
3 coefficients and numerical 5
coefficients, d. algebraic expressions,
terms and polynomials, e. number of
terms, degree of the term, and the
degree of the polynomial.
Evaluate algebraic expressions given
4 6
values of the variables
4 3 3
Add polynomials and subtract 7
4
polynomials 23
5 Derive the laws of an exponent. 9
Multiply and divide polynomials. 4 3 10 3
5
26
Uses models and algebraic methods to
find the: (a) product of two binomials;
(b) square of a binomial, (c) product of 12
6 4 4 13 27 11 31 5
the sum and difference of two terms;
(d) cube of a binomial (e) product of a
binomial and trinomial;
Solves problems involving algebraic
24 22
expressions.
Differentiates algebraic expressions,
8
7-8 equations, and inequalities 8 6 4
25
Illustrates linear equations and
inequality in one variable.
9-10 Finds the solution of a linear equation
28 14 16
or inequality in one variable.
Solves the linear equation or
inequality in one variable involving
8 6 15 6
absolute value by graphing and
algebraic method.
Solves problems involving equations 17
and inequalities in one variable 29
Total 40 31 2 6 8 5 4 6 31
Remarks: K-Knowledge, C-Comprehension, Ap-Application, An-Analysis, S-Synthesis, E-Evaluation

Chapter 5
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS

This chapter presents the summary of the study, the conclusions derived from the

findings, and the recommendations derived from the data which were shown, analyzed,

and interpreted.

Summary
The study was conducted to detect, analyze, and eliminate biased items in the

mathematics achievement test using the Mantel-Haenszel method as a basis for test

standardization.

It aimed to describe the students' profiles, such as type of school, learning styles,

and academic performance during the second quarter in Mathematics 7. This study used a

descriptive method to analyze the content, construct, and concurrent validity of a teacher-

made achievement test in mathematics. The internal consistency reliability and the

significantly biased items across the student profile are also described.

Three experts were asked to check the test's content validity, and the item difficulty and discrimination indices were used to identify construct validity. The relationship between the students' academic performance and their achievement test scores was used to describe the concurrent validity of the test. The Kuder-Richardson 20 was used to describe the internal consistency reliability of the achievement test. The Mantel-Haenszel chi-square was used to detect the biased items, while DIF analysis using the MH delta was used to analyze the magnitude of the biased items.

The respondents of the study were grade 7 students from public and private schools in the municipality of Victoria during school year 2021-2022.

Salient Findings of the Study


Based on the results of the study, the following are the salient findings:
1. Of the 207 qualified grade 7 students, 36.7% were visual learners, tallying the greatest number, while tactile learners tallied the least with 64 students, or 30.9% of the total. Most of the qualified learners came from public schools, with 115 students or 55.6%, while 44.4%, or 92 students, were from private schools. Moreover, among these qualified learners, 33.3% and 31.9% had second-quarter grades of 85-89 and 80-84, respectively. Only 12.6 percent, or 26 students, tallied grades of 75-79 for their performance in mathematics during the second quarter.

2. Three competent content validators found that, of the 60 items in the original test, four items were non-essential to the subject matter, the MELCs, and grade 7 students. These items were Q51, Q57, Q58, and Q60. Sixty-one point seven percent (61.7%), or 37 questions, achieved an optimum level on the difficulty index, while 22 questions, or 36.7% of the test items, were tagged as hard questions. On the other hand, Q16 was the only item tagged as easy in terms of the difficulty index. A total of 38 of the 60 questions on the original version of the test were suggested for retention, as they tallied acceptable discrimination indices. Principal Component Analysis (PCA) was used to describe the construct validity of the achievement test: a total of 21 factors described 64.86% of the total variance. No question was highly correlated with two or more factors; thus, multicollinearity does not exist among the questions of the mathematics achievement test.

The Pearson product-moment correlation was used to identify the concurrent validity between the original test version and the academic performance of the students in mathematics during the second quarter. A moderate positive correlation was determined, with an r-value of 0.500. Meanwhile, good internal consistency reliability was determined using KR20, with a value of 0.875.


3. The elimination of biased items from the original test version was done using the Mantel-Haenszel chi-square analysis, with the students' profiles as the basis. A total of 10 questions displayed statistical bias (MH chi-square greater than the critical value of 3.8415 at the 0.05 alpha level with one degree of freedom) based on the type of school. These are questions 2, 15, 18, 20, 22, 29, 32, 46, 51, and 59. A positive delta Mantel-Haenszel (MHD) value indicates that the bias was in favor of the public school students.

Between the auditory and non-auditory learning styles, only Q31, Q37, and Q57 of the 60 questions were tagged as significantly biased, in favor of the non-auditory learners, as they tallied positive MHD values. Between the visual and non-visual learners, only 6 of the 60 items were found statistically biased: questions 6, 9, 11, 31, 48, and 60. Q6, Q9, Q11, Q31, and Q48 were statistically biased in favor of the visual learners after tallying negative MHD values, while Q60 was tagged as biased in favor of the non-visual learners with a positive MHD. Q60 was also significantly biased in favor of the tactile learners, tallying a negative MHD. Q18 and Q29, together with Q60, were significantly biased between the tactile and non-tactile learners; Q18 and Q29 were biased in favor of the non-tactile learners.

In general, a total of 18 of the 60 questions obtained a significant MH chi-square value, indicating large amounts of Differential Item Functioning (DIF) and displaying statistical bias. These were Q2, Q6, Q9, Q11, Q15, Q18, Q20, Q22, Q29, Q31, Q32, Q37, Q46, Q48, Q51, Q57, Q59, and Q60.

4. After removing the 18 statistically biased items, Q58 was suggested for removal by the experts after being tallied as non-essential to the MELCs, the subject matter, and the level of grade 7 students. In terms of the difficulty index, 26 questions were at the optimum level, and 29 questions reached the acceptable value of 0.30 and above on the discrimination index. A fourteen-factor solution was established, describing a total of 59.8% of the total variance. No question was highly correlated with two or more factors; thus, multicollinearity does not exist among the questions of the mathematics achievement test. The achievement test denotes positive moderate concurrent validity after the elimination of statistically biased items, with a 0.530 r-value. A good level of internal consistency reliability was computed, with a KR20 value of 0.844 for the mathematics achievement test.

5. The revised test version is composed of 31 questions after the testing of validity and the detection of biased items. Four non-essential questions were removed from the original test version of the Mathematics 7 achievement test, yielding a total of 31 essential questions based on the experts' judgment of content validity. A total of 18 hard questions were removed, and only four such questions were retained in the revised version. Twenty-six of the 31 questions in the revised version were at the optimum level of the difficulty index, while the item tagged as easy was also retained for optimization. In terms of the discrimination index, only 2 of the 22 questions below the 0.30 discrimination index on the original test version were retained in the revised test version. On the other hand, 29 of the 38 questions with discrimination indices of 0.30 or greater on the original test version were retained. PCA showed only a 10-factor solution based on Kaiser's criterion and the scree plot test, which provides support for the construct validity of the mathematics achievement test, compared to the 21 factors of the original test version. A moderately positive correlation, with an increase of 0.053 in the r-value of the concurrent validity of the revised test version, was evident after the detection and elimination of biased items and the testing of validity.


6. The KR20 value denotes the good reliability of both test versions, implying that the

greater the number of items, the larger the reliability value. Thus, there was a 0.003

decrease in the revised test version in terms of the internal consistency reliability of the

achievement test.

Conclusions
As an outcome of the findings presented in the previous discussions, the following

conclusions are drawn:

1. Based on the results, it can be concluded that grade 7 students from public and private schools prefer to learn by seeing visuals such as graphs, pictures, and other visual instructional materials. This learning factor may have contributed to the biased items on the achievement test.

2. The validity and reliability process revealed that most of the developed items were crafted so that they measure what they are intended to measure, with consideration of the Most Essential Learning Competencies, the subject matter, and the level of grade 7 students. Thus, the majority of the questions were retained and subjected to bias elimination analysis.

3. As revealed by the study, validated test questions are not necessarily free from bias. The detection and removal of these biased items strengthens the validity of the questionnaire even after the usual way of standardizing a test.

4. Based on the results of the elimination of biased items, the test questions' validity and reliability are strengthened as the percentage of valid items that can be retained in the final revision of the test increases.

5. The final test version of the achievement test in Mathematics 7 is concluded to have a higher percentage of items with acceptable validity than the original test version.
6. A decrease in internal consistency reliability was observed in the revised test version as

compared to the original test version.

Recommendations
The following recommendations were formulated by the researcher based on the findings and conclusions of the study:

1. Teachers, as test developers, should consider additional student profile variables affecting the teaching-learning process as a basis for identifying biased items in assessments, to further improve students' performance.

2. The effectiveness of the classical way of validating a test was evident in the results of the study. Thus, the continued use of the process is still recommended in validating tests.

3. The use of Mantel-Haenszel analysis as a statistical method for identifying misbehaving test items is a valuable add-on process in test validation. Thus, teachers, as test validators, should engage in detecting and eliminating biased items to strengthen the validity of the questionnaire.

4. Re-administration of the achievement test by test developers after the detection of biased items should be performed to further optimize, refine, and purify the item content of the test.

5. The final test version should be subjected to re-administration to achieve standardization of the test.
