Solving Arithmetic Word Problems Using Natural Language Processing and Rule-Based Classification
Article in International Journal of Intelligent Systems and Applications in Engineering · March 2022
DOI: 10.18201/ijisae.2022.271
Abstract: In today's world, intelligent tutoring systems (ITS), computer-based training (CBT), and similar technologies are rapidly gaining popularity in both educational and professional settings, and an automatic solver for mathematical word problems is one of the most important subfields of ITS. Automatically solving mathematical word problems is a challenging research problem in artificial intelligence (AI) and its subfields such as natural language processing (NLP) and machine learning (ML), since understanding and extracting relevant information from unstructured text requires considerable logical skill. To date, much research in this area has focused on solving particular types of mathematical word problems, such as arithmetic, algebraic, geometric, and trigonometric word problems. In this paper, we present an approach to automatically solve arithmetic word problems. We use a rule-based approach to classify word problems: we propose various rules to establish the relationships and dependencies among key elements and classify the word problems into four categories (Change, Combine, Compare, and Division-Multiplication) and their subcategories to identify the desired operation among +, -, *, and /. The approach is limited to word problems involving a single operation and a single equation. Irrelevant information is also filtered out of the input problem text using manually created rules, so that only the relevant quantities are extracted. An equation is then formed from the relevant quantities and the predicted operation to obtain the final answer. The proposed system performs well compared to most similar systems on the standard SingleOp dataset, achieving an accuracy of 93.02%.
Keywords: solving arithmetic word problems, classification of word problems, rule-based information extraction, rule-based arithmetic
word problem solver.
This is an open access article under the CC BY-SA 4.0 license.
(https://creativecommons.org/licenses/by-sa/4.0/)
Table 1. Sample word problems from each category and subcategory

Change (Level 1)
• Change Plus (Level 2): "There were 14 kids on the soccer field. 22 kids decided to join in. Now how many kids are on the soccer field?" Relevant quantities: 14, 22; Operation: '+'; Equation: 14+22; Answer: 36
• Change Minus (Level 2): "Denise removes 5 bananas from a jar. There were originally 46 bananas in the jar. How many bananas are left in the jar?" Relevant quantities: 5, 46; Operation: '-'; Equation: 46-5; Answer: 41

Combine (Level 1)
• Combine Plus (Level 2): "A cake recipe requires 0.6 cup of sugar for the frosting and 0.2 cup of sugar for the cake. How much sugar is that altogether?" Relevant quantities: 0.6, 0.2; Operation: '+'; Equation: 0.6+0.2; Answer: 0.8
• Combine Minus (Level 2): "There are 40 boys and some girls on the playground. There are 117 children altogether. How many girls are on the playground?" Relevant quantities: 40, 117; Operation: '-'; Equation: 117-40; Answer: 77

Compare (Level 1)
• Compare Plus (Level 2): "Lucy has an aquarium with 212 fish. She wants to buy 68 more fish. How many fish would Lucy have then?" Relevant quantities: 212, 68; Operation: '+'; Equation: 212+68; Answer: 280
• Compare Minus (Level 2): "James has 232 balloons. Amy has 101 balloons. How many more balloons does James have than Amy?" Relevant quantities: 232, 101; Operation: '-'; Equation: 232-101; Answer: 131

Division-Multiplication (Level 1)
• Division (Level 2): "Betty has 24 oranges stored in boxes. If there are 3 boxes, how many oranges must go in each box?" Relevant quantities: 24, 3; Operation: '/'; Equation: 24/3; Answer: 8
• Multiplication (Level 2): "Jill invited 37 people to her birthday party. They each ate 8 pieces of pizza. How many pieces of pizza did they eat?" Relevant quantities: 37, 8; Operation: '*'; Equation: 37*8; Answer: 296
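Each example above reduces to choosing an operation and ordering the two relevant quantities. The following Python sketch is our own illustration of that final equation-forming step; the function name and the larger-first ordering heuristic for '-' and '/' are simplifying assumptions that happen to match the examples in Table 1, not the paper's actual Eaddmul/Esubdiv formats.

```python
# Sketch of the final step of a word-problem solver: given the two
# relevant quantities (in order of appearance) and the predicted
# operation, form the equation string and evaluate it.
# The larger-first ordering for '-' and '/' is a simplifying
# assumption that matches the examples in Table 1 (46-5, 24/3).

def form_equation(q1, q2, op):
    if op in ('-', '/'):
        q1, q2 = max(q1, q2), min(q1, q2)   # e.g. 46-5, not 5-46
    equation = f"{q1}{op}{q2}"
    answer = {'+': q1 + q2, '-': q1 - q2,
              '*': q1 * q2, '/': q1 / q2}[op]
    return equation, answer

print(form_equation(14, 22, '+'))   # ('14+22', 36)
print(form_equation(5, 46, '-'))    # ('46-5', 41)
```

Real systems must of course also decide the operand order from the problem text itself; the larger-first rule here is only a placeholder for that logic.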
…systems is to provide high-quality education to each student through computers.
We have tried to build an arithmetic word problem solver that classifies word problems according to the operations involved; our work is inspired by the research of [7-10]. After classifying the word problems, we solve each class independently. Table 1 shows some word problems belonging to the various categories and subcategories.
Drawing on various research studies, the authors in [7] categorized addition-subtraction type arithmetic word problems into four categories: CHANGE, EQUALIZING, COMBINE, and COMPARE. Through deeper analysis, they further subcategorized them as CHANGE (result unknown, change unknown, start unknown), EQUALIZING, COMBINE (combined value unknown, subset unknown), and COMPARE (difference unknown, compared quantity unknown, referent unknown) [7, Table 4.3]. Among these, the categories CHANGE, COMBINE, and COMPARE are also used by the authors in [11-13], though the subcategory names differ. The authors in [8, 9] perform a similar kind of classification with CHANGE, COMBINE, and COMPARE [8, Table 1], or with slightly different names [9, Table 2].
The authors in [14] first proposed a method that handles division-multiplication type problems along with addition-subtraction type problems. All the systems proposed in [2, 10, 14, 15, 16] can solve arithmetic problems involving any of the four basic operations, and the datasets they use are also similar. This body of research motivated us to solve addition, subtraction, multiplication, and division word problems with some new strategies.
The core of our technical approach is the set of classification features we propose, which is closely related to the human cognitive process of understanding natural-language word problems. We use several keyword-based cues to classify the word problems into four categories: Change, Combine, Compare, and Division-Multiplication. This concept is quite similar to the work of [10]. We further subcategorize these categories to identify the desired operation, using keyword-based cues, pattern-based cues, part-of-speech cues, etc. We also handle problems containing irrelevant quantities, using rule-based approaches to identify the relevant quantities, as discussed later in this paper. Defining rules for problems that require numerical reasoning or word knowledge proved quite difficult for our approach; although we tried to relate those problems to structural cues, such cues usually do not capture the required reasoning. The source code of our work is available at [43]. The key highlights of this research work are given below.
• Innovative extraction of classification features.
• Unique rules for removing irrelevant information.
The rest of the paper is organized as follows. Section 2 reviews previous work in the field of automatic math word problem solving. Section 3 describes our methodology in detail. Section 4 critically discusses the experimental results and analyses the errors of the proposed method. Finally, section 5 concludes the paper and outlines the future scope of the proposed method.

2. Related Work

We can broadly categorize the methodologies adopted in previous work into three groups: Symbolic Semantic Parsing, Structure Prediction, and Deep Learning. In Symbolic Semantic Parsing [17, 18, 19, 20, 21, 22], semantic parsing refers to the process of converting natural language text (parsing) into an intermediate logical form that captures the meaning of the input (semantics); we refer to the early work in this field as symbolic semantic parsing because the intermediate representation often includes human-readable symbols. Structure Prediction [2, 9, 15, 16, 23, 24, 25, 26, 27, 28, 29] refers to building data-driven models that align a simple intermediate representation with a vectorial representation of a word problem; the features used to convert the text to a vector were hand-engineered. Deep Learning [30] modelling is a recent neural
network-based approach for converting one sequence into another.
The authors in [17] first proposed an approach and built a system named "STUDENT", capable of reading, understanding, and evaluating a wide range of algebraic word problems (of certain specific structures involving times, rates, percentages, etc.) expressed in natural language (English), and giving the answer in natural language. The system consists of two programs: (i) "STUDENT", which converts the algebraic word problems into equation form, and (ii) "REMEMBER", which stores the global information needed to solve a particular word problem. The main disadvantage of the system is that it can solve only a very small number of problems, owing to its limited semantic base and global information. After the work of [17], the next notable work was done by [31], who discussed the issues children face while solving arithmetic word problems, one of which is establishing the relationship between conceptual and procedural knowledge. By analysing the characteristics of addition-subtraction word problems, they proposed a theoretical approach that solves problems by categorizing them into four categories, 'CHANGE', 'COMBINE', 'COMPARE', and 'EQUALIZING', each further divided into subcategories. The categories 'CHANGE' and 'EQUALIZING' are generally associated with actions that increment or decrement quantities, whereas 'COMBINE' and 'COMPARE' describe static relationships among quantities.
Another similar work was done by [32], who built a model named 'CHIPS', a simulation program grounded in child psychology that models how children solve word problems and the difficulties they encounter. Similar work was also carried out by [33] and [8], whose systems, named 'ARITHPRO' and 'WORDPRO' respectively, are based on theories from human cognitive science. To represent the meaning of a word problem, 'WORDPRO' uses a set of propositions. The system consists of four schemas: 'Change-in', 'Change-out', 'Combine', and 'Compare'. Conceptually, the 'Change' schema establishes the relationship among the 'Start-set' ('Jemmy had 5 apples'), the 'Transfer-set' ('Then, John gave her 4 apples'), and the 'Result-set' ('How many apples does Jemmy have now?'). The system solves a problem by following certain rules: 13 rules for 'meaning postulates', 12 rules for 'arithmetic strategies', and 11 rules for 'problem solving procedures'. These rules are applied sequentially for addition, change, and subtraction, depending on the content of the system's 'Short Term Memory' (STM). Just like 'CHIPS' and 'WORDPRO', 'ARITHPRO' could solve single-equation, single-operation word problems of addition-subtraction type. It also categorizes the word problems into three categories ('CHANGE', 'COMBINE', 'COMPARE'), the same as 'CHIPS'. Both systems had limitations concerning the change verb ('give') and the order of appearance of the problem sentences: the first sentence must mention the number of objects the owner had initially, and the second sentence must contain the change verb. The next remarkable research was done by [19], with the system 'ROBUST', which could understand free-format, multi-step arithmetic word problems containing irrelevant information. Although the system is based on propositional logic, it works well for multiple verbs and their corresponding operations. Instead of identifying the operation, 'ROBUST' uses the concept of schema; the author expanded the 'CHANGE' schema into six distinct categories ('Transfer-In-Ownership', 'Transfer-Out-Ownership', 'Transfer-In-Place', 'Transfer-Out-Place', 'Creation', and 'Termination') according to their role in the word problem. The author in [21] proposed an approach and came out with the system 'ARIS', which solves addition-subtraction problems by Symbolic Semantic Parsing. It represents the whole word problem as a logic template named a state, which consists of a set of entities, their attributes, containers, quantities, and their relationships. For example, in "Tom has 10 white kittens.", 'kitten' is the entity, 'white' is its attribute, and since the kittens belong to Tom, 'Tom' is treated as the container. They built an SVM-based classifier to identify the verb category. They also compiled a dataset named AI [36] of addition-subtraction problems and achieved remarkable accuracy on it.
Some notable work on tree-based methods (structure prediction) has been done by [2, 23, 24]. The main idea behind these works is to transform the arithmetic expression into an equivalent binary tree structure step by step, following a bottom-up approach, where the internal nodes represent operators and the leaves represent operands. The main advantage is that no additional annotations (equation templates, logic forms, or tags) are needed. The algorithmic approach they developed can solve multi-step, multi-operation arithmetic word problems, and the framework consists of two processing stages. In the first stage, the relevant quantities are extracted from the input text and placed at the bottom level of the tree, and the syntactically valid candidate trees with different internal nodes and structures are enumerated. In the second stage, a scoring function picks the best candidate tree, which is used to derive the final output. All the algorithms they developed follow a common strategy of building a local classifier to predict the operation between two quantities. The authors in [2] first proposed the expression-tree approach to solving arithmetic word problems. They trained a binary classifier to determine whether an extracted quantity is relevant, to minimize the search space: only the relevant quantities take part in tree construction and are placed at the bottom level, while the irrelevant quantities are eliminated. They introduced and proved several theorems to identify the operations between two relevant quantities along with their order of occurrence. They used a multiclass SVM to predict the operation and a binary SVM to identify relevant quantities. They outperformed all previous systems on the existing datasets and created two new datasets, Illinois-562 and Commoncore-600, consisting of more diverse and complex word problems; their system was more generalized, with minimal dataset dependency. They further extended their work to create a web-based MWP solver [23], which can solve a large number of word problems provided by common users; to handle queries asking for the operations between the numbers, they added a CFG parser to their existing MWP solver. Later, they also developed a system based on the theory of the 'Unit Dependency Graph' (UDG), which identifies the relationships and dependencies between the units of the quantities [24]. An extensive review of these works can be found in [37].
The authors in [15] first approached the task with template-based techniques (structure prediction) to solve algebraic word problems. Their research drew on three main fields of Natural Language Processing (NLP): Semantic Interpretation, Information Extraction, and Automatic Word Problem Solving. They used both supervised and semi-supervised learning methods, gathering problems and solutions from the website Algebra.com. However, the performance of their system fell short where additional background knowledge and domain knowledge were required. For example, "A painting is 20 inches tall and 25 inches wide. A
print of the painting is 35 inches tall, how wide is the print in inches?" However, their dataset included all four basic operations (+, -, *, /). Many equation-based systems can handle multiple simultaneous equations; however, if a template is not seen during the training phase, new templates cannot be generated at inference time. In structure prediction, every system had its own niche set of hand-crafted rules for developing the vector representation of a word problem; these systems were far more generalizable than their symbolic counterparts. There were also some attempts at modelling domain knowledge, either as constraints or by introducing new template elements [38].
In recent years, deep learning has gained remarkable popularity owing to its superior accuracy when the system is trained with enough data, and several efforts have been made to solve math word problems with it. The authors in [30] first proposed the Deep Neural Solver (DNS), which does not depend on hand-crafted features; this is considered a major contribution, since it requires no human intelligence for feature extraction. It directly translates the input word problem into the corresponding equation template using a Recurrent Neural Network (RNN) model, without any feature engineering. To improve the system's performance, they further proposed a hybrid model combining the RNN model with a similarity-based retrieval model. This model consists of a set of encoders and decoders; it also includes a classifier to determine the significance of each numerical quantity, and a TF-IDF similarity-based retrieval model to predict the associated question. Experiments on a large dataset showed, surprisingly, that both models outperform all the state-of-the-art models built on statistical learning methods.
RNN models are generally used for Seq2Seq modelling. Although they provide satisfying results on both small and large datasets, traditional ML models perform better on smaller datasets, owing to high lexical similarities [39].

3. Proposed Method

3.1. Problem Formulation

A single-operation, single-equation, single-step arithmetic word problem P can be defined as a sequence of n words {w0, w1, ..., wn-1} containing a set of quantities QP = {q0, q1, ..., qx-1}, where n > x. The quantities, i.e., the numeric values, appear in the quantity set in the order of appearance of the numerical entities in P [10]. The set of relevant quantities is defined as QP(rel) = {qs, qt}, where {qs, qt} ∊ QP, i.e., QP(rel) ⊆ QP.
Let PSingleOp be a set of arithmetic word problems. Each problem P ∊ PSingleOp can be solved by evaluating a correct mathematical equation E, formulated from the quantities of QP(rel) and one of the operators op ∊ {+, -, *, /}. The equation E for the problem P ∊ PSingleOp is formulated by applying one of the possible equation formats {Eaddmul, Esubdiv}, described in section 3.4.

3.2. System Overview

Fig 1 describes the overview of the system; the detailed workflow is explained in the following sections.

Fig 1. System overview of rule-based math word problem solver

3.3. Operation Prediction

Predicting the operation of the MWP is one of the major tasks. We use a multilevel classification framework similar to [10] for this task. For the level-1 classification framework, we manually studied the characteristics of each problem and, according to those characteristics, broadly categorized the problems. The four categories, as mentioned before, are Change, Combine, Compare, and Division-Multiplication; the first three cover addition-subtraction type problems, while the last covers all division-multiplication type problems. These categories are further divided into subcategories in the level-2 classification framework to determine the desired operation. We use keyword-based cues, positional cues, phrase cues, pattern cues, etc., to classify the word problems at multiple levels. We are indebted to [10], as we have reused some of their identified features in our proposed method along with various new cues and features.

3.3.1. Level-1 Classification Framework

I. Change

The category 'Change' can be defined as a set of actions that increments or decrements the quantity belonging to a particular entity or variable.
• Change Verb Keywords- "gives", "takes", "loses", "lost", "add", "join", "left", "shares", "eaten", etc.
• Change Non-Verb Keywords- "now", "change", "sum", "away", "rest", "off", "empty", etc.

II. Combine

The category 'Combine' refers to word problems related to the combination or collection of two or more entities. In this type of problem, either the combined numerical value of the participating entities is asked for, or the combined value and the value of one participating entity are given and the value of the other participating entity is asked for.
• Combine Keywords- "all", "total", "together", "altogether", etc.

III. Compare

The category 'Compare' represents questions in which one quantity is compared to another. The comparison is not always between two different entities; it can also be against the current numerical value associated with the state of the same entity. For example, in "Brenda starts with 7 Skittles. She buys 8 more. How many Skittles does Brenda end with?", the additional quantity 8 is compared to the current numerical state of the entity, 7, to find the actual answer.
• Comparative Adjectives or Adverbs- Any word carrying a comparative part-of-speech (POS) tag, such as "more", "less", "longer", "heavier", "fewer", etc.
• Associated Comparative Keywords- "another", "than", etc.
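The keyword cues above can be sketched as a simple matcher. The snippet below is our own minimal illustration, not the paper's code: the keyword lists are abbreviated from the bullets above, only the three addition-subtraction categories are shown, and the function merely counts cue-keyword occurrences per category; how ties and overlaps are resolved is a separate step.

```python
import re

# Minimal illustration of level-1 keyword matching (keyword lists
# abbreviated from the bullets above). For each addition-subtraction
# category, count how many cue keywords occur in the problem text.

KEYWORDS = {
    "change":  {"gives", "takes", "loses", "lost", "join", "left", "now", "away"},
    "combine": {"all", "total", "together", "altogether"},
    "compare": {"more", "less", "fewer", "longer", "heavier", "another", "than"},
}

def level1_scores(problem):
    tokens = re.findall(r"[a-z]+", problem.lower())
    return {cat: sum(tok in kws for tok in tokens)
            for cat, kws in KEYWORDS.items()}

print(level1_scores("Brenda starts with 7 Skittles. She buys 8 more. "
                    "How many Skittles does Brenda end with?"))
# {'change': 0, 'combine': 0, 'compare': 1}
```

A real implementation would also lemmatize tokens and handle the fourth category and multi-word cues; this sketch only shows the shape of the feature extraction.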
IV. Division-Multiplication

This category contains all the word problems of division-multiplication type; therefore, the features that identify it are quite different from those of the previous three. Multiplication is about combining equal parts to make a whole, and division is about separating a whole into equal parts. Division and multiplication are also the key operations for some specific types of word problems, such as calculating time and distance, calculating area, etc.
• "Equal part" Related Keywords- "each", "every", "per".
• Time & Distance Related Keywords- "mile", "kilometre", "meter", "minute", "hour", etc.
• Miscellaneous Div-Mul Keywords- "whole", "times", "row", "split", "divide", "cost", "square", "cover", "do", "feet", etc.
• Combined Div-Mul Keywords- <"sold" and "does">, <"shares" and "among">.

By analysing the dataset, we observed that one word problem may belong to multiple categories simultaneously, i.e., it may contain keywords representing two different categories. For example, in "There are 4 marbles. 7 marbles more are added. How many are there total?", the keywords "more" and "total" appear together in a single word problem, where "more" is generally used to identify "Compare" type problems and "total" indicates "Combine" type problems. To avoid such conflicts, we prioritized the categories based on the number of occurrences of the keywords belonging to each category. Since the keyword "more" identifies the category "Compare" more frequently than "total" identifies "Combine", priority is given to "Compare", and the above problem is therefore categorized as "Compare" type. We adopted the precedence rules for the different categories from the work of [10]: "Compare" has the highest priority, followed by "Division-Multiplication", "Combine", and "Change", respectively.

• Dealing with Overlapping Keywords- Since "Compare", "Combine", and "Change" are all categories of addition-subtraction type problems, their overlapping keyword features can easily be resolved by applying the category precedence. However, exceptions may occur in the case of Division-Multiplication: since some keywords may overlap between addition-multiplication and between subtraction-division, "Combine" or "Change" problems may be miscategorized as "Division-Multiplication", which has higher precedence than "Combine" and "Change". We handle these cases explicitly. For example, in "Linda has 34 candies. Chloe has 28. How many candies do they have in all?", the keyword "do" belongs to the "Division-Multiplication" category, whereas the keyword "all" belongs to the "Combine" category. Simply following the precedence table returns "Division-Multiplication", which is incorrect. Thus, to identify the categories uniquely, we must use combined keyword features. The list below displays all such conditions.
• Combined Explicit Combine Features- If the keyword pairs (i) <"do" and "all"> or (ii) <"do" and "altogether"> are both present in the word problem, the problem should belong to the category "Combine".
• Combined Explicit Change Features- If the keyword pairs (i) <"each" and "added">, (ii) <"miles" and "left">, or (iii) <"costs" and "change"> are both present in the word problem, it should be treated as a "Change" type problem.
When keywords appear both as single features and as part of combined features, combined features are evaluated first, followed by single features. For example, the combined feature <"costs" and "change"> has higher precedence than the single feature "costs".

3.3.2. Level-2 Classification Framework

For level-2 classification, we reuse the features identified in level-1 classification along with some extra features, including keyword-based cues, positional cues, phrase cues, pattern cues, and combinations of these cues. To apply the positional cues, we divide the input problem into two parts: the "story part", which contains all the sentences excluding the question sentence, and the "query part", which contains only the question sentence. The main objective of level-2 classification is to apply unique features, or combinations of features, to the level-1 classification output to identify which one operation among Addition, Subtraction, Multiplication, and Division the input question performs.

Addition-Subtraction

I. Compare

Following the concept discussed in the Level-1 Classification Framework (III. Compare) and analysing the "Compare" category [31, 8] and the "Comparison" category [9], we divided it into two subcategories: Comparative Addition and Comparative Subtraction.
• Keyword Based Cues- (i) Presence of the keyword "some" in a "Compare" type question always indicates the "Subtraction" operation. (ii) Presence of the keyword "another" always indicates the "Addition" operation, according to the dataset.
• Keyword Positional Cues- If the comparative adjectives or adverbs are present in the "query part", the operation should be "Subtraction".
• Combined Cues- If the comparative adjectives or adverbs are present in the "story part", whether the operation is "Addition" or "Subtraction" is decided by other cues. (i) If the comparison is between two different entities, and therefore the keyword "than" is present in the question: in this situation, keyword or positional cues are not sufficient to identify which entity is being compared to which. For example, consider the questions (a) "Ethan has 31 presents. Alissa has 22 more than Ethan. How many presents does Alissa have?" and (b) "Sean has 223 whistles. He has 95 more whistles than Charles. How many whistles does Charles have?": both questions look similar according to keyword and positional cues, but they clearly require different operations. To handle this scenario, we use pattern cues. (ii) If the comparison is made against the numerical value associated with the current state of the same entity, the operation should be "Addition". Algorithm 1 shows the procedure.

Algorithm 1: compare_type_pattern_cue (question, predicted_category)

Input: (i) Word problem after lower-casing the text. (ii) Category of the input problem, which is the output of Level-1 classification.
Output: The predicted operation of the word problem.
1. predicted_operation ⃪ ɸ “Division”. (ii) If the keyword “per” is present in the
2. persons[] ⃪ ɸ question, and
3. if(predicted_category == “compare” and “than” ∊ question), -If the keywords “far”, “miles”, “points” etc. are present in
then, the “query part” i.e., if the question is mainly asking about
4. resolve the co-references of the question.
distance related information, the operation should be
5. if(proper noun ∊ question),
“Multiplication”.
6. if(proper noun ∉ persons[]),
7. persons[] ⃪ P, where, P={P0, P1} is proper noun -If the keywords “long”, “minutes”, “gallons” etc. are
8. if(“than” → next == P0), then, present in the “query part” i.e., if the question is mainly
9. return(predicted_operation ⃪ “addition”) asking about time related information, the operation should
10. else,
11.     return(predicted_operation ⃪ “subtraction”)

II. Combine

As per the discussion of the Level-1 Classification Framework (II. Combine), we divided this category into two sub-categories, Combine Addition and Combine Subtraction.

• Keyword-Based Cues - The presence of the keyword “total” in “Combine”-type problems always indicates the “Addition” operation.
• Keyword Positional Cues -
(i) If the keyword “all” is present in the “story part” along with the keyword “will” in the question, it indicates the “Addition” operation; in that case, the absence of the keyword “will” indicates the “Subtraction” operation.
(ii) If the keyword “together” is present in the question, the operation should be “Addition”.
(iii) The presence of the keyword “all” or “altogether” in the “query part” of a “Combine”-type question always indicates the “Addition” operation.
(iv) If the keyword “altogether” is present in the “story part”, the operation should be “Subtraction”.

III. Change

According to the concept discussed in the Level-1 Classification Framework (I. Change), irrespective of the position of the unknown quantity, we have tried to find the features that are ultimately responsible for defining its sub-categories, Change Addition and Change Subtraction.

… be “Division”.
(iii) If the keyword “cost” is present in the question, and
- the phrase “how much” is present in the “query part”, the operation should be “Multiplication”;
- the phrase “how many” is present in the “query part”, the operation should be “Division”.
(iv) If the keyword “times” is present in the “story part”, the operation should be “Multiplication”.
(v) If the keyword “times” is present in the “query part”, and the keyword “will” is present in the question, the operation should be “Multiplication”; otherwise, the operation should be “Division”.
(vi) If the keyword “each” or “every” is present in the “story part”, and
- a numeric value is present in the “query part”, the operation should be “Multiplication”;
- the “Combine keywords”, i.e., “all”, “total”, “altogether”, are present in the question, the operation should be “Multiplication”;
- if none of the above two conditions is satisfied, there exist some multiplication and division problems that are indistinguishable based on keyword cues alone. Table 2 lists a few such cases, and Algorithm 2 describes the rules we propose to handle them.

Table 2. Comparing the item name in the “story part” and “query part” to identify the final operation
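As an illustration, the Division-Multiplication positional cues above can be encoded as an ordered rule cascade. This is only a sketch under our own assumptions: the function name `divmul_positional_cues` and the naive whitespace tokenization are ours, not the authors' implementation, which relies on spaCy-based processing.

```python
def divmul_positional_cues(story_part: str, query_part: str):
    """Rule sketch of positional cues (iii)-(vi); returns an operation or
    None when the cues are indistinguishable (Algorithm 2 handles that case)."""
    story = story_part.lower().split()      # naive tokenization, illustration only
    query = query_part.lower().split()
    question = story + query                # "the question" = story part + query part

    if "cost" in question:                  # cue (iii)
        if "how much" in query_part.lower():
            return "multiplication"
        if "how many" in query_part.lower():
            return "division"
    if "times" in story:                    # cue (iv)
        return "multiplication"
    if "times" in query:                    # cue (v): "will" decides the branch
        return "multiplication" if "will" in question else "division"
    if "each" in story or "every" in story:         # cue (vi)
        if any(tok.isdigit() for tok in query):     # numeric value in query part
            return "multiplication"
        if {"all", "total", "altogether"} & set(question):  # Combine keywords
            return "multiplication"
        return None                         # indistinguishable: defer to Algorithm 2
    return None
```

Because the cues are checked in a fixed order, an earlier rule shadows later ones; this mirrors the brittleness the paper notes for overlapping rule-based cues.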
International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2022, 10(1), 87–97 | 92
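The Combine-type cues of Section II can be sketched the same way. Again, this is a hedged illustration of the rule ordering, not the authors' code; the function name and tokenization are our own assumptions.

```python
def combine_operation(story_part: str, query_part: str) -> str:
    """Rule sketch for Combine-type problems (keyword and positional cues)."""
    story = story_part.lower().split()      # naive tokenization, illustration only
    query = query_part.lower().split()
    question = story + query                # "the question" = story part + query part

    # Keyword-based cue: "total" anywhere in the question always -> Addition.
    if "total" in question:
        return "addition"
    # Positional cue (i): "all" in the story part; "will" present -> Addition,
    # "will" absent -> Subtraction.
    if "all" in story:
        return "addition" if "will" in question else "subtraction"
    # Positional cue (ii): "together" in the question -> Addition.
    if "together" in question:
        return "addition"
    # Positional cue (iii): "all"/"altogether" in the query part -> Addition.
    if "all" in query or "altogether" in query:
        return "addition"
    # Positional cue (iv): "altogether" in the story part -> Subtraction.
    if "altogether" in story:
        return "subtraction"
    return "unknown"                        # no cue matched
```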
Algorithm 2: divmul_type_pattern_cue (story_part, query_part, predicted_category)

Inputs: (i) The story part of an input question. (ii) The query part of an input question. (iii) The category of the input problem, which is the output of the Level-1 classification.
Output: The predicted operation of the word problem.

1. predicted_operation ⃪ ɸ
2. item_name ⃪ ɸ
3. if(predicted_category == “div-mul” and <“each”/“every”> ∊ story_part), then,
4.     find the index of “each”/“every”
5.     if(noun(s) present between each_index+1 and the end of story_part),
6.         item_name ⃪ item_name + noun
7. find out the noun phrases ∊ query_part
8. find out the noun phrase that contains the wh-word
9. find out the rightmost noun present in the wh-noun phrase
10. if(item_name ≠ ɸ), then,
11.     find out the rightmost noun present in item_name
12.     if(item_name_rightmost_noun == wh_phrase_rightmost_noun), then,
13.         return(predicted_operation ⃪ “division”)
14.     else,
15.         return(predicted_operation ⃪ “multiplication”)
16. else,
17.     if(each_index-1 == noun), then,
18.         item_name ⃪ lemmatized(each_index → prev)
19.         if(item_name_rightmost_noun == wh_phrase_rightmost_noun), then,
20.             return(predicted_operation ⃪ “multiplication”)
21.         else,
22.             return(predicted_operation ⃪ “division”)

• Explicit Division-Multiplication Keyword Cues -
(i) The presence of the keywords “do”, “cover”, “far”, “row”, “will”, etc. in the question mostly indicates the “Multiplication” operation.
(ii) If the keyword “whole” is present in the question and the keyword “cover” is not, the operation should be “Division”.
(iii) The presence of the keywords “split”, “sold”, “fast”, etc. in a Division-Multiplication type question indicates the “Division” operation.

3.4. Identifying Relevant Quantities

After predicting the operation for a word problem, the next challenging task is to identify the relevant quantities, which are responsible for generating the final answer. A word problem may contain irrelevant quantities, and identifying irrelevant sentences is simpler than identifying irrelevant quantities. Here, irrelevant information means not only out-of-context information but also information that is important for the problem definition yet plays no part in answer generation.

By analysing the dataset, we observed that irrelevant information (or quantities) occurs in word problems involving all four operations, i.e., “Addition”, “Subtraction”, “Multiplication” and “Division”. Since the features used to identify these operations differ, the operations are handled in different ways when identifying irrelevant quantities. Depending on the characteristics of the questions containing irrelevant information, we divided them into three groups, “Addition-Subtraction”, “Division” and “Multiplication”, and propose specific, independent rules to filter out the irrelevant information (or quantities).

Pre-processing is the most important step for information extraction from the arithmetic word problems. In the first step of pre-processing, we perform co-reference resolution and substitution to replace pronouns with the relevant nouns, using NeuralCoref [40] for this purpose. After that, the input text is segregated into two parts, the “story part” and the “query part”. Further, the story part is divided into individual sentences. Then we eliminate the conjunctions that join two quantities and reconstruct the sentences, using spaCy’s dependency parser [41] for this purpose. For example, if a question contains the sentence “Carolyn starts with 47 marbles and 6 oranges.”, it is re-phrased as “Carolyn starts with 47 marbles.” and “Carolyn starts with 6 oranges.”

After pre-processing, we extract the information provided in the “query part”. We observed that this information spans up to four parameters, depending on the type of operation the question belongs to: location, primary entity, person(s) involved, and secondary entity. Not every problem contains all of these parameters. For example, in “There are 8 apples in a pile on the desk. Each apple comes in a package of 11. 5 apples are added to the pile. How many apples are there in the pile?”, considering the query part, “pile” is the location and “apples” is the primary entity. Here, location means the place, not a geographical location; it is identified by matching the POS pattern “determiner followed by a preposition and a noun” and then extracting only the noun from the matched phrase. Likewise, the primary entity is the entity the problem is asking about, and the secondary entity is any other entity apart from the primary one; both are identified by extracting nouns from noun phrases, while person name(s) are identified by extracting proper nouns from the “query part”.

Addition-Subtraction

The information extracted from the “query part” of Addition-Subtraction type problems mainly consists of three types of information, i.e., location, person(s) involved, and primary entity. Combinations of these are also possible.
(i) If both the location information and the primary entity information are present in the “query part”, then,
- First search for the location in the sentences of the “story part”. If a sentence contains the location, then search for the presence of the primary entity. If the primary entity is also there, the quantity belonging to that sentence is considered relevant. Even if the primary entity is not present in the sentence, the quantity in that sentence is still considered relevant because the location information is present; it is mapped to the entity name of the previously qualified sentence. Hence, location has higher precedence than the entity name.
(ii) If the location information is not present in the “query part”, but the primary entity and person name(s) information are present, then,
- First search for the person’s name(s) in the sentences of the “story part”. If a sentence contains the person’s name, then search for the presence of the primary entity. If the primary entity is also present in the sentence, the quantity belonging to that sentence is considered relevant. If the primary entity is not present in the sentence and only the person’s name is, the quantity in the sentence is still considered relevant, by mapping it to the entity name of the previously qualified sentence, provided no other entity name belongs to the same sentence. Hence, person name(s) have higher priority than the entity name.
(iii) If only the person’s name(s) related information is present in the “query part”, then,
- Consider relevant the quantities that belong to sentences containing the same person’s name(s) as those present in the “query part”.
(iv) If only the primary entity related information is present in the “query part”, then,
- Consider relevant the quantities that belong to sentences containing the same entity as the entity information present in the “query part”.

Division

By analysing the division problems, we observed that the information extracted from the “query part” consists of only two types of information, i.e., the primary entity and the secondary entity. A quantity belonging to a sentence of the “story part” is considered relevant if either of these entities is present in that sentence. Hence, both entities have equal priority.

… problems is limited to a single equation and a single operation. However, they also included word problems with irrelevant information to increase the complexity a step further. The dataset contains 562 word problems, consisting of 159 addition, 159 subtraction, 117 multiplication and 127 division type problems. Table 3 and Table 4 show the performance of the proposed method in predicting the desired operation.

Table 3. Performance of Operation Prediction for Each Operation.

Operation        Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
Addition         97.5           98.65           92.45        95.44
Subtraction      98.04          99.33           93.71        96.44
Multiplication   97.68          94.82           94.01        94.41
Division         99.28          99.2            97.64        98.41
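The Addition-Subtraction precedence rules of Section 3.4 can be sketched as a small filter. This is a simplification under our own assumptions: lower-cased substring matching stands in for the paper's POS- and noun-phrase-based extraction, the entity-mapping fallback is collapsed into the precedence check, and the function name is ours.

```python
def relevant_quantities(sentences, location=None, person=None, entity=None):
    """Return the quantities considered relevant, following the precedence
    location > person name > entity (rules (i)-(iv), simplified sketch)."""
    relevant = []
    for sent in sentences:
        s = sent.lower()
        if location is not None:
            keep = location in s            # rule (i): location has top precedence
        elif person is not None:
            keep = person.lower() in s      # rules (ii)-(iii): person name next
        else:
            keep = entity is not None and entity in s   # rule (iv): entity only
        if keep:
            # collect the numeric quantities of a qualifying sentence
            relevant.extend(int(tok) for tok in sent.split() if tok.isdigit())
    return relevant
```

On the paper's own example, the second sentence (“Each apple comes in a package of 11”) lacks the location “pile”, so its quantity 11 is filtered out as irrelevant.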
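Algorithm 2 (divmul_type_pattern_cue) above can be rendered roughly as follows. To keep the sketch self-contained and runnable, tokens arrive pre-tagged as (word, POS) pairs rather than coming from spaCy, `naive_lemma` is a toy stand-in for real lemmatization, and the rightmost noun of the whole query part approximates the rightmost noun of the wh-noun phrase; all of these are our assumptions, not the authors' implementation.

```python
def naive_lemma(word: str) -> str:
    # Hypothetical stand-in for proper lemmatization (strips a plural "s").
    return word[:-1] if word.endswith("s") else word

def divmul_type_pattern_cue(story_part, query_part, predicted_category):
    """story_part / query_part: lists of (word, POS) pairs.
    Returns "division", "multiplication", or None (no rule applies)."""
    if predicted_category != "div-mul":
        return None
    words = [w.lower() for w, _ in story_part]
    marker = "each" if "each" in words else "every" if "every" in words else None
    if marker is None:
        return None
    idx = words.index(marker)                          # line 4 of Algorithm 2
    # Lines 5-6: nouns after the marker form the candidate item name.
    item_name = [naive_lemma(w.lower())
                 for w, pos in story_part[idx + 1:] if pos == "NOUN"]
    # Lines 7-9 (simplified): rightmost noun of the query part, which
    # begins with the wh-word in these problems.
    query_nouns = [naive_lemma(w.lower()) for w, pos in query_part if pos == "NOUN"]
    wh_rightmost = query_nouns[-1] if query_nouns else None
    if item_name:                                      # lines 10-15
        return "division" if item_name[-1] == wh_rightmost else "multiplication"
    if idx > 0 and story_part[idx - 1][1] == "NOUN":   # lines 16-22
        item = naive_lemma(story_part[idx - 1][0].lower())
        return "multiplication" if item == wh_rightmost else "division"
    return None
```

In the paper the tags and noun phrases come from spaCy's tagger and parser, so errors there propagate into this rule (see the POS-tag failures in Section 4.2).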
… like POS tagging, dependency parsing, shallow semantic parsing, co-reference resolution, etc., to simplify the structure of the question and establish the relationships between the entities. These techniques play an important role in identifying relevant variables.

However, the performance of the current method is slightly lower than that of [10], because the method of [10] is a hybrid method (a combination of rule-based and machine learning approaches) in which several new concepts have been introduced, such as an object-oriented approach to modelling word problems belonging to different categories, RDBMS-based information storage, etc. Although it is ahead of our method in performance, its structure is quite complex, while our method works well with a simple system structure.

4.2. Error Analysis

The proposed method produced 40 errors on the SingleOp dataset. Of these, 32 are due to either being unable to predict any operation or predicting a wrong operation, and the remaining 8 are due to problems in identifying relevant quantities. Since the method is rule-based, it cannot predict any operation for a problem that does not fit under any rule. The origins of the errors are discussed below.

• Lack of world knowledge - There are 9 cases in which real-world knowledge is required to predict the operation of a problem or to identify the relevant quantities. For example, “There were 105 parents in the program and 698 pupils, too. How many people were present in the program?” To solve this, the method must know that “parents” and “pupils” are “people”.
• Lack of keyword cues - There are 10 cases in which no definite cues are present to identify the ultimate operation of the problem. For example, “Misha has 34 dollars. How many dollars does she have to earn to have 47 dollars to buy a dog?”
• Lack of numerical reasoning - There are three cases in which keyword or pattern cues alone are not sufficient to identify the relevant quantities present in the problem; some numerical reasoning is also required. For example, “Theresa has 32 crayons. Janice has 12 crayons. She shares 13 with Nancy. How many crayons will Theresa have?” The co-reference resolver, NeuralCoref, identifies “Janice” as the antecedent of “she”, but the actual antecedent should be “Theresa”.
• Overlapped rule-based cues - There are nine cases in which the method fails to predict the right operation of a problem because the problem falls under an incorrect rule.
• Logical errors - There are two cases in which the method fails to identify relevant quantities due to logical errors. Four more errors occur in word problems that are characteristically different from the rest, have an inappropriate question structure, etc.
• Wrongly identified POS tags - There are three cases in which the method fails due to wrong identification of part-of-speech tags by spaCy’s POS tagger [42]. For example, “Emily collects 63 cards. Emily's father gives Emily 7 more. Bruce has 13 apples. How many cards does Emily have?” Here, the POS tagger returns the POS of “Emily” in the query sentence as “ADV” (adverb) and is therefore unable to identify the person’s name.

5. Conclusion and Future Work

In this paper, we present several algorithms built on multiple rules to solve the word problems of the SingleOp dataset, which involve one equation and one operation. Our work focuses on identifying important features and establishing relationships and dependencies among them to solve the word problems step by step. However, the main challenge was to identify the relevant quantities that are important for the final generation of the answer.

The proposed method performs relatively well in both predicting the operations and identifying the relevant quantities, although performance deteriorates when the relevant quantities are identified incorrectly due to errors in predicting the operations and the lack of other inferences. Operation prediction works more accurately for division-type problems (see Table 4) because most division-type input problems contain fewer ambiguous clues, and the problems do not require additional background knowledge to be solved. Notwithstanding the flaws inherent in the method, it outperforms most work published on the same dataset. Although the performance of our method is quite impressive, it depends completely on hand-crafted features and rules. Our work can be further extended in numerous ways, as explained below.

• The hand-generated features could be used to train a classifier that automatically predicts the operation of a word problem using a machine learning approach.
• A numerical inference module could be introduced to improve the algorithm's ability to identify relevant quantities, thus avoiding the errors caused by incorrect resolution of co-references.
• An inference module for world knowledge could also be integrated into our method, with the main purpose of dealing with problems that require additional background knowledge to be solved.
• The concept of intelligent explanation of the solution could also be implemented; this module would show the solutions step by step.

References

[1] L. Verschaffel, B. Greer, and E. De Corte. Making sense of word problems. Lisse, The Netherlands: Swets and Zeitlinger, 2000, doi:10.1023/A:1004190927303.
[2] S. Roy and D. Roth. Solving general arithmetic word problems. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 1743–1752, doi:10.18653/v1/D15-1202.
[3] M. J. Nathan. Knowledge and situational feedback in a learning environment for algebra story problem solving. Interactive Learn. Environ., vol. 5, no. 1, pp. 135–159, 1998, doi:10.1080/1049482980050110.
[4] D. Arnau, M. Arevalillo-Herráez, L. Puig, and J. A. González-Calero. Fundamentals of the design and the operation of an intelligent tutoring system for the learning of the arithmetical and algebraic way of solving word problems. Comput. & Educ., vol. 63, pp. 119–130, Apr. 2013, doi:10.1016/j.compedu.2012.11.020.
[5] D. Arnau, M. Arevalillo-Herráez, and J. A. González-Calero. Emulating human supervision in an intelligent tutoring system for arithmetical problem solving. IEEE Trans. Learn. Technol., vol. 7, no. 2, pp. 155–164, Apr./Jun. 2014, doi:10.1109/TLT.2014.2307306.
[6] C. R. Beal. AnimalWatch: An intelligent tutoring system for algebra readiness. In Int. Handbook of Metacognition and Learning Technologies. Springer, Mar. 2013, pp. 337–348, doi:10.1007/978-1-4419-5546-3_22.
[7] M. S. Riley, J. G. Greeno, and J. I. Heller. Development of children's problem-solving ability in arithmetic. Univ. of Pittsburgh, Pittsburgh, PA, USA, Tech. Rep. LRDC-1984/37, 1984. [Online]. Available: https://files.eric.ed.gov/fulltext/ED252410.pdf
[8] C. R. Fletcher. Understanding and solving arithmetic word problems: A computer simulation. Behav. Res. Methods, Instrum., & Comput., vol. 17, no. 5, pp. 565–571, Sep. 1985, doi:10.3758/BF03207654.
[9] A. Mitra and C. Baral. Learning to use formulas to solve simple arithmetic problems. In Proc. 54th Annu. Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, Aug. 7–12, 2016, pp. 2144–2153, doi:10.18653/v1/P16-1202.
[10] S. Mandal and S. K. Naskar. Classifying and solving arithmetic math word problems - an intelligent math solver. IEEE Trans. Learn. Technol., vol. 14, no. 1, pp. 28–41, Feb. 2021, doi:10.1109/TLT.2021.3057805.
[11] T. P. Carpenter, J. Hiebert, and J. M. Moser. Problem structure and first-grade children's initial solution processes for simple addition and subtraction problems. J. Res. Math. Educ., pp. 27–39, Jan. 1981, doi:10.5951/jresematheduc.24.5.0428.
[12] P. Nesher, J. G. Greeno, and M. S. Riley. The development of semantic categories for addition and subtraction. Educational Stud. Math., vol. 13, no. 4, pp. 373–394, Nov. 1982, doi:10.1007/BF00366618.
[13] G. Vergnaud. A classification of cognitive tasks and operations of thought involved in addition and subtraction problems. In Addition and Subtraction: A Cognitive Perspective, pp. 39–59, 1982, doi:10.4324/9781003046585-4.
[14] T. P. Carpenter, E. Ansell, M. L. Franke, E. Fennema, and L. Weisbeck. Models of problem solving: A study of kindergarten children's problem-solving processes. J. Res. Math. Educ., pp. 428–441, Nov. 1993, doi:10.5951/jresematheduc.24.5.0428.
[15] N. Kushman, L. Zettlemoyer, R. Barzilay, and Y. Artzi. Learning to automatically solve algebra word problems. In Proc. 52nd Annu. Meeting of the Association for Computational Linguistics (ACL), Baltimore, MD, USA, Jun. 22–27, 2014, pp. 271–281, doi:10.3115/v1/P14-1026.
[16] R. Koncel-Kedziorski, H. Hajishirzi, A. Sabharwal, O. Etzioni, and S. D. Ang. Parsing algebraic word problems into equations. Trans. Assoc. Comput. Linguistics, vol. 3, pp. 585–597, Dec. 2015, doi:10.1162/tacl_a_00160.
[17] D. G. Bobrow. Natural language input for a computer problem solving system. 1964.
[18] E. Charniak. Computer solution of calculus word problems. 1968.
[19] Y. Bakman. Robust understanding of word problems with extraneous information. arXiv preprint math/0701393, 2007.
[20] C. Liguda and T. Pfeiffer. Modeling math word problems with augmented semantic networks. In G. Bouma, A. Ittoo, E. Métais, and H. Wortmann (Eds.), Natural Language Processing and Information Systems (NLDB 2012), Lecture Notes in Computer Science, vol. 7337. Springer, Berlin, Heidelberg, 2012, pp. 247–252.
[21] M. J. Hosseini, H. Hajishirzi, O. Etzioni, and N. Kushman. Learning to solve arithmetic word problems with verb categorization. In Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 25–29, 2014, pp. 523–533. [Online]. Available: http://aclweb.org/anthology/D/D14/D14-1058.pdf
[22] S. Shi, Y. Wang, C. Lin, X. Liu, and Y. Rui. Automatically solving number word problems by semantic parsing and reasoning. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 1132–1142. [Online]. Available: http://aclweb.org/anthology/D/D15/D15-1135.pdf
[23] S. Roy and D. Roth. Illinois Math Solver: Math reasoning on the web. In Proc. Demonstrations Session, NAACL HLT 2016, San Diego, CA, USA, Jun. 12–17, 2016, pp. 52–56. [Online]. Available: http://aclweb.org/anthology/N/N16/N16-3011.pdf
[24] S. Roy and D. Roth. Unit dependency graph and its application to arithmetic word problem solving. In Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, Feb. 4–9, 2017, pp. 3082–3088. [Online]. Available: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14764
[25] S. Roy, T. Vieira, and D. Roth. Reasoning about quantities in natural language. Trans. Assoc. Comput. Linguistics, vol. 3, pp. 1–13, 2015. [Online]. Available: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/452
[26] S. Roy and D. Roth. Mapping to declarative knowledge for word problem solving. Trans. Assoc. Comput. Linguistics, vol. 6, pp. 159–172, 2018.
[27] L. Zhou, S. Dai, and L. Chen. Learn to solve algebra word problems using quadratic programming. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 817–822.
[28] S. Upadhyay and M. Chang. Annotating derivations: A new evaluation strategy and dataset for algebra word problems. 2016. [Online]. Available: http://arxiv.org/abs/1609.07197
[29] D. Huang, S. Shi, C. Lin, J. Yin, and W. Ma. How well do computers solve math word problems? Large-scale dataset construction and evaluation. In Proc. 54th Annu. Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, Aug. 2016. [Online]. Available: http://aclweb.org/anthology/P/P16/P16-1084.pdf
[30] Y. Wang, X. Liu, and S. Shi. Deep neural solver for math word problems. In Proc. 2017 Conf. Empirical Methods in Natural Language Processing (EMNLP), pp. 845–854, 2017. [Online]. Available: https://www.aclweb.org/anthology/D17-1088.pdf
[31] M. S. Riley et al. Development of children's problem-solving ability in arithmetic. 1984.
[32] D. J. Briars and J. H. Larkin. An integrated model of skill in solving elementary word problems. Cognition and Instruction, vol. 1, no. 3, pp. 245–296, 1984.
[33] D. Dellarosa. A computer simulation of children's arithmetic word-problem solving. Behav. Res. Methods, Instrum., & Comput., vol. 18, no. 2, pp. 147–154, 1986.