Solving Arithmetic Word Problems Using Natural Language Processing and Rule-Based Classification
Article in International Journal of Intelligent Systems and Applications in Engineering · March 2022
DOI: 10.18201/ijisae.2022.271
Abstract: In today's world, intelligent tutoring systems (ITS), computer-based training (CBT), and similar technologies are rapidly gaining popularity in both educational and professional settings, and an automatic solver for mathematical word problems is one of the most important subfields of ITS. Automatically solving mathematical word problems is a challenging research problem in artificial intelligence (AI) and its subfields such as natural language processing (NLP) and machine learning (ML), since understanding and extracting relevant information from unstructured text requires considerable logical skill. To date, much research in this area has focused on solving particular types of mathematical word problems, such as arithmetic, algebraic, geometric, and trigonometric word problems. In this paper, we present an approach to automatically solve arithmetic word problems. We use a rule-based approach to classify word problems: we propose various rules to establish the relationships and dependencies among key elements and classify the word problems into four categories (Change, Combine, Compare, and Division-Multiplication) and their subcategories to identify the desired operation among +, -, *, and /. The approach is limited to word problems involving a single operation and a single equation. Irrelevant information is also filtered out of the input problem text using manually created rules, so that only the relevant quantities are extracted. An equation is then formed from the relevant quantities and the predicted operation to obtain the final answer. The proposed system performs well compared to most similar systems on the standard SingleOp dataset, achieving an accuracy of 93.02%.
Keywords: solving arithmetic word problems, classification of word problems, rule-based information extraction, rule-based arithmetic
word problem solver.
This is an open access article under the CC BY-SA 4.0 license.
(https://creativecommons.org/licenses/by-sa/4.0/)
Table 1. Sample word problems from each category and subcategory

Change (Level 1)
• Change Plus (Level 2): "There were 14 kids on the soccer field. 22 kids decided to join in. Now how many kids are on the soccer field?" Relevant quantities: 14, 22; Operation: '+'; Equation: 14+22; Answer: 36
• Change Minus (Level 2): "Denise removes 5 bananas from a jar. There were originally 46 bananas in the jar. How many bananas are left in the jar?" Relevant quantities: 5, 46; Operation: '-'; Equation: 46-5; Answer: 41

Combine (Level 1)
• Combine Plus (Level 2): "A cake recipe requires 0.6 cup of sugar for the frosting and 0.2 cup of sugar for the cake. How much sugar is that altogether?" Relevant quantities: 0.6, 0.2; Operation: '+'; Equation: 0.6+0.2; Answer: 0.8
• Combine Minus (Level 2): "There are 40 boys and some girls on the playground. There are 117 children altogether. How many girls are on the playground?" Relevant quantities: 40, 117; Operation: '-'; Equation: 117-40; Answer: 77

Compare (Level 1)
• Compare Plus (Level 2): "Lucy has an aquarium with 212 fish. She wants to buy 68 more fish. How many fish would Lucy have then?" Relevant quantities: 212, 68; Operation: '+'; Equation: 212+68; Answer: 280
• Compare Minus (Level 2): "James has 232 balloons. Amy has 101 balloons. How many more balloons does James have than Amy?" Relevant quantities: 232, 101; Operation: '-'; Equation: 232-101; Answer: 131

Division-Multiplication (Level 1)
• Division (Level 2): "Betty has 24 oranges stored in boxes. If there are 3 boxes, how many oranges must go in each box?" Relevant quantities: 24, 3; Operation: '/'; Equation: 24/3; Answer: 8
• Multiplication (Level 2): "Jill invited 37 people to her birthday party. They each ate 8 pieces of pizza. How many pieces of pizza did they eat?" Relevant quantities: 37, 8; Operation: '*'; Equation: 37*8; Answer: 296
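Each example above reduces to choosing an operation and ordering the two relevant quantities. The following Python sketch is our own illustration of that final equation-forming step; the function name and the larger-first ordering heuristic for '-' and '/' are simplifying assumptions that happen to match the examples in Table 1, not the paper's actual Eaddmul/Esubdiv formats.

```python
# Sketch of the final step of a word-problem solver: given the two
# relevant quantities (in order of appearance) and the predicted
# operation, form the equation string and evaluate it.
# The larger-first ordering for '-' and '/' is a simplifying
# assumption that matches the examples in Table 1 (46-5, 24/3).

def form_equation(q1, q2, op):
    if op in ('-', '/'):
        q1, q2 = max(q1, q2), min(q1, q2)   # e.g. 46-5, not 5-46
    equation = f"{q1}{op}{q2}"
    answer = {'+': q1 + q2, '-': q1 - q2,
              '*': q1 * q2, '/': q1 / q2}[op]
    return equation, answer

print(form_equation(14, 22, '+'))   # ('14+22', 36)
print(form_equation(5, 46, '-'))    # ('46-5', 41)
```

Real systems must of course also decide the operand order from the problem text itself; the larger-first rule here is only a placeholder for that logic.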
…systems is to provide high-quality education to each student through computers.
We have tried to build an arithmetic word problem solver that classifies word problems according to the operations involved; our work is inspired by the research of [7-10]. After classifying the word problems, we solve each class independently. Table 1 shows some word problems belonging to the various categories and subcategories.
Drawing on various research studies, the authors in [7] categorized addition-subtraction type arithmetic word problems into four categories: CHANGE, EQUALIZING, COMBINE, and COMPARE. Through deeper analysis, they further subcategorized them as CHANGE (result unknown, change unknown, start unknown), EQUALIZING, COMBINE (combined value unknown, subset unknown), and COMPARE (difference unknown, compared quantity unknown, referent unknown) [7, Table 4.3]. Among these, the categories CHANGE, COMBINE, and COMPARE are also used by the authors in [11-13], though the subcategory names differ. The authors in [8, 9] perform a similar kind of classification with CHANGE, COMBINE, and COMPARE [8, Table 1], or with slightly different names [9, Table 2].
The authors in [14] first proposed a method that handles division-multiplication type problems along with addition-subtraction type problems. All the systems proposed in [2, 10, 14, 15, 16] can solve arithmetic problems involving any of the four basic operations, and the datasets they use are also similar. This body of research motivated us to solve addition, subtraction, multiplication, and division word problems with some new strategies.
The core of our technical approach is the set of classification features we propose, which is closely related to the human cognitive process of understanding natural-language word problems. We use several keyword-based cues to classify the word problems into four categories: Change, Combine, Compare, and Division-Multiplication. This concept is quite similar to the work of [10]. We further subcategorize these categories to identify the desired operation, using keyword-based cues, pattern-based cues, part-of-speech cues, etc. We also handle problems containing irrelevant quantities, using rule-based approaches to identify the relevant quantities, as discussed later in this paper. Defining rules for problems that require numerical reasoning or word knowledge proved quite difficult for our approach; although we tried to relate those problems to structural cues, such cues usually do not capture the required reasoning. The source code of our work is available at [43]. The key highlights of this research work are given below.
• Innovative extraction of classification features.
• Unique rules for removing irrelevant information.
The rest of the paper is organized as follows. Section 2 reviews previous work in the field of automatic math word problem solving. Section 3 describes our methodology in detail. Section 4 critically discusses the experimental results and analyses the errors of the proposed method. Finally, section 5 concludes the paper and outlines the future scope of the proposed method.

2. Related Work

We can broadly categorize the methodologies adopted in previous work into three groups: Symbolic Semantic Parsing, Structure Prediction, and Deep Learning. In Symbolic Semantic Parsing [17, 18, 19, 20, 21, 22], semantic parsing refers to the process of converting natural language text (parsing) into an intermediate logical form that captures the meaning of the input (semantics); we refer to the early work in this field as symbolic semantic parsing because the intermediate representation often includes human-readable symbols. Structure Prediction [2, 9, 15, 16, 23, 24, 25, 26, 27, 28, 29] refers to building data-driven models that align a simple intermediate representation with a vectorial representation of a word problem; the features used to convert the text to a vector were hand-engineered. Deep Learning [30] modelling is a recent neural
network-based approach for converting one sequence into another.
The authors in [17] first proposed an approach and built a system named "STUDENT", capable of reading, understanding, and evaluating a wide range of algebraic word problems (of certain specific structures involving times, rates, percentages, etc.) expressed in natural language (English), and giving the answer in natural language. The system consists of two programs: (i) "STUDENT", which converts the algebraic word problems into equation form, and (ii) "REMEMBER", which stores the global information needed to solve a particular word problem. The main disadvantage of the system is that it can solve only a very small number of problems, owing to its limited semantic base and global information. After the work of [17], the next notable work was done by [31], who discussed the issues children face while solving arithmetic word problems, one of which is establishing the relationship between conceptual and procedural knowledge. By analysing the characteristics of addition-subtraction word problems, they proposed a theoretical approach that solves problems by categorizing them into four categories, 'CHANGE', 'COMBINE', 'COMPARE', and 'EQUALIZING', each further divided into subcategories. The categories 'CHANGE' and 'EQUALIZING' are generally associated with actions that increment or decrement quantities, whereas 'COMBINE' and 'COMPARE' describe static relationships among quantities.
Another similar work was done by [32], who built a model named 'CHIPS', a simulation program grounded in child psychology that models how children solve word problems and the difficulties they encounter. Similar work was also carried out by [33] and [8], whose systems, named 'ARITHPRO' and 'WORDPRO' respectively, are based on theories from human cognitive science. To represent the meaning of a word problem, 'WORDPRO' uses a set of propositions. The system consists of four schemas: 'Change-in', 'Change-out', 'Combine', and 'Compare'. Conceptually, the 'Change' schema establishes the relationship among the 'Start-set' ('Jemmy had 5 apples'), the 'Transfer-set' ('Then, John gave her 4 apples'), and the 'Result-set' ('How many apples does Jemmy have now?'). The system solves a problem by following certain rules: 13 rules for 'meaning postulates', 12 rules for 'arithmetic strategies', and 11 rules for 'problem solving procedures'. These rules are applied sequentially for addition, change, and subtraction, depending on the content of the system's 'Short Term Memory' (STM). Just like 'CHIPS' and 'WORDPRO', 'ARITHPRO' could solve single-equation, single-operation word problems of addition-subtraction type. It also categorizes the word problems into three categories ('CHANGE', 'COMBINE', 'COMPARE'), the same as 'CHIPS'. Both systems had limitations concerning the change verb ('give') and the order of appearance of the problem sentences: the first sentence must mention the number of objects the owner had initially, and the second sentence must contain the change verb. The next remarkable research was done by [19], with the system 'ROBUST', which could understand free-format, multi-step arithmetic word problems containing irrelevant information. Although the system is based on propositional logic, it works well for multiple verbs and their corresponding operations. Instead of identifying the operation, 'ROBUST' uses the concept of schema; the author expanded the 'CHANGE' schema into six distinct categories ('Transfer-In-Ownership', 'Transfer-Out-Ownership', 'Transfer-In-Place', 'Transfer-Out-Place', 'Creation', and 'Termination') according to their role in the word problem. The author in [21] proposed an approach and came out with the system 'ARIS', which solves addition-subtraction problems by Symbolic Semantic Parsing. It represents the whole word problem as a logic template named a state, which consists of a set of entities, their attributes, containers, quantities, and their relationships. For example, in "Tom has 10 white kittens.", 'kitten' is the entity, 'white' is its attribute, and since the kittens belong to Tom, 'Tom' is treated as the container. They built an SVM-based classifier to identify the verb category. They also compiled a dataset named AI [36] of addition-subtraction problems and achieved remarkable accuracy on it.
Some notable work on tree-based methods (structure prediction) has been done by [2, 23, 24]. The main idea behind these works is to transform the arithmetic expression into an equivalent binary tree structure step by step, following a bottom-up approach, where the internal nodes represent operators and the leaves represent operands. The main advantage is that no additional annotations (equation templates, logic forms, or tags) are needed. The algorithmic approach they developed can solve multi-step, multi-operation arithmetic word problems, and the framework consists of two processing stages. In the first stage, the relevant quantities are extracted from the input text and placed at the bottom level of the tree, and the syntactically valid candidate trees with different internal nodes and structures are enumerated. In the second stage, a scoring function picks the best candidate tree, which is used to derive the final output. All the algorithms they developed follow a common strategy of building a local classifier to predict the operation between two quantities. The authors in [2] first proposed the expression-tree approach to solving arithmetic word problems. They trained a binary classifier to determine whether an extracted quantity is relevant, to minimize the search space: only the relevant quantities take part in tree construction and are placed at the bottom level, while the irrelevant quantities are eliminated. They introduced and proved several theorems to identify the operations between two relevant quantities along with their order of occurrence. They used a multiclass SVM to predict the operation and a binary SVM to identify relevant quantities. They outperformed all previous systems on the existing datasets and created two new datasets, Illinois-562 and Commoncore-600, consisting of more diverse and complex word problems; their system was more generalized, with minimal dataset dependency. They further extended their work to create a web-based MWP solver [23], which can solve a large number of word problems provided by common users; to handle queries asking for the operations between the numbers, they added a CFG parser to their existing MWP solver. Later, they also developed a system based on the theory of the 'Unit Dependency Graph' (UDG), which identifies the relationships and dependencies between the units of the quantities [24]. An extensive review of these works can be found in [37].
The authors in [15] first approached the task with template-based techniques (structure prediction) to solve algebraic word problems. Their research drew on three main fields of Natural Language Processing (NLP): Semantic Interpretation, Information Extraction, and Automatic Word Problem Solving. They used both supervised and semi-supervised learning methods, gathering problems and solutions from the website Algebra.com. However, the performance of their system fell short where additional background knowledge and domain knowledge were required. For example, "A painting is 20 inches tall and 25 inches wide. A
print of the painting is 35 inches tall, how wide is the print in inches?" However, their dataset included all four basic operations (+, -, *, /). Many equation-based systems can handle multiple simultaneous equations; however, if a template is not seen during the training phase, new templates cannot be generated at inference time. In structure prediction, every system had its own niche set of hand-crafted rules for developing the vector representation of a word problem; these systems were far more generalizable than their symbolic counterparts. There were also some attempts at modelling domain knowledge, either as constraints or by introducing new template elements [38].
In recent years, deep learning has gained remarkable popularity owing to its superior accuracy when the system is trained with enough data, and several efforts have been made to solve math word problems with it. The authors in [30] first proposed the Deep Neural Solver (DNS), which does not depend on hand-crafted features; this is considered a major contribution, since it requires no human intelligence for feature extraction. It directly translates the input word problem into the corresponding equation template using a Recurrent Neural Network (RNN) model, without any feature engineering. To improve the system's performance, they further proposed a hybrid model combining the RNN model with a similarity-based retrieval model. This model consists of a set of encoders and decoders; it also includes a classifier to determine the significance of each numerical quantity, and a TF-IDF similarity-based retrieval model to predict the associated question. Experiments on a large dataset showed, surprisingly, that both models outperform all the state-of-the-art models built on statistical learning methods.
RNN models are generally used for Seq2Seq modelling. Although they provide satisfying results on both small and large datasets, traditional ML models perform better on smaller datasets, owing to high lexical similarities [39].

3. Proposed Method

3.1. Problem Formulation

A single-operation, single-equation, single-step arithmetic word problem P can be defined as a sequence of n words {w0, w1, ..., wn-1} containing a set of quantities QP = {q0, q1, ..., qx-1}, where n > x. The quantities, i.e., the numeric values, appear in the quantity set in the order of appearance of the numerical entities in P [10]. The set of relevant quantities is defined as QP(rel) = {qs, qt}, where {qs, qt} ∊ QP, i.e., QP(rel) ⊆ QP.
Let PSingleOp be a set of arithmetic word problems. Each problem P ∊ PSingleOp can be solved by evaluating a correct mathematical equation E, formulated from the quantities of QP(rel) and one of the operators op ∊ {+, -, *, /}. The equation E for the problem P ∊ PSingleOp is formulated by applying one of the possible equation formats {Eaddmul, Esubdiv}, described in section 3.4.

3.2. System Overview

Fig 1 describes the overview of the system; the detailed workflow is explained in the following sections.

Fig 1. System overview of rule-based math word problem solver

3.3. Operation Prediction

Predicting the operation of the MWP is one of the major tasks. We use a multilevel classification framework similar to [10] for this task. For the level-1 classification framework, we manually studied the characteristics of each problem and, according to those characteristics, broadly categorized the problems. The four categories, as mentioned before, are Change, Combine, Compare, and Division-Multiplication; the first three cover addition-subtraction type problems, while the last covers all division-multiplication type problems. These categories are further divided into subcategories in the level-2 classification framework to determine the desired operation. We use keyword-based cues, positional cues, phrase cues, pattern cues, etc., to classify the word problems at multiple levels. We are indebted to [10], as we have reused some of their identified features in our proposed method along with various new cues and features.

3.3.1. Level-1 Classification Framework

I. Change

The category 'Change' can be defined as a set of actions that increments or decrements the quantity belonging to a particular entity or variable.
• Change Verb Keywords- "gives", "takes", "loses", "lost", "add", "join", "left", "shares", "eaten", etc.
• Change Non-Verb Keywords- "now", "change", "sum", "away", "rest", "off", "empty", etc.

II. Combine

The category 'Combine' refers to word problems related to the combination or collection of two or more entities. In this type of problem, either the combined numerical value of the participating entities is asked for, or the combined value and the value of one participating entity are given and the value of the other participating entity is asked for.
• Combine Keywords- "all", "total", "together", "altogether", etc.

III. Compare

The category 'Compare' represents questions in which one quantity is compared to another. The comparison is not always between two different entities; it can also be against the current numerical value associated with the state of the same entity. For example, in "Brenda starts with 7 Skittles. She buys 8 more. How many Skittles does Brenda end with?", the additional quantity 8 is compared to the current numerical state of the entity, 7, to find the actual answer.
• Comparative Adjectives or Adverbs- Any word carrying a comparative part-of-speech (POS) tag, such as "more", "less", "longer", "heavier", "fewer", etc.
• Associated Comparative Keywords- "another", "than", etc.
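The keyword cues above can be sketched as a simple matcher. The snippet below is our own minimal illustration, not the paper's code: the keyword lists are abbreviated from the bullets above, only the three addition-subtraction categories are shown, and the function merely counts cue-keyword occurrences per category; how ties and overlaps are resolved is a separate step.

```python
import re

# Minimal illustration of level-1 keyword matching (keyword lists
# abbreviated from the bullets above). For each addition-subtraction
# category, count how many cue keywords occur in the problem text.

KEYWORDS = {
    "change":  {"gives", "takes", "loses", "lost", "join", "left", "now", "away"},
    "combine": {"all", "total", "together", "altogether"},
    "compare": {"more", "less", "fewer", "longer", "heavier", "another", "than"},
}

def level1_scores(problem):
    tokens = re.findall(r"[a-z]+", problem.lower())
    return {cat: sum(tok in kws for tok in tokens)
            for cat, kws in KEYWORDS.items()}

print(level1_scores("Brenda starts with 7 Skittles. She buys 8 more. "
                    "How many Skittles does Brenda end with?"))
# {'change': 0, 'combine': 0, 'compare': 1}
```

A real implementation would also lemmatize tokens and handle the fourth category and multi-word cues; this sketch only shows the shape of the feature extraction.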
IV. Division-Multiplication

This category contains all the word problems of division-multiplication type; therefore, the features that identify it are quite different from those of the previous three. Multiplication is about combining equal parts to make a whole, and division is about separating a whole into equal parts. Division and multiplication are also the key operations for some specific types of word problems, such as calculating time and distance, calculating area, etc.
• "Equal part" Related Keywords- "each", "every", "per".
• Time & Distance Related Keywords- "mile", "kilometre", "meter", "minute", "hour", etc.
• Miscellaneous Div-Mul Keywords- "whole", "times", "row", "split", "divide", "cost", "square", "cover", "do", "feet", etc.
• Combined Div-Mul Keywords- <"sold" and "does">, <"shares" and "among">.

By analysing the dataset, we observed that one word problem may belong to multiple categories simultaneously, i.e., it may contain keywords representing two different categories. For example, in "There are 4 marbles. 7 marbles more are added. How many are there total?", the keywords "more" and "total" appear together in a single word problem, where "more" is generally used to identify "Compare" type problems and "total" indicates "Combine" type problems. To avoid such conflicts, we prioritized the categories based on the number of occurrences of the keywords belonging to each category. Since the keyword "more" identifies the category "Compare" more frequently than "total" identifies "Combine", priority is given to "Compare", and the above problem is therefore categorized as "Compare" type. We adopted the precedence rules for the different categories from the work of [10]: "Compare" has the highest priority, followed by "Division-Multiplication", "Combine", and "Change", respectively.

• Dealing with Overlapping Keywords- Since "Compare", "Combine", and "Change" are all categories of addition-subtraction type problems, their overlapping keyword features can easily be resolved by applying the category precedence. However, exceptions may occur in the case of Division-Multiplication: since some keywords may overlap between addition-multiplication and between subtraction-division, "Combine" or "Change" problems may be miscategorized as "Division-Multiplication", which has higher precedence than "Combine" and "Change". We handle these cases explicitly. For example, in "Linda has 34 candies. Chloe has 28. How many candies do they have in all?", the keyword "do" belongs to the "Division-Multiplication" category, whereas the keyword "all" belongs to the "Combine" category. Simply following the precedence table returns "Division-Multiplication", which is incorrect. Thus, to identify the categories uniquely, we must use combined keyword features. The list below displays all such conditions.
• Combined Explicit Combine Features- If the keyword pairs (i) <"do" and "all"> or (ii) <"do" and "altogether"> are both present in the word problem, the problem should belong to the category "Combine".
• Combined Explicit Change Features- If the keyword pairs (i) <"each" and "added">, (ii) <"miles" and "left">, or (iii) <"costs" and "change"> are both present in the word problem, it should be treated as a "Change" type problem.
When keywords appear both as single features and as part of combined features, combined features are evaluated first, followed by single features. For example, the combined feature <"costs" and "change"> has higher precedence than the single feature "costs".

3.3.2. Level-2 Classification Framework

For level-2 classification, we reuse the features identified in level-1 classification along with some extra features, including keyword-based cues, positional cues, phrase cues, pattern cues, and combinations of these cues. To apply the positional cues, we divide the input problem into two parts: the "story part", which contains all the sentences excluding the question sentence, and the "query part", which contains only the question sentence. The main objective of level-2 classification is to apply unique features, or combinations of features, to the level-1 classification output to identify which one operation among Addition, Subtraction, Multiplication, and Division the input question performs.

Addition-Subtraction

I. Compare

Following the concept discussed in the Level-1 Classification Framework (III. Compare) and analysing the "Compare" category [31, 8] and the "Comparison" category [9], we divided it into two subcategories: Comparative Addition and Comparative Subtraction.
• Keyword Based Cues- (i) Presence of the keyword "some" in a "Compare" type question always indicates the "Subtraction" operation. (ii) Presence of the keyword "another" always indicates the "Addition" operation, according to the dataset.
• Keyword Positional Cues- If the comparative adjectives or adverbs are present in the "query part", the operation should be "Subtraction".
• Combined Cues- If the comparative adjectives or adverbs are present in the "story part", whether the operation is "Addition" or "Subtraction" is decided by other cues. (i) If the comparison is between two different entities, and therefore the keyword "than" is present in the question: in this situation, keyword or positional cues are not sufficient to identify which entity is being compared to which. For example, consider the questions (a) "Ethan has 31 presents. Alissa has 22 more than Ethan. How many presents does Alissa have?" and (b) "Sean has 223 whistles. He has 95 more whistles than Charles. How many whistles does Charles have?": both questions look similar according to keyword and positional cues, but they clearly require different operations. To handle this scenario, we use pattern cues. (ii) If the comparison is made against the numerical value associated with the current state of the same entity, the operation should be "Addition". Algorithm 1 shows the procedure.

Algorithm 1: compare_type_pattern_cue (question, predicted_category)

Input: (i) Word problem after lower-casing the text. (ii) Category of the input problem, which is the output of Level-1 classification.
Output: The predicted operation of the word problem.
1. predicted_operation ⃪ ɸ “Division”. (ii) If the keyword “per” is present in the
2. persons[] ⃪ ɸ question, and
3. if(predicted_category == “compare” and “than” ∊ question), -If the keywords “far”, “miles”, “points” etc. are present in
then, the “query part” i.e., if the question is mainly asking about
4. resolve the co-references of the question.
distance related information, the operation should be
5. if(proper noun ∊ question),
“Multiplication”.
6. if(proper noun ∉ persons[]),
7. persons[] ⃪ P, where, P={P0, P1} is proper noun -If the keywords “long”, “minutes”, “gallons” etc. are
8. if(“than” → next == P0), then, present in the “query part” i.e., if the question is mainly
9. return(predicted_operation ⃪ “addition”) asking about time related information, the operation should
10. else,
11.     return(predicted_operation ⃪ “subtraction”)

II. Combine

As per the discussion of the Level-1 Classification Framework (II. Combine), we divided this category into two sub-categories, Combine Addition and Combine Subtraction.

• Keyword-Based Cues - The presence of the keyword “total” in “Combine”-type problems always indicates the “Addition” operation.
• Keyword Positional Cues -
(i) If the keyword “all” is present in the “story part” along with the keyword “will” in the question, it indicates the “Addition” operation; in that case, the absence of the keyword “will” indicates the “Subtraction” operation.
(ii) If the keyword “together” is present in the question, the operation should be “Addition”.
(iii) The presence of the keyword “all” or “altogether” in the “query part” of a “Combine”-type question always indicates the “Addition” operation.
(iv) If the keyword “altogether” is present in the “story part”, the operation should be “Subtraction”.

III. Change

According to the concept discussed in the Level-1 Classification Framework (I. Change), irrespective of the position of the unknown quantity, we have tried to find the features that are ultimately responsible for defining its sub-categories, Change Addition and Change Subtraction.

… be “Division”.
(iii) If the keyword “cost” is present in the question, and
- the phrase “how much” is present in the “query part”, the operation should be “Multiplication”;
- the phrase “how many” is present in the “query part”, the operation should be “Division”.
(iv) If the keyword “times” is present in the “story part”, the operation should be “Multiplication”.
(v) If the keyword “times” is present in the “query part”, and the keyword “will” is present in the question, the operation should be “Multiplication”; otherwise, the operation should be “Division”.
(vi) If the keyword “each” or “every” is present in the “story part”, and
- a numeric value is present in the “query part”, the operation should be “Multiplication”;
- the “Combine keywords”, i.e., “all”, “total”, “altogether”, are present in the question, the operation should be “Multiplication”;
- if none of the above two conditions is satisfied, there exist some multiplication and division problems that are indistinguishable based on keyword cues alone. Table 2 lists a few such cases, and Algorithm 2 describes the rules we propose to handle them.

Table 2. Comparing the item name in the “story part” and “query part” to identify the final operation
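As an illustration, the Division-Multiplication positional cues above can be encoded as an ordered rule cascade. This is only a sketch under our own assumptions: the function name `divmul_positional_cues` and the naive whitespace tokenization are ours, not the authors' implementation, which relies on spaCy-based processing.

```python
def divmul_positional_cues(story_part: str, query_part: str):
    """Rule sketch of positional cues (iii)-(vi); returns an operation or
    None when the cues are indistinguishable (Algorithm 2 handles that case)."""
    story = story_part.lower().split()      # naive tokenization, illustration only
    query = query_part.lower().split()
    question = story + query                # "the question" = story part + query part

    if "cost" in question:                  # cue (iii)
        if "how much" in query_part.lower():
            return "multiplication"
        if "how many" in query_part.lower():
            return "division"
    if "times" in story:                    # cue (iv)
        return "multiplication"
    if "times" in query:                    # cue (v): "will" decides the branch
        return "multiplication" if "will" in question else "division"
    if "each" in story or "every" in story:         # cue (vi)
        if any(tok.isdigit() for tok in query):     # numeric value in query part
            return "multiplication"
        if {"all", "total", "altogether"} & set(question):  # Combine keywords
            return "multiplication"
        return None                         # indistinguishable: defer to Algorithm 2
    return None
```

Because the cues are checked in a fixed order, an earlier rule shadows later ones; this mirrors the brittleness the paper notes for overlapping rule-based cues.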
International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2022, 10(1), 87–97 | 92
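The Combine-type cues of Section II can be sketched the same way. Again, this is a hedged illustration of the rule ordering, not the authors' code; the function name and tokenization are our own assumptions.

```python
def combine_operation(story_part: str, query_part: str) -> str:
    """Rule sketch for Combine-type problems (keyword and positional cues)."""
    story = story_part.lower().split()      # naive tokenization, illustration only
    query = query_part.lower().split()
    question = story + query                # "the question" = story part + query part

    # Keyword-based cue: "total" anywhere in the question always -> Addition.
    if "total" in question:
        return "addition"
    # Positional cue (i): "all" in the story part; "will" present -> Addition,
    # "will" absent -> Subtraction.
    if "all" in story:
        return "addition" if "will" in question else "subtraction"
    # Positional cue (ii): "together" in the question -> Addition.
    if "together" in question:
        return "addition"
    # Positional cue (iii): "all"/"altogether" in the query part -> Addition.
    if "all" in query or "altogether" in query:
        return "addition"
    # Positional cue (iv): "altogether" in the story part -> Subtraction.
    if "altogether" in story:
        return "subtraction"
    return "unknown"                        # no cue matched
```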
Algorithm 2: divmul_type_pattern_cue (story_part, query_part, predicted_category)

Inputs: (i) The story part of an input question. (ii) The query part of an input question. (iii) The category of the input problem, which is the output of the Level-1 classification.
Output: The predicted operation of the word problem.

1. predicted_operation ⃪ ɸ
2. item_name ⃪ ɸ
3. if(predicted_category == “div-mul” and <“each”/“every”> ∊ story_part), then,
4.     find the index of “each”/“every”
5.     if(noun(s) present between each_index+1 and the end of story_part),
6.         item_name ⃪ item_name + noun
7. find out the noun phrases ∊ query_part
8. find out the noun phrase that contains the wh-word
9. find out the rightmost noun present in the wh-noun phrase
10. if(item_name ≠ ɸ), then,
11.     find out the rightmost noun present in item_name
12.     if(item_name_rightmost_noun == wh_phrase_rightmost_noun), then,
13.         return(predicted_operation ⃪ “division”)
14.     else,
15.         return(predicted_operation ⃪ “multiplication”)
16. else,
17.     if(each_index-1 == noun), then,
18.         item_name ⃪ lemmatized(each_index → prev)
19.         if(item_name_rightmost_noun == wh_phrase_rightmost_noun), then,
20.             return(predicted_operation ⃪ “multiplication”)
21.         else,
22.             return(predicted_operation ⃪ “division”)

• Explicit Division-Multiplication Keyword Cues -
(i) The presence of the keywords “do”, “cover”, “far”, “row”, “will”, etc. in the question mostly indicates the “Multiplication” operation.
(ii) If the keyword “whole” is present in the question and the keyword “cover” is not, the operation should be “Division”.
(iii) The presence of the keywords “split”, “sold”, “fast”, etc. in a Division-Multiplication type question indicates the “Division” operation.

3.4. Identifying Relevant Quantities

After predicting the operation for a word problem, the next challenging task is to identify the relevant quantities, which are responsible for generating the final answer. A word problem may contain irrelevant quantities, and identifying irrelevant sentences is simpler than identifying irrelevant quantities. Here, irrelevant information means not only out-of-context information but also information that is important for the problem definition yet plays no part in answer generation.

By analysing the dataset, we observed that irrelevant information (or quantities) occurs in word problems involving all four operations, i.e., “Addition”, “Subtraction”, “Multiplication” and “Division”. Since the features used to identify these operations differ, the operations are handled in different ways when identifying irrelevant quantities. Depending on the characteristics of the questions containing irrelevant information, we divided them into three groups, “Addition-Subtraction”, “Division” and “Multiplication”, and propose specific, independent rules to filter out the irrelevant information (or quantities).

Pre-processing is the most important step for information extraction from the arithmetic word problems. In the first step of pre-processing, we perform co-reference resolution and substitution to replace pronouns with the relevant nouns, using NeuralCoref [40] for this purpose. After that, the input text is segregated into two parts, the “story part” and the “query part”. Further, the story part is divided into individual sentences. Then we eliminate the conjunctions that join two quantities and reconstruct the sentences, using spaCy’s dependency parser [41] for this purpose. For example, if a question contains the sentence “Carolyn starts with 47 marbles and 6 oranges.”, it is re-phrased as “Carolyn starts with 47 marbles.” and “Carolyn starts with 6 oranges.”

After pre-processing, we extract the information provided in the “query part”. We observed that this information spans up to four parameters, depending on the type of operation the question belongs to: location, primary entity, person(s) involved, and secondary entity. Not every problem contains all of these parameters. For example, in “There are 8 apples in a pile on the desk. Each apple comes in a package of 11. 5 apples are added to the pile. How many apples are there in the pile?”, considering the query part, “pile” is the location and “apples” is the primary entity. Here, location means the place, not a geographical location; it is identified by matching the POS pattern “determiner followed by a preposition and a noun” and then extracting only the noun from the matched phrase. Likewise, the primary entity is the entity the problem is asking about, and the secondary entity is any other entity apart from the primary one; both are identified by extracting nouns from noun phrases, while person name(s) are identified by extracting proper nouns from the “query part”.

Addition-Subtraction

The information extracted from the “query part” of Addition-Subtraction type problems mainly consists of three types of information, i.e., location, person(s) involved, and primary entity. Combinations of these are also possible.
(i) If both the location information and the primary entity information are present in the “query part”, then,
- First search for the location in the sentences of the “story part”. If a sentence contains the location, then search for the presence of the primary entity. If the primary entity is also there, the quantity belonging to that sentence is considered relevant. Even if the primary entity is not present in the sentence, the quantity in that sentence is still considered relevant because the location information is present; it is mapped to the entity name of the previously qualified sentence. Hence, location has higher precedence than the entity name.
(ii) If the location information is not present in the “query part”, but the primary entity and person name(s) information are present, then,
- First search for the person’s name(s) in the sentences of the “story part”. If a sentence contains the person’s name, then search for the presence of the primary entity. If the primary entity is also present in the sentence, the quantity belonging to that sentence is considered relevant. If the primary entity is not present in the sentence and only the person’s name is, the quantity in the sentence is still considered relevant, by mapping it to the entity name of the previously qualified sentence, provided no other entity name belongs to the same sentence. Hence, person name(s) have higher priority than the entity name.
(iii) If only the person’s name(s) related information is present in the “query part”, then,
- Consider relevant the quantities that belong to sentences containing the same person’s name(s) as those present in the “query part”.
(iv) If only the primary entity related information is present in the “query part”, then,
- Consider relevant the quantities that belong to sentences containing the same entity as the entity information present in the “query part”.

Division

By analysing the division problems, we observed that the information extracted from the “query part” consists of only two types of information, i.e., the primary entity and the secondary entity. A quantity belonging to a sentence of the “story part” is considered relevant if either of these entities is present in that sentence. Hence, both entities have equal priority.

… problems is limited to a single equation and a single operation. However, they also included word problems with irrelevant information to increase the complexity a step further. The dataset contains 562 word problems, consisting of 159 addition, 159 subtraction, 117 multiplication and 127 division type problems. Table 3 and Table 4 show the performance of the proposed method in predicting the desired operation.

Table 3. Performance of Operation Prediction for Each Operation.

Operation        Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
Addition         97.5           98.65           92.45        95.44
Subtraction      98.04          99.33           93.71        96.44
Multiplication   97.68          94.82           94.01        94.41
Division         99.28          99.2            97.64        98.41
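The Addition-Subtraction precedence rules of Section 3.4 can be sketched as a small filter. This is a simplification under our own assumptions: lower-cased substring matching stands in for the paper's POS- and noun-phrase-based extraction, the entity-mapping fallback is collapsed into the precedence check, and the function name is ours.

```python
def relevant_quantities(sentences, location=None, person=None, entity=None):
    """Return the quantities considered relevant, following the precedence
    location > person name > entity (rules (i)-(iv), simplified sketch)."""
    relevant = []
    for sent in sentences:
        s = sent.lower()
        if location is not None:
            keep = location in s            # rule (i): location has top precedence
        elif person is not None:
            keep = person.lower() in s      # rules (ii)-(iii): person name next
        else:
            keep = entity is not None and entity in s   # rule (iv): entity only
        if keep:
            # collect the numeric quantities of a qualifying sentence
            relevant.extend(int(tok) for tok in sent.split() if tok.isdigit())
    return relevant
```

On the paper's own example, the second sentence (“Each apple comes in a package of 11”) lacks the location “pile”, so its quantity 11 is filtered out as irrelevant.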
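Algorithm 2 (divmul_type_pattern_cue) above can be rendered roughly as follows. To keep the sketch self-contained and runnable, tokens arrive pre-tagged as (word, POS) pairs rather than coming from spaCy, `naive_lemma` is a toy stand-in for real lemmatization, and the rightmost noun of the whole query part approximates the rightmost noun of the wh-noun phrase; all of these are our assumptions, not the authors' implementation.

```python
def naive_lemma(word: str) -> str:
    # Hypothetical stand-in for proper lemmatization (strips a plural "s").
    return word[:-1] if word.endswith("s") else word

def divmul_type_pattern_cue(story_part, query_part, predicted_category):
    """story_part / query_part: lists of (word, POS) pairs.
    Returns "division", "multiplication", or None (no rule applies)."""
    if predicted_category != "div-mul":
        return None
    words = [w.lower() for w, _ in story_part]
    marker = "each" if "each" in words else "every" if "every" in words else None
    if marker is None:
        return None
    idx = words.index(marker)                          # line 4 of Algorithm 2
    # Lines 5-6: nouns after the marker form the candidate item name.
    item_name = [naive_lemma(w.lower())
                 for w, pos in story_part[idx + 1:] if pos == "NOUN"]
    # Lines 7-9 (simplified): rightmost noun of the query part, which
    # begins with the wh-word in these problems.
    query_nouns = [naive_lemma(w.lower()) for w, pos in query_part if pos == "NOUN"]
    wh_rightmost = query_nouns[-1] if query_nouns else None
    if item_name:                                      # lines 10-15
        return "division" if item_name[-1] == wh_rightmost else "multiplication"
    if idx > 0 and story_part[idx - 1][1] == "NOUN":   # lines 16-22
        item = naive_lemma(story_part[idx - 1][0].lower())
        return "multiplication" if item == wh_rightmost else "division"
    return None
```

In the paper the tags and noun phrases come from spaCy's tagger and parser, so errors there propagate into this rule (see the POS-tag failures in Section 4.2).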
… like POS tagging, dependency parsing, shallow semantic parsing, co-reference resolution, etc., to simplify the structure of the question and establish the relationships between the entities. These techniques play an important role in identifying relevant variables.

However, the performance of the current method is slightly lower than that of [10], because the method of [10] is a hybrid method (a combination of rule-based and machine learning approaches) in which several new concepts have been introduced, such as an object-oriented approach to modelling word problems belonging to different categories, RDBMS-based information storage, etc. Although it is ahead of our method in performance, its structure is quite complex, while our method works well with a simple system structure.

4.2. Error Analysis

The proposed method produced 40 errors on the SingleOp dataset. Of these, 32 are due to either being unable to predict any operation or predicting a wrong operation, and the remaining 8 are due to problems in identifying relevant quantities. Since the method is rule-based, it cannot predict any operation for a problem that does not fit under any rule. The origins of the errors are discussed below.

• Lack of world knowledge - There are 9 cases in which real-world knowledge is required to predict the operation of a problem or to identify the relevant quantities. For example, “There were 105 parents in the program and 698 pupils, too. How many people were present in the program?” To solve this, the method must know that “parents” and “pupils” are “people”.
• Lack of keyword cues - There are 10 cases in which no definite cues are present to identify the ultimate operation of the problem. For example, “Misha has 34 dollars. How many dollars does she have to earn to have 47 dollars to buy a dog?”
• Lack of numerical reasoning - There are three cases in which keyword or pattern cues alone are not sufficient to identify the relevant quantities present in the problem; some numerical reasoning is also required. For example, “Theresa has 32 crayons. Janice has 12 crayons. She shares 13 with Nancy. How many crayons will Theresa have?” The co-reference resolver, NeuralCoref, identifies “Janice” as the antecedent of “she”, but the actual antecedent should be “Theresa”.
• Overlapped rule-based cues - There are nine cases in which the method fails to predict the right operation of a problem because the problem falls under an incorrect rule.
• Logical errors - There are two cases in which the method fails to identify relevant quantities due to logical errors. Four more errors occur in word problems that are characteristically different from the rest, have an inappropriate question structure, etc.
• Wrongly identified POS tags - There are three cases in which the method fails due to wrong identification of part-of-speech tags by spaCy’s POS tagger [42]. For example, “Emily collects 63 cards. Emily's father gives Emily 7 more. Bruce has 13 apples. How many cards does Emily have?” Here, the POS tagger returns the POS of “Emily” in the query sentence as “ADV” (adverb) and is therefore unable to identify the person’s name.

5. Conclusion and Future Work

In this paper, we present several algorithms built on multiple rules to solve the word problems of the SingleOp dataset, which involve one equation and one operation. Our work focuses on identifying important features and establishing relationships and dependencies among them to solve the word problems step by step. However, the main challenge was to identify the relevant quantities that are important for the final generation of the answer.

The proposed method performs relatively well in both predicting the operations and identifying the relevant quantities, although performance deteriorates when the relevant quantities are identified incorrectly due to errors in predicting the operations and the lack of other inferences. Operation prediction works more accurately for division-type problems (see Table 4) because most division-type input problems contain fewer ambiguous clues, and the problems do not require additional background knowledge to be solved. Notwithstanding the flaws inherent in the method, it outperforms most work published on the same dataset. Although the performance of our method is quite impressive, it depends completely on hand-crafted features and rules. Our work can be further extended in numerous ways, as explained below.

• The hand-generated features could be used to train a classifier that automatically predicts the operation of a word problem using a machine learning approach.
• A numerical inference module could be introduced to improve the algorithm's ability to identify relevant quantities, thus avoiding the errors caused by incorrect resolution of co-references.
• An inference module for world knowledge could also be integrated into our method, with the main purpose of dealing with problems that require additional background knowledge to be solved.
• The concept of intelligent explanation of the solution could also be implemented; this module would show the solutions step by step.

References

[1] L. Verschaffel, B. Greer, and E. De Corte. Making sense of word problems. Lisse, The Netherlands: Swets and Zeitlinger, 2000, doi:10.1023/A:1004190927303.
[2] S. Roy and D. Roth. Solving general arithmetic word problems. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 1743–1752, doi:10.18653/v1/D15-1202.
[3] M. J. Nathan. Knowledge and situational feedback in a learning environment for algebra story problem solving. Interactive Learn. Environ., vol. 5, no. 1, pp. 135–159, 1998, doi:10.1080/1049482980050110.
[4] D. Arnau, M. Arevalillo-Herráez, L. Puig, and J. A. González-Calero. Fundamentals of the design and the operation of an intelligent tutoring system for the learning of the arithmetical and algebraic way of solving word problems. Comput. & Educ., vol. 63, pp. 119–130, Apr. 2013, doi:10.1016/j.compedu.2012.11.020.
[5] D. Arnau, M. Arevalillo-Herráez, and J. A. González-Calero. Emulating human supervision in an intelligent tutoring system for arithmetical problem solving. IEEE Trans. Learn. Technol., vol. 7, no. 2, pp. 155–164, Apr./Jun. 2014, doi:10.1109/TLT.2014.2307306.
[6] C. R. Beal. AnimalWatch: An intelligent tutoring system for algebra readiness. In Int. Handbook of Metacognition and Learning Technologies. Springer, Mar. 2013, pp. 337–348, doi:10.1007/978-1-4419-5546-3_22.
[7] M. S. Riley, J. G. Greeno, and J. I. Heller. Development of children's problem-solving ability in arithmetic. Univ. of Pittsburgh, Pittsburgh, PA, USA, Tech. Rep. LRDC-1984/37, 1984. [Online]. Available: https://files.eric.ed.gov/fulltext/ED252410.pdf
[8] C. R. Fletcher. Understanding and solving arithmetic word problems: A computer simulation. Behav. Res. Methods, Instrum., & Comput., vol. 17, no. 5, pp. 565–571, Sep. 1985, doi:10.3758/BF03207654.
[9] A. Mitra and C. Baral. Learning to use formulas to solve simple arithmetic problems. In Proc. 54th Annu. Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, Aug. 7–12, 2016, pp. 2144–2153, doi:10.18653/v1/P16-1202.
[10] S. Mandal and S. K. Naskar. Classifying and solving arithmetic math word problems - an intelligent math solver. IEEE Trans. Learn. Technol., vol. 14, no. 1, pp. 28–41, Feb. 2021, doi:10.1109/TLT.2021.3057805.
[11] T. P. Carpenter, J. Hiebert, and J. M. Moser. Problem structure and first-grade children's initial solution processes for simple addition and subtraction problems. J. Res. Math. Educ., pp. 27–39, Jan. 1981, doi:10.5951/jresematheduc.24.5.0428.
[12] P. Nesher, J. G. Greeno, and M. S. Riley. The development of semantic categories for addition and subtraction. Educational Stud. Math., vol. 13, no. 4, pp. 373–394, Nov. 1982, doi:10.1007/BF00366618.
[13] G. Vergnaud. A classification of cognitive tasks and operations of thought involved in addition and subtraction problems. In Addition and Subtraction: A Cognitive Perspective, pp. 39–59, 1982, doi:10.4324/9781003046585-4.
[14] T. P. Carpenter, E. Ansell, M. L. Franke, E. Fennema, and L. Weisbeck. Models of problem solving: A study of kindergarten children's problem-solving processes. J. Res. Math. Educ., pp. 428–441, Nov. 1993, doi:10.5951/jresematheduc.24.5.0428.
[15] N. Kushman, L. Zettlemoyer, R. Barzilay, and Y. Artzi. Learning to automatically solve algebra word problems. In Proc. 52nd Annu. Meeting of the Association for Computational Linguistics (ACL), Baltimore, MD, USA, Jun. 22–27, 2014, pp. 271–281, doi:10.3115/v1/P14-1026.
[16] R. Koncel-Kedziorski, H. Hajishirzi, A. Sabharwal, O. Etzioni, and S. D. Ang. Parsing algebraic word problems into equations. Trans. Assoc. Comput. Linguistics, vol. 3, pp. 585–597, Dec. 2015, doi:10.1162/tacl_a_00160.
[17] D. G. Bobrow. Natural language input for a computer problem solving system. 1964.
[18] E. Charniak. Computer solution of calculus word problems. 1968.
[19] Y. Bakman. Robust understanding of word problems with extraneous information. arXiv preprint math/0701393, 2007.
[20] C. Liguda and T. Pfeiffer. Modeling math word problems with augmented semantic networks. In G. Bouma, A. Ittoo, E. Métais, and H. Wortmann (Eds.), Natural Language Processing and Information Systems (NLDB 2012), Lecture Notes in Computer Science, vol. 7337. Springer, Berlin, Heidelberg, 2012, pp. 247–252.
[21] M. J. Hosseini, H. Hajishirzi, O. Etzioni, and N. Kushman. Learning to solve arithmetic word problems with verb categorization. In Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 25–29, 2014, pp. 523–533. [Online]. Available: http://aclweb.org/anthology/D/D14/D14-1058.pdf
[22] S. Shi, Y. Wang, C. Lin, X. Liu, and Y. Rui. Automatically solving number word problems by semantic parsing and reasoning. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 1132–1142. [Online]. Available: http://aclweb.org/anthology/D/D15/D15-1135.pdf
[23] S. Roy and D. Roth. Illinois Math Solver: Math reasoning on the web. In Proc. Demonstrations Session, NAACL HLT 2016, San Diego, CA, USA, Jun. 12–17, 2016, pp. 52–56. [Online]. Available: http://aclweb.org/anthology/N/N16/N16-3011.pdf
[24] S. Roy and D. Roth. Unit dependency graph and its application to arithmetic word problem solving. In Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, Feb. 4–9, 2017, pp. 3082–3088. [Online]. Available: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14764
[25] S. Roy, T. Vieira, and D. Roth. Reasoning about quantities in natural language. Trans. Assoc. Comput. Linguistics, vol. 3, pp. 1–13, 2015. [Online]. Available: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/452
[26] S. Roy and D. Roth. Mapping to declarative knowledge for word problem solving. Trans. Assoc. Comput. Linguistics, vol. 6, pp. 159–172, 2018.
[27] L. Zhou, S. Dai, and L. Chen. Learn to solve algebra word problems using quadratic programming. In Proc. 2015 Conf. Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, Sep. 17–21, 2015, pp. 817–822.
[28] S. Upadhyay and M. Chang. Annotating derivations: A new evaluation strategy and dataset for algebra word problems. 2016. [Online]. Available: http://arxiv.org/abs/1609.07197
[29] D. Huang, S. Shi, C. Lin, J. Yin, and W. Ma. How well do computers solve math word problems? Large-scale dataset construction and evaluation. In Proc. 54th Annu. Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, Aug. 2016. [Online]. Available: http://aclweb.org/anthology/P/P16/P16-1084.pdf
[30] Y. Wang, X. Liu, and S. Shi. Deep neural solver for math word problems. In Proc. 2017 Conf. Empirical Methods in Natural Language Processing (EMNLP), pp. 845–854, 2017. [Online]. Available: https://www.aclweb.org/anthology/D17-1088.pdf
[31] M. S. Riley et al. Development of children's problem-solving ability in arithmetic. 1984.
[32] D. J. Briars and J. H. Larkin. An integrated model of skill in solving elementary word problems. Cognition and Instruction, vol. 1, no. 3, pp. 245–296, 1984.
[33] D. Dellarosa. A computer simulation of children's arithmetic word-problem solving. Behav. Res. Methods, Instrum., & Comput., vol. 18, no. 2, pp. 147–154, 1986.