Question Answering Systems With Deep Learning-Based Symbolic Processing
ABSTRACT We propose methods to learn symbolic processing with deep learning and to build question answering systems by means of the learned models. Symbolic processing, performed by Prolog processing systems that execute unification, resolution, and list operations, is learned by a combination of deep learning models: Neural Machine Translation (NMT) and Word2Vec training. To our knowledge, the implementation of a Prolog-like processing system using deep learning is a new experiment that has not been conducted in the past. The results of our experiments reveal that the proposed methods are superior to the conventional methods because the learned symbolic processing (1) has rich representations, (2) can interpret inputs even if they include unknown symbols, and (3) can be learned with a small amount of training data. In particular, (2), the handling of unknown data, which is a major task in artificial intelligence research, is solved using Word2Vec. Furthermore, question answering systems can be built from knowledge bases written in Prolog with the learned symbolic processing, which is extremely difficult to accomplish with conventional methods. The proposed systems can not only answer questions through powerful inferences by utilizing facts that harbor unknown data but also have the potential to build knowledge bases from a large amount of data, including unknown data, on the Web. Because the proposed systems are a completely new attempt, there are no state-of-the-art methods in the sense of "newest"; therefore, to evaluate their efficiency, they are compared with the most traditional and robust system, i.e., the Prolog system. This is new research that encompasses the subjects of conventional artificial intelligence and neural networks, and our systems have high potential for building applications such as FAQ chatbots, decision support systems, and energy-efficient estimation using a large amount of information on the Web. Mining hidden information through these applications will provide great value.
INDEX TERMS Deep learning, knowledge base, Prolog, question answering system, neural machine translation, symbolic processing, Word2Vec.
in order to build knowledge bases from large volumes of data existing on the Web, symbolic processing will need rich representations to cater to various formats, high robustness to handle errors or unknown data, and the capability to learn from small data. Thus, in the subsequent paragraphs, we propose methods with the following features, which are not covered by the conventional methods.
1) There is no restriction on the number of terms included in an atomic formula, regardless of the network configuration.
2) There is no restriction on the number of atomic formulas included in a formula, regardless of the network configuration.
3) There is no need to provide meta-rules to the network.
4) List structures can be used in atomic formulas.
5) Interpretation of inputs is possible even if they include unknown atoms.
6) Models can be trained with small training data.
The fifth feature, regarding the handling of unknown data, can be attributed to the implementation of Word2Vec, which is an important achievement of recent years in the field of neural network research. This method represents concepts as vectors and thereby facilitates the estimation of similarities between concepts. Therefore, similarities between known and unknown data can be used to resolve the handling of unknown data.
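As a rough illustration of this idea (our own sketch, not a step of the proposed method), pre-trained word vectors can be queried for the similarity between a known atom and an atom that never appeared in the training data; the gensim library and the GoogleNews-vectors-negative300 file named later in this paper are assumed, and the atom names are hypothetical.

# Sketch: relating an atom unseen in the training data to known atoms
# through pre-trained Word2Vec vectors (assumes gensim and the
# GoogleNews-vectors-negative300 file; atom names are hypothetical).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # 300-dim vectors

known_atoms = ["male", "female", "spouse"]
unknown_atom = "husband"  # an atom that did not occur in the training data

# Cosine similarity between vectors lets the system treat the unknown
# atom like its closest known counterpart.
for atom in known_atoms:
    print(atom, vectors.similarity(unknown_atom, atom))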
Furthermore, we show an application by embedding the learned models into question answering systems. Most of the conventional question answering systems [14], [15] are not entirely capable of inferring from a large amount of information on the Web containing unknown data. The proposed systems are designed to answer questions through powerful inferences based on first-order predicate logic by utilizing facts containing unknown data. By applying the proposed systems, it will be possible to build applications such as high-performance frequently asked questions (FAQ) chatbots [16], [17], decision support systems (DSS) [18], [19], and energy-efficient estimation in sensor networks [20]–[22] using information on the Web. If these applications are realized, users might be able to explore novel concepts or mine hidden information.

Since connectionism in the 1990s, there have been very few studies in which the research areas of conventional artificial intelligence and neural networks have intersected. This is a new research area spanning conventional artificial intelligence and deep learning. Furthermore, this research aims to accomplish a Prolog-like processing system using deep learning, and to our knowledge, this is a novel application.

In this paper, we begin by reviewing related research work in Section II. In Section III, we define and describe symbolic processing, which is the learning target. In Section IV, we propose learning methods for symbolic processing and building methods of question answering systems. Section V reports the experimental results of the proposed methods.

II. RELATED WORK

A. SYMBOLIC PROCESSING WITH NEURAL NETWORKS
Before the emergence of deep learning, many studies attempted to train neural networks on symbolic processing and use it for inference [23]. Additionally, studies have previously been conducted to learn propositional logic [24]–[26] and first-order predicate logic [27]–[29], as well as to perform unification [30], [31], similar to the present study. However, these studies were limited to method proposals since they could not be implemented.

With the emergence of deep learning, studies to learn symbolic processing with graph networks [11], [12], [32]–[34] and feedforward networks [13] have been performed and documented. In the case of symbolic learning with graph networks, it was necessary to provide the forms of atomic formulas or formulas to the networks beforehand and to presume that the forms of the formulas were included in the data in advance. In the case of symbolic learning with feedforward networks, the forms of atomic formulas and formulas depended on the network configuration, and it was necessary to provide meta-rules to the network in advance. Our proposed methods impose no such restrictions on the number of terms included in an atomic formula or the number of atomic formulas included in a formula; besides, there is no need to give the forms of the atomic formulas to the networks in advance.

B. QUESTION ANSWERING SYSTEMS
After the emergence of deep learning, studies on question answering systems with deep learning [15], [35], [36] have been conducted, and the performance of these systems has improved.

These systems search for answer candidates from facts, select an answer from the existing candidates, and respond. Unlike the proposed systems, these systems do not infer but merely answer from the facts.

Additionally, studies have also been carried out to embed questions in vector representations and infer answers by deep learning [37]–[39]. However, when unknown symbols are included in questions, to our knowledge, no previous studies have embedded the unknown symbols in internal representations and inferred answers, as the methods described in this paper do.

III. SYMBOLIC PROCESSING
Here, a Prolog-like system is used for symbolic processing. When a Prolog [40] processing system receives a question, it refers to facts and rules stored in a knowledge base and infers an answer. A question consists of one or more goals. A Prolog processing system infers goals by backward reasoning. The following is a brief description of the operations that a Prolog processing system performs to infer answers.
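As a rough illustration of this inference style (a toy sketch of classical backward reasoning, not of the learned models proposed in this paper), the following fragment stores two facts and one rule and answers a goal by resolving it against the rule; the data structures and the simplified matching scheme are our assumptions.

# Toy illustration of backward reasoning over a Prolog-style knowledge base.
# This mimics what a Prolog processing system does; it is not the learned model.
facts = {("male", "bob"), ("spouse", "bob", "mary")}
rules = [
    # father(X, Y) :- male(X), spouse(X, Y).
    (("father", "X", "Y"), [("male", "X"), ("spouse", "X", "Y")]),
]

def prove(goal, bindings):
    goal = tuple(bindings.get(t, t) for t in goal)
    if goal in facts:                        # the goal matches a stored fact
        return True
    for head, body in rules:                 # try to resolve the goal with a rule head
        if head[0] == goal[0] and len(head) == len(goal):
            new_bindings = dict(zip(head[1:], goal[1:]))
            if all(prove(subgoal, new_bindings) for subgoal in body):
                return True
    return False

print(prove(("father", "bob", "mary"), {}))  # -> True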
A. UNIFICATION
Unification is a process of determining whether two given terms are identical. In Prolog programs, it is possible to
1) LEARNING UNIFICATION
Fig. 5 shows the unification model used in the proposed systems. The input to this model is a question asking whether the unification process can be performed. The model is trained so that it outputs "true." if the terms match and "false." if they do not match. Although questions can include variables, unlike in Prolog processing systems, these variables are not substituted; only whether the terms are identical is determined. For example, "male(X) = male(tom)." should output "true." instead of "X = tom." After checking whether the terms are identical, substitution is performed outside the model.
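For concreteness, training pairs for this model might look like the following sketch; the data format and the example queries are our assumptions rather than data taken from the paper.

# Hypothetical input/output pairs for the unification model.
# The model only judges whether the two terms can be made identical;
# it does not return substitutions such as "X = tom.".
unification_pairs = [
    ("male(X) = male(tom).", "true."),
    ("male(tom) = male(tom).", "true."),
    ("male(X) = female(tom).", "false."),            # different predicate symbols
    ("father(X, mary) = father(bob, Y).", "true."),  # variables on both sides
]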
When a query in the form of a word string is input to the unification model, the embedding layer for the input converts the string into a combined vector of a 300-dimensional Word2Vec word embedding and a Gray code [47] word embedding. Common nouns of atoms, such as "male", are Word2Vec word embeddings, whereas logical symbols such as "(", ")", ",", ".", and proper nouns of atoms such as "bob", are Gray code word embeddings. Gray code has the characteristic that the Hamming distance between adjacent codes is 1. Even if the input includes words not contained in the training data, an output can be obtained, because Word2Vec is used for embedding atoms.

Subsequently, the combined vector of the Word2Vec and Gray code embeddings is passed to NMT. The output from NMT consists of the one-hot encoded words inserted in the embedding layer, together with the unification result of the input word string.
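A minimal sketch of such a combined embedding is given below; the concatenation of the two codes, the 16-bit Gray code width, and the token classification rule are our assumptions, since the text only states that Word2Vec and Gray code embeddings are combined.

# Sketch of the combined Word2Vec + Gray code input embedding.
# Concatenation, the 16-bit code width, and the symbol table are assumptions.
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)   # 300-dim vectors

# logical symbols, proper nouns, and variables handled with Gray code
symbol_table = {"(": 0, ")": 1, ",": 2, ".": 3, "=": 4, ":-": 5,
                "bob": 6, "tom": 7, "mary": 8, "X": 9}
BITS = 16

def gray_code(index, bits=BITS):
    g = index ^ (index >> 1)        # adjacent indices differ in exactly one bit
    return np.array([(g >> i) & 1 for i in range(bits)], dtype=np.float32)

def embed(token):
    if token in symbol_table:       # logical symbol, proper noun, or variable
        return np.concatenate([np.zeros(300, dtype=np.float32),
                               gray_code(symbol_table[token])])
    # common noun of an atom, e.g. "male" -> Word2Vec vector
    return np.concatenate([w2v[token].astype(np.float32),
                           np.zeros(BITS, dtype=np.float32)])

query = ["male", "(", "X", ")", "=", "male", "(", "tom", ")", "."]
inputs = np.stack([embed(t) for t in query])  # shape: (10, 316), fed to NMT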
2) LEARNING RESOLUTION
Fig. 6 shows the resolution model used in the proposed systems. When the head of a rule is input to this model, the model is trained so that it outputs the body of the rule. For example, in the rule "father(bob, mary):- male(bob), spouse(bob, mary).", the input to the proposed model is "father(bob, mary)." and the output is "male(bob), spouse(bob, mary)." When a word string forming the head is input to the resolution model, the embedding layer for the input converts the string into a combined vector of Word2Vec and Gray code word embeddings, following the same conventions as in the unification learning process. Subsequently, the combined vector of Word2Vec and Gray code is passed to NMT. The output from NMT consists of the one-hot encoded words inserted in the embedding layer, together with the resolution result, i.e., the word string forming the body.

When training resolution with NMT using an existing knowledge base, the volume of training data may not be sufficient in some cases. In such cases, a method to augment the volume of proper-noun data is applied. For example, imaginary proper nouns such as "bob-1" and "bob-2" are generated from "bob", and the training data are increased as shown in Fig. 7.

FIGURE 7. Example of increasing training data.
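A minimal sketch of this augmentation step, assuming simple (input, output) string pairs as the training format:

# Sketch: cloning training pairs with imaginary proper nouns such as
# "bob-1" and "bob-2". The (input, output) pair format is an assumption.
def augment(pairs, proper_noun, copies=2):
    augmented = list(pairs)
    for i in range(1, copies + 1):
        imaginary = f"{proper_noun}-{i}"
        for source, target in pairs:
            if proper_noun in source or proper_noun in target:
                augmented.append((source.replace(proper_noun, imaginary),
                                  target.replace(proper_noun, imaginary)))
    return augmented

pairs = [("father(bob, mary).", "male(bob), spouse(bob, mary).")]
print(augment(pairs, "bob"))
# prints the original pair plus the same rule rewritten with "bob-1" and "bob-2"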
3) LEARNING MEMBERSHIP RELATION
Fig. 8 shows the membership relation model. The input to this model is a membership-related question such as "member(kansas, [colorado, nebraska, kansas])." The model is trained so that it outputs "true." if the object exists in the list and "false." if it does not exist. Besides, questions to these models do not contain variables.

When a query in the form of a word string is input to the membership relation model, the input embedding layer,
4) PROLOG-TO-TEXT
Prolog-to-Text converts the system's Prolog response into regular text. An example of conversion by question patterns is shown in Fig. 14. The conversion from Prolog to text is done with rules.
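A minimal sketch of such rule-based conversion is given below; the patterns and wordings are illustrative assumptions, since Fig. 14 is not reproduced here.

# Sketch of rule-based Prolog-to-Text conversion; the patterns below are
# illustrative assumptions standing in for the rules of Fig. 14.
import re

CONVERSION_RULES = [
    (re.compile(r"father\((\w+), (\w+)\)\."), r"\1 is the father of \2."),
    (re.compile(r"member\((\w+), \[(.*)\]\)\."), r"\1 is one of: \2."),
    (re.compile(r"^true\.$"), "Yes."),
    (re.compile(r"^false\.$"), "No."),
]

def prolog_to_text(response):
    for pattern, template in CONVERSION_RULES:
        if pattern.search(response):
            return pattern.sub(template, response)
    return response   # fall back to the raw Prolog response

print(prolog_to_text("father(bob, mary)."))   # -> bob is the father of mary.
print(prolog_to_text("true."))                # -> Yes.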
FIGURE 15. Seq2Seq with attention.

2) TRANSFORMER
Transformer [43] consists of two blocks, an Encoder Stack and a Decoder Stack. The Encoder Stack receives an input sequence, and the Decoder Stack returns the output sequence. The Encoder Stack has six Encoders and the Decoder Stack has six Decoders.
The Encoder consists of a Feedforward and a Self-Attention, with 512-dimensional outputs and six layers. The Decoder consists of a Feedforward, an Encoder-Decoder Attention, and a Self-Attention, with 512-dimensional outputs and six layers.
The Self-Attention is used to relate different positions
of a single sequence by computing a representation of the
sequence. The Encoder-Decoder Attention helps the Decoder
to focus on the appropriate parts of the input sequence.
The Feedforward uses Leaky ReLU [55] for the activation
function.
The dropout rate is 0.1, the batch size is 48, and 100 epochs are trained. For optimization, we use the Adam optimizer [54] with α = 5e-5, β1 = 0.9, β2 = 0.98, and ε = 1e-9.
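A rough PyTorch sketch of this configuration is given below; the number of attention heads and the feedforward width are not stated in the text, so library defaults are kept, and the Leaky ReLU activation is replaced by the default ReLU for simplicity.

# Rough sketch of the Transformer configuration described above (PyTorch).
# nhead and dim_feedforward are library defaults, not values from the paper.
import torch

model = torch.nn.Transformer(
    d_model=512,             # 512-dimensional outputs
    num_encoder_layers=6,    # 6 Encoders
    num_decoder_layers=6,    # 6 Decoders
    dropout=0.1,             # dropout rate 0.1
)

optimizer = torch.optim.Adam(
    model.parameters(), lr=5e-5, betas=(0.9, 0.98), eps=1e-9)

BATCH_SIZE = 48   # batch size from the text
EPOCHS = 100      # number of training epochs from the text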
D. WORD2VEC
Word2Vec is a method for obtaining the vector representa-
tion of each word from a large amount of text data using
a neural network. We apply pre-trained Word2Vec vectors, called GoogleNews-vectors-negative300 [44]–[46], to the unification, resolution, and membership relation models.
GoogleNews-vectors-negative300 is trained on a dataset
of approximately 100 billion words. The model has
300-dimensional vectors for 3 million words and phrases.
FIGURE 16. Transformer.
V. EVALUATION EXPERIMENTS
Using knowledge bases described in Prolog, we trained mod-
els, built question answering systems, and evaluated their
performance. Specifically, we conducted experiments using
two kinds of knowledge bases with graph structures.
TABLE 1. Results of models by kinsources.
TABLE 3. Results of resolution model with unknown data.
TABLE 5. Results of models by geoquery.
TABLE 6. Results of resolution model with unknown data.
0.614 and the maximum value is 1.00. Although it varies depending on the question types and response types, practical question answering systems could likely be built based on the existing knowledge bases. The resolution model incorporated into the system this time was not trained on cases where resolution is impossible. If it becomes possible to judge in the resolution model whether or not resolution is impossible, the correct answer rate for "Negative of What-questions" inputs, which is the lowest this time, may be improved.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have proposed methods to learn symbolic processing using deep learning and methods to build question answering systems using the trained models. Experimental results on the training of the symbolic processing models show that the proposed methods have rich representations and high robustness and that these models can learn even from small-scale data. In particular, the ability to handle unknown data by using Word2Vec will be a great contribution to artificial intelligence research. Moreover, the experimental results of the question answering systems suggest that practical question answering systems could be built from knowledge bases written in Prolog. Building such systems would be extremely difficult using a conventional connectionism-based method.

This study pertains to a new research domain that spans the areas of conventional artificial intelligence and neural networks. We conducted experiments using simple datasets. However, the experimental results suggest potential applications in areas such as FAQ chatbots, DSS, and energy-efficient estimation in sensor networks. If these applications can respond to information not only from facts but also by inferring from a large amount of information on the Web, the proposed systems might be able to contribute great value to society.

Future work includes symbolic processing to analyze large-scale data present on the Web and inductive inference with deep learning-based symbolic processing.
REFERENCES
[1] R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The Dendral Project. New York, NY, USA: McGraw-Hill, 1980.
[2] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[3] S. Muggleton, "Inductive logic programming," New Gener. Comput., vol. 8, no. 4, pp. 295–318, Feb. 1991. [Online]. Available: http://www.doc.ic.ac.uk/~shm/Papers/ilp.pdf. doi: 10.1007/BF03037089.
[4] G. Brewka, Nonmonotonic Reasoning: Logical Foundations of Commonsense. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[5] J. Doyle, "The ins and outs of reason maintenance," in Proc. 8th Int. Joint Conf. Artif. Intell. (IJCAI), Los Altos, CA, USA, 1983, pp. 349–351.
[6] A. C. Kakas, R. A. Kowalski, and F. Toni, "Abductive logic programming," J. Log. Comput., vol. 2, no. 6, pp. 719–770, Dec. 1993. doi: 10.1093/logcom/2.6.719.
[7] G. E. Hinton, "Preface to the special issue on connectionist symbol processing," Artif. Intell., vol. 46, nos. 1–2, pp. 1–4, Nov. 1990.
[8] D. S. Touretzky, "BoltzCONS: Dynamic symbol structures in a connectionist network," Artif. Intell., vol. 46, nos. 1–2, pp. 5–46, Nov. 1990.
[9] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in Proc. Eur. Semantic Web Conf. (ESWC), 2018, pp. 593–607.
[10] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, "Complex embeddings for simple link prediction," in Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016, pp. 2071–2080.
[11] T. Rocktäschel and S. Riedel, "End-to-end differentiable proving," in Proc. Annu. Conf. Neural Inf. Process. Syst., 2017, pp. 3788–3800.
[12] P. Minervini, M. Bosnjak, T. Rocktäschel, and S. Riedel, "Towards neural theorem proving at scale," 2018, arXiv:1807.08204. [Online]. Available: https://arxiv.org/abs/1807.08204
[13] H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou, "Neural logic machines," in Proc. Int. Conf. Learn. Represent., New Orleans, LA, USA, 2019, pp. 1–22.
[14] S. Tellex, B. Katz, J. Lin, G. Marton, and A. Fernandes, "Quantitative evaluation of passage retrieval algorithms for question answering," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Toronto, ON, Canada, Aug. 2003, pp. 41–47.
[15] R. Sequiera, G. Baruah, Z. Tu, S. Mohammed, J. Rao, H. Zhang, and J. Lin, "Exploring the effectiveness of convolutional neural networks for answer selection in end-to-end question answering," 2017, arXiv:1707.07804. [Online]. Available: https://arxiv.org/abs/1707.07804
[16] N. T. Thomas, "An e-business chatbot using AIML and LSA," in Proc. Int. Conf. Adv. Comput., Commun. Inform. (ICACCI), Jaipur, India, Sep. 2016, pp. 2740–2742.
[17] L. Cui, F. Wei, S. Huang, C. Tan, C. Duan, and M. Zhou, "SuperAgent: A customer service chatbot for e-commerce websites," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics-Syst. Demonstrations, Jul. 2017, pp. 97–102.
[18] H. Bhargava and D. Power, "Decision support systems and Web technologies: A status report," in Proc. Amer. Conf. Inf. Syst., Boston, MA, USA, Dec. 2001, p. 46.
[19] M. S. Kohn, J. Sun, S. Knoop, A. Shabo, B. Carmeli, D. Sow, T. Syed-Mahmood, and W. Rapp, "IBM's health analytics and clinical decision support," Yearbook Med. Inf., vol. 9, no. 1, pp. 154–162, Aug. 2014.
[20] A. Sodhro, Y. Li, and M. Shah, "Energy-efficient adaptive transmission power control for wireless body area networks," IET Commun., vol. 10, no. 1, pp. 81–90, Jan. 2016.
[21] A. Sodhro, S. Pirbhulal, M. Lodro, and M. Shah, "Energy-efficiency in wireless body sensor networks," in Networks of the Future: Architectures, Technologies, and Implementations. Boca Raton, FL, USA: CRC Press, 2017, p. 492.
[22] A. Sodhro, A. Sangaiah, G. Sodhro, A. Sekhari, Y. Ouzrout, and S. Pirbhulal, "Energy-efficiency of tools and applications on Internet," in Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications (Intelligent Data-Centric Systems: Sensor Collected Intelligence). Amsterdam, The Netherlands: Elsevier, 2018.
[23] A. S. d'Avila Garcez, K. Broda, and D. M. Gabbay, Neural-Symbolic Learning Systems: Foundations and Applications. London, U.K.: Springer-Verlag, 2002.
[24] J. W. Shavlik and G. G. Towell, "An approach to combining explanation-based and neural learning algorithms," Connection Sci., vol. 1, no. 3, pp. 231–253, 1989.
[25] G. G. Towell and J. W. Shavlik, "Knowledge-based artificial neural networks," Artif. Intell., vol. 70, nos. 1–2, pp. 119–165, Oct. 1994. doi: 10.1016/0004-3702(94)90105-8.
[26] A. S. A. Garcez and G. Zaverucha, "The connectionist inductive learning and logic programming system," Appl. Intell., vol. 11, no. 1, pp. 59–77, Jul. 1999. doi: 10.1023/A:1008328630915.
[27] L. Shastri, "Neurally motivated constraints on the working memory capacity of a production system for parallel processing: Implications of a connectionist model based on temporal synchrony," in Proc. 14th Annu. Conf. Cognit. Sci. Soc., Bloomington, IN, USA: Psychology Press, vol. 14, Jul./Aug. 1992, p. 159.
[28] L. Ding, "Neural prolog-the concepts, construction and mechanism," in Proc. IEEE Int. Conf. Syst., Man Cybern., Intell. Syst. 21st Century, vol. 4, Oct. 1995, pp. 3603–3608.
[29] M. V. M. Franca, G. Zaverucha, and A. S. d'Avila Garcez, "Fast relational learning using bottom clause propositionalization with artificial neural networks," Mach. Learn., vol. 94, no. 1, pp. 81–104, Jan. 2014. doi: 10.1007/s10994-013-5392-1.
[30] E. Komendantskaya, "Unification neural networks: Unification by error-correction learning," Log. J. IGPL, vol. 19, no. 6, pp. 821–847, Dec. 2011. doi: 10.1093/jigpal/jzq012.
[31] S. Holldobler, "A structured connectionist unification algorithm," in Proc. 8th Nat. Conf. Artif. Intell., Boston, MA, USA, vol. 2, 1990, pp. 587–593.
[32] G. Sourek, V. Aschenbrenner, F. Zelezny, and O. Kuzelka, "Lifted relational neural networks," in Proc. Int. Conf. Cogn. Comput., Integrating Neural Symbolic Approaches, Montreal, QC, Canada, 2015.
[33] W. W. Cohen, "TensorLog: A differentiable deductive database," 2016, arXiv:1605.06523. [Online]. Available: https://arxiv.org/abs/1605.06523
[34] L. Serafini and A. S. d'Avila Garcez, "Logic tensor networks: Deep learning and logical reasoning from data and knowledge," in Proc. 11th Int. Workshop Neural-Symbolic Learn. Reasoning (NeSy), New York, NY, USA, 2016, pp. 1–12.
[35] T. Lai, T. Bui, S. Li, and N. Lipka, "A simple end-to-end question answering model for product information," in Proc. 1st Workshop Econ. Natural Lang. Process., Jul. 2018, pp. 38–43.
[36] Y. Tay, L. A. Tuan, and S. C. Hui, "Hyperbolic representation learning for fast and efficient neural question answering," in Proc. 11th ACM Int. Conf. Web Search Data Mining, Los Angeles, CA, USA, Feb. 2018, pp. 583–591.
[37] B. Peng, Z. Lu, H. Li, and K. Wong, "Towards neural network-based reasoning," 2015, arXiv:1508.05508. [Online]. Available: https://arxiv.org/abs/1508.05508
[38] D. Weissenborn, "Separating answers from queries for neural reading comprehension," 2016, arXiv:1607.03316. [Online]. Available: https://arxiv.org/abs/1607.03316
[39] Y. Shen, P. Huang, J. Gao, and W. Chen, "ReasoNet: Learning to stop reading in machine comprehension," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Barcelona, Spain, Aug. 2017, pp. 1047–1055.
[40] I. Bratko, Prolog Programming for Artificial Intelligence, 2nd ed. Reading, MA, USA: Addison-Wesley, 1990, p. 597.
[41] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, Montreal, QC, Canada, 2014, pp. 3104–3112.
[42] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, San Diego, CA, USA, 2015, pp. 1–15.
[43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 5998–6008.
[44] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. ICLR, Scottsdale, AZ, USA, 2013, pp. 1–12.
[45] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. NIPS, Lake Tahoe, NV, USA, 2013, pp. 3111–3119.
[46] T. Mikolov, W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proc. NAACL HLT, Atlanta, GA, USA, 2013, pp. 746–751.
[47] F. Gray, "Pulse code communication," U.S. Patent 2 632 058 A, Mar. 17, 1953.
[48] H. Kanayama, Y. Miyao, and J. Prager, "Answering yes/no questions via question inversion," in Proc. 24th Int. Conf. Comput. Linguistics, Mumbai, India, Dec. 2012, pp. 1377–1392.
[49] D. Ravichandran and E. Hovy, "Learning surface text patterns for a question answering system," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, Jul. 2002, pp. 41–47.
[50] R. Higashinaka and H. Isozaki, "Corpus-based question answering for why-questions," in Proc. IJCNLP, Hyderabad, India, 2008, pp. 418–425.
[51] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[52] J. L. Elman, "Finding structure in time," Cognit. Sci., vol. 14, no. 2, pp. 179–211, Mar. 1990.
[53] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proc. 30th Int. Conf. Mach. Learn., Atlanta, GA, USA, 2013, pp. 1–9.
[54] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980
[55] A. Maas, A. Hannun, and A. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. 30th Int. Conf. Mach. Learn., Atlanta, GA, USA, 2013, p. 3.
[56] Kinsources: A Collaborative Web Platform for Kinship Data Sharing. Accessed: May 19, 2018. [Online]. Available: https://www.kinsources.net/
[57] L. R. Tang and R. J. Mooney, "Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing," in Proc. SIGDAT Conf. Empirical Methods Natural Lang. Process. Very Large Corpora (EMNLP/VLC), Hong Kong, Oct. 2000, pp. 133–141.

HIROSHI HONDA received the B.E. and M.E. degrees in administration engineering from Keio University, Yokohama, Japan, in 2003 and 2005, respectively, where he is currently pursuing the Ph.D. degree in information and computer science. From 2005 to 2014, he was a Software Engineer with Mitsubishi Electric Corporation. From 2014 to 2016, he was a Software Engineer with Fuji Xerox Company Ltd. Since 2017, he has been a Researcher with Honda R&D Company Ltd. His research interests include symbolic processing using deep learning and dialogue systems.

MASAFUMI HAGIWARA (M'89–SM'04) received the B.E., M.E., and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1982, 1984, and 1987, respectively. Since 1987, he has been with Keio University, where he is currently a Professor. From 1991 to 1993, he was a Visiting Scholar with Stanford University. His research interests include neural networks, fuzzy systems, and affective engineering. He is a member of IEICE, IPSJ, JSAI, SOFT, IEE of Japan, and JNNS. He received the IEEE Consumer Electronics Society Chester Sall Award in 1990, the Author Award from the Japan Society of Fuzzy Theory and Systems in 1996, the Technical Award and Paper Awards from the Japan Society of Kansei Engineering in 2003, 2004, and 2014, respectively, and the Best Research Award from the Japanese Neural Network Society in 2013. He was the President of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT).