Question Answering Systems With Deep Learning-Based Symbolic Processing
ABSTRACT We propose methods to learn symbolic processing with deep learning and to build question answering systems by means of the learned models. Symbolic processing, performed by Prolog processing systems that execute unification, resolution, and list operations, is learned by a combination of deep learning models: Neural Machine Translation (NMT) and Word2Vec training. To our knowledge, the implementation of a Prolog-like processing system using deep learning is a new experiment that has not been conducted in the past. The results of our experiments reveal that the proposed methods are superior to the conventional methods because the learned symbolic processing (1) has rich representations, (2) can interpret inputs even if they include unknown symbols, and (3) can be learned with a small amount of training data. In particular, (2), the handling of unknown data, which is a major task in artificial intelligence research, is solved using Word2Vec. Furthermore, question answering systems can be built from knowledge bases written in Prolog with the learned symbolic processing, which is extremely difficult to accomplish with conventional methods. The proposed systems can not only answer questions through powerful inferences by utilizing facts that harbor unknown data but also have the potential to build knowledge bases from a large amount of data, including unknown data, on the Web. Because the proposed systems are a completely new attempt, there are no state-of-the-art methods in the sense of "newest"; therefore, to evaluate their efficiency, they are compared with the most traditional and robust system, i.e., the Prolog system. This is new research that encompasses the subjects of conventional artificial intelligence and neural networks, and our systems have high potential for building applications such as FAQ chatbots, decision support systems, and energy-efficient estimation using a large amount of information on the Web. Mining hidden information through these applications will provide great value.
INDEX TERMS Deep learning, knowledge base, Prolog, question answering system, neural machine translation, symbolic processing, Word2Vec.
in order to build knowledge bases from large volumes of data existing on the Web, symbolic processing will need rich representations to cater to various formats, high robustness to handle errors or unknown data, and the capability to learn from small data. Thus, in the subsequent paragraphs, we propose methods with the following features, which are not covered by the conventional methods.
1) There is no restriction on the number of terms included in an atomic formula, regardless of the network configuration.
2) There is no restriction on the number of atomic formulas included in a formula, regardless of the network configuration.
3) There is no need to provide meta-rules to the network.
4) List structures can be used in atomic formulas.
5) Interpretation of inputs is possible even if they include unknown atoms.
6) Models can be trained with small training data.
The fifth feature, regarding the handling of unknown data, can be attributed to the implementation of Word2Vec, which is an important achievement of recent years in the field of neural network research. This method represents concepts as vectors and thereby facilitates the estimation of similarities between concepts. Therefore, similarities between known and unknown data can be used to resolve the handling of unknown data.
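As a rough illustration of this idea (our own sketch, not a step of the proposed method), pre-trained word vectors can be queried for the similarity between a known atom and an atom that never appeared in the training data; the gensim library and the GoogleNews-vectors-negative300 file named later in this paper are assumed, and the atom names are hypothetical.

# Sketch: relating an atom unseen in the training data to known atoms
# through pre-trained Word2Vec vectors (assumes gensim and the
# GoogleNews-vectors-negative300 file; atom names are hypothetical).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # 300-dim vectors

known_atoms = ["male", "female", "spouse"]
unknown_atom = "husband"  # an atom that did not occur in the training data

# Cosine similarity between vectors lets the system treat the unknown
# atom like its closest known counterpart.
for atom in known_atoms:
    print(atom, vectors.similarity(unknown_atom, atom))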
Furthermore, we show an application by embedding the learned models into question answering systems. Most of the conventional question answering systems [14], [15] are not entirely capable of inferring from a large amount of information on the Web containing unknown data. The proposed systems are designed to answer questions through powerful inferences based on first-order predicate logic by utilizing facts containing unknown data. By applying the proposed systems, it will be possible to build applications such as high-performance frequently asked questions (FAQ) chatbots [16], [17], decision support systems (DSS) [18], [19], and energy-efficient estimation in sensor networks [20]–[22] using information on the Web. If these applications are realized, users might be able to explore novel concepts or mine hidden information.

Since connectionism in the 1990s, there have been very few studies in which the research areas of conventional artificial intelligence and neural networks have intersected. This is a new research area spanning conventional artificial intelligence and deep learning. Furthermore, this research aims to accomplish a Prolog-like processing system using deep learning, and to our knowledge, this is a novel application.

In this paper, we begin by reviewing related research work in Section II. In Section III, we define and describe symbolic processing, which is the learning target. In Section IV, we propose learning methods for symbolic processing and building methods of question answering systems. Section V reports the experimental results of the proposed methods.

II. RELATED WORK

A. SYMBOLIC PROCESSING WITH NEURAL NETWORKS
Before the emergence of deep learning, many studies attempted to train neural networks on symbolic processing and use it for inference [23]. Additionally, studies have previously been conducted to learn propositional logic [24]–[26] and first-order predicate logic [27]–[29], as well as to perform unification [30], [31], similar to the present study. However, these studies were limited to method proposals since they could not be implemented.

With the emergence of deep learning, studies to learn symbolic processing with graph networks [11], [12], [32]–[34] and feedforward networks [13] have been performed and documented. In the case of symbolic learning with graph networks, it was necessary to provide the forms of atomic formulas or formulas to the networks beforehand and to presume that the forms of the formulas were included in the data in advance. In the case of symbolic learning with feedforward networks, the forms of atomic formulas and formulas depended on the network configuration, and it was necessary to provide meta-rules to the network in advance. Our proposed methods impose no such restrictions on the number of terms included in an atomic formula or the number of atomic formulas included in a formula; besides, there is no need to give the forms of the atomic formulas to the networks in advance.

B. QUESTION ANSWERING SYSTEMS
After the emergence of deep learning, studies on question answering systems with deep learning [15], [35], [36] have been conducted, and the performance of these systems has improved.

These systems search for answer candidates from facts, select an answer from the existing candidates, and respond. Unlike the proposed systems, these systems do not infer but merely answer from the facts.

Additionally, studies have also been carried out to embed questions in vector representations and infer answers by deep learning [37]–[39]. However, when unknown symbols are included in questions, to our knowledge, no previous studies have embedded the unknown symbols in internal representations and inferred answers, as the methods described in this paper do.

III. SYMBOLIC PROCESSING
Here, a Prolog-like system is used for symbolic processing. When a Prolog [40] processing system receives a question, it refers to facts and rules stored in a knowledge base and infers an answer. A question consists of one or more goals. A Prolog processing system infers goals by backward reasoning. The following is a brief description of the operations that a Prolog processing system performs to infer answers.
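As a rough illustration of this inference style (a toy sketch of classical backward reasoning, not of the learned models proposed in this paper), the following fragment stores two facts and one rule and answers a goal by resolving it against the rule; the data structures and the simplified matching scheme are our assumptions.

# Toy illustration of backward reasoning over a Prolog-style knowledge base.
# This mimics what a Prolog processing system does; it is not the learned model.
facts = {("male", "bob"), ("spouse", "bob", "mary")}
rules = [
    # father(X, Y) :- male(X), spouse(X, Y).
    (("father", "X", "Y"), [("male", "X"), ("spouse", "X", "Y")]),
]

def prove(goal, bindings):
    goal = tuple(bindings.get(t, t) for t in goal)
    if goal in facts:                        # the goal matches a stored fact
        return True
    for head, body in rules:                 # try to resolve the goal with a rule head
        if head[0] == goal[0] and len(head) == len(goal):
            new_bindings = dict(zip(head[1:], goal[1:]))
            if all(prove(subgoal, new_bindings) for subgoal in body):
                return True
    return False

print(prove(("father", "bob", "mary"), {}))  # -> True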
A. UNIFICATION
Unification is a process of determining whether two given terms are identical. In Prolog programs, it is possible to
1) LEARNING UNIFICATION
Fig. 5 shows the unification model used in the proposed systems. The input to this model is a question asking whether the unification process can be performed. The model is trained so that it outputs "true." if the terms match and "false." if they do not match. Although questions can include variables, unlike in Prolog processing systems, these variables are not substituted; only whether the terms are identical is determined. For example, "male(X) = male(tom)." should output "true." instead of "X = tom." After checking whether the terms are identical, substitution is performed outside the model.
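For concreteness, training pairs for this model might look like the following sketch; the data format and the example queries are our assumptions rather than data taken from the paper.

# Hypothetical input/output pairs for the unification model.
# The model only judges whether the two terms can be made identical;
# it does not return substitutions such as "X = tom.".
unification_pairs = [
    ("male(X) = male(tom).", "true."),
    ("male(tom) = male(tom).", "true."),
    ("male(X) = female(tom).", "false."),            # different predicate symbols
    ("father(X, mary) = father(bob, Y).", "true."),  # variables on both sides
]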
When a query in the form of a word string is input to the unification model, the embedding layer for the input converts the string into a combined vector of a 300-dimensional Word2Vec word embedding and a Gray code [47] word embedding. Common nouns of atoms, such as "male", are Word2Vec word embeddings, whereas logical symbols such as "(", ")", ",", ".", and proper nouns of atoms such as "bob", are Gray code word embeddings. Gray code has the characteristic that the Hamming distance between adjacent codes is 1. Even if the input includes words not contained in the training data, an output can be obtained, because Word2Vec is used for embedding atoms.

Subsequently, the combined vector of the Word2Vec and Gray code embeddings is passed to NMT. The output from NMT consists of the one-hot encoded words inserted in the embedding layer, together with the unification result of the input word string.
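A minimal sketch of such a combined embedding is given below; the concatenation of the two codes, the 16-bit Gray code width, and the token classification rule are our assumptions, since the text only states that Word2Vec and Gray code embeddings are combined.

# Sketch of the combined Word2Vec + Gray code input embedding.
# Concatenation, the 16-bit code width, and the symbol table are assumptions.
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)   # 300-dim vectors

# logical symbols, proper nouns, and variables handled with Gray code
symbol_table = {"(": 0, ")": 1, ",": 2, ".": 3, "=": 4, ":-": 5,
                "bob": 6, "tom": 7, "mary": 8, "X": 9}
BITS = 16

def gray_code(index, bits=BITS):
    g = index ^ (index >> 1)        # adjacent indices differ in exactly one bit
    return np.array([(g >> i) & 1 for i in range(bits)], dtype=np.float32)

def embed(token):
    if token in symbol_table:       # logical symbol, proper noun, or variable
        return np.concatenate([np.zeros(300, dtype=np.float32),
                               gray_code(symbol_table[token])])
    # common noun of an atom, e.g. "male" -> Word2Vec vector
    return np.concatenate([w2v[token].astype(np.float32),
                           np.zeros(BITS, dtype=np.float32)])

query = ["male", "(", "X", ")", "=", "male", "(", "tom", ")", "."]
inputs = np.stack([embed(t) for t in query])  # shape: (10, 316), fed to NMT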
2) LEARNING RESOLUTION
Fig. 6 shows the resolution model used in the proposed systems. When the head of a rule is input to this model, the model is trained so that it outputs the body of the rule. For example, in the rule "father(bob, mary):- male(bob), spouse(bob, mary).", the input to the proposed model is "father(bob, mary)." and the output is "male(bob), spouse(bob, mary)." When a word string forming the head is input to the resolution model, the embedding layer for the input converts the string into a combined vector of Word2Vec and Gray code word embeddings, following the same conventions as in the unification learning process. Subsequently, the combined vector of Word2Vec and Gray code is passed to NMT. The output from NMT consists of the one-hot encoded words inserted in the embedding layer, together with the resolution result, i.e., the word string forming the body.

When training resolution with NMT using an existing knowledge base, the volume of training data may not be sufficient in some cases. In such cases, a method to augment the volume of proper-noun data is applied. For example, imaginary proper nouns such as "bob-1" and "bob-2" are generated from "bob", and the training data are increased as shown in Fig. 7.

FIGURE 7. Example of increasing training data.
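A minimal sketch of this augmentation step, assuming simple (input, output) string pairs as the training format:

# Sketch: cloning training pairs with imaginary proper nouns such as
# "bob-1" and "bob-2". The (input, output) pair format is an assumption.
def augment(pairs, proper_noun, copies=2):
    augmented = list(pairs)
    for i in range(1, copies + 1):
        imaginary = f"{proper_noun}-{i}"
        for source, target in pairs:
            if proper_noun in source or proper_noun in target:
                augmented.append((source.replace(proper_noun, imaginary),
                                  target.replace(proper_noun, imaginary)))
    return augmented

pairs = [("father(bob, mary).", "male(bob), spouse(bob, mary).")]
print(augment(pairs, "bob"))
# prints the original pair plus the same rule rewritten with "bob-1" and "bob-2"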
3) LEARNING MEMBERSHIP RELATION
Fig. 8 shows the membership relation model. The input to this model is a membership-related question such as "member(kansas, [colorado, nebraska, kansas])." The model is trained so that it outputs "true." if the object exists in the list and "false." if it does not exist. Besides, questions to these models do not contain variables.

When a query in the form of a word string is input to the membership relation model, the input embedding layer,
4) PROLOG-TO-TEXT
Prolog-to-Text converts the system's Prolog response into regular text. An example of conversion by question patterns is shown in Fig. 14. The conversion from Prolog to text is done with rules.
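A minimal sketch of such rule-based conversion is given below; the patterns and wordings are illustrative assumptions, since Fig. 14 is not reproduced here.

# Sketch of rule-based Prolog-to-Text conversion; the patterns below are
# illustrative assumptions standing in for the rules of Fig. 14.
import re

CONVERSION_RULES = [
    (re.compile(r"father\((\w+), (\w+)\)\."), r"\1 is the father of \2."),
    (re.compile(r"member\((\w+), \[(.*)\]\)\."), r"\1 is one of: \2."),
    (re.compile(r"^true\.$"), "Yes."),
    (re.compile(r"^false\.$"), "No."),
]

def prolog_to_text(response):
    for pattern, template in CONVERSION_RULES:
        if pattern.search(response):
            return pattern.sub(template, response)
    return response   # fall back to the raw Prolog response

print(prolog_to_text("father(bob, mary)."))   # -> bob is the father of mary.
print(prolog_to_text("true."))                # -> Yes.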
FIGURE 15. Seq2Seq with attention.

2) TRANSFORMER
Transformer [43] consists of two blocks, an Encoder Stack and a Decoder Stack. The Encoder Stack receives an input sequence, and the Decoder Stack returns the output sequence. The Encoder Stack has six Encoders and the Decoder Stack has six Decoders.
The Encoder consists of a Feedforward and a Self-Attention, with 512-dimensional outputs and six layers. The Decoder consists of a Feedforward, an Encoder-Decoder Attention, and a Self-Attention, with 512-dimensional outputs and six layers.
The Self-Attention is used to relate different positions
of a single sequence by computing a representation of the
sequence. The Encoder-Decoder Attention helps the Decoder
to focus on the appropriate parts of the input sequence.
The Feedforward uses Leaky ReLU [55] for the activation
function.
The dropout rate is 0.1, the batch size is 48, and 100 epochs are trained. For optimization, we use the Adam optimizer [54] with α = 5e-5, β1 = 0.9, β2 = 0.98, and ε = 1e-9.
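A rough PyTorch sketch of this configuration is given below; the number of attention heads and the feedforward width are not stated in the text, so library defaults are kept, and the Leaky ReLU activation is replaced by the default ReLU for simplicity.

# Rough sketch of the Transformer configuration described above (PyTorch).
# nhead and dim_feedforward are library defaults, not values from the paper.
import torch

model = torch.nn.Transformer(
    d_model=512,             # 512-dimensional outputs
    num_encoder_layers=6,    # 6 Encoders
    num_decoder_layers=6,    # 6 Decoders
    dropout=0.1,             # dropout rate 0.1
)

optimizer = torch.optim.Adam(
    model.parameters(), lr=5e-5, betas=(0.9, 0.98), eps=1e-9)

BATCH_SIZE = 48   # batch size from the text
EPOCHS = 100      # number of training epochs from the text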
D. WORD2VEC
Word2Vec is a method for obtaining the vector representa-
tion of each word from a large amount of text data using
a neural network. We apply pre-trained Word2Vec vectors, called GoogleNews-vectors-negative300 [44]–[46], to the unification, resolution, and membership relation models.
GoogleNews-vectors-negative300 is trained on a dataset
of approximately 100 billion words. The model has
300-dimensional vectors for 3 million words and phrases.
FIGURE 16. Transformer.
V. EVALUATION EXPERIMENTS
Using knowledge bases described in Prolog, we trained mod-
els, built question answering systems, and evaluated their
performance. Specifically, we conducted experiments using
two kinds of knowledge bases with graph structures.
TABLE 1. Results of models by kinsources.
TABLE 3. Results of resolution model with unknown data.
TABLE 5. Results of models by geoquery.
TABLE 6. Results of resolution model with unknown data.
0.614 and the maximum value is 1.00. Although it varies depending on the question types and response types, practical question answering systems could likely be built based on the existing knowledge bases. The resolution model incorporated into the system this time was not trained on cases where resolution is impossible. If it becomes possible to judge in the resolution model whether or not resolution is impossible, the correct answer rate for "Negative of What-questions" inputs, which is the lowest this time, may be improved.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have proposed methods to learn symbolic processing using deep learning and methods to build question answering systems using the trained models. Experimental results on the training of the symbolic processing models show that the proposed methods have rich representations and high robustness and that these models can learn even from small-scale data. In particular, the ability to handle unknown data by using Word2Vec will be a great contribution to artificial intelligence research. Moreover, the experimental results of the question answering systems suggest that practical question answering systems could be built from knowledge bases written in Prolog. Building such systems would be extremely difficult using a conventional connectionism-based method.

This study pertains to a new research domain that spans the areas of conventional artificial intelligence and neural networks. We conducted experiments using simple datasets. However, the experimental results suggest potential applications in areas such as FAQ chatbots, DSS, and energy-efficient estimation in sensor networks. If these applications can respond to information not only from facts but also by inferring from a large amount of information on the Web, the proposed systems might be able to contribute great value to society.

Future work includes symbolic processing to analyze large-scale data present on the Web and inductive inference with deep learning-based symbolic processing.
REFERENCES
[1] R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The Dendral Project. New York, NY, USA: McGraw-Hill, 1980.
[2] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[3] S. Muggleton, "Inductive logic programming," New Gener. Comput., vol. 8, no. 4, pp. 295–318, Feb. 1991. [Online]. Available: http://www.doc.ic.ac.uk/~shm/Papers/ilp.pdf. doi: 10.1007/BF03037089.
[4] G. Brewka, Nonmonotonic Reasoning: Logical Foundations of Commonsense. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[5] J. Doyle, "The ins and outs of reason maintenance," in Proc. 8th Int. Joint Conf. Artif. Intell. (IJCAI), Los Altos, CA, USA, 1983, pp. 349–351.
[6] A. C. Kakas, R. A. Kowalski, and F. Toni, "Abductive logic programming," J. Log. Comput., vol. 2, no. 6, pp. 719–770, Dec. 1993. doi: 10.1093/logcom/2.6.719.
[7] G. E. Hinton, "Preface to the special issue on connectionist symbol processing," Artif. Intell., vol. 46, nos. 1–2, pp. 1–4, Nov. 1990.
[8] D. S. Touretzky, "BoltzCONS: Dynamic symbol structures in a connectionist network," Artif. Intell., vol. 46, nos. 1–2, pp. 5–46, Nov. 1990.
[9] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in Proc. Eur. Semantic Web Conf. (ESWC), 2018, pp. 593–607.
[10] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, "Complex embeddings for simple link prediction," in Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016, pp. 2071–2080.
[11] T. Rocktäschel and S. Riedel, "End-to-end differentiable proving," in Proc. Annu. Conf. Neural Inf. Process. Syst., 2017, pp. 3788–3800.
[12] P. Minervini, M. Bosnjak, T. Rocktäschel, and S. Riedel, "Towards neural theorem proving at scale," 2018, arXiv:1807.08204. [Online]. Available: https://arxiv.org/abs/1807.08204
[13] H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou, "Neural logic machines," in Proc. Int. Conf. Learn. Represent., New Orleans, LA, USA, 2019, pp. 1–22.
[14] S. Tellex, B. Katz, J. Lin, G. Marton, and A. Fernandes, "Quantitative evaluation of passage retrieval algorithms for question answering," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Toronto, ON, Canada, Aug. 2003, pp. 41–47.
[15] R. Sequiera, G. Baruah, Z. Tu, S. Mohammed, J. Rao, H. Zhang, and J. Lin, "Exploring the effectiveness of convolutional neural networks for answer selection in end-to-end question answering," 2017, arXiv:1707.07804. [Online]. Available: https://arxiv.org/abs/1707.07804
[16] N. T. Thomas, "An e-business chatbot using AIML and LSA," in Proc. Int. Conf. Adv. Comput., Commun. Inform. (ICACCI), Jaipur, India, Sep. 2016, pp. 2740–2742.
[17] L. Cui, F. Wei, S. Huang, C. Tan, C. Duan, and M. Zhou, "SuperAgent: A customer service chatbot for e-commerce websites," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics-Syst. Demonstrations, Jul. 2017, pp. 97–102.
[18] H. Bhargava and D. Power, "Decision support systems and Web technologies: A status report," in Proc. Amer. Conf. Inf. Syst., Boston, MA, USA, Dec. 2001, p. 46.
[19] M. S. Kohn, J. Sun, S. Knoop, A. Shabo, B. Carmeli, D. Sow, T. Syed-Mahmood, and W. Rapp, "IBM's health analytics and clinical decision support," Yearbook Med. Inf., vol. 9, no. 1, pp. 154–162, Aug. 2014.
[20] A. Sodhro, Y. Li, and M. Shah, "Energy-efficient adaptive transmission power control for wireless body area networks," IET Commun., vol. 10, no. 1, pp. 81–90, Jan. 2016.
[21] A. Sodhro, S. Pirbhulal, M. Lodro, and M. Shah, "Energy-efficiency in wireless body sensor networks," in Networks of the Future: Architectures, Technologies, and Implementations. Boca Raton, FL, USA: CRC Press, 2017, p. 492.
[22] A. Sodhro, A. Sangaiah, G. Sodhro, A. Sekhari, Y. Ouzrout, and S. Pirbhulal, "Energy-efficiency of tools and applications on Internet," in Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications (Intelligent Data-Centric Systems: Sensor Collected Intelligence). Amsterdam, The Netherlands: Elsevier, 2018.
[23] A. S. d'Avila Garcez, K. Broda, and D. M. Gabbay, Neural-Symbolic Learning Systems: Foundations and Applications. London, U.K.: Springer-Verlag, 2002.
[24] J. W. Shavlik and G. G. Towell, "An approach to combining explanation-based and neural learning algorithms," Connection Sci., vol. 1, no. 3, pp. 231–253, 1989.
[25] G. G. Towell and J. W. Shavlik, "Knowledge-based artificial neural networks," Artif. Intell., vol. 70, nos. 1–2, pp. 119–165, Oct. 1994. doi: 10.1016/0004-3702(94)90105-8.
[26] A. S. A. Garcez and G. Zaverucha, "The connectionist inductive learning and logic programming system," Appl. Intell., vol. 11, no. 1, pp. 59–77, Jul. 1999. doi: 10.1023/A:1008328630915.
[27] L. Shastri, "Neurally motivated constraints on the working memory capacity of a production system for parallel processing: Implications of a connectionist model based on temporal synchrony," in Proc. 14th Annu. Conf. Cognit. Sci. Soc., Bloomington, IN, USA: Psychology Press, vol. 14, Jul./Aug. 1992, p. 159.
[28] L. Ding, "Neural prolog-the concepts, construction and mechanism," in Proc. IEEE Int. Conf. Syst., Man Cybern., Intell. Syst. 21st Century, vol. 4, Oct. 1995, pp. 3603–3608.
[29] M. V. M. Franca, G. Zaverucha, and A. S. d'Avila Garcez, "Fast relational learning using bottom clause propositionalization with artificial neural networks," Mach. Learn., vol. 94, no. 1, pp. 81–104, Jan. 2014. doi: 10.1007/s10994-013-5392-1.
[30] E. Komendantskaya, "Unification neural networks: Unification by error-correction learning," Log. J. IGPL, vol. 19, no. 6, pp. 821–847, Dec. 2011. doi: 10.1093/jigpal/jzq012.
[31] S. Holldobler, "A structured connectionist unification algorithm," in Proc. 8th Nat. Conf. Artif. Intell., Boston, MA, USA, vol. 2, 1990, pp. 587–593.
[32] G. Sourek, V. Aschenbrenner, F. Zelezny, and O. Kuzelka, "Lifted relational neural networks," in Proc. Int. Conf. Cogn. Comput., Integrating Neural Symbolic Approaches, Montreal, QC, Canada, 2015.
[33] W. W. Cohen, "TensorLog: A differentiable deductive database," 2016, arXiv:1605.06523. [Online]. Available: https://arxiv.org/abs/1605.06523
[34] L. Serafini and A. S. d'Avila Garcez, "Logic tensor networks: Deep learning and logical reasoning from data and knowledge," in Proc. 11th Int. Workshop Neural-Symbolic Learn. Reasoning (NeSy), New York, NY, USA, 2016, pp. 1–12.
[35] T. Lai, T. Bui, S. Li, and N. Lipka, "A simple end-to-end question answering model for product information," in Proc. 1st Workshop Econ. Natural Lang. Process., Jul. 2018, pp. 38–43.
[36] Y. Tay, L. A. Tuan, and S. C. Hui, "Hyperbolic representation learning for fast and efficient neural question answering," in Proc. 11th ACM Int. Conf. Web Search Data Mining, Los Angeles, CA, USA, Feb. 2018, pp. 583–591.
[37] B. Peng, Z. Lu, H. Li, and K. Wong, "Towards neural network-based reasoning," 2015, arXiv:1508.05508. [Online]. Available: https://arxiv.org/abs/1508.05508
[38] D. Weissenborn, "Separating answers from queries for neural reading comprehension," 2016, arXiv:1607.03316. [Online]. Available: https://arxiv.org/abs/1607.03316
[39] Y. Shen, P. Huang, J. Gao, and W. Chen, "ReasoNet: Learning to stop reading in machine comprehension," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Barcelona, Spain, Aug. 2017, pp. 1047–1055.
[40] I. Bratko, Prolog Programming for Artificial Intelligence, 2nd ed. Reading, MA, USA: Addison-Wesley, 1990, p. 597.
[41] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, Montreal, QC, Canada, 2014, pp. 3104–3112.
[42] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, San Diego, CA, USA, 2015, pp. 1–15.
[43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 5998–6008.
[44] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. ICLR, Scottsdale, AZ, USA, 2013, pp. 1–12.
[45] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. NIPS, Lake Tahoe, NV, USA, 2013, pp. 3111–3119.
[46] T. Mikolov, W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proc. NAACL HLT, Atlanta, GA, USA, 2013, pp. 746–751.
[47] F. Gray, "Pulse code communication," U.S. Patent 2 632 058 A, Mar. 17, 1953.
[48] H. Kanayama, Y. Miyao, and J. Prager, "Answering yes/no questions via question inversion," in Proc. 24th Int. Conf. Comput. Linguistics, Mumbai, India, Dec. 2012, pp. 1377–1392.
[49] D. Ravichandran and E. Hovy, "Learning surface text patterns for a question answering system," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, Jul. 2002, pp. 41–47.
[50] R. Higashinaka and H. Isozaki, "Corpus-based question answering for why-questions," in Proc. IJCNLP, Hyderabad, India, 2008, pp. 418–425.
[51] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[52] J. L. Elman, "Finding structure in time," Cognit. Sci., vol. 14, no. 2, pp. 179–211, Mar. 1990.
[53] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proc. 30th Int. Conf. Mach. Learn., Atlanta, GA, USA, 2013, pp. 1–9.
[54] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980
[55] A. Maas, A. Hannun, and A. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. 30th Int. Conf. Mach. Learn., Atlanta, GA, USA, 2013, p. 3.
[56] Kinsources: A Collaborative Web Platform for Kinship Data Sharing. Accessed: May 19, 2018. [Online]. Available: https://www.kinsources.net/
[57] L. R. Tang and R. J. Mooney, "Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing," in Proc. SIGDAT Conf. Empirical Methods Natural Lang. Process. Very Large Corpora (EMNLP/VLC), Hong Kong, Oct. 2000, pp. 133–141.

HIROSHI HONDA received the B.E. and M.E. degrees in administration engineering from Keio University, Yokohama, Japan, in 2003 and 2005, respectively, where he is currently pursuing the Ph.D. degree in information and computer science. From 2005 to 2014, he was a Software Engineer with Mitsubishi Electric Corporation. From 2014 to 2016, he was a Software Engineer with Fuji Xerox Company Ltd. Since 2017, he has been a Researcher with Honda R&D Company Ltd. His research interests include symbolic processing using deep learning and dialogue systems.

MASAFUMI HAGIWARA (M'89–SM'04) received the B.E., M.E., and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1982, 1984, and 1987, respectively. Since 1987, he has been with Keio University, where he is currently a Professor. From 1991 to 1993, he was a Visiting Scholar with Stanford University. His research interests include neural networks, fuzzy systems, and affective engineering. He is a member of IEICE, IPSJ, JSAI, SOFT, IEE of Japan, and JNNS. He received the IEEE Consumer Electronics Society Chester Sall Award in 1990, the Author Award from the Japan Society of Fuzzy Theory and Systems in 1996, the Technical Award and Paper Awards from the Japan Society of Kansei Engineering in 2003, 2004, and 2014, respectively, and the Best Research Award from the Japanese Neural Network Society in 2013. He was the President of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT).