
Proceedings of the 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design

Incorporating Domain Knowledge and Semantic Information into Language Models for Commonsense Question Answering

2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD) | 978-1-7281-6597-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/CSCWD49262.2021.9437862

Ruiying Zhou1,2, Keke Tian1,2, Hanjiang Lai1,2, Jian Yin1,2,3,∗
1 School of Computer Science and Engineering, Sun Yat-sen University
2 Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, P.R. China
3 School of Artificial Intelligence, Sun Yat-sen University
{zhoury9, tiankk}@mail2.sysu.edu.cn, {laihanj3, issjyin}@mail.sysu.edu.cn

∗ Corresponding author: Jian Yin

Authorized licensed use limited to: California State University Fresno. Downloaded on July 01,2021 at 18:17:02 UTC from IEEE Xplore. Restrictions apply.

Abstract—Commonsense question answering (CSQA) aims to answer questions that require the system to understand related commonsense knowledge that is not explicitly expressed in the given context. Recent advances in neural language models (e.g., BERT) that are pre-trained on a large-scale text corpus and fine-tuned on downstream tasks have boosted the performance on CSQA. However, due to the lack of domain knowledge (e.g., in social situations), these models fail to reason about specific tasks. In this work, we propose an approach to incorporate domain knowledge and semantic information into language-model-based approaches for better understanding of the related commonsense knowledge. First, we extract knowledge from existing resources by jointly learning to ask and answer as well as semantic role labeling based answering. These two tasks are correlated and can reinforce each other to discover the domain knowledge. Then, we utilize Semantic Role Labeling to enable the system to gain a better understanding of relations among relevant entities. Experimental results on several CSQA benchmarks demonstrate the effectiveness of the proposed approach.

Index Terms—commonsense question answering, language models, question generation, semantic role labeling

I. INTRODUCTION

Commonsense question answering (CSQA) is a fundamental problem in natural language processing that requires reasoning over commonsense knowledge related to questions to predict the right answer. Such commonsense reasoning skill is common to humans, but it is still challenging in artificial intelligence [1], [2], and it has potential educational value [3].

Recently, several datasets for CSQA have been released to address the problem, such as CommonsenseQA [4] and SocialIQA [5], along with resources for other types of commonsense inference such as ATOMIC [6] and Event2Mind [7]. In this work, we focus on the multiple-choice problem; an example is shown in Figure 1.

Content: Sydney is a fan of Hillary Clinton. One day she found a biography of Hillary Clinton. Sydney wanted to read it.
Question: Why did Sydney do this?
Answers: (A) know more about Hillary Clinton
         (B) become friends with Hillary Clinton
         (C) get her glasses

Fig. 1. A context-question-answers triple from the SocialIQA dataset.

Recently, language models pre-trained on a large-scale text corpus, e.g., ELMo [8], BERT [9] and XLNet [10], have achieved promising improvements on CSQA. However, since the scope of commonsense knowledge in the text corpora (e.g., Wikipedia) used to pre-train language models is limited [11], [12], these models still struggle to reason about specific domains, such as social situations. There are recent attempts at incorporating domain knowledge into language models using external knowledge such as ConceptNet [13]–[15]. However, it is hard to obtain domain knowledge in some specific domains such as the social, medical and education fields. To address this problem, an alternative solution is to make full use of the provided dataset to incorporate domain knowledge into models. Recent works [5], [16], [17] directly concatenate the context with a question and candidate answers, without considering syntactic and semantic information, to calculate a score for each candidate answer. In this paper, two novel components are proposed for effective knowledge acquisition.

First, we propose to combine learning-to-ask and semantic-role-labeling based answering into a unified framework. Instead of directly fine-tuning language models on downstream tasks, we train them further by teaching them to ask questions in a given context and then answer the generated questions, which can enable models to understand related knowledge, for example Sydney's motivation after the event happened, as shown in Figure 1.

Second, we utilize Semantic Role Labeling [18] to extract arguments (i.e., subject and object) for each predicate as semantic information to better understand relations among relevant entities. This is motivated by the observation that the relations among entities are important for modeling the reasons in CSQA. For example, as shown in Figure 1, the relation (i.e., fan) between two people can help the model to know that Hillary Clinton is a famous person and that Sydney read the biography to know more about Hillary Clinton, which can guide the model to choose the right answer.

We evaluate our approach on the SocialIQA [5] dataset,
which contains 38k multiple-choice questions regarding social events. We also evaluate on the Choice of Plausible Alternatives [19] (COPA) dataset, which contains 1000 questions, and the Winograd Schema Challenge [20] (WSC) dataset, with 285 examples. We obtain accuracy scores of 77.5%, 84.3% and 73.2% on SocialIQA, COPA and WSC, respectively. The experimental results show that our approach brings further improvement over the baselines. The improvement can be attributed to incorporating domain knowledge and the semantic information between entities conveyed in the content.

II. RELATED WORK

A. Commonsense Reasoning

Commonsense question answering is a fundamental problem in NLP that requires reasoning over external knowledge related to questions to predict the right answer. Recently, several tasks have been released to address this problem, such as ATOMIC [6], Event2Mind [7] and SocialIQA [5]. Story Cloze Test [21] aims to choose the right ending from a set of candidate endings given the context of a story. SWAG [22] and HellaSWAG [23] share the same task: predicting the next sentence following an initial event. Compared with SWAG, HellaSWAG is a more challenging dataset as it is collected via adversarial filtering. The recently proposed CommonsenseQA [24] dataset, derived from ConceptNet [25], aims to predict the right choice for a given question, which requires reasoning over external commonsense knowledge. This paper mainly focuses on the task of multiple-choice questions that require commonsense-based reading comprehension [5], [19], [20]. More specifically, given a context and a question about it, the task aims to predict the right answer from a set of candidate answers based on commonsense-based machine reading comprehension.

B. Transfer Knowledge

The challenge of commonsense reasoning is how to extract and utilize knowledge to answer questions. Transferring knowledge from an external text corpus into models plays a vital role in commonsense reasoning. There are two lines of work on transferring knowledge, categorized by the type of supervision used for model learning. The first line of work pre-trains models via transfer learning from unsupervised text (e.g., Wikipedia and BookCorpus). Typically, pre-trained language models like ELMo [8], GPT [26], BERT [9], XLNet [10] and RoBERTa [27], which are first pre-trained on large-scale unstructured text by predicting masked or next words in a sentence and then fine-tuned on downstream tasks, have achieved significant improvements in commonsense reasoning. The second line of work extracts external knowledge from a text corpus or knowledge base. Rajani et al. [28] explored a way to generate explanations to solve commonsense question answering. Lin et al. [29] and Lv et al. [30] proposed to extract evidence from ConceptNet and then use Graph Neural Networks [31] to reason over the extracted knowledge to answer commonsense questions.

However, it is hard to obtain domain knowledge in some specific domains such as the social, medical and education fields [6], [11], [12]. Therefore, we propose to make full use of the provided dataset to incorporate domain knowledge into models. In this work, we utilize RoBERTa [27] as our baseline and propose to pre-train the language model further by learning to ask questions in a given context and then answer the generated questions [32], which can transfer domain knowledge to the model.

III. METHODS AND FRAMEWORK

A. Task Definition

Commonsense question answering aims to answer questions that require reasoning through commonsense knowledge. This paper focuses on the task of multiple-choice questions that require commonsense-based reading comprehension. More specifically, given a context $c = \{c_0, c_1, \ldots, c_n\}$ and a related question $q = \{q_0, q_1, \ldots, q_m\}$, the task aims to predict the right answer $a = \{a_0, a_1, \ldots, a_k\}$ from a set of candidate answers $A$ with the help of commonsense-based machine reading comprehension.

B. Overview of the Approach

In this section, we give an overview of our approach, shown in Figure 2, which consists of several components. We use RoBERTa [27] as the encoder that takes a sentence as the input and computes the semantic representation for each word in the sentence. We adopt a two-stage strategy to train the RoBERTa encoder. First, to incorporate domain knowledge into the encoder, we pre-train the encoder by learning to ask questions in a given context and then answer the generated questions in the Learning to Ask stage (the top left panel in Figure 2), which encourages the encoder to generate better semantic representations in a specific domain.

Second, we fine-tune RoBERTa to compute the score for each candidate answer in the Semantic Role Labeling Based Answering stage (the top right panel in Figure 2). Specifically, we utilize Semantic Role Labeling [33] (SRL) to extract triples $\langle$subjective, predicate, objective$\rangle$ from the context as an extra input to empower the model to understand relations among entities. In the following, we describe these components in detail.

[Figure 2: architecture diagram. Left panel, Learning to Ask: Context, Embeddings, RoBERTa (stacked Transformer blocks), Decoder (Masked Multi-Head Attention, Feed Forward), Question & Answer Sequence $y = \{y_0, y_1, \ldots, y_\ell\}$. Right panel, Semantic Role Labeling Based Answering: Context, Question, SRL Triples, Candidate Answer, Embeddings, RoBERTa, Fully Connected, softmax, Answer prob, Answer Selection.]

Fig. 2. An overview of our approach. The approach consists of two stages, i.e., Learning to Ask and Semantic Role Labeling Based Answering. The top left panel shows Learning to Ask, and the top right panel shows Semantic Role Labeling Based Answering on downstream tasks. The two stages share the same RoBERTa encoder.

C. RoBERTa as the Encoder

An encoder converts the words in a sentence into continuous vector representations. We use RoBERTa [27] as our encoder, whose architecture is the same as that of BERT [9]. RoBERTa is a language model pre-trained on a large-scale corpus by learning to predict masked words, which exhibits significant improvement in several NLP tasks. Specifically, RoBERTa takes a sequence $x = \{x_0, x_1, \ldots, x_n\}$ as the input and transforms each token $x_i$ in the sequence into an embedding vector $e_i$. Then the encoder feeds the sequence of embedding vectors $e = \{e_0, e_1, \ldots, e_n\}$ into the 12 stacked Transformer blocks [34] to calculate a contextual representation for each token, denoted as $h_i$. Finally, we can obtain contextual representations

$\{h_0, h_1, \ldots, h_n\}$ for the input sequence $\{x_0, x_1, \ldots, x_n\}$ after feeding them into the encoder. The transformation function is encapsulated as:

$$\{h_0, h_1, \ldots, h_n\} = \mathrm{RoBERTa}(\{x_0, x_1, \ldots, x_n\}) \tag{1}$$

Note that for different stages, the input sequence is organized in different forms. In the Learning to Ask stage, we directly feed the context into the encoder. In the Semantic Role Labeling Based Answering stage, the input is the concatenation of content, question, SRL triples and answer.

D. Learning to Ask

Instead of directly fine-tuning the model on downstream tasks, we first pre-train the RoBERTa encoder by learning to ask questions in a given context and then answer the generated questions. The motivation of the Learning to Ask stage is to adapt the RoBERTa encoder to the specific fields of downstream tasks and to fully utilize the information from the given context. The aim of the Learning to Ask stage is to generate a question and its corresponding answer, denoted as $y = \{q, a\} = \{y_0, y_1, \ldots, y_l\}$.

The decoder is a multi-head attention based 6-layer transformer. The multi-head attention mechanism maps a query $Q$ and a set of key-value pairs ($K$ and $V$) to an output $\mathrm{Multihead}(Q, K, V)$, where the query, keys, values, and output are all vectors. The mechanism allows the model to jointly acquire information (i.e., output) from values through the query, which is formulated as follows.

$$\mathrm{Multihead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_u)\, W^O \tag{2}$$

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \tag{3}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \tag{4}$$

where $u$ is the number of heads in the multi-head attention layer, the projections $W^O$, $W_i^Q$, $W_i^K$, $W_i^V$ are model parameters and $d_k$ is the dimension of the keys $K$.

In the Learning to Ask stage, we take the context $c = \{c_0, c_1, \ldots, c_n\}$ as the input, and use RoBERTa to obtain contextual representations $H^c$ of the input.

$$H^c = \{h^c_0, h^c_1, \ldots, h^c_n\} = \mathrm{RoBERTa}(\{c_0, c_1, \ldots, c_n\}) \tag{5}$$

At each time-step $t$, we obtain the current hidden state $s_t$ by the multi-head attention mechanism, where $y^e_t$ is the embedding of the current input token and $y^e_{<t}$ is an embedding matrix for the generated sequence.

$$s_t = \mathrm{Multihead}(y^e_t, y^e_{<t}, y^e_{<t}) \tag{6}$$

The decoder takes the current hidden state $s_t$ and the contextual representations $H^c$ as the input of the second multi-head attention layer to calculate a context vector $c_t$.

$$c_t = \mathrm{Multihead}(s_t, H^c, H^c) \tag{7}$$

In order to enhance the non-linear fitting ability of the model, we further feed $c_t$ into a 2-layer feed-forward network with the ReLU activation function. The output $c'_t$ of the feed-forward network is used to predict the probability $p(y_t \mid y_{t-1}, c)$ of the next word $y_t$.

$$p(y_t \mid y_{t-1}, c) = \mathrm{softmax}(c'_t W) \tag{8}$$
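To make the attention computation in Eqs. (2)-(4) concrete, the following is a minimal pure-Python sketch, not the implementation used in this work. Matrices are plain lists of rows; the helper names (`softmax`, `matmul`, `transpose`, `attention`, `multihead`) and the toy projection matrices are assumptions chosen only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over one row of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Product of two matrices given as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def attention(q, k, v):
    # Eq. (4): softmax(Q K^T / sqrt(d_k)) V, with d_k the key dimension.
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multihead(q, k, v, heads, w_o):
    # Eq. (3): each head applies its own projections (W_i^Q, W_i^K, W_i^V).
    outputs = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
               for wq, wk, wv in heads]
    # Eq. (2): concatenate the heads row by row, then project with W^O.
    concat = [sum((head[i] for head in outputs), []) for i in range(len(q))]
    return matmul(concat, w_o)

# Toy example (hypothetical values): one query, two key-value pairs, u = 2 heads.
identity = [[1.0, 0.0], [0.0, 1.0]]
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
heads = [(identity, identity, identity), (identity, identity, identity)]
w_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # averages the two heads
out = multihead(q, k, v, heads, w_o)
print(out)
```

In this toy setting the query matches the first key more closely, so the output row is a weighted average of the value rows that leans toward the first one. Eqs. (6) and (7) reuse the same operation with different arguments: self-attention over the generated prefix, then attention over the encoder output $H^c$.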

Here, we pre-train the encoder by maximizing the likelihood of the target sentence $y$ as follows:

$$\log p(y \mid c) = \sum_{t=0}^{|y|-1} \log p(y_t \mid y_{t-1}, c) \tag{9}$$

E. Semantic Role Labeling Based Answering

This module aims at reasoning from the relations among entities in the given content to further improve the performance of the model. Specifically, we use Semantic Role Labeling to extract triples, denoted as $r = \{\langle subj, pred, obj \rangle\} = \{r_0, r_1, \ldots, r_j\}$, from the content of the sample sentence. For instance, as shown in Figure 3, the description of Sydney can be analyzed into four triples. The extracted triples show the relations among different entities: they make explicit the description of Sydney, the scene, and Sydney's motivation. These triples could facilitate the ability of the model to reason in the given context. We concatenate the triples with the input sequence to fine-tune the model.

Content: Sydney is a fan of Hillary Clinton. One day she found a biography of Hillary Clinton. Sydney wanted to read it.
SRL Triples:
< Sydney , is , a fan of Hillary Clinton >
< she , read , it >
< she , found , a biography of Hillary Clinton >
< she , wanted , to read it >

Fig. 3. A Semantic Role Labeling analysis result of the content in the example.

In this part, we take the content, question, SRL triples and candidate answer sequence $z = \{c, q, r, a\} = \{z_0, z_1, \ldots, z_m\}$ as the input, and compute the representation by RoBERTa as:

$$H^z = \{h^z_0, h^z_1, \ldots, h^z_m\} = \mathrm{RoBERTa}(\{z_0, z_1, \ldots, z_m\}) \tag{10}$$

At the end of the decoder, we use an n-to-1 fully connected layer that calculates a score according to the output of the last layer.

IV. EXPERIMENTS

In this section, we conduct experiments on the SocialIQA [5] dataset. SocialIQA is the largest-scale QA dataset that uses commonsense reasoning about social situations; it is a three-way multiple-choice task for probing emotional and social intelligence in a variety of everyday situations. We use AllenNLP for Semantic Role Labeling to expose the latent predicate-argument structure of the content, which represents the meaning of the content, including “who” did “what” to “whom”, etc. We also evaluate our model on two commonsense challenge datasets: COPA [19] and WSC [20].

TABLE I
MAIN RESULTS. ALL RESULTS ARE ACCURACY.

Model            SocialIQA Dev   SocialIQA Test   COPA Test   WSC Test
GPT              63.3            63               -           -
BERT-large       66              64.5             80.8        67
XLNet-large      68.2            67.9             82.6        71.3
RoBERTa-large    76.0            75.6             83.7        72.9
Ours-large       77.9            77.5             84.3        73.2

TABLE II
ABLATION RESULTS ON THE SOCIALIQA DATASET.

Backbone   Method         SRL   QG   SocialIQA   COPA   WSC
RoBERTa    (a) Baseline   -     -    75.6        83.7   72.9
RoBERTa    (b) +QG        -     ✓    75.8        83.9   72.9
RoBERTa    (c) +SRL       ✓     -    77          84.2   73.1
RoBERTa    (d) Ours       ✓     ✓    77.5        84.3   73.2

A. Description of datasets

• SocialIQA: Social Intelligence Question Answer [5] contains 37,588 examples built by crowdsourcing and is split into train/dev/test sets of 33.4k/1.9k/2.2k examples, respectively.
• COPA: The Choice of Plausible Alternatives [19] is a two-way multiple-choice dataset. It contains 1000 questions (500 dev, 500 test) asking about the causes or effects of ordinary things. We take the model trained on the SocialIQA training set, fine-tune it on the COPA dev set, and report performance on the test set.
• WSC: The Winograd Schema Challenge [20] is also a two-way multiple-choice dataset of short passages with a target pronoun. It contains 285 examples.

B. Baseline models

• BERT: BERT [9] aims to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
• XLNet: XLNet [10] is a generalized autoregressive (AR) pretraining method that uses a permutation language modeling objective to combine the advantages of AR and autoencoding (AE) methods. XLNet works with the AR objective, integrating Transformer-XL and a carefully designed two-stream attention mechanism.
• RoBERTa: RoBERTa [35] performs better than BERT by training with bigger batches over more data, removing the next sentence prediction objective, training on longer sequences, and dynamically changing the masking pattern applied to the training data.

C. Implementation Details

Our model is trained in two stages: the domain-specific Learning to Ask stage and the Semantic Role Labeling Based Answering stage. In the former stage, we train the RoBERTa encoder and the Transformer decoder to ask and answer questions in a given context, which enables the model

to fully understand the domain-specific context. In the latter stage, we fine-tune RoBERTa with a fully connected layer on top to compute a score for each candidate answer. It is worth noting that we incorporate semantic role labeling to extract triples, which are concatenated as an extra input to the encoder. The maximum input length is restricted to 64. For the RoBERTa-large encoder, the number of layers is 24, the hidden size is 1024 and the number of attention heads is 16. We train a 12-layer Transformer decoder, whose number of attention heads and hidden size are the same as those of the encoder, to generate the question and answer in the given context.

We train our model on a single machine with 4 Nvidia 2080Ti GPUs with a batch size of 12 for RoBERTa-large. We optimize our model using AdamW with a learning rate of 1e-5. The learning rate warms up over the first 10000 steps and then remains unchanged. The dropout rate in all layers is 0.1.

D. Results

The results of our model and previous methods are summarized in Table I. For SocialIQA, we report the accuracy on the dev set and test set. For COPA and WSC, we report the accuracy on the test set. Our model achieves accuracy scores of 77.9 and 77.5 on the SocialIQA dev set and test set, respectively, which are better than all the other approaches. Compared to BERT-large, our model gains 11.9, 3.5 and 6.2 points of accuracy on the SocialIQA dev set, the COPA test set and WSC, respectively (13.0 points on the SocialIQA test set), which validates the effectiveness of our Learning to Ask and Semantic Role Labeling Based Answering. We also conduct experiments on other pre-trained language models, namely GPT [26], BERT [9], XLNet [10] and RoBERTa [27], for the verification of our model.

E. Ablations Analysis

Components of ours. First, we analyze the importance of each component in our approach. The baseline method (Table II (a)) is evaluated on the RoBERTa language model. Then we gradually apply our Learning to Ask stage and semantic role labeling stage.

1) SRL: Table II (b) shows the ablation results of adding semantic role labeling on top of RoBERTa. It brings additional improvements of 0.5 / 0.1 / 0.1 on the three datasets. We could discover a definite subjective-predicate-objective relationship that helps our model make the correct predictions.
2) QG: By applying the Learning to Ask stage (Table II (c)), we achieve consistent improvements on the three datasets. This demonstrates the effectiveness of learning specific knowledge by asking questions about the content without adding more external knowledge.

Finally, our model (Table II (d)) achieves significant improvements (1.9 / 0.6 / 0.3 accuracy increase) on the three datasets. It also verifies that both Learning to Ask and Semantic Role Labeling Based Answering are effective and mutually reinforcing.

F. Error Analysis

In Table III, we show some wrong predictions randomly chosen from the test set. A common problem is the temporal sequence error when choosing the correct answer. In example (1), Robin has already played games for hours, so he would likely go home and take a shower before going to bed. On the other hand, emotional feeling errors occur when the approach lacks hidden preconditions, so it cannot complete the commonsense inference. In examples (2) and (3), all the methods make a wrong prediction about the emotional feeling.

TABLE III
EXAMPLES FROM THE SOCIALIQA TEST SET FOR WHICH OUR MODEL MADE THE WRONG PREDICTION (♠: OUR PREDICTION, ✓: CORRECT ANSWER).

1. Content: Robin went with Jan’s friends to the park to play some games for hours.
   Question: What will Jan want to do next?
   Answers: ✓ A. go home and shower; B. proud; ♠ C. go home and sleep
2. Content: Jan’s friends have been paying more attention to Robin lately.
   Question: How would Jan feel as a result?
   Answers: ✓ A. lonely; B. rich; ♠ C. would be happy for her
3. Content: Remy gave a bunch of coats to his friends who were cold.
   Question: How would you describe Remy?
   Answers: ♠ A. like he’d done a good deed; B. selfish; ✓ C. giving

Overall, our results show that domain knowledge and relations between entities contribute a lot, but reasoning about social situations is still challenging for CSQA. These problems might be alleviated if we incorporate more event temporal sequence information, emotion analysis and external knowledge.

V. CONCLUSION

In this work, we present an approach to incorporate domain knowledge into the language model with generated questions so that it better understands the context. Our approach also considers the relations between related entities, using Semantic Role Labeling to incorporate more semantic information. The experimental results on three datasets show that our approach is effective at making correct predictions and outperforms the other approaches for commonsense question answering. In future work, we would like to use Graph Convolutional Networks (GCN) to infer more complex relations between entities in the context and to incorporate external commonsense knowledge to improve our model.
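As a sanity check on the experimental results, the accuracy gains discussed in Sections IV-D and IV-E can be recomputed from Table I by subtraction. The sketch below is illustrative only: the dictionary layout and the names `TABLE_I` and `gain` are assumptions, and the numbers are copied from Table I.

```python
# Accuracy values copied from Table I; keys are illustrative column names.
TABLE_I = {
    "BERT-large": {"socialiqa_dev": 66.0, "socialiqa_test": 64.5,
                   "copa_test": 80.8, "wsc_test": 67.0},
    "RoBERTa-large": {"socialiqa_dev": 76.0, "socialiqa_test": 75.6,
                      "copa_test": 83.7, "wsc_test": 72.9},
    "Ours-large": {"socialiqa_dev": 77.9, "socialiqa_test": 77.5,
                   "copa_test": 84.3, "wsc_test": 73.2},
}

def gain(ours, baseline):
    # Absolute accuracy difference per split, rounded to one decimal place.
    return {split: round(ours[split] - baseline[split], 1) for split in ours}

over_bert = gain(TABLE_I["Ours-large"], TABLE_I["BERT-large"])
over_roberta = gain(TABLE_I["Ours-large"], TABLE_I["RoBERTa-large"])

for name, deltas in (("BERT-large", over_bert), ("RoBERTa-large", over_roberta)):
    print(f"gain over {name}: {deltas}")
```

Listing the SocialIQA dev and test splits separately makes it explicit which split a quoted gain refers to.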

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China under Grants (U1611264, U1811261, 61602530). This work is also supported by the Research Foundation of Science and Technology Plan Project in Guangdong Province (2015A030401057, 2016B030307002, 2017B030308007) and the Pearl River Nova Program of Guangzhou (201906010080).

REFERENCES

[1] E. Davis and G. Marcus, “Commonsense reasoning and commonsense knowledge in artificial intelligence,” Commun. ACM, vol. 58, no. 9, pp. 92–103, 2015.
[2] C. Moore, The development of commonsense psychology. Psychology Press, 2013.
[3] M. Heilman, “Automatic factual question generation from text,” Language Technologies Institute School of Computer Science Carnegie Mellon University, vol. 195, 2011.
[4] A. Talmor, J. Herzig, N. Lourie, and J. Berant, “Commonsenseqa: A question answering challenge targeting commonsense knowledge,” arXiv preprint arXiv:1811.00937, 2018.
[5] M. Sap, H. Rashkin, D. Chen, R. LeBras, and Y. Choi, “Socialiqa: Commonsense reasoning about social interactions,” arXiv preprint arXiv:1904.09728, 2019.
[6] M. Sap, R. Le Bras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, and Y. Choi, “Atomic: An atlas of machine commonsense for if-then reasoning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3027–3035.
[7] H. Rashkin, M. Sap, E. Allaway, N. A. Smith, and Y. Choi, “Event2mind: Commonsense inference on events, intents, and reactions,” arXiv preprint arXiv:1805.06939, 2018.
[8] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” in Proceedings of NAACL-HLT, 2018, pp. 2227–2237.
[9] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[10] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” arXiv preprint arXiv:1906.08237, 2019.
[11] J. Gordon and B. Van Durme, “Reporting bias and knowledge acquisition,” in Proceedings of the 2013 workshop on Automated knowledge base construction. ACM, 2013, pp. 25–30.
[12] L. Lucy and J. Gauthier, “Are distributional representations ready for the real world? evaluating word vectors for grounded perceptual meaning,” arXiv preprint arXiv:1705.11168, 2017.
[13] T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal, “Can a suit of armor conduct electricity? a new dataset for open book question answering,” arXiv preprint arXiv:1809.02789, 2018.
[14] S. Ostermann, M. Roth, A. Modi, S. Thater, and M. Pinkal, “Semeval-2018 task 11: machine comprehension using commonsense knowledge,” in Proceedings of The 12th International Workshop on Semantic Evaluation, 2018, pp. 747–757.
[15] N. Tandon, G. de Melo, and G. Weikum, “Webchild 2.0: Fine-grained commonsense knowledge distillation,” Proceedings of ACL 2017, System Demonstrations, pp. 115–120, 2017.
[16] A. Mitra, P. Banerjee, K. K. Pal, S. Mishra, and C. Baral, “Exploring ways to incorporate additional knowledge to improve natural language commonsense question answering,” arXiv preprint arXiv:1909.08855, 2019.
[17] L. Huang, R. L. Bras, C. Bhagavatula, and Y. Choi, “Cosmos qa: Machine reading comprehension with contextual commonsense reasoning,” arXiv preprint arXiv:1909.00277, 2019.
[18] A. Björkelund, L. Hafdell, and P. Nugues, “Multilingual semantic role labeling,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task. Association for Computational Linguistics, 2009, pp. 43–48.
[19] M. Roemmele, C. A. Bejan, and A. S. Gordon, “Choice of plausible alternatives: An evaluation of commonsense causal reasoning,” in 2011 AAAI Spring Symposium Series, 2011.
[20] H. J. Levesque, “The winograd schema challenge,” in Logical Formalizations of Commonsense Reasoning, Papers from the 2011 AAAI Spring Symposium, Technical Report SS-11-06, Stanford, California, USA, March 21-23, 2011, 2011.
[21] N. Mostafazadeh, N. Chambers, X. He, D. Parikh, D. Batra, L. Vanderwende, P. Kohli, and J. F. Allen, “A corpus and cloze evaluation for deeper understanding of commonsense stories,” in Proc. of NAACL-HLT, 2016, pp. 839–849. [Online]. Available: http://aclweb.org/anthology/N/N16/N16-1098.pdf
[22] R. Zellers, Y. Bisk, R. Schwartz, and Y. Choi, “SWAG: A large-scale adversarial dataset for grounded commonsense inference,” in Proc. of EMNLP, 2018, pp. 93–104. [Online]. Available: https://aclanthology.info/papers/D18-1009/d18-1009
[23] R. Zellers, A. Holtzman, Y. Bisk, A. Farhadi, and Y. Choi, “Hellaswag: Can a machine really finish your sentence?” in Proc. of ACL, 2019, pp. 4791–4800. [Online]. Available: https://www.aclweb.org/anthology/P19-1472/
[24] A. Talmor, J. Herzig, N. Lourie, and J. Berant, “Commonsenseqa: A question answering challenge targeting commonsense knowledge,” in Proc. of NAACL, 2019, pp. 4149–4158.
[25] R. Speer, J. Chin, and C. Havasi, “Conceptnet 5.5: An open multilingual graph of general knowledge,” in AAAI, 2017, pp. 4444–4451. [Online]. Available: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972
[26] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding with unsupervised learning,” Technical report, OpenAI, Tech. Rep., 2018.
[27] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[28] N. F. Rajani, B. McCann, C. Xiong, and R. Socher, “Explain yourself! leveraging language models for commonsense reasoning,” in Proc. of ACL, 2019, pp. 4932–4942. [Online]. Available: https://www.aclweb.org/anthology/P19-1487/
[29] B. Y. Lin, X. Chen, J. Chen, and X. Ren, “Kagnet: Knowledge-aware graph networks for commonsense reasoning,” arXiv preprint arXiv:1909.02151, 2019.
[30] S. Lv, D. Guo, J. Xu, D. Tang, N. Duan, M. Gong, L. Shou, D. Jiang, G. Cao, and S. Hu, “Graph-based reasoning over heterogeneous external knowledge for commonsense question answering,” CoRR, vol. abs/1909.05311, 2019. [Online]. Available: http://arxiv.org/abs/1909.05311
[31] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[32] D. Guo, Y. Sun, D. Tang, N. Duan, J. Yin, H. Chi, J. Cao, P. Chen, and M. Zhou, “Question generation from SQL queries improves neural semantic parsing,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, Eds. Association for Computational Linguistics, 2018, pp. 1597–1607. [Online]. Available: https://doi.org/10.18653/v1/d18-1188
[33] M. Palmer, D. Gildea, and N. Xue, “Semantic role labeling,” Synthesis Lectures on Human Language Technologies, vol. 3, no. 1, pp. 1–103, 2010.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2017.
[35] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach.”
