The document presents a study on evaluating polysemy and lexical similarity in semantic matching using a newly created dataset derived from news articles on Twitter and Facebook. It details the dataset creation process, the performance of four state-of-the-art models (BERT, Sentence-BERT, Fine-Tuned BERT, and SimCSE), and highlights the challenges posed by polysemy in semantic text matching. The findings indicate that Fine-Tuned BERT performs best but requires significant computational effort, while Sentence-BERT and SimCSE struggle with text ambiguities.

Evaluating polysemy and lexical similarity in semantic matching: a new dataset in news

Carlos Muñoz-Castro (a,c,d), Maria Apolo (b,c,d), Maximiliano Ojeda (b,c,d), Marcelo Mendoza (a,c,d), Hans Löbel (a,c,d)

(a) Pontificia Universidad Católica de Chile, Chile; (b) Universidad Técnica Federico Santa María, Chile; (c) Instituto Milenio Fundamentos de los Datos, Chile; (d) Centro Nacional de Inteligencia Artificial, Chile

Motivation

Semantic text matching is a classical task with applications in multiple domains. Nevertheless, some limitations arise from the flexibility of language: ambiguity, sarcasm, irony, etc. State-of-the-art results show that transformer models can encode information about linguistic structures [6], but it is not clear what happens with polysemy and lexical similarity in the sentence pair.

Examples (J: Jaccard score, P: polysemy score):

s1: Kaiser Permanente to build new $900 million Oakland headquarters
s2: Kaiser Permanente cancels $900 million Oakland headquarters
J: 0.63, P: 7.00 (*)

s1: Supreme Court Justice Ruth Bader Ginsburg dead at 87
s2: Supreme Court says Justice Ruth Bader Ginsburg back at work
J: 0.60, P: 22.33 (*)

s1: Trying to shop for medical care? Lots of luck with that
s2: Despite rising deductibles, Americans still can't shop for medical care.
J: 0.27, P: 8.46 (*)

s1: Daughter's 911 call for pizza was actually a domestic violence report. The dispatcher knew
s2: An Ohio dispatcher detected that a woman calling 911 ordering a pizza was in need of police help.
J: 0.12, P: 11.06 (*)

Figure 1. Examples of paraphrasing and non-paraphrasing with (P) polysemy and (J) Jaccard scores.

Related Work

In order to understand and overcome current limitations in Natural Language Processing (NLP), works such as [3, 6] study the implications of ambiguity in discourse.

A recent work evaluates the impact of polysemy on BERT representations [6], indicating that transformer models can in fact represent word-level polysemy in the text encoding. This is genuinely interesting, but since that work focuses on a classic Word Sense Disambiguation (WSD) task, the effect on semantic matching is not clear.

Polysemy score and Jaccard index:

P_A(t) = \frac{\sum_{w \in V_t} NP_w \cdot c(w,t)}{\sum_{w \in V_t} c(w,t)}        J(A,B) = \frac{|A \cap B|}{|A \cup B|}

Table 1. Methods for evaluating ambiguity [3] and lexical similarity in text.

In addition, negation and speculation are known to be phenomena that affect performance in NLP [4]. This suggests that cases combining a high degree of lexical similarity with a low level of polysemy in the negative class (Table 1) may pose a serious challenge to transformer representations: a pair of sentences with high word overlap and polysemous terms can challenge state-of-the-art models.

Regarding state-of-the-art models, three stand out in recent years: the pretrained BERT model [1], the Siamese network Sentence-BERT [5], and SimCSE [2], a simple contrastive-learning approach that encodes the same sentence more than once.

Research Questions

P1: Are state-of-the-art models sufficient to address polysemy in semantic text matching?
P2: What is the relation between the word-overlap similarity of a sentence pair and model performance?
P3: What are the main challenges for state-of-the-art models?

Datasets

The following steps describe the creation of the new dataset. The process consists of three stages: data extraction, positive pairs, and negative pairs. A positive pair indicates paraphrasing, while a negative one signals the opposite.

1. Data extraction: We worked primarily with two platforms: Twitter, through the academic API, and Facebook, through CrowdTangle. On each, we brought together the major digital news outlets.

2. Positive events: Since each news post can carry a web link with the details of the event, we compared the link present in each Facebook news post with each Twitter news post of every selected outlet and matched pairs sharing the same link. Additionally, we applied pre-processing to improve the selection of sentence pairs.

3. Negative events: We collected all the news present on the two platforms and compiled a list containing the items of all digital media. Next, we computed the Jaccard similarity through locality-sensitive hashing (LSH) with threshold = 0.5 and 128 hash functions, obtaining a considerable number of sentence pairs, which we then filtered by the difference between their publication dates. Finally, we selected the pairs with a high Jaccard index, resulting in a hard-negative dataset.

Figure 2. Diagram with the main steps for the creation of the dataset: Twitter/Facebook news pairs sharing the same link are labeled 1 (positive), while Jaccard-matched pairs published some days apart are labeled 0 (negative).

Ultimately, the process produced 178,738 sentence pairs distributed in three splits: 80% training, 10% validation, and 10% test. Since the objective is to evaluate the complexity of the task, approximately 2,800 cases were taken from the test set.

Experiments

To evaluate the built dataset, we ran four state-of-the-art models and compared their results: BERT [1], Sentence-BERT [5], Fine-Tuned BERT [1], and SimCSE [2].

In the case of the BERT model, the representation of the pre-trained model was used, with a threshold = 0.5 to decide whether the pair was a paraphrase or not.

For the other models, a grid search over three hyperparameters was performed, varying the batch size, the learning rate, and the number of epochs. All models were initialized with bert-base-nli-mean-tokens [5] weights and then adjusted through fine-tuning.

Results

The results for the four models (BERT [1], Sentence-BERT [5], Fine-Tuned BERT [1], and SimCSE [2]) are presented in Figure 3.

Figure 3. Models' performance in the semantic matching task (train/validation/test accuracy for BERT, fine-tuned BERT, fine-tuned Sentence-BERT, and fine-tuned SimCSE).

Table 2 shows the errors disaggregated by class and model.

class = 0      ft BERT               ft Sent-BERT          ft SimCSE
Poly \ Jacc    [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]
[0.0-0.5[      1.00%      5.08%      6.47%      20.30%     5.97%      20.81%
[0.5-1.0]      0.49%      4.41%      6.86%      14.71%     5.88%      15.69%

class = 1      ft BERT               ft Sent-BERT          ft SimCSE
Poly \ Jacc    [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]
[0.0-0.5[      1.46%      2.33%      3.13%      1.27%      2.71%      1.06%
[0.5-1.0]      2.24%      2.67%      4.08%      1.44%      3.67%      1.23%

Table 2. Error decomposition by class in the semantic matching task.

Conclusions

This work is a first approach to constructing a new dataset for evaluating the limitations of models in the semantic matching task.

The fine-tuned BERT model obtains the best results, but requires more computational effort than the Sentence-BERT model.

The Sentence-BERT and SimCSE models have difficulty resolving textual ambiguities. In addition, lexical similarity influences task complexity.

Future work can examine the identified limitations and propose new models that balance computational effort and task performance.

References

[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[2] Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821, 2021.
[3] Aina Garí Soler, Matthieu Labeau, and Chloé Clavel. Polysemy in spoken conversations and written texts. Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022.
[4] Ahmed Mahany, Heba Khaled, Nouh Elmitwally, Naif Aljohani, and Said Ghoniemy. Negation and speculation in NLP: A survey, corpora, methods, and applications. Applied Sciences, 2022.
[5] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.
[6] Aina Garí Soler and Marianna Apidianaki. Let's play mono-poly: BERT can reveal words' polysemy level and partitionability into senses. Transactions of the Association for Computational Linguistics, 2021.
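The link-matching step that produces positive pairs can be sketched as follows. This is a minimal illustration: the record layout and field names are invented for the example, not the paper's actual schema.

```python
from collections import defaultdict

# Toy records: (outlet, link, text). The real data came from the Twitter
# academic API and CrowdTangle; these fields are illustrative only.
tweets = [("cnn", "http://ex.com/a", "Kaiser to build new HQ")]
fb_posts = [("cnn", "http://ex.com/a", "Kaiser Permanente builds HQ"),
            ("cnn", "http://ex.com/b", "Unrelated story")]

# Index Twitter news by the article link they point to.
by_link = defaultdict(list)
for outlet, link, text in tweets:
    by_link[link].append(text)

# A Facebook/Twitter pair sharing the same article link is a positive pair.
positive_pairs = [(fb_text, tw_text)
                  for outlet, link, fb_text in fb_posts
                  for tw_text in by_link.get(link, [])]
```

Only posts pointing to the same article URL are paired, so unrelated stories (different links) never enter the positive set.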
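The MinHash estimation underlying the LSH step can be sketched as below. The poster only specifies the threshold (0.5) and the number of hash functions (128); the hash family here, and comparing a single pair instead of bucketing signature bands, are simplifications for illustration.

```python
import random
import zlib

NUM_PERM = 128    # number of hash functions, as in the dataset construction
THRESHOLD = 0.5   # Jaccard threshold, as in the dataset construction
PRIME = (1 << 61) - 1

random.seed(0)
# Random linear hash functions h(x) = (a*x + b) mod PRIME over token hashes.
PARAMS = [(random.randrange(1, PRIME), random.randrange(PRIME))
          for _ in range(NUM_PERM)]

def minhash(tokens):
    """NUM_PERM-value MinHash signature of a token set."""
    hashes = [zlib.crc32(t.encode()) for t in tokens]
    return [min((a * h + b) % PRIME for h in hashes) for a, b in PARAMS]

def estimated_jaccard(sig1, sig2):
    """Fraction of matching signature positions estimates the Jaccard index."""
    return sum(x == y for x, y in zip(sig1, sig2)) / NUM_PERM

s1 = set("kaiser permanente to build new 900 million oakland headquarters".split())
s2 = set("kaiser permanente cancels 900 million oakland headquarters".split())
sig1, sig2 = minhash(s1), minhash(s2)
# Keep the pair as a hard-negative candidate when the estimate crosses the
# threshold (a full LSH index would bucket signature bands rather than
# comparing all pairs).
is_candidate = estimated_jaccard(sig1, sig2) >= THRESHOLD
```

In practice a library such as datasketch (MinHashLSH with num_perm=128, threshold=0.5) handles the banding and candidate retrieval.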
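As a minimal sketch of the two Table 1 metrics: the Jaccard index over token sets, and the polysemy score as a count-weighted average of senses per word. The NP_w values below are made-up toy numbers; a real sense inventory (e.g. WordNet) would supply them.

```python
from collections import Counter

def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B| over the two sentences' token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def polysemy_score(tokens, num_senses):
    """P_A(t): mean number of senses NP_w per word, weighted by the word
    counts c(w, t) in text t. `num_senses` stands in for a sense inventory;
    words it does not cover are skipped."""
    counts = Counter(w for w in tokens if w in num_senses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(num_senses[w] * c for w, c in counts.items()) / total

senses = {"court": 5, "justice": 4, "work": 7}  # toy NP_w values
t = "supreme court says justice ruth bader ginsburg back at work".split()
score = polysemy_score(t, senses)               # (5 + 4 + 7) / 3
```

A pair with high `jaccard` but divergent meaning is exactly the hard-negative regime the dataset targets.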
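This thresholded decision can be sketched as below, assuming cosine similarity over pooled sentence embeddings; the poster does not state which similarity function was used, so that choice is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sentence-embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def is_match(emb1, emb2, threshold=0.5):
    """Predict 1 (paraphrase) if similarity reaches the threshold, else 0."""
    return int(cosine(emb1, emb2) >= threshold)

# Toy stand-ins for pooled BERT sentence embeddings.
e1, e2 = [0.9, 0.1, 0.3], [0.8, 0.2, 0.4]
```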
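The grid search over the three hyperparameters can be sketched as follows. The grid values and the training function are placeholders, not the configurations actually searched in the paper.

```python
from itertools import product

# Hypothetical grids; the poster does not report the exact values searched.
batch_sizes = [16, 32]
learning_rates = [2e-5, 3e-5, 5e-5]
num_epochs = [2, 3, 4]

def fine_tune_and_validate(batch_size, lr, epochs):
    """Placeholder: fine-tune from bert-base-nli-mean-tokens weights with
    this configuration and return validation accuracy. Stubbed here."""
    return 0.95  # stub value

# Enumerate every combination and keep the best-scoring configuration.
configs = list(product(batch_sizes, learning_rates, num_epochs))
best_config = max(configs, key=lambda cfg: fine_tune_and_validate(*cfg))
```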

KHIPU 2023 – Latin American Meeting in Artificial Intelligence [email protected]
