The document presents a study on evaluating polysemy and lexical similarity in semantic matching using a newly created dataset derived from news articles on Twitter and Facebook. It details the dataset creation process, the performance of four state-of-the-art models (BERT, Sentence-BERT, Fine-Tuned BERT, and SimCSE), and highlights the challenges posed by polysemy in semantic text matching. The findings indicate that Fine-Tuned BERT performs best but requires significant computational effort, while Sentence-BERT and SimCSE struggle with text ambiguities.

Evaluating polysemy and lexical similarity in semantic matching: a new dataset in news

Carlos Muñoz-Castro (a,c,d), Maria Apolo (b,c,d), Maximiliano Ojeda (b,c,d), Marcelo Mendoza (a,c,d), Hans Löbel (a,c,d)

(a) Pontificia Universidad Católica de Chile, Chile; (b) Universidad Técnica Federico Santa María, Chile; (c) Instituto Milenio Fundamentos de los Datos, Chile; (d) Centro Nacional de Inteligencia Artificial, Chile

Motivation

Semantic text matching is a classical task with applications in multiple domains. Nevertheless, some limitations arise from the flexibility of language: ambiguity, sarcasm, irony, etc. State-of-the-art results show that transformer models can encode information about linguistic structures [6], but it is not clear what happens with polysemy and lexical similarity in the sentence pair.

Examples (J: Jaccard score, P: polysemy score):

s1: Kaiser Permanente to build new $900 million Oakland headquarters
s2: Kaiser Permanente cancels $900 million Oakland headquarters
J: 0.63, P: 7.00 (*)

s1: Supreme Court Justice Ruth Bader Ginsburg dead at 87
s2: Supreme Court says Justice Ruth Bader Ginsburg back at work
J: 0.60, P: 22.33 (*)

s1: Trying to shop for medical care? Lots of luck with that
s2: Despite rising deductibles, Americans still can't shop for medical care.
J: 0.27, P: 8.46 (*)

s1: Daughter's 911 call for pizza was actually a domestic violence report. The dispatcher knew
s2: An Ohio dispatcher detected that a woman calling 911 ordering a pizza was in need of police help.
J: 0.12, P: 11.06 (*)

Figure 1. Examples of paraphrasing and non-paraphrasing with (P) polysemy and (J) Jaccard scores.

Related Work

In order to understand and overcome current limitations in Natural Language Processing (NLP), works such as [3, 6] study the implications of ambiguity in discourse.

A recent work evaluates the impact of polysemy on BERT representations [6], indicating that transformer models can in fact represent word-level polysemy in the text encoding. This is genuinely interesting, but since that work focuses on a classic Word Sense Disambiguation (WSD) task, the effect on semantic matching is not clear.

Polysemy score and Jaccard index:

P_A(t) = \frac{\sum_{w \in V_t} NP_w \cdot c(w,t)}{\sum_{w \in V_t} c(w,t)}        J(A,B) = \frac{|A \cap B|}{|A \cup B|}

Table 1. Methods for evaluating ambiguity [3] and lexical similarity in text.

In addition, negation and speculation are known to be phenomena that affect performance in NLP [4]. This suggests that cases combining a high degree of lexical similarity with a low level of polysemy in the negative class (Table 1) may pose a serious challenge to transformer representations: a pair of sentences with high word overlap and polysemous terms can challenge state-of-the-art models.

Regarding state-of-the-art models, three stand out in recent years: the pretrained BERT model [1], the Siamese network Sentence-BERT [5], and SimCSE [2], a simple contrastive-learning approach that encodes the same sentence more than once.

Research Questions

P1: Are state-of-the-art models sufficient to address polysemy in semantic text matching?
P2: What is the relation between the word-overlap similarity of a sentence pair and model performance?
P3: What are the main challenges for state-of-the-art models?

Datasets

The following steps describe the creation of the new dataset. The process consists of three stages: data extraction, positive pairs, and negative pairs. A positive pair indicates paraphrasing, while a negative one signals the opposite.

1. Data extraction: We worked primarily with two platforms: Twitter, through the academic API, and Facebook, through CrowdTangle. On each, we brought together the major digital news outlets.

2. Positive events: Since each news post can carry a web link with the details of the event, we compared the link present in each Facebook news post with each Twitter news post of every selected outlet and matched pairs sharing the same link. Additionally, we applied pre-processing to improve the selection of sentence pairs.

3. Negative events: We collected all the news present on the two platforms and compiled a list containing the items of all digital media. Next, we computed the Jaccard similarity through locality-sensitive hashing (LSH) with threshold = 0.5 and 128 hash functions, obtaining a considerable number of sentence pairs, which we then filtered by the difference between their publication dates. Finally, we selected the pairs with a high Jaccard index, resulting in a hard-negative dataset.

Figure 2. Diagram with the main steps for the creation of the dataset: Twitter/Facebook news pairs sharing the same link are labeled 1 (positive), while Jaccard-matched pairs published some days apart are labeled 0 (negative).

Ultimately, the process produced 178,738 sentence pairs distributed in three splits: 80% training, 10% validation, and 10% test. Since the objective is to evaluate the complexity of the task, approximately 2,800 cases were taken from the test set.

Experiments

To evaluate the built dataset, we ran four state-of-the-art models and compared their results: BERT [1], Sentence-BERT [5], Fine-Tuned BERT [1], and SimCSE [2].

In the case of the BERT model, the representation of the pre-trained model was used, with a threshold = 0.5 to decide whether the pair was a paraphrase or not.

For the other models, a grid search over three hyperparameters was performed, varying the batch size, the learning rate, and the number of epochs. All models were initialized with bert-base-nli-mean-tokens [5] weights and then adjusted through fine-tuning.

Results

The results for the four models (BERT [1], Sentence-BERT [5], Fine-Tuned BERT [1], and SimCSE [2]) are presented in Figure 3.

Figure 3. Models' performance in the semantic matching task (train/validation/test accuracy for BERT, fine-tuned BERT, fine-tuned Sentence-BERT, and fine-tuned SimCSE).

Table 2 shows the errors disaggregated by class and model.

class = 0      ft BERT               ft Sent-BERT          ft SimCSE
Poly \ Jacc    [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]
[0.0-0.5[      1.00%      5.08%      6.47%      20.30%     5.97%      20.81%
[0.5-1.0]      0.49%      4.41%      6.86%      14.71%     5.88%      15.69%

class = 1      ft BERT               ft Sent-BERT          ft SimCSE
Poly \ Jacc    [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]  [0.0-0.5[  [0.5-1.0]
[0.0-0.5[      1.46%      2.33%      3.13%      1.27%      2.71%      1.06%
[0.5-1.0]      2.24%      2.67%      4.08%      1.44%      3.67%      1.23%

Table 2. Error decomposition by class in the semantic matching task.

Conclusions

This work is a first approach to constructing a new dataset for evaluating the limitations of models in the semantic matching task.

The fine-tuned BERT model obtains the best results, but requires more computational effort than the Sentence-BERT model.

The Sentence-BERT and SimCSE models have difficulty resolving textual ambiguities. In addition, lexical similarity influences task complexity.

Future work can examine the identified limitations and propose new models that balance computational effort and task performance.

References

[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[2] Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821, 2021.
[3] Aina Garí Soler, Matthieu Labeau, and Chloé Clavel. Polysemy in spoken conversations and written texts. Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022.
[4] Ahmed Mahany, Heba Khaled, Nouh Elmitwally, Naif Aljohani, and Said Ghoniemy. Negation and speculation in NLP: A survey, corpora, methods, and applications. Applied Sciences, 2022.
[5] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.
[6] Aina Garí Soler and Marianna Apidianaki. Let's play mono-poly: BERT can reveal words' polysemy level and partitionability into senses. Transactions of the Association for Computational Linguistics, 2021.
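The link-matching step that produces positive pairs can be sketched as follows. This is a minimal illustration: the record layout and field names are invented for the example, not the paper's actual schema.

```python
from collections import defaultdict

# Toy records: (outlet, link, text). The real data came from the Twitter
# academic API and CrowdTangle; these fields are illustrative only.
tweets = [("cnn", "http://ex.com/a", "Kaiser to build new HQ")]
fb_posts = [("cnn", "http://ex.com/a", "Kaiser Permanente builds HQ"),
            ("cnn", "http://ex.com/b", "Unrelated story")]

# Index Twitter news by the article link they point to.
by_link = defaultdict(list)
for outlet, link, text in tweets:
    by_link[link].append(text)

# A Facebook/Twitter pair sharing the same article link is a positive pair.
positive_pairs = [(fb_text, tw_text)
                  for outlet, link, fb_text in fb_posts
                  for tw_text in by_link.get(link, [])]
```

Only posts pointing to the same article URL are paired, so unrelated stories (different links) never enter the positive set.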
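The MinHash estimation underlying the LSH step can be sketched as below. The poster only specifies the threshold (0.5) and the number of hash functions (128); the hash family here, and comparing a single pair instead of bucketing signature bands, are simplifications for illustration.

```python
import random
import zlib

NUM_PERM = 128    # number of hash functions, as in the dataset construction
THRESHOLD = 0.5   # Jaccard threshold, as in the dataset construction
PRIME = (1 << 61) - 1

random.seed(0)
# Random linear hash functions h(x) = (a*x + b) mod PRIME over token hashes.
PARAMS = [(random.randrange(1, PRIME), random.randrange(PRIME))
          for _ in range(NUM_PERM)]

def minhash(tokens):
    """NUM_PERM-value MinHash signature of a token set."""
    hashes = [zlib.crc32(t.encode()) for t in tokens]
    return [min((a * h + b) % PRIME for h in hashes) for a, b in PARAMS]

def estimated_jaccard(sig1, sig2):
    """Fraction of matching signature positions estimates the Jaccard index."""
    return sum(x == y for x, y in zip(sig1, sig2)) / NUM_PERM

s1 = set("kaiser permanente to build new 900 million oakland headquarters".split())
s2 = set("kaiser permanente cancels 900 million oakland headquarters".split())
sig1, sig2 = minhash(s1), minhash(s2)
# Keep the pair as a hard-negative candidate when the estimate crosses the
# threshold (a full LSH index would bucket signature bands rather than
# comparing all pairs).
is_candidate = estimated_jaccard(sig1, sig2) >= THRESHOLD
```

In practice a library such as datasketch (MinHashLSH with num_perm=128, threshold=0.5) handles the banding and candidate retrieval.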
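As a minimal sketch of the two Table 1 metrics: the Jaccard index over token sets, and the polysemy score as a count-weighted average of senses per word. The NP_w values below are made-up toy numbers; a real sense inventory (e.g. WordNet) would supply them.

```python
from collections import Counter

def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B| over the two sentences' token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def polysemy_score(tokens, num_senses):
    """P_A(t): mean number of senses NP_w per word, weighted by the word
    counts c(w, t) in text t. `num_senses` stands in for a sense inventory;
    words it does not cover are skipped."""
    counts = Counter(w for w in tokens if w in num_senses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(num_senses[w] * c for w, c in counts.items()) / total

senses = {"court": 5, "justice": 4, "work": 7}  # toy NP_w values
t = "supreme court says justice ruth bader ginsburg back at work".split()
score = polysemy_score(t, senses)               # (5 + 4 + 7) / 3
```

A pair with high `jaccard` but divergent meaning is exactly the hard-negative regime the dataset targets.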
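This thresholded decision can be sketched as below, assuming cosine similarity over pooled sentence embeddings; the poster does not state which similarity function was used, so that choice is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sentence-embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def is_match(emb1, emb2, threshold=0.5):
    """Predict 1 (paraphrase) if similarity reaches the threshold, else 0."""
    return int(cosine(emb1, emb2) >= threshold)

# Toy stand-ins for pooled BERT sentence embeddings.
e1, e2 = [0.9, 0.1, 0.3], [0.8, 0.2, 0.4]
```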
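The grid search over the three hyperparameters can be sketched as follows. The grid values and the training function are placeholders, not the configurations actually searched in the paper.

```python
from itertools import product

# Hypothetical grids; the poster does not report the exact values searched.
batch_sizes = [16, 32]
learning_rates = [2e-5, 3e-5, 5e-5]
num_epochs = [2, 3, 4]

def fine_tune_and_validate(batch_size, lr, epochs):
    """Placeholder: fine-tune from bert-base-nli-mean-tokens weights with
    this configuration and return validation accuracy. Stubbed here."""
    return 0.95  # stub value

# Enumerate every combination and keep the best-scoring configuration.
configs = list(product(batch_sizes, learning_rates, num_epochs))
best_config = max(configs, key=lambda cfg: fine_tune_and_validate(*cfg))
```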

KHIPU 2023 – Latin American Meeting in Artificial Intelligence [email protected]
