Reference Scope Identification for Citances
Using Convolutional Neural Network
• SAURAV JHA
• AANCHAL CHAURASIA
• AKHILESH SUDHAKAR
• ANIL KUMAR SINGH
19 December, 2017
MNNIT, Allahabad IIT (BHU), Varanasi
Overview of the problem
• Automatically generating the reference scope (the span of cited text) in a reference paper,
• corresponding to citances (sentences in the citing papers that cite it)
• Application: Scientific Paper Summarization
The Computational Linguistics Scientific Document
Summarization Shared Task (CL-SciSumm)
• Given: A topic consisting of a Reference Paper (RP) and Citing Papers (CPs) that all contain citations to
the RP. In each CP, the text spans (i.e., citances) have been identified that pertain to a particular citation to
the RP.
• Task: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect
the citance:
• A sentence fragment, a full sentence, or several consecutive sentences (no more than 5).
Citing Paper (ours) Referenced Paper (Yeh et al. (2017))
Example
Contributions:
● Modeling a new feature set to represent a citance-reference sentence pair
● Building a classification system: binary classification of a
<CP sentence, RP sentence> pair.
● Showing performance gains over state-of-the-art results
of Yeh et al. (2017)
● Better F1-scores
● Smaller feature set
● 3 binary classifiers:
● Adaptive Boosting Classifier (ABC)
● Gradient Boosting Classifier (GBC)
● CNN classifier
Citance Reference
Feature Extraction
Label: 0/1
Undersampling + SMOTE
Train + Val. Set Test Set
Principal Comp. Analysis
Classifiers
Label: 0/1
Train Predict
Model
Dataset
• CL-SciSumm Shared Tasks 2016 and 2017
• Development corpus
• Training corpora
• Test corpus
● Each corpus = 10 topics.
● Each topic = a reference paper (RP) + its citing papers (CPs).
● The citation annotations specify citances, their associated reference text
and the discourse facet they represent.
● Citances in CPs are paired with each sentence in the RPs, along with a
binary label (0 or 1) indicating whether an actual reference relation holds.
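To make the pairing concrete, the following is a hedged sketch of how such <citance, RP sentence, label> instances could be assembled. The data structures (dicts keyed by sentence ids, a gold_spans mapping) are illustrative assumptions, not the shared task's actual annotation format.

```python
# Hypothetical sketch: pair every citance with every reference-paper sentence
# and label the pair 1 if the annotation marks that sentence as cited text.
def build_pairs(citances, rp_sentences, gold_spans):
    # citances:     dict of citance_id -> citance text (assumed structure)
    # rp_sentences: dict of sentence_id -> RP sentence text (assumed structure)
    # gold_spans:   dict of citance_id -> set of cited RP sentence_ids (assumed)
    pairs = []
    for cid, citance in citances.items():
        for sid, sentence in rp_sentences.items():
            label = 1 if sid in gold_spans.get(cid, set()) else 0
            pairs.append((citance, sentence, label))
    return pairs
```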
ANNOTATION FORMAT
Feature Extraction:
• Three different classes of citation-dependent features (i.e., lexical,
knowledge-based and corpus-based) and one class of citation-independent
features (i.e., surface).
1. LEXICAL FEATURES
● Word overlap* using 5 metrics: Dice coefficient, Jaccard coefficient,
Cosine similarity, Levenshtein distance based fuzzy string similarity and
modified gestalt pattern-matching based sequence matcher score.
● TF-IDF similarity: The TF-IDF vector cosine similarity.
● ROUGE measure: ROUGE-1, ROUGE-2 and ROUGE-L.
● Named entity overlap* : Using Dice coefficient, fuzzy string similarity,
sequence matcher score and word2vec similarity.
● Number overlap*: Fuzzy string similarity and sequence matcher score.
● Significance of citation-related word pairs: Based on Pointwise Mutual
Information (PMI) score (Church and Hanks, 1989).
Convention:
● ∗ = borrowed, but modified features.
● ∗∗ = newly added features in this work.
1. Lexical
2. Knowledge-based
3. Corpus-based
4. Surface
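A minimal sketch of how the word-overlap and TF-IDF similarity features above could be computed with standard Python libraries (difflib, fuzzywuzzy, scikit-learn). The tokenization and preprocessing shown are our assumptions, not the authors' exact implementation.

```python
from difflib import SequenceMatcher
from fuzzywuzzy import fuzz  # Levenshtein-based fuzzy string matching
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def overlap_features(citance, reference):
    # Set-based overlap between the two sentences' word types.
    c, r = set(citance.lower().split()), set(reference.lower().split())
    inter = len(c & r)
    dice = 2 * inter / (len(c) + len(r)) if (c or r) else 0.0
    jaccard = inter / len(c | r) if (c | r) else 0.0
    cosine = inter / ((len(c) * len(r)) ** 0.5) if (c and r) else 0.0
    fuzzy = fuzz.ratio(citance, reference) / 100.0              # Levenshtein-based similarity
    gestalt = SequenceMatcher(None, citance, reference).ratio()  # gestalt pattern matching
    return [dice, jaccard, cosine, fuzzy, gestalt]

def tfidf_similarity(citance, reference):
    # Cosine similarity of the two sentences' TF-IDF vectors.
    vecs = TfidfVectorizer().fit_transform([citance, reference])
    return cosine_similarity(vecs[0], vecs[1])[0, 0]
```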
Feature Extraction:
2. KNOWLEDGE-BASED FEATURES
• WordNet-based semantic similarity* : The best semantic similarity
score between words in the citance and the reference sentence, taken over all the
sets of cognitive synonyms (synsets) present in WordNet.
Convention:
● ∗ = borrowed, but modified features.
● ∗∗ = newly added features in this work.
1. Lexical
2. Knowledge-based
3. Corpus-based
4. Surface
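A rough sketch of the WordNet-based similarity using NLTK's WordNet interface. Taking the maximum path similarity over all synset pairs is an assumption about the exact measure used.

```python
# Assumes NLTK and its WordNet corpus are installed (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def wordnet_similarity(citance_tokens, reference_tokens):
    # Best path-based similarity over every word pair and every synset pair.
    best = 0.0
    for w1 in citance_tokens:
        for w2 in reference_tokens:
            for s1 in wn.synsets(w1):
                for s2 in wn.synsets(w2):
                    sim = s1.path_similarity(s2)  # None if no connecting path exists
                    if sim is not None and sim > best:
                        best = sim
    return best
```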
Feature Extraction:
3. CORPUS-BASED FEATURES
• Word2Vec-based semantic similarity** : Based on the pre-trained
embedding vectors of the GoogleNews corpus, following Mikolov et al. (2013).
Convention:
● ∗ = borrowed, but modified features.
● ∗∗ = newly added features in this work.
1. Lexical
2. Knowledge-based
3. Corpus-based
4. Surface
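A sketch of the Word2Vec-based similarity with gensim and the pre-trained GoogleNews vectors. Averaging word vectors into a sentence vector is our assumption about the aggregation step.

```python
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained GoogleNews vectors (Mikolov et al., 2013); file path is illustrative.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def sentence_vector(tokens):
    # Average the embeddings of in-vocabulary tokens.
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def w2v_similarity(citance_tokens, reference_tokens):
    a, b = sentence_vector(citance_tokens), sentence_vector(reference_tokens)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```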
Feature Extraction:
4. SURFACE FEATURES
• Count of words: In the reference sentence.
• Count of characters** : In the reference sentence.
• Count of digits: In the reference sentence.
• Count of special characters** : “@”, “#”, “$”, “%”, “&”, “*”, “-”, “=”, “+”, “>”,
“<”, “[”,“]”, “{”, “}”, “/”.
• Normalized count of punctuation markers** : The ratio of count of
punctuation characters to the total count of characters.
• Count of long words** : Words exceeding six letters in length.
• Average word length** : The ratio of the total count of characters to the
count of words in the reference sentence.
• Count of named entities: In the reference sentence.
• Average sentiment score**: The overall positive and negative sentiment
score of the reference sentence averaged over all the words, based on the
SentiWordNet 3.0 lexical resource (Baccianella et al. (2010)).
• Lexical richness** : The lexical richness of the reference sentence based
on Yule’s K index.
Convention:
● ∗ = borrowed, but modified features.
● ∗∗ = newly added features in this work.
1. Lexical
2. Knowledge-based
3. Corpus-based
4. Surface
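The surface features above are simple counts over the reference sentence; a sketch is shown below, assuming whitespace tokenization and the standard formulation of Yule's K.

```python
import string
from collections import Counter

SPECIAL = set('@#$%&*-=+><[]{}/')  # special-character list from the slide

def surface_features(sentence):
    words = sentence.split()
    chars = len(sentence)
    feats = {
        "n_words": len(words),
        "n_chars": chars,
        "n_digits": sum(ch.isdigit() for ch in sentence),
        "n_special": sum(ch in SPECIAL for ch in sentence),
        "punct_ratio": sum(ch in string.punctuation for ch in sentence) / max(chars, 1),
        "n_long_words": sum(len(w) > 6 for w in words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2, where V_i is the number of
    # word types occurring i times and N is the total token count.
    freqs = Counter(Counter(w.lower() for w in words).values())
    n = len(words)
    feats["yules_k"] = (
        1e4 * (sum(i * i * v for i, v in freqs.items()) - n) / (n * n) if n else 0.0
    )
    return feats
```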
1. The Class Imbalance
Problem
2. Handling Class Imbalance
3. Handling Correlated
Features
• For a given reference paper, the number of sentences in it that are cited by
some citing paper is much smaller than the number of sentences that are not
cited.
• This yields a highly imbalanced dataset: the ratio of non-cited to cited pairs
is 383.83 : 1 in the combined development and training corpus and
355.76 : 1 in the test corpus.
Data Handling Techniques:
1. The Class Imbalance
Problem
2. Handling Class Imbalance
3. Handling Correlated
Features
• We experimented with combinations of three
different degrees of Random under-sampling
(20%, 30% and 35%) on the majority class
(negative samples).
• On each such undersampled dataset, we apply the SMOTE (Synthetic
Minority Over-sampling Technique) method to generate synthetic cited
pairs until the ratio of cited to non-cited pairs is 1:1.
Data Handling Techniques:
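A minimal sketch of the undersampling + SMOTE step using imbalanced-learn. How the 20/30/35% undersampling degrees map to the sampler's parameters is an assumption on our part.

```python
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

def rebalance(X_train, y_train, keep_fraction=0.35):
    # X_train: feature matrix; y_train: NumPy array of 0/1 labels (assumed).
    # Keep only a fraction of the majority (non-cited) class, then oversample
    # the minority (cited) class with SMOTE until the classes are balanced 1:1.
    n_major = int((y_train == 0).sum() * keep_fraction)
    under = RandomUnderSampler(sampling_strategy={0: n_major}, random_state=42)
    over = SMOTE(sampling_strategy=1.0, random_state=42)
    pipe = Pipeline([("under", under), ("smote", over)])
    return pipe.fit_resample(X_train, y_train)
```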
1. The Class Imbalance
Problem
2. Handling Class Imbalance
3. Handling Correlated
Features
• Principal Component Analysis (PCA) is applied to both the training and
test feature sets.
• Experiments were done by varying the number of principal components
from 30 to 40; the best performance was obtained by retaining the top 35
principal components.
Data Handling Techniques:
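A sketch of the PCA step with scikit-learn. Fitting the projection on the training features and reusing it on the test features is a standard choice we assume here; the slide only states that PCA is applied to both sets.

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=35)                    # top 35 components, per the slide
X_train_reduced = pca.fit_transform(X_train)  # fit on the training features
X_test_reduced = pca.transform(X_test)        # reuse the same projection on the test features
```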
1. Adaptive Boosting Classifier
(ABC)
2. Gradient Boosting Classifier
(GBC)
3. Convolutional Neural Network
(CNN)
• Work by creating a sequence of models that
attempt to correct the mistakes of the models
used before them in the sequence.
• Offer the added benefit of combining outputs
from weak learners (those whose performance
is at least better than random chance) to create
a strong learner with improved prediction
performance.
• Focus more on instances that have been misclassified or have larger
errors.
• The base classifiers (weak learners) used in ABC are decision trees.
Classification Algorithms:
Boosting Ensemble Algorithms
1. Adaptive Boosting Classifier
(ABC)
2. Gradient Boosting Classifier
(GBC)
3. Convolutional Neural Network
(CNN)
• Allows each base classifier to gradually
minimize the loss function of the whole system
using the Gradient Descent method (Collobert
et al. (2004)).
• The base classifiers in a GBC are regression
trees.
Classification Algorithms:
Figure 1: Schematic illustration of the boosting framework. Adapted from Bishop and Nasrabadi (2007): each base classifier y_m(x)
is trained on a weighted form of the training set (blue arrows) in which the weights w_n^(m) depend on the performance of the
previous base classifier y_(m-1)(x) (green arrows). Once all base classifiers have been trained, they are combined to give the final
classifier Y_M(x) (red arrows).
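A minimal sketch of instantiating the two boosting classifiers with scikit-learn. The hyper-parameter values shown are placeholders, since the actual ones were selected by grid search (see Experiments), and the training variables refer to the rebalanced, PCA-reduced features from the earlier sketches.

```python
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

abc = AdaBoostClassifier(n_estimators=100, random_state=42)          # decision-tree base learners
gbc = GradientBoostingClassifier(n_estimators=100, random_state=42)  # regression-tree base learners

# X_train_reduced / y_train_resampled: assumed outputs of the resampling and PCA steps.
abc.fit(X_train_reduced, y_train_resampled)
gbc.fit(X_train_reduced, y_train_resampled)
```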
1. Adaptive Boosting Classifier
(ABC)
2. Gradient Boosting Classifier
(GBC)
3. Convolutional Neural Network (CNN)
• CNNs can extract high-level, abstract features with minimal
pre-processing of the data.
➢ ARCHITECTURE :
• A 1D Convolutional layer accepts inputs of the
form (Height * Width * Channels).
• We visualize each feature vector as an image with a single channel, unit
height, and width equal to the number of features in the reduced feature
vector obtained after applying PCA.
• Input shape of the vector fed into the input layer of the CNN =
(number of features × 1).
Classification Algorithms:
Figure 2: Our CNN architecture: a stack of two 1-D convolutional layers with 64 hidden units each (ReLU activations) + 1-D
MaxPooling + a stack of two 1-D convolutional layers with 128 hidden units each (ReLU activations) + 1-D Global
Average Pooling + 50% Dropout + a single-unit output dense layer (sigmoid activation)
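A Keras sketch of the architecture in Figure 2. The kernel size, padding, optimizer and loss are assumptions, since the slide specifies only the layer stack.

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense

def build_cnn(n_features, kernel_size=3):
    # Inputs are the PCA-reduced feature vectors reshaped to (n_features, 1).
    model = Sequential([
        Conv1D(64, kernel_size, activation="relu", padding="same",
               input_shape=(n_features, 1)),
        Conv1D(64, kernel_size, activation="relu", padding="same"),
        MaxPooling1D(pool_size=2),
        Conv1D(128, kernel_size, activation="relu", padding="same"),
        Conv1D(128, kernel_size, activation="relu", padding="same"),
        GlobalAveragePooling1D(),
        Dropout(0.5),
        Dense(1, activation="sigmoid"),
    ])
    # Optimizer and loss are assumptions for a binary classification setup.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```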
Post Filtering :
• The binary classifier may classify multiple sentences in the RP as positive, i.e., relevant to a
particular citance; not all of these may be true positives.
• In order to reduce our false positive error rate, we post-process by filtering out some of these false
positives.
• We follow the method of Yeh et al. (2017): the final output is the top-k sentences from the
positively classified reference sentences, ranked by TF-IDF vector cosine similarity with the
citance.
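A sketch of the post-filtering step: among the reference sentences the classifier marks as positive for a citance, keep only the k most similar by TF-IDF cosine similarity. The value of k is a tunable assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def filter_top_k(citance, positive_sentences, k=3):
    # positive_sentences: RP sentences the classifier labelled positive for this citance.
    if not positive_sentences:
        return []
    tfidf = TfidfVectorizer().fit([citance] + positive_sentences)
    c_vec = tfidf.transform([citance])
    s_vecs = tfidf.transform(positive_sentences)
    scores = cosine_similarity(c_vec, s_vecs).ravel()
    ranked = sorted(zip(positive_sentences, scores), key=lambda x: -x[1])
    return [sent for sent, _ in ranked[:k]]
```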
EXPERIMENTS:
• Evaluation Metrics: Precision, Recall and F1-Score
• The average score on all topics in the test corpus is reported.
• We run experiments on two separate training sets: first run and second run.
➢ In the first run, we use data only from the 2016 shared task for comparison with the existing state of the art (Yeh et
al. (2017)):
1. Train our models on the training set, and tune the CNN's hyper-parameters on the development set.
2. We then combine the training data and the development data to train the final models.
3. We test our models on the test set provided as part of this dataset.
Table 1: F1-score comparison of CNN with previous models
➢ In the second run, we make use of the datasets from both 2016 and 2017.
1. The two training datasets are combined to form the initial training set.
2. After tuning the CNN's hyperparameters on the development set, the initial training and development sets are
combined to form the final training set.
➢ A grid search over 10-fold cross-validation is used to find the best model parameters for ABC and GBC:
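A sketch of such a grid search for the GBC with scikit-learn; the parameter grid shown is illustrative, not the authors' actual search space.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative grid; the real hyper-parameter ranges are not given on the slide.
param_grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.05, 0.1, 0.5]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=10, scoring="f1")
search.fit(X_train_reduced, y_train_resampled)  # assumed training variables from earlier sketches
print(search.best_params_)
```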
Results and Analysis:
● The precision, recall and F1-score obtained by the models on the test set with respect to the positive
class, evaluated by 10-fold cross-validation, are shown in Table 3.
● The CNN-based classifier was trained for 30 epochs.
Instances of TP, FP, TN, FN:
• True Positive:
• False Positive:
Citance: We agree with Sekine (2005) who claims that several different methods are
required to discover a wider variety of paraphrases.
Reference: Rather believe several methods developed using different heuristics
discover wider variety paraphrases
Citance: Similarly, (Sekine, 2005) improved information retrieval based on pattern
recognition by introducing paraphrase generation.
Reference: obstacles completing idea, believe automatic paraphrase discovery
important component building fully automatic information extraction system.
Citance: We agree with Sekine (2005) who claims that several different methods are required to
discover a wider variety of paraphrases.
Reference: Keyword detection error Even keyword consists single word, words desirable
keywords domain.
Citance: This sparked intensive research on unsupervised acquisition of entailment
rules (and similarly paraphrases) e.g. (Lin and Pantel, 2001; Szpektor et al., 2004; Sekine, 2005).
Reference: proposed unsupervised method discover paraphrases large untagged corpus.
• True Negative:
• False Negative:
Comparison with Klampfl et al. (2016):
• Klampfl et al. (2016) reported an F1-score of 0.346 on the development set corpus and 0.432 on the training set corpus of
2016 using a TextSentenceRank-assisted sentence classifier.
• Because their results on the test set corpus are unavailable, we compare the performance of our CNN classifier
with theirs on the 2016 development and training set corpora (80:20 train:test split).
1. Effect of Feature Classes
2. Effect of Data Handling
Techniques
Ablation Studies:
1. Effect of Feature Classes
2. Effect of Data Handling
Techniques
Ablation Studies:
• More Data
• Extensions to Word2Vec: Paragraph Vector
(Le and Mikolov (2014)).
• Modeling a Learning to rank problem:
Establish some partial order between the
training instances using the binary labels
assigned to each <CP sentence, RP sentence>
pair.
Future Work
● We describe our work on reference
scope identification for citances
using an extended feature set
applied to three different classifiers.
● Among the classifiers trained to
distinguish cited and non-cited
pairs, the CNN-based model gave
the overall best results with an F1
score of 0.5558 on the combined
corpus of CL-SciSumm 2016 and
2017.
● We also achieved an F1 score of
0.2462 on the 2016 dataset, which
surpasses the previous state-of-the-art
result on that dataset.
References:
• Peeyush Aggarwal and Richa Sharma. 2016. Lexical and syntactic cues to identify reference scope of
citance. In BIRNDL@JCDL, pages 103–112.
• Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining. In LREC.
• François Chollet et al. 2015. Keras. https://github.com/fchollet/keras.
• Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic
minority over-sampling technique. J. Artif. Intell. Res. (JAIR), 16:321–357.
• Yoon Kim. 2014. Convolutional neural networks for sentence classification. In EMNLP.
• Bruno Malenfant and Guy Lapalme. 2016. RALI system description for CL-SciSumm 2016 shared task. In
BIRNDL@JCDL, pages 146–155.
• Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed
representations of words and phrases and their compositionality. In NIPS.
• Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity - Measuring the
relatedness of concepts. In AAAI.
• Jen-Yuan Yeh, Tien-Yu Hsu, Cheng-Jung Tsai, and Pei-Cheng Cheng. 2017. Reference scope identification for
citances by classification with text similarity measures. In ICSCA ’17.
THANK YOU