
Japanese abstractive text summarization using BERT

Yuuki Iwasaki
Department of Computer Science
National Institute of Technology, Tokyo College
Tokyo, Japan
[email protected]

Akihiro Yamashita
Department of Computer Science
National Institute of Technology, Tokyo College
Tokyo, Japan
[email protected]

Yoko Konno
CHOWA GIKEN Corporation
Hokkaido, Japan
[email protected]

Katsushi Matsubayashi
Department of Computer Science
National Institute of Technology, Tokyo College
Tokyo, Japan
[email protected]

Abstract - In this study, we developed an automatic abstractive text summarization algorithm for Japanese using a neural network. We used a sequence-to-sequence encoder-decoder model for our experiments. The encoder obtains feature-based vector representations of the input sentences using BERT, and a Transformer-based decoder generates the summary sentence from the encoder output. The experiment was performed on the livedoor news corpus with this model. However, an issue arose in that the same text was repeated within the generated summaries.

Keywords: abstractive text summarization; BERT; livedoor news corpus

I. INTRODUCTION

Text summarization is the task of concisely restating the content of a long text. Machine learning approaches to text summarization are mainly divided into two types: extractive and abstractive summarization. In the former, the input text is split into sentences and a summary is generated by selecting and combining the important ones. The latter, abstractive approach interprets the input text and generates a corresponding summary by itself. Both aim to capture the key points of the input; however, extractive summarization can only reuse sentences taken from the input, whereas abstractive summarization generates summary sentences on its own and is therefore more flexible. In this study, we focus on abstractive summarization.

In recent years, various models have been proposed for abstractive text summarization [1][3]. Zhang et al. [1] proposed an abstractive text summarization model using Bidirectional Encoder Representations from Transformers (BERT). The experimental results reported in that paper show that their model achieved new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets. See et al. [3] proposed an abstractive text summarization model, the Pointer-Generator Network, based on CopyNet; the Pointer-Generator Network has advantages in both abstractive and extractive text summarization.

The model developed in this study was constructed with reference to the text summarization model using BERT [1] and has two stages. In the first stage, the input text is encoded into context representations using BERT, and a Transformer-based decoder generates a draft summary from these representations. In the second stage, the draft summary is re-verified using BERT to produce a crisper summary. In our experiment, we employ only the first stage of this text summarization model.

In this study, we build a text summarization model using BERT and evaluate it. We also highlight remaining issues observed in the text generated after training on a Japanese corpus.

II. RELATED WORKS

A. BERT
Recently, pre-trained models such as BERT have been widely incorporated into neural network models. In particular, models built on BERT have achieved state-of-the-art performance on many natural language processing tasks. BERT is pre-trained on a huge unlabeled corpus and can achieve better performance by fine-tuning on another corpus.
We briefly describe the structure of the BERT model with reference to [4]. BERT consists of several layers; each layer has a Multi-Head Attention sub-layer and a linear affine sub-layer with residual connections. In our experiment, we utilize the BERT-base model, which has 12 layers and a hidden size of 768.

B. Multi-Head Attention
BERT and the Transformer primarily comprise Multi-Head Attention layers. Multi-Head Attention splits the attention input into multiple parts (heads), applies attention to each part, and concatenates the resulting outputs. Multi-Head Attention is more accurate than single-head attention. It can be calculated as shown in equations (1), (2), and (3), as described in [2].



Figure 1. Overview of our text summarization model

Attention(Q, K, V) = softmax( (Q K^T) / \sqrt{d_k} ) V    (1)

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)    (2)

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O    (3)

Equation (1) is the attention function for a single query. In Multi-Head Attention, the input is split into heads as shown in (2), and the outputs of all heads are concatenated as shown in (3) to produce the overall output. In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix Q; the keys and values are also packed together into matrices K and V. W_i^Q, W_i^K, W_i^V, and W^O are parameter matrices, and d_k in equation (1) is the dimension of the keys K.
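To make equations (1)-(3) concrete, the following is a minimal PyTorch sketch of scaled dot-product attention and Multi-Head Attention. The function names, the shape conventions, and the absence of masking and dropout are our own simplifications and not the implementation used in this paper.

```python
import math
import torch

def attention(Q, K, V):
    # Equation (1): scaled dot-product attention.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)     # (..., q_len, k_len)
    return torch.softmax(scores, dim=-1) @ V              # (..., q_len, d_k)

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o, num_heads):
    # Equations (2)-(3): project, split into heads, attend, concatenate, project back.
    batch, q_len, d_model = Q.shape
    d_head = d_model // num_heads

    def split_heads(x, W):
        x = x @ W                                          # (batch, len, d_model)
        return x.view(batch, -1, num_heads, d_head).transpose(1, 2)

    heads = attention(split_heads(Q, W_q), split_heads(K, W_k), split_heads(V, W_v))
    concat = heads.transpose(1, 2).reshape(batch, q_len, d_model)   # Concat(head_1, ..., head_h)
    return concat @ W_o                                    # apply W^O
```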
III. MODEL

Fig. 1 shows an overview of the model used in our experiment.

A. Encoder
Several pre-trained models, such as BERT, are widely utilized in encoder-decoder models. Because BERT enables fast training with high precision, it improves the accuracy of existing models. In the study by Zhang et al. [1], BERT is used as the encoder to achieve state-of-the-art performance on the abstractive text summarization task. We therefore applied the pre-trained BERT model as the encoder of our model in this experiment.

B. Decoder
A Transformer-based decoder was configured as the decoder in our model. Because BERT cannot simultaneously feed the output generated by the encoder into the input of the decoder, it was not our choice of decoder. Furthermore, we opted for a Transformer-based decoder rather than Recurrent Neural Networks (RNNs) such as LSTM and GRU for the following reasons: (1) training time: Transformer-based decoders are built with Multi-Head Attention, which allows parallel computation and is therefore faster; (2) accuracy: Transformers are far more accurate than RNNs on machine translation tasks; (3) long-range dependency: the attention used in Transformers learns long-range dependencies more easily than RNNs such as LSTM.

C. Abstractive summarization model
The input is denoted as X = {x_1, x_2, ..., x_n}, the sequence representing sentence breaks as S = {s_1, s_2, ..., s_m}, and the corresponding summary as A = {a_1, a_2, ..., a_k}. We start by entering X and S into BERT. If f_sen(x) is the sentence number of token x, the sequence S is computed as S = f_sen(X) mod 2. The resulting BERT encoder output is denoted as H. Next, we feed H and the decoder output up to the t-th time step into the decoder. The probability of each vocabulary item at the t-th time step is obtained as shown in (4); this probability is conditioned on the decoder output before the t-th time step and the encoder output H.

P_t(w) = f_decoder(w | H, Y_{<t})    (4)

The training loss L is calculated as shown in (5) using the vocabulary probability P_t(w).

L = - \sum_t \log P(y_t | H, a_{<t})    (5)
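As an illustration of how the sentence-break sequence S and the loss in (5) can be computed, the following is a small sketch. The helper names, the padding id, and the use of cross-entropy (which averages rather than sums the per-token negative log-likelihood) are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def sentence_segments(sentence_numbers):
    # S = f_sen(X) mod 2: the segment id alternates between 0 and 1 whenever a
    # new sentence starts, which signals sentence breaks to the BERT encoder.
    return [n % 2 for n in sentence_numbers]

def summarization_loss(decoder_logits, target_ids, pad_id=0):
    # decoder_logits: (batch, summary_len, vocab_size), the decoder scores that
    # define P_t(w) in (4); target_ids: (batch, summary_len) reference tokens.
    # Equation (5) as a negative log-likelihood, averaged over non-padding tokens.
    vocab = decoder_logits.size(-1)
    return F.cross_entropy(decoder_logits.reshape(-1, vocab),
                           target_ids.reshape(-1),
                           ignore_index=pad_id)

# Tokens 0-3 belong to sentence 0, tokens 4-6 to sentence 1, tokens 7-8 to sentence 2.
print(sentence_segments([0, 0, 0, 0, 1, 1, 1, 2, 2]))   # [0, 0, 0, 0, 1, 1, 1, 0, 0]
```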

Figure 2. Result of training with livedoor news corpus (left: the loss of training, right: the accuracy of training)

IV. EXPERIMENT

A. Setting
In this experiment, we used a BERT model pre-trained at the Kurohashi-Kawahara Laboratory of Kyoto University. Most of the BERT hyperparameters were the same as those of BERT-Base (i.e., 12 layers, 768 hidden units, 12 heads) in [4]. The model was trained for 30 epochs on a Japanese Wikipedia corpus of about 1.8 billion words. The input text was divided into sub-words with byte pair encoding (BPE) using the morphological analysis system Juman++ [5]. The vocabulary size was 32,000 words. The decoder in our model comprised 8 Multi-Head Attention layers, a feed-forward hidden size of 3,072, and an embedding size of 768, the same as the encoder.
We used Adam as the optimizer, with parameters β_1 = 0.9, β_2 = 0.999, and ε = 1e-9. The maximum learning rate was set to 1e-4. A dynamic learning rate following [2] was adopted; it is computed as shown in equation (6).

lr = max_learning_rate * min(cs^{-0.5}, ws^{-1.5} * cs) / ws^{-0.5}    (6)

In principle, the learning rate increases linearly up to the warmup step (ws). Once the current step (cs) exceeds the warmup step, the learning rate gradually decreases. The learning rate peaks when the current step equals the warmup step; at that point, lr = max_learning_rate. For our experiment, we set the warmup step (ws) to 4,000 and max_learning_rate to 0.0001. The model contains a BERT encoder with 12 Multi-Head Attention layers and a Transformer-based decoder with 8 Multi-Head Attention layers. As the maximum input sequence length is set to 512, the batch size is limited to 4 by GPU memory.
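The schedule in (6), together with the Adam settings above, can be expressed in a few lines. Only the constants come from the paper; the function name and the (commented-out) pairing with a PyTorch optimizer and LambdaLR scheduler are our own assumptions.

```python
MAX_LR = 1e-4     # max_learning_rate
WARMUP = 4000     # warmup step (ws)

def learning_rate(step, ws=WARMUP, max_lr=MAX_LR):
    # Equation (6): linear warmup up to `ws` steps, then decay proportional to step^-0.5.
    step = max(step, 1)                        # guard against step 0
    return max_lr * min(step ** -0.5, step * ws ** -1.5) / ws ** -0.5

print(learning_rate(1))       # ~2.5e-08: near zero at the start of warmup
print(learning_rate(WARMUP))  # 1e-04:    peaks exactly at the warmup step
print(learning_rate(40000))   # ~3.2e-05: decays afterwards

# With PyTorch, the schedule can be attached to the Adam optimizer described above:
# optimizer = torch.optim.Adam(model.parameters(), lr=MAX_LR, betas=(0.9, 0.999), eps=1e-9)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: learning_rate(s) / MAX_LR)
```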
B. Dataset
The livedoor news corpus contains 130,000 Japanese news articles from Livedoor News; each article is accompanied by a three-line summary. The article body and the summary are used as the input and the target output of the experiment, respectively. From the dataset, 100,000 datapoints were used for training and 30,000 for verification. The maximum length of the input sequence was set to 512 tokens; however, some articles exceeded this limit. In such cases, only the first 512 tokens were entered into the model. Because, in news summarization, the key points are usually stated at the beginning of the article, this truncation was deemed appropriate.
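A minimal sketch of the truncation described above; whether the [CLS] and [SEP] markers are counted inside the 512-token budget is our own assumption.

```python
MAX_LEN = 512

def truncate_article(tokens, max_len=MAX_LEN):
    # Keep only the first tokens of an over-long article; the lead of a news
    # article usually carries the key points, so the tail is simply discarded.
    body = tokens[:max_len - 2]           # reserve two positions for the markers
    return ["[CLS]"] + body + ["[SEP]"]

article = ["w%d" % i for i in range(600)]
print(len(truncate_article(article)))     # 512
```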
C. Training
When the model was trained for 15 epochs on the 100,000 livedoor news corpus datapoints, the per-word loss on the validation data was approximately 6.5, and the accuracy of predicting the correct vocabulary item was approximately 67%. Fig. 2 shows the training curves. The graphics card used for training was a Titan X (Pascal) with 12 GB of memory; training took approximately three days.

V. RESULT AND ANALYSIS

Fig. 3 and Fig. 4 show example results (translated into English) of text summarization on the livedoor news corpus, produced by the model trained for 15 epochs on the 100,000 training datapoints. The text under "Output text" is the model output in Japanese. When generating the text, a beam search with a width of 4 was used. Prior to training, WordPiece was used to further divide out-of-vocabulary words into multiple sub-words. The "##" identifier, as shown in Fig. 3, marks a piece that continues a word that has been divided into sub-words. For example, the word "スモー##ク" ("smoke") in Fig. 3 is not present in the vocabulary, so it is divided into "スモー" and "##ク" to fit the vocabulary. A word is divided only as long as it can be represented by other entries in the vocabulary; otherwise it is treated as an unknown word.
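The splitting behaviour described above corresponds to the standard greedy longest-match-first WordPiece procedure; the sketch below illustrates it with the "スモーク" example. The tiny vocabulary and the function name are hypothetical, and the tokenizer actually used in the experiment may differ in detail.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first sub-word splitting in the WordPiece style:
    # non-initial pieces carry the "##" prefix, and a word with no match
    # anywhere becomes a single unknown token.
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:          # no sub-word matched: treat the whole word as unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

vocab = {"スモー", "##ク", "映画"}
print(wordpiece("スモーク", vocab))  # ['スモー', '##ク']
print(wordpiece("未知語", vocab))    # ['[UNK]']
```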
Input text
[CLS] Director Wayne Wang from Hong Kong, who has won the Silver Bear Award of the Berlin International Film Festival, one of the world's three largest film festivals, in the film "Smoke" ([UNK]), and Japan's world-famous Beat Takeshi. A movie titled "When a Woman Sleeps" (Hide on the [UNK] month [UNK] day). Starring outside his own director's work, since [UNK] "Blood and Bones", Takeshi, who was actually [UNK] for the first time in years, said, "I was anxious about my performance enough to hold my head." I made it. The reason is that the boundary between dreams and delusions and reality is vague and mysterious. [UNK] People talked to each other. This film, which was a short story by Spanish writer Javier Marias, was filmed in a quiet resort hotel. [UNK]'s mystery, staring at a writer (Nishijima) who is obsessed with his mysterious relationship and runs out of curiosity. "I thought that the star was Nishijima-kun," Takeshi, who smiled, read the script and read, "I thought it was a bad idea. He said he had a hard time understanding a complicated story. Nishijima, who admires that he is more familiar with film expression than himself, relieved, "If you say "good", there will be no mistake." "I was watching only acting, and when I finished watching, I didn't remember what kind of movie it was", revealing an undiscovered side called "Kitano in the world". "I think it will come out because Wayne Wang's film is taken by Beat Takeshi." (Laughs), Nishijima decided to appear immediately. I was really shocked and really moved [SEP]

Output text
映画「スモー##ク」で世界三大映画祭祭に数えられる映画「女が眠るとき」が公開された「自分の演技が不安だし、完成作も不安で頭を抱えながら自分の演技ばかりをずっと観ていて。自分の演技ばかりをずっと観ていたと##い##う[SEP]
(The movie "Smoke", the movie "When a woman sleeps" that was counted at the world's three largest film festivals was released. I was always watching my performance.[SEP])

Figure 3. The generated summary text by the trained model
Fig. 3 demonstrates that the model was able to learn correctly to some extent, as the summary text retains the key points of the input text.
It can also be seen in Fig. 3 that the phrase "自分の演技ばかりを観ていた" ("I was only watching my performance.") appears twice, which is not desirable in a summary. Cases in which the model repeated a phrase in the summary occurred 3 to 4 times out of 10. The characteristics of the input texts that are likely to cause this could not be identified.
Fig. 4 contains some words, such as "[UNK]," that were not present in the vocabulary. Such cases were handled by WordPiece: words were split until a match was found in the vocabulary, and chunks for which no vocabulary match was found were processed as unknown words, as it is not possible to find matches for every unknown word.
Figure 4 shows another summary example.
Input text
[CLS] On this month [UNK], the Agency for Cultural Affairs, an external office of the Ministry of Education, Culture, Sports, Science and Technology, put together a draft guideline on "Kanji handwritten characters, which are widely accepted, not stuck, and not splashed". As a result, various kanji characters will be accepted, but various opinions such as voices of doubt and support are on the internet. According to the Agency for Cultural Affairs, if there are parts of the kanji that correspond to the framework, try to tolerate small differences in character shape. For example, the angle of the first stroke of "word" may be diagonal, horizontal, or vertical. "Click here for an article with images." This news has ripples on the internet. Will it disappear? Why isn't the Kanji breaking down more and more? · What does it mean? There are many comments, such as "I'm too ugly" and what was the strict instruction in elementary school ... The kanji stop, Haneno [UNK], is one of the first strict instructions for elementary school students, and many people are surprised that it will disappear. However, more surprisingly, because there are various ways of writing Kanji handwritten characters, it is originally written in the regular Kanji table established by the Ministry of Education, Culture, Sports, Science and Technology. Usually, the "example" has been "guidance guide" and "printing letters", not "correct letter shapes". Some people have heard about this for the first time in this case. I didn't know that I didn't have the right shape. I was surprised and convinced, but there was also a comment that there were various ways of writing. By the way, the confusing "stroke order" is the same on the right, and although there is a guide, there is nothing officially established. In recent years, there has been an increase in confusion about kanji at entrance examinations and financial institution windows, so this time, we are trying to spread this recognition socially.[SEP]

Output text
文部科学省の外##局となる文化##庁が、漢字の手書き文字に##ついて語##った「は##ねにはこだ##わらず広く許容する」と##い##う指針##案をまとめたネットでは「学校に##よ##るものがない」「質##んな書き##方がある」などの声があ##が##った[SEP]
(The Agency for Cultural Affairs, the Ministry of Education, Culture, Sports, Science and Technology (MEXT) spoke about the handwritten characters of kanji, and the voice that says "There are various ways of writing" on the Internet, which summarizes the draft guidelines of "tolerate widely without splashing"[SEP])

Figure 4. Another example of generated summary text

Another limitation of this study is that the model sometimes produced incorrect words in the output summary; for example, in Fig. 4, "質##んな" should be "色んな," which means "various."
As described above, problems such as repeated content, the inability to handle unknown words, and simple word mistakes occurred in this study. The following subsections discuss mechanisms that could address each of these problems.
A. Coverage mechanism
In sequence-to-sequence neural summarization models, the content of the summary sentence is often repeated, as noted in [1]. This can be avoided by implementing a "coverage mechanism," which adds an additional loss for words that have already been attended to multiple times. The loss is calculated by (7), (8), and (9), with reference to [3].

c^t = \sum_{t'=0}^{t-1} a^{t'}    (7)

where c^t is the sum of the attention distributions over all decoder steps before step t, a^{t'} is the attention distribution at step t', and c^t is additionally fed into the decoder.

covloss_t = \sum_i \min(a_i^t, c_i^t)    (8)

Equation (8) represents the loss added to the loss function by the coverage mechanism. The overall loss of the model is therefore as shown in (9).

loss_t = L + \lambda * covloss_t    (9)

where L is the loss computed in (5) and \lambda is a hyperparameter.
With this method, the loss increases when a word is attended to repeatedly, so the model learns to avoid producing the same word as much as possible. This can help our model minimize content repetition in the summary sentence.
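A minimal sketch of equations (7)-(9); the tensor layout (one attention distribution per decoder step) and the loop-based accumulation are illustrative choices, not the paper's implementation.

```python
import torch

def coverage_loss(attn_dists):
    # attn_dists: tensor of shape (steps, src_len); row t is the attention
    # distribution a^t over the source at decoder step t.
    # Equation (7): c^t accumulates the attention of all previous steps.
    # Equation (8): covloss_t = sum_i min(a_i^t, c_i^t), which grows whenever
    # the decoder re-attends to source positions it has already covered.
    coverage = torch.zeros_like(attn_dists[0])
    total = torch.zeros(())
    for a_t in attn_dists:
        total = total + torch.minimum(a_t, coverage).sum()
        coverage = coverage + a_t
    return total

# Equation (9): overall objective, with lam standing for the hyperparameter lambda.
# loss_t = nll_loss + lam * coverage_loss(attn_dists)
```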
B. Copy mechanism
In human text summarization, words and phrases from the original text are often reused as-is in the summary. A neural summarization model, however, generates from a fixed vocabulary, and words outside of the vocabulary cannot be produced; for instance, the names of people appearing in the text may not be usable. This has a considerable impact on the accuracy of the summary. It can be handled by implementing the copy mechanism defined in [3]. The copy mechanism assigns a temporary index to each out-of-vocabulary word in the input sentence so that it can be copied into the summary sentence. With reference to [2], the equations for the copy mechanism are shown in (10), (11), (12), and (13).

u_i^t = o_t W_c h_i    (10)

a_i^t = \exp(u_i^t) / \sum_j \exp(u_j^t)    (11)

We start by calculating the attention probability distribution using the encoder output h_i and the decoder output up to the t-th step, o_t; equation (11) applies a softmax to u^t. Then, we compute a gate g_t from the encoder output h and the decoder output up to the t-th step, o_t. The gate helps to select words from the input text to be added to the output text. The gate is a real number between 0 and 1, where g_t and (1 - g_t) are the coefficients used to copy and to generate words, respectively.

g_t = sigmoid(W_g [o_t, h] + b_g)    (12)

We calculate the copy and generation probabilities using the gate g_t.

P_t(w) = (1 - g_t) P_t^{vocab}(w) + g_t \sum_{i: w_i = w} a_i^t    (13)

The right-hand-side terms of (13) are defined as follows: (1 - g_t) P_t^{vocab}(w) is the generation probability, where P_t^{vocab}(w) is the vocabulary probability P_t(w) shown in (4), and g_t \sum_{i: w_i = w} a_i^t is the copy probability.
A sequence-to-sequence model inherently maps the decoder output to a distribution over the fixed vocabulary and uses that as the final probability. The copy mechanism, however, calculates the final probability by computing a probability for each word of the encoder input and adding it to the decoder's output probability. In addition, an identifier is temporarily assigned to each out-of-vocabulary word, which can then be used as-is when generating the summary sentence. This would solve the problem of being unable to use words that are not in the vocabulary.
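A minimal sketch of how the final distribution in (13) can be assembled. The extended-vocabulary bookkeeping, in which out-of-vocabulary source words carry temporary ids at positions vocab_size and above, follows the pointer-generator convention of [3]; the function signature is our own assumption.

```python
import torch

def final_distribution(p_vocab, attn, src_ids, g, vocab_size, num_oov):
    # Equation (13): P_t(w) = (1 - g_t) * P_t^vocab(w) + g_t * sum_{i: w_i = w} a_i^t
    # p_vocab : (batch, vocab_size)  generation distribution from (4)
    # attn    : (batch, src_len)     attention distribution a^t from (11)
    # src_ids : (batch, src_len)     source token ids (int64); out-of-vocabulary
    #                                words carry temporary ids >= vocab_size
    # g       : (batch, 1)           copy gate from (12), in [0, 1]
    extended = torch.zeros(p_vocab.size(0), vocab_size + num_oov)
    extended[:, :vocab_size] = (1.0 - g) * p_vocab        # generation probability
    extended.scatter_add_(1, src_ids, g * attn)           # copy probability
    return extended
```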
C. Model improvement
The third problem is simple word mistakes. In this study, as explained in Section I, we used only the first of the two stages of the model developed in [1]. The second stage verifies each word of the summary sentence using BERT. When the draft summary sentence and the encoder output are entered into the second-stage decoder, the draft summary is masked word by word and fed into BERT to obtain a context vector. A probability for each masked word is then computed using a Transformer-based decoder configured as in stage one. The purpose of the second stage is to improve the accuracy of the summary by checking the correctness of the draft summary word by word. Therefore, the simple word mistakes encountered in our output sentences could be eliminated by adding this refining BERT and Transformer-based decoder to the summarization model.
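The word-by-word verification of the second stage can be sketched as follows. Since this stage was not implemented in our experiment, `bert`, `decoder`, `mask_id`, and the greedy argmax replacement are assumed placeholders rather than the procedure of [1].

```python
import torch

def refine_draft(draft_ids, H, bert, decoder, mask_id):
    # draft_ids: (batch, draft_len) token ids of the stage-one draft summary.
    # Each position is masked in turn, the masked draft is encoded with BERT,
    # and a Transformer-based decoder conditioned on the encoder output H
    # predicts the masked word; the prediction replaces the draft token.
    refined = draft_ids.clone()
    for i in range(draft_ids.size(1)):
        masked = draft_ids.clone()
        masked[:, i] = mask_id
        context = bert(masked)                        # context vectors of the masked draft
        logits = decoder(context, H)                  # (batch, draft_len, vocab_size)
        refined[:, i] = logits[:, i].argmax(dim=-1)   # most probable word at position i
    return refined
```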
VI. CONCLUSION AND FUTURE WORK

We conducted an experiment on Japanese abstractive text summarization with a neural network model using BERT. Our model comprised a BERT encoder and a Transformer-based decoder. The dataset used in this paper was the livedoor news corpus, consisting of 130,000 datapoints, of which 100,000 were used for training.
The results of the experiment revealed that the model was able to learn correctly to some extent, as the summary sentences captured the key points of the text. However, content was repeated within the summaries, and the model was unable to handle unknown words. Additionally, there was a problem of simple word mistakes. We believe that these problems could be addressed by utilizing the coverage and copy mechanisms and by improving the model.
In future work, we will explore these recommendations with new experiments and compare the results.

ACKNOWLEDGMENT
This work was supported by JSPS KAKENHI Grant Number 19K12906.

REFERENCES
[1] Haoyu Zhang, Jianjun Xu, Ji Wang, "Pretraining-Based Natural Language Generation for Text Summarization", arXiv preprint arXiv:1902.09243.
[2] Ashish Vaswani et al., "Attention Is All You Need", arXiv preprint arXiv:1706.03762.
[3] Abigail See, Peter J. Liu, Christopher D. Manning, "Get To The Point: Summarization with Pointer-Generator Networks", arXiv preprint arXiv:1704.04368.
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv preprint arXiv:1810.04805.
[5] Kyoto University, Morphological Analysis System JUMAN++, http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN++
