T5-Based Model for Abstractive Summarization:
A Semi-Supervised Learning Approach with Consistency
Loss Functions
Mingye Wang 1,*, Pan Xie 1, Yao Du 1 and Xiaohui Hu 2

1 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China;
[email protected] (P.X.); [email protected] (Y.D.)
2 Science and Technology on Integrated Information System Laboratory, Institute of Software,
Chinese Academy of Sciences, Beijing 100045, China; [email protected]
* Correspondence: [email protected]

Abstract: Text summarization is a prominent task in natural language processing (NLP) that condenses lengthy texts into concise summaries. Despite the success of existing supervised models, they often rely on datasets of well-constructed text pairs, which can be insufficient for languages with limited annotated data, such as Chinese. To address this issue, we propose a semi-supervised learning method for text summarization. Our method is inspired by the cycle-consistent adversarial network (CycleGAN) and treats text summarization as a style transfer task. The model is trained with a procedure and loss functions similar to those of CycleGAN and learns to transfer the style of a document to its summary and vice versa. Our method can be applied to multiple languages, but this paper focuses on its performance on Chinese documents. We trained a T5-based model and evaluated it on two datasets, CSL and LCSTS, and the results demonstrate the effectiveness of the proposed method.

Keywords: natural language processing; automatic text summarization; abstractive summarization; semi-supervised learning; consistency loss function

Citation: Wang, M.; Xie, P.; Du, Y.; Hu, X. T5-Based Model for Abstractive Summarization: A Semi-Supervised Learning Approach with Consistency Loss Functions. Appl. Sci. 2023, 13, 7111. https://doi.org/10.3390/app13127111

Academic Editor: Alessandro Di Nuovo

Received: 20 April 2023; Revised: 1 June 2023; Accepted: 9 June 2023; Published: 14 June 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Automatic text summarization is a crucial task in natural language processing (NLP) that aims to condense the core information of a given corpus into a brief summary. With the exponential growth of textual data, including documents, articles, and news, automatic summarization has become increasingly important.

Text summarization methods can be classified into two categories: extractive and abstractive. Extractive summarization selects the most important sentences from the original corpus based on statistical or linguistic features, whereas abstractive summarization generates a summary by semantically understanding the text and expressing it in a new way [1]. Abstractive summarization is more challenging than extractive summarization, but it is also considered superior, as it avoids the coherence and consistency issues of summaries generated with extractive methods.

Deep learning has achieved state-of-the-art results in NLP, and more researchers have shifted their focus to abstractive summarization. The sequence-to-sequence (seq2seq) model [2] combined with an attention mechanism has become a benchmark in abstractive summarization [3–5]. However, these methods require well-constructed datasets, which can be difficult and costly to build.

In this paper, we propose a semi-supervised learning method for text summarization that treats summarization as a style transfer task. Our approach uses a transfer text-to-text transformer (T5) model as the text generator and trains it with loss functions from the cycle-consistent adversarial network (CycleGAN) for semantic transfer.

The remainder of this paper is structured as follows. In Section 2, we review previous research related to our work. Section 3 describes our method of text summarization in detail. Section 4 presents the experimental setup and results of our proposed model, including an ablation study that validates the effectiveness of our design. Finally, we conclude our work in Section 5 and discuss limitations and future work in Section 6.

2. Related Works
2.1. Automatic Text Summarization
Automatic text summarization is a crucial task in the field of natural language processing (NLP), and it has received a significant amount of attention from researchers in recent years. Over the years, a range of methods and models have been proposed to improve the quality of automatic text summaries. In the early days of NLP research, traditional approaches to text summarization were based on sentence ranking algorithms that evaluated the importance of sentences in a given text. These methods used statistical features, such as frequency and centrality, to rank sentences and select the most important ones to form a summary [6–8].
With the advent of machine learning techniques in the 1990s, researchers began applying these methods to NLP to improve the quality of summaries. In automatic text summarization, the problem is mostly treated as sequence classification: models are trained to differentiate summary sentences from non-summary sentences [9–12]. These methods are referred to as extractive, as they essentially extract important phrases or sentences from the text without fully understanding their meaning. Thanks to the tremendous success of deep learning techniques, many extractive summarization studies have been proposed based on techniques including the encoder–decoder classifier [13], recurrent neural networks (RNNs) [14], sentence embeddings [15], reinforcement learning, and long short-term memory (LSTM) networks [16].
Moreover, the development of deep learning has given rise to a method called abstractive summarization. Abstractive summarization has improved significantly and has become a crucial area of research in the NLP field. Researchers have made remarkable progress in this field by leveraging deep learning techniques, such as RNNs [3], LSTMs [17], and classic seq2seq models [4,5].
With the introduction of the transformer architecture in 2017 [18], transformer-based
models have significantly outperformed other models in many NLP tasks. This architecture
has been naturally applied to the text summarization task, leading to the development of
several models based on pre-trained language models, including BERT [19], BART [20],
and T5 [21]. These models have demonstrated remarkable performance on various NLP
tasks, including text summarization.

2.2. Text Style Transfer


Text style transfer is a task in the field of NLP that focuses on modifying the style
of a text without altering its content. This task has received considerable attention from
researchers due to its potential applications in many areas, such as creative writing, machine
translation, and sentiment analysis.
The early methods for text style transfer mainly focused on rule-based approaches,
where linguistic patterns and attributes were manually defined and applied to modify the
style of text [22]. These methods, though simple and effective, are limited by the fixed set
of rules that they rely on, which may not adapt well to changing styles and genres.
With the advent of deep learning, several machine-learning-based approaches have
been proposed. The most well-known method is the sequence-to-sequence (seq2seq)
model [2]. Seq2seq models have been used in various NLP tasks, such as text summarization
and machine translation, due to their ability to encode the source text and generate a
target text.
Recently, generative adversarial networks (GANs) [23] were applied to the task of
text style transfer. The idea of GANs is to train two neural networks: a generator and a

discriminator. The generator tries to generate text that is indistinguishable from the target
style, while the discriminator tries to differentiate between the generated text and the real
target text.

2.3. Cycle-Consistent Adversarial Network


The cycle-consistent adversarial network (CycleGAN) is a generative adversarial
network (GAN) architecture for image-to-image translation tasks. This approach has been
widely used in various domains, including but not limited to image style transfer, domain
adaptation, and super-resolution. The key idea of CycleGAN is to train two generator–discriminator pairs. One generator translates an image from the source domain to the target domain, while the other translates an image from the target domain back to the source domain. The discriminator in each pair is trained to distinguish translated images from real images in the corresponding domain. The cycle consistency loss is introduced to force a translated image to map back to the original image when passed through the reverse generator.
Figure 1 illustrates how CycleGAN works in one direction.

Figure 1. Working principle of CycleGAN.

CycleGAN is focused on the application of style transfer in computer vision. For example, Zhu et al. [24] originally proposed CycleGAN for unpaired image-to-image translation, where there was no one-to-one mapping between the source and target domains. This method has been widely used in tasks such as colorization, super-resolution, and style transfer. Based on CycleGAN, different models have been proposed for face transfer [25], Chinese handwritten character generation [26], image generation from text [27], image correction [28], and tasks in the audio field [29–31].
One of the highlights of CycleGAN is the implementation of two consistency losses in addition to the original GAN loss: the identity mapping loss and the cycle consistency loss. The identity mapping loss implies that the source data should not be changed during transformation if they are already in the target domain. The cycle consistency loss comes from the idea of back translation: the result of back translation should be the same as the original source. These two loss functions enforce strong consistency throughout the transfer procedure, which makes it possible to handle unpaired images and still achieve outstanding results.
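To make these two losses concrete, the toy sketch below illustrates how they are computed for a pair of mapping networks G: X→Y and F: Y→X. It is a hypothetical, minimal illustration using L1 distance and linear layers as stand-ins for CycleGAN's convolutional generators, not the original implementation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for CycleGAN's mapping networks G: X -> Y and F: Y -> X.
G = nn.Linear(16, 16)   # source -> target
F = nn.Linear(16, 16)   # target -> source
l1 = nn.L1Loss()

x = torch.randn(8, 16)  # batch of "source domain" samples
y = torch.randn(8, 16)  # batch of "target domain" samples

# Identity mapping loss: a sample already in the target domain
# should pass through the corresponding generator unchanged.
loss_idt = l1(G(y), y) + l1(F(x), x)

# Cycle consistency loss: translating to the other domain and back
# should reconstruct the original sample.
loss_cyc = l1(F(G(x)), x) + l1(G(F(y)), y)

total = loss_idt + loss_cyc  # the adversarial GAN losses would be added here in a full CycleGAN
```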

2.4. Transfer Text-to-Text Transformer


The transfer text-to-text transformer (T5) [21] is a state-of-the-art pre-trained language
model based on the transformer architecture. It adopts a unified text-to-text framework
that can handle any natural language processing (NLP) task by converting both the input
and output into natural language texts. T5 can be easily scaled up by varying the number
of parameters (from 60M to 11B), which enables it to achieve superior performance on
various NLP benchmarks. Moreover, T5 employs a full-attention mechanism that allows it
to capture long-range dependencies and complex semantic relations in natural language
texts. T5 has been successfully applied to many NLP tasks, such as machine translation,
text summarization, question answering, and sentiment analysis [21].
The T5 model follows the typical encoder–decoder structure, and its architecture is
shown in Figure 2.

Figure 2. Architecture of the T5 model.

One of the key features of T5’s text-to-text framework is the use of different prefixes to
indicate different tasks, thus transforming all NLP problems into text generation problems.
For example, to perform sentiment analysis on a given sentence, T5 simply adds the prefix
“sentiment:” before the sentence and generates either “positive” or “negative” as the output.
This feature makes it possible to train a single model that can perform multiple tasks
without changing its architecture or objective function.
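As a concrete illustration, the snippet below shows how changing the prefix switches the task performed by one and the same T5 model. It is a minimal sketch assuming the Hugging Face transformers library and the public English t5-small checkpoint, rather than the Chinese model used later in this paper.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def run_t5(prefix: str, text: str) -> str:
    """Prepend a task prefix and let the same model generate the answer as text."""
    inputs = tokenizer(prefix + text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

article = "Researchers propose a semi-supervised method that treats summarization as style transfer ..."
print(run_t5("summarize: ", article))                                      # summarization task
print(run_t5("translate English to German: ", "The house is wonderful."))  # different task, same model
```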

3. Proposed Methodology
3.1. Overall
This section presents the foundation of our semi-supervised method for automatic text
summarization. Unlike existing models, which rely heavily on paired text for supervised
training, our approach leverages a small paired dataset followed by a semi-supervised
training process with unpaired corpora. The algorithm used in our method is illustrated in
Algorithm 1, where L denotes the loss incurred by comparing two texts.
Our approach is inspired by the CycleGAN architecture, which uses two generators to
facilitate style transfer in two respective directions. The first part of our method comprises
a warm-up step that employs real text pairs to clarify the tasks of the style transferers
Ta2s and Ts2a and generate basic outputs. The subscripts a2s and s2a, which represent
“article-to-summary” and vice versa, are employed to clarify the transfer direction. The
second part adopts a similar training procedure to that of CycleGAN with consistency loss
functions to further train the models without supervision.
Specifically, the identity mapping loss ensures that a text should not be summarized if
it is already a summary and vice versa. The corresponding training procedure is based on
calling the model to re-generate an identity of the input text. The loss is then calculated by
measuring the difference between the original text and the generated identity. This part is
designed to train the model to be capable of identifying the characteristics of two distinct
text domains. In the following sections of the paper, a superscript idt is used to indicate
re-generated identity texts.

In contrast, the cycle consistency loss trains the model to reconstruct a summary after
expanding it or vice versa. The corresponding training procedure follows a cyclical process:
For a real summary s, the model Ts2a first expands it and generates a fake article. The term
“fake” indicates that it is generated by our model, rather than a real example from datasets.
Next, the fake article is sent to Ta2s to re-generate its summary. For real articles, the same
cycle steps are utilized. This part is designed to train the model to be capable of transferring
texts between two domains. In the following, a superscript fake is used to indicate the fake
texts generated by the models, and a superscript cyc is used to indicate the final outputs
after such a cycle procedure.

Algorithm 1 Semi-supervised automatic text summarization.

1:  for each batch ∈ gold_batches do
2:      fine-tune Ta2s and Ts2a with batch                          ▷ Fine-tune with real text pairs
3:  end for
4:  for epoch ∈ [1, nb_epochs] do
5:      for all (ai, si) such that ai ∈ Articles and si ∈ Summaries do
6:          (ai^idt, si^idt) ← (Ts2a(ai), Ta2s(si))                  ▷ Re-expand and re-summarize
7:          (La^idt, Ls^idt) ← (L(ai, ai^idt), L(si, si^idt))        ▷ Identity mapping loss
8:          (si^fake, ai^fake) ← (Ta2s(ai), Ts2a(si))                ▷ Generate fake summary and article
9:          (ai^cyc, si^cyc) ← (Ts2a(si^fake), Ta2s(ai^fake))        ▷ Restore article and summary
10:         (La^cyc, Ls^cyc) ← (L(ai, ai^cyc), L(si, si^cyc))        ▷ Cycle consistency loss
11:         Loss ← La^idt + Ls^idt + La^cyc + Ls^cyc                 ▷ Total loss
12:         Back-propagation of Loss
13:     end for
14: end for

As observed, despite the integration of the CycleGAN loss functions, we refrain from
constructing a GAN architecture for our task. This decision arises from two factors: firstly,
the challenge involved in the back-propagation phase of discrete sampling during text
generation; secondly, the lack of discernible improvement vis-à-vis our method during
development and the inherent instability in the training process.
The back-propagation of gradients for text generation in a GAN framework presents an
arduous problem, which is primarily due to the discrete nature of text data. Consequently,
the GAN model for text generation often entails the adoption of reinforcement learning or
the use of Gumbel–softmax approximation. These techniques are complicated and may
render the training process unstable, leading to the production of sub-optimal summaries.
Moreover, we found no clear evidence of improved performance through the use of
GAN-based models in our task in comparison with our semi-supervised method with
CycleGAN loss functions. Therefore, we conclude that our approach presents a promising solution for automatic text summarization and is better suited to our task given its simplicity and effectiveness.

3.2. Style Transfer Model


As mentioned previously, we view the summarization task as a style transfer problem.
To accomplish this, we employ a T5 model, which offers several advantages over alternative
models. Firstly, the native tasks of the T5 model align well with the requirements of the
style transfer task. Secondly, by modifying the prefix of the input text, a T5 model can
perform tasks in both directions, i.e., from text to summary and vice versa.
As illustrated in Figure 3, a single T5 model can perform the tasks of Ta2s and Ts2a
outlined in Algorithm 1 by changing the prefix of the input text. Therefore, we only require
one generator for both directions, unlike in the original CycleGAN architecture.

Figure 3. T5 model with different prefixes.

The versatility of the T5 model in undertaking various natural language processing tasks has been well documented in recent research. The model's pre-training process
enables it to perform a wide range of tasks, including question answering, text classification,
and text generation. By leveraging the strengths of the T5 model, our approach provides
an effective solution to the problem of automatic text summarization.

3.3. Training with the T5 Model


Our training procedure consists of two parts: a supervised part and an unsupervised part. In the supervised part, we use a small amount of labeled data for warm-up while following the same procedure as in the original T5 model. In this part, we fine-tune the T5 model with pairs of articles and summaries, using different prefixes to indicate the generation direction. The loss function for the supervised part is cross-entropy, the same loss as that used in the original T5 model.
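A condensed sketch of this warm-up step is shown below. It assumes the Hugging Face transformers interface with a generic English checkpoint, and the prefix strings and learning rate are illustrative placeholders rather than the exact experimental settings.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")        # stand-in for the Chinese T5 checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

P_SUM, P_EXP = "summarize: ", "expand: "                      # illustrative prefixes Ps and Pe

def supervised_step(article: str, summary: str) -> float:
    """One warm-up step: fine-tune the single model in both directions with cross-entropy."""
    total_loss = 0.0
    for src, tgt in [(P_SUM + article, summary), (P_EXP + summary, article)]:
        enc = tokenizer(src, return_tensors="pt", truncation=True, max_length=512)
        labels = tokenizer(tgt, return_tensors="pt", truncation=True, max_length=512).input_ids
        total_loss = total_loss + model(**enc, labels=labels).loss   # token-level cross-entropy
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```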
In the unsupervised part, we adopt a training procedure inspired by the CycleGAN
architecture, thus incorporating identity mapping loss and cycle consistency loss. The
identity mapping loss deters the model from re-summarizing a summary or expanding a
full article by minimizing the difference between the input and output texts. Meanwhile,
the cycle consistency loss ensures that the model preserves the source text after a cyclical
transfer by minimizing the difference between the input and reconstructed texts. Figure 4
illustrates these two processes.

(a) Identity mapping loss (b) Cycle consistency loss

Figure 4. CycleGAN losses of the proposed model.

We propose a novel training procedure that uses a single T5 model for both generation tasks with different prefixes. Given an article a and its summary s, we use the T5 model to generate a fake summary s^fake from a and a fake article a^fake from s. To indicate the desired
task, we prepend a prefix string to the input text. The generation process can be formulated
as follows:

s^fake = Ts(a) = T(Ps ⊕ a),    a^fake = Te(s) = T(Pe ⊕ s)        (1)

where Ts(·) and Te(·) denote the T5 model with the summary prefix and the expansion prefix, respectively.
The training process follows a typical supervised paradigm: a cross-entropy loss [32] is calculated to measure the difference between two texts, and the model is trained via back-propagation:
L(x, x^fake) = − Σ_{i=1}^{C} p_i(x) · log p_i(x^fake)        (2)

where C is the vocabulary size and p_i(·) is the probability of the i-th word in the vocabulary.
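In implementation terms, when the reference distribution p_i(x) is the one-hot encoding of the reference tokens, Equation (2) reduces to the standard token-level cross-entropy over the decoder logits. A minimal sketch, assuming PyTorch and logits of shape [batch, length, C], is:

```python
import torch
import torch.nn.functional as F

def seq_cross_entropy(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Token-level cross-entropy between predicted distributions and reference tokens.

    logits:     [batch, seq_len, C]  unnormalized scores over the vocabulary
    target_ids: [batch, seq_len]     reference token indices (padding marked as -100)
    """
    vocab_size = logits.size(-1)
    return F.cross_entropy(
        logits.view(-1, vocab_size),   # flatten batch and time dimensions
        target_ids.view(-1),
        ignore_index=-100,             # skip padding positions
    )
```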
For the rest of the dataset, where an article a and a summary s are not paired, we calculate the two consistency losses. The identity mapping loss is calculated by re-summarizing a summary or re-expanding an article as follows:

a^idt = Te(a),    s^idt = Ts(s)
L_a^idt = L(a, a^idt),    L_s^idt = L(s, s^idt)        (3)

As for the cycle consistency loss, the model first generates s^fake and a^fake as stated above; it then regenerates a^cyc and s^cyc from s^fake and a^fake. After such a cycle, the losses are calculated as follows:

s^fake = Ts(a),    a^fake = Te(s)
a^cyc = Te(s^fake),    s^cyc = Ts(a^fake)        (4)
L_a^cyc = L(a, a^cyc),    L_s^cyc = L(s, s^cyc)
The training algorithm is thus adapted as in Algorithm 2 (T denotes the T5 model, and ⊕ denotes the concatenation of texts). We use Ps and Pe to denote prefix_summarize and prefix_expand, respectively.

Algorithm 2 Semi-supervised automatic text summarization with T5.

1:  Set prefix_summarize and prefix_expand as Ps and Pe
2:  for each batch ∈ gold_batches do
3:      (article, summary) ← batch
4:      fine-tune T with (Ps ⊕ article, summary) and (Pe ⊕ summary, article)      ▷ Fine-tune with real text pairs
5:  end for
6:  for epoch ∈ [1, nb_epochs] do
7:      for all (ai, si) such that ai ∈ Articles and si ∈ Summaries do
8:          (ai^idt, si^idt) ← (T(Pe ⊕ ai), T(Ps ⊕ si))                           ▷ Re-expand and re-summarize
9:          (La^idt, Ls^idt) ← (L(ai, ai^idt), L(si, si^idt))                     ▷ Identity mapping loss
10:         (si^fake, ai^fake) ← (T(Ps ⊕ ai), T(Pe ⊕ si))                         ▷ Generate fake summary and article
11:         (ai^cyc, si^cyc) ← (T(Pe ⊕ si^fake), T(Ps ⊕ ai^fake))                 ▷ Restore article and summary
12:         (La^cyc, Ls^cyc) ← (L(ai, ai^cyc), L(si, si^cyc))                     ▷ Cycle consistency loss
13:         Loss ← λ_idt·(La^idt + Ls^idt) + λ_cyc·(La^cyc + Ls^cyc)              ▷ Total loss
14:         Back-propagation of Loss
15:     end for
16: end for

Here, the hyperparameters λ_idt and λ_cyc control the weights of the two types of losses.
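For readers who prefer code, a compact sketch of one step of the unpaired phase of Algorithm 2 is given below. It is a simplified illustration assuming the Hugging Face transformers interface; the prefixes and checkpoint are placeholders, generation is treated as non-differentiable (as in back-translation), and the comparison L(·,·) is realized as teacher-forced cross-entropy.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # stand-in for the Chinese T5 checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
P_SUM, P_EXP = "summarize: ", "expand: "                # illustrative prefixes Ps and Pe
LAMBDA_IDT, LAMBDA_CYC = 0.1, 0.2                       # loss weights λ_idt, λ_cyc (values used later in Section 4.2)

def generate(prefix: str, text: str) -> str:
    ids = tokenizer(prefix + text, return_tensors="pt", truncation=True, max_length=512)
    out = model.generate(**ids, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def ce_loss(prefix: str, src: str, tgt: str) -> torch.Tensor:
    """Cross-entropy of producing tgt from prefix + src (teacher forcing)."""
    enc = tokenizer(prefix + src, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True, max_length=512).input_ids
    return model(**enc, labels=labels).loss

def unsupervised_step(article: str, summary: str) -> float:
    """One step on unpaired data: `article` and `summary` are NOT a matched pair."""
    # Identity mapping loss: re-expanding an article (or re-summarizing a summary)
    # should reproduce the input unchanged.
    loss_idt = ce_loss(P_EXP, article, article) + ce_loss(P_SUM, summary, summary)

    # Cycle consistency loss: summarize-then-expand (and expand-then-summarize)
    # should reconstruct the original text.
    fake_summary = generate(P_SUM, article)
    fake_article = generate(P_EXP, summary)
    loss_cyc = ce_loss(P_EXP, fake_summary, article) + ce_loss(P_SUM, fake_article, summary)

    loss = LAMBDA_IDT * loss_idt + LAMBDA_CYC * loss_cyc
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```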

4. Experiments
This section presents the experimental details for evaluating the performance of
our method.

4.1. Datasets
We conducted experiments on two datasets: CSL (Chinese Scientific Literature Dataset) [33]
and LCSTS (Large Scale Chinese Short Text Summarization Dataset) [34].
The CSL is the first scientific document dataset in Chinese consisting of 396,209 papers’
meta-information obtained from the National Engineering Research Center for Science and
Technology Resources Sharing Service (NSTR) and spanning from 2010 to 2020. In our
experiments, we used the paper titles and abstracts to generate summary–article pairs for
training and evaluation purposes. To facilitate evaluation and comparison, we chose the
subset of CSL used in the Chinese Language Generation Evaluation (CLGE) [35] for our
experiments. This sub-dataset comprised 3500 computer science papers.
The LCSTS is a large dataset containing 2,108,915 Chinese news articles published on Weibo, the most popular Chinese microblogging website. The data in LCSTS include news titles and contents posted by verified media accounts. As with CSL, we used the news titles and contents to create summary–article pairs for our experiments.
Examples from these datasets can be viewed in Figures A1 and A2.
For the unsupervised training part, our model did not have access to the matched
summary–article pairs. Instead, we intentionally broke the pairs and randomly shuffled the
data, ensuring that the model did not receive matched data during this part of the training.
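Concretely, the unpaired pool can be produced by shuffling the article side and the summary side independently. A minimal sketch of this pre-processing step (function and field names are illustrative) is:

```python
import random

def break_pairs(pairs, seed=42):
    """Turn matched (article, summary) pairs into two independently shuffled pools."""
    articles = [article for article, _ in pairs]
    summaries = [summary for _, summary in pairs]
    rng = random.Random(seed)
    rng.shuffle(articles)
    rng.shuffle(summaries)
    # After shuffling, index i no longer corresponds to a true article-summary pair.
    return list(zip(articles, summaries))
```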

4.2. Implementation Details


The original datasets contained well-paired texts. We used only a fraction of the
paired data during the warm-up stage. The unsupervised part used text samples of the
corresponding dataset without pair information.
Since the original T5 model does not support the Chinese language, we chose Mengzi [36], a high-performing, lightweight (103M parameters) pre-trained language model for Chinese, for our experiments (Mengzi includes a family of pre-trained models, among which we used the T5-based one).
We used the AdamW optimizer to train the model, with the learning rate, β1, β2, ε, and weight decay set to 5 × 10^-5, 0.9, 0.999, 1 × 10^-6, and 0.01, respectively. Moreover, we used a cosine decay schedule for the learning rate. We restricted the length of sentences in each batch to a maximum of 512 tokens and set the batch size to 8. The two consistency losses were weighted with factors of 0.1 for the identity mapping loss and 0.2 for the cycle consistency loss. The higher weight for the cycle consistency loss was due to its direct contribution to the model's ability to transfer texts, which was the primary objective of the task. In contrast, the identity mapping loss helped preserve the characteristics of the input texts but did not directly contribute to the summarization process. All of the experiments were conducted using Python 3.7.12 with PaddlePaddle 2.3 and PyTorch 1.11, running on an NVIDIA Tesla V100 GPU with 32 GB of memory. For clarity, the hyperparameter settings used in our experiments are presented in Table 1.

Table 1. Hyperparameters used to train the model.

Hyperparameter                   Value
Optimizer                        AdamW
Learning rate                    5 × 10^-5
β1                               0.9
β2                               0.999
ε                                1 × 10^-6
Weight decay                     0.01
Learning rate schedule           Cosine decay
Sentence length                  512 tokens
Batch size                       8
Identity mapping loss weight     0.1
Cycle consistency loss weight    0.2
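The optimizer and schedule in Table 1 can be instantiated roughly as follows. This is a sketch assuming PyTorch and the transformers scheduler helpers; the checkpoint name, warm-up steps, and total number of training steps are placeholders not specified above.

```python
import torch
from transformers import T5ForConditionalGeneration, get_cosine_schedule_with_warmup

model = T5ForConditionalGeneration.from_pretrained("t5-small")   # stand-in for the Chinese T5 checkpoint

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,                 # learning rate
    betas=(0.9, 0.999),      # β1, β2
    eps=1e-6,                # ε
    weight_decay=0.01,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,        # warm-up steps are not specified in the paper
    num_training_steps=10_000, # placeholder; depends on dataset size and number of epochs
)

# Inside the training loop, after computing the loss:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```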

4.3. Results
In this section, we present the results of our proposed approach for automatic text
summarization and compare its performance with baselines on four commonly used evaluation metrics: the ROUGE-1, ROUGE-2, ROUGE-L [37], and BLEU [38] scores. ROUGE
is the acronym for Recall-Oriented Understudy for Gisting Evaluation, and BLEU is the
acronym for BiLingual Evaluation Understudy.
The evaluation metrics play a critical role in assessing the effectiveness of a summarization model. The ROUGE and BLEU scores are widely used to evaluate the quality of
generated summaries. ROUGE measures the overlap between the generated summary
and the reference summary at the n-gram level, whereas BLEU assesses the quality of
the summary by computing the n-gram precision between the generated summary and
the reference summary. By comparing the performance of our proposed model with the
baselines on these four metrics, we can determine the effectiveness of our approach in
automatic text summarization. To provide clarity, we present the formal definitions of these
metrics as follows:

ROUGE-N = [ Σ_{S ∈ {ReferenceSummaries}} Σ_{gram_n ∈ S} Count_match(gram_n) ] / [ Σ_{S ∈ {ReferenceSummaries}} Σ_{gram_n ∈ S} Count(gram_n) ]        (5)

where n stands for the length of the n-gram, gram_n, and Count_match(gram_n) is the maximum number of n-grams co-occurring in a candidate summary and a set of reference summaries. By switching the roles of the reference and the candidate summary, we obtain the recall and precision values; the final ROUGE-N score is then the F1 score. We used ROUGE-1 and ROUGE-2 in our experiments. ROUGE-L is based on the longest common subsequence (LCS) and is calculated in the same way as ROUGE-N, but with the n-gram match replaced by the LCS.
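A simple reference implementation of the clipped n-gram overlap at the core of ROUGE-N is sketched below for a single candidate/reference pair of pre-tokenized texts; it is an illustration rather than the official ROUGE toolkit.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate_tokens, reference_tokens, n=1):
    """ROUGE-N F1 for one candidate against one reference (clipped n-gram overlap)."""
    cand = Counter(ngrams(candidate_tokens, n))
    ref = Counter(ngrams(reference_tokens, n))
    overlap = sum((cand & ref).values())            # Count_match: clipped co-occurrences
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)   # F1, as reported in our tables

print(rouge_n("the cat sat on the mat".split(), "the cat lay on the mat".split(), n=1))
```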
BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )        (6)

where p_n is the proportion of correctly predicted n-grams among all predicted n-grams. Typically, N = 4 orders of n-grams are used with uniform weights w_n = 1/N. BP is the brevity penalty, which penalizes sentences that are too short:

BP = 1 if c > r, and BP = e^(1 − r/c) if c ≤ r        (7)

where c is the predicted length and r is the target length.
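The sketch below implements Equations (6) and (7) directly for a single sentence pair with uniform weights w_n = 1/4. It omits the smoothing used in practice and is only illustrative; production evaluations typically rely on an established package such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate_tokens, reference_tokens, max_n=4):
    """Sentence-level BLEU following Equations (6) and (7), without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate_tokens, n))
        ref = Counter(ngrams(reference_tokens, n))
        matched = sum((cand & ref).values())            # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        if matched == 0:                                # any zero precision gives BLEU = 0 here
            return 0.0
        log_precisions.append(math.log(matched / total))
    c, r = len(candidate_tokens), len(reference_tokens)
    bp = 1.0 if c > r else math.exp(1 - r / c)          # brevity penalty, Equation (7)
    return bp * math.exp(sum(log_precisions) / max_n)   # uniform weights w_n = 1 / max_n

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))
```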



We conducted experiments on two Chinese datasets: CSL [33], which consists of abstracts from the scientific literature and their corresponding titles, and LCSTS [34], which
consists of Chinese news articles and their corresponding human-written summaries. Due
to the lack of research on semi-supervised Chinese summarization, all baselines used in this
study were fully supervised models and were proposed by the organizers of the original
corresponding datasets. For the CSL dataset, we conducted the supervised part of the
experiment with two fractions of the original dataset: one using 50 paired samples, and the
other using 250, while the remaining data were used for the unsupervised part of our
method. For the LCSTS dataset, which was larger than CSL, we conducted the experiments
with 200 and 1000 paired samples.
We also performed an ablation study in comparison with the T5 model trained with
labeled data only and without our proposed loss functions. The T5 models in Table 2 refer
to the results obtained in these cases.
Table 2 illustrates the performance of the baselines and our proposed approach on the
CSL dataset, while Table 3 shows the results on the LCSTS dataset.

Table 2. CSL results.

Models ROUGE-1 ROUGE-2 ROUGE-L BLEU


ALBERT-tiny 52.75 37.96 48.11 21.63
BERT-base 63.83 51.29 59.76 41.45
BERT-wwm-ext 63.44 51 59.4 41.19
RoBERTa-wwm-ext 63.23 50.74 58.99 41.31
LSTM-seq2seq 46.48 30.48 41.8 22
Original T5 50 34.82 19.93 32.62 3.85
T5 50 with CL (ours) 53.13 41.03 50.85 33.95
Original T5 250 56.45 45.01 53.96 37.48
T5 250 with CL (ours) 59.41 47.93 56.16 38.91

Table 3. LCSTS results.

Models ROUGE-1 ROUGE-2 ROUGE-L BLEU


RNN-Word 17.7 8.5 15.8 -
RNN-Char 21.5 8.9 18.6 -
RNN-context-Word 26.8 16.1 24.1 -
RNN-context-Char 29.9 17.4 27.2 -
mT5 - - 34.8 -
CPM-2 - - 35.88 -
Original T5 200 23.61 12.00 21.80 3.99
T5 200 with CL (ours) 28.28 15.48 25.84 10.56
Original T5 1000 28.01 15.59 25.66 9.51
T5 1000 with CL (ours) 30.09 18.59 29.00 14.74

The results presented in Tables 2 and 3 demonstrate that our method achieved comparable performance to that of early supervised large models and even outperformed them on several metrics, despite using only a lightweight model and a limited amount of data. However, the performance of recent supervised models was still better than that of our semi-supervised method. For instance, on CSL, our best results achieved over 93% of the fully supervised BERT-base's performance on every metric, significantly outperforming LSTM-seq2seq and ALBERT-tiny. Regarding LCSTS, our model achieved better results than the best early fully supervised model, RNN-context-Char, by about 6%, and it reached approximately 81% of the ROUGE-L of recent models, such as mT5 and CPM-2. The experimental results confirm the effectiveness of our proposed approach in automatic text summarization.
In addition to comparing our results with those of other models, it is important to
highlight the comparison between the results of our models and that of the original T5
models without unsupervised learning. This comparison sheds light on the effectiveness
of incorporating unsupervised learning techniques in our approach, as evidenced by
the improved summarization performance, particularly when well-paired data or “gold
batches” were limited. Our semi-supervised method notably improved the performance
across every metric compared to the fully supervised T5 model trained on a limited
amount of labeled data. When labeled text pairs were extremely rare, the proposed method significantly improved the performance on every metric, especially the BLEU score (from 3.85 to 33.95 on CSL and from 3.99 to 10.56 on LCSTS). As the number of gold batches increased, the original T5 achieved better results, while our method still improved its performance. This demonstrates the effectiveness of our approach in leveraging the information contained in unlabeled data.
The present study showcases a portion of the experimental findings, which are visually
presented in Figures A1 and A2.

5. Conclusions
This study presents a novel semi-supervised learning method for abstractive summarization. To achieve this, we employed a T5-based model to process texts and utilized an identity mapping constraint and a cycle consistency constraint to exploit the information contained in unlabeled data. The identity mapping constraint ensures that the input and output of the model have a similar representation, whereas the cycle consistency constraint ensures that the input text can be reconstructed from the output summary. Through this approach, we aim to improve the generalization ability of the model by leveraging unlabeled data while requiring only a limited number of labeled examples.
A key contribution of this study is the successful application of CycleGAN’s training
process and loss functions to NLP tasks, particularly text summarization. Our method
demonstrates significant advantages in addressing the problem of limited annotated data
and showcases its potential for wide applicability in a multilingual context, especially when
handling Chinese documents. Despite not modifying the model architecture, our approach
effectively leverages the strengths of the original T5 model while incorporating the benefits
of semi-supervised learning.
Our proposed method was evaluated on various datasets, and the experimental results
demonstrate its effectiveness in generating high-quality summaries with a limited number
of labeled examples. In addition, our method employs lightweight models, making it
computationally efficient and practical for real-world applications.
Our approach can be particularly useful in scenarios where obtaining large amounts of
labeled data is challenging, such as when working with rare languages or specialized domains.
It is worth noting that our proposed method can be further improved by using more advanced pre-training techniques or by fine-tuning on larger datasets. Additionally, exploring different loss functions and architectures could also lead to better performance.
In summary, our study introduces a novel semi-supervised learning approach for
abstractive summarization, which leverages the information contained in unlabeled data
and requires only a few labeled examples. The proposed approach offers a practical and
efficient method for generating high-quality summaries, and the experimental results
demonstrate its effectiveness on various datasets.

6. Limitations and Future Work


In this section, we discuss the limitations of our proposed T5-based abstractive summarization method and suggest directions for future work to address these limitations.
Semi-supervised training requirement: Our model cannot be trained entirely in an
unsupervised manner. Instead, it requires a small amount of labeled data for a “warm-up”
in a semi-supervised training setting. In our experiments, we found that the performance
of the model trained in a completely unsupervised fashion was inferior to that of the
semi-supervised approach. Future work could explore ways to reduce the reliance on
labeled data or investigate alternative unsupervised training techniques to improve the model's performance.
Room for improvement in model performance: Although our model can match the
performance of some earlier supervised training models, there is still a gap between its
performance and that of more recent state-of-the-art models. Future research could focus on
refining the model architecture, incorporating additional contextual information, or exploring novel training strategies to further enhance the performance of our proposed method.
Domain adaptability: The adaptability of our model to other domains remains to
be tested through further experimentation. Our current results demonstrate the model’s
effectiveness on specific datasets, but its generalizability to different contexts and domains
is still an open question. Future work could involve testing the model on a diverse range
of datasets and languages, as well as developing techniques for domain adaptation to
improve its applicability across various settings.

Author Contributions: Conceptualization, M.W.; methodology, M.W.; software, M.W.; validation, P.X. and Y.D.; formal analysis, M.W.; investigation, M.W.; resources, P.X.; data curation, Y.D.; writing—original draft preparation, M.W.; writing—review and editing, M.W. and X.H.; visualization, M.W.; supervision, X.H.; project administration, X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the key R&D project of the Ministry of Science and Technology
of the People’s Republic of China with grant number 2020-JCJQ-ZD-079-00.
Institutional Review Board Statement: Not applicable.
Data Availability Statement: The datasets and baselines utilized in our experiments are available at the following URLs: https://github.com/ydli-ai/CSL and http://icrc.hitsz.edu.cn/Article/show/139.html. The codes and outputs of our proposed model can also be accessed at https://github.com/StarsMoon/ATS (accessed on 20 April 2023).
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

Figure A1. Some experimental results on CSL with human translation.



Figure A2. Some experimental results on LCSTS with human translation.

References
1. Yao, K.; Zhang, L.; Luo, T.; Wu, Y. Deep reinforcement learning for extractive document summarization. Neurocomputing 2018,
284, 52–62. [CrossRef]
2. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014,
27, 3104–3112.
3. Chopra, S.; Auli, M.; Rush, A.M. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings
of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 93–98.
4. Hou, L.; Hu, P.; Bei, C. Abstractive document summarization via neural model with joint attention. In Proceedings of the National
CCF Conference on Natural Language Processing and Chinese Computing, Dalian, China, 8–12 November 2017; Springer:
Berlin/Heidelberg, Germany, 2017; pp. 329–338.
5. Nayeem, M.T.; Fuad, T.A.; Chali, Y. Neural diverse abstractive sentence compression generation. In Proceedings of the European
Conference on Information Retrieval, Cologne, Germany, 14–18 April 2019; pp. 109–116.
6. Ferreira, R.; Cabral, L.; Lins, R.D.; Silva, G.; Favaro, L. Assessing sentence scoring techniques for extractive text summarization.
Expert Syst. Appl. 2013, 40, 5755–5764. [CrossRef]
7. Radev, D.R. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. J. Qiqihar Jr. Teach. Coll. 2004, 22, 2004.
8. Alguliev, R.M.; Aliguliyev, R.M.; Isazade, N.R. Multiple documents summarization based on evolutionary optimization algorithm.
Expert Syst. Appl. 2013, 40, 1675–1689. [CrossRef]
9. Conroy, J.M.; O’Leary, D.P. Text summarization via hidden Markov models. In Proceedings of the 24th Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, USA, 13 September 2001.
10. Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Texts. In Proceedings of the 2004 Conference on Empirical Methods in
Natural Language Processing, 20 October 2004.
11. Bollegala, D.T.; Okazaki, N.; Ishizuka, M. A machine learning approach to sentence ordering for multidocument summarization
and its evaluation. In Proceedings of the International Conference on Natural Language Processing, Jeju Island, Republic of
Korea, 11–13 October 2005.
12. Baralis, E.; Cagliero, L.; Mahoto, N.; Fiori, A. GRAPHSUM: Discovering correlations among multiple terms for graph-based
summarization. Inf. Sci. 2013, 249, 96–109. [CrossRef]
13. Cheng, J.; Lapata, M. Neural Summarization by Extracting Sentences and Words. arXiv 2016, arXiv:1603.07252.
14. Nallapati, R.; Zhai, F.; Zhou, B. SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summa-
rization of Documents. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February
2016.
15. Anand, D.; Wagh, R. Effective Deep Learning Approaches for Summarization of Legal Texts. J. King Saud Univ.-Comput. Inf. Sci.
2019, 34, 2141–2150. [CrossRef]
16. Mohsen, F.; Wang, J.; Al-Sabahi, K. A hierarchical self-attentive neural extractive summarizer via reinforcement learning
(HSASRL). Appl. Intell. 2020, 50, 2633–2646. [CrossRef]
17. Rush, A.M.; Chopra, S.; Weston, J. A Neural Attention Model for Abstractive Sentence Summarization. arXiv 2015,
arXiv:1509.00685.
18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need.
arXiv 2017, 30, 5998–6008.
19. Zhang, H.; Gong, Y.; Yan, Y.; Duan, N.; Xu, J.; Wang, J.; Gong, M.; Zhou, M. Pretraining-Based Natural Language Generation for
Text Summarization. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Hong
Kong, China, 21 November 2019.
20. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural
Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461.
21. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer
Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551.
22. Ban, H. Stylistic Characteristics of English News. In Proceedings of the Japan-Korea Joint Symposium on Emotion & Sensibility,
Daejeon, Republic of Korea, 4–5 June 2004.
23. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial
Nets. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
24. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In
Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
25. Wu, R.; Gu, X.; Tao, X.; Shen, X.; Tai, Y.W.; Jia, J.I. Landmark Assisted CycleGAN for Cartoon Face Generation. arXiv 2019,
arXiv:1907.01424.
26. Bo, C.; Zhang, Q.; Pan, S.; Meng, L. Generating Handwritten Chinese Characters using CycleGAN. In Proceedings of the 2018
IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018.
27. Gorti, S.K.; Ma, J. Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks. arXiv 2018, arXiv:1808.04538.
28. Harms, J.; Lei, Y.; Wang, T.; Zhang, R.; Zhou, J.; Tang, X.; Curran, W.J.; Liu, T.; Yang, X. Paired cycle-GAN-based image correction
for quantitative cone-beam computed tomography. Med. Phys. 2019, 46, 3998–4009. [CrossRef] [PubMed]
29. Kaneko, T.; Kameoka, H. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks. In
Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Roma, Italy, 3–7 September 2018.
30. Kaneko, T.; Kameoka, H.; Tanaka, K.; Hojo, N. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion.
In ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 9
April 2019.

31. Kaneko, T.; Kameoka, H.; Tanaka, K.; Hojo, N. CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram
Conversion. arXiv 2020, arXiv:2010.11672.
32. Bishop, C. Pattern Recognition and Machine Learning; Stat Sci; Springer: Berlin/Heidelberg, Germany, 2006.
33. Li, Y.; Zhang, Y.; Zhao, Z.; Shen, L.; Liu, W.; Mao, W.; Zhang, H. CSL: A Large-scale Chinese Scientific Literature Dataset. In
Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October
2022; pp. 3917–3923.
34. Hu, B.; Chen, Q.; Zhu, F. LCSTS: A Large Scale Chinese Short Text Summarization Dataset. arXiv 2015, arXiv:1506.05865.
35. CLUEbenchmark. Chinese Language Generation Evaluation. 2020. Available online: https://github.com/CLUEbenchmark/CLGE (accessed on 8 June 2023).
36. Zhang, Z.; Zhang, H.; Chen, K.; Guo, Y.; Hua, J.; Wang, Y.; Zhou, M. Mengzi: Towards Lightweight Yet Ingenious Pre-Trained Models for Chinese. 2021. Available online: http://xxx.lanl.gov/abs/2110.06696 (accessed on 8 June 2023).
37. Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for
Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81.
38. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of
the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318.
[CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
