Generating Text Using Generative Adversarial Networks and Quick-Thought Vectors

David Russell and Longzhuang Li
Dept. of Computing Sciences
Texas A&M Uni.-Corpus Christi
Corpus Christi, TX, USA 78412
e-mail: [email protected]

Feng Tian
System Engineering Institute
Xi'an Jiaotong University
Xi'an, Shaanxi, China 710049
[email protected]
Abstract—Generative Adversarial Networks (GANs) have been shown to perform very well on image generation tasks. Many advancements have been made with GANs over the past few years, making them more and more accurate in their generation tasks. State-of-the-art methods in natural language processing (NLP) involve word embeddings such as global vectors for word representation (GloVe) and word2vec. These word embeddings help apply text data to a neural network by converting the textual data to numbers that the network can use. The main focus of GANs has been image generation, and in the past few years there have been research works applying GANs to the text generation task. This paper presents a Quick-Thought GAN (QTGAN) to generate sentences by incorporating the Quick-Thought model. Quick-Thought vectors offer richer representations than prior unsupervised and supervised methods and enable a classifier to distinguish context sentences from other contrastive sentences. The proposed QTGAN is trained on a portion of the BookCorpus dataset that has been converted to Quick-Thought embeddings. The embeddings produced by the generator are then classified and used to pick a generated sentence. BLEU scores are used to evaluate the results of the training and are compared to those of the Skip-Thought GAN. Increases in the BLEU-3 and BLEU-4 scores were achieved with the QTGAN.

Keywords—generative adversarial network (GAN), word vector, word embedding, quick thought vector, skip thought vector, sentence embedding, natural language processing (NLP)

The Skip-Thought model [12] is a method of embedding sentences into vectors for use in NLP. Skip-Thoughts encode whole sentences, and the created vector holds the context of a sentence. This means that entire sentences can be used to classify the context of the text instead of just individual words. The decoder for the sentence embeddings uses a greedy approach to build the sentence: it creates a sentence from scratch one word at a time. This can be costly and may not always lead to a good result. Skip-Thought vectors have been utilized in the STGAN to generate text with good BLEU scores [4]. Quick-Thought vectors are a newer method of sentence embedding and encode similarly to the Skip-Thought model [11]. The Quick-Thought model trains faster and produces better results. When the embeddings are decoded, it does not generate a sentence word by word as Skip-Thought does. Instead, it classifies the embedding and picks the best matching pre-existing sentence from a set of candidate sentences. The Quick-Thought vector model is incorporated in the QTGAN proposed in this paper.

The rest of the paper is organized as follows. In Section II, related works in NLP and GANs are reviewed. In Section III, a new Quick-Thought GAN (QTGAN) is proposed. In Section IV, the proposed QTGAN is trained by using the BookCorpus dataset, and its performance is compared with the STGAN. In Section V, future works are discussed. In Section VI, we conclude the paper.

II. RELATED WORK
using the Skip-Thought encoder. These encodings are passed to the discriminator for training. The generator also makes samples for the discriminator to evaluate. Once training of the STGAN is complete, the generator produces Skip-Thought sentence embeddings to be decoded by the Skip-Thought decoder, which generates new sentences based on the embeddings.

III. PROPOSED WORK

In this paper, the Quick-Thought GAN (QTGAN) is proposed, which closely follows the STGAN. The QTGAN is trained using sentence embeddings from a different model, named Quick-Thoughts. Quick-Thoughts are a newer method of sentence encoding that provides better results when generating sentences. Once the training of the QTGAN is complete, it is used to generate sentence embeddings, which are then decoded into sentences using the Quick-Thought decoder.

The Quick-Thought GAN is implemented based on the standard GAN model shown in [10]. The architecture of the QTGAN can be seen in Figure 1. Four major components make up the architecture: the Quick-Thought Encoder, the Generator, the Discriminator and the Quick-Thought Decoder.

A. Quick-Thought Vectors

Quick-Thought vectors, as proposed by Logeswaran & Lee in [11], improve on the Skip-Thought framework. The Quick-Thought vector framework uses the same encoder-decoder model. The encoder takes a sentence and encodes it into a vector, just like the Skip-Thought framework. The difference is in the decoder. The decoder of the Quick-Thought framework does not try to generate the words of the sentences around the encoded sentence. Instead, the decoder classifies the sentence embedding and picks a sentence from a selection of candidate vectors. This saves computational power, as it works directly in the embedding vector space instead of using a greedy algorithm to generate sentences one word at a time like the Skip-Thought model. Quick-Thought vectors are used in this GAN. The loss function for the GAN is the improved loss function shown above in Equation 2.

B. Quick-Thought Encoder

The Quick-Thought model used for the encoder is the model from [11]. The model is inspired by the multi-channel CNN model in [7]. It is a concatenation of two bi-directional recurrent neural networks (RNNs). One of the RNNs uses a large vocabulary of around 3 million words, while the other uses a much smaller vocabulary of about 50 thousand words. The model uses GloVe to kick-start its training. The model is shown to produce fewer redundant features than the bi-directional model discussed in [12].
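As an illustration of this multi-channel design, the following is a minimal sketch of a two-channel bi-directional RNN sentence encoder written with tf.keras. The layer sizes, the use of GRU cells, and the equal 2400/2400 split of the 4800-dimensional output are illustrative assumptions and not the configuration of the pre-trained MC-QT model; GloVe loading and the separate tokenization for the two vocabularies are omitted.

    # Sketch only: two bi-directional RNN channels with different vocabularies,
    # concatenated into one sentence vector (sizes are assumed, not taken from [11]).
    from tensorflow.keras.layers import Input, Embedding, Bidirectional, GRU, Concatenate
    from tensorflow.keras.models import Model

    def build_two_channel_encoder(vocab_large=3000000, vocab_small=50000,
                                  word_dim=300, max_len=50):
        tokens_large = Input(shape=(max_len,), dtype="int32")  # ids in the large vocabulary
        tokens_small = Input(shape=(max_len,), dtype="int32")  # ids in the small vocabulary

        # Channel 1: large vocabulary; embeddings would be initialized from GloVe.
        h1 = Embedding(vocab_large, word_dim)(tokens_large)
        h1 = Bidirectional(GRU(1200))(h1)        # 2 x 1200 = 2400 features

        # Channel 2: small vocabulary.
        h2 = Embedding(vocab_small, word_dim)(tokens_small)
        h2 = Bidirectional(GRU(1200))(h2)        # 2 x 1200 = 2400 features

        # Concatenation of the two channels gives the 4800-dimensional sentence embedding.
        sentence_vec = Concatenate()([h1, h2])
        return Model([tokens_large, tokens_small], sentence_vec)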
C. Quick-Thought Decoder

The Quick-Thought decoder differs from the Skip-Thought decoder in that it does not actually generate a new sentence. Instead, it looks for the sentences that best fit the embedding and then chooses the best of those. To do this, the decoder uses a set of sentence embeddings from the training corpus. In other words, the decoder classifies the current embedding and then finds candidate sentences for that embedding. The candidate with the highest probability is chosen as the new sentence (see Equation 3). For a given sentence position in the context of s, the probability that a candidate sentence s_cand ∈ S_cand is the correct sentence for that position is computed in Equation 3 as follows:

p(s_cand | s, S_cand) = exp(c(f(s), g(s_cand))) / Σ_{s' ∈ S_cand} exp(c(f(s), g(s')))        (3)

where c is a scoring function, f and g are the parametrized functions that encode an input sentence into a fixed-length vector, and S_cand is the set of candidate sentences. This is where the computational cost is saved. The decoder works directly in the vector space and does not generate its own sentences. Instead, it works with pre-existing sentences from the corpus, which allows quick decoding of sentence embeddings.
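Concretely, the decoding step of Equation 3 reduces to scoring a generated embedding against pre-computed candidate embeddings and returning the best-scoring corpus sentence. The sketch below assumes the scoring function c is an inner product; the function and variable names are illustrative only.

    import numpy as np

    def decode_embedding(gen_vec, cand_vecs, cand_sentences):
        # gen_vec:        generated sentence embedding, shape (4800,)
        # cand_vecs:      embeddings of candidate corpus sentences, shape (N, 4800)
        # cand_sentences: the N candidate sentences as strings
        scores = cand_vecs.dot(gen_vec)               # scoring function c: inner product
        probs = np.exp(scores - scores.max())         # softmax over candidates (Equation 3)
        probs /= probs.sum()
        return cand_sentences[int(np.argmax(probs))]  # highest-probability candidate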
D. Generator

third layer will expand it to 2048. These first 3 layers use LeakyReLU as their activation function with an alpha of 0.2. They also use batch normalization with a momentum of 0.5. The final layer expands the 2048 to 4800, which is the final size that matches the sentence embeddings of the Quick-Thought model. This final layer uses a tanh activation function. The samples that the generator creates are similar to the real embeddings from the Quick-Thought model. The samples are then sent to the discriminator during training for evaluation. Once the training is complete, the generator is able to generate embeddings for use in the Quick-Thought decoding part of the system, which produces newly generated sentences from the generated sentence embeddings.
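A minimal tf.keras sketch of a generator with this shape is given below. Only the 2048-to-4800 final expansion, the LeakyReLU slope of 0.2, the batch-normalization momentum of 0.5 and the tanh output come from the description above; the sizes of the earlier layers are assumptions.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LeakyReLU, BatchNormalization

    NOISE_DIM = 100    # dimension of the random noise vector z
    EMBED_DIM = 4800   # dimension of a Quick-Thought sentence embedding

    def build_generator():
        return Sequential([
            # The sizes of the first two layers are assumed; the text above only
            # states that the third layer expands to 2048.
            Dense(512, input_dim=NOISE_DIM),
            LeakyReLU(alpha=0.2),
            BatchNormalization(momentum=0.5),
            Dense(1024),
            LeakyReLU(alpha=0.2),
            BatchNormalization(momentum=0.5),
            Dense(2048),
            LeakyReLU(alpha=0.2),
            BatchNormalization(momentum=0.5),
            # Final layer: expand to the embedding size with a tanh activation.
            Dense(EMBED_DIM, activation="tanh"),
        ])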
E. Discriminator

The discriminator is used for checking whether the input it receives is real or fake. The input for the discriminator is a 4800-dimensional vector that represents a sentence embedding. It also takes in another input that indicates whether the provided embedding is real or a fake sample from the generator. The discriminator evaluates the samples it receives and determines their authenticity during training of the generator. It does this using 3 densely connected layers. The first layer shrinks the vector from 4800 to 1024, and the second layer shrinks it from 1024 to 512. The first and second layers use LeakyReLU as the activation function, like the generator, with an alpha of 0.2. The final layer outputs a value from 0 to 1 using a sigmoid activation function. This value represents whether the discriminator thinks the sample is real or fake and is also used in the loss calculation. Once the training of the generator is complete, the discriminator has served its purpose and is no longer used.
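The corresponding discriminator sketch follows directly from the layer sizes above; compiling it with binary cross-entropy and Adam is an assumption, since the paper only refers to the loss of Equation 2, which is not reproduced here.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LeakyReLU

    def build_discriminator():
        model = Sequential([
            # 4800 -> 1024 -> 512, with LeakyReLU (alpha = 0.2) after the first two layers.
            Dense(1024, input_dim=4800),
            LeakyReLU(alpha=0.2),
            Dense(512),
            LeakyReLU(alpha=0.2),
            # Sigmoid output in [0, 1]: estimated probability that the embedding is real.
            Dense(1, activation="sigmoid"),
        ])
        model.compile(loss="binary_crossentropy", optimizer="adam")
        return model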
IV. EXPERIMENTS

The experiments in this paper were done using Google Colab's services. A Python 2 runtime was used with GPU acceleration. This provided an easy-to-use method of running code in a GPU-accelerated environment. The runtime provides ~12GB of RAM and ~350GB of storage space. The GPU provided by Google Colab is an Nvidia Tesla K80.
A. Dataset

The dataset used in the experiments is the BookCorpus dataset [16], which is made up of free books from unpublished authors. It contains 16 different genres and about 45 million sentences. This dataset is used in a lot of text generation works for training, so it is used for training this GAN as well. The dataset is no longer publicly distributed, so in order to obtain it one has to use a script to scrape the data from the Smashwords website, which is where the books in the BookCorpus dataset come from. To do this, we used code from a GitHub repository by Soskek (https://github.com/soskek/bookcorpus). This repository contains a snapshot of the BookCorpus dataset and gives the ability to download the dataset in full. The needed preprocessing is then performed, such as putting the dataset one sentence per line and tokenizing if necessary. Due to the RAM limitations in Colab when using the Quick-Thought encoder, only 64K sentences were used from the dataset. The Quick-Thought model being used is the pre-trained MC-QT model from [11].
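The "one sentence per line" preprocessing can be done with any sentence tokenizer; the short sketch below uses NLTK and hypothetical file names, and is only meant to illustrate the step (the scraping itself is handled by the repository above).

    # Sketch: flatten raw book text into one sentence per line (file names are hypothetical).
    import nltk
    nltk.download("punkt", quiet=True)
    from nltk.tokenize import sent_tokenize

    with open("books_raw.txt") as src, open("books_one_per_line.txt", "w") as dst:
        for paragraph in src:
            for sentence in sent_tokenize(paragraph.strip()):
                dst.write(sentence + "\n")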
B. Training

As previously mentioned, the training was done on a Google Colab Python 2 GPU instance using the pre-trained model from [11]. The generator takes in a random noise vector z of dimension 100 and outputs a sample vector with a dimension of 4800, since both the Skip-Thought and Quick-Thought bi-directional encoders produce embeddings with a dimension of 4800. The training uses batch sizes of 128 with 500 batches. This gives a total of 64,000 training examples. Training was done for 5001 iterations for a total of 10 epochs. Going above 10 epochs sometimes leads to undesirable results. After each epoch in the training session the generator produces 50 samples for evaluation. In addition, it saves the states of the discriminator and generator models so they can be loaded later for more training. During the training the discriminator receives samples from the real embeddings and the generated embeddings, along with labels telling it whether they are real or fake. The loss function used is Equation 2, shown above.
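A minimal sketch of this training loop, reusing the build_generator and build_discriminator sketches above, is shown below. Training the generator with binary cross-entropy against "real" labels is one common way to implement the improved, non-saturating GAN objective; whether this matches Equation 2 exactly, as well as the optimizer and the assumed embeddings file, are illustrative assumptions.

    import numpy as np
    from tensorflow.keras.models import Sequential

    BATCH_SIZE, NOISE_DIM, EPOCHS = 128, 100, 10
    real_embeddings = np.load("quick_thought_embeddings.npy")  # shape (64000, 4800), assumed file

    discriminator = build_discriminator()
    generator = build_generator()

    # Stacked model used for the generator update; the discriminator is frozen inside it.
    discriminator.trainable = False
    gan = Sequential([generator, discriminator])
    gan.compile(loss="binary_crossentropy", optimizer="adam")

    for epoch in range(EPOCHS):
        np.random.shuffle(real_embeddings)
        for i in range(0, len(real_embeddings), BATCH_SIZE):
            real = real_embeddings[i:i + BATCH_SIZE]
            noise = np.random.normal(size=(len(real), NOISE_DIM))
            fake = generator.predict(noise, verbose=0)

            # Discriminator step: real embeddings labelled 1, generated ones labelled 0.
            discriminator.train_on_batch(real, np.ones((len(real), 1)))
            discriminator.train_on_batch(fake, np.zeros((len(fake), 1)))

            # Generator step: push the discriminator's output towards 1 for generated samples.
            gan.train_on_batch(noise, np.ones((len(real), 1)))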
C. Results

The bilingual evaluation understudy (BLEU) scores are used to compare the performance of the QTGAN, the STGAN and the STGAN minibatch models. The BLEU scores check how much the candidate sentence, which in this case is the generated sentence, resembles the sentences in the corpus. BLEU-2 checks groups of two words at a time and scores whether those two words appear in the corpus in the same order. BLEU-3 and BLEU-4 are very similar, except they use groups of 3 and 4 words respectively. The scores for BLEU-2, BLEU-3 and BLEU-4 are shown below in Table I. The BLEU scores show an improvement when using the Quick-Thought model in all cases but BLEU-2 for the STGAN minibatch model. To be more specific, the BLEU-2 score of the QTGAN is 0.736, which is higher than that of the STGAN and lower than that of the STGAN minibatch. For the BLEU-3 and BLEU-4 scores, the QTGAN achieves 0.684 and 0.771, respectively, while the STGAN offers 0.564 and 0.525 and the STGAN minibatch 0.607 and 0.531. Examples of decoded sentences are shown in Table II.

TABLE I: BLEU SCORES FOR 50 SAMPLES AVERAGED (BOOKCORPUS REFERENCE)

Model               BLEU-2    BLEU-3    BLEU-4
STGAN               0.709     0.564     0.525
STGAN (minibatch)   0.745     0.607     0.531
QTGAN               0.736     0.684     0.771
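For reference, BLEU-2, BLEU-3 and BLEU-4 for a generated sentence against corpus sentences can be computed with NLTK as in the sketch below; the smoothing choice and the example sentences are assumptions, since the paper does not state how its 50-sample averages were computed beyond the n-gram orders.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    def bleu_scores(generated, references):
        # generated:  tokenized generated sentence (list of words)
        # references: list of tokenized corpus sentences
        smooth = SmoothingFunction().method1
        weights = {"BLEU-2": (0.5, 0.5),
                   "BLEU-3": (1.0 / 3, 1.0 / 3, 1.0 / 3),
                   "BLEU-4": (0.25, 0.25, 0.25, 0.25)}
        return {name: sentence_bleu(references, generated, weights=w, smoothing_function=smooth)
                for name, w in weights.items()}

    # Hypothetical example:
    print(bleu_scores("he means is there anything".split(),
                      ["he means is there anything in his life that could help us now".split()]))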
TABLE II: EXAMPLES OF DECODED SENTENCES

Example Sentences
"An all-inclusive wakeup call into collective hive greater good action."
"He means is there anything in his life that could help us now."
"Almost out of nowhere, a figure emerged and started towards Ray."
"We had run from so much danger in that car, but now I had to keep dealing with boring politics."
"I wish Phoenix or even Elathan would come back and just… fix everything."
"Kenneth, can you help us find your brother?"

V. FUTURE WORK
The proposed QTGAN can be expanded and changed in many ways for better results. The loss function currently used is the original loss function from Equation 2. This could be changed to something like the Wasserstein distance in order to improve the loss function. The Wasserstein distance provides better insight during training because it converges better, which GANs tend to have trouble doing. Another addition that could be made is a change to the generator. Currently the generator is not conditioned on any input; it just receives its random noise vector and generates a sentence embedding based on that random vector. Instead, the conditional GAN model could be implemented to change this [6], [8], [9]. The conditional GAN model allows the generated sample to be conditioned on an input. Once a sentence is encoded, its embedding y can be concatenated to the random noise vector z and this combined vector then becomes the input to the GAN's generator, as sketched after this section. This produces a sample that is conditioned on the input y. For example, if a sentence about having a party is entered into the conditional GAN, then the sentence that is decoded from the generated embedding should have something to do with having a party. Once conditional sentences can be generated, another addition could be making sure that future generated sentences are related to each other. In this way, multiple sentences could be produced out of multiple encodings, which would allow the model to write a complete short story [2], [3], [5]. More future work would be expanding the training data to cover more of the dataset. Because of the limit imposed by Google Colab, only a small portion of BookCorpus was used. It would be better to be able to use a larger amount of BookCorpus.
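A minimal sketch of that conditioning idea, concatenating the condition embedding y with the noise vector z as the generator's input, is given below; the hidden-layer sizes are illustrative and follow the unconditioned generator sketch above.

    from tensorflow.keras.layers import Input, Dense, LeakyReLU, Concatenate
    from tensorflow.keras.models import Model

    def build_conditional_generator(noise_dim=100, cond_dim=4800, embed_dim=4800):
        z = Input(shape=(noise_dim,))   # random noise vector
        y = Input(shape=(cond_dim,))    # sentence embedding used as the condition
        h = Concatenate()([z, y])       # conditioned generator input
        h = Dense(2048)(h)
        h = LeakyReLU(alpha=0.2)(h)
        out = Dense(embed_dim, activation="tanh")(h)
        return Model([z, y], out)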
Training", Workshop on Adversarial Training", in NIPS, 2016.
VI. CONCLUSION [14] T. Mikolov, K. Chen, G. Corrado and J. Dean, "Efficient Estimation
In the paper, the QTGAN is proposed and implemented of Word Representations in Vector Space", in International
Conference on Learning Representations, 2013.
by incorporating the Quick-Thought encoding and decoding
[15] J. Pennington, R. Socher and C. Manning, "GloVe: Global Vectors
models into it. The BookCorpus dataset was used as the for Word Representation", in International Conference on on
training data and encoded using the Quick-Thought encoder. Empirical Methods in Natural Language Processing, 2014.
During the training, the QTGAN used these encodings of [16] Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Utrasun, A.
the sentences as input for the discriminator to help the Torrabla and S. Fidler, "Aligning Books and Movies: Towards Story-
like Visual Explanations by Watching Movies and Reading Books",
generator learn and produce better embeddings. It was in International Conference on Computer Vision, 2015.