
Dialogue Response Generation using Neural Networks with Attention and Background Knowledge

Sofiia Kosovan¹, Jens Lehmann² and Asja Fischer³

¹ RWTH Aachen University, Germany, [email protected]
² University of Bonn, Germany, [email protected]
³ University of Bonn, Germany, [email protected]

Abstract
Constantly increasing amounts of data on the Internet, together with fast GPUs, have enabled rapid progress in AI research. Google DeepMind and self-driving cars show how quickly machine learning is moving into industry. It helps humans do their jobs, and chatbots are a good example: we use them to play music, report the weather, order a cab, and much more. Chatbots are all about generating a response given the conversation so far. In this work we train a dialogue response generation model using neural networks. It is a sequence to sequence model that takes a dialogue as input and produces the next response as output. The data used is the Ubuntu Dialogue Corpus, a large dataset for research in unstructured multi-turn dialogue systems. Since the input sequences are dialogues, the inputs are quite long, and the information at the very beginning of a sequence is often lost while training the model. That is one of the reasons we use attention: it gives the decoder more direct access to the input and lets the model itself decide which parts of the input matter most for generating the output. We also extend the Ubuntu Dialogue Corpus with information from the man pages in order to enrich the input with technical information. We use only the short descriptions, so we do not overload the input with too much additional information but still add some background knowledge.

1 Introduction
An explosion in the number of people having informal, public conversations on social media
websites such as Facebook and Twitter presented a unique opportunity to build collections of
naturally occurring conversations that are orders of magnitude larger than those previously
available. These corpora, in turn, present new opportunities to apply data-driven techniques
to conversational tasks [13].
The task of response generation is to generate a response that fits the provided stimulus, without explicitly modelling context, intent, or dialogue state. Without employing rules or templates, there is hope of creating a system that is both flexible and extensible when operating in an open domain [13]. Success in open domain response generation could be useful to social media platforms, for example to provide conversation-aware autocomplete for responses in progress or to suggest a list of possible responses to a target status.
Researchers have recently observed critical problems applying end-to-end neural network
architectures for dialogue response generation [15, 16, 8]. The neural networks have been unable to generate meaningful responses that take the dialogue context into account, which indicates that the models have failed to learn useful high-level abstractions of the dialogue.
When we want a chatbot that learns from existing conversations between humans and answers complex queries, we need intelligent models such as retrieval-based or generative models. Retrieval-based models pick a response from a collection of candidate responses based on the query; they do not generate new sentences, so their output is always grammatically correct. Generative models are more ambitious: they generate a response word by word based on the query. They are computationally expensive to train, require huge amounts of data, and often make grammatical errors, but they learn sentence structure by themselves. The most important advantage of generative models over the other types of models is that they can take the previous conversation into account and handle previously unseen queries.
In this work we train a sequence to sequence model on the Ubuntu Dialogue Corpus [6, 9, 10]. It is a generative model that produces responses from scratch rather than picking them from a predefined set, as retrieval-based models do. To our knowledge, only one other paper has applied an attention mechanism to the Ubuntu Dialogue Corpus [11]; our work was done independently and in parallel to it. We also visualise the attention weights to see which words the model attends to when generating the output. Training was stopped when the perplexity on the evaluation set, smoothed by a moving average over the last steps, stopped decreasing. Finally, we extend the input data with user manuals in order to enrich it with technical information. Extending the input has already been done for other tasks, such as classification or response selection; here we enlarge the dataset for the response generation task.

2 Related Work
The Ubuntu Dialogue Corpus is the largest publicly available multi-turn dialogue corpus and is used for both response selection and response generation [9]. Its authors considered the task of selecting the best next response using TF-IDF, recurrent neural networks (RNNs), and Long Short-Term Memory (LSTM) networks. For the next utterance ranking task on the Ubuntu Dialogue Corpus, the performance of LSTMs, Bi-LSTMs, and CNNs was evaluated and an ensemble was created by averaging the predictions of multiple models [6]. The best classifier was an ensemble of 11 LSTMs, 7 Bi-LSTMs, and 10 CNNs trained with different meta-parameters.
A more interesting task than response selection is response generation. Generative models produce system responses word by word, which opens up the possibility of flexible and realistic interactions. Such models have been used to build open domain, conversational dialogue systems based on large dialogue corpora [16]. The multiresolution recurrent neural network (MrRNN) was introduced for generatively modeling sequential data at multiple levels of abstraction [15]; it was applied to dialogue response generation on two tasks, Ubuntu technical support and Twitter conversations, and evaluated both in a human evaluation study and via automatic evaluation metrics.
To our knowledge, only one paper applying an attention model to the Ubuntu Dialogue Corpus has been published recently, "Coherent Dialogue with Attention-based Language Models" [11]. Its authors model coherent conversation continuation with RNN-based dialogue models and investigate how to improve the performance of a recurrent neural network dialogue model via an attention mechanism. They evaluate the model on two dialogue datasets, the open domain MovieTriples dataset and the closed domain Ubuntu Troubleshoot dataset, and show that a vanilla RNN with dynamic attention outperforms more complex memory models (e.g., LSTM and GRU) by allowing for flexible, long-distance memory. That paper was written independently and in parallel to our work.

3 Sequence to Sequence Models


Here we discuss the difference between retrieval-based and generative models, explain RNNs and LSTMs, and describe encoder-decoder sequence to sequence architectures. After that we illustrate the attention mechanism with an example and explain why it is popular.

3.1 Retrieval-Based vs. Generative Models


Retrieval-based models pick a response from a fixed set and do not generate any new text. A response is chosen based on the input and context with the help of some heuristic, such as a rule-based expression match or an ensemble of machine learning classifiers. Because the responses are predefined, retrieval-based models are always grammatically correct. But picking a sentence out of a predefined set rarely suffices in dialogue systems: there are so many possible conversations that we really need to generate a new response instead of choosing one.
Generative modelling is a harder task: there are no predefined responses, and new responses are generated from scratch. These models are based on machine translation techniques but are also widely used for dialogue response generation. Generative models give the feeling of talking to a human and, unlike retrieval-based models, can refer back to entities in the input. They require huge amounts of training data, are hard to train, and are likely to make grammatical mistakes, especially in long sentences. Research is moving in the generative direction, and in this work a generative model is built.

3.2 RNNs and LSTMs


In some machine learning algorithms (as well as neural networks) it is assumed that the inputs and outputs are independent of each other. Obviously such models should not be used for text generation, where each word in a sentence heavily depends on the previous words. This is where recurrent neural networks (RNNs) come into play. They have a "memory" which captures information about what has been computed so far; an RNN performs the same task for every element of a sequence, which is why it is called "recurrent".
RNNs are neural sequence models that achieve state-of-the-art performance on important tasks including language modeling, speech recognition, machine translation, and image captioning [12, 4, 7], and they are widely used for natural language processing. In most cases LSTMs (Long Short-Term Memory networks), a type of RNN, are used, since they are better at capturing long-term dependencies [5]. The main difference between plain RNNs and LSTMs is the way the hidden state is computed.
An LSTM contains special units called memory blocks in the recurrent hidden layer [14]. The memory blocks contain memory cells with self-connections that store the temporal state of the network, in addition to special multiplicative units called gates that control the flow of information. Each memory block in the original architecture contained an input gate and an output gate: the input gate controls the flow of input activations into the memory cell, and the output gate controls the flow of cell activations into the rest of the network. Later, the forget gate was added to the memory block [2].
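As a rough illustration of this gating, the following NumPy sketch computes a single LSTM step. The stacked weight matrices W, U and bias b are placeholders; this is not the exact parameterization of our TensorFlow model, only a minimal sketch of the standard LSTM update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with input, forget, and output gates.

    W, U, b hold the stacked parameters for the four transforms:
    input gate i, forget gate f, output gate o, and cell candidate g.
    """
    z = W @ x + U @ h_prev + b           # shape: (4 * hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g               # forget part of the old state, write the new candidate
    h = o * np.tanh(c)                   # the output gate controls what the cell exposes
    return h, c
```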
Figure 1: An unrolled recurrent neural network

Figure 1¹ shows a simple representation of an RNN with inputs $x_i$ and outputs $h_i$. RNNs have loops in them, allowing information to persist. An RNN can be thought of as multiple copies of the same network, each passing a message to its successor. Because of this chain-like nature, recurrent neural networks are intimately related to sequences and lists.

¹ http://colah.github.io/posts/2015-08-Understanding-LSTMs/

3.3 Encoder-Decoder Sequence to Sequence Architectures


In many applications, such as speech recognition, machine translation, or question answering, the input and output sequences in the training set are generally not of the same length. An RNN can map an input sequence to an output sequence that is not necessarily of the same length.

Figure 2: Example of an encoder-decoder or sequence to sequence RNN architecture

Figure 2 shows an example of an encoder-decoder or sequence to sequence RNN architecture [3]. It is used for learning to generate an output sequence $(y^{(1)}, \dots, y^{(n_y)})$ given an input sequence $(x^{(1)}, \dots, x^{(n_x)})$. It is composed of an encoder RNN that reads the input sequence and a decoder RNN that generates the output sequence (or computes the probability of a given output sequence). The final hidden state of the encoder RNN is used to compute a generally fixed-size context variable $C$, which represents a semantic summary of the input sequence and is given as input to the decoder RNN.
The input to the RNN is often called the "context", and $C$ is a representation of this context. It might be a vector or a sequence of vectors that summarizes the input sequence $X = (x^{(1)}, \dots, x^{(n_x)})$.
The simplest RNN architecture for mapping a variable-length sequence to another variable-length sequence was proposed recently, and state-of-the-art translation results were obtained with this approach [1, 17]. The idea is:

• an encoder or reader or input RNN processes the input sequence; the encoder emits the context $C$, usually as a simple function of its final hidden state;

• a decoder or writer or output RNN is conditioned on that fixed-length vector to generate the output sequence $Y = (y^{(1)}, \dots, y^{(n_y)})$.

In a sequence to sequence architecture, the two RNNs are trained jointly to maximize the average of $\log P(y^{(1)}, \dots, y^{(n_y)} \mid x^{(1)}, \dots, x^{(n_x)})$ over all pairs of $x$ and $y$ sequences in the training set. The last state $h_{n_x}$ of the encoder RNN is typically used as the representation $C$ of the input sequence that is provided as input to the decoder RNN.
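The following NumPy sketch illustrates this encoder-decoder idea with a plain RNN cell and greedy decoding. The parameter shapes, the embedding table, and the greedy loop are simplifications for illustration and do not correspond one-to-one to our trained LSTM model.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    # Plain (Elman) RNN update; our actual model uses LSTM cells.
    return np.tanh(Wx @ x + Wh @ h + b)

def encode(inputs, params, hidden_size):
    h = np.zeros(hidden_size)
    for x in inputs:                      # read the dialogue token by token
        h = rnn_step(x, h, *params)
    return h                              # the final state acts as the context C

def decode_greedy(context, params, W_out, embed, start_vec, eos_id, max_len=30):
    h, x, output = context, start_vec, []
    for _ in range(max_len):
        h = rnn_step(x, h, *params)
        logits = W_out @ h
        token = int(np.argmax(logits))    # greedy choice, no beam search
        if token == eos_id:
            break
        output.append(token)
        x = embed[token]                  # feed the chosen token back in
    return output
```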

3.4 Attention Mechanism


An attention mechanism was introduced by Bahdanau, Cho, and Bengio [1]. It is used in the model to give the decoder more direct access to the input: the decoder peeks into different parts of the source sentence at each step of output generation. The full source sentence is no longer encoded into a single fixed-length vector; instead, the model learns what to attend to based on what it has produced so far as well as on the input sentence.
In Figure 3, $s_t$ is an RNN hidden state at time $t$, the $y$'s are the translated words produced by the decoder, and the $x$'s are the source sentence words [1]. The $a$'s are weights that define how much of each input state should be considered for each output. For example, $a_{1,3}$ shows how much attention the decoder pays to the third state of the source sentence while producing the first word of the target sentence. Each decoder output word $y_t$ now depends not only on the last state, but also on a weighted combination of all the input states.

Figure 3: Graphical illustration of the model generating the $t$-th target word $y_t$ given a source sentence $(x_1, x_2, \dots, x_T)$
Figure 4 illustrates the attention mechanism [1]. The source sentence is in English and the generated translation is in French. Each pixel shows the weight $\alpha_{ij}$ of the annotation of the $j$-th source word for the $i$-th target word in grayscale (0: black, 1: white). "La destruction" was translated from "Destruction" and "la Syrie" from "Syria", so the corresponding weights are non-black.

Figure 4: Attention mechanism example for word sequences
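A minimal sketch of the additive attention of Bahdanau et al. [1] for a single decoder step is given below; the scoring parameters W, U, and v are placeholders, and the exact score function used in our model may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention(s_prev, encoder_states, W, U, v):
    """Additive attention for one decoder step.

    s_prev:          previous decoder state, shape (d,)
    encoder_states:  matrix of encoder states h_j, shape (n_x, d)
    Returns the weights alpha over the input positions and the context vector.
    """
    scores = np.array([v @ np.tanh(W @ s_prev + U @ h_j) for h_j in encoder_states])
    alpha = softmax(scores)                 # one weight per input position
    context = alpha @ encoder_states        # weighted combination of the input states
    return alpha, context
```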

4 Experimental Results
Due to the very long input sequences and the very large dataset, the maximum batch size we could fit on a TITAN X with 12 GB for a model with 2 layers of 1024 units each was 30. This is very small compared to the 0.5 million data instances, which made the training process of such a large model very slow.
The responses generated by the model were always grammatically correct and varied, whether or not they were suitable as a response, but very generic. That is why we did not evaluate them using metrics such as BLEU or METEOR, but instead used perplexity, visualized the attention weights, and inspected the generated responses.
During training, statistics were printed and the model was saved as a checkpoint every 50 steps; in each step the model was trained on one batch. We also calculate a moving average over the last 3000 steps, since the perplexities reported every 50 steps varied strongly and were hard to follow.
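A sketch of this stopping heuristic is shown below; the perplexity is the exponential of the average per-token cross-entropy, and the patience value used here is illustrative rather than the exact setting of our experiments.

```python
import math
from collections import deque

def perplexity(mean_cross_entropy):
    # Perplexity is the exponential of the average per-token cross-entropy.
    return math.exp(mean_cross_entropy)

class MovingAverageStopper:
    """Stop training when the smoothed evaluation perplexity stops decreasing.

    Perplexities are recorded every `record_every` steps, so a window of
    3000 steps covers the last 3000 / record_every recorded values.
    """
    def __init__(self, record_every=50, window_steps=3000, patience=5):
        self.window = deque(maxlen=window_steps // record_every)
        self.best = float("inf")
        self.bad_rounds = 0
        self.patience = patience

    def update(self, eval_perplexity):
        self.window.append(eval_perplexity)
        smoothed = sum(self.window) / len(self.window)
        if smoothed < self.best:
            self.best, self.bad_rounds = smoothed, 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience   # True means: stop training
```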
We stopped the training process when the perplexity on the evaluation set stopped decreasing. Figure 5 shows an example of how the perplexity on the evaluation set decreases over time; the X axis shows the number of steps and the Y axis the perplexity. This model, with 2 layers of 1024 units each, was trained on a TITAN X with 12 GB for 10 days. The training takes that long due to the large input sequences, the large number of dialogues, and the use of attention, which makes the model bigger and more difficult to train.

Figure 5: Decrease of the perplexity on the evaluation set while training the model

An interesting observation in Figure 5 is that the perplexity of the smallest bucket is very close to the perplexity on the training set. The perplexities for bigger buckets are larger: the smallest perplexity is reached for the smallest bucket and the largest for the biggest bucket (the bigger the bucket, the larger the perplexity). This shows that it is more difficult for the model to memorize longer sequences than shorter sequences, as expected.
The responses generated by the models were very generic. The reasons may be that beam search was not used and that "I don't know" and "I am not sure" are indeed among the most common answers in the Ubuntu Dialogue Corpus. We show some examples of the generated responses (* indicates the response generated by the model and + the real answer given by the user).

Example

B: sorry i have no idea what that is


B: You can disable luks password prompt at boot by adding ”rd NO LUKS” kernel flag to
grub.conf
A: yah!! where, grub.cfg? syntax please. thanks
A: whats the syntax for rd NO LUKS? where to put in grub file
B*: I don ’ t know , sorry
B+: it doesn’t say
B+: can you reformat the disk?
When we take a closer look at the other words the model generated as candidates, we notice much more variety among them (for now we cannot see the other possible responses, as we did not use beam search; this is an idea for future work). Figure 6 therefore shows the top 10 generated tokens, i.e. the candidates for each word in the output response:

Figure 6: Top 10 most probable tokens for each word in the generated response

The first items of each row in Figure 6 form the output "I don't know, sorry" for the dialogue above, since it is one of the safest answers.
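A table like Figure 6 can be produced with a small helper such as the following sketch, where `logits` stands for the decoder's output projection at one step and `vocab` is a hypothetical id-to-token mapping.

```python
import numpy as np

def top_k_tokens(logits, vocab, k=10):
    """Return the k most probable tokens for one decoding step.

    `logits` is the unnormalized decoder output for a single step;
    `vocab` maps token ids to strings.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]           # indices sorted by descending probability
    return [(vocab[i], float(probs[i])) for i in top]

# Applying this at every step of the generated response yields the rows of
# Figure 6; the first column reproduces the greedy output "I don't know, sorry".
```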
An interesting way to see how the attention is used by the model is to visualise it. Figure 7 shows the attention weights (the darker the square, the bigger the weight); the X axis represents the input dialogue and the Y axis the output response.

Figure 7: Visualisation of the attention weights

In Figure 7 we can see that the model was paying more attention to the words "pulse" and "sound" when generating the output response for the dialogue below. The generated response is marked A*.
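A heatmap like Figure 7 can be drawn with a few lines of matplotlib; the sketch below assumes the attention weights have been exported as a matrix with one row per generated word.

```python
import matplotlib.pyplot as plt

def plot_attention(weights, input_tokens, output_tokens):
    """Plot decoder attention weights as a grayscale heatmap.

    weights: array of shape (len(output_tokens), len(input_tokens)),
             row i holding the attention distribution used for output word i.
    """
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="gray_r")            # darker square = larger weight
    ax.set_xticks(range(len(input_tokens)))
    ax.set_xticklabels(input_tokens, rotation=90)
    ax.set_yticks(range(len(output_tokens)))
    ax.set_yticklabels(output_tokens)
    ax.set_xlabel("input dialogue")
    ax.set_ylabel("generated response")
    plt.tight_layout()
    plt.show()
```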

Example

A: Any idea why empathy’s not playing notification sounds? Even though I have ’em ticked
in preferences
B: restarted it yet?
A: yar
B: check pulse to see if the application is muted for some reason?
B: well Sound settings.
A: Had sound effects turned off in sound settings, didn’t realize that controlled other
applications
B: Ah yea, ive done it a few time it’s annoying
B: My favorite though is recently pulse has been freezing on my desktop and audio will just
not be adjustable for like... 30 seconds or so
A*: I’m not sure, I’m not sure if it’s a problem with the sound card.

In conclusion we can say that the model tends to give grammatically correct and varied, but generic responses. We believe that adding beam search will make it possible to see the other candidate responses as well and to choose the best one among them.
Consider the next example. At the end, user B was giving instructions on what to do. The model generated the response "thanks, I'll try that", which is suitable in this case. In any case, there is an enormous number of possible responses, and every human would answer differently.

Example
A: how do I create a folder from the command line?
A: ooooookay, how do I restart a process? kill and start? or is it easier? :P
B: depends on the process... who is it owned by? is it a system service?
A: nautilus
A: it seems to randomly lock up, so I was going to assign a keyboard shortcut to restart it,
only to find out I don’t know how to restart a process, or if it’s even possible...
B: Use the ”kill” command to send processes signals telling them to exit. You need the
process id to use ”kill” e.g. kill -TERM 1234. You can also use ”killall some-process-name”
e.g. killall nautilus
A*: thanks, I’ll try that
A+: so there’s no restart?

5 Challenges with Application in Industry


Machine learning and text mining methods are now widely used in industry. However, there are still many open challenges when implementing chatbots. Deep learning based dialogue response generation models make it very hard to control the flow of the conversation. It is difficult to know when a discussion was successful and a case can be "closed". It is also problematic to incorporate company knowledge into the process. In such "non goal driven" dialogue generation methods it is likewise hard to include process knowledge (e.g. that the chatbot should always ask for an insurance number in certain cases); the model may or may not learn this from previous conversations.
Another task is to keep track of the "state" of the conversation. For instance, a customer wants to know whether a broken notebook screen is covered by warranty. In this case, the system needs to know when the notebook was purchased and whether it was damaged in an accident (or is a manufacturing problem). Only once the system has this information can it actually answer the original request of the customer.
6 Conclusion and Future Work
Dialogue response generation is still a challenge. Every human has their own opinion and would answer the same questions differently in the same situations; even the same person would answer the same question differently depending on their mood and on when you ask.
Dialogue response generation is more difficult than question answering, since the input to the model is not a question but a dialogue, and we consider dialogues of up to 160 tokens. Nevertheless, we try to build a deep learning model that generates the responses a human would give.
As evaluation metric we used perplexity and stopped the training when the perplexity on the evaluation set stopped decreasing. Due to the large input size, the long sequences, and the use of the attention mechanism, the model was large (e.g. 2 layers with 1024 units each). Training such a model took around 10 days on a TITAN X with 12 GB. The responses produced by the models were always grammatically correct and varied, but generic. We believe that further improvements to the model, like using beam search, will help to overcome this issue.
One reason why our models produce such generic responses could be that beam search was not used while decoding. It is currently not implemented in the TensorFlow models we used, so there is still work to be done. As our experiments showed, the model without beam search produces very generic responses in the dialogue response generation task. When displaying the most suitable candidate words for the response, we observe that the model generates many more possible responses to choose from; with beam search we would be able to see them and then pick the best one. For now the model greedily picks the most probable response, which is natural and logical, since the answer "I am not sure, sorry" can indeed be given as a response to almost any question.
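A generic beam search could look like the following sketch; the `step_fn` callback, which is assumed to return the next decoder state and the log-probabilities of candidate tokens, is a placeholder for one step of the trained decoder.

```python
def beam_search(step_fn, start_state, start_token, eos_id, beam_width=5, max_len=30):
    """Generic beam search over a decoder step function.

    `step_fn(state, token)` is assumed to return the next decoder state and a
    dict mapping candidate token ids to their log-probabilities.
    """
    # Each hypothesis: (cumulative log-probability, tokens so far, state, finished?)
    beams = [(0.0, [start_token], start_state, False)]
    for _ in range(max_len):
        candidates = []
        for logp, tokens, state, done in beams:
            if done:
                candidates.append((logp, tokens, state, True))
                continue
            new_state, logprobs = step_fn(state, tokens[-1])
            for tok, lp in logprobs.items():
                candidates.append((logp + lp, tokens + [tok], new_state, tok == eos_id))
        # Keep only the `beam_width` highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(done for _, _, _, done in beams):
            break
    return beams[0][1]  # token sequence of the best hypothesis
```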
In this work we used the man pages as additional knowledge. Another option would be to include a different type of knowledge from Ask Ubuntu² (a question and answer site for Ubuntu users and developers). In this case we would search through the website, find the most relevant question for each dialogue, and then condition on the answer to that question (or on both the question and the answer).
Another interesting idea for future work would be to compare the performance of a model using attention with that of a model without attention. Here the availability of a GPU with large memory has to be considered, as well as the potentially large amount of training time.
When analysing the results we looked at the perplexity, visualised the attention weights, and examined the model responses. Due to the very generic responses generated by the model, we did not evaluate the results using all the evaluation metrics described before. During this work many different evaluation metrics were analysed and compared, so when better results are available they can be assessed using additional measures.
We incorporated the additional knowledge into the model by appending it to the input sequence. Another way would be to encode the man page information with a separate encoder RNN and then perform attention over both encoders jointly. This would be better because the model could easily separate the dialogue history from the technical man pages; for example, it could learn a separate attention mechanism for the dialogue history and for the man pages. It would also shorten the encoding sequences, which helps to reduce long-term dependencies.

² www.askubuntu.com
References
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly
learning to align and translate. CoRR, abs/1409.0473, 2014.
[2] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction
with LSTM. Neural Computation, 12(10):2451–2471, 2000.
[3] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http:
//www.deeplearningbook.org.
[4] Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech recognition with deep
recurrent neural networks. CoRR, abs/1303.5778, 2013.
[5] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–
1780, November 1997.
[6] Rudolf Kadlec, Martin Schmid, and Jan Kleindienst. Improved deep learning baselines for ubuntu
corpus dialogs. CoRR, abs/1510.03753, 2015.
[7] Nal Kalchbrenner and Phil Blunsom. Recurrent continuous translation models. In EMNLP,
volume 3, page 413, 2013.
[8] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting
objective function for neural conversation models. CoRR, abs/1510.03055, 2015.
[9] Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. The ubuntu dialogue corpus: A large
dataset for research in unstructured multi-turn dialogue systems. CoRR, abs/1506.08909, 2015.
[10] Ryan Lowe, Nissan Pow, Iulian V. Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau.
Training end-to-end dialogue systems with the ubuntu dialogue corpus. Dialogue Discourse,
8(1):31–65, 2017.
[11] Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. Coherent dialogue with attention-based
language models. CoRR, abs/1611.06997, 2016.
[12] Tomas Mikolov and Geoffrey Zweig. Context dependent recurrent neural network language model.
SLT, 12:234–239, 2012.
[13] Alan Ritter, Colin Cherry, and William B. Dolan. Data-driven response generation in social media.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP
’11, pages 583–593, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[14] Hasim Sak, Andrew W. Senior, and Françoise Beaufays. Long short-term memory based recurrent
neural network architectures for large vocabulary speech recognition. CoRR, abs/1402.1128, 2014.
[15] Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua
Bengio, and Aaron C. Courville. Multiresolution recurrent neural networks: An application to
dialogue response generation. CoRR, abs/1606.00776, 2016.
[16] Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau.
Hierarchical neural network generative models for movie dialogues. CoRR, abs/1507.04808, 2015.
[17] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization.
CoRR, abs/1409.2329, 2014.
