Chatbot
Generative chatbots are based on sequence-to-sequence (seq2seq) neural networks. The
idea is the same as in machine translation: there, a model translates text from one
language to another; here, it transforms an input message into an output response.
This approach needs a large amount of data and relies on deep neural networks.
Let’s create a retrieval-based chatbot using Python, NLTK, and Keras.
The Dataset
The dataset we will be using is ‘intents.json’. This is a JSON file that contains the
patterns we need to find and the responses we want to return to the user.
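As a rough illustration, a data file of this kind might look like the sample below.
The key names (intents, tag, patterns, responses) and the sample sentences are
assumptions made for this sketch; they are not copied from the actual data file.

```python
import json

# Hypothetical sample of what intents.json might contain.
# The key names and sentences are assumptions for illustration only.
SAMPLE_INTENTS_JSON = """
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi there", "Hello", "Good morning"],
      "responses": ["Hello!", "Hi, how can I help?"]
    },
    {
      "tag": "goodbye",
      "patterns": ["Bye", "See you later"],
      "responses": ["Goodbye!", "Talk to you soon."]
    }
  ]
}
"""

# json.loads turns the JSON text into nested Python dicts and lists.
intents = json.loads(SAMPLE_INTENTS_JSON)

# Each intent groups the patterns to recognise with the responses to return.
for intent in intents["intents"]:
    print(intent["tag"], "->", len(intent["patterns"]), "patterns,",
          len(intent["responses"]), "responses")
```

Each entry pairs the sentences a user might type with the replies the bot may give,
which is all a retrieval-based bot needs.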
Prerequisites
The project requires a good knowledge of Python, Keras, and natural language
processing with NLTK (the Natural Language Toolkit). Along with them, we will use
some helper modules, which you can install with pip.
intents.json – The data file that holds the predefined patterns and responses.
words.pkl – A pickle file in which we store the Python list of words that makes up
our vocabulary.
classes.pkl – A pickle file that contains the list of categories (classes).
chatbot_model.h5 – The trained model; it stores the model's architecture and the
weights of its neurons.
The data file is in JSON format, so we use the json package to parse it into Python
objects.
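The file handling described above can be sketched as follows. This is a minimal
example under stated assumptions: it writes a tiny stand-in data file into a
temporary directory instead of reading the real intents.json, and the contents of
the words and classes lists are placeholders.

```python
import json
import os
import pickle
import tempfile

# Work in a temporary directory so the sketch does not touch real project files.
workdir = tempfile.mkdtemp()
intents_path = os.path.join(workdir, "intents.json")

# Write a tiny stand-in data file (hypothetical content and key names).
sample = {"intents": [{"tag": "greeting",
                       "patterns": ["Hello"],
                       "responses": ["Hi!"]}]}
with open(intents_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Parse the JSON file into Python dicts and lists.
with open(intents_path, encoding="utf-8") as f:
    intents = json.load(f)

# Placeholder lists; in the project they are produced by the preprocessing step.
words = ["hello"]
classes = ["greeting"]

# Store both lists as pickle files.
for name, obj in (("words.pkl", words), ("classes.pkl", classes)):
    with open(os.path.join(workdir, name), "wb") as f:
        pickle.dump(obj, f)

# Read one of them back to confirm the round trip.
with open(os.path.join(workdir, "words.pkl"), "rb") as f:
    restored_words = pickle.load(f)

print(restored_words)
```

Pickle files should only be loaded from sources you trust, since unpickling can
execute arbitrary code.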
Preprocess data
When working with text data, we need to preprocess it in several ways before we
build a machine learning or deep learning model. Tokenizing is the most basic
preprocessing step and usually the first one applied to text. It is the process of
breaking the whole text into smaller parts, such as words.
Here we iterate through the patterns, tokenize each sentence using the
nltk.word_tokenize() function, and append each word to the words list. We also
create a list of classes for our tags.
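The loop described above can be sketched as follows. To keep the example runnable
without installing NLTK or downloading its tokenizer data, it uses a small
regex-based tokenizer as a stand-in; in the project you would pass
nltk.word_tokenize instead, which handles cases such as contractions differently.
The helper names and the sample data are hypothetical.

```python
import re


def simple_tokenize(sentence):
    """Regex stand-in for nltk.word_tokenize: splits words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def collect_words_and_classes(intents, tokenize=simple_tokenize):
    """Tokenize every pattern and gather the vocabulary and the list of tags."""
    words = []
    classes = []
    for intent in intents["intents"]:
        for pattern in intent["patterns"]:
            # Break the sentence into tokens and add each one to the words list.
            words.extend(tokenize(pattern))
        # Record each tag once, in the order it first appears.
        if intent["tag"] not in classes:
            classes.append(intent["tag"])
    return words, classes


# Hypothetical sample data; the key names are assumptions for this sketch.
sample_intents = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi there!", "Hello"]},
        {"tag": "goodbye", "patterns": ["See you later"]},
    ]
}

words, classes = collect_words_and_classes(sample_intents)
print(words)    # ['Hi', 'there', '!', 'Hello', 'See', 'you', 'later']
print(classes)  # ['greeting', 'goodbye']
```

Passing the tokenizer as a parameter makes it easy to swap in nltk.word_tokenize
once NLTK and its tokenizer data are available.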
We will load the trained model and then use a graphical user interface to chat with
the bot. The model only tells us which class the user's message belongs to, so we
will implement some functions that identify the class and then retrieve a random
response from that class's list of responses.
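The last step, going from a predicted class to a reply, might look like the sketch
below. It assumes the model returns one probability per class, in the same order as
the classes list; a plain Python list stands in for that output here so the example
runs without Keras. The function name, the key names, and the sample data are
hypothetical.

```python
import random


def pick_response(probabilities, classes, intents, rng=random):
    """Return the most likely tag and a random response for that tag."""
    # Index of the highest probability, i.e. the predicted class.
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    tag = classes[best_index]
    # Look up the intent with that tag and choose one of its responses at random.
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            return tag, rng.choice(intent["responses"])
    # No intent carries this tag; the caller decides how to handle that.
    return tag, None


# Hypothetical data; in the project these come from classes.pkl and intents.json.
classes = ["greeting", "goodbye"]
intents = {
    "intents": [
        {"tag": "greeting", "responses": ["Hello!", "Hi, how can I help?"]},
        {"tag": "goodbye", "responses": ["Goodbye!", "Talk to you soon."]},
    ]
}

# Stand-in for the model's output on one user message.
probabilities = [0.1, 0.9]

tag, reply = pick_response(probabilities, classes, intents)
print(tag, "->", reply)
```

Choosing the response at random keeps the bot from giving the same reply every time
a class is predicted.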