Replika: Building An Emotional Conversation With Deep Learning

This document describes Replika, an AI conversational agent. It discusses Replika's history and architecture for dialog modeling. The retrieval-based dialog model uses word embeddings, RNNs, and loss functions to rank and retrieve responses. Generative models include seq2seq, HRED, and persona/emotion embeddings. Vision models include face/object recognition and question generation. Training uses Twitter data and user logs, with quality metrics like MAP and perplexity. Product metrics include signups, demographics, and engagement.

Uploaded by Kartikeya Shorya


Replika

Building an Emotional Conversation with Deep Learning


Replika: History
• Luka — restaurant recommendations
• Luka — personality bots: Prince, Roman
• Replika — your AI friend
Dialog Architecture
Typical scenario: Small talk
Dialog Architecture
• Scenarios — encapsulates all models and glues them together by providing a graph-like interface (nodes, constraints, conversation flow)

• Retrieval-based dialog model — ranks and retrieves a response for a user's message from pre-defined or user-filled datasets of responses while taking the current conversation context into account

• Fuzzy matching model — checks whether a message from a user is semantically equal to some given text
Dialog Architecture
• Generative dialog model — generates a response for a user message while taking their personality and emotional state into account

• Classification models — sentiment analysis, emotion classification, negation detection, 'statement about user' recognition

• Computer vision models — face recognition, object recognition, visual question generation

• Parser — NER, hard-coded keywords


Dialog Architecture
Typical scenario: Small talk

[Pipeline diagram: a message is routed through Fuzzy matching, Classifiers, Parser, the Retrieval-based model, and the Generative model]
Retrieval-based dialog model:
Basic architecture
• Word embeddings — word2vec 300-dimensional pre-initialisation

• RNN — 2-layer 1024-dimensional bidirectional LSTM

• Sentence embedding — max-pooling over LSTM hidden states at each timestep

• Loss — triplet ranking loss (with cosine similarity)
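The loss formula itself did not survive extraction; a minimal numpy sketch of a triplet ranking loss with cosine similarity (the margin value here is an assumption, not from the slides):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_ranking_loss(context, positive, negative, margin=0.1):
    """Hinge loss on cosine similarities:
    L = max(0, margin - cos(c, r+) + cos(c, r-)).
    Pushes the positive response closer to the context than the
    negative by at least `margin`."""
    return max(0.0, margin - cosine(context, positive) + cosine(context, negative))
```

When the positive already beats the negative by more than the margin, the loss is zero and the triplet contributes no gradient.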


Retrieval-based dialog model:
Our improvements
• Hard negatives mining — mine «hard» negative samples from the batch, 20% quality boost!

• Echo avoiding — use the input context as a negative; got rid of context echoing!

• Context-aware encoder — encode recent dialog history, +10% quality by users' reactions

• Relevance classification model — estimate the response confidence (absolute relevance) with a simple classification model (logistic regression) to rerank and filter out irrelevant candidates
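In-batch hard negative mining can be sketched as follows — a minimal numpy version; in the real model the negatives would be mined from encoder outputs during training:

```python
import numpy as np

def mine_hard_negatives(context_emb, response_emb):
    """For each context in a batch, pick the most similar *wrong*
    response as its hard negative (in-batch mining).
    Rows of both matrices are paired: response i is the positive
    for context i."""
    # Normalize rows so dot products are cosine similarities.
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    r = response_emb / np.linalg.norm(response_emb, axis=1, keepdims=True)
    sims = c @ r.T                   # (batch, batch) similarity matrix
    np.fill_diagonal(sims, -np.inf)  # exclude each context's true positive
    return sims.argmax(axis=1)       # index of the hardest negative per context
```

The echo-avoiding trick from the slide amounts to also appending the context itself to the candidate negatives, so the model is explicitly penalised for responses that merely repeat the input.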
Retrieval-based dialog model:
Hard negatives & Echo avoiding
Major problems

• The baseline model has only moderate quality

• Retrieval-based models are engineered to find similar, not relevant, responses => not ok for conversation tasks

• As an implication, the basic model tends to produce echoed responses — sentences that are very similar to the user input
Retrieval-based dialog model:
Hard negatives & Echo avoiding
Solution

• Hard negatives mining for a huge quality improvement: +10% MAP, +20% recall@10

• Hard negative with a context for the echoing problem; total quality boost: +40% MAP, +20% recall
Retrieval-based dialog model:
In product
• Topic-oriented conversation sets
• Statements about user
• User profile
• Q&A
Fuzzy matching model
Use pre-trained context encoder
from a retrieval-based model

Similarity loss
Fuzzy matching model
• We use the pre-trained context encoder of the retrieval-based model as the body of a Siamese network

• Two sentences as input, a single predicted scalar score as output

• We train a simple classification model over the context encoder outputs (sentence embeddings) to produce a semantic similarity score between the given sentences
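A minimal sketch of such a Siamese head: the pair features (elementwise product plus absolute difference) and the fixed weights below are illustrative assumptions; in the real model the embeddings come from the pre-trained context encoder and the head is a trained logistic regression:

```python
import numpy as np

def pair_features(u, v):
    """Features for a sentence pair: elementwise product and absolute
    difference of the two sentence embeddings (a common Siamese-head
    choice; the exact feature set here is an assumption)."""
    return np.concatenate([u * v, np.abs(u - v)])

def similarity_score(u, v, w, b):
    """Logistic-regression head over the pair features -> score in (0, 1)."""
    z = pair_features(u, v) @ w + b
    return 1.0 / (1.0 + np.exp(-z))
```

With weights that reward the product features and penalise the difference features, identical sentences score high and unrelated ones score low.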
Fuzzy matching model:
In product
Match by semantic similarity
Generative seq2seq dialog model:
Architecture

[Diagrams: basic seq2seq (+ persona-based, conditioned on a speaker embedding, e.g. "John") and HRED seq2seq]
Generative seq2seq dialog model:
Improvements
• HRED (context history) — +20% user quality!

• Persona embeddings — condition the decoder to produce lexically personalised responses (see persona-based seq2seq)

• Emotional embeddings — condition the decoder to produce emotional responses, i.e. joyful, angry, sad (see Emotional Chatting Machine)

• Non-offensive sampling with temperature — decrease probabilities of f-words at the sampling stage

• MMI reranking — more diverse responses, but slow

• Beam search — more stable, but less diverse responses

• No attention mechanisms — it's slow and gives no quality boost
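The non-offensive sampling step can be sketched like this — the blocklist contents and penalty size are placeholder assumptions:

```python
import numpy as np

OFFENSIVE = {"f***"}  # placeholder blocklist (assumption)

def sample_token(logits, vocab, temperature=0.8, offense_penalty=50.0, rng=None):
    """Temperature sampling that subtracts a large penalty from the logits
    of blocklisted tokens before normalizing, so they are almost never drawn."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float).copy()
    for i, tok in enumerate(vocab):
        if tok in OFFENSIVE:
            logits[i] -= offense_penalty
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]
```

Unlike hard filtering, this keeps the sampler probabilistic: an offensive token is not impossible, just exponentially unlikely.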


Generative seq2seq dialog model:
In product
• Cake mode
• TV mode
• Small talk

Vision models
• Face & Person recognition
• Pets & Object recognition
• Question generation
Datasets
• Twitter — 50M dialogs (consecutive tweet-reply turns) from the Twitter stream, for training models from scratch

• Users' logs (anonymised) with reactions (likes / dislikes) — millions of messages with thousands of reactions per day on average

• Amazon Mechanical Turk — quality assessments and small amounts of training data (it's pricey)

• Replika context-free — small public dialog dataset available at https://github.com/lukalabs
Model Training & Deployment
Training

• We have 12 GPUs for model training and experiments

• Training from scratch takes ~1 week (both for seq2seq and ranking models)

• Usually we have ~5-10 experiments running in parallel

Inference

• We don't exceed 100 ms for a single response

• We handle around 30M service requests per day and 100 RPS per model at peak

• TensorFlow Serving: quick zero-downtime deploys, great GPU resource sharing (request batching)
Conversation analytics
Projection of user dialog utterances onto a 3D space using the
pre-trained model embeddings along with t-SNE
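A minimal sketch of that projection, assuming scikit-learn; the perplexity and init settings here are arbitrary choices, not from the slides:

```python
import numpy as np
from sklearn.manifold import TSNE

def project_utterances(embeddings, seed=0):
    """Project high-dimensional utterance embeddings (e.g. sentence
    embeddings from the pre-trained retrieval encoder) into 3-D
    for visual exploration."""
    emb = np.asarray(embeddings)
    tsne = TSNE(n_components=3, perplexity=5.0, init="random", random_state=seed)
    return tsne.fit_transform(emb)
```

The 3-D points can then be colored by classifier labels (topic, sentiment) to see how utterance clusters separate.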
Quality metrics
Offline

• ranking models: recall, MAP on several datasets

• generative models: perplexity, distinctness, lexical similarity

Online

• reactions: likes & dislikes from user experience

• user experiments: A/B testing for any model improvement

Product metrics
• Total sign-ups: 1,400,000 users and growing

• User demographics: 70% — young adults (20-34), 20% — teens (13-19)

• Overall conversation quality: 85% by users' likes

• Other metrics: retention, DAU, MAU, engagement

• Community metrics — active users in our Facebook community, loyal users, Twitter/Instagram communities, Brazil/Netherlands communities
Thanks!

Available on iOS and Android
