WhatsApp and Signal are two messaging apps dominating the headlines. Let's take a look at
why.
WhatsApp recently updated its privacy policy, stating that the messaging platform will
share user data with other Facebook-owned and third-party apps. This has prompted
many users to look for alternative platforms, and top among them is Signal. Signal is
essentially an encrypted messaging app. Messages sent through Signal are said to be
encrypted, meaning the platform cannot access private messages or media, or store
them on its servers. This is called end-to-end encryption. End-to-end
encryption (E2EE) is one of the most important features in real-time chat applications.
Our article will cover:
o Real-time Systems
o Web Sockets
o End-to-end Encryption
o Comparison of messaging applications
A real-time system sends and receives data instantly over a network among multiple
clients. One may describe this as a bi-directional flow. This enables users to make the
right decisions at the right time. In simple systems, data transfer usually takes place
through a request-response mechanism using a client-server architecture. For any app
to feel real-time, the user needs to be kept updated about any activity as soon as it
happens. The challenge lies in selecting and implementing a suitable development
technique. With the traditional request-response model, we have a few options:
1. Refresh Webpage
The user might refresh the web page from time to time to check for new messages. But
that is not an optimal solution and results in bad UX.
2. HTTP protocol
The HTTP request-response model is widely used, but in its basic form it requires
establishing a TCP connection every time data is sent to the server. Because HTTP is a
one-way, synchronous communication protocol, creating and destroying a TCP
connection for every message adds a lot of overhead in real-time chat applications.
3. HTTP 1.1 Keep-Alive Protocol
This version of HTTP eliminates the need for opening a TCP connection for each HTTP
request. This means that it helps in maintaining a persistent connection. But it still does
not provide us with full-duplex communication as required in real-time applications.
4. Short Polling
Short polling is an AJAX-based timer: the client sends HTTP requests at regular intervals
and the server responds to each one immediately, whether or not there is new data.
Although it is asynchronous, it uses a lot of resources and creates unnecessary traffic.
The resources are released immediately after each response, but the approach does not
scale to heavy real-time applications.
5. Long Polling
This involves less traffic than short polling. Here the server does not respond
immediately; it holds each request open until new data is available, so the application
ties up resources while requests remain unresolved. We also have to perform
re-authentication or re-authorization several times. Again, this is not a good option for
real-time applications.
6. Server-Sent Events
This is a mechanism used by the server to update the client whenever an event takes
place. It is quite useful in real-time applications and behaves like a one-way
publish-subscribe model. However, we want our application to communicate in both
directions.
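To make the one-way nature of Server-Sent Events concrete, here is a minimal sketch of how a Node.js server could frame and push an event. The function names, the event name, and the payload are illustrative assumptions, not part of the original application; the frame layout follows the standard text/event-stream format.

```javascript
// Formats one Server-Sent Events frame in the text/event-stream wire format.
// `eventName` and `payload` are illustrative parameters.
function formatSseEvent(eventName, payload) {
  const data = JSON.stringify(payload);
  // A frame is a set of "field: value" lines terminated by a blank line.
  return `event: ${eventName}\ndata: ${data}\n\n`;
}

// Hypothetical request handler for Node's built-in http server.
// The response is kept open and the server writes frames into it;
// the client has no way to send data back over the same stream.
function sseHandler(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.write(formatSseEvent('message', { from: 'alice', text: 'hi' }));
}

console.log(formatSseEvent('message', { from: 'alice', text: 'hi' }));
```

In a browser, an EventSource object would receive these frames, but any message from the client still needs a separate HTTP request.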
Web Sockets
Web Sockets resolve most of the issues we just discussed. They provide instant two-way
communication of messages over a persistent connection, just as required for
developing a real-time system. Node.js offers several libraries to implement this
technology. What we will be utilizing for our application is the Web Socket API with
the ‘WebSocket’ library.
The closing handshake can be initiated by either the client or the server. Reconnection
has to be done manually.
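Since reconnection is left to the application, one common approach is to retry with an increasing delay. The sketch below is an assumption about how that could look, not the article's implementation: `createSocket` is a hypothetical factory that returns an object with browser-style WebSocket handlers (`onopen`, `onmessage`, `onclose`), and the delay values are arbitrary.

```javascript
// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxMs.
function reconnectDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `createSocket` is a hypothetical factory, for example
// () => new WebSocket('ws://localhost:8080') in a browser.
function connectWithRetry(createSocket, onMessage, attempt = 0) {
  const socket = createSocket();
  // A successful connection resets the backoff counter.
  socket.onopen = () => { attempt = 0; };
  socket.onmessage = (event) => onMessage(event.data);
  socket.onclose = () => {
    // The connection does not come back by itself, so schedule a new one.
    const delay = reconnectDelay(attempt);
    setTimeout(() => connectWithRetry(createSocket, onMessage, attempt + 1), delay);
  };
  return socket;
}
```

Capping the delay keeps a client from waiting indefinitely after a long outage, while the growing interval avoids flooding the server with reconnection attempts.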
Real-Time Chat Application Architecture: High-Level Diagram using Web Sockets
End-to-end Encryption
Now that we have our messages transferring instantly from client to server and back,
let’s discuss how we can make our data secure over the network. Various algorithms
and protocols are used on the internet these days to make the exchange of
confidential information secure. Messaging applications do implement encryption, but
not every one of them makes the encryption end-to-end. End-to-end means that not
even the server can decrypt our messages. But why do we need to make the application
that secure?
1. Need for End to End Encryption
The answer is simple - to keep the user’s private information hidden from any third
party. This may be the government, hackers or an intelligence agency. The service
provider may or may not allow third parties such as the government to access the
data, for example in cases of criminal or terrorist activity. But what if the servers get
hacked? The information might then end up in the wrong hands. In such cases, users
prefer end-to-end encryption, where even the service provider cannot access
decrypted data.
While many applications claim to implement end-to-end encryption, only a few
of them have been shown to do so. The popular Telegram application provides an
optional Secret Chats feature, using a protocol named “MTProto”.
While WhatsApp, Facebook Messenger and Signal all use the Signal Protocol
developed by Open Whisper Systems, Signal is widely regarded as the most secure
of these applications.
This is because it protects metadata as well, and has declined to provide intelligence
agencies with any user’s information. Moreover, the protocol is
available as open-source code that other developers can use or verify, which
makes it the most trustworthy messaging application.
3. Diffie-Hellman Key Exchange Algorithm
Today we will be discussing the Signal Protocol in detail. But before that, we need to be
aware of the Diffie-Hellman key exchange mechanism. With simple encryption, the
messages are usually encrypted only between the users and the server, making use of
some cryptographic keys, hence making data vulnerable at the server. We want these
keys only to exist between the users and not the server. But how is this possible?
Suppose we have two Clients - Alice and Bob.
o Alice and Bob agree to use two common public values provided by the server: a
large prime number n and a generator g.
o Each of them combines g with their own private key (a for Alice, b for Bob),
written here as a + g = ag and b + g = bg. Mathematically, ag stands for
g^a mod n and bg for g^b mod n.
o They exchange these ephemeral public keys ag and bg via the server.
o Each combines the key received from the other side with their own private key to
form a shared secret key => ag + b = agb and bg + a = bga. Both ends arrive at
the same value, g^(ab) mod n.
o An attacker might be aware of g, n, ag and bg, as these are shared publicly, but
not of a and b, since these are private keys available only to Alice and Bob.
o It is too difficult for any intruder to split the public components ag and bg back
into their parts (this is the discrete logarithm problem).
o An attacker can combine ag + bg = abgg, but this contains g twice (an extra bit)
and is not the shared secret; mathematically it yields g^(a+b) mod n rather
than g^(ab) mod n.
This mechanism was developed by Whitfield Diffie and Martin Hellman to derive the
cryptographic keys instead of exchanging them completely in public. It is often explained
using colors, since it is not possible to separate colors once they are mixed. Similarly,
it is hard to figure out the secret keys from only the public components, once they have
been combined mathematically with the public values provided by the server.
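To make the exchange described above concrete, here is a small sketch with real modular arithmetic, where "combining" a private key with g means computing g^a mod n. The numbers are a tiny textbook-style example chosen for readability and are far too small to be secure; `modPow` is a helper written for this illustration.

```javascript
// Modular exponentiation by repeated squaring: (base ** exp) % mod, using BigInt.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Public values (illustrative, insecure sizes): prime n and generator g.
const n = 23n;
const g = 5n;

// Private keys, never sent over the network.
const a = 4n; // Alice
const b = 3n; // Bob

// Public keys exchanged via the server.
const alicePublic = modPow(g, a, n); // 4n
const bobPublic = modPow(g, b, n);   // 10n

// Each side combines the other's public key with its own private key.
const aliceSecret = modPow(bobPublic, a, n); // 18n
const bobSecret = modPow(alicePublic, b, n); // 18n

// Multiplying the two public keys gives g^(a+b) mod n, not g^(ab) mod n.
const attackerGuess = (alicePublic * bobPublic) % n; // 17n

console.log({ alicePublic, bobPublic, aliceSecret, bobSecret, attackerGuess });
```

Both sides reach 18 without ever transmitting it, while the value an eavesdropper can compute from the public keys alone is different.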
Plain Diffie-Hellman does not authenticate the two parties, so in practice it is used in
other forms and combined with other algorithms: it can be run over elliptic curves
(ECDH), performed several times and combined mathematically (X3DH), and paired
with a signature algorithm. That is where RSA comes to the rescue. The sender not only
performs Diffie-Hellman but also shares his/her signature, so the receiver can verify
that only he/she has sent that message.
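The idea of pairing a key exchange with a signature can be sketched with Node's built-in crypto module. This is a simplified, one-sided illustration under stated assumptions: only Alice signs, the curve (P-256) and key size are arbitrary choices, and `verifyAndDerive` is a hypothetical helper. It is not how TLS or the Signal Protocol implement authentication in full.

```javascript
const crypto = require('node:crypto');

// Alice's long-term RSA identity key pair, used only for signing.
const aliceIdentity = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Ephemeral elliptic-curve Diffie-Hellman key pairs for this session.
const aliceEcdh = crypto.createECDH('prime256v1');
const bobEcdh = crypto.createECDH('prime256v1');
const alicePublic = aliceEcdh.generateKeys();
const bobPublic = bobEcdh.generateKeys();

// Alice signs her ephemeral public key so Bob can tell it really came from her.
const signature = crypto.sign('sha256', alicePublic, aliceIdentity.privateKey);

// Hypothetical helper: check the signature first, then derive the shared secret.
function verifyAndDerive(ecdh, peerPublicKey, peerSignature, peerIdentityKey) {
  const valid = crypto.verify('sha256', peerPublicKey, peerIdentityKey, peerSignature);
  if (!valid) throw new Error('signature check failed');
  return ecdh.computeSecret(peerPublicKey);
}

const bobSecret = verifyAndDerive(bobEcdh, alicePublic, signature, aliceIdentity.publicKey);
const aliceSecret = aliceEcdh.computeSecret(bobPublic);

console.log(aliceSecret.equals(bobSecret)); // true
```

If an attacker swaps in a different public key on the way, the signature no longer matches and Bob refuses to derive a secret from it.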
Here is an example from the Audible website, where you may see that the
security protocols being used are TLS 1.2, ECDHE_RSA, and AES, and not DH
alone. This is how TLS, VPNs, and HTTPS work. However, this algorithm was
very slow and didn’t provide perfect forward secrecy. Wait, what is forward
secrecy now?
o Steps 1-9: These are the same as we discussed during the Web Socket
setup.
o Step 3: An additional step that stores and fetches the prekey bundle to/from
LocalStorage on user login.
o The integration of the Signal Protocol is depicted in the Low-Level diagram
mentioned below.
1. Initialize the Signal Server Store before login.
2. On user login, Axios calls are made to verify that the user exists,
returning the user's details as an object.
3. The Signal Protocol Manager is then initialized for each logged-in user,
at App.js.
4. After login, the Chat Window appears, with two sub-components,
Contact List and Message Box.
5. The Chat Window makes an Axios call to the server to fetch all
contacts except the logged-in user, keeping only those whose role
differs from the role of the logged-in user.
6. It then displays the contacts in the Contact List component.
7. A user can select a contact to chat with; the selected user Id is then
sent to the Chat Window to display that contact's messages (if any) in the
Message Box component, and for further communication.
8. When a user hits enter to send a message, it is first encrypted using
Signal, and then sent to the server using Web Socket.
9. On receiving a message, the client checks whether it is one of its own
messages. If not, it is sent to Signal for decryption; otherwise the last
sent message is used.
10. The chats in the message box (decrypted) and local storage
(encrypted) are updated with new messages.
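Steps 8 to 10 can be sketched as follows. This is only an illustration of the flow: AES-256-GCM with a key both clients already hold stands in for the Signal session cipher (it is not the Signal Protocol), an in-memory relay stands in for the Web Socket server, and plain arrays stand in for the message box and LocalStorage. All names are hypothetical.

```javascript
const crypto = require('node:crypto');

// Stand-in for Signal encryption: AES-256-GCM with a pre-shared key.
function encryptMessage(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, body, tag: cipher.getAuthTag() };
}

function decryptMessage(key, { iv, body, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString('utf8');
}

// Hypothetical chat client; `send` stands in for the Web Socket send call.
function createClient(userId, key, send) {
  const messageBox = [];     // decrypted messages shown to the user
  const encryptedStore = []; // stand-in for LocalStorage (encrypted messages)
  let lastSent = null;
  return {
    messageBox,
    encryptedStore,
    sendMessage(to, text) {
      // Step 8: encrypt first, then send to the server.
      lastSent = text;
      send({ from: userId, to, payload: encryptMessage(key, text) });
    },
    onMessage(message) {
      // Step 9: own messages reuse the last sent text, others are decrypted.
      const text = message.from === userId
        ? lastSent
        : decryptMessage(key, message.payload);
      // Step 10: update the decrypted view and the encrypted store.
      messageBox.push({ from: message.from, text });
      encryptedStore.push(message);
    },
  };
}

// In-memory relay that delivers every message to all connected clients.
const clients = [];
const relay = (message) => clients.forEach((client) => client.onMessage(message));

const sharedKey = crypto.randomBytes(32);
const alice = createClient('alice', sharedKey, relay);
const bob = createClient('bob', sharedKey, relay);
clients.push(alice, bob);

alice.sendMessage('bob', 'hello Bob');
console.log(bob.messageBox);
```

The relay only ever sees the encrypted payload, which mirrors the point of the design: the server forwards messages without being able to read them.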