The WebSocket Handbook
Learn about the technology underpinning
the realtime web and build your first web
app powered by WebSockets
v2.0 • February 2022
Special Thanks
In no particular order: Jo Franchetti (for contributing Chapter 4 and building the demo
app), Ramiro Nuñez Dosio (for encouraging me to write the book in the first place,
giving valuable advice, and removing blockers), Jonathan Mercier-Ganady (for the
technical review), Jo Stichbury (for the editorial review), Leonie Wharton, Chris Hipson,
Jamie Watson (for all the design work involved), Ben Gamble (for helping me define and
write Chapter 5).
Preface
Who this book is for
What this book covers
Resources
References (ordered alphabetically)
Videos
Further reading
Open-source WebSocket libraries
Final thoughts
About Ably
Preface
Our everyday digital experiences are in the midst of a
realtime revolution. Whether we’re talking about virtual
events, EdTech, news and financial information, IoT
devices, asset tracking and logistics, live score updates, or
gaming, consumers increasingly expect realtime digital
experiences as standard. And what better way to power these
realtime interactions than WebSockets?
Until the emergence of WebSockets, the “realtime” web was difficult to achieve and
slower than we’re used to nowadays; it was delivered by hacking existing HTTP-based
technologies that were not designed and optimized for realtime applications.
Familiarity with HTML, JavaScript (and Node.js), HTTP, web APIs, and web development is
required to get the most out of this book.
Chapter 2: The WebSocket Protocol covers key considerations related to the WebSocket
protocol. You’ll find out how to establish a WebSocket connection and exchange
messages, what kind of data can be sent over WebSockets, and what types of extensions
and subprotocols you can use to augment WebSockets.
Chapter 3: The WebSocket API provides details about the constituent components of the
WebSocket API — its events, methods, and properties, alongside usage examples for
each of them.
Resources — a collection of articles, videos, and WebSocket solutions you might want to
explore.
The WebSocket technology is a vast and complex topic; this second version of the ebook
does not intend to cover everything there is to know about it. In future iterations, we plan
to:
• Add more details to the existing chapters.
• Provide more examples and walkthroughs for building apps with WebSockets.
• Cover additional aspects that are currently out of scope, such as WebSocket security,
and alternatives to WebSockets.
The first realtime web apps started to appear in the 2000s, attempting to deliver
responsive, dynamic, and interactive end-user experiences. However, at that time, the
realtime web was difficult to achieve and slower than we’re used to nowadays; it was
delivered by hacking existing HTTP-based technologies that were not designed and
optimized for realtime applications. It quickly became obvious that a better alternative
was needed.
In this first chapter, we’ll look at how web technologies evolved, culminating with the
emergence of WebSockets, a vastly superior alternative to HTTP for building realtime
web apps.
This initial version of HTTP [1] (commonly known as HTTP/0.9) that Berners-Lee developed
was incredibly basic. Requests consisted of a single line and started with the only
supported method, GET, followed by the path to the resource:
GET /mypage.html
<HTML>
My HTML page
</HTML>
There were no HTTP headers, status codes, URLs, or versioning, and the connection was
terminated immediately after receiving the response.
Since interest in the web was skyrocketing, and with HTTP/0.9 being severely limited,
both browsers and servers quickly made the protocol more versatile by adding new
capabilities. Some key changes:
• Header fields including rich metadata about the request and response (HTTP version
number, status code, content type).
• Two new methods — HEAD and POST.
• Additional content types (e.g., scripts, stylesheets, or media), so that the response was
no longer restricted to hypertext.
1. The Original HTTP as defined in 1991
2. The IETF HTTP Working Group
3. RFC 1945: Hypertext Transfer Protocol - HTTP/1.0
In 1995, Netscape hired Brendan Eich with the goal of embedding scripting capabilities
into their Netscape Navigator browser. Thus, JavaScript was born. The first version of the
language was simple, and you could only use it for a few things, such as basic validation
of input fields before submitting an HTML form to the server. Limited as it was back
then, JavaScript brought dynamic experiences to a web that had been fully static until
that point. Progressively, JavaScript was enhanced, standardized, and adopted by all
browsers, becoming one of the core technologies of the web as we know it today.
4. RFC 2068: Hypertext Transfer Protocol - HTTP/1.1
5. IETF HTTP Working Group, HTTP Documentation, Core Specifications
We will now look at the main HTTP-centric design models that emerged for developing
realtime apps: AJAX and Comet.
AJAX
AJAX (short for Asynchronous JavaScript and XML) is a method of asynchronously
exchanging data with a server in the background and updating parts of a web page —
without the need for an entire page refresh (postback).
Publicly used as a term for the first time in 2005 [6], AJAX encompasses several technologies:
• HTML (or XHTML) and CSS for presentation.
• Document Object Model (DOM) for dynamic display and interaction.
• XML or JSON for data interchange, and XSLT for XML manipulation.
• XMLHttpRequest [7] (XHR) object for asynchronous communication.
• JavaScript to bind everything together.
It’s worth emphasizing the importance of XMLHttpRequest, a built-in browser object that
allows you to make HTTP requests in JavaScript. The concept behind XHR was initially
created at Microsoft and included in Internet Explorer 5, in 1999. In just a few years,
XMLHttpRequest would benefit from widespread adoption, being implemented by Mozilla
Firefox, Safari, Opera, and other browsers.
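To give a flavor of the API, here's a minimal sketch of an XHR request (the endpoint and element ID are hypothetical):

const xhr = new XMLHttpRequest();
xhr.open('GET', '/latest-updates');
xhr.onload = () => {
  // Update part of the page with the response - no full page refresh needed
  document.querySelector('#updates').innerHTML = xhr.responseText;
};
xhr.send();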
Let’s now look at how AJAX works, by comparing it to the classic model of building a web
app.
6. Jesse James Garrett, Ajax: A New Approach to Web Applications
7. XMLHttpRequest Living Standard
In a classic model, most user actions in the UI trigger an HTTP request sent to the server.
The server processes the request and returns the entire HTML page to the client.
In comparison, AJAX introduces an intermediary (an AJAX engine) between the user and
the server. Although it might seem counterintuitive, the intermediary significantly improves
responsiveness. Instead of loading the webpage at the start of the session, the client
loads the AJAX engine, which is responsible for:
• Regularly polling the server on the client’s behalf.
• Rendering the interface the user sees, and updating it with data retrieved from the
server.
AJAX (and XMLHttpRequest in particular) can be considered a black swan event
for the web. It opened up the potential for web developers to start building truly dynamic,
asynchronous, realtime-like web applications that could communicate with the server
silently in the background, without interrupting the user’s browsing experience. Google
was among the first to adopt the AJAX model in the mid-2000s, initially using it for Google
Suggest, and its Gmail and Google Maps products. This sparked widespread interest in
AJAX, which quickly became popular and heavily used.
The Comet model was made famous by organizations such as Google and Meebo. The
former initially used Comet to add web-based chat to Gmail, while Meebo used it for their
web-based chat app that enabled users to connect to AOL, Yahoo, and Microsoft chat
platforms through the browser. In a short time, Comet became a de facto standard for
building responsive, interactive web apps.
Several different techniques can be used to deliver the Comet model, the most well-known
being long polling [9] and HTTP streaming. Let's now quickly review how these two work.
Long polling
Essentially a more efficient form of polling, long polling is a technique where the server elects to hold a client's connection open for as long as possible, delivering a response only after data becomes available or a timeout threshold is reached. Upon receipt of the server response, the client usually issues another request immediately. Long polling is often implemented on the back of XMLHttpRequest, the same object that plays a key role in the AJAX model.
Figure 1.2: High-level overview of long polling
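A minimal client-side sketch of the technique (the /poll endpoint and the handleUpdate helper are hypothetical):

function poll() {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', '/poll');
  // The server holds the request open until data is available or a timeout is reached
  xhr.onload = () => {
    handleUpdate(xhr.responseText);
    poll(); // immediately issue the next request
  };
  xhr.send();
}
poll();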
HTTP streaming
Also known as HTTP server push, HTTP streaming is a data transfer technique that allows
a web server to continuously send data to a client over a single HTTP connection that
remains open indefinitely. Whenever there’s an update available, the server sends a
response, and only closes the connection when explicitly told to do so.
HTTP streaming can be achieved by using the chunked transfer encoding mechanism
available in HTTP/1.1. With this approach, the server can send response data in chunks of
newline-delimited strings, which are processed on the fly by the client.
8. Alex Russell, Comet: Low Latency Data for the Browser
9. Long Polling - Concepts and Considerations
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
7\r\n
Chunked\r\n
8\r\n
Response\r\n
7\r\n
Example\r\n
0\r\n
\r\n
Server-Sent Events [10] (SSE) is another option you can leverage to implement HTTP
streaming. SSE is a server push technology commonly used to send message updates or
continuous data streams to a browser client. SSE aims to enhance native, cross-browser
server-to-client streaming through a JavaScript API called EventSource, standardized [11] as
part of HTML5 by the World Wide Web Consortium (W3C).
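Here's a minimal sketch of consuming such a stream with EventSource (the /stream endpoint is hypothetical):

const source = new EventSource('/stream');
source.onmessage = (event) => {
  // Called each time the server pushes a new message
  console.log('New update:', event.data);
};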
10. Server-Sent Events (SSE): A Conceptual Deep Dive
11. Server-sent events, HTML Living Standard
Most of their limitations stem from using HTTP as the underlying transport protocol. The
problem is that HTTP was initially designed to serve hypermedia resources in a request-
response fashion. It hadn’t been optimized to power realtime apps that usually involve
high-frequency or ongoing client-server communication, and the ability to react instantly
to changes.
Hacking HTTP-based technologies to emulate the realtime web was bound to lead to all
sorts of drawbacks. We will now cover the main ones (without being exhaustive).
Limited scalability
HTTP polling, for example, involves sending requests to the server at fixed intervals to see
if there’s any new update to retrieve. High polling frequencies result in increased network
traffic and server demands; this doesn’t scale well, especially as the number of concurrent
users rises. Low polling frequencies will be less taxing on the server, but they may result in
delivery of stale information that has lost (part of) its value.
Although an improvement on regular polling, long polling is also intensive on the server,
and handling thousands of simultaneous long polling requests requires huge amounts of
resources.
Another problem is that a server may send a response, but network or browser issues
may prevent the message from being successfully received. Unless some sort of message
receipt confirmation process is implemented, a subsequent call to the server may result in
missed messages.
Although HTTP streaming techniques are better for lower latencies than (long) polling,
they are limited themselves (just like any other HTTP-based mechanism) by HTTP headers,
which increase message size and cause unnecessary delays. Often, the HTTP headers in
the response outweigh the core data being delivered [12].
No bidirectional streaming
A request/response protocol by design, HTTP doesn’t support bidirectional, always-
on, realtime communication between client and server over the same connection. You
can create the illusion of bidirectional realtime communication by using two HTTP
connections. However, the maintenance of these two connections introduces significant
overhead on the server, because it takes double the resources to serve a single client.
With the web continuously evolving, and user expectations of rich, realtime web-based
experiences growing, it was becoming increasingly obvious that an alternative to HTTP
was needed.
12. Matthew O'Riordan, Google — polling like it's the 90s
13. IRC logs, 18.06.2008
14. W3C mailing lists, TCPConnection feedback
WEBSOCKETS vs HTTP/1.1
• Communication: WebSockets are full-duplex and bidirectional, with support for server push; HTTP/1.1 is half-duplex and request-response.
• State: WebSockets are stateful; HTTP/1.1 is stateless.
HTTP and WebSockets are designed for different use cases. For example, HTTP is a good
choice if your app relies heavily on CRUD operations, and there’s no need for the user
to react to changes quickly. On the other hand, when it comes to scalable, low-latency
realtime applications, WebSockets are the way to go. More about this in the next section.
Adoption
Initially called TCPConnection, the WebSocket interface made its way into the HTML5
specification [15], which was first released as a draft in January 2008. The WebSocket
protocol was standardized in 2011 via RFC 6455; more about this in Chapter 2: The
WebSocket Protocol.
In December 2009, Google Chrome 4 was the first browser to ship full support for
WebSockets. Other browser vendors started to follow suit over the next few years; today,
all major browsers have full support for WebSockets. Going beyond web browsers,
WebSockets can be used to power realtime communication across various types of user
agents — for example, mobile apps.
Nowadays, WebSockets are a key technology for building scalable realtime web apps.
The WebSocket API and protocol have a thriving community, which is reflected by a
variety of client and server options (both open-source and commercial), developer
ecosystems, and myriad real-life implementations.
15. Web sockets, HTML Living Standard
This chapter covers key considerations related to the WebSocket protocol, as described
in RFC 6455. You’ll find out how to establish a WebSocket connection and exchange
messages, what kind of data can be sent over WebSockets, and what types of extensions
and subprotocols you can use to augment WebSockets.
Protocol overview
The WebSocket protocol enables ongoing, full-duplex, bidirectional communication
between web servers and web clients over an underlying TCP connection.
16. RFC 6455: The WebSocket Protocol
17. IANA WebSocket Protocol Registries
The WebSocket URI schemes are analogous to the HTTP ones; the wss scheme
uses the same security mechanism as https to secure connections, while ws
corresponds to http.
The rest of the WebSocket URI follows a generic syntax, similar to HTTP. It consists of
several components: host, port, path, and query, as highlighted in the example below.
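wss://example.com:443/chat?user=alice

Here, example.com is the host, 443 the port, /chat the path, and user=alice the query (the values themselves are hypothetical).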
Client request
Here’s a basic example of a GET request made by the client to initiate the opening
handshake:
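GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13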
In addition to the required headers, the request may also contain optional ones. See
the Opening handshake headers section later in this chapter for more information on
headers.
18. RFC 8441: Bootstrapping WebSockets with HTTP/2
Server response
The server must return an HTTP 101 Switching Protocols response code for the
WebSocket connection to be successfully established:
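HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=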
The response must contain several headers: Connection, Upgrade, and Sec-WebSocket-Accept.
Other optional headers may be included, such as Sec-WebSocket-Extensions,
or Sec-WebSocket-Protocol (provided they were passed in the client request). See the
Opening handshake headers section in this chapter for additional details.
If the status code returned by the server is anything but HTTP 101 Switching
Protocols, the handshake will fail, and the WebSocket connection will not be
established.
Host (required) - The host name and optionally the port number of the server to which the request is being sent. If no port number is included, a default value is implied (80 for ws, or 443 for wss).
Sec-WebSocket-Version (required) - The only accepted value is 13. Any other version passed in this header is invalid.
Some common, optional headers like User-Agent, Referer, or Cookie may also be used in
the opening handshake. However, we have omitted them from the table above, as they
don’t directly pertain to WebSockets.
First, we have Sec-WebSocket-Key, which is passed by the client to the server, and contains
a 16-byte, base64-encoded one-time random value (nonce). Its purpose is to help ensure
that the server does not accept connections from non-WebSocket clients (e.g., HTTP
clients) that are being abused (or misconfigured) to send data to unsuspecting WebSocket
servers. Here’s an example of Sec-WebSocket-Key:
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
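The server derives the value of Sec-WebSocket-Accept by concatenating the client's key with the GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 (fixed by RFC 6455), hashing the result with SHA-1, and base64-encoding the digest. A minimal Node.js sketch of that derivation:

const crypto = require('crypto');

// GUID fixed by RFC 6455
const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function deriveAcceptKey(secWebSocketKey) {
  return crypto.createHash('sha1').update(secWebSocketKey + GUID).digest('base64');
}

deriveAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='); // 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='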
The WebSocket frame has a binary syntax and contains several pieces of information, as
shown in the following figure:
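 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

(Frame layout as defined in RFC 6455.)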
We will now take a more detailed look at all these constituent parts of a WebSocket frame.
Per RFC 6455 [19], another use case for fragmentation is represented by multiplexing, where
“[...] it is not desirable for a large message on one logical channel to monopolize the
output channel, so the multiplexing needs to be free to split the message into smaller
fragments to better share the output channel.”
All data frames that comprise a WebSocket message must be of the same type
(text or binary); you can’t have a fragmented message that consists of both text
and binary frames. However, a fragmented WebSocket message may include
control frames. See the Opcodes section later in this chapter for more details
about frame types.
Let’s now look at some quick examples to illustrate fragmentation. Here’s what a single-
frame message might look like:
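FIN=1, opcode=0x1 (text), payload: "Hello World!"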
In comparison, with fragmentation, the same message would look like this:
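Frame 1: FIN=0, opcode=0x1 (text), payload: "Hello "
Frame 2: FIN=0, opcode=0x0 (continuation), payload: "Wor"
Frame 3: FIN=1, opcode=0x0 (continuation), payload: "ld!"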
The WebSocket protocol makes fragmentation possible via the first bit of the WebSocket
frame — the FIN bit, which indicates whether the frame is the final fragment in a
message. If it is, the FIN bit must be set to 1. Any other frame must have the FIN bit clear.
19. RFC 6455: The WebSocket Protocol
Opcodes
Every frame has an opcode that determines how to interpret that frame’s payload data.
The standard opcodes currently in use are defined by RFC 6455 and maintained by IANA [20].
OPCODE DESCRIPTION
0 Continuation frame; carries the next fragment of a fragmented message.
1 Text frame (UTF-8 encoded payload data).
2 Binary frame.
3-7 Reserved for further non-control frames.
8 Connection close.
9 Ping.
10 Pong.
11-15 Reserved for further control frames.
8 (Close), 9 (Ping), and 10 (Pong) are known as control frames, and they are used
to communicate state about the WebSocket connection.
20. IANA WebSocket Opcode Registry
A masking bit set to 1 indicates that the respective frame is masked (and
therefore contains a masking-key). The server will close the WebSocket
connection if it receives an unmasked frame.
On the server-side, frames received from the client must be unmasked before further
processing. Here’s an example of how you can do that:
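// A sketch in Node.js: XOR each payload byte with the corresponding
// byte of the frame's 4-byte masking key (per RFC 6455)
function unmask(maskedPayload, maskingKey) {
  const payload = Buffer.alloc(maskedPayload.length);
  for (let i = 0; i < maskedPayload.length; i++) {
    payload[i] = maskedPayload[i] ^ maskingKey[i % 4];
  }
  return payload;
}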
Payload length
The WebSocket protocol encodes the length of the payload data using a variable number
of bytes:
• For payloads of 0-125 bytes, the length is encoded directly in the 7-bit payload length
field of the frame header.
• For payloads between 126 and 65,535 bytes, the field is set to 126, and two extra header
bytes indicate the length.
• For larger payloads, the field is set to 127, and eight additional header bytes indicate its
length.
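For illustration, here's a minimal sketch of how a sender might encode the length field in Node.js (the MASK bit, which shares the first of these bytes, is ignored here):

function encodePayloadLength(len) {
  if (len < 126) {
    return Buffer.from([len]);
  }
  if (len < 65536) {
    const buf = Buffer.alloc(3);
    buf[0] = 126;
    buf.writeUInt16BE(len, 1); // two extra header bytes
    return buf;
  }
  const buf = Buffer.alloc(9);
  buf[0] = 127;
  buf.writeBigUInt64BE(BigInt(len), 1); // eight extra header bytes
  return buf;
}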
Each frame’s payload type is indicated via a 4-bit opcode (1 for text or 2 for binary).
Closing handshake
Compared to the opening handshake, the closing handshake is a much simpler process.
You initiate it by sending a close frame with an opcode of 8. In addition to the opcode,
the close frame may contain a body that indicates the reason for closing. This body
consists of a status code (integer) and a UTF-8 encoded string (the reason).
The standard status codes that can be used during the closing handshake are defined by
RFC 6455; additional, custom close codes can be registered with IANA [21].
0-999 (N/A): Codes below 1000 are invalid and cannot be used.
1000 (Normal closure): Indicates a normal closure, meaning that the purpose for which the WebSocket connection was established has been fulfilled.
1001 (Going away): Should be used when closing the connection and there is no expectation that a follow-up connection will be attempted (e.g., server shutting down, or browser navigating away from the page).
21. IANA WebSocket Close Code Number Registry
1005 (No status received): Used by apps and the WebSocket API to indicate that no status code was received, although one was expected.
1006 (Abnormal closure): Used by apps and the WebSocket API to indicate that a connection was closed abnormally (e.g., without sending or receiving a close frame).
1009 (Message too big): The endpoint is terminating the connection due to receiving a data frame that is too large to process.
1013 (Try again later): The server is terminating the connection due to a temporary condition, e.g., it is overloaded.
1014 (Bad gateway): The server was acting as a gateway or proxy and received an invalid response from the upstream server. Similar to the 502 Bad Gateway HTTP status code.
1015 (TLS handshake): Reserved. Indicates that the connection was closed due to a failure to perform a TLS handshake (e.g., the server certificate can't be verified).
Both the client and the server can initiate the closing handshake. Upon receiving a close
frame, an endpoint (client or server) has to send a close frame as a response (echoing the
status code received).
Once a close frame has been sent, no more data frames can pass over the
WebSocket connection.
After an endpoint has both sent and received a close frame, the closing handshake is
complete, and the WebSocket connection is considered closed.
Subprotocols are negotiated during the opening handshake. The client uses the
Sec-WebSocket-Protocol header to pass along one or more comma-separated subprotocols,
as shown in this example:
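Sec-WebSocket-Protocol: chat, superchat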
Provided it understands the subprotocols passed in the client request, the server must pick
one (and only one) and return it alongside the Sec-WebSocket-Protocol header. From this
point onwards, the client and server can communicate over the negotiated subprotocol.
If the server doesn’t agree with any of the subprotocols suggested, the Sec-
WebSocket-Protocol header won’t be included in the response.
22. IANA WebSocket Subprotocol Name Registry
23. Kayla Matthews, MQTT: A Conceptual Deep-Dive
24. The Simple Text Oriented Messaging Protocol (STOMP)
At the time of writing, there are only a couple of extensions registered with IANA [25], such
as permessage-deflate, which compresses the payload data portion of WebSocket
frames. If you're interested in developing your own extension, you can use an open-source
framework like websocket-extensions [26].
Extensions are negotiated during the opening handshake. The client uses the
Sec-WebSocket-Extensions header to pass along the extensions it wishes to use, as shown in
this example:
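Sec-WebSocket-Extensions: permessage-deflate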
Provided it supports the extensions sent in the client request, the server must include
them in the response, alongside the Sec-WebSocket-Extensions header. From this point
onwards, the client and server can communicate over WebSockets using the extensions
they’ve negotiated.
Security
In this section, we will cover some of the mechanisms you can use to secure WebSocket
connections, and communication done over the WebSocket protocol. This is by no means
an exhaustive section; it only aims to provide a high-level overview of several security-
related considerations. More about the complex topic of WebSockets security will be
treated in future versions of this book.
Let’s start with the Origin header, which is sent by all browser clients (optional for non-
browsers) to the server during the opening handshake. The Origin header is essential
for securing cross-domain communication. Specifically, if the Origin indicated is
unacceptable, the server can fail the handshake (usually by returning an HTTP 403
Forbidden status code). This ability can be extremely helpful in mitigating denial of service
(DoS) attacks.
25. IANA WebSocket Extension Name Registry
26. The websocket-extensions framework
The WebSocket protocol doesn’t prescribe any particular way that servers can
authenticate clients. For example, you can handle authentication during the
opening handshake, by using cookie headers. Another option is to manage
authentication (and authorization) at the application level, by using techniques
such as JSON Web Tokens [27].
So far, we’ve covered security mechanisms that are used during connection establishment.
Now, let’s look at some aspects that impact security during data exchange between
the client and the server. First of all, to reduce the chance of man-in-the-middle
and eavesdropping attacks (especially when exchanging critical, sensitive data), it’s
recommended to use the wss URI scheme — which uses TLS to encrypt the connection, just
like https.
We’ve talked about message frames earlier in this chapter, and mentioned that frames
sent by the client to the server need to be masked with the help of a random masking-
key (32-bit value). This key is contained within the frame, and it’s used to obfuscate the
payload data. Frames need to be unmasked by the server before further processing.
Masking makes WebSocket traffic look different from HTTP traffic, which is especially useful
when proxy servers are involved. That’s because some proxy servers may not “understand”
the WebSocket protocol, and, were it not for the mask, they might mistake it for regular
HTTP traffic; this could lead to all sorts of problems, such as cache poisoning.
27. RFC 7519: JSON Web Token (JWT)
Overview
Defined in the HTML Living Standard [28], the WebSocket API is a technology that makes
it possible to open a persistent two-way, full-duplex communication channel between
a web client and a web server. The WebSocket interface enables you to send messages
asynchronously to a server and receive event-driven responses without having to poll for
updates.
Almost all modern browsers support the WebSocket API [29]. Additionally, there are plenty
of frameworks and libraries — both open-source and commercial solutions — that
implement WebSocket APIs. See the Resources section in this book for more details.
For the rest of this chapter, we will cover the core capabilities of the WebSocket API. As you
will see, it’s intuitive, designed with simplicity in mind, and trivial to use.
28. Web sockets, HTML Living Standard
29. Can I use WebSockets?
See Chapter 4: Building a Web App with WebSockets to learn how to create your own
WebSocket server in Node.js.
The WebSocket constructor takes one required parameter — the url of the WebSocket
server. Additionally, the optional protocols parameter may also be included, to indicate
one or more WebSocket subprotocols (application-level protocols) that can be used
during the client-server communication:
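// Hypothetical URL and subprotocol names
const socket = new WebSocket('wss://example.org/updates', ['protocolOne', 'protocolTwo']);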
Once the WebSocket object is created and the connection is established, the client can
start exchanging data with the server.
30. Berkeley sockets
Open
The open event is raised when a WebSocket connection is established. It indicates that the
opening handshake between the client and the server was successful, and the WebSocket
connection can now be used to send and receive data. Here’s a usage example:
// Connection opened
socket.onopen = function(e) {
  console.log('Connection open!');
};
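Message
The message event is fired when data is received through the WebSocket. The following example handles both binary and text payloads: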
socket.onmessage = function(msg) {
  if (msg.data instanceof ArrayBuffer) {
    processArrayBuffer(msg.data);
  } else {
    processText(msg.data);
  }
};
Error
The error event is fired in response to unexpected failures or issues (for example, some
data couldn’t be sent). Here’s how you listen for error events:
socket.onerror = function(e) {
  console.log('WebSocket failure', e);
  handleErrors(e);
};
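Close
The close event is fired when the WebSocket connection is closed. Here's how you listen for close events: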
socket.onclose = function(e) {
  console.log('Connection closed', e);
};
You can trigger the close event manually by calling the close() method.
Methods
The WebSocket API supports two methods: send() and close().
send()
Once the connection has been established, you’re almost ready to start sending and
receiving messages to and from the WebSocket server. But before doing that, you first
have to ensure that the connection is open and ready to receive messages. You can
achieve this in two main ways.
The first option is to trigger the send() method from within the onopen event handler, as
demonstrated in the following example:
socket.onopen = function(e) {
  socket.send(JSON.stringify({'msg': 'payload'}));
};
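The second option is to check the readyState property of the WebSocket instance, and only send data when the connection is open (a readyState value of 1, i.e., WebSocket.OPEN):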
function processEvent(e) {
  if (socket.readyState === WebSocket.OPEN) {
    // Socket open, send!
    socket.send(e);
  } else {
    // Show an error, queue it for sending later, etc
  }
}
The two code snippets above show how to send text (string) messages. However, in
addition to strings, you can also send binary data (Blob or ArrayBuffer), as shown in this
example:
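// Sending binary data; here, a Blob
const blob = new Blob(['payload'], { type: 'text/plain' });
socket.send(blob);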
After sending one or more messages, you can leave the WebSocket connection open for
further data exchanges, or call the close() method to terminate it.
close()
The close() method is used to close the WebSocket connection (or connection attempt).
It’s essentially the equivalent of the closing handshake we covered previously, in Chapter
2. After this method is called, no more data can be sent or received over the WebSocket
connection.
If the connection is already closed, calling the close() method does nothing.
socket.close();
Here’s an example of calling the close() method with the two optional parameters:
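socket.close(1000, 'Deliberate disconnection');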
Properties
The WebSocket object exposes several properties containing details about the WebSocket
connection.
binaryType
The binaryType property controls the type of binary data being received over the
WebSocket connection. The default value is blob; additionally, WebSockets also support
arraybuffer.
bufferedAmount
Read-only property that returns the number of bytes of data queued for transmission but
not yet sent. The value of bufferedAmount resets to zero once all queued data has been
sent.
bufferedAmount is particularly useful when the client application transports large
amounts of data to the server. Even though calling send() is instant, actually transmitting
that data over the internet is not. Browsers will buffer outgoing data on behalf of your
client application. The bufferedAmount property is useful for ensuring that all data is sent
before closing a connection, or performing your own throttling on the client-side.
Below is an example of how to use bufferedAmount to send updates every second, and
adjust accordingly if the network cannot handle the rate:
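// A sketch: getUpdate() and the threshold value are hypothetical
const THRESHOLD = 10240; // bytes
setInterval(() => {
  // Skip this tick if the buffer hasn't drained below the threshold
  if (socket.bufferedAmount < THRESHOLD) {
    socket.send(getUpdate());
  }
}, 1000);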
extensions
Read-only property that returns the name of the WebSocket extensions that were
negotiated between client and server during the opening handshake.
“onevent” properties
These properties are called to run associated handler code whenever a WebSocket event
is fired. There are four types of “onevent” properties, one for each type of event:
PROPERTY DESCRIPTION
onopen Gets called when the WebSocket connection is established.
onmessage Gets called when a message is received from the server.
onerror Gets called when an error event occurs, impacting the WebSocket connection.
onclose Called with a close event when the WebSocket's connection readyState property changes to 3; this indicates that the connection is closed.
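protocol
Read-only property that returns the name of the subprotocol the server selected during the opening handshake.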
Subprotocols are specified via the protocols parameter when creating the
WebSocket object (see The WebSocket constructor section earlier in this chapter
for details). If no protocol is specified during connection establishment, the
protocol property will return an empty string.
readyState
Read-only property that returns the current state of the WebSocket connection. The table
below shows the values you can see reflected by this property, and their meaning:
0 CONNECTING Socket has been created, but the connection is not yet open.
1 OPEN The connection is open and ready to communicate.
2 CLOSING The connection is in the process of being closed.
3 CLOSED The connection is closed, or couldn't be opened.
The value of readyState will change over time. It’s recommended to check
it periodically to understand the lifespan and life cycle of the WebSocket
connection.
url
Read-only property that returns the absolute URL of the WebSocket, as resolved by the
constructor.
Using WebSockets in the frontend is fairly straightforward, via the WebSocket API built
into all modern browsers (we’ll use this API on the client-side in the first part of the demo,
alongside ws on the server-side). Additionally, there are plenty of libraries and solutions
implementing the WebSocket technology on both the client-side and the server-side. This
includes SockJS, which we will cover in the second part of the demo.
For more details about WebSocket client and server implementations, see the Resources
section.
31. Jo Franchetti, WebSockets and Node.js — testing WS and SockJS by building a web app
In order to demonstrate how to set up WebSockets with Node.js and ws, we will build
a demo app that shares users’ cursor positions in realtime. We walk through building it
below.
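First, we create a WebSocket server. A minimal sketch of that bootstrap, assuming the ws library has been installed from npm (the port is an arbitrary choice):

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 7071 });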
For brevity’s sake, we call it wss in our code. Any resemblance to secure
WebSockets (often referred to as wss) is a coincidence.
Next, create a Map to store a client’s metadata (any data we wish to associate with a
WebSocket client):

const clients = new Map();

32. ws: a Node.js WebSocket library
Every time a client connects, we generate a new unique ID, which is used to identify them.
Clients are also assigned a cursor color by using Math.random(); this generates a number
between 0 and 360, which corresponds to the hue value of an HSL color. The ID and
cursor color are then added to an object that we’ll call metadata, and we’re using the Map
to associate them with our ws WebSocket instance.
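In code, that looks something like this:

wss.on('connection', (ws) => {
  const id = uuidv4();
  const color = Math.floor(Math.random() * 360);
  const metadata = { id, color };

  clients.set(ws, metadata);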
The Map is a dictionary — we can retrieve this metadata by calling get and providing a
WebSocket instance later on.
Using the newly connected WebSocket instance, we subscribe to that instance’s message
event, and provide a callback function that will be triggered whenever this specific client
sends a message to the server.
This event is on the WebSocket instance (ws) itself, and not on the WebSocket.
Server instance (wss).
Whenever our server receives a message, we use JSON.parse to get the message contents,
and load our client metadata for this socket from our Map using clients.get(ws).
We’re going to add our two metadata properties to the message as sender and color:
  ws.on('message', (messageAsString) => {
    const message = JSON.parse(messageAsString);
    const metadata = clients.get(ws);

    message.sender = metadata.id;
    message.color = metadata.color;

    // Re-serialize the updated message and broadcast it to every connected client
    const outbound = JSON.stringify(message);
    [...clients.keys()].forEach((client) => {
      client.send(outbound);
    });
  });
Finally, when a client closes its connection, we remove its metadata from our Map:
ws.on("close", () => {
clients.delete(ws);
});
});
function uuidv4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
    var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
}
console.log('wss up');
This server implementation multicasts, sending any message it has received to all
connected clients.
We now need to write some client-side code to connect to the WebSocket server, and
transmit the user’s cursor position as it moves.
<!DOCTYPE html>
<html lang='en'>
<head>
<meta charset='UTF-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge'>
<meta name='viewport' content='width=device-width, initial-scale=1.0'>
<title>Document</title>
The body contains a single HTML template which contains an SVG image of a pointer.
We’re going to use JavaScript to clone this template whenever a new user connects to our
server.
<body id='box'>
<template id='cursor'>
<svg viewBox='0 0 16.3 24.7' class='cursor'>
<path stroke='#000' stroke-linecap='round' stroke-linejoin='round'
      stroke-miterlimit='10'
      d='M15.6 15.6L.6.6v20.5l4.6-4.5 3.2 7.5 3.4-1.3-3-7.2z' />
</svg>
</template>
</body>
</html>
(async function() {
const ws = await connectToServer();
...
When the user moves their pointer, we build an object holding the cursor's x and y
coordinates, stringify it, and send it via our now connected ws WebSocket instance as
the message text:
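document.body.onmousemove = (evt) => {
  const messageBody = { x: evt.clientX, y: evt.clientY };
  ws.send(JSON.stringify(messageBody));
};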
You might notice that the syntax here differs slightly from the server-side WebSocket code.
That’s because we’re using the browser’s native WebSocket class, rather than the ws library.
When we receive a message over the WebSocket, we parse the data property of the
message, which contains the stringified data that the onmousemove handler sent to the
WebSocket server, along with the additional sender and color properties that the server-
side code adds to the message.
We then use the x and y values from the messageBody to adjust the cursor position using a
CSS transform.
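Put together, the message handler might look something like this:

ws.onmessage = (webSocketMessage) => {
  const messageBody = JSON.parse(webSocketMessage.data);
  const cursor = getOrCreateCursorFor(messageBody);
  cursor.style.transform = `translate(${messageBody.x}px, ${messageBody.y}px)`;
};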
Our code relies on two utility functions. The first is connectToServer, which opens a
connection to our WebSocket server, and then returns a Promise that resolves when the
WebSocket readyState property is 1 (OPEN).
This means that we can just await this function, and we’ll know that we have a connected
and working WebSocket connection.
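A sketch of that function (the server URL matches the bootstrap sketch from earlier in this chapter):

function connectToServer() {
  const ws = new WebSocket('ws://localhost:7071');
  return new Promise((resolve) => {
    const timer = setInterval(() => {
      if (ws.readyState === 1) {
        clearInterval(timer);
        resolve(ws);
      }
    }, 10);
  });
}

The second utility function is getOrCreateCursorFor, which looks up the cursor element for a given sender, creating one if it doesn't exist yet: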
function getOrCreateCursorFor(messageBody) {
  const sender = messageBody.sender;
  const existing = document.querySelector(`[data-sender='${sender}']`);
  if (existing) {
    return existing;
  }
If we can’t find an existing element, we clone our HTML template, add the data attribute
with the current sender ID to it, and append it to the document.body before returning it:
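  const template = document.getElementById('cursor');
  const cursor = template.content.firstElementChild.cloneNode(true);
  const svgPath = cursor.querySelector('path');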
  cursor.setAttribute('data-sender', sender);
  svgPath.setAttribute('fill', `hsl(${messageBody.color}, 50%, 50%)`);
  document.body.appendChild(cursor);
  return cursor;
}
})();
Now when you run the web application, each user viewing the page will have a cursor
that appears on everyone’s screens because we are sending the data to all the clients
using WebSockets.
If your local version isn’t working as expected, you can clone a working version of the demo
from: https://github.com/ably-labs/websockets-cursor-sharing.
This demo includes two applications: a web app that we serve through Snowpack [33], and a
Node.js web server. The NPM start task spins up both the API and the web server.
33. Snowpack
Figure 4.2: Error message returned by the browser when a WebSocket connection can’t be established
This is because the ws library offers no fallback transfer protocols if WebSockets are
unavailable. If this is a requirement for your project, or you want to have a higher level of
reliability of delivery for your messages, then you will need a library that offers multiple
transfer protocols, such as SockJS.
Using SockJS in the client is similar to the native WebSocket API, with a few small
differences. We can swap out ws in the demo built previously and use SockJS instead to
include fallback support.
<script src='https://cdn.jsdelivr.net/npm/sockjs-client@1/dist/sockjs.min.js' defer></script>
Note the defer attribute — it ensures that the SockJS library is loaded before index.js
runs.
In the app/script.js file, we then update the JavaScript to use SockJS. Instead of the
WebSocket object, we’ll now use a SockJS object. Inside the connectToServer function, we’ll
establish the connection with the SockJS server:
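const ws = new SockJS('http://localhost:7071/ws');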
SockJS requires a prefix path on the server URL. The rest of the app/script.js
file requires no change.
34. SockJS-client
35. SockJS-node
Then we need to require the sockjs module and the built-in HTTP module from Node.
Delete the line that requires ws and replace it with the following:
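const http = require('http');
const sockjs = require('sockjs');

const wss = sockjs.createServer();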
At the very bottom of the API/index.js file we’ll create the HTTP server and add the
SockJS HTTP handlers:
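const server = http.createServer();
wss.installHandlers(server, { prefix: '/ws' });
server.listen(7071, '0.0.0.0');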
We map the handlers to a prefix supplied in a configuration object ('/ws'). We tell the
HTTP server to listen on port 7071 (arbitrarily chosen) on all the network interfaces on the
machine.
The final job is to update the event names to work with SockJS:
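With SockJS, the server-side connection object emits data instead of message, and sends with write() instead of send(). A sketch of the changes:

// was: ws.on('message', (messageAsString) => { ... });
ws.on('data', (messageAsString) => {
  // ...
});

// was: client.send(outbound);
client.write(outbound);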
And that’s it, the demo will now run with WebSockets where they are supported; and
where they aren’t, it will use Comet long polling. This latter fallback option will show a
slightly less smooth cursor movement, but it is more functional than no connection at all!
If your local version isn’t working as expected, you can clone a working version of the demo
from: https://github.com/ably-labs/websockets-cursor-sharing/tree/sockjs.
This demo includes two applications: a web app that we serve through Snowpack [36], and a
Node.js web server. The NPM start task spins up both the API and the web server.
Figure 4.3: Realtime cursor movement powered by the SockJS WebSockets library
36. Snowpack
The number of active users you can support is thus directly related to how much hardware
your server has. Node.js is pretty good at managing concurrency, but once you reach a
few hundred to a few thousand users, you're going to need to scale your hardware to
keep all the users in sync.
Scaling vertically is often an expensive proposition, and you'll always be faced with
a performance ceiling of the most powerful piece of hardware you can procure.
Additionally, vertical scaling is not elastic, so you have to do it ahead of time. You should
consider horizontal scaling, which is better in the long run — but also significantly more
difficult. See The scalability of your server layer section in the next chapter for details.
There are multiple ways to solve this: either by using some form of direct connection
between the cluster nodes that are handling the traffic, or by using an external pub/sub
mechanism. This is sometimes called "adding a backplane" to your infrastructure, and is
yet another moving part that makes scaling WebSockets difficult.
See Chapter 5: WebSockets at Scale for a more in-depth read about the engineering
challenges involved in scaling WebSockets.
37. Redis
38. Everything You Need To Know About Publish/Subscribe
WebSockets at Scale
This chapter covers the main aspects to consider when
you set out to build a system at scale. By this, I mean
a system to handle thousands or even millions of
concurrent end-user devices as they connect, consume,
and send messages over WebSockets. As you will see,
scaling WebSockets is non-trivial, and involves numerous
engineering decisions and technical trade-offs.
However, since you are only using one machine, there's a finite amount of resources
you can add to it, which means you can only scale your server up to a finite capacity.
Furthermore, what happens if, at some point, the number of concurrent WebSocket
connections proves to be too much to handle for just one machine? Or what happens if
you need to upgrade your server? With vertical scaling, you have a single point of failure,
which would severely affect the availability of your system and the uptime of your virtual
events platform.
In contrast, horizontal scaling offers better availability in the long run. Even if
a server crashes or needs to be upgraded, you are in a much better position to
protect your system’s overall availability, since the workload of the machine that
failed is distributed to the other nodes in the network.
Of course, horizontal scaling comes with its own complications — it involves a more
complex architecture, additional infrastructure to manage, load balancing, and
automatically syncing message data and connection state across multiple WebSocket
servers in milliseconds (more about these topics is covered later in this chapter).
Despite its increased complexity, horizontal scaling is worth pursuing, as it allows you
to scale limitlessly (in theory). This makes it a superior alternative to vertical scaling. So,
instead of asking how many connections can a server handle, a better question would be:
how many servers can I distribute the workload to?
In addition to horizontal scaling, you should also consider the elasticity of your
server layer. System-breaking complications can arise when you expose an
inelastic server layer to the public internet, a volatile and unpredictable source
of traffic. To successfully handle WebSockets at scale, you need to be able to
dynamically (automatically) add more servers into the mix so that your system
can quickly adjust and deal with potential usage spikes at all times.
Load balancers detect the health of backend resources and do not send traffic to servers
that cannot deal with additional load. If a server goes down, the load balancer redirects
its traffic to the remaining operational servers. When a new server is added to the farm,
the load balancer automatically starts distributing traffic to it.
You can perform load balancing at different layers of the Open Systems Interconnection
(OSI) model [39]:
• Layer 4 (L4). Transport-level load balancing. This means load balancers can make
routing decisions based on the TCP or UDP ports that packets use, along with their
source and destination IP addresses. The contents of the packets themselves are not
inspected.
39. OSI model, Wikipedia
ALGORITHM ABOUT
Round robin: New connections are routed across the available servers in rotating sequential order.
Least connections: A new connection is routed to the server with the least number of active connections.
Least bandwidth: A new connection is routed to the server currently serving the least amount of traffic as measured in megabits per second (Mbps).
Least response time: A new connection is routed to the machine that takes the least amount of time to respond to a health monitoring request (the response speed is used to indicate how loaded a server is). Some load balancers might also factor in the number of active connections on each server.
Hashing methods: The routing decision is made based on a hash of various bits of data from the incoming connection. This may include information such as port number, domain name, and IP address.
Random with two choices: The load balancer randomly picks two servers from your farm and routes a new connection to the machine with the fewest active connections.
Custom load: The load balancer queries the load on individual servers using something like the Simple Network Management Protocol (SNMP) [40], and assigns a new connection to the machine with the best load metrics. You can define various metrics to look at, such as CPU usage, memory, and response time.
However, if, for example, you are developing a chat solution, some WebSocket
connections will be more resource-intensive, due to certain end-users being
chattier. In this case, a round robin strategy might not be the best way to
go, since you would in effect be distributing load unevenly across your server
farm. In such a context, you would be better off using an algorithm like least
bandwidth.
Here are other aspects to bear in mind when you’re load balancing WebSockets:
• Your server layer must be prepared to quickly adjust when connections fall back to a less efficient transport, which usually means increased load.
• Your ideal load balancing strategy for WebSockets might not always be the right one for HTTP requests; after all, stateful WebSockets and stateless HTTP are fundamentally different. When you design your system, you must ensure it’s able to successfully load balance WebSockets, as well as any HTTP fallbacks you support (you might want to use different load balancing strategies for each).
40. Simple Network Management Protocol, Wikipedia
Sticky sessions
One could argue that WebSockets are sticky by default (in the sense that there’s a
persistent connection between server and client). However, this doesn’t mean that you
are forced to use sticky load balancing (where the load balancer repeatedly routes traffic
from a client to the same destination server). In fact, sticky load balancing is a rather
fragile approach (there’s always the risk that a server will fail), making it hard to rebalance
load. Rather than using sticky load balancing, which inherently assumes that a client will
always stay connected to the same server, it’s more reliable to use non-sticky sessions, and
have a mechanism that allows your servers to share connection state between them. This
way, stream continuity can be ensured without the need for a WebSocket connection to
always be linked to the exact same server.
41. Everything You Need To Know About Publish/Subscribe
The pub/sub pattern's decoupled nature means your apps can theoretically scale to
limitless subscribers. A significant advantage of adopting the pub/sub pattern is that you
often have only one component that has to deal with scaling WebSocket connections —
the message broker. As long as the message broker can scale predictably and reliably,
it's unlikely you'll have to add additional components or make any other changes to
your system to deal with the unpredictable number of concurrent users connecting over
WebSockets.
There are numerous projects built with WebSockets and pub/sub [42], and plenty of open-source
libraries and commercial solutions combining these two elements, so it’s unlikely
you’ll have to build your own WebSockets + pub/sub capability from scratch. Examples of
open-source solutions you can use include: Socket.IO with the Redis pub/sub adapter [43],
SocketCluster [44], or Django Channels [45]. Of course, when choosing an open-source solution,
you have to deploy it, manage it, and scale it yourself — this is, without a doubt, a tough
engineering challenge.
42. Websocket Pubsub open source projects on GitHub
43. Redis adapter for Socket.IO
44. SocketCluster
45. Django Channels
Imagine you’ve developed a CRM, marketing, and sales platform for tens of thousands
of business users, where you have realtime features such as chat and live dashboards
that are powered by WebSockets. But some users might be connecting from restrictive
corporate networks that block or break WebSocket connections. So what do you do to
ensure your product is available to your customers, knowing that you may not be able to
use WebSockets in all situations?
Most WebSocket solutions have fallback support baked in. For example, Socket.IO [46],
one of the most popular open-source WebSocket libraries out there, will transparently try to
establish a WebSocket connection if possible, and will fall back to HTTP long polling if not.
See Chapter 4: Building a Web App with WebSockets for details on building
a realtime app with SockJS that falls back to Comet long polling when
WebSockets can’t be used.
46. Socket.IO
47. SockJS-client
48. XMLHttpRequest Living Standard
49. Server-Sent Events (SSE): A Conceptual Deep Dive
In the context of scale, it's essential to consider the impact that fallbacks may
have on the availability of your system.
Let's assume you have thousands or even tens of thousands of simultaneous users
connected to your system, and there's an incident that causes a significant proportion
of the WebSocket connections to fall back to long polling. Not only is handling tens of
thousands of concurrent WebSocket connections a challenge in itself, but it's further
amplified by falling back to long polling, which is significantly more demanding on your
server layer. For example, while WebSockets allow you to push data as soon as it becomes
available over persistent connections, with long polling you have to buffer the data and
hold it somewhere until the next request comes in; this is much more resource-intensive
(increased RAM usage).
New connections
There’s a moderate overhead in establishing a new WebSocket connection — the process
involves a non-trivial request/response pair between the client and the server known as
the opening handshake.
Now, imagine you’re streaming live sports updates, and there’s a widely popular event
taking place, like a World Cup game, or a Grand Slam final. You can have tens of
thousands or even millions of client devices trying to open WebSocket connections at
the same time. Such a scenario leads to a huge burst in traffic, and your system needs
to be prepared to handle it. The situation would be even more complicated if all these
WebSocket connections were to simultaneously fall back to a less efficient transport.
Here are some of the things you can do to prepare for cases where you have to deal with
an extremely high number of WebSocket connections opening concurrently:
• Testing and monitoring. Run load and stress testing to evaluate how your system
behaves under peak load, and have comprehensive realtime monitoring and
alerting mechanisms in place, so you have a good understanding of what’s
happening at any given moment, and can act immediately if any issues occur.
• Enforce limits. Based on load and stress testing results, you can enforce hard limits,
such as maximum number of connections, or how many new connections can be
opened in a specific time interval. This way, you have more predictability, and you’re
in a better position to scale your system in a reliable way.
• Ensure your system is highly available and fault tolerant. You need to be able to
quickly (auto)scale your server layer so it can deal with traffic spikes. Additionally,
it’s advisable to operate with some capacity margin, and have backups for various
system components, to ensure redundancy and remove single points of failure.
We won’t go into details about the solutions you can use to build your WebSockets
monitoring stack — there are plenty of options to choose from, including open-source
tools like Prometheus [50] and Grafana [51].
It’s best if the metrics you monitor are presented on realtime dashboards, so you always
have up-to-date visibility into what is going on. And, of course, you should have alerts
configured so that when certain metrics go outside of acceptable values, you can be
instantly notified, and quickly react to address potential issues.
Load shedding
When scaling WebSockets, you will inevitably have to deal with traffic congestion and
the risk of your server layer getting overloaded by the number of connections it needs to
handle. If the situation is left unchecked, it can lead to cascading failures and even a total
collapse of your system.
To prevent this from happening, you need to have a load shedding strategy in place.
Load shedding mechanisms generally allow you to detect congestion and fail gracefully
when a server approaches overload, by rejecting some or all of the incoming traffic.
50. Prometheus
51. Grafana
Restoring connections
Many reasons could lead to WebSocket connections being dropped. Users might
switch from a mobile data network to a Wi-Fi network, go through a tunnel, or perhaps
experience intermittent network issues. Or one of your servers might become overloaded
and crash, or need to shed connections. When scenarios like these occur, WebSocket
connections need to be restored.
Automatic reconnections
You could implement a reconnection script that enables clients to reconnect
automatically. A simple one might look something like this [52]:
function connect() {
  ws = new WebSocket("ws://localhost:8080");
  ws.addEventListener('close', connect);
}
However, this approach is not ideal, since reconnection attempts occur immediately after
the WebSocket connections are closed. Clients continuously try to reconnect, even if your
server layer does not have enough capacity to deal with all of the incoming WebSocket
connections. This can put your system under even more pressure, and lead to cascading
failures.
52. Jeroen de Kok, How to implement a random exponential backoff algorithm in Javascript
A better approach is an exponential backoff algorithm, as shown below:

// Example delay values (in ms); tune these for your use case
let ws;
const initialReconnectDelay = 1000;
const maxReconnectDelay = 16000;
let currentReconnectDelay = initialReconnectDelay;

function connect() {
  ws = new WebSocket("ws://localhost:8080");
  ws.addEventListener('open', onWebsocketOpen);
  ws.addEventListener('close', onWebsocketClose);
}

// Reset the delay once a connection succeeds
function onWebsocketOpen() {
  currentReconnectDelay = initialReconnectDelay;
}

function onWebsocketClose() {
  ws = null;
  setTimeout(() => {
    reconnectToWebsocket();
  }, currentReconnectDelay);
}

// Double the delay before each new attempt, up to a maximum
function reconnectToWebsocket() {
  if (currentReconnectDelay < maxReconnectDelay) {
    currentReconnectDelay *= 2;
  }
  connect();
}
The algorithm exponentially increases the delay after each reconnection attempt,
increasing the waiting time between retries to a maximum backoff time. Compared to
a simple reconnection script, this is better, because it gives you some time to add more
capacity into your system so that it can deal with all the WebSocket reconnections. But it’s
still not great, because all the clients would keep trying to reconnect all at once.
You can make the exponential backoff mechanism more reliable by making it random, so
not all clients reconnect at the exact same time:
function onWebsocketClose() {
  ws = null;
  // Add anything between 0 and 3000 ms to the delay.
  setTimeout(() => {
    reconnectToWebsocket();
  }, currentReconnectDelay + Math.floor(Math.random() * 3000));
}
If resuming a stream exactly where it left off after brief disconnections is important to your
use case, here are some things you’ll need to consider:
• Caching messages in front-end memory. How many messages do you store, and for
how long?
• Moving data to persistent storage. Do you need to transfer data to persistent
storage? If so, where do you store it, and for how long? How will clients access that
data when they reconnect?
• How does the stream resume? When a client reconnects, how do you know exactly
where to resume the stream from? Do you need to use a serial number / timestamp
to establish where a connection broke off? Who needs to keep track of the connection
breaking down — the client or the server?
• Syncing connection state across your servers. Assuming you are not using sticky load
balancing (you shouldn’t), a client might reconnect to a different server than the
initial one, so you need to ensure any server is able to resume the stream. Should
you use something like a gossip protocol or a publish/subscribe solution to ensure
connection state is shared across your server farm? How will you ensure that the sync
mechanism itself is always available and working reliably?
Heartbeats
The WebSocket protocol natively supports control frames [53] known as Ping and Pong.
These control frames are an application-level heartbeat mechanism to detect whether a
WebSocket connection is alive. Usually, the server is the one that sends a Ping frame and,
on receipt, the client-side must send a Pong frame back as a response.
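For example, with the ws library used in Chapter 4, a server-side heartbeat might look something like this (a sketch; the 30-second interval is an arbitrary choice):

const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    // No pong since the last ping: consider the connection dead
    if (ws.isAlive === false) return ws.terminate();
    ws.isAlive = false;
    ws.ping();
  });
}, 30000);

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });
});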
You should closely monitor the effect heartbeats at scale have on your system,
and the ratio of Ping/Pong frames to actual messages being sent over
WebSockets. There are situations when you might find that you are sending
more heartbeats than messages (text or binary frames) over WebSockets.
53. RFC 6455, Section 5.5: Control Frames
Backpressure
When streaming data to client devices at scale over the internet, backpressure is one of
the key issues you will have to deal with. For example, let’s assume you are streaming 20
messages per second, but a client can only handle 15 messages per second. What do you
do with the remaining 5 messages per second that the client is unable to consume?
You need a way to monitor the buffers building up on the sockets used to stream data
to clients, and ensure a buffer never grows beyond what the downstream connection
can sustain. Beyond client-side issues, if you don’t actively manage buffers, you’re risking
exhausting the resources of your server layer — this can happen very fast when you have
thousands of concurrent WebSocket connections.
However, dropping packets is not always a good solution — there are use cases where
data integrity is critical, and you simply can’t afford to lose information. In such scenarios,
you should use application-level acknowledgments (ACKs) as confirmation of message
receipt, and configure your system to hold off from sending additional batches of
messages until it has received ACKs. You also need to consider how to ensure stream
continuity even if disconnections are involved.
54. Tsviatko Yovtchev, Delta Compression: A practical guide to diff algorithms and delta file formats
Fault tolerant designs treat failures as routine. The assumption has to be that component
failures will happen sooner or later. The larger the system, the higher the chances of issues
occurring. What’s important is that, when failures do happen, your system has enough
redundancy to continue operating, with functionality and user experience preserved as
effectively as possible.
Without going into much detail, to make your system fault-tolerant, you need to ensure
it's redundant against server and datacenter failures. If you’re building cloud-based apps,
this first of all implies having the ability to elastically scale your server layer (and operating
with extra capacity on standby), and distributing your infrastructure across multiple
availability zones.
However, it isn’t sufficient to rely on any specific region, for multiple reasons [56] — sometimes
multiple availability zones in a region do fail at the same time; sometimes there might
be local connectivity issues making the region unreachable; and sometimes there might
simply be capacity limitations in a region that prevent all services from being supportable
there.
As a result, it’s often best to have infrastructure deployed in multiple regions. This is the
ultimate way of ensuring statistical independence of failures and the strongest guarantee
that your system will continue to operate and provide uninterrupted service to your users.
55. Dr. Paddy Byers, Engineering dependability and fault tolerance in a distributed system
56. Michael Gariffo, AWS suffers third outage of the month
Use horizontal scaling rather than vertical scaling. It’s more reliable, especially
for use cases where you can’t afford your system to be unavailable under any
circumstances.
If possible, use smaller machines (servers) rather than large ones. They are easier
and faster to spin up, and costs are more granular.
Aim to have a homogeneous server farm. It’s much more complicated to balance
load efficiently and evenly across machines with different configurations.
Have a good understanding of your use case and relevant parameters (such as
usage patterns and bandwidth) before choosing a load balancing algorithm.
Ensure your server layer is dynamically elastic, so you can quickly scale out when you
have traffic spikes. You should also operate with some capacity margin, and have
backups for various system components, to ensure redundancy and remove single
points of failure.
You most likely need to support fallback transports, such as Comet long polling,
because WebSockets, although widely supported, are blocked by certain enterprise
firewalls and networks. Note that falling back to another protocol changes your
scaling parameters; after all, stateful WebSockets are fundamentally different from
stateless HTTP, so you need a strategy to scale both.
Run load and stress testing to understand how your system behaves under peak
load, and enforce hard limits (for example, maximum number of concurrent
WebSocket connections) to have some predictability.
You need to have a load shedding strategy; failing gracefully is always better than
a total collapse of your system.
Use a tiered infrastructure to enable you to recover from faults and coordinate
between servers.
Some WebSocket connections will inevitably break at some point. You need a
strategy for ensuring that after the WebSocket connections are restored, you can
resume the stream with ordering and delivery (preferably exactly-once) guaranteed.
Keep track of idle connections and close them. Even if no messages (text or binary
frames) are being sent, you are still sending ping/pong frames periodically, so even
idle connections consume resources.
Videos
• A Beginner’s Guide to WebSockets
• The Complete Guide to WebSockets
• WebSockets Crash Course - Handshake, Use-cases, Pros & Cons and more
Further reading
• WebSockets Security: Main Attacks and Risks
• WebSocket Security - Cross-Site Hijacking (CSWSH)
• The Future of Web Software Is HTML-over-WebSockets
• Implementing a WebSocket server with Node.js
• Migrating Millions of Concurrent Websockets to Envoy (Slack Engineering)
• The Periodic Table of Realtime
We treasure any feedback from our readers. If you’ve spotted a mistake, if you have any
suggestions for what we should include in future versions of the ebook, or if you simply
want to chat about WebSockets, reach out to us!
Contact us
Ably provides a suite of APIs to build, extend, and deliver powerful digital experiences
in realtime – primarily over WebSockets – for more than 250 million devices across 80
countries each month. Organizations like Bloomberg, HubSpot, Verizon, and Hopin
depend on Ably’s platform to offload the growing complexity of business-critical realtime
data synchronization at global scale.