Horizontal scaling of WebSockets

WebSocket is a communication protocol that enables real-time, full-duplex communication between a client (such as a web browser) and a server over a single, long-lived connection. WebSockets matter because they let clients and servers exchange messages efficiently over a persistent connection, which is especially useful for real-time updates and interactive features.

This article explores one method for horizontally scaling WebSockets, which is challenging because WebSocket connections are stateful. Our focus is on explaining the scaling concept rather than providing a detailed implementation guide.

Understanding the Challenges of Scaling WebSockets

A WebSocket connection is stateful: it stays open between one client and one specific server instance. This becomes a problem in a horizontally scaled environment. When many clients are connected and the backend needs to send a message to a specific client, it must first determine which of the horizontally scaled replicas holds that client's connection. This complicates communication and requires a reliable way to keep track of connections.

For example:

In this scenario, a backend service (“Another backend service”) needs to communicate with a specific client, Client 1. The challenge is that the WebSocket service is horizontally scaled into multiple replicas, each managing a different set of client connections.

When “Another backend service” sends a message to Client 1, the WebSocket service must identify the replica that holds Client 1's connection, in this case Replica 1. If it routes the message through Replica 2 instead, the message cannot reach Client 1, because Replica 2 only holds the connection to Client 2. The WebSocket service therefore has to know which client is connected through which replica in order to route messages correctly.
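A minimal Python sketch of the problem, assuming each replica keeps an in-memory map from a client ID to that client's open connection. The `Replica` class, its `accept`/`send` methods, and the client IDs are hypothetical, and a plain list stands in for a real WebSocket connection. A replica can only deliver to clients in its own map, so a message handed to the wrong replica goes nowhere:

```python
class Replica:
    """Hypothetical WebSocket replica that only knows its own connections."""

    def __init__(self, name: str) -> None:
        self.name = name
        # client_id -> outbox list standing in for an open WebSocket connection
        self.connections: dict[str, list[str]] = {}

    def accept(self, client_id: str) -> None:
        # Called when a client completes the WebSocket handshake with this replica.
        self.connections[client_id] = []

    def send(self, client_id: str, message: str) -> bool:
        # Succeeds only if this replica holds the client's connection.
        outbox = self.connections.get(client_id)
        if outbox is None:
            return False
        outbox.append(message)
        return True


replica_1 = Replica("replica-1")
replica_2 = Replica("replica-2")
replica_1.accept("client-1")
replica_2.accept("client-2")

print(replica_2.send("client-1", "hello"))  # False: replica 2 has no connection to client 1
print(replica_1.send("client-1", "hello"))  # True: replica 1 owns the connection
```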

Effective Horizontal Scaling Strategy for WebSocket Connections

To route messages correctly in a horizontally scaled WebSocket environment, one solution is to let each WebSocket service replica operate independently while still being able to communicate with the other replicas.

Each replica must be able to decide whether an incoming message concerns one of its own connections. If the message is meant for a client connected to a different replica, the receiving replica needs a way to pass the message on to the correct one. This requires a coordination mechanism among replicas so that every message reaches its intended client, regardless of which replica received it first.

A broadcast mechanism is one practical way to manage message routing among replicas in a horizontally scaled WebSocket service. Here’s how it would work:

1. Broadcast Incoming Messages: Whenever a replica receives a message, it broadcasts the message to all other replicas in the WebSocket service.
2. Evaluate Message Relevance: Each replica that receives a broadcast message checks whether the message is addressed to one of its connected clients.
3. Ignore or Process: If the message is for a client connected to that replica, the replica delivers it. Otherwise, the replica ignores it.
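The three steps can be sketched in a few lines of Python, assuming replicas can call each other directly and each one keeps a dictionary of its connected client IDs. The `Replica` class, `receive_from_backend`, `on_broadcast`, and the client IDs are hypothetical names for illustration, and lists stand in for real WebSocket connections. As an added assumption, the receiving replica also checks its own connections, since the target client may be connected to it:

```python
class Replica:
    """Hypothetical replica that broadcasts every incoming message to its peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: list["Replica"] = []
        # client_id -> outbox list standing in for an open WebSocket connection
        self.connections: dict[str, list[str]] = {}

    def accept(self, client_id: str) -> None:
        self.connections[client_id] = []

    def receive_from_backend(self, client_id: str, payload: str) -> None:
        # Step 1: check locally (assumption), then broadcast to every other replica.
        self.on_broadcast(client_id, payload)
        for peer in self.peers:
            peer.on_broadcast(client_id, payload)

    def on_broadcast(self, client_id: str, payload: str) -> None:
        # Step 2: a dictionary lookup decides relevance.
        outbox = self.connections.get(client_id)
        if outbox is None:
            return  # Step 3: not one of our clients, ignore
        outbox.append(payload)  # Step 3: our client, deliver


replicas = [Replica(f"replica-{i}") for i in (1, 2, 3)]
for r in replicas:
    r.peers = [p for p in replicas if p is not r]
for i, r in enumerate(replicas, start=1):
    r.accept(f"client-{i}")

# The backend targets client-3, but the message lands on replica-1.
replicas[0].receive_from_backend("client-3", "hello")
print({r.name: r.connections for r in replicas})  # only client-3's outbox holds "hello"
```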

This approach delivers every message to the appropriate client, as long as that client is connected to one of the replicas, regardless of which replica initially received it. The trade-off is that every replica receives every message, so network traffic and processing grow with the number of replicas. Each replica therefore needs a cheap filter, such as a lookup in a map of its connected client IDs, to discard irrelevant messages without overloading itself.
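The overhead can be estimated roughly. Assume $N$ replicas, $M$ messages per second across the whole service, and each message addressed to exactly one client that is connected to exactly one replica ($N$ and $M$ are symbols introduced here for illustration). Every message is delivered to every replica, but only one of those copies leads to a delivery:

```latex
\text{messages processed by all replicas per second} = N \cdot M,
\qquad
\text{useful fraction} = \frac{M}{N \cdot M} = \frac{1}{N}
```

For example, with $N = 10$ and $M = 1{,}000$, the replicas together process $10{,}000$ messages per second, of which $1{,}000$ result in a delivery. This is why the per-message relevance check needs to be cheap.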

The following image shows an example implementation of this idea.

This diagram illustrates the process of routing a message from "Another backend service" to Client 3 within a horizontally scaled WebSocket environment utilizing a publish/subscribe (pub/sub) queue system.

Here's a step-by-step breakdown:

1. Initial Message Sending: "Another backend service" sends a message intended for Client 3. One of the WebSocket service replicas receives it, in this case Replica 1.
2. Publish to Pub/Sub Queue: Replica 1 does not hold Client 3's connection, but it must still make sure the message reaches the correct client. It therefore publishes the message to the centralized Pub/Sub Queue.
3. Broadcast to All Replicas: The Pub/Sub Queue delivers the message to every subscribed replica, which in this scenario are Replica 1, Replica 2, and Replica 3.
4. Evaluation by Replicas: Each replica checks whether the message is addressed to any of the clients it manages directly.
   - Replica 1 determines that the message is not for Client 1.
   - Replica 2 determines that the message is not for Client 2.
   - Replica 3 recognizes that the message is for Client 3, which it manages.
5. Message Delivery: Replica 3 accepts the message and forwards it to Client 3.
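The same flow can be sketched with Python's `asyncio`, assuming an in-memory `PubSub` class as a stand-in for a real shared broker (for example, a Redis Pub/Sub channel). `PubSub`, `Replica`, `handle_backend_message`, `consume_one`, and the `client_id`/`payload` message fields are hypothetical. Each replica subscribes with its own queue, so a single publish fans out one copy per replica. A real replica would consume messages in a loop for as long as it runs; `consume_one` handles a single message to keep the example short:

```python
import asyncio


class PubSub:
    """Hypothetical in-memory stand-in for a pub/sub broker."""

    def __init__(self) -> None:
        self.subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    async def publish(self, message: dict) -> None:
        # Fan out: every subscribed replica gets its own copy of the message.
        for queue in self.subscribers:
            await queue.put(message)


class Replica:
    def __init__(self, name: str, broker: PubSub, clients: list[str]) -> None:
        self.name = name
        self.broker = broker
        self.inbox = broker.subscribe()
        # client_id -> outbox list standing in for an open WebSocket connection
        self.connections = {c: [] for c in clients}

    async def handle_backend_message(self, message: dict) -> None:
        # Step 2: publish to the broker instead of delivering directly.
        await self.broker.publish(message)

    async def consume_one(self) -> None:
        # Steps 4 and 5: check relevance, then deliver or ignore.
        message = await self.inbox.get()
        outbox = self.connections.get(message["client_id"])
        if outbox is not None:
            outbox.append(message["payload"])


async def main() -> dict:
    broker = PubSub()
    replicas = [Replica(f"replica-{i}", broker, [f"client-{i}"]) for i in (1, 2, 3)]
    # Step 1: the backend's message for client-3 lands on replica-1.
    await replicas[0].handle_backend_message({"client_id": "client-3", "payload": "hello"})
    # Step 3: each replica consumes its copy from the broker.
    await asyncio.gather(*(r.consume_one() for r in replicas))
    return {r.name: r.connections for r in replicas}


result = asyncio.run(main())
print(result)
```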

This setup routes each message to the correct client even when the WebSocket service runs as several replicas. The pub/sub model decouples sending from receiving: the replica that first receives a message does not need to know where the target client is connected, and each replica acts only on messages for its own clients.

Because each replica discards irrelevant messages with a cheap local check, the cost of the broadcast stays manageable, although every replica still receives every message, so that cost grows as more replicas are added.