Introduction
Building real-time applications that scale to millions of users requires a fundamental shift in how we think about state and connection management. In this post, I'll break down the architecture used to support 10M+ users at Zoho.
The Challenge: Connection Limits
Traditional HTTP request-response cycles are stateless, so any server can handle any request. WebSockets, however, maintain a persistent connection, and each open connection consumes a file descriptor (socket). A single server can therefore hold only as many connections as its file descriptor limit allows.
# Standard limit check
ulimit -n
# Often defaults to 1024; raise to 65535+ for socket servers, e.g.:
ulimit -n 65535
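To put the descriptor limit in context, here is a back-of-the-envelope fleet-size estimate. The reserved-descriptor headroom and the resulting node count are illustrative assumptions, not figures from the actual deployment:

```python
import math

FD_LIMIT = 65_535          # raised per-process descriptor limit
RESERVED_FDS = 1_535       # assumed headroom for log files, Redis, upstream sockets
TARGET_CONNECTIONS = 10_000_000

usable = FD_LIMIT - RESERVED_FDS                 # descriptors left for client sockets
nodes = math.ceil(TARGET_CONNECTIONS / usable)   # minimum socket-server instances
print(nodes)  # → 157
```

In practice you would size well above this floor to absorb node failures and reconnect storms.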
Architecture: The Pub/Sub Layer
To scale horizontally, we cannot rely on sticky sessions alone: a message for a user may originate on a node that does not hold that user's connection. We need a way to broadcast messages across server nodes.
Redis Pub/Sub
We used Redis as a lightweight message broker. When a user is connected to Node A and a message for them originates on Node B, Node B publishes to a Redis channel that Node A is subscribed to.
- Client connects to the Load Balancer
- Load Balancer routes to a Socket Server Instance
- Instance subscribes to the Redis channel user:{userId}
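The routing pattern above can be sketched with an in-memory stand-in for the Redis layer. This is not Redis client code; the MiniBroker class and channel name are illustrative, but the subscribe/publish flow mirrors what the per-user channels do across nodes:

```python
from collections import defaultdict

class MiniBroker:
    """In-memory stand-in for Redis Pub/Sub (illustrative only)."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every node subscribed to this channel
        for cb in self.subscribers[channel]:
            cb(message)

broker = MiniBroker()
received = []

# Node A holds user 42's socket, so it subscribes to that user's channel
broker.subscribe("user:42", lambda msg: received.append(msg))

# Node B holds no socket for user 42; it only publishes to the channel
broker.publish("user:42", "hello from Node B")

print(received)  # → ['hello from Node B']
```

With real Redis, each socket server subscribes only to the channels of users it currently hosts, which keeps per-node subscription state proportional to its own connections.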
Optimization Techniques
- Heartbeat Tuning: Adjusting ping/pong intervals to balance load vs. zombie connection detection.
- Binary Formats: Using Protobuf instead of JSON for high-frequency telemetry data reduced bandwidth by 40%.
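The heartbeat trade-off above can be made concrete with a small sweep. The interval and timeout values, connection ids, and function name here are assumptions for illustration, not the production settings:

```python
PONG_TIMEOUT = 60   # assumed: connection considered dead after this much silence (s)

def find_zombies(last_pong_at, now):
    """Return connection ids whose last pong is older than the timeout."""
    return [cid for cid, ts in last_pong_at.items() if now - ts > PONG_TIMEOUT]

now = 1_000.0
last_pong_at = {
    "conn-a": now - 10,   # healthy: ponged recently
    "conn-b": now - 90,   # zombie: silent past the timeout
    "conn-c": now - 59,   # healthy: just inside the window
}
print(find_zombies(last_pong_at, now))  # → ['conn-b']
```

A shorter timeout reclaims dead sockets (and their file descriptors) faster, but a longer ping interval sends less keepalive traffic per connection; tuning is about balancing those two costs at scale.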
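The size gap between JSON and a binary wire format is easy to demonstrate. The post used Protobuf; as a library-free stand-in, this sketch packs a single hypothetical telemetry sample (field names and types are assumptions) with Python's struct module:

```python
import json
import struct

# A single telemetry sample — field names and widths are illustrative
sample = {"device_id": 1234, "ts": 1700000000, "value": 23.75}

json_bytes = json.dumps(sample).encode()
# Fixed binary layout: u32 device id + u64 timestamp + f32 value = 16 bytes
bin_bytes = struct.pack("<IQf", sample["device_id"], sample["ts"], sample["value"])

print(len(json_bytes), len(bin_bytes))
```

Even this naive encoding drops the payload to 16 bytes by shedding field names and ASCII digits; Protobuf adds varints and field tags on top, and the savings compound at high message frequency.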
Conclusion
Scaling is not just about adding more servers; it's about reducing the state each server needs to hold and ensuring efficient inter-node communication.