A production trading system at 2 AM. The order book for 150 symbols needs updating every 500 milliseconds. The single WebSocket connection that worked flawlessly during development is now dropping messages, timing out, and crashing the entire pipeline. The on-call engineer faces an uncomfortable choice: scale vertically until the server runs out of memory, or redesign the entire subscription architecture from scratch.
This scenario is not hypothetical. It is the inevitable endpoint of every real-time data architecture that starts simple and scales without a plan. The gap between "subscribe to one symbol" and "subscribe to 100 symbols with sub-second latency and zero message loss" is not a configuration change. It is an architectural decision.
This article dissects that decision. We will walk through the evolution from single-connection subscriptions to connection-pooled architectures, analyze the trade-offs at each stage, and provide production-grade code that handles the failure modes you will encounter at scale.
The Core Problem: Why 100 Subscriptions Break Single Connections
Understanding why single connections fail under load requires examining three distinct pressure points: message throughput, backpressure handling, and failure blast radius.
Message Throughput Bottlenecks
A single WebSocket connection operates on a single TCP stream. When you subscribe to 100 symbols and each emits 10 messages per second, the connection must handle 1,000 messages per second on one stream. At 50 bytes per message average, this is 50 KB/s — well within any reasonable network capacity. The bottleneck is not bandwidth. It is the event loop.
WebSocket message processing in most runtimes (Node.js event loop, Python asyncio, JavaScript in the browser) is sequential. Each message must be deserialized, parsed, and dispatched before the next message can be processed. When the event loop blocks on a computationally expensive message (say, a depth snapshot with 50 levels), subsequent messages queue up in the kernel receive buffer. If the buffer overflows, the kernel drops packets, and you lose data.
Backpressure and Flow Control
WebSocket includes a built-in flow control mechanism via the ping/pong frame exchange. A receiver can signal that it is overwhelmed by not sending a pong immediately. However, most client libraries implement this incorrectly or not at all. The result is a one-way valve: the server sends data faster than the client can process it, and the connection eventually stalls or resets.
Failure Blast Radius
This is the most insidious problem. When a single connection drops — due to a network hiccup, a server-side restart, or an unhandled exception in the message handler — you lose all 100 subscriptions simultaneously. The reconnection sequence must resubscribe to every symbol, re-establish the order book state, and catch up on any missed data. During this window, which can last 2–10 seconds depending on the server's subscription throttle limits, your system is blind.
For a trading system, 10 seconds of blind operation across 100 positions is not an edge case. It is a weekly occurrence.
Architecture Evolution: Four Stages
The progression from single connection to production-grade connection pooling follows four distinct architectural stages. Each stage solves the problems of the previous stage while introducing new trade-offs.
Stage 1: Single Connection (Development)
┌─────────────────────────────────────────────────────────┐
│ Application Process │
│ ┌─────────────────────────────────────────────────┐ │
│ │ WebSocket Client │ │
│ │ ┌─────────────────────────────────────────┐ │ │
│ │ │ Subscription Manager │ │ │
│ │ │ - symbol list: [AAPL, TSLA, NVDA...] │ │ │
│ │ │ - message queue (unbounded) │ │ │
│ │ └─────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ single TCP │
└─────────────────────────┼───────────────────────────────┘
│
┌─────▼─────┐
│ TickDB │
│ WebSocket │
└───────────┘
Characteristics:
- One WebSocket connection to the data provider
- All symbol subscriptions multiplexed over the single stream
- Single message queue for the entire application
- Reconnection resubscribes all symbols
When it works: Up to approximately 20–30 symbols with message rates below 5/second per symbol. Development and testing environments.
When it breaks: Beyond 30 symbols or 5 messages/second/symbol, message queue growth becomes unbounded, and the event loop cannot drain messages faster than they arrive.
Stage 2: Connection Multiplexing
┌─────────────────────────────────────────────────────────┐
│ Application Process │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ WS Client #1 │ │ WS Client #2 │ │ WS Client #3 │ │
│ │ symbols: │ │ symbols: │ │ symbols: │ │
│ │ [AAPL-N] │ │ [NVDA-TSLA] │ │ [META-GOOGL] │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│ │ │
TCP #1 TCP #2 TCP #3
│ │ │
┌─────┼────────────────┼────────────────┼─────┐
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Load Balancer │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ TickDB │ │
│ │ WebSocket │ │
│ └───────────┘ │
└───────────────────────────────────────────────┘
Characteristics:
- Multiple WebSocket connections, each subscribing to a subset of symbols
- Load distribution via round-robin or symbol-hash partitioning
- Independent reconnection per connection
- Connection count typically capped by server-side subscription limits
When it works: Up to approximately 100 symbols with message rates below 10/second per symbol. Small-to-medium production deployments.
When it breaks: Static partitioning creates hot connections (high-activity symbols grouped together). Reconnection of one connection still blinds that subset of symbols.
Stage 3: Connection Pooling with Dynamic Scaling
┌─────────────────────────────────────────────────────────┐
│ Connection Pool Manager │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Pool State: │ │
│ │ - connections: Map<id, ConnectionState> │ │
│ │ - symbol_map: Map<symbol, connection_id> │ │
│ │ - load_score: Map<connection_id, float> │ │
│ │ - target_load: 0.7 (configurable) │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PooledConn #1│ │ PooledConn #2│ │ PooledConn #3│ │
│ │ healthy │ │ degraded │ │ spawning │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
Characteristics:
- Pool of WebSocket connections managed as a pool of resources
- Dynamic rebalancing: symbols migrate between connections based on load
- Automatic connection spawning when load exceeds threshold
- Automatic connection teardown during low-activity periods
- Per-connection health monitoring with circuit breaker behavior
When it works: 100–1,000 symbols with variable message rates. Production systems with SLA requirements.
When it breaks: Rebalancing events create momentary subscription churn. Requires careful handling of partial state during symbol migration.
Stage 4: Hierarchical Pool with Fan-Out Aggregation
┌─────────────────────────────────────────────────────────┐
│ Global Aggregation Layer │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Unified Message Bus │ │
│ │ - topic: symbol name │ │
│ │ - subscriber: downstream consumers │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ Zone A │ │ Zone B │ │ Zone C │
│ Pool │ │ Pool │ │ Pool │
│ 10 conn │ │ 10 conn │ │ 10 conn │
└─────────┘ └─────────┘ └─────────┘
Characteristics:
- Multiple connection pools organized by geographic zone or symbol class
- Central aggregation layer normalizes and fans out messages
- Enables cross-zone redundancy and failover
- Supports different subscription tiers per symbol class
When it works: 1,000+ symbols, multi-region deployments, enterprise SLAs requiring 99.99% uptime.
Implementation complexity: High. Requires investment in monitoring, orchestration, and operational tooling.
Production-Grade Implementation
The following implementation covers Stage 3: a connection pool with dynamic scaling. This is the practical sweet spot for most production systems managing 100–1,000 real-time subscriptions.
Core Data Structures
import asyncio
import time
import random
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set
from enum import Enum
import os
logger = logging.getLogger(__name__)
class ConnectionHealth(Enum):
HEALTHY = "healthy"
DEGRADED = "degraded"
RECONNECTING = "reconnecting"
FAILED = "failed"
@dataclass
class ConnectionState:
"""Tracks the state of a single pooled WebSocket connection."""
id: str
websocket: Optional[object] = None
health: ConnectionHealth = ConnectionHealth.FAILED
subscribed_symbols: Set[str] = field(default_factory=set)
load_score: float = 0.0
last_message_ts: float = field(default_factory=time.time)
reconnect_attempts: int = 0
max_reconnect_attempts: int = 10
# Load scoring parameters
messages_per_second: float = 0.0
message_buffer_depth: int = 0
def __post_init__(self):
self.message_timestamps: List[float] = []
def update_load(self, message_count: int, buffer_depth: int):
"""Update load metrics using an exponentially weighted moving average."""
alpha = 0.3 # Smoothing factor
self.messages_per_second = (
alpha * message_count +
(1 - alpha) * self.messages_per_second
)
self.message_buffer_depth = buffer_depth
# Load score: combination of message rate and buffer backlog
self.load_score = (
min(self.messages_per_second / 100, 1.0) * 0.7 +
min(buffer_depth / 1000, 1.0) * 0.3
)
self.last_message_ts = time.time()
@dataclass
class PoolConfig:
"""Configuration for the connection pool."""
min_connections: int = 2
max_connections: int = 10
target_load: float = 0.7 # Scale up when average load exceeds this
scale_up_threshold: float = 0.8
scale_down_threshold: float = 0.3
symbols_per_connection_target: int = 20
health_check_interval: float = 10.0 # seconds
reconnect_base_delay: float = 1.0 # seconds
reconnect_max_delay: float = 60.0 # seconds
# ⚠️ Production note: For HFT workloads with sub-100ms latency requirements,
# consider replacing asyncio with trio or using a compiled language (Rust/Go).
# Python's GIL creates a hard ceiling on message processing throughput.
Connection Pool Manager
class WebSocketConnectionPool:
"""
Manages a pool of WebSocket connections with dynamic scaling.
Design principles:
1. No single connection exceeds its target symbol load
2. Rebalancing happens incrementally to minimize churn
3. Health monitoring triggers reconnection before complete failure
4. Graceful degradation: new subscriptions queued if pool is at capacity
"""
def __init__(
self,
api_key: str,
ws_url: str,
config: Optional[PoolConfig] = None
):
self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
if not self.api_key:
raise ValueError(
"TickDB API key required. Set TICKDB_API_KEY environment variable."
)
self.ws_url = ws_url
self.config = config or PoolConfig()
self.connections: Dict[str, ConnectionState] = {}
self.symbol_to_connection: Dict[str, str] = {}
self.subscription_queue: asyncio.Queue = asyncio.Queue()
self._running = False
self._lock = asyncio.Lock()
# Rate limiting state
self._rate_limit_remaining: int = 100
self._rate_limit_reset: float = 0
async def start(self):
"""Initialize the pool with minimum connections."""
self._running = True
# Spawn initial connections
for i in range(self.config.min_connections):
conn_id = f"conn-{i}"
await self._spawn_connection(conn_id)
# Start background tasks
asyncio.create_task(self._health_monitor())
asyncio.create_task(self._rebalancer())
asyncio.create_task(self._subscription_processor())
logger.info(
f"Connection pool started with {self.config.min_connections} connections"
)
async def _spawn_connection(self, conn_id: str) -> ConnectionState:
"""Create and register a new WebSocket connection."""
conn = ConnectionState(id=conn_id)
self.connections[conn_id] = conn
try:
# WebSocket URL with API key as query parameter (not header)
# This matches TickDB's WebSocket authentication spec
ws_url = f"{self.ws_url}?api_key={self.api_key}"
# Using aiohttp for production WebSocket handling
import aiohttp
async with aiohttp.ClientSession() as session:
conn.websocket = await session.ws_connect(
ws_url,
heartbeat=15, # ping/pong heartbeat interval
timeout=aiohttp.ClientTimeout(total=30)
)
conn.health = ConnectionHealth.HEALTHY
conn.reconnect_attempts = 0
# Start message consumer for this connection
asyncio.create_task(
self._connection_consumer(conn_id, conn.websocket)
)
logger.info(f"Connection {conn_id} established")
except Exception as e:
logger.error(f"Failed to spawn connection {conn_id}: {e}")
conn.health = ConnectionHealth.FAILED
await self._schedule_reconnect(conn_id)
return conn
async def _connection_consumer(self, conn_id: str, websocket):
"""
Dedicated consumer for a single connection's message stream.
Runs in its own task to prevent cross-connection message blocking.
"""
async for msg in websocket:
async with self._lock:
conn = self.connections.get(conn_id)
if not conn or conn.health != ConnectionState.HEALTHY:
break
if msg.type == aiohttp.WSMsgType.PONG:
# Heartbeat acknowledged — connection is alive
continue
elif msg.type == aiohttp.WSMsgType.TEXT:
await self._process_message(conn_id, msg.data)
elif msg.type == aiohttp.WSMsgType.ERROR:
logger.error(f"WebSocket error on {conn_id}: {msg.data}")
async with self._lock:
self.connections[conn_id].health = ConnectionState.RECONNECTING
await self._schedule_reconnect(conn_id)
break
elif msg.type == aiohttp.WSMsgType.CLOSE:
logger.warning(f"Connection {conn_id} closed by server")
async with self._lock:
self.connections[conn_id].health = ConnectionState.RECONNECTING
await self._schedule_reconnect(conn_id)
break
async def _process_message(self, conn_id: str, raw_data: str):
"""Parse and dispatch a single message."""
conn = self.connections.get(conn_id)
if not conn:
return
try:
import json
data = json.loads(raw_data)
# Update load metrics
now = time.time()
conn.message_timestamps.append(now)
# Count messages in last second
recent_count = sum(
1 for ts in conn.message_timestamps
if now - ts < 1.0
)
conn.update_load(recent_count, 0) # Buffer depth tracked separately
# Emit to symbol-specific handlers
symbol = data.get("symbol")
if symbol:
await self._emit_to_symbol_subscribers(symbol, data)
except json.JSONDecodeError as e:
logger.warning(f"Invalid JSON on {conn_id}: {e}")
except Exception as e:
logger.error(f"Message processing error on {conn_id}: {e}")
async def _emit_to_symbol_subscribers(self, symbol: str, data: dict):
"""Fan out message to all subscribers of this symbol."""
# Implementation depends on your downstream consumer pattern
# Could use asyncio.Event, pub/sub, or a message queue
pass
async def _schedule_reconnect(self, conn_id: str):
"""Schedule reconnection with exponential backoff and jitter."""
conn = self.connections.get(conn_id)
if not conn:
return
conn.reconnect_attempts += 1
if conn.reconnect_attempts > conn.max_reconnect_attempts:
logger.error(
f"Connection {conn_id} exceeded max reconnection attempts. "
f"Marking as failed."
)
conn.health = ConnectionState.FAILED
return
# Exponential backoff with jitter
base_delay = self.config.reconnect_base_delay
max_delay = self.config.reconnect_max_delay
delay = min(base_delay * (2 ** conn.reconnect_attempts), max_delay)
jitter = random.uniform(0, delay * 0.1)
total_delay = delay + jitter
logger.info(
f"Scheduling reconnect for {conn_id} in {total_delay:.2f}s "
f"(attempt {conn.reconnect_attempts}/{conn.max_reconnect_attempts})"
)
await asyncio.sleep(total_delay)
# Re-spawn the connection
async with self._lock:
conn.health = ConnectionState.HEALTHY
await self._spawn_connection(conn_id)
async def subscribe(self, symbol: str):
"""Subscribe to a symbol, routing to the least-loaded connection."""
# Queue-based to handle burst subscriptions without blocking
await self.subscription_queue.put(symbol)
async def _subscription_processor(self):
"""
Background task that processes subscription requests from the queue.
Implements batching and load-aware routing.
"""
pending_symbols: List[str] = []
while self._running:
# Collect subscriptions for batch processing (every 100ms)
try:
symbol = await asyncio.wait_for(
self.subscription_queue.get(),
timeout=0.1
)
pending_symbols.append(symbol)
except asyncio.TimeoutError:
pass # Process any pending symbols
if pending_symbols:
await self._route_subscriptions(pending_symbols)
pending_symbols = []
async def _route_subscriptions(self, symbols: List[str]):
"""Route symbols to appropriate connections based on current load."""
for symbol in symbols:
async with self._lock:
# Check if already subscribed
if symbol in self.symbol_to_connection:
continue
# Find least-loaded healthy connection
eligible = [
(cid, conn) for cid, conn in self.connections.items()
if conn.health == ConnectionState.HEALTHY
]
if not eligible:
logger.warning(f"No healthy connections for {symbol}, queuing")
await self.subscription_queue.put(symbol)
continue
# Select connection with lowest load
target_conn_id, target_conn = min(
eligible,
key=lambda x: x[1].load_score
)
# Check if scaling is needed
avg_load = sum(c.load_score for _, c in eligible) / len(eligible)
if (
avg_load > self.config.scale_up_threshold and
len(self.connections) < self.config.max_connections
):
# Scale up: create new connection for this subscription
new_id = f"conn-{len(self.connections)}"
await self._spawn_connection(new_id)
target_conn_id = new_id
logger.info(f"Scaled up: new connection {new_id}")
# Subscribe on target connection
await self._send_subscription(target_conn_id, symbol)
self.symbol_to_connection[symbol] = target_conn_id
target_conn.subscribed_symbols.add(symbol)
logger.debug(
f"Subscribed {symbol} to {target_conn_id} "
f"(load: {target_conn.load_score:.2f})"
)
async def _send_subscription(self, conn_id: str, symbol: str):
"""Send a subscription command to a specific connection."""
conn = self.connections.get(conn_id)
if not conn or not conn.websocket:
raise RuntimeError(f"Connection {conn_id} not available")
subscription_msg = {
"cmd": "subscribe",
"symbol": symbol,
"channel": "depth" # TickDB depth channel for order book
}
await conn.websocket.send_json(subscription_msg)
async def _health_monitor(self):
"""
Periodic health check: detect stale connections and trigger recovery.
"""
while self._running:
await asyncio.sleep(self.config.health_check_interval)
async with self._lock:
for conn_id, conn in self.connections.items():
now = time.time()
stale_seconds = now - conn.last_message_ts
# Connection with no messages for >30 seconds is suspicious
if stale_seconds > 30 and conn.health == ConnectionState.HEALTHY:
logger.warning(
f"Connection {conn_id} stale ({stale_seconds:.0f}s "
f"since last message)"
)
conn.health = ConnectionState.DEGRADED
# Health check: send ping and expect pong within 5 seconds
if conn.websocket and conn.health in (
ConnectionState.HEALTHY,
ConnectionState.DEGRADED
):
try:
await asyncio.wait_for(
conn.websocket.ping(),
timeout=5.0
)
except asyncio.TimeoutError:
logger.warning(
f"Ping timeout on {conn_id}, scheduling reconnect"
)
conn.health = ConnectionState.RECONNECTING
asyncio.create_task(self._schedule_reconnect(conn_id))
async def _rebalancer(self):
"""
Periodically rebalances symbol assignments to even out load.
Runs every 60 seconds to avoid excessive churn.
"""
while self._running:
await asyncio.sleep(60)
async with self._lock:
await self._execute_rebalance()
async def _execute_rebalance(self):
"""Move symbols from high-load to low-load connections."""
if len(self.connections) < 2:
return
# Calculate average load
loads = [
(cid, conn.load_score, len(conn.subscribed_symbols))
for cid, conn in self.connections.items()
if conn.health == ConnectionState.HEALTHY
]
if not loads:
return
avg_load = sum(l[1] for l in loads) / len(loads)
# Find overloaded and underloaded connections
overloaded = [
(cid, score, count) for cid, score, count in loads
if score > avg_load * 1.3 and count > 1
]
underloaded = [
(cid, score, count) for cid, score, count in loads
if score < avg_load * 0.7
]
for hot_id, hot_score, hot_count in overloaded:
for cold_id, _, cold_count in underloaded:
if hot_count <= self.config.symbols_per_connection_target:
break
# Migrate one symbol from hot to cold
hot_conn = self.connections[hot_id]
symbol_to_move = next(iter(hot_conn.subscribed_symbols))
hot_conn.subscribed_symbols.remove(symbol_to_move)
del self.symbol_to_connection[symbol_to_move]
await self._send_subscription(cold_id, symbol_to_move)
cold_conn = self.connections[cold_id]
cold_conn.subscribed_symbols.add(symbol_to_move)
self.symbol_to_connection[symbol_to_move] = cold_id
logger.info(
f"Rebalanced {symbol_to_move} from {hot_id} to {cold_id}"
)
break # One migration per rebalance cycle
async def unsubscribe(self, symbol: str):
"""Unsubscribe from a symbol."""
async with self._lock:
conn_id = self.symbol_to_connection.get(symbol)
if not conn_id:
return
conn = self.connections.get(conn_id)
if conn and conn.websocket:
await conn.websocket.send_json({
"cmd": "unsubscribe",
"symbol": symbol
})
conn.subscribed_symbols.discard(symbol)
del self.symbol_to_connection[symbol]
async def stop(self):
"""Graceful shutdown: close all connections and cancel tasks."""
self._running = False
async with self._lock:
for conn_id, conn in self.connections.items():
if conn.websocket:
await conn.websocket.close()
logger.info("Connection pool stopped")
Usage Example
import asyncio
import os
async def main():
# Initialize pool with TickDB WebSocket endpoint
pool = WebSocketConnectionPool(
api_key=os.environ.get("TICKDB_API_KEY"),
ws_url="wss://api.tickdb.ai/v1/ws/market",
config=PoolConfig(
min_connections=3,
max_connections=8,
target_load=0.6,
symbols_per_connection_target=25
)
)
await pool.start()
# Subscribe to a basket of tech stocks
symbols = [
"AAPL.US", "MSFT.US", "GOOGL.US", "NVDA.US", "TSLA.US",
"META.US", "AMZN.US", "AMD.US", "INTC.US", "CRM.US",
"ORCL.US", "ADBE.US", "NFLX.US", "PYPL.US", "COIN.US"
]
for symbol in symbols:
await pool.subscribe(symbol)
# Keep running — messages processed in background
await asyncio.Event().wait()
if __name__ == "__main__":
asyncio.run(main())
Performance Trade-offs: Memory, CPU, and Connection Count
Designing a connection pool requires explicit decisions about where to accept trade-offs. There is no universally optimal configuration. The table below summarizes the trade-offs at each dimension.
| Dimension | Low connection count | High connection count |
|---|---|---|
| Memory per connection | Higher (larger message buffers) | Lower (smaller per-connection buffers) |
| CPU overhead | Lower (fewer event loop tasks) | Higher (more context switching) |
| Failure blast radius | Larger (more symbols per connection) | Smaller (fewer symbols per connection) |
| Server-side limits | Safer (fewer open connections) | Risk of hitting provider limits |
| Rebalancing cost | Lower frequency | Higher frequency (more connections) |
| Recommended for | Stable, predictable symbol sets | Dynamic, bursty subscription patterns |
Sizing Formula
A practical starting point for connection pool sizing:
optimal_connections = ceil(
total_symbols / symbols_per_connection_target
) + failover_buffer
Where:
total_symbols: Maximum number of concurrent subscriptionssymbols_per_connection_target: Target symbols per connection (typically 20–50 depending on message rate)failover_buffer: 1–2 connections for redundancy (handles one connection failure without rebalancing)
For 150 symbols with 25 symbols/connection target and 1 failover buffer:
optimal_connections = ceil(150 / 25) + 1 = 7 + 1 = 8 connections
Memory Per Connection
Each WebSocket connection maintains:
- TCP receive buffer (typically 16–64 KB)
- WebSocket frame buffer (varies by library)
- Application-level message queue (unbounded in naive implementations)
A well-designed pool caps the per-connection queue depth and drops or backpressures when the limit is exceeded. The recommended maximum queue depth is 1,000 messages per connection. Beyond this, the memory growth becomes unbounded during processing spikes.
CPU Considerations
Python's asyncio event loop is single-threaded. Every await point is a context switch, but the GIL means only one task executes Python bytecode at a time. For message processing rates above 5,000 messages/second, Python's asyncio becomes a bottleneck.
Warning: If your use case requires sub-50ms processing latency at high message rates (50+ symbols, 20+ messages/second/symbol), consider:
- Offloading parsing to C extension: Use
orjsonfor JSON parsing (10x faster than stdlib) - Process-per-connection: Run each connection in its own process with shared memory for state
- Compiled language rewrite: Rust with
tokioor Go with channels
TickDB Integration: Depth Channel at Scale
For order book depth data specifically, TickDB's WebSocket API supports the depth channel with up to 10 levels of order book depth per symbol. At scale, this creates a specific optimization opportunity: depth deltas.
The full depth snapshot (L1–L10) on every update is expensive. A 10-level depth update at 10 updates/second across 100 symbols generates 10,000 message parse operations per second. For production systems, consider:
- Subscribe at L1 only for high-frequency symbols (hot path)
- Subscribe at L5 or L10 for mid-frequency analysis
- Use the kline channel for lower-frequency strategies that do not need real-time depth
# Example: Tiered subscription strategy
async def setup_tiered_subscriptions(pool: WebSocketConnectionPool):
"""
Tier 1: High-activity symbols — L1 depth, high-frequency updates
Tier 2: Medium-activity symbols — L5 depth, medium-frequency
Tier 3: Low-activity symbols — L1 depth, low-frequency
"""
tier1_symbols = ["BTC.USDT", "ETH.USDT"] # Crypto: high volume
tier2_symbols = ["AAPL.US", "NVDA.US", "TSLA.US"] # High-volume equities
tier3_symbols = ["SPY.US", "QQQ.US"] # ETFs: lower volume
for symbol in tier1_symbols:
await pool.subscribe(symbol, channel="depth", depth_level=1)
for symbol in tier2_symbols:
await pool.subscribe(symbol, channel="depth", depth_level=5)
for symbol in tier3_symbols:
await pool.subscribe(symbol, channel="depth", depth_level=1)
Conclusion
Scaling WebSocket subscriptions from one connection to a dynamic connection pool is not a feature addition. It is an architectural shift that touches every layer of your real-time data system: how messages are routed, how failures are contained, how load is measured, and how the system adapts to changing demand.
The implementation provided in this article covers the core patterns — dynamic scaling, health monitoring, rebalancing, and graceful reconnection — that form the foundation of a production-grade subscription architecture. Adapt it to your specific provider's API semantics, your language runtime's constraints, and your SLA requirements.
The 2 AM incident described at the opening of this article is preventable. It requires investing in the architecture before the crisis, not during it.
Next Steps
If you are building a real-time data pipeline and evaluating data providers, visit tickdb.ai to explore the WebSocket API, check connection limits, and review the depth channel documentation.
If you want to run this connection pool code against TickDB:
- Sign up at tickdb.ai (free, no credit card required)
- Generate an API key in the dashboard
- Set the
TICKDB_API_KEYenvironment variable - Copy the connection pool implementation above and customize the
PoolConfigfor your symbol count
If you are processing high-frequency order book data, consider the tickdb-market-data SKILL on ClawHub, which includes pre-built connection pooling templates optimized for TickDB's depth channel.
If you need institutional-grade data with 10+ years of historical OHLCV for backtesting your strategy before deploying the real-time pipeline, reach out to enterprise@tickdb.ai for volume pricing and SLA guarantees.
This article does not constitute investment advice. Real-time market data systems involve technical complexity; ensure thorough testing in a sandbox environment before production deployment.