The anatomy of a sub-50ms Solana RPC request
Where the milliseconds actually go between your client and a Solana node: connection setup, commitment levels, queueing, and the calls that quietly dominate your p99, plus how we engineer each stage down.
A Solana slot lasts about 400 milliseconds. If your RPC round-trip costs 200 of them, you're reacting to a chain that has already moved on. This post walks the full path of a JSON-RPC request, from your client's socket to a node's bank state and back, and accounts for every place the time goes. Most of it hides where nobody looks.
01 Where latency actually lives
When developers see slow RPC, they usually blame the node. Sometimes they're right. But a request spends its life in six distinct stages, and the node's execution time is frequently the smallest of them:

client                      edge / LB                   solana node
──────                      ─────────                   ───────────
① connect ──────────────▶
  dns + tcp + tls (80–250ms cold,
  ~0ms warm)
② transmit ─────────────▶   ③ queue ────────────────▶
  serialize + send            wait for a worker (0ms … unbounded)
                                                        ④ execute
                                                          read bank state
                                                          (1–30ms typical)
⑥ parse ◀─────────────────────────────────────────────  ⑤ respond
  deserialize JSON                                        serialize + send

Three of the six stages are negotiable. The handshake disappears if you keep connections alive. Queueing disappears if your provider has capacity and doesn't throttle you into a holding pattern. Execution time collapses if you choose methods the node can answer from hot state. The rest of this post takes each in turn.
02 The wire tax: connections you keep paying for
A fresh HTTPS connection costs one round trip for TCP and at least one more for TLS 1.3. From a client 20ms away from the endpoint, that's 40 to 60ms gone before the node has seen a single byte of your request. That alone blows the entire latency budget. The highest-impact change most teams can make is embarrassingly mundane: stop opening connections.
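The arithmetic above can be sketched in a few lines. This is a rough model, assuming the cold cost is DNS time plus whole round trips; `coldConnectMs` and its parameters are hypothetical names chosen for illustration, not part of any library.

```javascript
// Rough cold-connection cost before the node sees the first request byte.
// Assumption: TCP costs one round trip; TLS 1.3 adds one more, TLS 1.2 adds two.
function coldConnectMs(rttMs, { tlsRoundTrips = 1, dnsMs = 0 } = {}) {
  const tcpRoundTrips = 1;
  return dnsMs + rttMs * (tcpRoundTrips + tlsRoundTrips);
}

console.log(coldConnectMs(20));                       // 40 (TCP + TLS 1.3)
console.log(coldConnectMs(20, { tlsRoundTrips: 2 })); // 60 (TCP + TLS 1.2)
```

On a warm, reused connection both terms drop out, which is why connection reuse is usually the first thing to check.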
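First, measure it. curl's -w timing variables (time_namelookup for DNS, time_connect for TCP, time_appconnect for TLS, time_total for the whole request) will itemize the bill for you.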
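The same breakdown can be taken from inside a Node process by timestamping socket events. The sketch below is one way to do it under stated assumptions: `timeRpc` and `breakdown` are hypothetical helpers, the labels `dns`, `tcp`, `tls`, `ttfb`, and `total` are chosen here for readability, and `getSlot` is used only because it is a cheap method.

```javascript
const https = require("node:https");

// Convert absolute timestamps (ms) into per-phase durations.
function breakdown(t) {
  const lookup = t.lookup ?? t.start; // no "lookup" event when the host is an IP literal
  return {
    dns: lookup - t.start,
    tcp: t.connect - lookup,
    tls: t.secureConnect - t.connect,
    ttfb: t.response - t.secureConnect,
    total: t.end - t.start,
  };
}

// Time one JSON-RPC call on a deliberately fresh connection.
function timeRpc(url, method = "getSlot") {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method, params: [] });
    const t = { start: performance.now() };
    const req = https.request(url, {
      method: "POST",
      agent: false, // new socket every time, so the handshake is included
      headers: {
        "content-type": "application/json",
        "content-length": Buffer.byteLength(body),
      },
    }, (res) => {
      t.response = performance.now(); // response headers arrived
      res.resume();                   // drain the body
      res.on("end", () => { t.end = performance.now(); resolve(breakdown(t)); });
    });
    req.on("socket", (socket) => {
      socket.once("lookup", () => { t.lookup = performance.now(); });
      socket.once("connect", () => { t.connect = performance.now(); });
      socket.once("secureConnect", () => { t.secureConnect = performance.now(); });
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

Calling `timeRpc("https://your-endpoint.example")` (a placeholder URL) resolves to an object of millisecond durations for that one request.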
If tls dominates total, your client is paying the handshake tax on every call. In Node, the fix is a single long-lived dispatcher:
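A minimal sketch of the idea, using Node's built-in `https.Agent` with keep-alive rather than an undici dispatcher. The pool size of 16 and the `rpcCall` helper are assumptions for illustration; the point is only that the agent is created once per process and shared by every call.

```javascript
const https = require("node:https");

// Created once at module scope and reused, so sockets (and their TLS
// sessions) survive between requests instead of being rebuilt each time.
const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 16 });

// Hypothetical helper: POST one JSON-RPC request through the shared agent.
function rpcCall(url, method, params = []) {
  const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method, params });
  return new Promise((resolve, reject) => {
    const req = https.request(url, {
      method: "POST",
      agent: keepAliveAgent,
      headers: {
        "content-type": "application/json",
        "content-length": Buffer.byteLength(body),
      },
    }, (res) => {
      let text = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { text += chunk; });
      res.on("end", () => {
        try { resolve(JSON.parse(text)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

Only the first call on each socket pays the handshake; later calls through the same agent reuse an idle connection when one is available.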
Libraries like @solana/web3.js reuse connections internally if you reuse the Connection object. The classic mistake is constructing a new client per request inside a serverless handler. Every invocation pays the cold tax, and your p50 quietly triples.
03 Commitment is a freshness knob
Every read method accepts a commitment level, and it changes which bank state answers your query. It doesn't change how fast the node runs, just how old the data you get back is allowed to be. From freshest to most settled, the levels are processed, confirmed, and finalized.
The latency angle: asking for finalized when confirmed would do adds up to 13 seconds of data staleness to every decision your system makes, which dwarfs anything you'll ever save optimizing the wire. We see well-built bots running confirmed reads with processed pre-checks. The fast path speculates, the slow path verifies.
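To make the staleness figure concrete, here is a small sketch. It assumes finalized trails the tip by roughly 32 slots (an approximation, not a guarantee) and uses the 400 ms slot time quoted earlier; `stalenessMs` and `accountInfoRequest` are hypothetical helper names.

```javascript
const SLOT_MS = 400; // approximate slot duration

// Assumption: data that is N slots behind the tip is about N * 400 ms old.
function stalenessMs(slotsBehind) {
  return slotsBehind * SLOT_MS;
}

// Build a getAccountInfo request with an explicit commitment level.
function accountInfoRequest(pubkey, commitment = "confirmed") {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "getAccountInfo",
    params: [pubkey, { commitment, encoding: "base64" }],
  };
}

console.log(stalenessMs(32)); // 12800 ms, roughly the 13 seconds mentioned above
```

A fast path might send `accountInfoRequest(key, "processed")` as a pre-check and follow it with the default `confirmed` read before acting.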
04 The calls that eat your p99
Not all methods are priced equally by reality. getBalance is a hash-map lookup. getProgramAccounts against the SPL Token program is a scan over hundreds of millions of accounts. Same API, same HTTP status code, four orders of magnitude apart.
Taming getProgramAccounts
If you must call gPA, give the node every chance to skip work. dataSize narrows the scan to accounts of one exact size; memcmp filters on bytes at a fixed offset; dataSlice shrinks the response. Here's the canonical “find token accounts for a mint” query done properly:
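One way to write that query as a raw JSON-RPC payload is sketched below. It relies on the standard SPL Token account layout (165 bytes, mint at offset 0, owner at offset 32, amount at offset 64); the helper name `tokenAccountsForMintRequest` and the choice to slice out only owner and amount are assumptions for illustration.

```javascript
const TOKEN_PROGRAM_ID = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA";

// Build a getProgramAccounts request for all token accounts of one mint.
function tokenAccountsForMintRequest(mintBase58) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "getProgramAccounts",
    params: [
      TOKEN_PROGRAM_ID,
      {
        commitment: "confirmed",
        encoding: "base64",
        filters: [
          { dataSize: 165 },                            // token accounts only
          { memcmp: { offset: 0, bytes: mintBase58 } }, // mint is the first 32 bytes
        ],
        // Return only owner (32 bytes) + amount (8 bytes), not all 165 bytes.
        dataSlice: { offset: 32, length: 40 },
      },
    ],
  };
}
```

POST the result of `tokenAccountsForMintRequest(mint)` as the request body; each returned account then carries 40 bytes of data instead of the full record.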
Beyond gPA, the usual suspects in a slow trace:
- getBlock with full transaction detail: megabytes of JSON per call. Ask for signatures detail or use maxSupportedTransactionVersion and slice what you need.
- getSignaturesForAddress on hot addresses: paginating a DEX program's history through RPC is archaeology with a teaspoon. Historical questions belong in datasets, not request loops.
- Polling anything: if you call the same method every 400ms to detect change, you have reinvented a push feed, badly. That workload is what gRPC streams are for.
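Two of these have lighter alternatives that fit in a request payload. The sketch below shows a getBlock call that asks for signatures only, and an accountSubscribe message that replaces a polling loop on one account over WebSocket. The helper names are hypothetical, and the `confirmed` commitment is an assumed default.

```javascript
// getBlock with signatures only: no full transaction bodies, no rewards.
function lightBlockRequest(slot) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "getBlock",
    params: [slot, {
      encoding: "json",
      transactionDetails: "signatures",
      rewards: false,
      maxSupportedTransactionVersion: 0,
      commitment: "confirmed",
    }],
  };
}

// WebSocket message that subscribes to changes on one account,
// so the client is notified instead of polling.
function accountSubscribeMessage(pubkey) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "accountSubscribe",
    params: [pubkey, { encoding: "base64", commitment: "confirmed" }],
  };
}
```

The first payload goes over HTTP like any other call; the second is sent as a JSON string over an open WebSocket connection to the endpoint.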
05 Queueing: the silent killer
Here's the part of the request path nobody benchmarks: what happens between your packet arriving at the provider's edge and the node beginning to execute it. Under load, oversubscribed providers make a choice. They can reject excess requests with a 429, or queue them so success rates look clean while latency quietly bleeds. Queueing is invisible in uptime dashboards and very visible in your fill prices.
Your users don't experience your median. They experience your p99, at the worst possible moment, when everyone else is also hammering the chain.
This is why we publish percentiles instead of averages. It's also why you should measure them yourself; the harness is twenty lines:
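A harness along those lines might look like the sketch below. It assumes Node 18+ for the global `fetch`, sends requests sequentially, and uses the nearest-rank definition of a percentile; `measure`, `percentile`, the `getSlot` method, and the sample count are all illustrative choices rather than a prescribed setup.

```javascript
// Nearest-rank percentile over latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Send n sequential requests and report latency percentiles.
async function measure(url, { method = "getSlot", n = 200 } = {}) {
  const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method, params: [] });
  const samples = [];
  for (let i = 0; i < n; i++) {
    const t0 = performance.now();
    const res = await fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body,
    });
    await res.json(); // include parse time, as the client experiences it
    samples.push(performance.now() - t0);
  }
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

The first sample includes the cold handshake, so either discard it or keep it deliberately, depending on whether your production clients start cold.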
06 How we engineer the path
Everything above applies to any provider. Here's what we specifically do with the stages we control:
- No artificial queues. Flat pricing with no request caps means we never have to slow-walk your traffic to protect a metering tier. Capacity planning is our problem, by design.
- Heavy methods run on segregated pools. A whale calling unfiltered gPA lands on scan-optimized nodes and can't head-of-line-block your getLatestBlockhash.
- HTTP/2 at the edge, hot state at the node. Handshakes are amortized, and the methods that dominate real traffic are answered from memory-resident bank state.
- Push over poll wherever possible. Our WebSocket and Yellowstone gRPC endpoints exist precisely so your hot path never contains a polling loop.
Note the last row: even filtered, gPA costs about 8× a state read at p50. No provider engineering makes a scan free. The honest fix is to move standing queries off the request path entirely, which is the subject of our streaming field guide.
07 A latency checklist
Before you file a ticket that says “RPC is slow,” run down this list:
- One long-lived client per process, never per request. Verify with the curl breakdown.
- Deploy in the same region as your endpoint. Physics is undefeated; 120ms of RTT can't be engineered away downstream.
- Use confirmed by default. Escalate to finalized only where settlement demands it.
- Filter every gPA with dataSize + memcmp, and question why it's on the hot path at all.
- Batch point reads with getMultipleAccounts instead of fanning out singles.
- Replace polling loops with subscriptions. WebSocket for convenience, gRPC for the last milliseconds.
- Measure p99 from production conditions, continuously, not once during vendor selection.
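The batching item in the list above can be sketched as follows. getMultipleAccounts accepts at most 100 public keys per call, so a longer list has to be split; `chunk` and `multipleAccountsRequests` are hypothetical helper names, and `confirmed` is an assumed default.

```javascript
const MAX_ACCOUNTS_PER_CALL = 100; // getMultipleAccounts limit per request

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Turn N point reads into ceil(N / 100) getMultipleAccounts requests.
function multipleAccountsRequests(pubkeys, commitment = "confirmed") {
  return chunk(pubkeys, MAX_ACCOUNTS_PER_CALL).map((keys, i) => ({
    jsonrpc: "2.0",
    id: i,
    method: "getMultipleAccounts",
    params: [keys, { commitment, encoding: "base64" }],
  }));
}
```

For 250 accounts this yields three requests instead of 250 individual getAccountInfo calls.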
A sub-50ms request isn't one trick. It's the absence of a dozen small taxes: handshakes you didn't reuse, queues you didn't see, scans you didn't filter, staleness you didn't choose. Every stage of the path is measurable, and everything measurable is fixable. The numbers in this post run against our public endpoints; we'd genuinely like you to check them.