Solana Geyser Plugin (2026): Build, Deploy & 6 Failure Modes
How the Agave Geyser plugin interface works, a complete Rust build from cargo new to mainnet, and the six production failure modes we have debugged on real customer plugins: variant mismatch, silent panic, startup double-count, and more.
We host Geyser plugins for a living, which means we've debugged more broken ones than we'd like to admit. Variant mismatches caught at 2am. Data-integrity issues traced back to five missing lines of code. And the one that still bothers me to think about: a plugin that silently lost data for two weeks while the validator kept reporting itself as completely healthy. No alerts. No errors. The data just stopped coming. We found it during a routine audit. The variant mismatch is annoying; the silent panic is the one that costs you data. If you read nothing else, read failure mode 3.
01 What the Solana Geyser plugin actually is
A Geyser plugin is a Rust .so (a shared library) that the Agave validator loads at startup and runs inside its own process. It gets callbacks when accounts are written, when transactions confirm, when slots change status. Your code sees state at the moment the runtime commits it, before the RPC layer knows anything happened.
slot fires
  geyser path: your .so runs in-process              < 1ms
  rpc path:    RPC -> JSON/WebSocket -> your app     ~10-150ms
If you've been polling getProgramAccounts at 250ms, or using accountSubscribe over WebSocket, you're on the RPC path. A Geyser plugin bypasses it entirely.
The interface is maintained by Anza (the team that took over Solana Labs' validator work) and ships as agave-geyser-plugin-interface on crates.io, currently at version 3.1.5. The Solana Labs repository was archived January 22, 2025, so before you add a single dependency, point it at anza-xyz/agave, not the old solana-geyser-plugin-interface crate. Nobody is maintaining that anymore. If you want a quick definition before diving in, our Geyser plugin glossary entry covers the basics.
Yellowstone gRPC (what most people in the Solana space loosely call “the streaming gRPC thing”) is itself a Geyser plugin. When you stream from Helius, Alchemy, Triton, or our own gRPC endpoint, the data started as a callback in code structurally identical to what you'd write here.
02 Why it exists
Validators were falling behind consensus because people were using them as databases. A validator fielding thousands of getProgramAccounts calls per second (AMM pools, NFT listings, orderbooks, token holder lists) can't process transactions and answer database queries at the same time. Geyser replaced a failed second-node approach with a simpler idea: instead of a second node playing catch-up, you get a hook inside the first one. Accounts are pushed to your code the moment they're written. Your code decides what to do with them. The validator moves on without waiting.
Magic Eden ran this experiment and measured it. Their engineering team documented switching from RPC-based state tracking to a Geyser pipeline and got “almost 30x faster” end-to-end performance, from chain event to visible UI change. That number comes from operating one of the highest-traffic NFT platforms on Solana, and they published it. We've seen similar jumps on our own validators. Teams come in polling RPC at 500ms and leave with sub-millisecond account updates. The gap is that wide.
03 The trait: nine methods and one change that mattered
The GeyserPlugin trait has nine methods. Realistically you'll implement two or three; the rest have default no-op implementations. Your plugin compiles to a cdylib, exports a C-ABI entry point named _create_plugin, and the validator dlopen()s it at startup.
Solana 1.16 changed &mut self to &self on every callback except on_load and on_unload. Before that, every callback acquired an exclusive write lock on your plugin struct—a serious bottleneck under load. This is why you'll see old plugins using RefCell for interior mutability. Today, every field your plugin touches from callback code must be thread-safe: Mutex, RwLock, AtomicU64, channels. RefCell and Cell won't compile.
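To make the &self constraint concrete, here is a minimal std-only sketch of plugin state that can be mutated from a shared reference. PluginState, record, and the field names are hypothetical and not part of the interface crate; the point is only that an atomic and a Mutex work from &self across threads, where a RefCell would not satisfy the Send + Sync bound the trait requires.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical plugin state; the names are illustrative only.
pub struct PluginState {
    /// Lock-free counter, safe to bump from any callback thread.
    updates_seen: AtomicU64,
    /// Shared map behind a mutex. A RefCell here would make the struct !Sync.
    last_slot_by_key: Mutex<HashMap<[u8; 32], u64>>,
}

impl PluginState {
    pub fn new() -> Self {
        Self {
            updates_seen: AtomicU64::new(0),
            last_slot_by_key: Mutex::new(HashMap::new()),
        }
    }

    /// Mirrors the shape of a `&self` callback: no exclusive borrow is available.
    pub fn record(&self, pubkey: [u8; 32], slot: u64) {
        self.updates_seen.fetch_add(1, Ordering::Relaxed);
        self.last_slot_by_key.lock().unwrap().insert(pubkey, slot);
    }

    pub fn updates_seen(&self) -> u64 {
        self.updates_seen.load(Ordering::Relaxed)
    }

    pub fn last_slot(&self, pubkey: &[u8; 32]) -> Option<u64> {
        self.last_slot_by_key.lock().unwrap().get(pubkey).copied()
    }
}
```

Wrapped in an Arc, the same PluginState can be handed to several threads that each call record, which is the situation your callbacks are in.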
Lifecycle methods
name() is the only truly required method. Name it after whatever program you're indexing, not “plugin”. “pumpfun-indexer” tells you everything at 3am; “plugin” tells you nothing.
on_load() is where you parse the config, open your database pool, and spawn worker threads. Most guides skip the second parameter: is_reload. When a hot-reload is triggered without a full validator restart, this comes in true. We use it to swap filter configs and reconnect to new database endpoints without downtime.
on_unload() is where things go wrong when teams skip it. Every resource you open in on_load must be closed here, in order, with a timeout. See failure mode #6.
The account callback
update_account() is the one that matters. Agave's own docs say “any delay here may cause the validator to fall behind the network.” That's not a suggestion.
The account argument is a versioned enum. On current Agave that's ReplicaAccountInfoVersions::V0_0_3. Match the wrong variant, the validator starts, loads the plugin, prints no errors, fires nothing. The Rust compiler won't warn you. The match arm is perfectly valid Rust, it just never executes. We come back to this in failure mode #1.
write_version inside is not a per-account counter. It's global: a single atomic counter that increments with every account write anywhere on the validator. Same account written twice in one slot? write_version is the only way to know which came last. slot alone won't tell you.
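A rough sketch of how write_version can break ties, assuming you keep one latest write per pubkey. AccountWrite and LatestWrites are hypothetical types, not interface types, and the sketch ignores forks: it simply prefers the higher (slot, write_version) pair.

```rust
use std::collections::HashMap;

/// Hypothetical owned copy of the fields needed for ordering.
#[derive(Clone, Debug, PartialEq)]
pub struct AccountWrite {
    pub slot: u64,
    pub write_version: u64,
    pub lamports: u64,
}

/// Keeps only the most recent write seen for each pubkey.
#[derive(Default)]
pub struct LatestWrites {
    by_key: HashMap<[u8; 32], AccountWrite>,
}

impl LatestWrites {
    /// Stores `w` unless an equal or newer write is already held.
    /// Returns true if `w` was stored.
    pub fn apply(&mut self, key: [u8; 32], w: AccountWrite) -> bool {
        match self.by_key.get(&key) {
            // Tuple comparison: slot first, write_version breaks ties within a slot.
            Some(cur) if (cur.slot, cur.write_version) >= (w.slot, w.write_version) => false,
            _ => {
                self.by_key.insert(key, w);
                true
            }
        }
    }

    pub fn get(&self, key: &[u8; 32]) -> Option<&AccountWrite> {
        self.by_key.get(key)
    }
}
```

With this shape, two writes to the same account in slot 10 resolve to whichever carries the larger write_version, regardless of the order your worker happens to process them in.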
The txn field inside ReplicaAccountInfoV3 is Option<&SanitizedTransaction>. During startup replay (when is_startup is true) this is always None. Unwrap it without checking and your plugin panics on startup.
Slot status variants
update_slot_status() has more variants than any blog post I've seen actually documents:
- FirstShredReceived: earliest possible signal, before processing even starts
- CreatedBank: execution environment is live for this slot
- Completed: all shreds received, not yet replayed
- Processed: replayed, but 5–10% of these never get confirmed
- Confirmed: supermajority vote; what production indexers actually use
- Rooted: permanent, ~32 seconds of latency for the guarantee
- Dead(String): slot rejected, the string tells you why
The Confirmed vs Processed distinction is real. Processed slots get skipped at a 5–10% rate. If your indexer acts on processed events and the slot gets skipped, you've acted on data that doesn't exist on-chain. Most production indexers wait for confirmed: supermajority guarantee without the 32-second finality wait.
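One way to act only on confirmed data is to hold events per slot and release them when the status arrives. The sketch below uses a local Status enum as a stand-in for the interface's slot status type, and SlotGate is a hypothetical name. It keys purely on which status arrived for which slot, so it does not care about the order statuses show up in.

```rust
use std::collections::HashMap;

/// Local stand-in for the slot statuses discussed above (not the interface type).
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Processed,
    Confirmed,
    Rooted,
    Dead(String),
}

/// Holds events per slot until that slot is confirmed or rejected.
#[derive(Default)]
pub struct SlotGate {
    pending: HashMap<u64, Vec<String>>,
}

impl SlotGate {
    pub fn buffer(&mut self, slot: u64, event: String) {
        self.pending.entry(slot).or_default().push(event);
    }

    /// Returns the events that are now safe to act on (possibly none).
    pub fn on_status(&mut self, slot: u64, status: &Status) -> Vec<String> {
        match status {
            // Supermajority reached: release everything held for this slot.
            Status::Confirmed | Status::Rooted => self.pending.remove(&slot).unwrap_or_default(),
            // Slot rejected: discard, nothing here exists on-chain.
            Status::Dead(_) => {
                self.pending.remove(&slot);
                Vec::new()
            }
            // Processed alone is not enough; keep holding.
            Status::Processed => Vec::new(),
        }
    }
}
```

A real version would also evict slots that never receive a terminal status, for example anything older than the latest rooted slot, so the map cannot grow without bound.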
Feature flags
Return false for what you don't need. The validator genuinely skips those codepaths. Account data notifications are on by default and transaction notifications are off by default; you have to explicitly turn the latter on. A lot of people miss the defaults and then wonder why no transactions arrive, or why CPU usage is higher than expected on a high-throughput program.
04 Building a Solana Geyser plugin in Rust: full tutorial
The plugin below filters by program owner and writes account updates to Postgres. The hot-path/worker channel split is the one part that needs explaining; the rest is boilerplate you set up once.
Scaffold and manifest
crate-type = ["cdylib"] is the line most teams miss on the first try. Without it, cargo build --release produces a regular Rust library the validator can't dlopen(). It needs a dynamic library with a C-ABI entry point. That's what cdylib produces.
Version compatibility is non-negotiable: the agave-geyser-plugin-interface version in your Cargo.toml must match the validator binary's version exactly, built with the same Rust toolchain. A mismatch produces either a silent dlopen() failure or undefined behavior across the FFI boundary. When Agave upgrades, you recompile. No workaround exists.
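As a reference point, a manifest along these lines covers both requirements above. The package name is hypothetical, and the pinned version is the 3.1.5 mentioned earlier; substitute whatever your validator binary actually runs.

```toml
[package]
name = "pumpfun-indexer"   # hypothetical crate name
version = "0.1.0"
edition = "2021"

[lib]
# Produces a shared library the validator can dlopen().
crate-type = ["cdylib"]

[dependencies]
# "=" pins the exact version so cargo cannot drift from the validator's.
agave-geyser-plugin-interface = "=3.1.5"
```

The exact-version pin is a design choice: a caret requirement would let a later cargo update pull a newer interface than the validator you load into.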
The plugin
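The hot-path/worker split can be sketched with the standard library alone. This is not the full plugin: it leaves out the GeyserPlugin trait and Postgres, and HotPath, AccountJob, start, and the Vec sink are hypothetical stand-ins. What it shows is the shape that matters: the callback filters by owner, copies what it needs, enqueues without blocking, and returns, while a worker thread does the slow write.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Owned copy of the fields the worker needs (borrowed data cannot leave the callback).
pub struct AccountJob {
    pub pubkey: [u8; 32],
    pub slot: u64,
    pub write_version: u64,
    pub data: Vec<u8>,
}

pub struct HotPath {
    target_owner: [u8; 32],
    sender: SyncSender<AccountJob>,
    dropped: AtomicU64,
}

impl HotPath {
    /// Stand-in for update_account: filter, copy, enqueue, return. No I/O here.
    pub fn on_account(&self, owner: &[u8; 32], pubkey: &[u8; 32], data: &[u8], slot: u64, write_version: u64) {
        if owner != &self.target_owner {
            return; // not our program, do no further work
        }
        let job = AccountJob { pubkey: *pubkey, slot, write_version, data: data.to_vec() };
        // try_send never blocks; a full queue is counted instead of stalling the validator.
        if self.sender.try_send(job).is_err() {
            self.dropped.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}

/// Spawns the worker. `sink` stands in for the database write.
pub fn start(target_owner: [u8; 32], capacity: usize, sink: Arc<Mutex<Vec<AccountJob>>>) -> (HotPath, JoinHandle<()>) {
    let (sender, receiver) = sync_channel::<AccountJob>(capacity);
    let worker = thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for job in receiver {
            sink.lock().unwrap().push(job);
        }
    });
    (HotPath { target_owner, sender, dropped: AtomicU64::new(0) }, worker)
}
```

Dropping on a full queue is a deliberate trade-off in this sketch: it loses that update but keeps the callback fast. If you cannot afford the loss, size the queue generously and alert on the dropped counter rather than blocking in the callback.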
Build and deploy
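Building is cargo build --release, which for a crate named pumpfun-indexer (a hypothetical name) leaves libpumpfun_indexer.so under target/release on Linux. The validator then needs a JSON config whose libpath points at that file, passed with the --geyser-plugin-config flag. In the sketch below only libpath is read by the validator; the other keys are hypothetical and would be parsed by your own on_load.

```json
{
  "libpath": "/opt/geyser/libpumpfun_indexer.so",
  "target_owner": "<program id to filter on>",
  "postgres_url": "<connection string>"
}
```

Keeping plugin settings in the same file means a single path on the validator command line covers both loading the library and configuring it.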
05 Six failure modes that will ruin your week
We've had support tickets for all of these. Some more than once.
1. The variant mismatch
Your plugin compiles. The validator starts. No errors anywhere. No callbacks fire. Agave 3.x ships ReplicaAccountInfoVersions::V0_0_3. If you're pattern-matching V0_0_2 (which most tutorials still show), the compiler says nothing. The match arm is perfectly valid Rust. It just never executes. You'll spend two hours adding print statements before you find it.
Write a unit test before you deploy: construct ReplicaAccountInfoVersions::V0_0_3 directly, call update_account, assert the channel receiver got a job. If the counter stays at zero, you found the bug on your laptop. That's the right place to find it.
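The failure and the test can both be shown with a local stand-in enum. AccountVersions and Indexer below are hypothetical and carry a bare u64 where the real variants carry account structs. The stale method has the wildcard arm that compiles and never fires; the second method matches every variant explicitly, so an unexpected version becomes an error.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Local stand-in for the interface's versioned enum.
pub enum AccountVersions {
    V0_0_1(u64),
    V0_0_2(u64),
    V0_0_3(u64),
}

pub struct Indexer {
    jobs: Sender<u64>,
}

impl Indexer {
    pub fn new() -> (Self, Receiver<u64>) {
        let (tx, rx) = channel();
        (Self { jobs: tx }, rx)
    }

    /// Buggy shape: handles the old variant and silently ignores everything else.
    pub fn update_account_stale(&self, account: AccountVersions) {
        match account {
            AccountVersions::V0_0_2(lamports) => {
                let _ = self.jobs.send(lamports);
            }
            _ => {} // valid Rust, but nothing happens when V0_0_3 arrives
        }
    }

    /// Explicit arms: an unsupported version surfaces as an error, not as silence.
    pub fn update_account(&self, account: AccountVersions) -> Result<(), String> {
        match account {
            AccountVersions::V0_0_3(lamports) => self.jobs.send(lamports).map_err(|e| e.to_string()),
            AccountVersions::V0_0_1(_) | AccountVersions::V0_0_2(_) => {
                Err("unsupported account info version".to_string())
            }
        }
    }
}
```

The test is then three lines: build a V0_0_3 value, call the callback, and assert the receiver has a job. Run against the stale method, the receiver stays empty and the test fails on your laptop.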
2. The slow callback
Your update_account takes 1.5ms per call instead of 100µs. The validator is still producing blocks. But it's starting to skip slots. A skipped slot means transactions landed and your callback never fired for them. Permanently. You won't see this in your plugin logs (there's nothing to log when a callback doesn't execute). You'll see it as gaps in indexer data, or a trading strategy underperforming by exactly the margin of missed events, discovered weeks later in a post-mortem.
Monitor callback p99 continuously. Not “is the validator running” but callback duration specifically. Different metrics. Only one catches this.
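A minimal sketch of what measuring callback p99 can look like, assuming you wrap the callback body in a timer and report once per interval. LatencyWindow is a hypothetical helper; it uses the nearest-rank method and a Mutex for brevity.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Collects callback durations for one reporting interval.
pub struct LatencyWindow {
    samples: Mutex<Vec<Duration>>,
}

impl LatencyWindow {
    pub fn new() -> Self {
        Self { samples: Mutex::new(Vec::new()) }
    }

    pub fn record(&self, d: Duration) {
        self.samples.lock().unwrap().push(d);
    }

    /// Runs `f`, records how long it took, and passes its result through.
    pub fn time<T>(&self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record(start.elapsed());
        out
    }

    /// Nearest-rank p99 over the window; empties the window for the next interval.
    pub fn take_p99(&self) -> Option<Duration> {
        let mut s = std::mem::take(&mut *self.samples.lock().unwrap());
        if s.is_empty() {
            return None;
        }
        s.sort();
        let rank = (s.len() * 99 + 99) / 100; // ceil(0.99 * n)
        Some(s[rank - 1])
    }
}
```

In production you would likely swap the Mutex and Vec for a lock-free histogram so the measurement itself stays off the critical path, but the alert is the same: page when take_p99 crosses your budget.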
3. The silent panic
This is the one I mentioned at the top. GitHub issue #27283, closed as “not planned.” When a Geyser plugin panics inside a callback, the validator keeps reporting itself as healthy. RPC health endpoints return ok. Block production continues. Only that specific callback stops executing, silently.
If notify_transaction panics, you stop receiving transaction notifications. If update_account panics, accounts stop being indexed. The validator has no idea either happened. No alert, no log entry on the validator side. Your indexer looks fine. Trading strategies running off it start underperforming slightly. We had a customer spend twelve days in that loop before we asked them to add a heartbeat counter inside the callback itself.
Mitigations we use in production:
- panic_on_db_errors: true in your PostgreSQL plugin config forces the validator to terminate on errors rather than continue past them.
- std::panic::catch_unwind around your callback body turns panics into Err returns.
- A heartbeat counter emitted from inside the callback: not a generic “is the validator running” check, but something tracking whether this specific callback fired in the last 5 seconds.
Alert on the heartbeat. It's the only signal that catches cases where the first two didn't trigger.
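The catch_unwind and heartbeat mitigations fit in one small wrapper. Heartbeat and guard are hypothetical names, and the sketch assumes the plugin is built with the default panic = "unwind"; with panic = "abort" there is nothing to catch.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Tracks the last time the callback body finished without panicking.
pub struct Heartbeat {
    last_ok_unix_ms: AtomicU64,
    panics: AtomicU64,
}

fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

impl Heartbeat {
    pub fn new() -> Self {
        Self { last_ok_unix_ms: AtomicU64::new(0), panics: AtomicU64::new(0) }
    }

    /// Runs the callback body; a panic becomes an Err instead of unwinding further.
    pub fn guard(&self, body: impl FnOnce()) -> Result<(), String> {
        match catch_unwind(AssertUnwindSafe(body)) {
            Ok(()) => {
                self.last_ok_unix_ms.store(now_ms(), Ordering::Relaxed);
                Ok(())
            }
            Err(payload) => {
                self.panics.fetch_add(1, Ordering::Relaxed);
                let msg = payload
                    .downcast_ref::<&str>()
                    .map(|s| s.to_string())
                    .or_else(|| payload.downcast_ref::<String>().cloned())
                    .unwrap_or_else(|| "non-string panic payload".to_string());
                Err(msg)
            }
        }
    }

    pub fn last_ok_ms(&self) -> u64 {
        self.last_ok_unix_ms.load(Ordering::Relaxed)
    }

    pub fn panics(&self) -> u64 {
        self.panics.load(Ordering::Relaxed)
    }

    /// True if no successful run has been recorded within `max_age_ms` of `at_ms`.
    pub fn is_stale(&self, at_ms: u64, max_age_ms: u64) -> bool {
        at_ms.saturating_sub(self.last_ok_ms()) > max_age_ms
    }
}
```

A monitoring thread can call is_stale with the current time and a 5000 ms budget, and export panics as a counter. The staleness check still fires if the callback simply stops being invoked, which the panic counter alone would miss.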
4. The startup double-count
Validator restarts. Snapshot replay fires update_account with is_startup: true for every account in the snapshot. If your callback doesn't distinguish startup writes from live writes, you re-process everything. For a basic indexer that upserts by pubkey, that's fine. For anything that counts events, computes rolling aggregates, or writes one row per transaction, startup replay is silent corruption.
There's also a second piece that gets teams: during validator startup, account updates arrive for slots N through roughly N+150 before the corresponding slot status notifications are sent (GitHub issue #28871). If your indexer needs a slot notification before finalizing an account write, those 150 slots of account updates will arrive with no slot to attach them to. Buffer them, release only after notify_end_of_startup fires and slot notifications have started coming in.
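Both halves of this failure mode can be handled by one small gate. In the sketch below, Update and StartupGate are hypothetical types. Snapshot replay writes pass straight through as idempotent upserts and are never counted as events; live writes are held until end-of-startup has fired and at least one slot status has been seen.

```rust
/// Hypothetical owned view of an account update.
pub struct Update {
    pub slot: u64,
    pub pubkey: [u8; 32],
    pub is_startup: bool,
}

#[derive(Default)]
pub struct StartupGate {
    end_of_startup: bool,
    slot_status_seen: bool,
    held: Vec<Update>,
    live_events: u64,
}

impl StartupGate {
    fn open(&self) -> bool {
        self.end_of_startup && self.slot_status_seen
    }

    /// Returns the updates that may go downstream now.
    pub fn on_update(&mut self, u: Update) -> Vec<Update> {
        if u.is_startup {
            // Snapshot replay is state, not a new event: upsert it, do not count it.
            return vec![u];
        }
        self.live_events += 1;
        if self.open() {
            vec![u]
        } else {
            self.held.push(u);
            Vec::new()
        }
    }

    /// Call from the end-of-startup notification.
    pub fn on_end_of_startup(&mut self) -> Vec<Update> {
        self.end_of_startup = true;
        self.flush()
    }

    /// Call from the first (and every) slot status notification.
    pub fn on_slot_status(&mut self) -> Vec<Update> {
        self.slot_status_seen = true;
        self.flush()
    }

    pub fn live_events(&self) -> u64 {
        self.live_events
    }

    fn flush(&mut self) -> Vec<Update> {
        if self.open() { std::mem::take(&mut self.held) } else { Vec::new() }
    }
}
```

Only live updates are buffered here, so the held queue stays on the order of the notification gap rather than the size of the snapshot.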
5. The slot ordering trap
SlotStatus::Confirmed and SlotStatus::Processed can arrive in either order. Almost nobody's state machine is built for it, and when it happens it's intermittent enough that you might not catch it for weeks.
From Agave 3.0 onward, notify_transaction is guaranteed to arrive before SlotStatus::Processed for the same slot. Before 3.0, either could arrive first. Use write_version for ordering account writes within a slot. Use SlotStatus::Rooted for permanent finality. Don't build ordering logic on slot status arrival sequence.
6. Missing on_unload
Your plugin has a worker pool. on_unload is a no-op. The validator restarts for a routine Agave bump. The sender drops while the worker is mid-INSERT. Some rows commit, some don't, you can't tell which.
We've had teams find this months after deploy, during an audit, staring at half-written account state they couldn't explain. Every resource opened in on_load must be released in on_unload, in order, with a timeout.
Drop the sender before shutting down the runtime. The worker needs to see the channel close and finish its in-flight writes before the async runtime disappears underneath it. Get the order wrong and you're back to partial writes.
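The ordering can be sketched with a plain thread instead of an async runtime. Worker and unload are hypothetical names, and the counter stands in for the INSERT. The sender is dropped first so the worker sees the channel close, then the shutdown waits a bounded time for the worker to report that it drained everything.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::Duration;

pub struct Worker {
    /// Option so unload can take and drop the sender explicitly.
    sender: Option<Sender<u64>>,
    /// The worker reports how many jobs it wrote once the channel closes.
    done: Receiver<usize>,
}

impl Worker {
    pub fn start() -> Self {
        let (sender, jobs) = channel::<u64>();
        let (done_tx, done) = channel::<usize>();
        thread::spawn(move || {
            let mut written = 0usize;
            for _job in jobs {
                written += 1; // stand-in for the database write
            }
            // Reached only after every queued job has been handled.
            let _ = done_tx.send(written);
        });
        Self { sender: Some(sender), done }
    }

    /// Returns false once the worker has been unloaded.
    pub fn submit(&self, job: u64) -> bool {
        self.sender.as_ref().map(|s| s.send(job).is_ok()).unwrap_or(false)
    }

    /// Some(rows) on a clean drain, None if the worker did not finish in time.
    pub fn unload(&mut self, timeout: Duration) -> Option<usize> {
        drop(self.sender.take()); // 1. close the channel so the worker can finish
        self.done.recv_timeout(timeout).ok() // 2. wait, but never forever
    }
}
```

A None from unload is worth logging loudly: it means the process is about to exit with writes still in flight, which is exactly the state you would otherwise discover in an audit.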
06 Testing before you touch mainnet
Most teams compile, copy the .so to the validator, watch logs for ten minutes, and ship it. The first time we hit failure mode #1 on a production validator (plugin loaded cleanly, zero callbacks firing, no errors anywhere), we stopped doing that. Every plugin now goes through three checks:
- Unit test the callback. Construct ReplicaAccountInfoVersions::V0_0_3 directly in a test, call update_account, assert the channel receiver got a job. Twenty minutes to write. Catches variant mismatches on your laptop, which is where you want to catch them, not during a weekend deploy.
- Staging validator, 24 hours. Run a non-voting validator on a separate identity keypair that follows mainnet but doesn't vote. Every customer plugin gets a day on it before mainnet. Watch callback p99 the whole time. If it creeps past 500µs, find the cause before that's your production box missing slots.
- Restart soak, 50 times. Restart the staging validator 50 times and check the database after each for partial writes. We've had teams skip this and find the problem three months later during an audit. 50 restarts surfaces it in an afternoon.
If it breaks on any of the three, you have a specific, reproducible failure before it gets anywhere near mainnet.
07 Performance: what healthy actually looks like
08 Do you actually need a Geyser plugin?
Most teams who ask us this don't. Genuinely.
You probably need one if:
- You're running a sniper, MEV searcher, or liquidator where a missed event has a real dollar cost
- You need to write to a backend no managed stream supports (proprietary schema, internal message bus, something custom)
- Your filter logic can't be expressed in Yellowstone's filter language (cross-program predicates, composite discriminator matching)
- You're integrating a validator into a larger custody or exchange system with specific data contracts
- You have a regulatory or data-sovereignty requirement to own the full stack
You probably don't if:
- You haven't tried a managed Yellowstone gRPC stream yet. Start there, honestly.
- You only need account state at human timescales (sub-second WebSocket is fine)
- You're submitting transactions (staked RPC connections, not plugins)
- Updates every second or two are sufficient. WebSocket handles that with zero infrastructure.
We tell people this even when they've already signed up. A managed stream genuinely solves 90% of real-time data problems with a fraction of the complexity. Not sure which side of that line you're on? Read our Yellowstone gRPC vs WebSockets guide before committing to the plugin path.
09 The streaming ecosystem in 2026
10 How teams actually ship a Geyser plugin in 2026
The config restart cost is real and consistently underestimated. On mainnet, a validator restart can take up to an hour: snapshot download, replay, catching up to tip. Hot-reload workarounds exist but require custom implementation to get right. On hosted, a config change is a file upload and a 30-minute staging run.
Self-hosting is the right call when you have a genuine requirement to own the hardware (regulatory, custody, or you're already running a validator for consensus). “We want full control” alone isn't a reason. On hosted plans you own your .so and your config. You just don't carry the hardware cost or the on-call burden.
The short version: a Solana Geyser plugin is in-process on the validator. The RPC is not. Write the filter in the callback, do the I/O in a worker. The six things that break it: variant mismatch, slow callback, silent panic, startup double-count with notification gap, slot ordering assumptions, missing on_unload. 90% of builders should start with a managed gRPC stream. The 10% who need custom consumer logic should host a plugin, not self-host a validator.
Not sure which camp you're in? The Yellowstone gRPC vs WebSockets guide answers it in one read.