Solana historical blocks: raw getBlock archive, slot 0 to last UTC midnight
Every Solana block, raw getBlock JSON, slot 0 to last UTC midnight. Pick a slot range, pay once, rsync the tar.zst onto your own disks.
Replay any chain history, train ML models on raw transactions, run your own indexer. Bandwidth is on us. Daily updates.
Buy a historical-blocks dataset
Pick a range, drop in your details, pay with card. Download links arrive within 24 hours.
- Vote program (validator votes per slot), Vote111111111111111111111111111111111111111. Vote transactions dominate raw block volume. They're included by default and can be filtered out at parse time.
- System program (transfers and account ops), 11111111111111111111111111111111. Every transfer, allocate, assign, and advance_nonce captured exactly as the validator recorded it.
- BPF Loader Upgradeable (program deploys), BPFLoaderUpgradeab1e11111111111111111111111. Every program deploy and upgrade in the archive. Useful for security and program-version research.
What's in every bundle
raw getBlock JSON, status meta, rewards, manifest, and a Parquet index
| Component | Description | Volume |
|---|---|---|
| block.json (per slot) | Full Solana getBlock response for a single slot. Transactions, instructions, inner instructions, status, balances, rewards, blockhash, parent slot. | Very high |
| transactionStatusMeta | Inner-instruction trees, log messages, pre/post token balances, compute units consumed, returned data. The richest part of a block dump. | Very high |
| rewards | Per-slot validator rewards: vote, stake, fee, rent. Useful for staking-rewards research and validator economics analysis. | High |
| block_metadata | Leader pubkey, parent slot, block height, block_time. Joined at the top of every block.json for indexing. | High |
| manifest.json | Per-bundle manifest with start_slot, end_slot, file count, total_size, and SHA-256 per file. Drives any reproducibility check. | Low |
| index.parquet | Optional Parquet index of slot, block_time, signature_count per file. Speeds up “find the block at this timestamp” without unzipping. | Low |
Archive scale and tier sizing
last reviewed 2026-04-29
When raw beats parsed
Most teams should buy parsed data. The trading-datasets product covers DEX trades, pool events, mints, and transfers for 40+ programs without the decoder project. If your job is “give me Raydium swaps,” that's the right path.
Raw blocks are the answer when the parsed view doesn't carry what you need. A few cases that show up:
- Custom decoders for programs we don't cover. You don't want to wait on us to onboard a long-tail program; you want the bytes.
- Log-message research. Some bugs show up in log_messages with no instruction-level signal. The parsed view discards these by design; the raw view keeps them.
- Transaction-shape ML. Models that train on the structure of the transaction (account positions, instruction order, inner-instruction tree) need the raw shape. A parsed event is too lossy.
- Audit and replay. Anything that requires “the exact bytes the validator saw” for legal, forensic, or post-incident analysis. Raw blocks are tamper-evident through the SHA-256 manifest.
- Custom indexer projects. Teams running a Geyser plugin in production often want to backfill the indexer from a slot range before going live. Raw blocks into the same indexer code is the cleanest path.
If none of these apply, parsed is faster and cheaper. If even one applies, raw is the only acceptable input.
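For the log-message case, a minimal sketch of a scan over one decoded block. It assumes the block was fetched with the "json" encoding, so each transaction is an object with a signatures list, and it reads logs from meta.logMessages, the field name in the getBlock response. The function name find_log_matches is ours, not part of any shipped tooling.

```python
import re
from typing import Iterator, Tuple


def find_log_matches(block: dict, pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield (first_signature, log_line) for every log line matching `pattern`.

    Assumes "json" encoding: tx["transaction"] is a dict with "signatures".
    """
    regex = re.compile(pattern)
    for tx in block.get("transactions") or []:
        meta = tx.get("meta") or {}
        signatures = tx.get("transaction", {}).get("signatures") or [""]
        # logMessages can be null when log recording was disabled for the slot.
        for line in meta.get("logMessages") or []:
            if regex.search(line):
                yield signatures[0], line
```

Run it per block file and collect the hits; the first signature is enough to look the transaction up again later.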
How a bundle actually ships
You request a slot range or a tier. We resolve a signed download URL and email the link plus a manifest summary. The link points at a tar.zst, typically a few hundred GB to a few TB. Inside, files are flat by slot:
    ./
      block_240000000.json
      block_240000001.json
      block_240000002.json
      ...
      block_240432000.json
      manifest.json
      index.parquet
Each block_*.json is the unmodified Solana getBlock response: transactions, inner instructions, status meta, balances, rewards, parent slot, blockhash, leader. Vote transactions are kept; drop them in your parser if you don't want them.
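Dropping votes in your parser can be as small as the sketch below. It assumes the "json" encoding (accountKeys as a list of base58 strings, instructions carrying programIdIndex) and treats a transaction as a vote when every top-level instruction targets the Vote program. That definition is our simplification; tighten it if your pipeline needs the validator's exact notion of a simple vote.

```python
VOTE_PROGRAM_ID = "Vote111111111111111111111111111111111111111"


def is_vote_only(tx: dict) -> bool:
    """True when every top-level instruction invokes the Vote program."""
    message = tx["transaction"]["message"]
    keys = message["accountKeys"]
    instructions = message["instructions"]
    if not instructions:
        return False
    return all(keys[ix["programIdIndex"]] == VOTE_PROGRAM_ID for ix in instructions)


def drop_vote_transactions(block: dict) -> dict:
    """Return a shallow copy of the block without vote-only transactions."""
    kept = [tx for tx in block.get("transactions") or [] if not is_vote_only(tx)]
    return {**block, "transactions": kept}
```

The original block dict is left untouched, so the same loaded JSON can still feed a votes-included path.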
The manifest is a JSON object: an array of { name, sha256, size } entries, one per file, plus top-level start_slot, end_slot, a source_node identifier, and a signing key fingerprint. Verify the archive on download with a 50-line script (the Rust example above is the full version) before any pipeline runs against it. Cheap insurance.
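A rough Python sketch of that check against an extracted bundle directory. The per-file fields name, sha256, and size come from the manifest description; the key that holds the array ("files" here) and size being in bytes are assumptions, so adjust them to the manifest you actually receive. The sketch does not check the signing key fingerprint.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large blocks never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: Path, files_key: str = "files") -> List[str]:
    """Return a list of problems; an empty list means every file checked out.

    `files_key` is a hypothetical name for the per-file array in manifest.json.
    """
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    problems = []
    for entry in manifest[files_key]:
        path = bundle_dir / entry["name"]
        if not path.is_file():
            problems.append(f"missing: {entry['name']}")
        elif path.stat().st_size != entry["size"]:
            problems.append(f"size mismatch: {entry['name']}")
        elif sha256_file(path) != entry["sha256"].lower():
            problems.append(f"sha256 mismatch: {entry['name']}")
    return problems
```

The size comparison runs first because it is nearly free and catches truncated downloads before any hashing starts.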
The optional Parquet index is one row per file: slot, block_time, parent_slot, leader, tx_count, file_name. It lets you find “the block at 2025-09-01 14:00:00 UTC” without opening the tar. Worth keeping.
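The timestamp lookup itself is a binary search. The sketch below assumes you have already loaded the slot and block_time columns into (slot, block_time) pairs with whatever Parquet reader you use, and that block_time is Unix seconds as in getBlock; both are assumptions about the index, not documented guarantees. latest_slot_at is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def latest_slot_at(rows: List[Tuple[int, Optional[int]]], when: datetime) -> Optional[int]:
    """Return the last slot whose block_time is <= `when`, or None if none is.

    `rows` holds (slot, block_time) pairs; rows without a block_time are skipped.
    `when` should be timezone-aware, e.g. built with tzinfo=timezone.utc.
    """
    timed = sorted((t, s) for s, t in rows if t is not None)
    times = [t for t, _ in timed]
    i = bisect_right(times, int(when.timestamp()))
    return timed[i - 1][1] if i else None
```

For example, latest_slot_at(rows, datetime(2025, 9, 1, 14, 0, tzinfo=timezone.utc)) answers the question quoted above. Pair the result with file_name from the same row to pull one block out of the archive.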
Tier pricing and what each one is for
Four standard tiers. Each is a one-time fee for the data plus rolling daily updates while the tier is active.
| Tier | Coverage | Approx. blocks | Compressed size | Price |
|---|---|---|---|---|
| 30 days | Latest 30 days, rolling | ~13M | ~2 TB | $500 |
| 6 months | Latest 180 days, rolling | ~78M | ~12 TB | $2,000 |
| 12 months | Latest 365 days, rolling | ~156M | ~24 TB | $3,000 |
| Genesis | Slot 0 to now | All blocks | ~60 TB | $5,000 |
Custom slot ranges (specific months for a paper, a window around a particular event) are quoted by slot count. Common ask, easy to fulfill, often cheaper than the next tier up if your range is narrow.
Bandwidth is included. Once your tier is active, signed URLs are reissuable from the dashboard so multi-region pulls don't require re-payment.
Where this fits vs Solana Foundation, AWS, and archival RPC
Four options exist for “I want raw Solana blocks.” Each is right for some workload.
Solana Foundation Bigtable is the canonical archive. Free, blessed, complete. Reading it requires a GCP billing account, the Bigtable API, and a tolerance for paged gRPC retrieval. If your team is already in GCP and doesn't mind Bigtable, skip us and save the money.
AWS Public Blockchain Data ships getBlock JSON to S3 daily, free to access, S3 egress on you. Layout is theirs, slot-range targeting is on you. If you can absorb the ETL and you only need recent ranges, this is the cheapest option that exists. If you need from-genesis or a specific slot range or the bandwidth covered, the math turns.
Archival RPC providers (Helius, Triton, QuickNode) sell paginated block retrieval per call. Fine for a few thousand random old blocks. Wrong for a scan over a slot range; the per-call bill exceeds our tier price by an order of magnitude.
Us is the “just give me a signed URL to a tar of files” option. Pre-packaged by tier or custom range, manifest-verified, bandwidth included. We're not the cheapest possible source on a TB-by-TB basis; we're the cheapest source measured in “engineer hours consumed before the data is on disk.”
What teams actually do with raw blocks
Custom indexers
Backfill a Postgres or ClickHouse index from a specific slot range, then point the same indexer at a live gRPC stream once you're caught up. The historical and live paths both feed the same writer.
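One way to shape the historical half, assuming the bundle is already extracted and files follow the block_<slot>.json naming shown in the bundle layout. write_block stands in for your own Postgres or ClickHouse writer; it and the function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Callable, Iterator, Tuple

BLOCK_NAME = re.compile(r"^block_(\d+)\.json$")


def iter_blocks(bundle_dir: Path, start_slot: int, end_slot: int) -> Iterator[Tuple[int, dict]]:
    """Yield (slot, block) in ascending slot order for slots in [start_slot, end_slot]."""
    selected = []
    for path in bundle_dir.iterdir():
        match = BLOCK_NAME.match(path.name)
        if match and start_slot <= int(match.group(1)) <= end_slot:
            selected.append((int(match.group(1)), path))
    for slot, path in sorted(selected):
        with path.open() as fh:
            yield slot, json.load(fh)


def backfill(bundle_dir: Path, start_slot: int, end_slot: int,
             write_block: Callable[[int, dict], None]) -> int:
    """Feed every block in range to the same writer the live path uses."""
    count = 0
    for slot, block in iter_blocks(bundle_dir, start_slot, end_slot):
        write_block(slot, block)
        count += 1
    return count
```

Slots with no file in the range are simply skipped, which also covers slots that never produced a block.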
ML training on transaction shape
Models that classify transactions by structure rather than by parsed semantics. Sandwich detection, frontrun detection, generic anomaly detection. The parsed view is too lossy; raw blocks are the right input.
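As an illustration of what "structure" can mean here, the sketch below pulls a few shape features from one transaction. It assumes the "json" encoding (programIdIndex and account index lists on each instruction) and uses meta.innerInstructions and meta.err from the getBlock response. The feature set is an arbitrary starting point, not a recommended model input.

```python
from typing import Dict


def shape_features(tx: dict) -> Dict[str, object]:
    """Extract structure-only features; no program-specific decoding involved."""
    message = tx["transaction"]["message"]
    meta = tx.get("meta") or {}
    instructions = message["instructions"]

    # innerInstructions groups CPI calls by the index of the top-level instruction.
    inner_counts = [0] * len(instructions)
    for group in meta.get("innerInstructions") or []:
        inner_counts[group["index"]] = len(group["instructions"])

    return {
        "num_accounts": len(message["accountKeys"]),
        "num_instructions": len(instructions),
        "program_index_sequence": [ix["programIdIndex"] for ix in instructions],
        "accounts_per_instruction": [len(ix["accounts"]) for ix in instructions],
        "inner_counts": inner_counts,
        "failed": meta.get("err") is not None,
    }
```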
Replay and audit
Re-derive any historical state by re-running your parser against the bundle. Useful for “explain exactly where this number came from” conversations with auditors, regulators, and angry users.
Validator-economics research
Per-slot rewards and leader pubkey on every block. Stake-weight comparisons, MEV-flow attribution, and validator-performance dashboards run off these two fields.
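A small sketch of the aggregation step, assuming each block carries the standard getBlock rewards array with pubkey, lamports, and rewardType entries. Reward types are lower-cased before grouping so differences in capitalization don't split the totals. sum_rewards is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sum_rewards(blocks: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Total lamports per (pubkey, reward type) across the given blocks."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for block in blocks:
        # rewards may be absent or null if the block was fetched without them.
        for reward in block.get("rewards") or []:
            kind = (reward.get("rewardType") or "unknown").lower()
            totals[(reward["pubkey"], kind)] += reward["lamports"]
    return dict(totals)
```

Feed it a generator of decoded blocks so a long slot range never has to sit in memory at once.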
Long-tail program decoders
We don't parse every program on Solana; nobody does. For a long-tail program, raw blocks plus a custom Borsh decoder is the only way to get historical coverage. About a week of work for an experienced engineer.
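To give a sense of the shape of that work, here is a toy decoder. With the "json" encoding, instruction data arrives as a base58 string, so the first step is a base58 decode; the second is unpacking Borsh's little-endian fields. The Deposit layout (u8 tag, u64 amount, length-prefixed string memo) is invented for illustration and belongs to no real program.

```python
import struct
from typing import NamedTuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58 string; each leading '1' maps to one leading zero byte."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


class Deposit(NamedTuple):
    """Hypothetical instruction layout, for illustration only."""
    tag: int
    amount: int
    memo: str


def decode_deposit(data: bytes) -> Deposit:
    # Borsh: fixed-width integers are little-endian; a string is a u32 length + UTF-8 bytes.
    header = "<BQI"
    tag, amount, memo_len = struct.unpack_from(header, data, 0)
    start = struct.calcsize(header)
    memo = data[start:start + memo_len]
    if len(memo) != memo_len:
        raise ValueError("instruction data shorter than declared memo length")
    return Deposit(tag, amount, memo.decode("utf-8"))
```

A real decoder repeats the second half once per instruction variant, usually dispatching on the leading tag or discriminator bytes.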
Forensic incident response
When an exploit lands, the right move is to take the slot window in question, put the bundle on a forensic machine, and reconstruct the attacker path against the unmodified chain record. No third-party API in the audit trail.
Related products
When you want trades, mints, and pool events without the parsing project. The faster path for most use cases.
A focused vertical archive of one program. Built on top of the same raw block ingest.
Real-time successor. Subscribe live once you've trained against the historical archive.
When you need archival RPC access to specific slots rather than a flat-file dump.
For the “and now I want this in real time” phase. Run a custom indexer on a managed validator.
Order a slot range
Pick a tier or quote a custom slot range. Bundles ship as signed download URLs with a SHA-256 manifest. Bandwidth is on us.