Historical Dataset

Solana historical blocks: raw getBlock archive, slot 0 to last UTC midnight

Every Solana block, raw getBlock JSON, slot 0 to last UTC midnight. Pick a slot range, pay once, rsync the tar.zst onto your own disks.

Replay any chain history, train ML models on raw transactions, run your own indexer. Bandwidth is on us. Daily updates.

Genesis to now · Raw getBlock JSON · tar.zst bundles · Slot-range targeted · 60+ TB total · Daily updates

Buy a historical-blocks dataset

Pick a range, drop in your details, pay with card. Download links arrive within 24 hours.

What's in every bundle

raw getBlock JSON, status meta, rewards, manifest, and a Parquet index

| Event | Type | Description | Frequency |
| --- | --- | --- | --- |
| block.json (per slot) | event | Full Solana getBlock response for a single slot. Transactions, instructions, inner instructions, status, balances, rewards, blockhash, parent slot. | Very high |
| transactionStatusMeta | event | Inner-instruction trees, log messages, pre/post token balances, compute units consumed, returned data. The richest part of a block dump. | Very high |
| rewards | event | Per-slot validator rewards: vote, stake, fee, rent. Useful for staking-rewards research and validator economics analysis. | High |
| block_metadata | event | Leader pubkey, parent slot, block height, block_time. Joined at the top of every block.json for indexing. | High |
| manifest.json | event | Per-bundle manifest with start_slot, end_slot, file count, total_size, and SHA-256 per file. Drives any reproducibility check. | Low |
| index.parquet | event | Optional Parquet index of slot, block_time, signature_count per file. Speeds up “find the block at this timestamp” without unzipping. | Low |

Archive scale and tier sizing

last reviewed 2026-04-29

  • Coverage: slot 0 to now. Genesis through last UTC midnight, refreshed daily by 06:00 UTC. Verified 2026-04-29.
  • Total archive size: 60+ TB. Compressed, full from-genesis dataset. Slot ranges scale linearly. Verified 2026-04-29.
  • Last 30 days: ~13M blocks / ~2 TB. Most popular tier. Covers a typical research project window. Verified 2026-04-29.
  • Pricing tiers: $500 to $5,000. $500 (1mo), $2,000 (6mo), $3,000 (12mo), $5,000 (genesis). Custom ranges are quoted per slot.

When raw beats parsed

Most teams should buy parsed data. The trading-datasets product covers DEX trades, pool events, mints, and transfers for 40+ programs without the decoder project. If your job is “give me Raydium swaps,” that's the right path.

Raw blocks are the answer when the parsed view doesn't carry what you need. A few cases that show up:

  • Custom decoders for programs we don't cover. You don't want to wait on us to onboard a long-tail program; you want the bytes.
  • Log-message research. Some bugs show up in log_messages with no instruction-level signal. The parsed view discards these by design; the raw view keeps them.
  • Transaction-shape ML. Models that train on the structure of the transaction (account positions, instruction order, inner-instruction tree) need the raw shape. A parsed event is too lossy.
  • Audit and replay. Anything that requires “the exact bytes the validator saw” for legal, forensic, or post-incident analysis. Raw blocks are tamper-evident through the SHA-256 manifest.
  • Custom indexer projects. Teams running a Geyser plugin in production often want to backfill the indexer from a slot range before going live. Raw blocks into the same indexer code is the cleanest path.

If none of these apply, parsed is faster and cheaper. If even one applies, raw is the only acceptable input.
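For the log-message case above, a scan can be a few lines of Python. This is a minimal sketch, assuming the bundle was produced with the "json" transaction encoding, so each entry carries `transaction.signatures` and `meta.logMessages` as in a standard getBlock response. The function names and the extracted-directory layout are illustrative, not part of the product.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def find_log_matches(block: dict, needle: str) -> Iterator[Tuple[str, str]]:
    """Yield (first_signature, log_line) for every log line containing needle."""
    for tx in block.get("transactions") or []:
        meta = tx.get("meta") or {}
        # logMessages may be null for a transaction, so fall back to an empty list.
        logs = meta.get("logMessages") or []
        # Assumes "json" encoding, where "transaction" is an object with signatures.
        signatures = tx.get("transaction", {}).get("signatures") or ["<unknown>"]
        for line in logs:
            if needle in line:
                yield signatures[0], line


def scan_directory(bundle_dir: str, needle: str) -> Iterator[Tuple[str, str, str]]:
    """Scan every block_*.json in an extracted bundle directory."""
    for path in sorted(Path(bundle_dir).glob("block_*.json")):
        block = json.loads(path.read_text())
        for signature, line in find_log_matches(block, needle):
            yield path.name, signature, line
```

Calling `scan_directory("./bundle", "Error")` would walk an extracted bundle and yield the file name, signature, and matching log line for each hit.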

How a bundle actually ships

You request a slot range or a tier. We resolve a signed download URL and email the link plus a manifest summary. The link points at a tar.zst, typically a few hundred GB to a few TB. Inside, files are flat by slot:

./
  block_240000000.json
  block_240000001.json
  block_240000002.json
  ...
  block_240432000.json
  manifest.json
  index.parquet

Each block_*.json is the unmodified Solana getBlock response: transactions, inner instructions, status meta, balances, rewards, parent slot, blockhash, leader. Vote transactions are kept; drop them in your parser if you don't want them.
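Dropping vote transactions in your parser could look like the sketch below. It treats a transaction as a vote when every top-level instruction targets the Vote program, which is one reasonable definition but not the only one. It assumes the "json" or "jsonParsed" encoding; the helper names are hypothetical.

```python
VOTE_PROGRAM_ID = "Vote111111111111111111111111111111111111111"


def _key_to_str(key) -> str:
    # "json" encoding gives plain strings; "jsonParsed" gives {"pubkey": ...} objects.
    return key["pubkey"] if isinstance(key, dict) else key


def is_vote_transaction(tx: dict) -> bool:
    """True when every top-level instruction invokes the Vote program."""
    message = tx["transaction"]["message"]
    keys = [_key_to_str(k) for k in message.get("accountKeys", [])]
    instructions = message.get("instructions", [])
    if not instructions:
        return False
    for ix in instructions:
        if "programId" in ix:
            # jsonParsed encoding names the program directly.
            program = ix["programId"]
        else:
            # json encoding stores an index into accountKeys.
            idx = ix.get("programIdIndex", -1)
            program = keys[idx] if 0 <= idx < len(keys) else None
        if program != VOTE_PROGRAM_ID:
            return False
    return True


def drop_votes(block: dict) -> list:
    """Return the block's transactions with vote transactions removed."""
    return [tx for tx in block.get("transactions") or [] if not is_vote_transaction(tx)]
```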

The manifest is a JSON array of { name, sha256, size } per file plus a top-level start_slot, end_slot, source_node identifier, and a signing key fingerprint. Verify the archive on download with a 50-line script (the Rust example above is the full version) before any pipeline runs against it. Cheap insurance.
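A Python sketch of the same check, for teams that would rather not compile anything. It assumes the bundle is already extracted to a directory and that the per-file entries live under a top-level `files` key. That key name is a guess; adjust it to the manifest you actually receive. It checks sizes and SHA-256 hashes only and does not verify the signing key fingerprint.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large block files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: str) -> List[str]:
    """Return a list of problems; an empty list means every entry matched."""
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    problems = []
    # ASSUMPTION: entries are stored under "files" as {name, sha256, size}.
    for entry in manifest["files"]:
        path = root / entry["name"]
        if not path.is_file():
            problems.append(f"missing: {entry['name']}")
        elif path.stat().st_size != entry["size"]:
            problems.append(f"size mismatch: {entry['name']}")
        elif sha256_of(path) != entry["sha256"].lower():
            problems.append(f"hash mismatch: {entry['name']}")
    return problems
```

Checking size before hashing is deliberate: a truncated download fails fast without reading the whole file.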

The optional Parquet index is one row per file: slot, block_time, parent_slot, leader, tx_count, file_name. It lets you find “the block at 2025-09-01 14:00:00 UTC” without opening the tar. Worth keeping.
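The timestamp lookup can be sketched as a binary search over the index rows. This assumes `block_time` is stored as Unix seconds (as getBlock reports blockTime) and may be null for some rows; confirm both against your bundle. Loading uses the third-party pyarrow package, imported lazily so the search itself needs only the standard library.

```python
import bisect
from datetime import datetime, timezone
from typing import List, Optional


def load_index(path: str) -> List[dict]:
    """Read the Parquet index into a list of dicts (requires pyarrow)."""
    import pyarrow.parquet as pq  # third-party dependency

    table = pq.read_table(path, columns=["slot", "block_time", "file_name"])
    return table.to_pylist()


def prepare_rows(rows: List[dict]) -> List[dict]:
    """Drop rows without a block_time and sort by (block_time, slot)."""
    usable = [r for r in rows if r.get("block_time") is not None]
    return sorted(usable, key=lambda r: (r["block_time"], r["slot"]))


def find_block_at(sorted_rows: List[dict], when: datetime) -> Optional[dict]:
    """Return the first row whose block_time is at or after `when`, else None."""
    target = int(when.timestamp())
    times = [r["block_time"] for r in sorted_rows]
    i = bisect.bisect_left(times, target)
    return sorted_rows[i] if i < len(sorted_rows) else None
```

For the example above, pass `datetime(2025, 9, 1, 14, 0, 0, tzinfo=timezone.utc)` and read `file_name` off the returned row.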

Tier pricing and what each one is for

Four standard tiers. Each is a one-time fee for the data plus rolling daily updates while the tier is active.

| Tier | Coverage | Approx. blocks | Compressed size | Price |
| --- | --- | --- | --- | --- |
| 30 days | Latest 30 days, rolling | ~13M | ~2 TB | $500 |
| 6 months | Latest 180 days, rolling | ~78M | ~12 TB | $2,000 |
| 12 months | Latest 365 days, rolling | ~156M | ~24 TB | $3,000 |
| Genesis | Slot 0 to now | All blocks | ~60 TB | $5,000 |

Custom slot ranges (specific months for a paper, a window around a particular event) are quoted by slot count. Common ask, easy to fulfill, often cheaper than the next tier up if your range is narrow.
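To get a feel for how big a custom range might be before asking for a quote, you can extrapolate from the 30-day tier figures (~13M blocks, ~2 TB). The sketch below is back-of-envelope arithmetic only. It assumes size scales linearly with slot count and says nothing about how custom ranges are actually priced.

```python
# Figures taken from the tier table: (compressed size in TB, price in USD).
TIERS = {
    "30 days": (2.0, 500),
    "6 months": (12.0, 2000),
    "12 months": (24.0, 3000),
    "Genesis": (60.0, 5000),
}

# Rough per-block size implied by the 30-day tier: ~2 TB over ~13M blocks.
TB_PER_BLOCK = 2.0 / 13_000_000


def estimate_size_tb(start_slot: int, end_slot: int) -> float:
    """Estimate compressed size for an inclusive slot range (linear assumption)."""
    slots = end_slot - start_slot + 1
    return slots * TB_PER_BLOCK


def price_per_tb(tier: str) -> float:
    """Effective USD per compressed TB for a standard tier."""
    size_tb, price = TIERS[tier]
    return price / size_tb


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: ${price_per_tb(name):.2f} per TB")
    print(f"240000000..240432000: ~{estimate_size_tb(240_000_000, 240_432_000) * 1000:.0f} GB")
```

Treat the estimate as an upper-end guide: skipped slots produce no block file, and block sizes vary across chain history.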

Bandwidth is included. Once your tier is active, signed URLs are reissuable from the dashboard so multi-region pulls don't require re-payment.

Where this fits vs Solana Foundation, AWS, and archival RPC

Four options exist for “I want raw Solana blocks.” Each is right for some workload.

Solana Foundation Bigtable is the canonical archive. Free, blessed, complete. Reading it requires a GCP billing account, the Bigtable API, and a tolerance for paged gRPC retrieval. If your team is already in GCP and doesn't mind Bigtable, skip us and save the money.

AWS Public Blockchain Data ships getBlock JSON to S3 daily, free to access, S3 egress on you. Layout is theirs, slot-range targeting is on you. If you can absorb the ETL and you only need recent ranges, this is the cheapest option that exists. If you need from-genesis or a specific slot range or the bandwidth covered, the math turns.

Archival RPC providers (Helius, Triton, QuickNode) sell paginated block retrieval per call. Fine for a few thousand random old blocks. Wrong for a scan over a slot range; the per-call bill exceeds our tier price by an order of magnitude.

We are the “just give me a signed URL to a tar of files” option. Pre-packaged by tier or custom range, manifest-verified, bandwidth included. We're not the cheapest possible source on a TB-by-TB basis; we're the cheapest source measured in “engineer hours consumed before the data is on disk.”

What teams actually do with raw blocks

Custom indexers

Backfill a Postgres or ClickHouse index from a specific slot range, then point the same indexer at a live gRPC stream once you're caught up. The historical and live paths both feed the same writer.
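A backfill loop does not need the bundle extracted first. The sketch below reads an uncompressed tar stream, so you can pipe `zstd -dc bundle.tar.zst` into it and keep disk usage flat. The `write_block` callback stands in for your own indexer's writer; it and the function names are hypothetical.

```python
import json
import re
import tarfile
from typing import BinaryIO, Callable, Iterator, Tuple

BLOCK_NAME = re.compile(r"block_(\d+)\.json$")


def iter_blocks(stream: BinaryIO) -> Iterator[Tuple[int, dict]]:
    """Yield (slot, block) from an uncompressed tar stream, in archive order."""
    # Mode "r|" reads the tar sequentially, which works on pipes and sockets.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            match = BLOCK_NAME.search(member.name)
            if not member.isfile() or match is None:
                continue  # skips manifest.json, index.parquet, and directories
            # In stream mode each member must be read before advancing.
            block = json.load(tar.extractfile(member))
            yield int(match.group(1)), block


def backfill(stream: BinaryIO, write_block: Callable[[int, dict], None],
             start_slot: int, end_slot: int) -> int:
    """Feed every block in [start_slot, end_slot] to write_block; return the count."""
    count = 0
    for slot, block in iter_blocks(stream):
        if start_slot <= slot <= end_slot:
            write_block(slot, block)
            count += 1
    return count
```

From a script, `backfill(sys.stdin.buffer, writer, start, end)` with `zstd -dc bundle.tar.zst | python your_script.py` is one way to wire it up. Passing the same `write_block` to your live-stream consumer keeps both paths on one writer.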

ML training on transaction shape

Models that classify transactions by structure rather than by parsed semantics. Sandwich detection, frontrun detection, generic anomaly detection. The parsed view is too lossy; raw blocks are the right input.
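As an illustration of what "shape" means here, the sketch below pulls a handful of structural features out of one transaction entry. It assumes the "json" encoding (instructions carry `programIdIndex` and `accounts` index lists); the feature set itself is an arbitrary example, not a recommended model input.

```python
def shape_features(tx: dict) -> dict:
    """Extract structure-only features from one getBlock transaction entry."""
    message = tx["transaction"]["message"]
    meta = tx.get("meta") or {}
    instructions = message.get("instructions", [])
    # innerInstructions is a list of {"index": ..., "instructions": [...]} groups.
    inner_groups = meta.get("innerInstructions") or []
    inner_counts = [len(group.get("instructions", [])) for group in inner_groups]
    return {
        "num_accounts": len(message.get("accountKeys", [])),
        "num_instructions": len(instructions),
        # Order of invoked programs, as positions in accountKeys.
        "program_index_sequence": [ix.get("programIdIndex") for ix in instructions],
        "accounts_per_instruction": [len(ix.get("accounts", [])) for ix in instructions],
        "num_inner_instructions": sum(inner_counts),
        "max_inner_per_instruction": max(inner_counts, default=0),
        # meta.err is null on success; a missing meta also reads as success here.
        "succeeded": meta.get("err") is None,
    }
```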

Replay and audit

Re-derive any historical state by re-running your parser against the bundle. Useful for “explain exactly where this number came from” conversations with auditors, regulators, and angry users.

Validator-economics research

Per-slot rewards and leader pubkey on every block. Stake-weight comparisons, MEV-flow attribution, and validator-performance dashboards run off these two fields.
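A first pass at rewards analysis is a sum per pubkey and reward type. The sketch assumes each block carries the standard getBlock `rewards` array with `pubkey`, `lamports`, and `rewardType`. It lowercases the type because casing can differ between sources.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LAMPORTS_PER_SOL = 1_000_000_000


def sum_rewards(blocks: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Total lamports per (pubkey, reward_type) across the given blocks."""
    totals = defaultdict(int)
    for block in blocks:
        # rewards may be null or absent when the block was fetched without them.
        for reward in block.get("rewards") or []:
            reward_type = (reward.get("rewardType") or "unknown").lower()
            # lamports can be negative (for example rent debits), so sum as signed ints.
            totals[(reward["pubkey"], reward_type)] += reward["lamports"]
    return dict(totals)


def to_sol(lamports: int) -> float:
    """Convert lamports to SOL for display."""
    return lamports / LAMPORTS_PER_SOL
```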

Long-tail program decoders

We don't parse every program on Solana; nobody does. For a long-tail program, raw blocks plus a custom Borsh decoder is the only way to get historical coverage. About a week of work for an experienced engineer.
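To give a sense of the decoder work: with the "json" encoding, instruction `data` arrives as a base58 string, and Borsh fields are little-endian with u32 length prefixes on strings. The layout below (u8 tag, u64 amount, string memo) is entirely made up for illustration; a real program needs its own layout, and Anchor programs typically start with an 8-byte discriminator instead of a single tag byte.

```python
import struct

_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
_INDEX = {ch: i for i, ch in enumerate(_ALPHABET)}


def b58decode(text: str) -> bytes:
    """Decode a base58 string (Bitcoin alphabet, as used by Solana) to bytes."""
    num = 0
    for ch in text:
        num = num * 58 + _INDEX[ch]
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading "1" encodes one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def decode_example_ix(data: bytes) -> dict:
    """Decode the HYPOTHETICAL layout: u8 tag, u64 amount, Borsh string memo."""
    tag, amount = struct.unpack_from("<BQ", data, 0)   # 1 + 8 bytes, little-endian
    (memo_len,) = struct.unpack_from("<I", data, 9)    # u32 length prefix
    memo = data[13:13 + memo_len].decode("utf-8")
    return {"tag": tag, "amount": amount, "memo": memo}
```

In a parser you would chain the two: `decode_example_ix(b58decode(ix["data"]))` for each instruction whose program ID matches the program you care about.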

Forensic incident response

When an exploit lands, the right move is to take the slot window in question, put the bundle on a forensic machine, and reconstruct the attacker path against the unmodified chain record. No third-party API in the audit trail.

Frequently asked questions

A tar.zst of raw getBlock JSON, one file per slot, plus a manifest.json with SHA-256 hashes per file and an optional Parquet index for slot-to-file lookup. Each block file is the same JSON shape the Solana RPC method getBlock returns, including transactions, transactionStatusMeta, blockhash, parent slot, rewards. We don't alter or strip the response.

Order a slot range

Pick a tier or quote a custom slot range. Bundles ship as signed download URLs with a SHA-256 manifest. Bandwidth is on us.
