Solana trading datasets, parsed and packaged for backtests
Per-month Parquet and CSV bundles of parsed Solana trades, pool events, mints, and transfers. Pick a program, set a duration, pay once. Download links arrive within 24 hours.
$200/mo per program. 30% off at 6 months, 50% off at 12. Drop the bundle into DuckDB or pandas in two minutes.
Build your dataset bundle
Pick programs, set duration per dataset, pay once. Download links arrive within 24 hours.
- Raydium AMM v4 (675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8): Constant-product swaps, deposits, withdrawals, initialize2 launches. Decoded into one row per instruction.
- Orca Whirlpool (whirLbMiicVdio4qvUfM5KAg6Ct8VwpYzGff3uctyCc): CLMM swaps, openPosition, increaseLiquidity, collectFees with tick lower/upper resolved.
- Meteora DLMM (LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo): Bin-level swaps, add_liquidity, remove_liquidity, active-bin shifts.
- Jupiter Aggregator V6 (JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4): Aggregator route execution with input mint, output mint, route length, and routed pools.
- Pump.fun (6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P): Token creates, trades, bonding-curve graduations parsed into row-per-event Parquet.
- SPL Token + Token-2022 (TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA): Mints, burns, transfers, authority changes; both legacy and Token-2022 in one schema.
Datasets in every program bundle
parsed row-per-event tables, schema documented, decimals normalized
| Dataset | Type | Description | Frequency | Latency |
|---|---|---|---|---|
| dex_trades | event | One row per swap across Raydium, Orca, Meteora, Jupiter, PumpSwap. Includes amount_in, amount_out, USD value, pool address, route length, signer. | Very high | — |
| pool_events | event | Pool initialize, deposit, withdraw, position open/close, fee collect. Bin and tick resolution preserved where the program supports it. | High | — |
| token_mints | event | Every new SPL or Token-2022 mint with metadata, mint authority, freeze authority, decimals, supply, and the creator wallet. | Medium | — |
| token_transfers | event | Decoded SPL transfers with sender, recipient, mint, decimals normalized, and a USD value computed off the pricing oracle nearest the slot. | Very high | — |
| pumpfun_events | event | create, trade, graduate decoded with bonding-curve reserves, virtual reserves, and SOL/token amounts in native units. | High | — |
| jupiter_routes | event | Per-aggregator-call breakdown: hops, dex names, intermediate mints, slippage realized, total fee paid by the swapper. | High | — |
| liquidity_changes | event | Net base/quote reserve deltas per pool per slot. Drives TVL backfills and impermanent-loss research. | Medium | — |
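As a rough illustration of how liquidity_changes can drive a TVL backfill, the sketch below turns per-slot reserve deltas into running reserves per pool. The tuple layout, pool labels, and numbers are made-up placeholders, not the shipped schema, and the sketch assumes the deltas start at pool initialization so reserves begin at zero.

```python
from collections import defaultdict

# Placeholder rows: (slot, pool, base_delta, quote_delta). Values are illustrative only.
deltas = [
    (100, "POOL_A", 1_000, 50_000),
    (101, "POOL_A", -200, 10_500),
    (101, "POOL_B", 5_000, 5_000),
    (103, "POOL_A", 300, -14_000),
]

def running_reserves(rows):
    """Cumulative sum of deltas, giving each pool's reserves after every slot it changed in."""
    state = defaultdict(lambda: [0, 0])  # pool -> [base_reserve, quote_reserve]
    out = []
    for slot, pool, d_base, d_quote in sorted(rows):  # order by slot, then pool
        state[pool][0] += d_base
        state[pool][1] += d_quote
        out.append((slot, pool, state[pool][0], state[pool][1]))
    return out

history = running_reserves(deltas)
```

Multiplying each running reserve by a price series for the same slots would give a TVL curve; that pricing step is left out here.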
Catalog scale and pricing at a glance
last reviewed 2026-04-29
Parsed datasets vs raw getBlock JSON
The cheapest historical Solana data on the planet is AWS Public Blockchain Data. Free getBlock JSON on S3, updated daily, going back to genesis. If you have a tolerance for shell pipelines and an ETL team, you don't need us.
Most teams don't. The first wall is the IDL set: every Solana program has its own Borsh layout, and AMM v4, CLMM, and CPMM are three different layouts under the “Raydium” umbrella alone. Add Orca Whirlpool, Meteora DLMM and DAMM, Jupiter V6 inner instructions, Pump.fun's bonding curve, and you're maintaining a few thousand lines of decoder code that breaks every time someone redeploys with a new discriminator.
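To make the discriminator problem concrete, here is a minimal sketch of decoding one instruction for an Anchor-based program. Anchor derives the 8-byte instruction tag from sha256 of "global:" plus the instruction name; the `swap` name and the two-field argument layout below are hypothetical and do not describe any specific program in the catalog. Non-Anchor programs use their own tagging schemes, which this sketch does not cover.

```python
import hashlib
import struct

def anchor_discriminator(instruction_name: str) -> bytes:
    """First 8 bytes of sha256("global:<name>"), the Anchor instruction tag."""
    return hashlib.sha256(f"global:{instruction_name}".encode()).digest()[:8]

# Hypothetical argument layout: two little-endian u64 fields, as Borsh encodes them.
SWAP_ARGS = struct.Struct("<QQ")

def decode_swap(data: bytes) -> dict:
    """Decode instruction data if it matches the hypothetical `swap` layout."""
    disc, body = data[:8], data[8:]
    if disc != anchor_discriminator("swap"):
        # A renamed or redeployed instruction lands here: the tag no longer matches.
        raise ValueError(f"unknown discriminator {disc.hex()}")
    amount_in, min_amount_out = SWAP_ARGS.unpack(body[:SWAP_ARGS.size])
    return {"amount_in": amount_in, "min_amount_out": min_amount_out}

# Build sample instruction data and decode it back.
sample = anchor_discriminator("swap") + SWAP_ARGS.pack(1_000_000, 990_000)
decoded = decode_swap(sample)
```

Every program and every instruction needs its own version of `decode_swap`, which is where the decoder line count comes from.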
The second wall is decimals. Token-2022 mints can carry transfer-fee extensions that change the amount actually received; legacy SPL Token stores raw integer amounts and leaves the decimals on the mint account; PumpSwap quotes one side in lamports and the other in token base units. Normalize one wrong and your USD column is off by a factor of a thousand for an entire program.
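The factor-of-a-thousand failure is easy to reproduce. The sketch below converts raw integer amounts to human-readable units and shows what happens when lamports (9 decimals) are scaled as if they had 6. The token mint with 6 decimals and the raw amounts are assumptions chosen for illustration.

```python
from decimal import Decimal

LAMPORTS_DECIMALS = 9  # 1 SOL = 10**9 lamports

def to_ui_amount(raw: int, decimals: int) -> Decimal:
    """Scale a raw on-chain integer amount by the mint's decimals."""
    return Decimal(raw) / (Decimal(10) ** decimals)

raw_sol = 2_500_000_000    # lamports side of a trade (illustrative)
raw_token = 1_234_567_890  # base units of a hypothetical 6-decimal mint

sol = to_ui_amount(raw_sol, LAMPORTS_DECIMALS)  # correct scaling
tokens = to_ui_amount(raw_token, 6)             # correct scaling
wrong_sol = to_ui_amount(raw_sol, 6)            # lamports scaled with the token's decimals

error_factor = wrong_sol / sol  # how far off every downstream USD value would be
```

Using `Decimal` instead of floats keeps the scaled amounts exact, which matters once they are multiplied by a price.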
We sell the parsed result. Trades, pools, mints, transfers, routes. Per-program-per-month Parquet, schema documented, USD-normalized, decimal-corrected, signed-URL delivered. The first day of work is opening DuckDB and writing your model query, not writing a Borsh decoder.
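A first pass over the CSV flavor of a bundle might look like the sketch below, which sums USD volume per pool using only the Python standard library. The column names (`pool`, `usd_value`, and the rest) and the sample rows are placeholders for illustration; check the documented schema for the real names.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Placeholder rows standing in for a dex_trades CSV; names and values are illustrative.
SAMPLE_CSV = """slot,pool,signer,amount_in,amount_out,usd_value
1,POOL_A,WALLET_1,1.5,210.0,225.00
1,POOL_B,WALLET_2,10.0,0.07,10.50
2,POOL_A,WALLET_2,0.5,70.0,75.00
"""

def usd_volume_by_pool(csv_file) -> dict:
    """Sum the usd_value column per pool from any file-like CSV source."""
    totals = defaultdict(Decimal)  # Decimal() starts each pool at 0
    for row in csv.DictReader(csv_file):
        totals[row["pool"]] += Decimal(row["usd_value"])
    return dict(totals)

volume = usd_volume_by_pool(io.StringIO(SAMPLE_CSV))
```

With a downloaded file, pass `open(path, newline="")` instead of the in-memory sample. The same aggregation is a single GROUP BY once the Parquet version is loaded into DuckDB.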
What's in the catalog
The catalog is split four ways: DEX, DeFi, Token/NFT, and Infrastructure. Each program ships one or more datasets; the most-bought are trades, pool_events, and token_mints.
DEX
- Raydium AMM v4, CLMM, CPMM
- Orca Whirlpool
- Meteora DLMM, DAMM, DBC
- Jupiter V6 aggregator
- Phoenix order book
- OpenBook v2
- Lifinity, Stabble, Gavel
- PumpSwap, Heaven, Boop
DeFi
- Kamino lending + farms
- MarginFi v2
- Drift perps + spot
- Marinade liquid staking
- Solayer restaking
- Zeta options
- Sharky NFT lending
Token/NFT
- SPL Token + Token-2022
- Metaplex Core + Token Metadata
- Bubblegum compressed NFTs
- Pump.fun
- Moonshot
- Virtuals
Infrastructure
- System program transfers
- Stake program
- Address Lookup Tables
- Name Service
- Circle CCTP
- Memo
- Swig session keys
Don't see your program? It's usually a one-time ingestion to add. Tell us the program ID and which instructions you care about and we'll quote it.
Who actually buys these datasets
Quant teams running backtests
Six months of Raydium plus Meteora trades, joined to mint metadata, joined to Jupiter routes, sitting in DuckDB. Run the strategy in seconds, not over a weekend on a flaky RPC scrape.
ML training pipelines
Token-launch outcome models, rug-pull classifiers, MEV detectors. The signal lives in the parsed instructions, not in raw logs, and you don't want to spend three months building the labeler.
Research and journalism
Volume-by-DEX charts, attacker-flow tracing, exchange-deposit attribution. Three months of trades plus transfers usually covers the brief.
Tax and compliance vendors
Per-wallet trade history with cost basis sourced from the same Parquet bundle the rest of the company uses. No more disagreement between the analytics team and the compliance team about what a trade was.
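As one way to derive cost basis from a per-wallet trade history, the sketch below applies FIFO lot matching to the trades of a single wallet in a single mint. FIFO is an assumption made for illustration, not a method the datasets prescribe, and the `(side, quantity, usd_price)` tuple shape is hypothetical.

```python
from collections import deque
from decimal import Decimal

def fifo_realized_gain(trades):
    """Match sells against the oldest open buys.

    trades: chronological (side, quantity, usd_price) tuples for one wallet and one mint.
    Returns (realized_gain_usd, remaining_lots).
    """
    lots = deque()  # each lot is [remaining_quantity, unit_cost]
    realized = Decimal(0)
    for side, qty, price in trades:
        qty, price = Decimal(qty), Decimal(price)
        if side == "buy":
            lots.append([qty, price])
            continue
        while qty > 0:  # sell: consume the oldest lots first
            if not lots:
                raise ValueError("sell exceeds tracked holdings")
            lot = lots[0]
            used = min(qty, lot[0])
            realized += used * (price - lot[1])
            lot[0] -= used
            qty -= used
            if lot[0] == 0:
                lots.popleft()
    return realized, list(lots)

gain, open_lots = fifo_realized_gain([
    ("buy", "10", "1.00"),
    ("buy", "10", "2.00"),
    ("sell", "15", "3.00"),
])
```

Because both teams would read the same trade rows, swapping FIFO for another lot-matching rule changes one function rather than the underlying data.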
Internal data warehouses
Drop the monthly bundle into Snowflake or Iceberg, replace the home-grown ingest pipeline, free up two engineers to work on the actual product. The most common reason teams renew.
Liquidity-provider analytics
Per-position P&L on Whirlpool and DLMM with realized fee collection events resolved against tick or bin movement. Hard to compute without a parsed dataset; trivial with one.
Pricing and how the discounts work
Base price is $200 a month per program. That covers every dataset for that program: trades, pool events, mints, all of it. You don't buy “trades” and “pool_events” as two SKUs; you buy the program.
| Term | Per-month rate | Total per program | Discount |
|---|---|---|---|
| 1 month | $200 | $200 | 0% |
| 6 months | $140 | $840 | 30% |
| 12 months | $100 | $1,200 | 50% |
Multiple programs stack. Most quant customers run three to five programs at the 12-month rate, which lands around $300 to $500 a month all-in for parsed data covering most of Solana DeFi. Compared to a Bitquery enterprise contract or a Dune-export pipeline, the math is unsubtle.
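The arithmetic behind those figures can be sketched as a small quote function. It assumes that "stack" means each program is billed at the same term rate and the totals simply add up; the function and field names are illustrative.

```python
BASE_MONTHLY_USD = 200
DISCOUNT_PERCENT = {1: 0, 6: 30, 12: 50}  # standard terms from the pricing table

def quote(programs: int, months: int) -> dict:
    """Return per-program rate, all-in monthly cost, and contract total in USD."""
    if months not in DISCOUNT_PERCENT:
        raise ValueError("standard terms are 1, 6, or 12 months")
    rate = BASE_MONTHLY_USD * (100 - DISCOUNT_PERCENT[months]) // 100
    return {
        "per_program_monthly": rate,
        "monthly_all_in": rate * programs,
        "total": rate * programs * months,
    }

three_programs = quote(programs=3, months=12)
five_programs = quote(programs=5, months=12)
```

Three to five programs at the 12-month rate come out to the $300 to $500 a month quoted above.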
Custom range, custom format, redistribution license, or a program we don't list? Talk to sales. The base rate covers the standard SKU; everything else gets quoted.
Where we sit vs Bitquery, Dune, and AWS
Three competitors come up in every sales call.
Bitquery is the GraphQL incumbent for parsed Solana data. They're excellent at flexible queries with complex JOINs and very deep history. They charge per query and per dataset, which means the bill scales with how curious your analysts are. If your job is “I want to ask thirty different questions and see what sticks,” Bitquery is the right tool. If your job is “ship me Parquet I can put on disk,” we're cheaper.
Dune Analytics has Solana coverage on top of Spellbook with manual model curation. Strong for ad-hoc SQL dashboards. CSV export caps and rate limits make it painful to use as a real backfill source. Most teams query Dune for one-off charts and buy the Parquet from us for production modeling.
AWS Public Blockchain Data ships the raw getBlock JSON to S3 for free. The price is right; the parsing burden is not. Onboarding even a single AMM into a usable internal schema is a multi-engineer-month project. Worth it if you have the team. Most teams who try it end up buying parsed data from someone after about month two.
We're not the right answer for every workload. We are the right answer when you want parsed Solana trades on disk tomorrow at a flat predictable price. If that's the job, this is the cheapest path that doesn't end with you maintaining decoders.
Related products
The full Pump.fun event history split out as a focused product. Same Parquet format.
Raw getBlock JSON when you want to do your own parsing instead of buying ours.
Pair a historical dataset with the live decoded stream for the same program. Backtest, then deploy.
Inspect the decoded instruction set for any of 37 programs before you buy a dataset.
Real-time aggregator routes. Useful overlay on the historical jupiter_routes dataset.
Download a month of Solana trades today
$200/mo per program, 30% off at 6 months, 50% off at 12. Multiple programs stack. Custom ranges and from-genesis bundles available on contract.