Backtesting Solana Trading Strategies with Historical Raw Blocks (2026)

How to backtest Solana DEX arb, token sniping, liquidation, and LP strategies using historical raw blocks and parsed trading datasets. Python, DuckDB, and Rust.

NoLimitNodes Engineering

Infrastructure Team

Jun 24, 2026 · 18 min read

The first time we pointed a script at a month of raw blocks and ran a row count, we got 2.3 billion transactions. That was our first result and it was wrong. We went back, re-ran the count, got the same number. Started investigating. About 78% of those transactions were vote transactions: validators confirming slots, not trades. Filter those out and you're at roughly 500 million rows. That's the first thing that breaks a new backtesting setup, and it's documented right on our product page because we've seen teams spend two days debugging it before asking us.

That's one problem. The data quality problem. The infrastructure problem is different, and runs deeper.

RPC nodes cannot replay historical state. You can call getBlock and get the transaction list for any slot. You cannot call getAccountInfo for slot 240,000,000 and get what that account held at that moment. The node will return current state, every time. If your strategy depends on knowing what a lending position's health factor was three months ago, or what reserves a pool held before a large swap, RPC cannot give you that. The block archive is the only source.

CEX price feeds make this worse in a way that's easy to miss. OHLCV aggregates across order books on centralized venues. Solana DEX prices are per-swap, per-slot, including failed transactions that consume block space but don't move the price. A strategy built on Binance 1-minute candles and tested on Solana will have wrong fill prices, wrong timing, and no visibility into on-chain competition for the same order. The strategies look fine on paper. They behave differently live. That gap almost always traces back to the same handful of data errors, not the strategy logic itself.

01 What's Wrong With the Data You're Probably Using

We've debugged enough broken backtests to know which data errors are common. Most are silent: no error thrown, just a wrong number that looks plausible.

CEX OHLCV vs on-chain swaps. Binance 1-minute candles aggregate across a central order book. They do not represent what actually happened in any Solana pool at any specific slot. Solana DEX trades happen at the exact slot they're included in. Multiple competing swaps can land in the same slot at slightly different prices depending on transaction ordering. A backtest that uses CEX OHLCV as its price source will undercount opportunities and miscalculate fill prices throughout.

The PumpSwap decimal trap. PumpSwap quotes one side of a swap in lamports (9 decimals) and the other in token base units (typically 6 decimals). If you compute a price ratio from raw amount_in divided by amount_out, the result is wrong by a factor of 10^(9 - token_decimals). For a standard 6-decimal token, that's a 1000x error. No exception is thrown. The number looks like a price. We've seen this go unnoticed for weeks in production code. The fix is to always use the _ui columns from NLN trading datasets, which handle this normalization before delivery.
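To make the size of that error concrete, here is a minimal sketch of the normalization, assuming a 6-decimal token and a hypothetical swap. The function names and amounts are illustrative and are not part of any NLN schema.

```python
from decimal import Decimal

LAMPORTS_DECIMALS = 9  # SOL amounts are denominated in lamports (10^-9 SOL)


def to_ui(raw_amount: int, decimals: int) -> Decimal:
    """Convert an integer base-unit amount to a human-readable amount."""
    return Decimal(raw_amount) / (Decimal(10) ** decimals)


def price_sol_per_token(raw_sol_in: int, raw_token_out: int, token_decimals: int) -> Decimal:
    """SOL paid per whole token received, with both sides normalized first."""
    return to_ui(raw_sol_in, LAMPORTS_DECIMALS) / to_ui(raw_token_out, token_decimals)


# Hypothetical swap: 1.5 SOL in, 3,000,000 whole tokens out (6-decimal token).
raw_sol_in = 1_500_000_000          # lamports
raw_token_out = 3_000_000_000_000   # token base units

naive = Decimal(raw_sol_in) / Decimal(raw_token_out)   # ratio of raw amounts
correct = price_sol_per_token(raw_sol_in, raw_token_out, token_decimals=6)

# The raw ratio is off by 10^(9 - 6) = 1000x for this token.
error_factor = naive / correct
```

Both numbers look like plausible prices in isolation, which is why the bug survives code review. Only the comparison exposes it.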

Raydium's three-program problem. Raydium runs three separate programs: AMM v4 (675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8), CLMM (CAMMCzo5YL8w4VFF8KVHrK22GGUsp5VTaW7grrKgrWqK), and CPMM (CPMMoo8L3F4NbTegBCKVNunggL7H1ZpdTHKxQB5qKP1C). A dataset built on AMM v4 only misses all concentrated liquidity pool volume and all CPMM pool volume. That's a large fraction of Raydium flow, and it's a silent gap: your arb detector runs, finds no arb on some pairs, and the reason is missing data rather than no opportunity. In NLN trading datasets the programVariant column distinguishes which program emitted each event, so all three are in the same Parquet files.

Slot skips. Solana targets 400ms per slot but validators skip slots. The index.parquet file included with the block archive has one row per slot with a tx_count column. Slots with tx_count = 0 were skipped. A simulation that assumes slot N+2 always has a block is wrong on those slots, and any fill delay logic built on that assumption will produce incorrect results during periods of high skip rate.

02 What a Raw Block Actually Contains

Every block_<slot>.json in the archive is a raw getBlock response. The fields you actually use for trading strategy work are a small subset. Most of the response is validator bookkeeping you can ignore.

The top-level fields you'll use most:

transactions[]: the full list of all transactions in the slot, vote and non-vote
blockTime: Unix timestamp (seconds), null for slots before approximately early 2020
parentSlot: the previous confirmed slot number
rewards[]: validator rewards for this slot, including the block reward and staking yields

Inside each transaction:

transaction.message.accountKeys[]: all accounts referenced, in order. Instruction data references these by index, not by address directly.
meta.preTokenBalances[] / meta.postTokenBalances[]: token balance for each (account, mint) pair before and after the transaction. Fields: accountIndex, mint, owner, uiTokenAmount (with amount, decimals, uiAmount).
meta.logMessages[]: program log output. For programs without a published IDL, this is often where decoded events live as Program log: ... strings.
meta.err: null for a successful transaction, an error object for a failed one. Failed transactions still consume block space and compute; they just don't commit their state changes.
meta.computeUnitsConsumed: useful for MEV analysis and for understanding how full a slot is.

The vote program address is Vote111111111111111111111111111111111111111. Any transaction where this address appears in transaction.message.accountKeys is a vote transaction. Filter it out before doing anything else.

filter-vote-txs.py

python

import json
import sys
from pathlib import Path

VOTE_PROGRAM = "Vote111111111111111111111111111111111111111"


def load_block(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def filter_vote_txs(block: dict) -> list[dict]:
    result = []
    for tx in block.get("transactions", []):
        accounts = tx["transaction"]["message"]["accountKeys"]
        if VOTE_PROGRAM not in accounts:
            result.append(tx)
    return result


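if __name__ == "__main__":
    path = Path(sys.argv[1])
    block = load_block(str(path))
    all_txs = block.get("transactions", [])
    non_vote = filter_vote_txs(block)
    failed = sum(1 for tx in non_vote if tx["meta"]["err"] is not None)

    # Take the slot from the file name (block_<slot>.json). parentSlot + 1 is
    # wrong whenever the preceding slot was skipped.
    slot = path.stem.removeprefix("block_")
    # Guard against division by zero on a block with no transactions.
    share = len(non_vote) / len(all_txs) if all_txs else 0.0

    print(f"Slot:         {slot}")
    print(f"Total txs:    {len(all_txs)}")
    print(f"Vote txs:     {len(all_txs) - len(non_vote)}")
    print(f"Non-vote txs: {len(non_vote)}  ({share:.1%})")
    print(f"Failed txs:   {failed}")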

On most days, the failed count in that last print comes to 15–25% of the non-vote transactions. Failed transactions are real: they tried, they consumed compute, they just didn't commit. For sniping and arb backtests, the competition you're modeling is partly those failed transactions.
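The preTokenBalances and postTokenBalances fields listed above are usually the quickest way to see what a transaction did without decoding its instructions. The following is a minimal sketch that diffs them per (owner, mint) pair. The function name is ours, and it assumes the standard getBlock field layout described in this section.

```python
from collections import defaultdict
from decimal import Decimal


def token_balance_deltas(tx: dict) -> dict[tuple[str, str], Decimal]:
    """Return {(owner, mint): post - pre} in UI units for one transaction."""
    meta = tx["meta"]
    deltas: dict[tuple[str, str], Decimal] = defaultdict(Decimal)

    def ui(entry: dict) -> Decimal:
        # Use the integer "amount" plus "decimals" rather than the float uiAmount,
        # so large balances do not lose precision.
        amt = entry["uiTokenAmount"]
        return Decimal(amt["amount"]) / (Decimal(10) ** amt["decimals"])

    # An account that appears only in the post list was created in this
    # transaction; one that appears only in the pre list was closed.
    for entry in meta.get("preTokenBalances") or []:
        deltas[(entry.get("owner", ""), entry["mint"])] -= ui(entry)
    for entry in meta.get("postTokenBalances") or []:
        deltas[(entry.get("owner", ""), entry["mint"])] += ui(entry)

    # Drop pairs whose balance did not change.
    return {key: delta for key, delta in deltas.items() if delta != 0}
```

For a simple swap, the signer typically shows one negative and one positive delta. These lists cover SPL token accounts only, so native SOL movements have to be read from meta.preBalances and meta.postBalances instead.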

03 Two Paths to the Data

The block archive and the trading datasets solve different problems, and which one you start with determines how fast you get to a first result.

Historical Raw Blocks is the complete getBlock archive from genesis to the previous UTC midnight. Files arrive as a tar.zst archive via signed URL, extracted to flat block_<slot>.json files with no nested directories. The archive includes a manifest.json (start/end slot, file count, total size, SHA-256 per file) and an optional index.parquet (one row per slot: slot, block_time, parent_slot, leader, tx_count, file_name). The last 30 days is roughly 2TB; 6 months is around 12TB.

Trading Datasets are pre-parsed Parquet files across 40+ programs, one file per day, 4 to 12 GB compressed per program per month. They are DuckDB-ready out of the box with _ui normalized columns, pre-computed usd_value, and the programVariant tag on multi-program protocols. No extraction step: DuckDB reads the Parquet directly from the tar.zst via its built-in reader.

Capability	Raw Blocks	Trading Datasets
Strategy on unsupported program	Yes	No
Custom parsing logic	Yes	No
Direct DuckDB query	No	Yes
_ui pre-normalized columns	No	Yes
USD values pre-computed	No	Yes
Failed transactions visible	Yes	No
Storage for 30 days	~2TB	4–12 GB/program
Raydium three-program coverage	Parse all three	Included

Two paths to historical on-chain data. Storage numbers are approximate; trading dataset size varies by program activity.

For teams starting out: if your strategy involves a program listed in the NLN trading datasets, start there. You'll be running DuckDB queries in an hour instead of writing a parser. Use raw blocks when you're working on a protocol not in the covered list, or when the failed-transaction context matters for your strategy.

verify-archive.py

python

import hashlib
import json
from pathlib import Path


def verify_extracted_blocks(manifest_path: str, blocks_dir: str) -> None:
    with open(manifest_path) as f:
        manifest = json.load(f)

    blocks = Path(blocks_dir)
    failed = []
    verified = 0

    for entry in manifest["files"]:
        file_path = blocks / entry["file_name"]

        if not file_path.exists():
            failed.append(f"missing: {entry['file_name']}")
            continue

        h = hashlib.sha256()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)

        if h.hexdigest() != entry["sha256"]:
            failed.append(f"hash mismatch: {entry['file_name']}")
        else:
            verified += 1

    if failed:
        raise ValueError(f"Verification failed ({len(failed)} files):\n" + "\n".join(failed))

    print(f"Verified {verified} block files")
    print(f"Slot range: {manifest['start_slot']} - {manifest['end_slot']}")

The index.parquet is worth loading before anything else when you're working with a large date range. Filter by tx_count > 0 to skip empty slots before you build any time-series logic. Skipped slots with tx_count = 0 are real slot numbers in the sequence but produced no block.

04 Strategy 1: DEX Arbitrage Between Raydium and Orca

Raydium runs three programs. Most arb detectors are built on AMM v4 only, which means they miss all CLMM and CPMM volume. An arb that looks nonexistent on AMM v4 data may be live on CLMM. The backtest will never show it. We ran the simulation without the programVariant filter on the first pass. Results looked promising. Turned out a significant portion of the flagged opportunities were on CLMM pools, and AMM v4 can't fill those. Filter by programVariant before computing anything.

The strategy: when the same token pair trades at different implied prices on Raydium and Orca in the same slot, buy on the cheaper venue and sell on the other. The minimum spread that matters is the combined fee floor: 0.25% Raydium AMM v4 plus 0.30% Orca Whirlpool equals 0.55%. Anything below that is noise. Whirlpool and CLMM pools come in several fee tiers, so substitute each pool's actual rate where it differs.
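The 0.55% floor adds the two fees together. Each fee is actually taken as a fraction of the leg it applies to, so the exact break-even sits slightly above the sum. A minimal sketch of that arithmetic, using the two fee rates quoted above and ignoring priority fees and tips:

```python
RAYDIUM_FEE = 0.0025  # 0.25%, Raydium AMM v4
ORCA_FEE = 0.0030     # 0.30%, Orca Whirlpool tier assumed in the text


def net_edge(spread: float, fee_a: float, fee_b: float) -> float:
    """Return per unit of capital after paying a fee on each leg."""
    return (1.0 + spread) * (1.0 - fee_a) * (1.0 - fee_b) - 1.0


def breakeven_spread(fee_a: float, fee_b: float) -> float:
    """Smallest spread at which net_edge is zero."""
    return 1.0 / ((1.0 - fee_a) * (1.0 - fee_b)) - 1.0


floor = breakeven_spread(RAYDIUM_FEE, ORCA_FEE)       # about 0.5523%
at_additive = net_edge(0.0055, RAYDIUM_FEE, ORCA_FEE)  # slightly negative
```

The difference from 0.55% is a couple of basis points of a percent, which only matters for opportunities sitting right at the threshold. Those are also the most numerous ones in most result sets.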

load-dex-tables.sql

sql

-- Load Raydium dex_trades, all three programs
CREATE OR REPLACE TABLE raydium AS
SELECT
    slot,
    block_time,
    pool_address,
    token_in,
    token_out,
    amount_in_ui,
    amount_out_ui,
    programVariant,
    signer
FROM read_parquet('raydium_dex_trades_2024*.parquet')
WHERE amount_in_ui > 0
  AND programVariant IN ('amm_v4', 'clmm', 'cpmm');

-- Load Orca Whirlpool dex_trades
CREATE OR REPLACE TABLE orca AS
SELECT
    slot,
    block_time,
    pool_address,
    token_in,
    token_out,
    amount_in_ui,
    amount_out_ui,
    signer
FROM read_parquet('orca_dex_trades_2024*.parquet')
WHERE amount_in_ui > 0;

-- Implied price for each swap: units of token_out per token_in
CREATE OR REPLACE TABLE raydium_prices AS
SELECT *, amount_out_ui / amount_in_ui AS implied_price
FROM raydium;

CREATE OR REPLACE TABLE orca_prices AS
SELECT *, amount_out_ui / amount_in_ui AS implied_price
FROM orca;

detect-arb.sql

sql

-- Find slots where the same token pair had >0.55% spread across DEXes
-- 0.55% = 0.25% Raydium fee + 0.30% Orca fee (minimum to profit after fees)
SELECT
    r.slot,
    r.block_time,
    r.token_in,
    r.token_out,
    r.programVariant AS raydium_variant,
    r.implied_price  AS raydium_price,
    o.implied_price  AS orca_price,
    (r.implied_price - o.implied_price) / o.implied_price AS spread_pct,
    r.pool_address   AS raydium_pool,
    o.pool_address   AS orca_pool
FROM raydium_prices r
JOIN orca_prices o
    ON  r.slot      = o.slot
    AND r.token_in  = o.token_in
    AND r.token_out = o.token_out
WHERE ABS((r.implied_price - o.implied_price) / o.implied_price) > 0.0055
ORDER BY ABS(spread_pct) DESC;

The result set here is the raw opportunity distribution: how often the spread existed, how wide it was, and which pools it appeared on. This is not the same as profitability. The next step introduces fill delay.

simulate-arb.py

python

import duckdb
from dataclasses import dataclass

FILL_DELAY_SLOTS = 2   # 1 slot detection + 1 slot bundle inclusion
RAYDIUM_FEE = 0.0025   # 0.25%
ORCA_FEE = 0.0030      # 0.30%
MIN_TRADE_USD = 100    # ignore tiny opportunities


@dataclass
class ArbOpportunity:
    slot: int
    token_in: str
    token_out: str
    raydium_price: float
    orca_price: float
    spread_pct: float
    raydium_pool: str
    orca_pool: str


@dataclass
class SimResult:
    total_opportunities: int
    profitable_after_delay: int
    avg_spread_at_signal: float
    avg_spread_at_fill: float


def simulate_arb(opportunities: list[ArbOpportunity], price_at_slot: dict) -> SimResult:
    profitable = 0
    signal_spreads = []
    fill_spreads = []

    for opp in opportunities:
        fill_slot = opp.slot + FILL_DELAY_SLOTS
        if fill_slot not in price_at_slot:
            continue

        raydium_fill = price_at_slot[fill_slot].get(opp.raydium_pool)
        orca_fill = price_at_slot[fill_slot].get(opp.orca_pool)

        if raydium_fill is None or orca_fill is None:
            continue

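        fill_spread = (raydium_fill - orca_fill) / orca_fill
        # The detector flags spreads in either direction (it filters on ABS),
        # so measure the fill spread in the direction of the original signal.
        direction = 1.0 if opp.spread_pct >= 0 else -1.0
        edge_at_fill = direction * fill_spread
        net_fill_spread = edge_at_fill - RAYDIUM_FEE - ORCA_FEE

        signal_spreads.append(abs(opp.spread_pct))
        fill_spreads.append(edge_at_fill)

        if net_fill_spread > 0:
            profitable += 1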

    return SimResult(
        total_opportunities=len(opportunities),
        profitable_after_delay=profitable,
        avg_spread_at_signal=sum(signal_spreads) / len(signal_spreads) if signal_spreads else 0,
        avg_spread_at_fill=sum(fill_spreads) / len(fill_spreads) if fill_spreads else 0,
    )

The 2-slot delay is the honest assumption for a bot not co-located with the validator. On our Frankfurt bare metal, we see co-located bots operating closer to 1-slot lag. Both numbers are worth running through the simulation: the delta between them is the value of co-location for this specific strategy and market period.

What this simulation tells you is where the edge exists and what size it needs to be worth trading. What it does not tell you is that you'll capture it: Jito bundles mean other bots are bidding for the same slots.

05 Strategy 2: Token Launch Sniping on PumpFun

The strategy: detect new token launches at the first create instruction, enter at the bonding curve opening price, exit at a configured slot window. Price reconstruction uses virtual_sol_reserves / virtual_token_reserves. Not the real reserves. The virtual values include an offset that stabilizes the curve at low liquidity, and using real reserves gives you a different (wrong) number.
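A minimal sketch of that price reconstruction, plus the fill a buy would get. It assumes the curve behaves as a constant product on the virtual reserves, that the token has 6 decimals, and that fees are ignored. The reserve values are illustrative and not taken from the dataset.

```python
LAMPORTS_PER_SOL = 10**9
TOKEN_DECIMALS = 6  # assumption: typical PumpFun token decimals


def spot_price_sol(virtual_sol_reserves: int, virtual_token_reserves: int) -> float:
    """Spot price in SOL per whole token, from raw virtual reserves."""
    sol = virtual_sol_reserves / LAMPORTS_PER_SOL
    tokens = virtual_token_reserves / 10**TOKEN_DECIMALS
    return sol / tokens


def tokens_out_for_sol(virtual_sol_reserves: int, virtual_token_reserves: int,
                       sol_in_lamports: int) -> int:
    """Token base units received for a buy, assuming x * y = k and no fees."""
    k = virtual_sol_reserves * virtual_token_reserves
    new_sol = virtual_sol_reserves + sol_in_lamports
    new_tokens = -(-k // new_sol)  # ceiling division: round in the pool's favor
    return virtual_token_reserves - new_tokens


# Illustrative reserves: 30 SOL and 1.073 billion tokens (virtual).
vs = 30 * LAMPORTS_PER_SOL
vt = 1_073_000_000 * 10**TOKEN_DECIMALS

spot = spot_price_sol(vs, vt)
out = tokens_out_for_sol(vs, vt, sol_in_lamports=1 * LAMPORTS_PER_SOL)
avg_fill = 1.0 / (out / 10**TOKEN_DECIMALS)  # SOL per token actually paid
```

The average fill is worse than the spot price because the buy itself moves the curve. A sniping backtest that enters at the quoted opening price rather than the simulated fill overstates entry quality by exactly that gap.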

extract-pumpfun-creates.py

python

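import hashlib

# Anchor discriminator for the 'create' instruction
CREATE_DISCRIMINATOR = hashlib.sha256(b"global:create").digest()[:8]
PUMPFUN_PROGRAM = "6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P"
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    # getBlock with "json" encoding returns instruction data as base58, not base64.
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    pad = len(s) - len(s.lstrip("1"))  # each leading '1' encodes a zero byte
    return b"\x00" * pad + body


def decode_create_instruction(ix_data: bytes) -> dict | None:
    if len(ix_data) < 8:
        return None
    if ix_data[:8] != CREATE_DISCRIMINATOR:
        return None
    # Fields after discriminator: name (string), symbol (string), uri (string)
    # Then remaining bonding curve state encoded by Anchor
    return {"raw_ix_data": ix_data[8:].hex()}


def extract_pumpfun_creates(block: dict, slot: int) -> list[dict]:
    # Pass the slot from the file name (block_<slot>.json). parentSlot + 1 is
    # wrong whenever the preceding slot was skipped.
    creates = []

    for tx_idx, tx in enumerate(block.get("transactions", [])):
        if tx["meta"]["err"] is not None:
            continue  # skip failed transactions

        accounts = tx["transaction"]["message"]["accountKeys"]
        if PUMPFUN_PROGRAM not in accounts:
            continue

        # Top-level instructions only. Creates issued through CPI appear in
        # meta.innerInstructions and are not covered here.
        for ix in tx["transaction"]["message"]["instructions"]:
            program = accounts[ix["programIdIndex"]]
            if program != PUMPFUN_PROGRAM:
                continue

            data = b58decode(ix["data"])
            decoded = decode_create_instruction(data)
            if decoded is None:
                continue

            # ix["accounts"] holds indices into accountKeys in the order the
            # instruction declares them. Positions 0 (mint) and 7 (creator) are
            # assumed from the published PumpFun IDL; confirm them against the
            # IDL version that was live in your date range.
            ix_accounts = ix["accounts"]
            if len(ix_accounts) < 8:
                continue

            creates.append({
                "slot": slot,
                "tx_index": tx_idx,
                "signature": tx["transaction"]["signatures"][0],
                "mint": accounts[ix_accounts[0]],
                "creator": accounts[ix_accounts[7]],
                "raw_ix": decoded["raw_ix_data"],
            })

    return creates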

The raw block path gives you something the trading datasets don't: the failed competing transactions. When a new token launches and 50 bots try to snipe it simultaneously, most of those attempts fail. The raw block shows all of them. That competition context matters for evaluating how realistic your simulated entry actually was.
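A rough way to quantify that competition is to count, per block, how many transactions touched both the program and the mint, and how many of them failed. The sketch below assumes the getBlock layout described earlier. The function name is ours, and matching on account membership alone is an approximation, since it does not check that the transaction was actually a buy.

```python
def competition_for_mint(block: dict, mint: str, program_id: str) -> dict[str, int]:
    """Count attempted, failed, and landed transactions touching a mint in one block."""
    attempts = 0
    failed = 0

    for tx in block.get("transactions", []):
        keys = tx["transaction"]["message"]["accountKeys"]
        # "jsonParsed" encoding yields dicts with a "pubkey" field; "json" yields strings.
        addrs = {k["pubkey"] if isinstance(k, dict) else k for k in keys}

        # Versioned transactions can pull extra accounts from address lookup tables.
        loaded = tx["meta"].get("loadedAddresses") or {}
        addrs.update(loaded.get("writable", []))
        addrs.update(loaded.get("readonly", []))

        if program_id in addrs and mint in addrs:
            attempts += 1
            if tx["meta"]["err"] is not None:
                failed += 1

    return {"attempts": attempts, "failed": failed, "landed": attempts - failed}
```

Run it over the first few blocks after a create. A launch with dozens of attempts and a handful of landed transactions is one where a simulated first-slot entry is optimistic.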

simulate-launches.sql

sql

-- Load pumpfun_events Parquet, exclude mayhem mode, reconstruct price path
CREATE OR REPLACE TABLE pf_creates AS
SELECT
    slot AS create_slot,
    mint,
    creator,
    virtual_sol_reserves,
    virtual_token_reserves,
    real_token_reserves,
    virtual_sol_reserves / virtual_token_reserves AS opening_price,
    block_time AS create_time
FROM read_parquet('pumpfun_events_creates_2024*.parquet')
WHERE is_mayhem_mode = false;

CREATE OR REPLACE TABLE pf_trades AS
SELECT
    slot,
    mint,
    is_buy,
    sol_amount,
    token_amount,
    virtual_sol_reserves,
    virtual_token_reserves,
    virtual_sol_reserves / virtual_token_reserves AS price_at_trade
FROM read_parquet('pumpfun_events_trades_2024*.parquet');

-- For each launch, compute entry and exit price at configurable slot windows
-- Entry: first trade after create. Exit: price at N slots after create.
SELECT
    c.mint,
    c.create_slot,
    c.opening_price,
    first_trade.price_at_trade  AS entry_price,
    first_trade.slot            AS entry_slot,
    exit_15.price_at_trade      AS exit_price_15slot,
    exit_30.price_at_trade      AS exit_price_30slot,
    exit_50.price_at_trade      AS exit_price_50slot,
    (exit_15.price_at_trade - first_trade.price_at_trade) / first_trade.price_at_trade AS pnl_15slot,
    (exit_30.price_at_trade - first_trade.price_at_trade) / first_trade.price_at_trade AS pnl_30slot,
    (exit_50.price_at_trade - first_trade.price_at_trade) / first_trade.price_at_trade AS pnl_50slot
FROM pf_creates c
LEFT JOIN LATERAL (
    SELECT price_at_trade, slot
    FROM pf_trades
    WHERE mint = c.mint AND slot >= c.create_slot AND is_buy = true
    ORDER BY slot ASC
    LIMIT 1
) first_trade ON true
LEFT JOIN LATERAL (
    SELECT price_at_trade
    FROM pf_trades
    WHERE mint = c.mint AND slot BETWEEN c.create_slot + 13 AND c.create_slot + 17
    ORDER BY ABS(slot - (c.create_slot + 15)) ASC
    LIMIT 1
) exit_15 ON true
LEFT JOIN LATERAL (
    SELECT price_at_trade
    FROM pf_trades
    WHERE mint = c.mint AND slot BETWEEN c.create_slot + 27 AND c.create_slot + 33
    ORDER BY ABS(slot - (c.create_slot + 30)) ASC
    LIMIT 1
) exit_30 ON true
LEFT JOIN LATERAL (
    SELECT price_at_trade
    FROM pf_trades
    WHERE mint = c.mint AND slot BETWEEN c.create_slot + 45 AND c.create_slot + 55
    ORDER BY ABS(slot - (c.create_slot + 50)) ASC
    LIMIT 1
) exit_50 ON true
WHERE first_trade.slot IS NOT NULL;

Run this across three months of launches and look at the pnl_15slot distribution. Most launches go to zero. The ones that don't tend to cluster in specific market conditions: high-volume days, particular launch patterns. That clustering is the actual backtest finding. Not a win rate, but a set of conditions worth filtering on before deploying capital.

06 Strategy 3: Liquidation Windows on Kamino

Most liquidatable positions on Kamino clear within a few slots of becoming eligible. The ones that sit underwater for 10 or more slots are where slower bots still find fills. The backtest tells you which regime you're operating in: competitive (sub-5-slot clearance) or not. That answer changes the infrastructure requirements before you write a line of live code.

Price reconstruction is the part that trips people up. The usd_value column in NLN trading datasets is pre-computed from the nearest oracle to each slot, with the source recorded in price_source for auditability. Join position state against that column directly. No need to reconstruct oracle prices yourself. The dataset already did it. The liquidation threshold varies by collateral type; Kamino's risk config publishes the per-asset values and they're worth pulling rather than hardcoding a placeholder.
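Before the SQL, it helps to have the health-factor arithmetic in isolation. This is a minimal sketch of the simplified single-collateral, single-debt formula this article uses, with a hypothetical position and the same 0.80 example threshold. It is not Kamino's exact on-chain calculation.

```python
def health_factor(collateral_amount_ui: float, collateral_price_usd: float,
                  debt_amount_ui: float, debt_price_usd: float,
                  liquidation_threshold: float) -> float:
    """(collateral_usd * liquidation_threshold) / debt_usd; below 1.0 is liquidatable."""
    debt_usd = debt_amount_ui * debt_price_usd
    if debt_usd <= 0:
        return float("inf")  # no debt, nothing to liquidate
    collateral_usd = collateral_amount_ui * collateral_price_usd
    return collateral_usd * liquidation_threshold / debt_usd


def liquidation_price(collateral_amount_ui: float, debt_amount_ui: float,
                      debt_price_usd: float, liquidation_threshold: float) -> float:
    """Collateral price at which the health factor reaches exactly 1.0."""
    return (debt_amount_ui * debt_price_usd) / (collateral_amount_ui * liquidation_threshold)


# Hypothetical position: 100 SOL collateral at $150, 9,000 USDC debt.
hf = health_factor(100, 150.0, 9_000, 1.0, liquidation_threshold=0.80)
liq_px = liquidation_price(100, 9_000, 1.0, liquidation_threshold=0.80)
```

Here the position starts at a health factor of about 1.33 and becomes eligible when SOL trades at $112.50. The slot at which the reconstructed price first crosses that level is the eligible_slot the lag query measures from.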

health-factor.sql

sql

-- Assumes Kamino lending position updates and liquidation events
-- are in separate tables from the trading datasets.
-- usd_value in dex_trades is treated as the USD value of each swap, so dividing
-- it by the swapped amount of a token gives that token's unit price at that slot.

CREATE OR REPLACE TABLE token_prices AS
SELECT slot, mint, median(price_usd) AS price_usd
FROM (
    SELECT slot, token_in  AS mint, usd_value / amount_in_ui  AS price_usd
    FROM read_parquet('dex_trades_*.parquet')
    WHERE amount_in_ui > 0 AND usd_value > 0
    UNION ALL
    SELECT slot, token_out AS mint, usd_value / amount_out_ui AS price_usd
    FROM read_parquet('dex_trades_*.parquet')
    WHERE amount_out_ui > 0 AND usd_value > 0
) per_swap
GROUP BY slot, mint;

CREATE OR REPLACE TABLE position_health AS
SELECT
    p.slot,
    p.block_time,
    p.position_address,
    p.owner,
    p.collateral_mint,
    p.collateral_amount_ui,
    p.debt_amount_ui,
    p.debt_mint,
    cp.price_usd AS collateral_price_usd,
    dp.price_usd AS debt_price_usd,
    -- Health factor = (collateral_usd * liquidation_threshold) / debt_usd
    -- Liquidation threshold varies by asset; using 0.80 as example
    (p.collateral_amount_ui * cp.price_usd * 0.80)
        / NULLIF(p.debt_amount_ui * dp.price_usd, 0) AS health_factor
FROM kamino_positions p
-- ASOF joins take the latest price at or before the position's slot, so a slot
-- with no swap for the token still gets a price and each position row stays unique.
ASOF LEFT JOIN token_prices cp
    ON cp.mint = p.collateral_mint AND p.slot >= cp.slot
ASOF LEFT JOIN token_prices dp
    ON dp.mint = p.debt_mint AND p.slot >= dp.slot;

-- Compute lag between liquidation-eligible and liquidation-executed
SELECT
    h.position_address,
    h.slot                AS eligible_slot,
    h.health_factor,
    liq.slot              AS liquidation_slot,
    liq.slot - h.slot     AS lag_slots,
    (liq.slot - h.slot) * 0.4 AS lag_seconds_approx
FROM position_health h
JOIN kamino_liquidations liq ON liq.position_address = h.position_address
WHERE h.health_factor < 1.0
  AND liq.slot >= h.slot
  AND h.slot = (
      SELECT MIN(slot) FROM position_health ph
      WHERE ph.position_address = h.position_address
        AND ph.health_factor < 1.0
  )
ORDER BY lag_slots ASC;

Run this on a high-volatility day and compare it to a quiet day. The lag distribution shifts. During sharp price moves, positions go underwater and clear within 1–2 slots. During slow periods the same positions can sit for 20+ slots. If your bot can only react in 3 slots, you're relevant in the second regime, not the first. The backtest tells you which one was more common in your target period.

07 Strategy 4: LP Position P&L on Orca Whirlpool

The fee income numbers from this simulation are an upper bound. JIT liquidity bots add and remove positions within a single slot, capturing fee income on high-volume swaps before IL accumulates. Those positions don't appear cleanly in pool_events because the open and close happen in the same block. The simulation sees the swap volume but misses those bot positions. Your passive LP P&L estimate will be higher than what a passive position would have actually earned.

With that in mind: for a given tick range, compute fee income earned while in range minus IL from price divergence. The pool_events table preserves tick and bin resolution. Use the amount_in_ui and amount_out_ui columns from dex_trades for the IL calculation, and keep the price quoted in one direction throughout. The decimal normalization trap applies here the same way it applies to price ratios.

lp-pnl.sql

sql

-- Simulate LP position P&L for a given tick range and holding period
-- Parameters: tick_lower = -100, tick_upper = 100, pool, entry_slot, exit_slot,
--   base_mint (token the price is quoted for), position_share (your fraction of
--   in-range liquidity, e.g. 0.01), position_value (position size in quote units)

WITH
-- Price path from swaps through the pool, always quote per base.
-- Swaps in the opposite direction are inverted so the series does not
-- alternate between a price and its reciprocal. Volume is in quote units.
price_path AS (
    SELECT
        slot,
        CASE WHEN token_in = '<<base_mint>>'
             THEN amount_out_ui / amount_in_ui
             ELSE amount_in_ui / amount_out_ui END AS price,
        CASE WHEN token_in = '<<base_mint>>'
             THEN amount_out_ui
             ELSE amount_in_ui END                 AS volume_quote
    FROM read_parquet('orca_dex_trades_*.parquet')
    WHERE pool_address = '<<your_pool>>'
      AND slot BETWEEN <<entry_slot>> AND <<exit_slot>>
      AND amount_in_ui > 0
      AND amount_out_ui > 0
),

-- Entry and exit price
entry_price AS (SELECT price FROM price_path ORDER BY slot ASC  LIMIT 1),
exit_price  AS (SELECT price FROM price_path ORDER BY slot DESC LIMIT 1),

-- Swaps where price was within tick range
-- Tick range converts to price via: price = 1.0001^tick (raw units).
-- For _ui prices, multiply the bounds by 10^(base_decimals - quote_decimals)
-- when the two mints have different decimals.
in_range_slots AS (
    SELECT slot, price, volume_quote
    FROM price_path
    WHERE price BETWEEN pow(1.0001, -100) AND pow(1.0001, 100)
),

-- Total volume through pool in-range (proportional fee estimate)
total_in_range_volume AS (
    SELECT COALESCE(SUM(volume_quote), 0) AS total_vol FROM in_range_slots
),

-- Fee income in quote units, scaled to the position's share of liquidity
-- (0.30% tier assumed; use the pool's actual fee rate)
fee_income AS (
    SELECT total_vol * 0.003 * <<position_share>> AS fees_earned
    FROM total_in_range_volume
),

-- Impermanent loss from entry to exit price, as a fraction of hold value
-- IL = 2 * sqrt(price_ratio) / (1 + price_ratio) - 1
-- This is the full-range constant-product formula. A concentrated position
-- loses more for the same move, so treat its magnitude as a lower bound.
il_calc AS (
    SELECT
        (SELECT price FROM exit_price) / (SELECT price FROM entry_price) AS price_ratio
),
impermanent_loss AS (
    SELECT 2 * sqrt(price_ratio) / (1 + price_ratio) - 1 AS il
    FROM il_calc
)

SELECT
    (SELECT price FROM entry_price)      AS entry_price,
    (SELECT price FROM exit_price)       AS exit_price,
    (SELECT fees_earned FROM fee_income) AS fee_income,
    (SELECT il FROM impermanent_loss)    AS impermanent_loss,
    -- Convert IL to quote units before adding it to fee income
    (SELECT fees_earned FROM fee_income)
        + (SELECT il FROM impermanent_loss) * <<position_value>> AS net_pnl,
    (SELECT COUNT(*) FROM in_range_slots) AS slots_in_range,
    (SELECT COUNT(*) FROM price_path)     AS total_slots
;

The result shows fee income minus IL for that tick range over the holding period. Run it across several ranges to find where the fee-to-IL ratio was historically best.
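The two formulas in that query are easy to sanity-check outside SQL. A minimal sketch, assuming the standard full-range impermanent loss formula and the 1.0001^tick price convention; the decimals parameters are our addition for pairs whose mints differ in decimals.

```python
import math


def tick_to_price(tick: int, decimals_a: int = 0, decimals_b: int = 0) -> float:
    """Price of token A in token B at a tick, adjusted from raw to UI units."""
    return (1.0001 ** tick) * 10 ** (decimals_a - decimals_b)


def impermanent_loss(price_ratio: float) -> float:
    """IL as a fraction of hold value for a full-range constant-product position."""
    return 2 * math.sqrt(price_ratio) / (1 + price_ratio) - 1


# A +/-100 tick range spans roughly +/-1% around a price of 1.0.
lower = tick_to_price(-100)
upper = tick_to_price(100)

# A 21% price move costs about 0.45% versus holding, before fees.
il_21pct = impermanent_loss(1.21)
```

A range of 100 ticks either side is only about one percent wide, so the default parameters in the query suit a stable pair. For a volatile pair the price will spend most of the holding period out of range and fee income will be close to zero.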

08 From DuckDB to Production: A Rust Scaffold

If fill_delay_slots = 2 makes the strategy unprofitable in the Rust scaffold, the strategy is probably dead. The scaffold exists to kill strategies that only work on idealized timing assumptions, not to validate them. Run it at delay = 1, delay = 2, delay = 3. If the P&L collapses at delay = 2, you need co-location or a different strategy. Better to find that out here than six months into a live deployment.

DuckDB analysis implicitly assumes your fill always lands exactly N slots after signal. The Rust engine makes that assumption explicit and lets you stress-test it. That's the only reason to write it: not to replicate the DuckDB logic in Rust, but to force a commitment to the latency parameter and see what the edge profile looks like under different values.

backtest/src/engine.rs

rust

use std::collections::BTreeMap;
use std::path::PathBuf;
use std::fs;
use serde_json::Value;

#[derive(Debug, Clone)]
pub struct Block {
    pub slot: u64,
    pub block_time: Option<i64>,
    pub transactions: Vec<Value>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Side { Buy, Sell }

#[derive(Debug, Clone)]
pub struct Order {
    pub pool: String,
    pub side: Side,
    pub amount_ui: f64,
    pub signal_slot: u64,
}

pub trait BlockSource {
    fn next_block(&mut self) -> Option<Block>;
}

pub trait Strategy {
    fn on_block(&mut self, block: &Block) -> Vec<Order>;
}

/// Reads block_<slot>.json files from a local archive directory in slot order.
pub struct ArchiveSource {
    // Slot-ordered iterator over (slot, path). BTreeMap iterates in key order.
    cursor: std::collections::btree_map::IntoIter<u64, PathBuf>,
}

impl ArchiveSource {
    pub fn new(archive_dir: &str, start_slot: u64, end_slot: u64) -> Self {
        let mut index = BTreeMap::new();

        let entries = fs::read_dir(archive_dir).expect("failed to read archive dir");
        for entry in entries.flatten() {
            let name = entry.file_name();
            let name_str = name.to_string_lossy();
            if let Some(slot_str) = name_str.strip_prefix("block_").and_then(|s| s.strip_suffix(".json")) {
                if let Ok(slot) = slot_str.parse::<u64>() {
                    if slot >= start_slot && slot <= end_slot {
                        index.insert(slot, entry.path());
                    }
                }
            }
        }

        // The map is consumed into the iterator, so no separate copy is kept.
        ArchiveSource { cursor: index.into_iter() }
    }
}

impl BlockSource for ArchiveSource {
    fn next_block(&mut self) -> Option<Block> {
        let (slot, path) = self.cursor.next()?;
        let raw = fs::read_to_string(&path).ok()?;
        let data: Value = serde_json::from_str(&raw).ok()?;

        Some(Block {
            slot,
            block_time: data["blockTime"].as_i64(),
            transactions: data["transactions"].as_array().cloned().unwrap_or_default(),
        })
    }
}

backtest/src/simulator.rs

rust

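use std::collections::VecDeque;

// Types defined in engine.rs; assumes both files are modules of the same crate.
use crate::engine::{BlockSource, Order, Side, Strategy};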

#[derive(Debug)]
pub struct SimResult {
    pub final_pnl: f64,
    pub filled_orders: usize,
    pub final_slot: u64,
}

pub struct Simulator {
    pub fill_delay_slots: u64,
    pending: VecDeque<(u64, Order)>,  // (fill_slot, order)
}

impl Simulator {
    pub fn new(fill_delay_slots: u64) -> Self {
        Simulator { fill_delay_slots, pending: VecDeque::new() }
    }

    pub fn run(
        &mut self,
        source: &mut impl BlockSource,
        strategy: &mut impl Strategy,
        price_oracle: &impl Fn(u64, &str) -> Option<f64>,
    ) -> SimResult {
        let mut pnl: f64 = 0.0;
        let mut filled: usize = 0;
        let mut last_slot: u64 = 0;

        loop {
            let Some(block) = source.next_block() else { break };
            last_slot = block.slot;

            // Fill any pending orders whose fill_slot has arrived
            while let Some(&(fill_slot, _)) = self.pending.front() {
                if fill_slot > block.slot { break; }
                let (_, order) = self.pending.pop_front().unwrap();
                if let Some(fill_price) = price_oracle(block.slot, &order.pool) {
                    let trade_pnl = match order.side {
                        Side::Buy  => -fill_price * order.amount_ui,
                        Side::Sell =>  fill_price * order.amount_ui,
                    };
                    pnl += trade_pnl;
                    filled += 1;
                }
            }

            // Ask strategy for new orders
            let orders = strategy.on_block(&block);
            for order in orders {
                let fill_slot = block.slot + self.fill_delay_slots;
                self.pending.push_back((fill_slot, order));
            }
        }

        SimResult { final_pnl: pnl, filled_orders: filled, final_slot: last_slot }
    }
}

This scaffold is not production code: it has no Jito bundle modeling, no position limits, no P&L accounting per-strategy. Those come later. The value here is that it forces you to commit to a fill_delay_slots number and see what the strategy earns under that constraint across a real block range.

For teams who want to run this against a live stream after the backtest passes, the same Geyser plugin infrastructure that powers the archive is what generates the live data. Swap the BlockSource implementation from ArchiveSource to one backed by a Geyser stream and the strategy code stays unchanged.
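The source swap can be illustrated in a few lines of Python (the scaffold itself is Rust). This is only a sketch of the idea: ListSource, QueueSource, and busiest_slot are hypothetical names, and the queue stands in for whatever a real stream consumer would feed.

```python
import queue
from typing import Optional, Protocol


class BlockSource(Protocol):
    def next_block(self) -> Optional[dict]: ...


class ListSource:
    """Replays pre-loaded blocks in order (stands in for the archive reader)."""

    def __init__(self, blocks):
        self._it = iter(blocks)

    def next_block(self):
        return next(self._it, None)


class QueueSource:
    """Stands in for a live feed: some consumer pushes blocks onto a queue."""

    def __init__(self, q, timeout_s=0.05):
        self._q = q
        self._timeout_s = timeout_s

    def next_block(self):
        try:
            return self._q.get(timeout=self._timeout_s)
        except queue.Empty:
            return None  # this toy treats a quiet feed as end of stream


def busiest_slot(source: BlockSource):
    """Toy 'strategy': depends only on next_block(), not on where blocks come from."""
    best = None
    while (block := source.next_block()) is not None:
        if best is None or len(block["transactions"]) > len(best["transactions"]):
            best = block
    return None if best is None else best["slot"]
```

The consuming function never learns which source it was given, which is the property that lets backtest and live runs share strategy code.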

09 What the Backtest Can't Tell You

Backtests find edges. They do not tell you whether you'll capture them.

The biggest gap is MEV competition. Your simulation assumes you are the only bot. In live trading, Jito bundle ordering means your fill price depends on what other searchers bid for priority in the same slot. A 2-slot fill delay in backtest is a parameter you set. In live trading it's the outcome of a priority fee auction you can't model from historical data.

JIT liquidity is the second gap, specific to LP simulations. Concentrated liquidity bots add and remove positions within a single slot and those positions don't appear cleanly in pool_events. The fee income estimate for a passive position is an upper bound because you're attributing fees to yourself that JIT bots captured in practice.

Slot skips affect any simulation that assumes slot N+2 always exists. The index.parquet shows tx_count = 0 for skipped slots. Check skip rate in your target date range before drawing conclusions about fill latency. It shifts with network conditions.
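A skip-rate check can be as small as the sketch below. It assumes the index rows have already been loaded as (slot, tx_count) pairs. The tx_count = 0 convention is the one described above, while the slot column name and both function names are assumptions made for illustration.

```python
def skip_rate(index_rows):
    """Fraction of slots in the range that were skipped (tx_count == 0)."""
    rows = list(index_rows)
    if not rows:
        return 0.0
    skipped = sum(1 for _, tx_count in rows if tx_count == 0)
    return skipped / len(rows)


def first_produced_slot(index_rows, target_slot):
    """Earliest slot at or after target_slot that produced a block, else None."""
    for slot, tx_count in sorted(index_rows):
        if slot >= target_slot and tx_count > 0:
            return slot
    return None


if __name__ == "__main__":
    # Made-up rows: slots 101 and 102 were skipped.
    rows = [(100, 1200), (101, 0), (102, 0), (103, 950), (104, 1100)]
    print(f"skip rate: {skip_rate(rows):.1%}")
    # A signal at slot 100 with a 2-slot delay targets slot 102, which never existed.
    print("actual fill slot:", first_produced_slot(rows, 100 + 2))
```

Using first_produced_slot instead of plain slot + N keeps a simulation from assuming a fill in a slot that was never produced.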

Geyser-to-submission latency is the one we see most often mismodeled. A 2-slot detection-to-fill delay in backtest often becomes 3 slots or more in production because Geyser stream subscription lag isn't accounted for. On co-located bare metal in Frankfurt, where the validator and your bot share a subnet, this gap is smallest. On a remote VPS it can exceed a full slot.
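One hedged way to fold stream lag into the backtest is to convert a measured lag into whole slots and add it to the delay. The sketch assumes roughly 400 ms per slot and rounds any partial slot up, which is a conservative modeling choice rather than a measured rule. The function name and the example lag values are illustrative.

```python
import math

SLOT_MS = 400  # assumption: approximate slot time


def effective_fill_delay_slots(detection_slots, inclusion_slots, stream_lag_ms,
                               slot_ms=SLOT_MS):
    """Backtest delay in slots once stream subscription lag is included.

    Any positive lag is rounded up to a whole slot, on the assumption that
    arriving late within a slot costs the next slot boundary.
    """
    lag_slots = math.ceil(stream_lag_ms / slot_ms) if stream_lag_ms > 0 else 0
    return detection_slots + inclusion_slots + lag_slots


if __name__ == "__main__":
    for lag_ms in (0, 150, 450):  # illustrative lags, not measurements
        slots = effective_fill_delay_slots(1, 1, lag_ms)
        print(f"stream lag {lag_ms} ms -> fill delay {slots} slots")
```

Running the backtest at the lag-adjusted delay as well as the nominal one shows how much of the edge depends on where the bot is hosted.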

What the backtest does tell you: whether an edge exists at all, what size makes it worth trading, and which market conditions it survives. We've seen teams skip it and spend months debugging live performance only to discover the edge was never there in the historical data.

The historical data is already there. If you're building a strategy on Solana and you haven't tested it against real on-chain data at scale, you're skipping the most honest feedback loop available. The NLN Historical Raw Blocks archive covers genesis to yesterday, delivered as signed URLs within 24 hours. If you want to skip the raw parsing and start with DuckDB queries today, the NLN Trading Datasets cover Raydium, Orca, PumpFun, Kamino, and 40+ other programs with normalized columns and pre-computed prices.

Get access · Talk to an engineer
