PumpFun Data Analysis: Graduation Rates, Creator Wallets, and Bonding Curve Price Reconstruction (2026)

Analyze PumpFun launch data: graduation rates, creator wallet clustering, bonding curve price reconstruction, is_mayhem_mode, DuckDB queries.

NoLimitNodes Engineering

Infrastructure Team

Jun 25, 202614 min read

On this page +

PumpFun's bonding curve is deterministic. Every trade commits the post-trade reserve state on-chain. That means the full price history for any token, from first buy to graduation, is reconstructible from pumpfun_events without oracles, without CEX feeds, without any external source. The record is complete.

The harder problem is that thousands of tokens launch per day and most follow the same arc: price spikes in the first few minutes, fades within an hour, flat line after. Individual token analysis at that scale is mostly noise. The useful work is aggregate: graduation rates by market condition, creator wallet patterns, early-buyer entry price distributions across cohorts. That's what the data makes tractable, and what this article covers.

01What the Data Covers#

Two event types make up the pumpfun_events dataset: creates and trades.

Create events are token launches. One row per token, capturing everything from the bonding curve initialization: the mint address, the creator wallet, the initial virtual and real reserves, the is_mayhem_mode flag, the on-chain metadata.

Trade events are buys and sells through the bonding curve. One row per transaction, with the full reserve state after each trade, the SOL and token amounts, fee breakdown, and the instruction name. When the bonding curve accumulates approximately 85 SOL in real reserves, graduation triggers. The final trade row for that mint carries ix_name = 'migrate' or 'complete_bonding_curve'. At that point the bonding curve closes, and the accumulated SOL plus the remaining token supply seed a new PumpSwap AMM pair. That is the graduation event.

After graduation, the token trades on PumpSwap as a standard AMM pair. That history continues in the PumpSwap dex_trades table, not in pumpfun_events. If you're building a full price history across a token's life, you need both tables and a join on mint.

Coverage starts from PumpFun's launch on Solana. Daily updates land by 06:00 UTC the following morning.

02How the Bonding Curve Prices Trades#

Every PumpFun token opens with the same initial virtual reserves: approximately 30 SOL on the SOL side and 1.073 billion tokens on the token side. The opening price works out to roughly 0.000000028 SOL per token. That number is small by design. Early buyers get in cheaply; the curve prices them out as real SOL flows in.

The mechanism is a constant-product curve. As buyers add SOL, virtual_sol_reserves rises and virtual_token_reserves falls. Because the product of the two stays constant, each successive buy is more expensive than the one before it. The price impact of the tenth SOL into a token is larger than the first, and the impact of the eightieth SOL is larger still. By graduation the price is roughly 14 to 15 times the opening price, given standard initial reserve values. That multiple comes directly from the math: price_multiple = ((vsr_0 + sol_added) / vsr_0)^2. At 85 SOL added to a 30-SOL virtual base, that is (115 / 30)^2 = about 14.7x.

The curve's constant k = virtual_sol_reserves * virtual_token_reserves is fixed at launch. From k and the initial reserves you can compute the expected price at any SOL fill level without scanning the trade history:

curve-params.sql

sql

-- Expected price at different SOL fill levels
-- k = vsr * vtr is constant (constant-product invariant)
-- Units: sol_added must match the units of virtual_sol_reserves in the dataset

WITH curve_params AS (
    SELECT
        mint,
        virtual_sol_reserves                                          AS vsr_0,
        virtual_token_reserves                                        AS vtr_0,
        virtual_sol_reserves * virtual_token_reserves                 AS k
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    WHERE mint = '<<your_mint>>'
),
fill_levels AS (
    SELECT UNNEST([0, 5, 10, 20, 30, 50, 70, 85]) AS sol_added
)
SELECT
    f.sol_added,
    (p.vsr_0 + f.sol_added) / (p.k / (p.vsr_0 + f.sol_added))   AS expected_price,
    power((p.vsr_0 + f.sol_added) / p.vsr_0, 2)                  AS price_multiple_vs_open
FROM curve_params p
CROSS JOIN fill_levels f
ORDER BY f.sol_added;

The price_multiple_vs_open column shows how many times the opening price a buyer at that fill level is paying. Run this for a few tokens in your dataset to verify the curve shape before building any strategy that depends on entry price assumptions. The standard initial reserves are consistent across most launches, but is_mayhem_mode tokens use modified parameters and will show a different curve shape entirely.

03Create Event Schema#

The fields we reach for first in any new PumpFun analysis are mint, creator, virtual_sol_reserves, virtual_token_reserves, is_mayhem_mode, and slot. That covers token identity, launch price, segmentation, and timing. The full schema has more, but most queries start with that subset and pull in other fields as needed.

Every row in pumpfun_events_creates represents one token launch. The fields worth understanding before writing any query:

Mint and identity

mint: the SPL mint address. Primary key across all PumpFun tables.
bonding_curve: the PDA holding bonding curve state for this token. Each mint has exactly one.
creator: the wallet that signed the create transaction. Many launches today go through third-party launch services that sign on behalf of the actual project team. A single creator address launching hundreds of tokens per day is almost certainly a proxy, not a person.
name, symbol, uri: on-chain metadata. uri points to a JSON file on IPFS or Arweave with the full metadata including image. Not guaranteed to resolve. Many tokens orphan their metadata within days, so write your pipeline to handle a 404 gracefully rather than treating it as an error.

Reserve state at launch

virtual_sol_reserves: the SOL side of the bonding curve at initialization, including the virtual offset built into the curve. This is the value you use for price calculation.
virtual_token_reserves: the token side at initialization, including virtual offset.
real_token_reserves: the actual token balance held by the bonding curve account. No virtual component. This is what the account physically holds.
token_total_supply: fixed token supply minted at launch.

The virtual/real distinction is the most common source of wrong prices in PumpFun analysis. Price at any point on the bonding curve is virtual_sol_reserves / virtual_token_reserves. Using real_token_reserves in that formula gives a different number. It's not a rounding error, it's a structurally different value, and there's no exception thrown when you use it.

Flags and timing

is_mayhem_mode: boolean, set at launch. Mayhem mode tokens have modified bonding curve parameters. The curve shape, graduation behavior, and effective price range differ from standard launches. Any analysis mixing mayhem mode with standard launches produces a blended distribution that describes neither type accurately.
slot, block_time, timestamp: three time references. block_time is the on-chain Unix timestamp from the block header. timestamp is the program-recorded timestamp. For joining with other datasets, slot and block_time are the reliable columns.
tx_index: position of this transaction within its block. The only reliable way to order multiple events within the same slot.

Additional fields

signature: transaction signature. Useful for cross-referencing with block explorers.
user: wallet that triggered the create. Often the same as creator, but can differ through certain launch interfaces.

04Trade Event Schema#

For price reconstruction, the three fields that matter most are virtual_sol_reserves, virtual_token_reserves, and tx_index. The reserve columns give you the post-trade state after every buy or sell. tx_index is what lets you order events within the same slot correctly. Everything else in this table is supplementary until you get to fee analysis or user-level attribution.

is_buy: boolean. true means SOL in, tokens out. false means tokens in, SOL out.
sol_amount: raw lamports exchanged. Divide by 1e9 for SOL units. The _ui variant (sol_amount_ui) handles this for you if available in the dataset version you're using.
token_amount: raw token base units exchanged. Divide by 10^decimals (typically 6) for human-readable units. Use token_amount_ui to skip the math.
virtual_sol_reserves, virtual_token_reserves: reserve state after this trade. Use these for post-trade price. For pre-trade price, fetch the prior row's reserves using LAG() ordered by slot and tx_index.
fee_basis_points: protocol fee rate. 100 = 1%.
fee: protocol fee for this trade in lamports.
creator_fee_basis_points: creator-set fee rate. Zero if the creator didn't set one.
creator_fee: creator fee for this trade in lamports. Zero when creator_fee_basis_points is zero.
ix_name: instruction name. 'buy', 'sell', or 'migrate' / 'complete_bonding_curve' at graduation.
user: wallet that executed the trade.

05What the Schema Doesn't Tell You#

The thing to get right before your first real query is which reserve values do what. real_token_reserves is the bonding curve account's actual token balance. virtual_sol_reserves and virtual_token_reserves include the virtual offset that keeps the curve well-behaved at low liquidity. Price at any point is virtual_sol_reserves / virtual_token_reserves. Use real reserves in that formula and you get a different number with no error thrown. We've seen this trip up analysts more than any other field in the dataset.

The creator field has a similar surface-area issue. It's the signing wallet, not necessarily the project. When we ran creator-level analysis on a month of creates, the top 20 creator addresses accounted for a disproportionate share of total launches. All of them were services. Run a launch-frequency distribution first before building any creator-level conclusions. Any wallet with thousands of launches is a proxy, not a person.

If you're building a full price history across a token's life, the graduation point is where the decimal handling changes. On the bonding curve, price is virtual_sol_reserves / virtual_token_reserves, normalized by the reserve structure. After graduation, trading moves to PumpSwap dex_trades where sol_amount is in lamports and token_amount in token base units. A raw sol_amount / token_amount ratio there is wrong by 10^(9 - decimals), which is 1000x off for a 6-decimal token. Use _ui columns for post-graduation trades, virtual reserve ratio for everything before.

Two smaller edge cases. block_time is identical for all transactions in a slot, so MIN(block_time) won't resolve intra-slot ordering. Use tx_index instead, which is the transaction's position within the block. And Token-2022 tokens with transfer hooks record the instruction input in token_amount, not the post-hook amount. For most aggregate analysis the difference is negligible. For precise fee accounting on affected tokens it matters.

06Getting the Data#

The pumpfun_events tables are part of the NLN Trading Datasets: pre-parsed Parquet files, partitioned by day, delivered as tar.zst archives via signed URLs. DuckDB reads them directly without an extraction step.

The create and trade tables ship separately: pumpfun_events_creates_YYYY-MM-DD.parquet and pumpfun_events_trades_YYYY-MM-DD.parquet. Join them on mint when you need the full picture from launch through trading history.

setup.sh

bash

pip install duckdb

verify-setup.py

python

import duckdb

con = duckdb.connect()

# Verify your extract landed correctly
row = con.execute("""
    SELECT
        COUNT(*)          AS total_launches,
        MIN(block_time)   AS earliest,
        MAX(block_time)   AS latest
    FROM read_parquet('./pumpfun_data/pumpfun_events_creates_*.parquet')
""").fetchone()
print(row)

All queries in this article run against extracted Parquet files using the glob pattern *.parquet. For pandas users, individual files load with pd.read_parquet(). The glob patterns only apply in DuckDB.

07DuckDB Analysis Patterns#

Five queries for the most common PumpFun analysis needs.

Launch volume by day

launch-volume.sql

sql

-- Count daily launches, separated by is_mayhem_mode
SELECT
    DATE(block_time)   AS launch_date,
    is_mayhem_mode,
    COUNT(*)           AS launches,
    COUNT(DISTINCT creator) AS unique_creators
FROM read_parquet('pumpfun_events_creates_*.parquet')
GROUP BY DATE(block_time), is_mayhem_mode
ORDER BY launch_date DESC, is_mayhem_mode;

Run this first on any new dataset range. The is_mayhem_mode split tells you what fraction of launches to exclude from standard analysis before writing any further queries.

Price path reconstruction for a single token

price-path.sql

sql

-- Reconstruct price path for a given mint using virtual reserves
-- Price = virtual_sol_reserves / virtual_token_reserves after each trade

WITH trades AS (
    SELECT
        slot,
        tx_index,
        block_time,
        is_buy,
        sol_amount,
        token_amount,
        virtual_sol_reserves,
        virtual_token_reserves,
        virtual_sol_reserves / virtual_token_reserves AS price_post_trade,
        LAG(virtual_sol_reserves / virtual_token_reserves)
            OVER (PARTITION BY mint ORDER BY slot, tx_index) AS price_pre_trade,
        ix_name
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE mint = '<<your_mint>>'
    ORDER BY slot, tx_index
)
SELECT
    slot,
    block_time,
    is_buy,
    ROUND(sol_amount / 1e9, 6)          AS sol_ui,
    price_pre_trade,
    price_post_trade,
    (price_post_trade - price_pre_trade)
        / price_pre_trade               AS price_impact_pct,
    ix_name
FROM trades;

The LAG() window function gives you the pre-trade price, which is what a buyer saw before submitting. The delta between price_pre_trade and price_post_trade is the price impact of that trade on the curve.

Graduation rate analysis

graduation-rate.sql

sql

-- Compute graduation rate and time-to-graduation, segmented by is_mayhem_mode
-- Uses LEFT JOIN so ungraduated tokens are included in the denominator

WITH creates AS (
    SELECT mint, slot AS create_slot, block_time AS create_time, is_mayhem_mode, creator
    FROM read_parquet('pumpfun_events_creates_*.parquet')
),
graduations AS (
    SELECT mint, slot AS grad_slot, block_time AS grad_time
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE ix_name IN ('migrate', 'complete_bonding_curve')
),
joined AS (
    SELECT
        c.mint,
        c.is_mayhem_mode,
        c.create_time,
        g.grad_time,
        g.grad_slot - c.create_slot              AS slots_to_graduation,
        (g.grad_time - c.create_time) / 60.0     AS minutes_to_graduation,
        c.creator,
        CASE WHEN g.mint IS NOT NULL THEN 1 ELSE 0 END AS graduated
    FROM creates c
    LEFT JOIN graduations g ON g.mint = c.mint
)
SELECT
    is_mayhem_mode,
    COUNT(*)                                                         AS total_launches,
    SUM(graduated)                                                   AS graduated_count,
    ROUND(SUM(graduated) * 100.0 / COUNT(*), 2)                     AS graduation_rate_pct,
    ROUND(AVG(CASE WHEN graduated = 1 THEN minutes_to_graduation END), 1) AS avg_minutes_to_grad
FROM joined
GROUP BY is_mayhem_mode
ORDER BY is_mayhem_mode;

Run this across a month and compare graduation_rate_pct between mayhem mode and standard launches. The avg_minutes_to_grad column tells you whether the tokens that did graduate did so quickly (sniped to graduation within minutes) or over a longer period (organic accumulation over hours). Those two paths look very different in the early buyer data.

Creator wallet distribution

creator-wallets.sql

sql

-- Identify launch services vs individual creators by launch frequency
WITH creator_counts AS (
    SELECT
        creator,
        COUNT(*)                                          AS total_launches,
        SUM(CASE WHEN is_mayhem_mode THEN 1 ELSE 0 END)  AS mayhem_launches,
        MIN(block_time)                                   AS first_launch,
        MAX(block_time)                                   AS last_launch
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    GROUP BY creator
)
SELECT
    creator,
    total_launches,
    mayhem_launches,
    ROUND(mayhem_launches * 100.0 / total_launches, 1) AS mayhem_pct,
    first_launch,
    last_launch,
    CASE
        WHEN total_launches > 500 THEN 'likely_launch_service'
        WHEN total_launches > 50  THEN 'power_user'
        ELSE 'individual'
    END AS creator_type
FROM creator_counts
ORDER BY total_launches DESC
LIMIT 100;

The creator_type thresholds here are illustrative. Adjust them after looking at the actual distribution in your dataset. The key insight is almost always the same: a small number of creator addresses account for a large fraction of total launches, and those addresses are services, not people.

Early buyer entry analysis

early-buyers.sql

sql

-- For each token, find the first N buyers and their entry prices
-- Useful for analyzing early-buyer advantage across a token cohort

WITH ranked_buys AS (
    SELECT
        mint,
        user,
        slot,
        tx_index,
        block_time,
        sol_amount / 1e9                              AS sol_spent_ui,
        virtual_sol_reserves / virtual_token_reserves AS entry_price,
        ROW_NUMBER() OVER (
            PARTITION BY mint
            ORDER BY slot, tx_index
        ) AS buy_rank
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE is_buy = true
),
peak_prices AS (
    SELECT
        mint,
        MAX(virtual_sol_reserves / virtual_token_reserves) AS peak_price
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    GROUP BY mint
)
SELECT
    b.mint,
    b.buy_rank,
    b.user,
    b.sol_spent_ui,
    b.entry_price,
    p.peak_price,
    (p.peak_price - b.entry_price) / b.entry_price AS unrealized_peak_return
FROM ranked_buys b
JOIN peak_prices p ON p.mint = b.mint
WHERE b.buy_rank <= 5
ORDER BY b.mint, b.buy_rank;

unrealized_peak_return here is the theoretical return if you sold at peak price. In practice early buyers rarely sell at peak, but the distribution of unrealized_peak_return across buy ranks 1 through 5 across a large token cohort tells you whether first-mover advantage was real in your target period. Filter to a specific date range before running this at scale: the peak_prices CTE scans the entire trades Parquet for every mint in the dataset.

08Why Individual Token Analysis Fails at Scale#

PumpFun data at scale inverts the usual framing. Individual token analysis is rarely the productive question. The data is too noisy at the individual level: thousands of tokens launch per day, most follow the same pattern, and the few that don't aren't predictable from the create event alone.

The productive framing is cohort analysis: group launches by some characteristic (mayhem mode vs not, launch time of day, creator type, market activity level) and compare outcomes across cohorts. The questions that have real signal are aggregate ones.

When we've done cohort analysis on graduation rates, the clearest signal appears to be launch conditions and timing rather than anything in the token's metadata. Token name and symbol are noise at the aggregate level. Market-wide activity at the time of launch seems to matter more than anything in the create event itself. The following query tests this directly: it groups launches into quintiles by how busy the launch day was, then compares graduation rates across those quintiles.

graduation-by-activity.sql

sql

-- Does graduation rate change on busier vs quieter launch days?
-- Split standard (non-mayhem) launches into 5 quintiles by daily launch count

WITH daily_activity AS (
    SELECT
        DATE(block_time)    AS launch_date,
        COUNT(*)            AS daily_launches
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    WHERE is_mayhem_mode = false
    GROUP BY DATE(block_time)
),
activity_quintiles AS (
    SELECT
        launch_date,
        daily_launches,
        NTILE(5) OVER (ORDER BY daily_launches) AS activity_quintile
    FROM daily_activity
),
creates AS (
    SELECT
        c.mint,
        q.activity_quintile,
        q.daily_launches
    FROM read_parquet('pumpfun_events_creates_*.parquet') c
    JOIN activity_quintiles q ON DATE(c.block_time) = q.launch_date
    WHERE c.is_mayhem_mode = false
),
graduations AS (
    SELECT mint
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE ix_name IN ('migrate', 'complete_bonding_curve')
)
SELECT
    c.activity_quintile,
    MIN(c.daily_launches)                               AS min_daily_launches,
    MAX(c.daily_launches)                               AS max_daily_launches,
    COUNT(*)                                            AS total_launches,
    COUNT(g.mint)                                       AS graduated,
    ROUND(COUNT(g.mint) * 100.0 / COUNT(*), 2)         AS graduation_rate_pct
FROM creates c
LEFT JOIN graduations g ON g.mint = c.mint
GROUP BY c.activity_quintile
ORDER BY c.activity_quintile;

Quintile 1 is the quietest days. Quintile 5 is the busiest. The direction of the relationship shifts with market conditions. Run this across your target period before drawing any conclusions about when to deploy capital. The signal shifts, and what held in one quarter may not hold in the next.

When you do find a cohort with better-than-average graduation rates or sustained price action, the next step is usually digging into the raw block data to see the failed competing transactions: who else was trying to get into the same launches, and what did the competition look like. That's where the NLN Historical Raw Blocks archive comes in. The parsed pumpfun_events table tells you what succeeded. The raw blocks show you everything that tried.

If you're building on PumpFun data at scale, the NLN Trading Datasets include pumpfun_events creates and trades as DuckDB-ready Parquet, partitioned by day, covering from PumpFun's launch to yesterday. For research that requires the full context including failed transactions and competing snipe attempts, the raw block archive at NLN Historical Raw Blocks has the complete ledger.

Get access · Talk to an engineer

///Read next

EngineeringJun 24, 2026

Backtesting Solana Trading Strategies with Historical Raw Blocks (2026)

How to backtest Solana DEX arb, token sniping, liquidation, and LP strategies using historical raw blocks and parsed trading datasets. Python, DuckDB, and Rust.

#solana#backtesting#historical-data

18 min read

EngineeringJun 18, 2026

Yellowstone gRPC Providers Compared (2026): Latency, Decoded Streams & What Nobody Tells You Before You Go Live

Triton, Helius, Alchemy, Chainstack, QuickNode, and NoLimitNodes: latency tables, buffer depth, decoded vs raw streams, and the three things that silently break gRPC consumers at 3am.

#yellowstone#grpc#streaming

12 min read

Run it yourself

Every benchmark in this blog runs against our public endpoints.

Spin up an RPC, WebSocket, or gRPC endpoint in under a minute. Flat pricing, no request caps. Reproduce the numbers for your own workload.

See pricing

PumpFun Data Analysis: Graduation Rates, Creator Wallets, and Bonding Curve Price Reconstruction (2026)

Analyze PumpFun launch data: graduation rates, creator wallet clustering, bonding curve price reconstruction, is_mayhem_mode, DuckDB queries.

NoLimitNodes Engineering

Infrastructure Team

Jun 25, 202614 min read

On this page +

01What the Data Covers#

Two event types make up the pumpfun_events dataset: creates and trades.

Coverage starts from PumpFun's launch on Solana. Daily updates land by 06:00 UTC the following morning.

02How the Bonding Curve Prices Trades#

curve-params.sql

sql

-- Expected price at different SOL fill levels
-- k = vsr * vtr is constant (constant-product invariant)
-- Units: sol_added must match the units of virtual_sol_reserves in the dataset

WITH curve_params AS (
    SELECT
        mint,
        virtual_sol_reserves                                          AS vsr_0,
        virtual_token_reserves                                        AS vtr_0,
        virtual_sol_reserves * virtual_token_reserves                 AS k
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    WHERE mint = '<<your_mint>>'
),
fill_levels AS (
    SELECT UNNEST([0, 5, 10, 20, 30, 50, 70, 85]) AS sol_added
)
SELECT
    f.sol_added,
    (p.vsr_0 + f.sol_added) / (p.k / (p.vsr_0 + f.sol_added))   AS expected_price,
    power((p.vsr_0 + f.sol_added) / p.vsr_0, 2)                  AS price_multiple_vs_open
FROM curve_params p
CROSS JOIN fill_levels f
ORDER BY f.sol_added;

03Create Event Schema#

Every row in pumpfun_events_creates represents one token launch. The fields worth understanding before writing any query:

Mint and identity

mint: the SPL mint address. Primary key across all PumpFun tables.
bonding_curve: the PDA holding bonding curve state for this token. Each mint has exactly one.
creator: the wallet that signed the create transaction. Many launches today go through third-party launch services that sign on behalf of the actual project team. A single creator address launching hundreds of tokens per day is almost certainly a proxy, not a person.
name, symbol, uri: on-chain metadata. uri points to a JSON file on IPFS or Arweave with the full metadata including image. Not guaranteed to resolve. Many tokens orphan their metadata within days, so write your pipeline to handle a 404 gracefully rather than treating it as an error.

Reserve state at launch

virtual_sol_reserves: the SOL side of the bonding curve at initialization, including the virtual offset built into the curve. This is the value you use for price calculation.
virtual_token_reserves: the token side at initialization, including virtual offset.
real_token_reserves: the actual token balance held by the bonding curve account. No virtual component. This is what the account physically holds.
token_total_supply: fixed token supply minted at launch.

Flags and timing

is_mayhem_mode: boolean, set at launch. Mayhem mode tokens have modified bonding curve parameters. The curve shape, graduation behavior, and effective price range differ from standard launches. Any analysis mixing mayhem mode with standard launches produces a blended distribution that describes neither type accurately.
slot, block_time, timestamp: three time references. block_time is the on-chain Unix timestamp from the block header. timestamp is the program-recorded timestamp. For joining with other datasets, slot and block_time are the reliable columns.
tx_index: position of this transaction within its block. The only reliable way to order multiple events within the same slot.

Additional fields

signature: transaction signature. Useful for cross-referencing with block explorers.
user: wallet that triggered the create. Often the same as creator, but can differ through certain launch interfaces.

04Trade Event Schema#

is_buy: boolean. true means SOL in, tokens out. false means tokens in, SOL out.
sol_amount: raw lamports exchanged. Divide by 1e9 for SOL units. The _ui variant (sol_amount_ui) handles this for you if available in the dataset version you're using.
token_amount: raw token base units exchanged. Divide by 10^decimals (typically 6) for human-readable units. Use token_amount_ui to skip the math.
virtual_sol_reserves, virtual_token_reserves: reserve state after this trade. Use these for post-trade price. For pre-trade price, fetch the prior row's reserves using LAG() ordered by slot and tx_index.
fee_basis_points: protocol fee rate. 100 = 1%.
fee: protocol fee for this trade in lamports.
creator_fee_basis_points: creator-set fee rate. Zero if the creator didn't set one.
creator_fee: creator fee for this trade in lamports. Zero when creator_fee_basis_points is zero.
ix_name: instruction name. 'buy', 'sell', or 'migrate' / 'complete_bonding_curve' at graduation.
user: wallet that executed the trade.

05What the Schema Doesn't Tell You#

06Getting the Data#

setup.sh

bash

pip install duckdb

verify-setup.py

python

import duckdb

con = duckdb.connect()

# Verify your extract landed correctly
row = con.execute("""
    SELECT
        COUNT(*)          AS total_launches,
        MIN(block_time)   AS earliest,
        MAX(block_time)   AS latest
    FROM read_parquet('./pumpfun_data/pumpfun_events_creates_*.parquet')
""").fetchone()
print(row)

07DuckDB Analysis Patterns#

Five queries for the most common PumpFun analysis needs.

Launch volume by day

launch-volume.sql

sql

-- Count daily launches, separated by is_mayhem_mode
SELECT
    DATE(block_time)   AS launch_date,
    is_mayhem_mode,
    COUNT(*)           AS launches,
    COUNT(DISTINCT creator) AS unique_creators
FROM read_parquet('pumpfun_events_creates_*.parquet')
GROUP BY DATE(block_time), is_mayhem_mode
ORDER BY launch_date DESC, is_mayhem_mode;

Run this first on any new dataset range. The is_mayhem_mode split tells you what fraction of launches to exclude from standard analysis before writing any further queries.

Price path reconstruction for a single token

price-path.sql

sql

-- Reconstruct price path for a given mint using virtual reserves
-- Price = virtual_sol_reserves / virtual_token_reserves after each trade

WITH trades AS (
    SELECT
        slot,
        tx_index,
        block_time,
        is_buy,
        sol_amount,
        token_amount,
        virtual_sol_reserves,
        virtual_token_reserves,
        virtual_sol_reserves / virtual_token_reserves AS price_post_trade,
        LAG(virtual_sol_reserves / virtual_token_reserves)
            OVER (PARTITION BY mint ORDER BY slot, tx_index) AS price_pre_trade,
        ix_name
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE mint = '<<your_mint>>'
    ORDER BY slot, tx_index
)
SELECT
    slot,
    block_time,
    is_buy,
    ROUND(sol_amount / 1e9, 6)          AS sol_ui,
    price_pre_trade,
    price_post_trade,
    (price_post_trade - price_pre_trade)
        / price_pre_trade               AS price_impact_pct,
    ix_name
FROM trades;

Graduation rate analysis

graduation-rate.sql

sql

-- Compute graduation rate and time-to-graduation, segmented by is_mayhem_mode
-- Uses LEFT JOIN so ungraduated tokens are included in the denominator

WITH creates AS (
    SELECT mint, slot AS create_slot, block_time AS create_time, is_mayhem_mode, creator
    FROM read_parquet('pumpfun_events_creates_*.parquet')
),
graduations AS (
    SELECT mint, slot AS grad_slot, block_time AS grad_time
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE ix_name IN ('migrate', 'complete_bonding_curve')
),
joined AS (
    SELECT
        c.mint,
        c.is_mayhem_mode,
        c.create_time,
        g.grad_time,
        g.grad_slot - c.create_slot              AS slots_to_graduation,
        (g.grad_time - c.create_time) / 60.0     AS minutes_to_graduation,
        c.creator,
        CASE WHEN g.mint IS NOT NULL THEN 1 ELSE 0 END AS graduated
    FROM creates c
    LEFT JOIN graduations g ON g.mint = c.mint
)
SELECT
    is_mayhem_mode,
    COUNT(*)                                                         AS total_launches,
    SUM(graduated)                                                   AS graduated_count,
    ROUND(SUM(graduated) * 100.0 / COUNT(*), 2)                     AS graduation_rate_pct,
    ROUND(AVG(CASE WHEN graduated = 1 THEN minutes_to_graduation END), 1) AS avg_minutes_to_grad
FROM joined
GROUP BY is_mayhem_mode
ORDER BY is_mayhem_mode;

Creator wallet distribution

creator-wallets.sql

sql

-- Identify launch services vs individual creators by launch frequency
WITH creator_counts AS (
    SELECT
        creator,
        COUNT(*)                                          AS total_launches,
        SUM(CASE WHEN is_mayhem_mode THEN 1 ELSE 0 END)  AS mayhem_launches,
        MIN(block_time)                                   AS first_launch,
        MAX(block_time)                                   AS last_launch
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    GROUP BY creator
)
SELECT
    creator,
    total_launches,
    mayhem_launches,
    ROUND(mayhem_launches * 100.0 / total_launches, 1) AS mayhem_pct,
    first_launch,
    last_launch,
    CASE
        WHEN total_launches > 500 THEN 'likely_launch_service'
        WHEN total_launches > 50  THEN 'power_user'
        ELSE 'individual'
    END AS creator_type
FROM creator_counts
ORDER BY total_launches DESC
LIMIT 100;

Early buyer entry analysis

early-buyers.sql

sql

-- For each token, find the first N buyers and their entry prices
-- Useful for analyzing early-buyer advantage across a token cohort

WITH ranked_buys AS (
    SELECT
        mint,
        user,
        slot,
        tx_index,
        block_time,
        sol_amount / 1e9                              AS sol_spent_ui,
        virtual_sol_reserves / virtual_token_reserves AS entry_price,
        ROW_NUMBER() OVER (
            PARTITION BY mint
            ORDER BY slot, tx_index
        ) AS buy_rank
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE is_buy = true
),
peak_prices AS (
    SELECT
        mint,
        MAX(virtual_sol_reserves / virtual_token_reserves) AS peak_price
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    GROUP BY mint
)
SELECT
    b.mint,
    b.buy_rank,
    b.user,
    b.sol_spent_ui,
    b.entry_price,
    p.peak_price,
    (p.peak_price - b.entry_price) / b.entry_price AS unrealized_peak_return
FROM ranked_buys b
JOIN peak_prices p ON p.mint = b.mint
WHERE b.buy_rank <= 5
ORDER BY b.mint, b.buy_rank;

08Why Individual Token Analysis Fails at Scale#

graduation-by-activity.sql

sql

-- Does graduation rate change on busier vs quieter launch days?
-- Split standard (non-mayhem) launches into 5 quintiles by daily launch count

WITH daily_activity AS (
    SELECT
        DATE(block_time)    AS launch_date,
        COUNT(*)            AS daily_launches
    FROM read_parquet('pumpfun_events_creates_*.parquet')
    WHERE is_mayhem_mode = false
    GROUP BY DATE(block_time)
),
activity_quintiles AS (
    SELECT
        launch_date,
        daily_launches,
        NTILE(5) OVER (ORDER BY daily_launches) AS activity_quintile
    FROM daily_activity
),
creates AS (
    SELECT
        c.mint,
        q.activity_quintile,
        q.daily_launches
    FROM read_parquet('pumpfun_events_creates_*.parquet') c
    JOIN activity_quintiles q ON DATE(c.block_time) = q.launch_date
    WHERE c.is_mayhem_mode = false
),
graduations AS (
    SELECT mint
    FROM read_parquet('pumpfun_events_trades_*.parquet')
    WHERE ix_name IN ('migrate', 'complete_bonding_curve')
)
SELECT
    c.activity_quintile,
    MIN(c.daily_launches)                               AS min_daily_launches,
    MAX(c.daily_launches)                               AS max_daily_launches,
    COUNT(*)                                            AS total_launches,
    COUNT(g.mint)                                       AS graduated,
    ROUND(COUNT(g.mint) * 100.0 / COUNT(*), 2)         AS graduation_rate_pct
FROM creates c
LEFT JOIN graduations g ON g.mint = c.mint
GROUP BY c.activity_quintile
ORDER BY c.activity_quintile;

Get access · Talk to an engineer

///Read next

EngineeringJun 24, 2026

Backtesting Solana Trading Strategies with Historical Raw Blocks (2026)

How to backtest Solana DEX arb, token sniping, liquidation, and LP strategies using historical raw blocks and parsed trading datasets. Python, DuckDB, and Rust.

#solana#backtesting#historical-data

18 min read

EngineeringJun 18, 2026

Yellowstone gRPC Providers Compared (2026): Latency, Decoded Streams & What Nobody Tells You Before You Go Live

Triton, Helius, Alchemy, Chainstack, QuickNode, and NoLimitNodes: latency tables, buffer depth, decoded vs raw streams, and the three things that silently break gRPC consumers at 3am.

#yellowstone#grpc#streaming

12 min read

Run it yourself

Every benchmark in this blog runs against our public endpoints.

Spin up an RPC, WebSocket, or gRPC endpoint in under a minute. Flat pricing, no request caps. Reproduce the numbers for your own workload.

See pricing

01What the Data Covers#

02How the Bonding Curve Prices Trades#

03Create Event Schema#

Mint and identity

Reserve state at launch

Flags and timing

Additional fields

04Trade Event Schema#

05What the Schema Doesn't Tell You#

06Getting the Data#

07DuckDB Analysis Patterns#

Launch volume by day

Price path reconstruction for a single token

Graduation rate analysis

Creator wallet distribution

Early buyer entry analysis

08Why Individual Token Analysis Fails at Scale#

Backtesting Solana Trading Strategies with Historical Raw Blocks (2026)

Yellowstone gRPC Providers Compared (2026): Latency, Decoded Streams & What Nobody Tells You Before You Go Live

Every benchmark in this blog runs against our public endpoints.

Ready to get started?

01What the Data Covers#

02How the Bonding Curve Prices Trades#

03Create Event Schema#

Mint and identity

Reserve state at launch

Flags and timing

Additional fields

04Trade Event Schema#

05What the Schema Doesn't Tell You#

06Getting the Data#

07DuckDB Analysis Patterns#

Launch volume by day

Price path reconstruction for a single token

Graduation rate analysis

Creator wallet distribution

Early buyer entry analysis

08Why Individual Token Analysis Fails at Scale#

Backtesting Solana Trading Strategies with Historical Raw Blocks (2026)

Yellowstone gRPC Providers Compared (2026): Latency, Decoded Streams & What Nobody Tells You Before You Go Live

Every benchmark in this blog runs against our public endpoints.

Ready to get started?