
GraphQL and General-Purpose Indexer (Beta)

⚙️Early-Stage Feature

This content describes an alpha/beta feature or service. These early stage features and services are in active development, so details are likely to change.

This feature or service is currently available in:

  • Devnet
  • Testnet
  • Mainnet

The GraphQL and sui-indexer-alt (Indexer) stack is part of the Sui data access infrastructure. The stack provides access to on-chain data through a high-performance GraphQL service backed by a scalable, general-purpose indexer. It is optimized for developers who need flexible queries, structured output, and historical access to data (with configurable retention) across the Sui network.

GraphQL is ideal for applications that require rich query patterns over structured data, such as fetching owned objects, transaction history, specific on-chain attributes, and more. The GraphQL service runs on top of a Postgres-compatible database that different Indexer pipelines update in parallel.

The general-purpose indexer includes configurable checkpoint-based pipelines that extract data from the Sui remote checkpoint store and full nodes. The pipelines write processed data to a database optimized for GraphQL query access. Together, the GraphQL service and Indexer offer a modular and production-ready data stack for builders, wallet developers, explorers, and indexer/data providers.

info

Refer to Access Sui Data for an overview of options to access Sui network data.

Key components

The key components of the stack include the following:

  • General-purpose Indexer: Ingests and transforms Sui checkpoint data using configurable and parallel pipelines, and writes it into a Postgres-compatible database. Can be configured to use the Sui remote checkpoint store and a full node as its sources.
  • Postgres-compatible database: Stores indexed data for GraphQL queries. Tested using GCP AlloyDB, but you can run any Postgres-compatible database. You're encouraged to test alternative databases and share feedback on performance, cost, and operational characteristics.
  • GraphQL service: Serves structured queries over indexed data. It follows the GraphQL specification, and the supported schema is documented in the GraphQL API reference. See also the getting started guide.
  • Archival Service: Enables point lookups for historical data from a key-value store. If unavailable, the GraphQL service falls back to the Postgres-compatible database for lookups, which might be limited by that database's retention policy. See Archival Store and Service for more information.
  • Consistent Store: Answers questions about the latest state of the network within the last hour (objects owned by addresses, objects by type, balances by address and type). Consistency is guaranteed by pinning queries to a specific (recent) checkpoint.
  • Full node: Enables transaction execution and simulation. The stack currently uses JSON-RPC for this and will switch to gRPC, the long-term full node API.

When to use GraphQL

Use GraphQL if your application:

  • Requires historical (with configurable retention) or filtered access to data (e.g., all transactions sent by an address)
  • Needs to display structured results in a frontend (e.g., wallets, dashboards)
  • Benefits from flexible, composable queries that reduce overfetching
  • Needs multiple data entities (e.g., transactions + objects + events) in a single request, or consistent results across multiple requests (as if all responses came from a snapshot at the same checkpoint).

Deployment options

You can run or use the GraphQL and Indexer data stack in the following configurations:

Fully managed service

As a developer, you can access GraphQL as a service from an indexer operator or data provider who runs and operates the full stack behind the scenes. Reach out to your data provider and ask if they already offer or plan to offer this service.

Partial self-managed

As a developer, you can:

  • Run the Indexer pipelines and GraphQL service, while using the Archival Service and a full node from an RPC provider or indexer operator.
  • Configure and manage a Postgres-compatible database (local Postgres, AlloyDB, and so on) as the primary data store.
  • Deploy the self-managed components on cloud infrastructure or bare metal.

info

You cannot run and operate the GraphQL service in this partial self-managed configuration just yet. This becomes possible once the required changes are made to how the GraphQL service integrates with the Archival Service.

Fully self-managed

As a developer, indexer operator, or RPC provider, you can:

  • Run the complete stack: Indexer pipelines, GraphQL service, Postgres-compatible database, Archival Service, Consistent Store and full node on cloud infrastructure or bare metal.
  • Serve GraphQL to your own applications or to other builders and third-party services.

Working with the GraphQL service

The GraphQL service exposes a query surface conforming to GraphQL concepts. It allows pagination, filtering, and consistent snapshot queries. The service also supports runtime configuration for schema, query cost limits, and logging.

The GraphQL schema is defined in the GraphQL reference. You can explore supported types and fields there, use the GraphiQL IDE to test queries, and read documentation on the up-to-date schema.

The GraphQL service is deployed as a single binary implementing a stateless, horizontally scalable service. Depending on the query, it serves data from one or more of the following: a Postgres-compatible database (filters over historical data), the Archival Service (point lookups), the Consistent Store (live data), or a full node (execution and simulation). Access to these stores must be configured when the service starts; otherwise, the service might fail to respond correctly to requests. More details on how to set up, configure, and run the service are available in its README.
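To make the "based on need" routing concrete, here is a minimal sketch of how a request could be mapped to a backend, including the fallback from the Archival Service to the Postgres-compatible database described under Key components. The `Need`, `Backend`, and `Configured` types and the `route` function are hypothetical illustrations, not the GraphQL service's actual internals.

```rust
/// Hypothetical classification of what a query needs (not the service's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Need {
    /// Filtered scans over history, e.g. transactions sent by an address.
    HistoricalFilter,
    /// Point lookup by key, e.g. an object at a specific version.
    PointLookup,
    /// Latest state, e.g. objects currently owned by an address.
    LiveState,
    /// Transaction execution or simulation.
    Execution,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Postgres,
    Archival,
    ConsistentStore,
    FullNode,
}

/// Which stores this (hypothetical) deployment was configured with at start-up.
pub struct Configured {
    pub postgres: bool,
    pub archival: bool,
    pub consistent_store: bool,
    pub full_node: bool,
}

/// Pick a backend for a need, or report that no configured store can serve it.
pub fn route(need: Need, cfg: &Configured) -> Result<Backend, String> {
    let choice = match need {
        Need::HistoricalFilter if cfg.postgres => Some(Backend::Postgres),
        // Prefer the Archival Service for point lookups; fall back to Postgres,
        // whose answers are limited by that database's retention policy.
        Need::PointLookup if cfg.archival => Some(Backend::Archival),
        Need::PointLookup if cfg.postgres => Some(Backend::Postgres),
        Need::LiveState if cfg.consistent_store => Some(Backend::ConsistentStore),
        Need::Execution if cfg.full_node => Some(Backend::FullNode),
        _ => None,
    };
    choice.ok_or_else(|| format!("no configured backend can serve {:?}", need))
}
```

The error branch mirrors the warning above: if a store is not configured at start-up, requests that need it cannot be answered correctly.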

Requests to GraphQL are subject to various limits to ensure that resources are shared fairly between clients. Each limit is configurable, and the values configured for an instance can be queried through Query.serviceConfig. Requests that exceed a limit return an error. The following limits are in effect:

  • Request size: Requests may not exceed a certain size in bytes. The limit is split between a transaction payload limit, which applies to all values and variable bindings that are parameters to transaction signing, execution, and simulation fields (default: 175KB), and a query payload limit, which applies to all other parts of the query (default: 5KB).
  • Request timeout: Time spent on each request is bounded, with different bounds for execution (default: 74s) and regular reads (default: 40s).
  • Query input nodes and depth: The query cannot be too complex, meaning it cannot contain too many input nodes or field names (default: 300) or be too deeply nested (default: 20).
  • Output nodes: The service estimates the maximum number of output nodes the query might produce, assuming every requested field is present, every paginated field returns full pages, and every multi-get finds all requested keys. This estimate must be bounded (default: 1,000,000).
  • Page and multi-get size: Each paginated field (default: 50) and multi-get (default: 200) is subject to a maximum size. Certain paginated fields might override this to provide a higher or lower maximum.
  • (TBD) Rich queries: A request can contain only a bounded number (default: 5) of queries that require dedicated access to the database (cannot be grouped with other requests).
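The input-node, depth, and output-node limits can be illustrated with a small model of a query tree. This is a minimal sketch, assuming a simplified tree where each field carries a multiplier (1 for a plain field, the page size for a paginated field, the key count for a multi-get); the `Field` type, `check` function, and exact counting rules are hypothetical and not the service's real algorithm. Only the default limit values come from the list above.

```rust
/// A hypothetical, simplified query tree (not the service's real representation).
pub struct Field {
    pub name: &'static str,
    /// 1 for a plain field, the page size for a paginated field,
    /// or the number of keys for a multi-get.
    pub multiplier: u64,
    pub children: Vec<Field>,
}

pub fn field(name: &'static str, multiplier: u64, children: Vec<Field>) -> Field {
    Field { name, multiplier, children }
}

pub struct Limits {
    pub max_input_nodes: usize,
    pub max_depth: usize,
    pub max_output_nodes: u64,
}

/// Defaults taken from the limits listed above.
pub const DEFAULT_LIMITS: Limits =
    Limits { max_input_nodes: 300, max_depth: 20, max_output_nodes: 1_000_000 };

/// Number of fields in the query (a stand-in for "input nodes").
pub fn input_nodes(fields: &[Field]) -> usize {
    fields.iter().map(|f| 1 + input_nodes(&f.children)).sum()
}

/// Deepest nesting level in the query.
pub fn depth(fields: &[Field]) -> usize {
    fields.iter().map(|f| 1 + depth(&f.children)).max().unwrap_or(0)
}

/// Worst-case output size: every field is present, every page is full, and
/// every multi-get finds all keys. A field appears once per instance of its
/// parent; its children appear `multiplier` times per instance of the field.
pub fn estimate_output_nodes(fields: &[Field], instances: u64) -> u64 {
    fields.iter().fold(0u64, |total, f| {
        let below = estimate_output_nodes(&f.children, instances.saturating_mul(f.multiplier));
        total.saturating_add(instances).saturating_add(below)
    })
}

/// Reject the query if any limit is exceeded; otherwise return the estimate.
pub fn check(query: &[Field], limits: &Limits) -> Result<u64, String> {
    let n = input_nodes(query);
    if n > limits.max_input_nodes {
        return Err(format!("too many input nodes: {} > {}", n, limits.max_input_nodes));
    }
    let d = depth(query);
    if d > limits.max_depth {
        return Err(format!("query too deep: {} > {}", d, limits.max_depth));
    }
    let out = estimate_output_nodes(query, 1);
    if out > limits.max_output_nodes {
        return Err(format!("too many output nodes: {} > {}", out, limits.max_output_nodes));
    }
    Ok(out)
}
```

Under this model, the estimate grows multiplicatively with nesting: three nested multi-gets of 200 keys each already imply about eight million output nodes, far beyond the default bound, even though the query itself has only four fields.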

Working with the General-purpose Indexer

The general-purpose indexer fetches checkpoint data from a remote object store, local files, or a full node RPC, and indexes it into multiple database tables through a set of specialized pipelines. Each pipeline is responsible for extracting specific data and writing it to its target tables.


Full list of tables and their schemas

// @generated automatically by Diesel CLI.

diesel::table! {
    coin_balance_buckets (object_id, cp_sequence_number) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
        owner_kind -> Nullable<Int2>,
        owner_id -> Nullable<Bytea>,
        coin_type -> Nullable<Bytea>,
        coin_balance_bucket -> Nullable<Int2>,
    }
}

diesel::table! {
    coin_balance_buckets_deletion_reference (cp_sequence_number, object_id) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    cp_sequence_numbers (cp_sequence_number) {
        cp_sequence_number -> Int8,
        tx_lo -> Int8,
        epoch -> Int8,
    }
}

diesel::table! {
    ev_emit_mod (package, module, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    ev_struct_inst (package, module, name, instantiation, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        name -> Text,
        instantiation -> Bytea,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    kv_checkpoints (sequence_number) {
        sequence_number -> Int8,
        checkpoint_contents -> Bytea,
        checkpoint_summary -> Bytea,
        validator_signatures -> Bytea,
    }
}

diesel::table! {
    kv_epoch_ends (epoch) {
        epoch -> Int8,
        cp_hi -> Int8,
        tx_hi -> Int8,
        end_timestamp_ms -> Int8,
        safe_mode -> Bool,
        total_stake -> Nullable<Int8>,
        storage_fund_balance -> Nullable<Int8>,
        storage_fund_reinvestment -> Nullable<Int8>,
        storage_charge -> Nullable<Int8>,
        storage_rebate -> Nullable<Int8>,
        stake_subsidy_amount -> Nullable<Int8>,
        total_gas_fees -> Nullable<Int8>,
        total_stake_rewards_distributed -> Nullable<Int8>,
        leftover_storage_fund_inflow -> Nullable<Int8>,
        epoch_commitments -> Bytea,
    }
}

diesel::table! {
    kv_epoch_starts (epoch) {
        epoch -> Int8,
        protocol_version -> Int8,
        cp_lo -> Int8,
        start_timestamp_ms -> Int8,
        reference_gas_price -> Int8,
        system_state -> Bytea,
    }
}

diesel::table! {
    kv_feature_flags (protocol_version, flag_name) {
        protocol_version -> Int8,
        flag_name -> Text,
        flag_value -> Bool,
    }
}

diesel::table! {
    kv_genesis (genesis_digest) {
        genesis_digest -> Bytea,
        initial_protocol_version -> Int8,
    }
}

diesel::table! {
    kv_objects (object_id, object_version) {
        object_id -> Bytea,
        object_version -> Int8,
        serialized_object -> Nullable<Bytea>,
    }
}

diesel::table! {
    kv_packages (package_id, package_version) {
        package_id -> Bytea,
        package_version -> Int8,
        original_id -> Bytea,
        is_system_package -> Bool,
        serialized_object -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    kv_protocol_configs (protocol_version, config_name) {
        protocol_version -> Int8,
        config_name -> Text,
        config_value -> Nullable<Text>,
    }
}

diesel::table! {
    kv_transactions (tx_digest) {
        tx_digest -> Bytea,
        cp_sequence_number -> Int8,
        timestamp_ms -> Int8,
        raw_transaction -> Bytea,
        raw_effects -> Bytea,
        events -> Bytea,
        user_signatures -> Bytea,
    }
}

diesel::table! {
    obj_info (object_id, cp_sequence_number) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
        owner_kind -> Nullable<Int2>,
        owner_id -> Nullable<Bytea>,
        package -> Nullable<Bytea>,
        module -> Nullable<Text>,
        name -> Nullable<Text>,
        instantiation -> Nullable<Bytea>,
    }
}

diesel::table! {
    obj_info_deletion_reference (cp_sequence_number, object_id) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    obj_versions (object_id, object_version) {
        object_id -> Bytea,
        object_version -> Int8,
        object_digest -> Nullable<Bytea>,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    sum_displays (object_type) {
        object_type -> Bytea,
        display_id -> Bytea,
        display_version -> Int2,
        display -> Bytea,
    }
}

diesel::table! {
    tx_affected_addresses (affected, tx_sequence_number) {
        affected -> Bytea,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_affected_objects (affected, tx_sequence_number) {
        tx_sequence_number -> Int8,
        affected -> Bytea,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_balance_changes (tx_sequence_number) {
        tx_sequence_number -> Int8,
        balance_changes -> Bytea,
    }
}

diesel::table! {
    tx_calls (package, module, function, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        function -> Text,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_digests (tx_sequence_number) {
        tx_sequence_number -> Int8,
        tx_digest -> Bytea,
    }
}

diesel::table! {
    tx_kinds (tx_kind, tx_sequence_number) {
        tx_kind -> Int2,
        tx_sequence_number -> Int8,
    }
}

diesel::table! {
    watermarks (pipeline) {
        pipeline -> Text,
        epoch_hi_inclusive -> Int8,
        checkpoint_hi_inclusive -> Int8,
        tx_hi -> Int8,
        timestamp_ms_hi_inclusive -> Int8,
        reader_lo -> Int8,
        pruner_timestamp -> Timestamp,
        pruner_hi -> Int8,
    }
}

diesel::allow_tables_to_appear_in_same_query!(
    coin_balance_buckets,
    coin_balance_buckets_deletion_reference,
    cp_sequence_numbers,
    ev_emit_mod,
    ev_struct_inst,
    kv_checkpoints,
    kv_epoch_ends,
    kv_epoch_starts,
    kv_feature_flags,
    kv_genesis,
    kv_objects,
    kv_packages,
    kv_protocol_configs,
    kv_transactions,
    obj_info,
    obj_info_deletion_reference,
    obj_versions,
    sum_displays,
    tx_affected_addresses,
    tx_affected_objects,
    tx_balance_changes,
    tx_calls,
    tx_digests,
    tx_kinds,
    watermarks,
);

Below are brief descriptions of the various categories of pipelines based on the type of data they handle:

Blockchain raw content pipelines

Tables:

  • kv_checkpoints
  • kv_transactions
  • kv_objects
  • kv_packages

These pipelines capture core blockchain data in its raw form, preserving complete checkpoint information, full transaction and object contents, and Move package bytecode and metadata. They make the complete blockchain state available for direct lookup by key (for example, object ID and version, transaction digest, or checkpoint sequence number). Some production deployments use the Archival Store to look up checkpoint, transaction, and object contents instead of the corresponding kv_ tables.

The following pipelines create indexed views that allow efficient filtering and querying based on different attributes (for example, object owner, transaction type, affected addresses, or event type). These indexes identify the keys of interest, which are then used to fetch detailed content from the raw content kv_ tables:

Transaction pipelines

Tables

  • tx_digests
  • tx_kinds
  • tx_calls
  • tx_affected_addresses
  • tx_affected_objects
  • tx_balance_changes

These pipelines extract and index key transaction attributes to support efficient filtering and querying. tx_kinds, tx_calls, tx_affected_addresses, and tx_affected_objects enable fast lookups of transactions by kind, function call, sender and receiver addresses, and changed objects. tx_digests enables conversion between transaction sequence numbers and transaction digests, which is needed to look up transactions by digest in the kv_ tables. tx_balance_changes stores the balance changes of each transaction.
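The index-then-fetch pattern can be sketched with in-memory stand-ins for three of the tables in the schema: scan tx_affected_addresses by its key prefix, convert each tx_sequence_number to a digest through tx_digests, then fetch the raw transaction from kv_transactions. This is a minimal sketch, assuming BTreeSet/BTreeMap/HashMap stand-ins and a hypothetical `transactions_affecting` helper; the GraphQL service's actual SQL is not shown here.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical in-memory stand-ins for three tables (other columns omitted).
pub struct Tables {
    /// tx_affected_addresses, ordered by its primary key (affected, tx_sequence_number).
    pub tx_affected_addresses: BTreeSet<(Vec<u8>, i64)>,
    /// tx_digests: tx_sequence_number -> tx_digest.
    pub tx_digests: BTreeMap<i64, Vec<u8>>,
    /// kv_transactions: tx_digest -> raw_transaction.
    pub kv_transactions: HashMap<Vec<u8>, Vec<u8>>,
}

impl Tables {
    /// Up to `limit` raw transactions that affected `address`, oldest first.
    pub fn transactions_affecting(&self, address: &[u8], limit: usize) -> Vec<&Vec<u8>> {
        let lo = (address.to_vec(), i64::MIN);
        let hi = (address.to_vec(), i64::MAX);
        self.tx_affected_addresses
            .range(lo..=hi) // 1. index scan on the key prefix `affected = address`
            .take(limit)
            .filter_map(|(_, seq)| self.tx_digests.get(seq)) // 2. sequence number -> digest
            .filter_map(|digest| self.kv_transactions.get(digest)) // 3. digest -> raw content
            .collect()
    }
}
```

Because the index's primary key starts with the address, all rows for one address are contiguous and already sorted by sequence number, which is what makes the paginated scan cheap.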

Object pipelines

Tables

  • obj_info
  • obj_versions
  • coin_balance_buckets

These pipelines manage current and historical object information. They store active object metadata, maintain a version history for each object, and categorize coin balances into buckets for efficient coin queries sorted by balance. The obj_versions table is particularly important for the GraphQL service. It tracks the version history of all blockchain objects, storing the object ID, version number, digest, and checkpoint sequence number. The GraphQL service uses this table as an efficient index to resolve object queries by version bounds, checkpoint bounds, or exact versions without loading full object data, enabling features like version pagination and temporal consistency.
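Resolving an object by version bound or checkpoint bound can be sketched as an ordered range scan over the obj_versions key (object_id, object_version). This is a minimal sketch, assuming an in-memory BTreeMap that keeps only cp_sequence_number (digest omitted) and hypothetical `at_version_bound` and `at_checkpoint` methods; it is not the GraphQL service's actual query.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for obj_versions:
/// (object_id, object_version) -> cp_sequence_number (object_digest omitted).
#[derive(Default)]
pub struct ObjVersions {
    rows: BTreeMap<(Vec<u8>, i64), i64>,
}

impl ObjVersions {
    pub fn insert(&mut self, id: &[u8], version: i64, cp: i64) {
        self.rows.insert((id.to_vec(), version), cp);
    }

    /// Latest version of `id` at or below `max_version`.
    pub fn at_version_bound(&self, id: &[u8], max_version: i64) -> Option<i64> {
        self.rows
            .range((id.to_vec(), i64::MIN)..=(id.to_vec(), max_version))
            .next_back()
            .map(|((_, v), _)| *v)
    }

    /// Latest version of `id` written at or before checkpoint `cp`,
    /// scanning that object's versions from newest to oldest.
    pub fn at_checkpoint(&self, id: &[u8], cp: i64) -> Option<i64> {
        self.rows
            .range((id.to_vec(), i64::MIN)..=(id.to_vec(), i64::MAX))
            .rev()
            .find(|(_, row_cp)| **row_cp <= cp)
            .map(|((_, v), _)| *v)
    }
}
```

Only the version number is resolved here; the full object contents would then be fetched by (object ID, version) from kv_objects or the Archival Store, which is why this index avoids loading object data.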

Pruning policies can be configured for obj_info and coin_balance_buckets to retain historical data within a specified time range, balancing query needs with storage management. This supports use cases that require querying recent object history without retaining all historical data indefinitely.
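The retention arithmetic can be sketched as follows. This minimal sketch assumes that `retention` (as in the `[pruner]` sections of the sample configurations) counts checkpoints, that the most recent `retention` checkpoints are kept, and that deletion proceeds in chunks of at most `max-chunk-size` checkpoints. The `prune_plan` function is hypothetical; its inputs are only loosely modeled on the `checkpoint_hi_inclusive`, `reader_lo`, and `pruner_hi` columns of the watermarks table, whose exact semantics are not confirmed here.

```rust
/// Hypothetical pruning plan. Returns the new lowest readable checkpoint and
/// the half-open [lo, hi) checkpoint ranges to delete, oldest first.
/// `pruned_up_to` is the first checkpoint that has not been pruned yet.
pub fn prune_plan(
    checkpoint_hi_inclusive: u64,
    retention: u64,
    pruned_up_to: u64,
    max_chunk_size: u64,
) -> (u64, Vec<(u64, u64)>) {
    assert!(max_chunk_size > 0, "max_chunk_size must be positive");
    // Keep the newest `retention` checkpoints: [reader_lo, checkpoint_hi_inclusive].
    let reader_lo = (checkpoint_hi_inclusive + 1).saturating_sub(retention);
    let mut chunks = Vec::new();
    let mut lo = pruned_up_to;
    while lo < reader_lo {
        // Each chunk covers at most `max_chunk_size` checkpoints.
        let hi = lo.saturating_add(max_chunk_size).min(reader_lo);
        chunks.push((lo, hi));
        lo = hi;
    }
    (reader_lo, chunks)
}
```

For example, with retention = 14400 and max-chunk-size = 2000 (values from the coin_balance_buckets configuration) and a watermark at checkpoint 20000, everything below checkpoint 5601 is eligible for deletion in three chunks.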

Epoch information pipelines

Tables

  • kv_epoch_starts
  • kv_epoch_ends
  • kv_feature_flags
  • kv_protocol_configs

These pipelines capture protocol upgrades and epoch transition points. They track the system state, reward distribution, validator committee and protocol configurations of each epoch, providing a historical record of network evolution.

Event processing pipelines

Tables

  • ev_emit_mod
  • ev_struct_inst

These pipelines index blockchain events for efficient querying by sender, emitting module, or event type.
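Querying by event type works because ev_struct_inst's primary key starts with (package, module, name): all rows for one event type are contiguous, so a prefix range scan finds them. This is a minimal sketch, assuming an in-memory BTreeSet ordered like that key and a hypothetical `txs_emitting` helper; ev_emit_mod, keyed by (package, module, tx_sequence_number), supports the analogous scan by emitting module.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an ev_struct_inst row, in primary-key order:
/// (package, module, name, instantiation, tx_sequence_number). `sender` omitted.
pub type Key = (Vec<u8>, String, String, Vec<u8>, i64);

/// Sequence numbers of transactions that emitted an event of type
/// `package::module::name` (any instantiation), ascending and de-duplicated.
pub fn txs_emitting(index: &BTreeSet<Key>, package: &[u8], module: &str, name: &str) -> Vec<i64> {
    // Smallest possible key with this (package, module, name) prefix.
    let lo: Key = (package.to_vec(), module.to_string(), name.to_string(), Vec::new(), i64::MIN);
    let mut txs: Vec<i64> = index
        .range(lo..)
        .take_while(|(p, m, n, _, _)| p.as_slice() == package && m == module && n == name)
        .map(|(_, _, _, _, tx)| *tx)
        .collect();
    // Rows are grouped by instantiation first, so re-sort by transaction.
    txs.sort_unstable();
    txs.dedup();
    txs
}
```

Filtering on a specific instantiation would extend the prefix by one more key column and keep rows already sorted by transaction, with no re-sort needed.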

Utility and support pipelines

Tables

  • cp_sequence_numbers
  • watermarks

These pipelines provide support infrastructure, such as checkpoint sequence number tracking for pruning and watermark tracking for ensuring consistent reads across different tables in a GraphQL query.
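One way to use the watermarks table for consistent reads is to intersect the readable ranges of every pipeline a query touches. This minimal sketch assumes that each pipeline can serve checkpoints in [reader_lo, checkpoint_hi_inclusive] and that a query spanning several tables must stay within the intersection; the `consistent_range` helper is hypothetical and not necessarily how the GraphQL service computes its snapshot.

```rust
/// Hypothetical subset of a `watermarks` row (column names from the schema).
pub struct Watermark {
    pub pipeline: &'static str,
    pub checkpoint_hi_inclusive: i64,
    pub reader_lo: i64,
}

/// Checkpoint range [lo, hi] that every pipeline in `needed` can serve:
/// hi is the smallest committed high watermark, lo the largest reader_lo.
/// Returns None if a pipeline has no watermark or the ranges do not overlap.
pub fn consistent_range(watermarks: &[Watermark], needed: &[&str]) -> Option<(i64, i64)> {
    let mut lo = i64::MIN;
    let mut hi = i64::MAX;
    for name in needed {
        let w = watermarks.iter().find(|w| w.pipeline == *name)?;
        lo = lo.max(w.reader_lo);
        hi = hi.min(w.checkpoint_hi_inclusive);
    }
    (lo <= hi).then_some((lo, hi))
}
```

Pinning every table read in a request to a checkpoint in this range avoids mixing results from a pipeline that has run ahead with one that is lagging.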

Other pipelines

Tables

  • sum_displays

The sum_displays table stores the latest version of the Display object for each object type, which is used to render the off-chain representation (display) of that type.

Indexer pipeline architecture and deployment

The general-purpose indexer is built with the Indexer framework, where each pipeline is structured as a set of layered components that interact with each other. Each layer has a distinct role in the data processing flow:

  • Ingestion layer: Fetches checkpoint data and distributes it to pipelines with back pressure management.
  • Process layer: Transforms checkpoint data into structured records specific to each pipeline’s purpose.
  • Committer layer: Writes processed data into the database while tracking progress through watermarks.
  • Optional pruner layer: Manages data retention by removing old records from pipelines that support pruning operations. It operates independently from the main processing pipeline and runs at configurable intervals to delete data older than the specified retention period.
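The ingestion, process, and committer layers can be sketched as threads connected by bounded channels, where a full channel blocks the upstream layer and so provides back pressure. This is a minimal sketch, assuming hypothetical `Checkpoint` and `Row` types and a `run_pipeline` function; it uses only `std::sync::mpsc` and does not reflect the sui-indexer-alt-framework API. The optional pruner layer is omitted.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical checkpoint and row types; the real framework defines its own.
pub struct Checkpoint { pub sequence_number: u64, pub tx_count: u64 }
pub struct Row { pub cp_sequence_number: u64, pub tx_count: u64 }

/// Runs ingestion -> process -> commit and returns the final watermark
/// (highest committed checkpoint) together with the committed rows.
pub fn run_pipeline(checkpoints: Vec<Checkpoint>, buffer: usize) -> (Option<u64>, Vec<Row>) {
    // Bounded channels: when a buffer is full, `send` blocks (back pressure).
    let (cp_tx, cp_rx) = sync_channel::<Checkpoint>(buffer);
    let (row_tx, row_rx) = sync_channel::<Row>(buffer);

    // Ingestion layer: fetch checkpoints (here, from a Vec) and hand them on.
    let ingest = thread::spawn(move || {
        for cp in checkpoints {
            if cp_tx.send(cp).is_err() {
                break;
            }
        }
    });

    // Process layer: transform each checkpoint into pipeline-specific rows.
    let process = thread::spawn(move || {
        for cp in cp_rx {
            let row = Row { cp_sequence_number: cp.sequence_number, tx_count: cp.tx_count };
            if row_tx.send(row).is_err() {
                break;
            }
        }
    });

    // Committer layer: "write" rows and advance the watermark after each write.
    let mut committed = Vec::new();
    let mut watermark = None;
    for row in row_rx {
        watermark = Some(row.cp_sequence_number);
        committed.push(row);
    }
    ingest.join().expect("ingestion thread panicked");
    process.join().expect("process thread panicked");
    (watermark, committed)
}
```

Because this committer receives rows in checkpoint order, it can advance the watermark on every row. A committer that writes batches concurrently would instead have to advance the watermark only across a gap-free prefix of committed checkpoints.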

Each Indexer instance can run one or more pipelines, so deployments can be scaled and tuned according to workload. In some deployments, the pipelines described previously (except kv_checkpoints, kv_objects, and kv_transactions) are spread across a number of pods, grouping lightweight pipelines together and isolating heavyweight pipelines in their own deployments. This grouping helps mitigate ingestion bottlenecks: all pipelines within a pod share the same ingestion service, so the slowest pipeline limits the overall throughput of that pod.
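The effect of grouping can be modeled in a few lines. This minimal sketch assumes that, because pipelines in a pod share one ingestion stream, a pod advances only as fast as its slowest pipeline; the functions are hypothetical and the rates used with them are made-up illustrative values in checkpoints per second, not measurements.

```rust
/// Hypothetical model: a deployment advances at the rate of its slowest pipeline.
pub fn deployment_throughput(pipeline_rates: &[f64]) -> Option<f64> {
    pipeline_rates.iter().copied().reduce(f64::min)
}

/// Throughput of each deployment when pipelines are grouped as given.
pub fn grouped_throughputs(groups: &[Vec<f64>]) -> Vec<Option<f64>> {
    groups.iter().map(|g| deployment_throughput(g)).collect()
}
```

Under this model, putting one slow pipeline in the same pod as two fast ones drags the whole pod down to the slow rate, while isolating the slow pipeline lets the fast ones keep their own pace.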

The pipeline composition, concurrency, and deployment grouping are configured through a TOML config file. A built-in GenerateConfig command outputs sample configuration files for different deployment setups. The configuration generated by this command includes all pipelines in a single indexer deployment.

As an example, the following configurations run a separate indexer deployment for each of these pipelines:

coin_balance_buckets

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[pruner]
retention = 14400
max-chunk-size = 2000
prune-concurrency = 2

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.coin_balance_buckets.pruner]

cp_sequence_numbers

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.cp_sequence_numbers]

obj_info

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[pruner]
interval-ms = 30000
retention = 14400
max-chunk-size = 500
prune-concurrency = 20

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.obj_info.pruner]

obj_versions

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.obj_versions]

kv_packages

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.kv_packages]

tx_affected_addresses

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.tx_affected_addresses]

tx_balance_changes

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.tx_balance_changes]

Remaining tables

A single additional deployment handles all the remaining tables:

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.sum_displays]

[pipeline.kv_epoch_ends]

[pipeline.kv_epoch_starts]

[pipeline.kv_feature_flags]

[pipeline.kv_protocol_configs]

[pipeline.tx_digests]

When to build a custom indexer

This document focuses on the general-purpose Indexer that powers GraphQL. If you want to build your own pipelines for application-specific data (for example, DeepBook order books, Walrus blob metadata, Seal access events, and so on), refer to Build Your First Custom Indexer.

You can run custom indexers separately to populate an app-specific database. You can then build your own lightweight RPC server with your choice of query mechanism (GraphQL, gRPC, or JSON-RPC) to serve app-specific data from that database.

Working with Consistent Store

The Consistent Store is a combined indexer and RPC service that is responsible for indexing live data on-chain, and serving queries about it for recent checkpoints. Retention (the number of checkpoints to serve information for) is configurable and is typically measured in minutes or hours. Its indexer fetches checkpoints from the same sources as the general-purpose Indexer, and writes data to an embedded RocksDB store, while requests are served through gRPC, answering the following queries:

  • Owner's live objects at a recent checkpoint, optionally filtered by type.
  • Live objects for a given type at a recent checkpoint.
  • Address balance at a recent checkpoint.
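Answering a balance query at a pinned recent checkpoint can be sketched with a per-address history of balance changes and a retention window. This is a minimal sketch, assuming a hypothetical `BalanceHistory` type whose `retention` counts checkpoints and whose `changes` map still holds, for each address, the last change at or before the window's lower bound; it does not reflect the Consistent Store's RocksDB layout or gRPC API.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of retained balance history.
pub struct BalanceHistory {
    /// Latest indexed checkpoint.
    pub latest: u64,
    /// Number of recent checkpoints that can be queried.
    pub retention: u64,
    /// (address, checkpoint of change) -> balance from that checkpoint onward.
    pub changes: BTreeMap<(String, u64), u64>,
}

impl BalanceHistory {
    /// Balance of `address` as of checkpoint `at`, or an error if `at` is
    /// outside the retained window [latest - retention + 1, latest].
    pub fn balance_at(&self, address: &str, at: u64) -> Result<u64, String> {
        let lo = (self.latest + 1).saturating_sub(self.retention);
        if at < lo || at > self.latest {
            return Err(format!("checkpoint {} outside retained range [{}, {}]", at, lo, self.latest));
        }
        // The most recent change at or before `at` determines the balance.
        Ok(self
            .changes
            .range((address.to_string(), 0)..=(address.to_string(), at))
            .next_back()
            .map(|(_, balance)| *balance)
            .unwrap_or(0))
    }
}
```

Pinning several requests to the same `at` checkpoint makes their answers mutually consistent, even while the store keeps indexing newer checkpoints.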

This service is not stateless, because it maintains its own database. As with the indexer, you can spin up a new instance by syncing it from genesis, or possibly by restoring it from a formal snapshot.
