
GraphQL and General-Purpose Indexer (Beta)

⚙️ Early-Stage Feature

This content describes an alpha/beta feature or service. These early-stage features and services are in active development, so details are likely to change.

This feature or service is currently available in

  • Devnet
  • Testnet
  • Mainnet

The GraphQL and sui-indexer-alt (Indexer) stack is part of the Sui data access infrastructure. The stack provides access to on-chain data through a high-performance GraphQL service backed by a scalable and general-purpose indexer. This stack is optimized for developers who need flexible queries, structured output, and historical access to data (with configurable retention) across the Sui network.

GraphQL is ideal for applications that require rich query patterns over structured data, such as fetching owned objects, transaction history, specific onchain attributes, and more. The GraphQL service runs on top of a Postgres-compatible database that is updated by different Indexer pipelines in parallel.

The general-purpose indexer includes configurable checkpoint-based pipelines that extract data from the Sui remote checkpoint store and full nodes. The pipelines write processed data to a database optimized for GraphQL query access. Together, the GraphQL service and Indexer offer a modular and production-ready data stack for builders, wallet developers, explorers, and indexer/data providers.

info

Refer to Access Sui Data for an overview of options to access Sui network data.

Key components

The key components of the stack include the following:

  • General-purpose Indexer: Ingests and transforms Sui checkpoint data using configurable and parallel pipelines, and writes it into a Postgres-compatible database. Can be configured to use the Sui remote checkpoint store and a full node as its sources.
  • Postgres-compatible database: Stores indexed data for GraphQL queries. Tested using GCP AlloyDB, but you can run any Postgres-compatible database. You're encouraged to test alternative databases and share feedback on performance, cost, and operational characteristics.
  • GraphQL service: Serves structured queries over indexed data. Follows the GraphQL specification and the supported schema is documented in the GraphQL API reference. Also take a look at the getting started guide.
  • Archival Service: Enables point lookups for historical data from a key-value store. If unavailable, the GraphQL service falls back to the Postgres-compatible database for lookups, which might be limited by that database's retention policy. See Archival Store and Service for more information.
  • Consistent Store: Answers questions about the latest state of the network within the last hour (objects owned by addresses, objects by type, balances by address and type). Consistency is guaranteed by pinning queries to a specific (recent) checkpoint.
  • Full node: Enables transaction execution and simulation. Currently, JSON-RPC is used, but the service will switch to gRPC as the long-term full node API.

When to use GraphQL

Use GraphQL if your application:

  • Requires historical (with configurable retention) or filtered access to data (e.g., all transactions sent by an address)
  • Needs to display structured results in a frontend (e.g., wallets, dashboards)
  • Benefits from flexible, composable queries that reduce overfetching
  • Relies on multiple data entities (e.g., transactions + objects + events) in a single request, or in a consistent fashion when spread over multiple requests (as if the responses came from a snapshot at some checkpoint).
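
For example, a single GraphQL request can combine a filtered transaction history with the owned objects of the same address. The query below is a sketch only: the field and filter names follow the pattern of the existing Sui GraphQL service and might differ in the Beta schema, so confirm them against the GraphQL API reference.

query WalletOverview($owner: SuiAddress!) {
  # Filtered, paginated transaction history (e.g., all transactions sent by an address)
  transactionBlocks(first: 10, filter: { sentAddress: $owner }) {
    nodes {
      digest
      effects {
        status
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
  # Objects owned by the same address, fetched in the same request
  address(address: $owner) {
    objects(first: 10) {
      nodes {
        address
        version
      }
    }
  }
}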

Deployment options

You can run or use the GraphQL and Indexer data stack in the following configurations:

Fully managed service

As a developer, you can access GraphQL as a service from an indexer operator or data provider who runs and operates the full stack behind the scenes. Reach out to your data provider and ask if they already offer or plan to offer this service.

Partial self-managed

As a developer, you can:

  • Run the Indexer pipelines and GraphQL service, while using the Archival Service and a full node from an RPC provider or indexer operator.
  • Configure and manage a Postgres-compatible database (local Postgres, AlloyDB, and so on) as the primary data store.
  • Deploy the self-managed components on cloud infrastructure or bare metal.

info

You cannot run and operate the GraphQL service in self-managed configurations just yet. The functionality will become available after required changes are made to how the GraphQL service integrates with the Archival Service.

Fully self-managed

As a developer, indexer operator, or RPC provider, you can:

  • Run the complete stack: Indexer pipelines, GraphQL service, Postgres-compatible database, Archival Service, Consistent Store and full node on cloud infrastructure or bare metal.
  • Serve GraphQL to your own applications or to other builders and third-party services.

Refer to For RPC providers and data operators for relevant information.

Working with the GraphQL service

The GraphQL service exposes a query surface conforming to GraphQL concepts. It allows pagination, filtering, and consistent snapshot queries. The service also supports runtime configuration for schema, query cost limits, and logging.

The GraphQL schema is defined in the GraphQL reference. You can explore supported types and fields there, use the GraphiQL IDE to test queries, and read documentation on the up-to-date schema.
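
Because the service follows the GraphQL specification, you can also discover the schema served by a particular endpoint using standard introspection (provided the deployment has introspection enabled), which is useful when the deployed schema differs from the published reference:

# Standard GraphQL introspection; works against any spec-compliant endpoint.
query SchemaOverview {
  __schema {
    queryType {
      name
    }
    types {
      name
      kind
    }
  }
}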

The GraphQL service is deployed as a single binary implementing a stateless, horizontally scalable service. Depending on the query, data is served from one or more of the following: a Postgres-compatible database (filters over historical data), the Archival Service (point lookups), the Consistent Store (live data), or a full node (execution and simulation). Access to these stores must be configured when the service starts up, otherwise the service might fail to respond correctly to requests. More details on how to set up, configure, and run the service are available in its README.

Requests to GraphQL are subject to various limits to ensure resources are shared fairly between clients. Each limit is configurable, and the values configured for an instance can be queried through Query.serviceConfig (see the example query after this list). Requests that exceed these limits return an error. The following limits are in effect:

  • Request size: Requests may not exceed a certain size in bytes. The limit is spread across a transaction payload limit, which applies to all values and variable bindings that are parameters to transaction signing, execution, and simulation fields (default: 175KB), and a query payload limit, which applies to all other parts of the query (default: 5KB).
  • Request timeout: Time spent on each request is bounded, with different bounds for execution (default: 74s) and regular reads (default: 40s).
  • Query input nodes and depth: The query cannot be too complex, meaning it cannot contain too many input nodes or field names (default: 300) or be too deeply nested (default: 20).
  • Output nodes: The service estimates the maximum number of output nodes the query might produce, assuming every requested field is present, every paginated field returns full pages, and every multi-get finds all requested keys. This estimate must be bounded (default: 1,000,000).
  • Page and multi-get size: Each paginated field (default: 50) and multi-get (default: 200) is subject to a maximum size. Certain paginated fields might override this to provide a higher or lower maximum.
  • (TBD) Rich queries: A request can contain only a bounded number (default: 5) of queries that require dedicated access to the database (cannot be grouped with other requests).
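
As a sketch, you can read back the limits configured on a particular instance through Query.serviceConfig. The field names below follow the existing Sui GraphQL service and are illustrative; verify the exact names against the Beta schema reference.

query Limits {
  serviceConfig {
    maxQueryDepth
    maxQueryNodes
    maxOutputNodes
    maxQueryPayloadSize
    maxPageSize
    requestTimeoutMs
  }
}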

Working with General-purpose Indexer

The general-purpose Indexer fetches checkpoint data from a remote object store, local files, or a full node RPC, and indexes it into multiple database tables via a set of specialized pipelines. Each pipeline is responsible for extracting specific data and writing it to its target tables.


Full list of tables and their schemas

// @generated automatically by Diesel CLI.

diesel::table! {
    coin_balance_buckets (object_id, cp_sequence_number) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
        owner_kind -> Nullable<Int2>,
        owner_id -> Nullable<Bytea>,
        coin_type -> Nullable<Bytea>,
        coin_balance_bucket -> Nullable<Int2>,
    }
}

diesel::table! {
    coin_balance_buckets_deletion_reference (cp_sequence_number, object_id) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    cp_sequence_numbers (cp_sequence_number) {
        cp_sequence_number -> Int8,
        tx_lo -> Int8,
        epoch -> Int8,
    }
}

diesel::table! {
    ev_emit_mod (package, module, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    ev_struct_inst (package, module, name, instantiation, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        name -> Text,
        instantiation -> Bytea,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    kv_checkpoints (sequence_number) {
        sequence_number -> Int8,
        checkpoint_contents -> Bytea,
        checkpoint_summary -> Bytea,
        validator_signatures -> Bytea,
    }
}

diesel::table! {
    kv_epoch_ends (epoch) {
        epoch -> Int8,
        cp_hi -> Int8,
        tx_hi -> Int8,
        end_timestamp_ms -> Int8,
        safe_mode -> Bool,
        total_stake -> Nullable<Int8>,
        storage_fund_balance -> Nullable<Int8>,
        storage_fund_reinvestment -> Nullable<Int8>,
        storage_charge -> Nullable<Int8>,
        storage_rebate -> Nullable<Int8>,
        stake_subsidy_amount -> Nullable<Int8>,
        total_gas_fees -> Nullable<Int8>,
        total_stake_rewards_distributed -> Nullable<Int8>,
        leftover_storage_fund_inflow -> Nullable<Int8>,
        epoch_commitments -> Bytea,
    }
}

diesel::table! {
    kv_epoch_starts (epoch) {
        epoch -> Int8,
        protocol_version -> Int8,
        cp_lo -> Int8,
        start_timestamp_ms -> Int8,
        reference_gas_price -> Int8,
        system_state -> Bytea,
    }
}

diesel::table! {
    kv_feature_flags (protocol_version, flag_name) {
        protocol_version -> Int8,
        flag_name -> Text,
        flag_value -> Bool,
    }
}

diesel::table! {
    kv_genesis (genesis_digest) {
        genesis_digest -> Bytea,
        initial_protocol_version -> Int8,
    }
}

diesel::table! {
    kv_objects (object_id, object_version) {
        object_id -> Bytea,
        object_version -> Int8,
        serialized_object -> Nullable<Bytea>,
    }
}

diesel::table! {
    kv_packages (package_id, package_version) {
        package_id -> Bytea,
        package_version -> Int8,
        original_id -> Bytea,
        is_system_package -> Bool,
        serialized_object -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    kv_protocol_configs (protocol_version, config_name) {
        protocol_version -> Int8,
        config_name -> Text,
        config_value -> Nullable<Text>,
    }
}

diesel::table! {
    kv_transactions (tx_digest) {
        tx_digest -> Bytea,
        cp_sequence_number -> Int8,
        timestamp_ms -> Int8,
        raw_transaction -> Bytea,
        raw_effects -> Bytea,
        events -> Bytea,
        user_signatures -> Bytea,
    }
}

diesel::table! {
    obj_info (object_id, cp_sequence_number) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
        owner_kind -> Nullable<Int2>,
        owner_id -> Nullable<Bytea>,
        package -> Nullable<Bytea>,
        module -> Nullable<Text>,
        name -> Nullable<Text>,
        instantiation -> Nullable<Bytea>,
    }
}

diesel::table! {
    obj_info_deletion_reference (cp_sequence_number, object_id) {
        object_id -> Bytea,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    obj_versions (object_id, object_version) {
        object_id -> Bytea,
        object_version -> Int8,
        object_digest -> Nullable<Bytea>,
        cp_sequence_number -> Int8,
    }
}

diesel::table! {
    sum_displays (object_type) {
        object_type -> Bytea,
        display_id -> Bytea,
        display_version -> Int2,
        display -> Bytea,
    }
}

diesel::table! {
    tx_affected_addresses (affected, tx_sequence_number) {
        affected -> Bytea,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_affected_objects (affected, tx_sequence_number) {
        tx_sequence_number -> Int8,
        affected -> Bytea,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_balance_changes (tx_sequence_number) {
        tx_sequence_number -> Int8,
        balance_changes -> Bytea,
    }
}

diesel::table! {
    tx_calls (package, module, function, tx_sequence_number) {
        package -> Bytea,
        module -> Text,
        function -> Text,
        tx_sequence_number -> Int8,
        sender -> Bytea,
    }
}

diesel::table! {
    tx_digests (tx_sequence_number) {
        tx_sequence_number -> Int8,
        tx_digest -> Bytea,
    }
}

diesel::table! {
    tx_kinds (tx_kind, tx_sequence_number) {
        tx_kind -> Int2,
        tx_sequence_number -> Int8,
    }
}

diesel::table! {
    watermarks (pipeline) {
        pipeline -> Text,
        epoch_hi_inclusive -> Int8,
        checkpoint_hi_inclusive -> Int8,
        tx_hi -> Int8,
        timestamp_ms_hi_inclusive -> Int8,
        reader_lo -> Int8,
        pruner_timestamp -> Timestamp,
        pruner_hi -> Int8,
    }
}

diesel::allow_tables_to_appear_in_same_query!(
    coin_balance_buckets,
    coin_balance_buckets_deletion_reference,
    cp_sequence_numbers,
    ev_emit_mod,
    ev_struct_inst,
    kv_checkpoints,
    kv_epoch_ends,
    kv_epoch_starts,
    kv_feature_flags,
    kv_genesis,
    kv_objects,
    kv_packages,
    kv_protocol_configs,
    kv_transactions,
    obj_info,
    obj_info_deletion_reference,
    obj_versions,
    sum_displays,
    tx_affected_addresses,
    tx_affected_objects,
    tx_balance_changes,
    tx_calls,
    tx_digests,
    tx_kinds,
    watermarks,
);

Below are brief descriptions of the various categories of pipelines based on the type of data they handle:

Blockchain raw content pipelines

Tables:

  • kv_checkpoints
  • kv_transactions
  • kv_objects
  • kv_packages

These pipelines capture the core blockchain data in its raw form, preserving complete checkpoint information, full transaction and object contents, and Move package bytecode and metadata. They ensure the complete blockchain state is available for direct lookup by key (for example, object ID and version, transaction digest, or checkpoint sequence number). Some production deployments use the Archival Store instead of the corresponding kv_ tables for looking up checkpoint, transaction, and object contents.

The following pipelines create indexed views that allow efficient filtering and querying based on different attributes (for example, object owner, transaction type, affected addresses, event type). These indexes help identify the keys of interest, which can then be used to fetch detailed content from the raw content kv_ tables:

Transaction pipelines

Tables

  • tx_digests
  • tx_kinds
  • tx_calls
  • tx_affected_addresses
  • tx_affected_objects
  • tx_balance_changes

These pipelines extract and index key transaction attributes to support efficient filtering and querying. tx_kinds, tx_calls, tx_affected_addresses, and tx_affected_objects enable fast lookups of transactions based on type, function call, sender and receiver addresses, and changed objects. tx_digests enables conversion between transaction sequence numbers and transaction digests, which is needed to look up transactions in the kv_ tables by digest, and tx_balance_changes stores the balance change information for each transaction.
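
As an illustration of how these tables are used, a GraphQL filter on a specific Move function is answered from tx_calls, the matching transaction sequence numbers are translated to digests through tx_digests, and the full transaction content is then read from kv_transactions. The query below is a sketch; filter and field names might differ in the Beta schema.

query RecentCallsToFunction {
  # Replace the value with the package::module::function you want to filter on.
  transactionBlocks(last: 20, filter: { function: "0x2::coin::mint" }) {
    nodes {
      digest
      sender {
        address
      }
    }
  }
}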

Object pipelines

Tables

  • obj_info
  • obj_versions
  • coin_balance_buckets

These pipelines manage current and historical object information. They store active object metadata, maintain version histories for each object, and categorize coin balances into buckets for efficient coin queries sorted by balance. The obj_versions table is particularly important for the GraphQL service. It tracks the version history of all blockchain objects, storing object ID, version number, digest, and checkpoint sequence number. The GraphQL service uses this table as an efficient index to resolve object queries by version bounds, checkpoint bounds, or exact versions without loading full object data, enabling features like version pagination and temporal consistency.
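
For example, a point lookup of an object at a specific version can be resolved by first consulting obj_versions for the (object ID, version) pair, and then fetching the full contents from kv_objects or the Archival Service. The query shape below is illustrative; confirm the exact field names against the Beta schema reference.

query ObjectAtVersion($id: SuiAddress!, $version: UInt53!) {
  object(address: $id, version: $version) {
    address
    version
    digest
  }
}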

Pruning policies can be configured for obj_info and coin_balance_buckets to retain historical data within a specified time range, balancing query needs with storage management. This allows supporting use cases that require querying recent object history without retaining all historical data indefinitely.

Epoch information pipelines

Tables

  • kv_epoch_starts
  • kv_epoch_ends
  • kv_feature_flags
  • kv_protocol_configs

These pipelines capture protocol upgrades and epoch transition points. They track the system state, reward distribution, validator committee and protocol configurations of each epoch, providing a historical record of network evolution.

Event processing pipelines

Tables

  • ev_emit_mod
  • ev_struct_inst

These pipelines index blockchain events for efficient querying by sender, emitting module, or event type.
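
As a sketch, an event filter on the emitting module is served from ev_emit_mod, while a filter on a specific event type (including its type instantiation) is served from ev_struct_inst. The filter and field names below are illustrative and may differ in the Beta schema.

query RecentEventsFromModule {
  # Replace the value with the package::module whose events you want to inspect.
  events(last: 20, filter: { emittingModule: "0x3::validator" }) {
    nodes {
      timestamp
      contents {
        json
      }
    }
  }
}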

Utility and support pipelines

Tables

  • cp_sequence_numbers
  • watermarks

These pipelines provide support infrastructure, such as checkpoint sequence number tracking for pruning and watermark tracking for ensuring consistent reads across different tables in a GraphQL query.

Other pipelines

Tables

  • sum_displays

The sum_displays table stores the latest version of the Display object for each object type, used for rendering the off-chain representation (display) of a type.

Indexer pipeline architecture and deployment

General-purpose indexer is built using the Indexer framework, where each pipeline is structured as a set of layered components that interact with each other. Each layer has a distinct role in the data processing flow:

  • Ingestion layer: Fetches checkpoint data and distributes it to pipelines with back pressure management.
  • Process layer: Transforms checkpoint data into structured records specific to each pipeline’s purpose.
  • Committer layer: Writes processed data into the database while tracking progress through watermarks.
  • Optional pruner layer: Manages data retention by removing old records from pipelines that support pruning operations. It operates independently from the main processing pipeline and runs at configurable intervals to delete data older than the specified retention period.

Each Indexer instance can run one or more pipelines, allowing deployments to be scaled and tuned according to workload. In some deployments, the pipelines described previously (except the kv_checkpoints, kv_objects, and kv_transactions pipelines) are spread across a number of pods, grouping lightweight pipelines together and isolating heavyweight pipelines in their own deployments. This grouping helps mitigate ingestion bottlenecks, because all pipelines within a pod share the same ingestion service, and the slowest pipeline limits the overall throughput for that pod.

The pipeline composition, concurrency, and deployment grouping are configured via a TOML config file. A built-in GenerateConfig command is provided to output sample configuration files for different deployment setups. The configuration generated by this command includes all pipelines in a single Indexer deployment.

As an example, the following configurations are used to run separate Indexer deployments, one for each of these pipelines:

coin_balance_buckets

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[pruner]
retention = 14400
max-chunk-size = 2000
prune-concurrency = 2

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.coin_balance_buckets.pruner]

cp_sequence_numbers

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.cp_sequence_numbers]

obj_info

[ingestion]
checkpoint-buffer-size = 10000
ingest-concurrency = 200
retry-interval-ms = 200

[pruner]
interval-ms = 30000
retention = 14400
max-chunk-size = 500
prune-concurrency = 20

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.obj_info.pruner]

obj_versions

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.obj_versions]

kv_packages

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.kv_packages]

tx_affected_addresses

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.tx_affected_addresses]

tx_balance_changes

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 10
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.tx_balance_changes]

Remaining tables

And a single additional deployment that handles all the remaining tables:

[ingestion]
checkpoint-buffer-size = 5000
ingest-concurrency = 200
retry-interval-ms = 200

[committer]
write-concurrency = 5
collect-interval-ms = 500
watermark-interval-ms = 500

[pipeline.sum_displays]

[pipeline.kv_epoch_ends]

[pipeline.kv_epoch_starts]

[pipeline.kv_feature_flags]

[pipeline.kv_protocol_configs]

[pipeline.tx_digests]

When to build a custom indexer

This document focuses on the general-purpose Indexer that powers GraphQL. If you want to build your own pipelines for application-specific data (for example, DeepBook order books, Walrus blob metadata, Seal access events, and so on), refer to the Build Your First Custom Indexer guide.

You can run custom indexers separately to populate an app-specific database. You can then build your own lightweight RPC server with your choice of query mechanism (GraphQL, gRPC, or JSON-RPC) to serve app-specific data from that database.

Working with Consistent Store

The Consistent Store is a combined indexer and RPC service that is responsible for indexing live on-chain data and serving queries about it for recent checkpoints. Retention (the number of checkpoints to serve information for) is configurable and is typically measured in minutes or hours. Its indexer fetches checkpoints from the same sources as the general-purpose Indexer and writes data to an embedded RocksDB store, while requests are served through gRPC, answering the following queries:

  • Owner's live objects at a recent checkpoint, optionally filtered by type.
  • Live objects for a given type at a recent checkpoint.
  • Address balance at a recent checkpoint.

This service is not stateless, as it maintains its own database. A new instance can be spun up similarly to the Indexer, by syncing it from genesis, or possibly by restoring it from a formal snapshot.
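
Queries about live state, such as the objects an address currently owns or its balance for a coin type, are the kind that the GraphQL service answers from the Consistent Store, pinned to a recent checkpoint. The query below is a sketch; field and filter names may differ in the Beta schema.

query LiveHoldings($owner: SuiAddress!) {
  address(address: $owner) {
    # Live owned objects, optionally filtered by type
    objects(first: 50, filter: { type: "0x2::coin::Coin" }) {
      nodes {
        address
        version
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
    # Balance for a specific coin type at the same (recent) checkpoint
    balance(type: "0x2::sui::SUI") {
      totalBalance
    }
  }
}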

For RPC providers and data operators

If you're running the GraphQL RPC + General-purpose Indexer stack as a service, here are a few key considerations for configuring your setup to offer builders a performant and cost-effective experience.

How much data to index and retain

You should retain 30 to 90 days of recent checkpoint data in your Postgres-compatible database. This provides a strong default for most apps without incurring the high storage costs of full historical indexing.

  • 30 days is a great baseline for dashboards and explorers that need recent activity and assets.
  • 90 days improves support for longer-range pagination, historical lookups, or dApps with slower engagement cycles.

You can configure your indexing pipelines to scope which data you include (such as events, objects, and transactions), and disable any components that aren’t needed.

note

Retaining long-term historical data in Postgres is not recommended unless required for specific apps.

Use the Archival Service and Store for historical lookups

For all production deployments, you are strongly encouraged to pair Postgres with the Archival Service to support point lookups of transactions, objects, and checkpoints when relevant data does not exist in Postgres.

  • The Archival Service serves as the backend for historical versions and checkpoint data, reducing pressure on your Postgres instance.
  • While not strictly required, it is strongly recommended that you use the Archival Service in any production setup that aims to support high-retention GraphQL or gRPC workloads.

The current implementation supports GCP Bigtable, which is a highly scalable and performant data store. If you plan to operate your own archival store, refer to sui-kvstore and sui-kv-rpc for the indexer setup and RPC service implementation, respectively. For the indexer setup, make sure to use the custom indexing framework. If you're interested in contributing support for other scalable data stores, reach out on GitHub by creating a new issue.


main.rs in sui-kvstore

use anyhow::Result;
use clap::{Parser, Subcommand};
use prometheus::Registry;
use std::io::{self, Write};
use std::str::FromStr;
use sui_data_ingestion_core::{DataIngestionMetrics, IndexerExecutor, ReaderOptions, WorkerPool};
use sui_kvstore::{BigTableClient, BigTableProgressStore, KeyValueStoreReader, KvWorker};
use sui_types::base_types::ObjectID;
use sui_types::digests::TransactionDigest;
use sui_types::storage::ObjectKey;
use telemetry_subscribers::TelemetryConfig;
use tokio::sync::oneshot;

#[derive(Parser)]
struct App {
    instance_id: String,
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
pub enum Command {
    Ingestion {
        network: String,
    },
    Fetch {
        #[command(subcommand)]
        entry: Entry,
    },
}

#[derive(Subcommand)]
pub enum Entry {
    Object { id: String, version: u64 },
    Epoch { id: u64 },
    Checkpoint { id: u64 },
    Transaction { id: String },
    Watermark,
}

#[tokio::main]
async fn main() -> Result<()> {
    let _guard = TelemetryConfig::new().with_env().init();
    let app = App::parse();
    match app.command {
        Some(Command::Ingestion { network }) => {
            let client = BigTableClient::new_remote(
                app.instance_id,
                false,
                None,
                "ingestion".to_string(),
                None,
                None,
            )
            .await?;
            let (_exit_sender, exit_receiver) = oneshot::channel();
            let mut executor = IndexerExecutor::new(
                BigTableProgressStore::new(client.clone()),
                1,
                DataIngestionMetrics::new(&Registry::new()),
            );
            let worker_pool = WorkerPool::new(KvWorker { client }, "bigtable".to_string(), 50);
            executor.register(worker_pool).await?;
            executor
                .run(
                    tempfile::tempdir()?.keep(),
                    Some(format!("https://checkpoints.{}.sui.io", network)),
                    vec![],
                    ReaderOptions::default(),
                    exit_receiver,
                )
                .await?;
        }
        Some(Command::Fetch { entry }) => {
            let mut client = BigTableClient::new_remote(
                app.instance_id,
                true,
                None,
                "cli".to_string(),
                None,
                None,
            )
            .await?;
            let result = match entry {
                Entry::Epoch { id } => client.get_epoch(id).await?.map(|e| bcs::to_bytes(&e)),
                Entry::Object { id, version } => {
                    let objects = client
                        .get_objects(&[ObjectKey(ObjectID::from_str(&id)?, version.into())])
                        .await?;
                    objects.first().map(bcs::to_bytes)
                }
                Entry::Checkpoint { id } => {
                    let checkpoints = client.get_checkpoints(&[id]).await?;
                    checkpoints.first().map(bcs::to_bytes)
                }
                Entry::Transaction { id } => {
                    let transactions = client
                        .get_transactions(&[TransactionDigest::from_str(&id)?])
                        .await?;
                    transactions.first().map(bcs::to_bytes)
                }
                Entry::Watermark => {
                    let watermark = client.get_latest_checkpoint().await?;
                    println!("watermark is {}", watermark);
                    return Ok(());
                }
            };
            match result {
                Some(bytes) => io::stdout().write_all(&bytes?)?,
                None => println!("not found"),
            }
        }
        None => println!("no command provided"),
    }
    Ok(())
}

main.rs in sui-kv-rpc

use anyhow::Result;
use axum::routing::get;
use axum::Router;
use clap::Parser;
use mysten_network::callback::CallbackLayer;
use prometheus::Registry;
use std::sync::Arc;
use sui_kv_rpc::KvRpcServer;
use sui_rpc::proto::sui::rpc::v2beta2::ledger_service_server::LedgerServiceServer;
use sui_rpc_api::{RpcMetrics, RpcMetricsMakeCallbackHandler, ServerVersion};
use telemetry_subscribers::TelemetryConfig;
use tonic::transport::{Identity, Server, ServerTlsConfig};

bin_version::bin_version!();

#[derive(Parser)]
struct App {
    credentials: String,
    instance_id: String,
    #[clap(default_value = "[::1]:8000")]
    address: String,
    #[clap(default_value = "127.0.0.1")]
    metrics_host: String,
    #[clap(default_value_t = 9184)]
    metrics_port: usize,
    #[clap(long = "tls-cert", default_value = "")]
    tls_cert: String,
    #[clap(long = "tls-key", default_value = "")]
    tls_key: String,
    #[clap(long = "app-profile-id")]
    app_profile_id: Option<String>,
    #[clap(long = "checkpoint-bucket")]
    checkpoint_bucket: Option<String>,
}

async fn health_check() -> &'static str {
    "OK"
}

#[tokio::main]
async fn main() -> Result<()> {
    let _guard = TelemetryConfig::new().with_env().init();
    let app = App::parse();
    std::env::set_var("GOOGLE_APPLICATION_CREDENTIALS", app.credentials.clone());
    let server_version = Some(ServerVersion::new("sui-kv-rpc", VERSION));
    let registry_service = mysten_metrics::start_prometheus_server(
        format!("{}:{}", app.metrics_host, app.metrics_port).parse()?,
    );
    let registry: Registry = registry_service.default_registry();
    mysten_metrics::init_metrics(&registry);
    let server = KvRpcServer::new(
        app.instance_id,
        app.app_profile_id,
        app.checkpoint_bucket,
        server_version,
        &registry,
    )
    .await?;
    let addr = app.address.parse()?;
    let mut builder = Server::builder();
    if !app.tls_cert.is_empty() && !app.tls_key.is_empty() {
        let identity =
            Identity::from_pem(std::fs::read(app.tls_cert)?, std::fs::read(app.tls_key)?);
        let tls_config = ServerTlsConfig::new().identity(identity);
        builder = builder.tls_config(tls_config)?;
    }
    let reflection_v1 = tonic_reflection::server::Builder::configure()
        .register_encoded_file_descriptor_set(
            sui_rpc_api::proto::google::protobuf::FILE_DESCRIPTOR_SET,
        )
        .register_encoded_file_descriptor_set(sui_rpc_api::proto::google::rpc::FILE_DESCRIPTOR_SET)
        .register_encoded_file_descriptor_set(sui_rpc::proto::sui::rpc::v2::FILE_DESCRIPTOR_SET)
        .register_encoded_file_descriptor_set(
            sui_rpc::proto::sui::rpc::v2beta2::FILE_DESCRIPTOR_SET,
        )
        .build_v1()?;
    let reflection_v1alpha = tonic_reflection::server::Builder::configure()
        .register_encoded_file_descriptor_set(
            sui_rpc_api::proto::google::protobuf::FILE_DESCRIPTOR_SET,
        )
        .register_encoded_file_descriptor_set(sui_rpc_api::proto::google::rpc::FILE_DESCRIPTOR_SET)
        .register_encoded_file_descriptor_set(sui_rpc::proto::sui::rpc::v2::FILE_DESCRIPTOR_SET)
        .register_encoded_file_descriptor_set(
            sui_rpc::proto::sui::rpc::v2beta2::FILE_DESCRIPTOR_SET,
        )
        .build_v1alpha()?;
    tokio::spawn(async {
        let web_server = Router::new().route("/health", get(health_check));
        let listener = tokio::net::TcpListener::bind("0.0.0.0:8081")
            .await
            .expect("can't bind to the healthcheck port");
        axum::serve(listener, web_server.into_make_service())
            .await
            .expect("healh check service failed");
    });
    builder
        .layer(CallbackLayer::new(RpcMetricsMakeCallbackHandler::new(
            Arc::new(RpcMetrics::new(&registry)),
        )))
        .add_service(LedgerServiceServer::new(server.clone()))
        .add_service(
            sui_rpc::proto::sui::rpc::v2::ledger_service_server::LedgerServiceServer::new(server),
        )
        .add_service(reflection_v1)
        .add_service(reflection_v1alpha)
        .serve(addr)
        .await?;
    Ok(())
}

Deployment strategies and trade-offs

You don’t need to index everything to provide a reliable and performant GraphQL RPC service. In fact, many developers might need only the latest object and transaction data plus a few weeks to months of history. You can reduce operational overhead and improve query performance by:

  • Configuring a clear retention window (such as 30–90 days) in Postgres.
  • Using the Archival Service to handle deep historical queries, rather than retaining all versions in Postgres.

When designing your deployment, consider the trade-offs between cost, reliability, and feature completeness:

  • Postgres-only with short-retention results in lower storage cost and faster performance, but limited historical coverage.
  • Postgres-only with high retention results in broader data coverage, but relatively higher storage cost and slower performance at scale.
  • Postgres with short-retention + Archival Service results in optimization for cost and completeness, ideal for production deployments.

To improve performance and reliability, also consider these operational best practices:

  • Try to co-locate your database, indexing pipelines, GraphQL RPC service, and archival service in the same region as your users to minimize latency.
  • Use replication and staged deployments to ensure SLA during upgrades or failures.
  • Consider offering different tiers of service to meet different developer needs. For example:
    • A basic tier that serves recent data (30 days, for example) via GraphQL RPC or gRPC.
    • A premium tier with full GraphQL / gRPC + Archival Service access, suited to apps that need historical lookups.
    • Optionally offer region-specific instances or throughput-based pricing to support diverse client footprints.

GraphQL for Sui RPC (Beta)

Use GraphQL to make Sui RPC calls. This feature is currently in Beta.

Custom Indexing Framework

The sui-indexer-alt-framework is a powerful Rust framework for building high-performance, custom blockchain indexers on Sui. It provides customizable, production-ready components for data ingestion, processing, and storage.

Indexer Pipeline Architecture

The sui-indexer-alt-framework provides two distinct pipeline architectures. Understand the differences between the sequential and concurrent pipelines that the sui-indexer-alt-framework provides to decide which best suits your project needs.

Archival Store and Service (Beta)

Overview of the Archival Store and Service to access historical Sui network data.

Build Your First Custom Indexer

Establishing a custom indexer helps improve latency, allows pruning the data of your Sui full node, and provides efficient assemblage of checkpoint data.

Sui Indexer Alt

The sui-indexer-alt crate in the Sui repo.

Move Registry

The indexer that the Move Registry (MVR) implements.

DeepBook Indexer

The indexer that DeepBook implements.

GraphQL Beta schema

Schema documentation for GraphQL Beta