
Sui Validator Alert Reference

When running a Sui Validator node or Full node, you may want to configure alerting based on some or all of the following metrics.

Alert reference

The following sections list the alert settings. Customize the details as follows:

  • Replace $network with your actual network label (for example, mainnet or testnet).
  • Thresholds assume about 10,000 total stake units, so 8000 corresponds to 80% of stake. Adjust them for your own validator set size.
  • Labels like host and container are stripped to keep the alerts infrastructure-agnostic.

High-priority chain health alerts (validator-specific)

These alerts should receive the most immediate attention from you or your team.

Safe mode during reconfiguration

Key       Value
Name      Safe Mode during Reconfiguration
Summary   Epoch failed to advance; chain entered safe mode
Duration  5m

is_safe_mode{network="$network"} > 0.5 or absent(is_safe_mode{network="$network"})
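
The source does not name an alerting system, but the queries read as PromQL. As a minimal sketch, assuming a Prometheus rule file, the Name, Summary, Duration, and query above could map onto a rule like the following. The group name, the CamelCase alert name, and the severity label are hypothetical, and $network is replaced by hand with mainnet because a rule expression does not expand that placeholder.

```yaml
groups:
  - name: sui-validator-high-priority      # hypothetical group name
    rules:
      - alert: SafeModeDuringReconfiguration   # hypothetical CamelCase form of the Name
        # Fires when the metric reports safe mode, or when the metric is missing entirely.
        expr: |
          is_safe_mode{network="mainnet"} > 0.5 or absent(is_safe_mode{network="mainnet"})
        for: 5m                                 # the Duration from the table
        labels:
          severity: critical                    # hypothetical routing label
        annotations:
          summary: "Epoch failed to advance; chain entered safe mode"
```

If you use Prometheus, a file like this can be validated with promtool check rules before loading it.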

Consensus proposals failure

Key       Value
Name      Consensus Proposals Failure
Summary   Less than 80% of stake is proposing consensus blocks
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  sum by (host) (rate(consensus_proposed_blocks{network="$network"}[5m])) > 0
) < 8000
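
The query above keeps the voting rights only of hosts whose proposal rate is above zero, sums them, and compares the total against a fixed 8000. One possible way to adjust for a different validator set size is to compare against a fraction of the total reported voting rights instead. This is a sketch, not the documented alert. It assumes a Prometheus rule file and assumes that the current_voting_right series visible to your Prometheus add up to the total stake you care about. The group and alert names are hypothetical.

```yaml
groups:
  - name: sui-validator-chain-health       # hypothetical group name
    rules:
      - alert: ConsensusProposalsFailure       # hypothetical CamelCase form of the Name
        # Left side: stake of hosts that proposed at least one block in the window.
        # Right side: 80% of all reported stake (assumption, replaces the fixed 8000).
        expr: |
          sum(
            sum by (host) (current_voting_right{network="mainnet"})
            and
            sum by (host) (rate(consensus_proposed_blocks{network="mainnet"}[5m])) > 0
          ) < 0.8 * scalar(sum(current_voting_right{network="mainnet"}))
        for: 5m
        annotations:
          summary: "Less than 80% of stake is proposing consensus blocks"
```

Keeping the fixed threshold from the original query is simpler when the total stake is known and stable. The relative form trades that simplicity for not having to recompute the number by hand.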

Checkpoint execution rate is low

Key       Value
Name      Checkpoint Execution Rate Is Low
Summary   Less than 80% of stake is executing checkpoints quickly enough
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  sum by (host) (rate(last_executed_checkpoint{network="$network"}[5m])) > 2
) < 8000

Certificate execution latencies are high

Key       Value
Name      Certificate execution latencies are high
Summary   Less than 80% of stake is handling shared-object tx certs with low enough latency
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  histogram_quantile(0.95, sum by (le, host) (
    rate(validator_service_handle_certificate_consensus_latency_bucket{network="$network"}[5m])
  )) < 3
) < 8000

Randomness DKG failure

Key       Value
Name      RandomnessDkgFailure
Summary   Random beacon DKG has failed on one or more hosts
Duration  5m

epoch_random_beacon_dkg_failed{network="$network"} > 0 or absent(is_safe_mode{network="$network"})

Validators not upgraded

Key       Value
Name      Mysten validators are not upgraded
Summary   Validators are behind on protocol version
Duration  1h

min(sui_configured_max_protocol_version{network="$network", host=~"Mysten-.*"})
  < quantile(0.34, sui_configured_max_protocol_version{network="$network"})

⚠️ Non-urgent and warning alerts

All alerts are important, but the following alerts and warnings can be addressed within a normal node maintenance workflow.

Consensus sequencing p99 latency high

Key       Value
Name      Consensus sequencing p99 latencies are high
Summary   Less than 80% of stake is sequencing tx certs with acceptable latency
Duration  1m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  histogram_quantile(0.95, sum by (le, host) (
    rate(sequencing_certificate_latency_bucket{network="$network", position="0", tx_type=~"shared_certificate|owned_certificate|soft_bundle"}[2m])
  )) < 2
) < 5000

System invariant violations

Key       Value
Name      System Invariant Violations
Summary   A system invariant violation was reported
Duration  1m

max(system_invariant_violations{network="$network"}) > 0
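
Because the alerts in this section can wait for normal maintenance, you might give them a lower severity than the chain health alerts. The sketch below assumes a Prometheus rule file. The group name and the severity label value are hypothetical conventions, not something the source prescribes, and $network is replaced by hand with mainnet.

```yaml
groups:
  - name: sui-validator-warnings           # hypothetical group name
    rules:
      - alert: SystemInvariantViolations       # hypothetical CamelCase form of the Name
        # Fires when any host reports a non-zero invariant violation value.
        expr: |
          max(system_invariant_violations{network="mainnet"}) > 0
        for: 1m                                 # the Duration from the table
        labels:
          severity: warning                     # hypothetical label for lower-urgency routing
        annotations:
          summary: "A system invariant violation was reported"
```

Which receiver a warning label routes to depends on your own alert routing setup, which this reference does not cover.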