
Sui Validator Alert Reference

When running a Sui Validator node or Full node, you might want to configure alerting based on some or all of the following metrics.

Alert reference

The following sections list the alert settings. Customize the details as follows:

  • Replace $network with your actual network label (for example, mainnet or testnet).
  • Thresholds assume about 10,000 stake units; adjust them for your own validator set size.
  • Labels such as host and container are stripped to keep the alerts infrastructure-agnostic.
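
As a rough illustration of these customizations, the sketch below wraps the consensus proposals expression from this page in a Prometheus-style alerting rule. It assumes you load rules from a YAML rule file; the group name, the alert identifier, the testnet label value, and the 5,000-unit stake total are hypothetical values chosen only to show the substitution.

```yaml
groups:
  - name: sui-chain-health              # hypothetical group name
    rules:
      - alert: ConsensusProposalsFailure   # hypothetical identifier
        # $network has been replaced with "testnet" (assumed label value).
        # 4000 is 80% of a hypothetical 5,000 stake units; the reference
        # expressions use 8000 because they assume about 10,000 units.
        expr: |
          sum(
            sum by (host) (current_voting_right{network="testnet"})
            and
            sum by (host) (rate(consensus_proposed_blocks{network="testnet"}[5m])) > 0
          ) < 4000
        for: 5m
```

Keeping the threshold at the same fraction of total stake (here 80%) preserves the intent of the reference expression when the stake total differs.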

High-priority chain health alerts (validator-specific)

These alerts should receive the most immediate attention from you or your team.

Safe mode during reconfiguration

Key       Value
Name      Safe Mode during Reconfiguration
Summary   Epoch failed to advance; chain entered safe mode
Duration  5m

is_safe_mode{network="$network"} > 0.5 or absent(is_safe_mode{network="$network"})
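
One way to read each table is as the fields of a Prometheus alerting rule: Name maps to the alert name, Summary to an annotation, Duration to the `for` clause, and the expression to `expr`. The following is a minimal sketch under that assumption; the group name, the alert identifier, and the mainnet label value are placeholders, not values taken from any official Sui configuration.

```yaml
groups:
  - name: sui-validator-high-priority   # hypothetical group name
    rules:
      # Hypothetical identifier for "Safe Mode during Reconfiguration".
      - alert: SafeModeDuringReconfiguration
        # Matches when the metric reports safe mode, or when the metric is
        # missing entirely: absent() returns a value only if no series match.
        expr: |
          is_safe_mode{network="mainnet"} > 0.5
          or
          absent(is_safe_mode{network="mainnet"})
        # Duration: the condition must hold this long before the alert fires.
        for: 5m
        annotations:
          summary: "Epoch failed to advance; chain entered safe mode"
```

The same mapping should carry over to the other alerts on this page, with only the expression, duration, and summary changing.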

Consensus proposals failure

Key       Value
Name      Consensus Proposals Failure
Summary   Less than 80% of stake is proposing consensus blocks
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  sum by (host) (rate(consensus_proposed_blocks{network="$network"}[5m])) > 0
) < 8000

Checkpoint execution rate is low

Key       Value
Name      Checkpoint Execution Rate Is Low
Summary   Less than 80% of stake is executing checkpoints quickly enough
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  sum by (host) (rate(last_executed_checkpoint{network="$network"}[5m])) > 2
) < 8000

Certificate execution latencies are high

Key       Value
Name      Certificate execution latencies are high
Summary   Less than 80% of stake is handling shared-object tx certs with low enough latency
Duration  5m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  histogram_quantile(0.95, sum by (le, host) (
    rate(validator_service_handle_certificate_consensus_latency_bucket{network="$network"}[5m])
  )) < 3
) < 8000

Randomness DKG failure

Key       Value
Name      RandomnessDkgFailure
Summary   Random beacon DKG has failed on one or more hosts
Duration  5m

epoch_random_beacon_dkg_failed{network="$network"} > 0 or absent(is_safe_mode{network="$network"})

Validators not upgraded

Key       Value
Name      Mysten validators are not upgraded
Summary   Validators are behind on protocol version
Duration  1h

min(sui_configured_max_protocol_version{network="$network", host=~"Mysten-.*"})
  < quantile(0.34, sui_configured_max_protocol_version{network="$network"})

Non-urgent and warning alerts

All alerts are important, but the following alerts and warnings can be addressed within a normal node maintenance workflow.

Consensus sequencing p99 latency high

Key       Value
Name      Consensus sequencing p99 latencies are high
Summary   Less than 80% of stake is sequencing tx certs with acceptable latency
Duration  1m

sum(
  sum by (host) (current_voting_right{network="$network"})
  and
  histogram_quantile(0.95, sum by (le, host) (
    rate(sequencing_certificate_latency_bucket{network="$network", position="0", tx_type=~"shared_certificate|owned_certificate|soft_bundle"}[2m])
  )) < 2
) < 5000

System invariant violations

Key       Value
Name      System Invariant Violations
Summary   The system reported an invariant violation
Duration  1m

max(system_invariant_violations{network="$network"}) > 0
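
To keep the two tiers on this page apart in practice, one option is to tag each rule with a severity label and route on it in your alert manager. The sketch below assumes Prometheus-style rule files; the `severity` label and its `critical` and `warning` values are a common convention rather than anything this reference prescribes, and the group names, the second alert identifier, and the mainnet value are hypothetical.

```yaml
groups:
  - name: sui-validator-high-priority   # hypothetical group name
    rules:
      - alert: RandomnessDkgFailure
        # Expression copied from this page, with $network set to "mainnet".
        expr: |
          epoch_random_beacon_dkg_failed{network="mainnet"} > 0
          or
          absent(is_safe_mode{network="mainnet"})
        for: 5m
        labels:
          severity: critical             # assumed convention: page someone
        annotations:
          summary: "Random beacon DKG has failed on one or more hosts"

  - name: sui-validator-warnings         # hypothetical group name
    rules:
      - alert: SystemInvariantViolations # hypothetical identifier
        expr: max(system_invariant_violations{network="mainnet"}) > 0
        for: 1m
        labels:
          severity: warning              # assumed convention: handle in normal maintenance
        annotations:
          summary: "The system reported an invariant violation"
```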