Gossip-Based High Availability for Solana Validators
It's 2:30 AM and the unmistakable sound of your PagerDuty alarm shakes you and the whole house from a deep sleep. You rub your eyes while feeling for your phone and pick it up to see it's informing you your Solana validator is down. You stumble your way to your computer to begin a groggy attempt at cobbling together muscle memory commands at the keyboard to get things fixed, knowing that with every ~400ms that goes by you're missing out on rewards. GLHF!
If this or something similar sounds familiar, you've likely had experience running systems on networks that demand high uptime. Those often call for high availability (HA) solutions that handle unexpected outages automatically, quickly and gracefully. Solana is no exception: a primary validator outage can cause (at best) minutes of missed rewards and potential network degradation, an age in Solana time. While failover and HA tools exist, they either lack automation or require extra dependencies and infrastructure that can introduce single points of failure and operational complexity.
To address this challenge, we developed solana-validator-ha, a gossip-based high availability manager that enables validators to automatically detect and monitor peers, and coordinate failover decisions using nothing but the Solana network.
Below is an example of a real automatic failover on a two-node cluster where a backup validator detects a primary's outage and automatically takes over as the active (voting) leader, while the primary ensures it comes back as passive (non-voting):
Backup validator (takes over as active after primary outage)

Primary validator (suffers outage, becomes passive)

The Solution
After initial consideration we arrived at the following desirable properties for an HA tool:
- Standalone: we have experience running open source and paid tools for HA and have no appetite to add operational complexity and single points of failure for this use case. We wanted a standalone program that requires nothing but the Solana network.
- Configurable: the tool should be simple to run and easy for the operator to configure, since we all do things slightly differently and run under different conditions.
- Safe(ish): as much as possible, build in guardrails to prevent the tool from making a bad situation worse, or making a good one bad (though DYOR before letting it loose on your nodes).
While operators typically run clusters with two validators (one primary and one backup), solana-validator-ha supports clusters of two or more validators. At a high level it runs the following steps on a synchronized loop across all nodes in the cluster:
1. Peer detection: Query gossip for the IPs declared in its configuration file and store a snapshot of each peer's role (active or passive, based on its identity) and state (available, missing or unreachable). Gossip entries by themselves aren't reliable for HA purposes, so the program only stores a node in a gossip state snapshot if it can confirm the node can be reached on its advertised port.
2. Leaderless detection: If an active peer is discovered, it is crowned the leader and no failover is required. The absence of a leader for a configurable amount of time causes the program to trigger a configurable command to ensure its own node stays passive, while existing healthy passive nodes begin evaluating whether they should take over as active.
3. Self-healing: Passive nodes wait for a configurable number of confirmations that the leader is lost, then wait a pre-determined amount of time before choosing to self-promote to active using a configurable command. The promotion delay prevents multiple nodes from racing to become active at the same time.
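To make the loop concrete, here is a minimal Python sketch of the three steps above. It is not the project's implementation: the names `Peer`, `snapshot`, `LeaderlessTracker` and `takeover_delay` are hypothetical, the reachability check is passed in as a `probe` callable rather than implemented, and it assumes the cluster counts as leaderless once the number of consecutive leaderless polls reaches the threshold.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Peer:
    name: str        # vanity name, used for logging only
    ip: str          # static IP declared in the configuration file
    is_active: bool  # True if gossip shows this IP using the shared active identity


def snapshot(peers: Iterable[Peer], probe: Callable[[Peer], bool]) -> List[Peer]:
    """Step 1: keep only peers that the probe confirms are reachable."""
    return [p for p in peers if probe(p)]


class LeaderlessTracker:
    """Step 2: count consecutive polls in which no active peer was seen."""

    def __init__(self, samples_threshold: int) -> None:
        self.samples_threshold = samples_threshold
        self.leaderless_samples = 0

    def observe(self, reachable_peers: Iterable[Peer]) -> bool:
        """Record one poll; return True once the cluster counts as leaderless."""
        if any(p.is_active for p in reachable_peers):
            self.leaderless_samples = 0  # a leader exists, reset the count
        else:
            self.leaderless_samples += 1
        # Assumption: the threshold is reached, not exceeded.
        return self.leaderless_samples >= self.samples_threshold


def takeover_delay(jitter_seconds: float, rng: random.Random) -> float:
    """Step 3: random wait in [0, jitter_seconds] before self-promoting."""
    return rng.uniform(0.0, jitter_seconds)
```

In this sketch a passive node would call `observe(snapshot(peers, probe))` on every poll, and only after it returns `True` would it sleep for `takeover_delay(...)`, re-check that no peer has become active in the meantime, and then run its promotion command.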
More details on the inner workings are available at the repository. We have found the default settings to be reliable on the nodes where this program has been deployed, and it has already saved our bacon.
Implementation
Though it's possible to run from the terminal, operators may find it best to run solana-validator-ha as a systemd service that runs the command:
solana-validator-ha --config config.yaml run
With an example configuration file (see here for more details on settings):
# solana-validator-ha config
validator:
  name: primary-validator  # vanity name for logging purposes to identify
                           # this node, its public IP will be auto-detected
                           # and looked up in gossip
  rpc_url: "http://localhost:8899"
  identities:
    active: "/path/to/active-identity.json"    # shared across nodes in the
                                               # validator cluster
    passive: "/path/to/passive-identity.json"  # unique to this node
cluster:
  name: mainnet-beta
failover:
  poll_interval_duration: 5s
  leaderless_samples_threshold: 3  # allow up to this number of leaderless
                                   # samples (i.e. 15s = poll_interval_duration
                                   # * leaderless_samples_threshold) before
                                   # considering the validator cluster
                                   # leaderless
  takeover_jitter_duration: 3s     # when attempting to become active, add a
                                   # random wait between 0 and this duration
                                   # before confirming no other peer has
                                   # become active, to avoid race conditions
  peers:
    # vanity names for static IPs identifying nodes in this validator cluster
    # in gossip, these names are irrelevant and used for logging only
    backup-validator-1:
      ip: 192.168.1.11
    backup-validator-2:
      ip: 192.168.1.12
  active:
    command: "set-identity-with-rollback.sh"  # BYO command to become active
    args: [
      "--active-identity-file", "{{ .ActiveIdentityKeypairFile }}",    # template string
                                                                       # referencing value from
                                                                       # validator.identities.active
      "--passive-identity-file", "{{ .PassiveIdentityKeypairFile }}",  # template string
                                                                       # referencing value from
                                                                       # validator.identities.passive
    ]
    hooks:
      pre:
        - name: notify-slack
          command: "/path/to/notify-script.sh"
          args: ["--message", "Promoting to active"]
  passive:
    command: "seppukku.sh"  # BYO command that makes *damn sure*
                            # this node is passive when it is called
    args: [
      "--passive-identity-file", "{{ .PassiveIdentityKeypairFile }}",
    ]
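Two details of this configuration are easy to check by hand: how the template strings in `args` expand, and how long the cluster can be leaderless before failover starts. The Python sketch below illustrates both under stated assumptions. The helpers `render_args`, `parse_seconds` and `leaderless_window_seconds` are hypothetical, the placeholder handling only mimics simple `{{ .Name }}` field substitution (it is not the tool's own template engine), and durations are assumed to be whole seconds written like `5s`.

```python
import re
from typing import Dict, List

# Matches placeholders of the form "{{ .SomeName }}".
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_args(args: List[str], values: Dict[str, str]) -> List[str]:
    """Replace each "{{ .Name }}" placeholder with values["Name"]."""
    def substitute(match: "re.Match[str]") -> str:
        return values[match.group(1)]  # KeyError on unknown placeholder names
    return [PLACEHOLDER.sub(substitute, arg) for arg in args]


def parse_seconds(duration: str) -> float:
    """Parse durations written as "<number>s"; other units are not handled."""
    if not duration.endswith("s"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(duration[:-1])


def leaderless_window_seconds(poll_interval: str, samples_threshold: int) -> float:
    """Time without a leader before the cluster is considered leaderless."""
    return parse_seconds(poll_interval) * samples_threshold


values = {
    "ActiveIdentityKeypairFile": "/path/to/active-identity.json",
    "PassiveIdentityKeypairFile": "/path/to/passive-identity.json",
}
active_args = render_args(
    [
        "--active-identity-file", "{{ .ActiveIdentityKeypairFile }}",
        "--passive-identity-file", "{{ .PassiveIdentityKeypairFile }}",
    ],
    values,
)
window = leaderless_window_seconds("5s", 3)  # 15.0 seconds
```

With the example settings, a passive node would therefore wait 15 seconds of leaderless polls, plus a random jitter of up to 3 seconds, before running its active command with the rendered identity file paths.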
Summary
By leveraging Solana's existing gossip protocol for peer discovery and state monitoring, solana-validator-ha provides a simple, robust, and dependency-free solution for high availability validator operations. We have found it to be a good addition to our infrastructure, contributing to high levels of uptime and operational resilience through automation.
solana-validator-ha is open source. We welcome contributions and feedback from the community.
Disclaimer
- No Investment Advice or Offer: The information provided here is for general informational purposes only. It does not constitute an offer to sell or a solicitation of an offer to buy any securities, futures, options, or other financial instruments. This information is not investment, legal, or tax advice and should not be considered an individualized recommendation or personalized advice. Any decisions based on this information are your sole responsibility.
- Opinions, Accuracy, and Liability: Views expressed are as of the date indicated, are subject to change without notice, and may not reflect the views of SOL Strategies. Certain statements may be based on SOL Strategies' views, estimates, or opinions, which may not be accurate or ultimately realized. Information obtained from third-party sources has not been independently verified, and SOL Strategies does not assume responsibility for its accuracy. Neither SOL Strategies nor any of its affiliates, shareholders, partners, members, directors, officers, management, employees, or representatives makes any representation or warranty, express or implied, as to the accuracy or completeness of this information. SOL Strategies expressly disclaims any and all liability relating to or resulting from the use of this information.
- Company Disclosures & Conflicts: SOL Strategies and its affiliates may own investments or have other incentives in some of the digital assets, protocols, and securities discussed herein. SOL Strategies does not provide services as a money transmitter, custodian, bank, securities broker-dealer, investment adviser, or commodity trading adviser and is not registered as such with the U.S. Securities and Exchange Commission, the U.S. Commodity Futures Trading Commission, or other regulatory agencies.
- Important Risk Warnings: Past performance is no guarantee of future results, and examples are for illustrative purposes only. All investments carry risk. Digital asset investments are high-risk and subject to, among other things, price volatility, regulatory changes, and cyber-attacks. Cryptocurrencies are not legal tender, not backed by any government, can become illiquid, and may result in the total loss of principal. On-chain transactions are irreversible. These investments are only for investors with a high-risk tolerance.
- Forward-Looking Statements: The information provided herein may contain "forward-looking information" within the meaning of applicable securities laws. Forward-looking information is based on certain factors and assumptions believed to be reasonable at the time such statements are made and is subject to known and unknown risks, uncertainties, and other factors that may cause the actual results, level of activity, performance, or achievements to be materially different from those expressed or implied by such forward-looking information. There can be no assurance that such forward-looking information will prove to be accurate, as actual results and future events could differ materially from those anticipated in such information. Accordingly, readers should not place undue reliance on forward-looking information. Readers are cautioned against attributing undue certainty to forward-looking statements.








