Evaluating a Cosmos Validator: Securing the Private Key
As we conclude Game of Stakes and march toward the Cosmos Mainnet launch, Delegators face the crucial task of choosing which Validators to delegate to. There are multiple dimensions to consider when delegating, and in this series of posts we’ll outline the key areas Delegators should learn about.
In this post we’ll dive into security and Private Key management, and discuss the tradeoffs of different approaches. As a Delegator, you should have a good understanding of your Validator’s approach so you can better judge your risk of losing Atoms to slashing.
Securing a Validator’s Private Key
A critical part of being a Validator is proposing and signing blocks. This is done in a cryptographically secure way by using a unique “private key” that each Validator controls.
The private key is how a Validator proves who they are (identity), and therefore it must be protected with the utmost security. If the private key is compromised or lost, the Validator must be abandoned.
The Cosmos/Tendermint private key is stored in a small text file (priv_validator_key.json) that contains long strings of characters in a format called JSON.
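The exact layout varies between software versions, so the following is only a sketch. It assumes the field names used by Tendermint v0.3x (an address, a public key and a private key, with each key tagged by its type) and uses made-up placeholder values rather than a real key. The small helper `read_private_key` (a hypothetical name) shows how little stands between anyone who can read the file and the signing key itself.

```python
import json
from typing import Tuple

# A placeholder document shaped like a Tendermint priv_validator_key.json.
# ASSUMPTION: field names follow the Tendermint v0.3x layout; every value
# below is a made-up placeholder, not a real key.
EXAMPLE_KEY_FILE = """
{
  "address": "PLACEHOLDER_ADDRESS_HEX",
  "pub_key": {
    "type": "tendermint/PubKeyEd25519",
    "value": "PLACEHOLDER_PUBLIC_KEY_BASE64"
  },
  "priv_key": {
    "type": "tendermint/PrivKeyEd25519",
    "value": "PLACEHOLDER_PRIVATE_KEY_BASE64"
  }
}
"""


def read_private_key(text: str) -> Tuple[str, str]:
    """Return (key type, key value) from a priv_validator_key.json-style document.

    There is no password or decryption step: being able to read the file
    is enough to obtain the key.
    """
    doc = json.loads(text)
    priv = doc["priv_key"]
    return priv["type"], priv["value"]


if __name__ == "__main__":
    key_type, key_value = read_private_key(EXAMPLE_KEY_FILE)
    print(key_type, key_value)
```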
The challenge for a Validator is: how to balance making their private key available to quickly sign blocks to keep the network humming along while at the same time securing and limiting access to it. Said differently: how to reliably access a file while severely restricting access to it?
There are several approaches to securing a private key, each with their own benefits and tradeoffs. As a Delegator, you must understand the approach your Validator is taking, as the penalties for misusing or losing a private key are steep.
Penalties for Misusing or Losing a Private Key
As we wrote in Cosmos Delegator and Validator Economics, the biggest impact on Delegator returns is limiting slashing risk. One of the ways a Validator can get slashed is if they “double sign”: their private key is used to sign two conflicting messages for the same block, such as votes for two different blocks at the same height and round.
For the purposes of this discussion, we’ll focus on two ways that double signing can happen: accidents and a compromised private key.
In the accident case, the Tendermint/Cosmos software may have a bug or an unexpected failure mode (e.g. a crash). When the software restarts, it may sign again at a height it had already signed, unbeknownst to the Validator, and cause a double sign. These sorts of situations may be out of the Validator’s control, or they could be the result of misconfigured software or other operator error.
Another accident scenario is when a Validator runs two servers with the Cosmos/Tendermint software simultaneously and both try to sign the same block. Perhaps to limit downtime, a Validator keeps a live, active server signing blocks and a backup server ready to take over in case the active server fails. If the backup server starts signing blocks while the active server is still signing, the Validator will double sign.
If the private key is compromised and another party gets possession of it, they can very simply force a Validator to double sign by running a second instance of the Cosmos/Tendermint software with that private key. This kind of attack is far more nefarious and is usually targeted at a specific Validator.
In both cases (accident and compromise), the penalty for a Validator double signing is getting slashed and “tombstone jailed”. When a Validator gets slashed, all of its Delegators lose a percentage of their tokens. When a Validator is tombstone jailed, it can never use its current private key on the network again, so it is effectively dead, and all of its Delegators must redelegate to another Validator.
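To make “double sign” concrete, the sketch below models votes as plain records and flags any two votes from the same key at the same height, round and step that point at different blocks. Everything here is hypothetical and simplified: `Vote` and `find_double_signs` are illustrative names, there is no real cryptography, and the network’s actual evidence handling is more involved. The usage at the bottom mimics the active/backup scenario, where the two servers share one key and end up voting for different blocks at the same height.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Vote:
    """Hypothetical, simplified vote record (no real cryptographic signature)."""
    validator: str   # identifies the signing key
    height: int      # block height
    round: int       # consensus round within that height
    step: str        # e.g. "prevote" or "precommit"
    block_hash: str  # the block this vote is for


def find_double_signs(votes: List[Vote]) -> List[Tuple[Vote, Vote]]:
    """Return pairs of votes by the same key at the same height, round and
    step that point at different blocks."""
    first_seen: Dict[Tuple[str, int, int, str], Vote] = {}
    conflicts: List[Tuple[Vote, Vote]] = []
    for vote in votes:
        slot = (vote.validator, vote.height, vote.round, vote.step)
        earlier = first_seen.get(slot)
        if earlier is None:
            first_seen[slot] = vote
        elif earlier.block_hash != vote.block_hash:
            conflicts.append((earlier, vote))
    return conflicts


if __name__ == "__main__":
    # Active and backup servers share one key and both vote at height 100.
    active = Vote("validator-1", 100, 0, "precommit", "BLOCK_A")
    backup = Vote("validator-1", 100, 0, "precommit", "BLOCK_B")
    print(len(find_double_signs([active, backup])))
```

Two identical votes are not flagged here; only votes for different blocks in the same slot count as a conflict.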
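To make the cost of a slash concrete, here is the arithmetic as a small Python sketch. The actual slash fraction is a chain parameter, so the 5% and the 1,000-Atom delegation used below are hypothetical figures chosen only for illustration. `Decimal` is used to avoid floating-point rounding on token amounts.

```python
from decimal import Decimal


def stake_after_slash(delegated: Decimal, slash_fraction: Decimal) -> Decimal:
    """Tokens a Delegator keeps after one slashing event.

    slash_fraction is a chain parameter; the value used below is hypothetical.
    """
    if not Decimal(0) <= slash_fraction <= Decimal(1):
        raise ValueError("slash_fraction must be between 0 and 1")
    return delegated * (Decimal(1) - slash_fraction)


if __name__ == "__main__":
    delegated = Decimal("1000")        # hypothetical Atoms delegated
    slash_fraction = Decimal("0.05")   # hypothetical 5% double-sign penalty
    kept = stake_after_slash(delegated, slash_fraction)
    print(kept, delegated - kept)      # 950.00 50.00
```

The slash is only part of the cost: after tombstoning, the remaining stake also has to be redelegated to a new Validator.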
Options for Protecting a Private Key
Given the necessity to sign blocks (just once!) while also securing access to the private key file, it’s important to understand a Validator’s strategy for private key management.
Here are the main options currently available:
- Plaintext file: you can store the priv_validator_key.json file on the hard drive of the same server that runs the Cosmos/Tendermint Validator software. The Validator software requires the file to be in “plain text”, i.e. unencrypted. This means anyone with access to the Validator server’s operating system can easily read the private key.
- Remote signer and open source Key Management System (KMS): the Cosmos/Tendermint software supports a “remote signer”, a separate server that manages the private key file. Open source KMS software runs on the remote signer server and communicates with the main Validator server. This provides some defense in depth: since the remote signer only needs to communicate with the Validator, it can be secured behind additional layers of access control. Therefore, if your Validator server is compromised, you will not necessarily lose the private key held on the remote signer. Technically, an operator could keep the private key in plaintext on the remote signer server and use a KMS to talk to the main Validator. However, the real benefit comes from pairing the KMS with a hardware security module (HSM), as described below.
- Enterprise HSMs like the YubiHSM2: with a KMS remote signer running on a separate server from the main Validator, you can move the private key off the server’s hard drive and onto a dedicated hardware device that plugs in via a USB port and manages access to the key. The YubiHSM2 is a popular choice for this, and it means that even if the KMS server is compromised, the private key itself is not lost. Much like a hardware wallet such as a Ledger Nano-S or a Trezor, the YubiHSM2 stores private keys in secure hardware and uses them to sign without exposing them to the host.
- Consumer hardware wallet: similar to the YubiHSM2, a Ledger Nano-S can store a private key on a dedicated hardware device. Unlike the YubiHSM2, the Ledger can also run custom applications on the device, which means extra logic can be added, for example to prevent double signing when the private key is used. The downside is that the Ledger targets a consumer use case (plug into a laptop, push buttons on the device, unplug), so it is somewhat challenging to run in an always-on, remotely managed data center. For example, when the server restarts, someone has to physically go to the Ledger and push buttons to unlock it, which is definitely not ideal for “Lights Out Management” best practice. While the Ledger Nano-S appears to be quite robust, there are questions about its long-term reliability, since it is not engineered for always-on operation.
- Proprietary/licensed remote signer: in addition to the open source KMS, several Validators and vendors have developed their own proprietary remote signing solutions. For example, Certus One developed JANUS and has licensed it to other Validators. Chorus One has also developed a proprietary signing solution, and there are certainly others. Because these solutions are closed source, they are hard to evaluate other than by observing them in operation or by judging the skills and reputations of the vendors.
One larger, network-wide consideration is having a diverse set of signing solutions. If all Validators ran the same signing software and it was compromised, all Validators would be at risk. With a variety of approaches, a single disastrous bug would, in theory, not decimate the entire network.
Choices and Tradeoffs
Given these options, what is the best approach for a Validator? It depends on server setup and budget.
If a Validator is running on cloud virtual servers, as many did in Game of Stakes, they will not be able to plug in a hardware device like the YubiHSM2 or the Ledger.
If a Validator is running on physical servers in a restricted-access data center, they could use the advanced application logic of the Ledger Nano-S, but someone would have to visit the data center in an emergency or whenever a server restarts.
The YubiHSM2 is the most “enterprise ready” device for hands-off management, but it lacks the Ledger’s ability to run custom application logic. That logic would need to be recreated elsewhere, such as in the KMS on the remote signer server, but this is not yet implemented in the open source solutions. Therefore, you could argue that the consumer-focused Ledger is currently “safer” at defending against double signs and slashing than the enterprise-focused YubiHSM2.
Double sign prevention is planned for the open source KMS, and in the meantime Validator teams may choose a hybrid approach: using the KMS while implementing their own double sign prevention.
Licensing a proprietary signing solution may provide advanced features, but since the software is closed source it’s hard to reason about its code quality and bugs. It could be better or worse than open source implementations in different ways. Licensees may get access to the source code, but depending on their software development experience, it may be difficult for them to reason about and test.
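Double sign prevention of this kind is commonly built around a persisted “high-water mark”: the signer records the last height, round and step it signed and refuses any request that does not move strictly past it. The Python sketch below illustrates the idea under stated assumptions. `HighWaterMarkSigner`, `DoubleSignRefused` and the JSON state file are hypothetical names, and the SHA-256 digest is a stand-in for a real HSM signature. It is not the KMS’s or any vendor’s implementation.

```python
import hashlib
import json
import os
import tempfile
from typing import Optional, Tuple

# Consensus steps in the order they occur within a round
# (ASSUMPTION: propose < prevote < precommit, as in Tendermint).
STEP_ORDER = {"propose": 1, "prevote": 2, "precommit": 3}

HRS = Tuple[int, int, int]  # (height, round, step)


class DoubleSignRefused(Exception):
    """Raised when a request would not move past the last signed position."""


class HighWaterMarkSigner:
    """Hypothetical signer that persists the last (height, round, step) it signed."""

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path
        self.last: Optional[HRS] = self._load()

    def _load(self) -> Optional[HRS]:
        if not os.path.exists(self.state_path):
            return None
        with open(self.state_path) as f:
            d = json.load(f)
        return (d["height"], d["round"], d["step"])

    def _save(self, hrs: HRS) -> None:
        # Write to a temp file, fsync, then atomically replace the old state,
        # so a crash never leaves a half-written high-water mark.
        tmp_path = self.state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"height": hrs[0], "round": hrs[1], "step": hrs[2]}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.state_path)

    def sign(self, height: int, round_: int, step: str, payload: bytes) -> bytes:
        hrs = (height, round_, STEP_ORDER[step])
        # Tuples compare lexicographically: height first, then round, then step.
        if self.last is not None and hrs <= self.last:
            raise DoubleSignRefused(f"{hrs} is not past last signed {self.last}")
        self._save(hrs)  # persist BEFORE signing, so a restart cannot re-sign
        self.last = hrs
        return self._raw_sign(payload)

    def _raw_sign(self, payload: bytes) -> bytes:
        # Stand-in only: a real signer would ask the HSM for an Ed25519 signature.
        return hashlib.sha256(b"demo-only" + payload).digest()


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "signer_state.json")
    signer = HighWaterMarkSigner(state)
    signer.sign(100, 0, "prevote", b"vote for block A")
    signer.sign(100, 0, "precommit", b"commit block A")
    restarted = HighWaterMarkSigner(state)  # simulate a crash and restart
    try:
        restarted.sign(100, 0, "precommit", b"commit block B")
    except DoubleSignRefused as err:
        print("refused:", err)
```

Two design choices matter here. The state is written to disk before a signature is returned, so a crash between the two errs toward missing a vote rather than double signing. The check also only helps if every signing request goes through one signer with one state file. Two Validator nodes that each keep their own state, as in the backup-server scenario, could still double sign, which is one argument for placing this logic in a single remote signer. Tendermint’s built-in file-based signer keeps a comparable last-signed state but also tolerates a repeat request for identical data. This sketch is stricter and refuses all repeats.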
Finally, there is a tradeoff between high availability and safety. Would you rather have a Validator miss signing a few blocks or risk getting slashed? In general, the design goal of high availability is in conflict with the design goal of safety against an accidental double sign. Extremely thorough engineering and testing is required to achieve a highly available system with a low risk of double sign.
As we’ve discussed in this post, there are various approaches to securing a Validator’s private key. As a Delegator it is crucial to understand your Validator’s strategy as it directly impacts your Atoms via slashing risk.
If you’d like to chat about our approach at Figment Networks, please reach out: firstname.lastname@example.org
Stay tuned for further posts discussing the different dimensions to evaluate a Cosmos Validator.