Cosmos Hub Upgrade: Migration Risks

Hey Cosmos Hub validator operators! I’ve come to the launch party bearing a wet blanket. Here are some important things to pay attention to before celebrating a successful Cosmos Hub 3 launch.

One week to go before the upgrade

Now that Prop19 has passed, we expect to launch Cosmos Hub 3 on or about December 11, 2019. More specifically, according to the passed proposal, we will attempt to launch exactly 60 minutes after Cosmos Hub 2 reaches block height 2,902,000. We are excited to welcome up to 25 new Cosmos Hub validator operators!

Don’t get too excited yet

As validator operators, migrating from Cosmos Hub 2 to Cosmos Hub 3 will expose you and your delegators to a heightened slashing risk, since there is a higher-than-normal chance of accidentally committing equivocation (aka “double-signing”). Currently, if you are slashed for equivocation, you and your delegators will irrevocably lose 5% of stake.

Avoiding the Pitfalls of Migration

Here is a guide for mitigating these risks:

If you see anything missing or incorrect in this guide, please make a pull request and/or contact Aleks Bezobchuk in this channel.

The most important part of the migration procedure will be
1) verifying your software version and
2) verifying the correct genesis file hash before starting your validator. 

Take note that the riskiest thing that a validator operator can do during this procedure is a) discover a mistake and then b) repeat the upgrade procedure. 

If you discover a mistake in this process, the best thing to do is wait for the network to start before correcting it. If the network is halted and you discover that you’ve started with a different genesis file than the expected one, seek advice from a Tendermint developer in this channel before resetting your validator.

Here’s what to watch for

There are two major things to pay attention to.

1) Ensure you hash the actual genesis file

Checking, sharing, and comparing the hash of your validator’s genesis file is critical. Sometimes, though, the genesis file you checked and the genesis file your validator actually loads are two different files sitting in different directories.

What’s the difference between the checked genesis file and the actual genesis file?

The genesis file to be checked should be the one in the gaiad home directory. However, this home directory can be different for each validator operator (depending on their setup). 

This is what can go wrong: you may put the correct genesis file into the wrong directory, and the incorrect genesis file in the right directory.

If you check the hash output using the file in the wrong directory, you could then proceed as though everything was fine, not realizing that your validator is about to actually use another genesis file.

This scenario may seem unlikely, but my understanding is it’s not an uncommon mistake, particularly with inexperienced validator operators.
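To make the check above concrete, here is a minimal sketch of hashing the genesis file that sits inside a given gaiad home directory. It assumes the file lives at config/genesis.json under that home directory and that the published hash is a SHA-256 hex digest; both are assumptions to confirm against the official migration guide. The function names are hypothetical, invented for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def genesis_matches(home: Path, expected_hash: str) -> bool:
    """Hash <home>/config/genesis.json and compare it to the expected hash.

    ASSUMPTION: `home` is the same home directory your validator process
    is started with, and the genesis file is at config/genesis.json.
    """
    genesis = (home / "config" / "genesis.json").resolve()
    actual = sha256_of_file(genesis)
    # Print the absolute path so you can see exactly which file was hashed.
    print(f"hashing: {genesis}")
    print(f"sha256 : {actual}")
    return actual == expected_hash.strip().lower()
```

You would call it as genesis_matches(Path.home() / ".gaiad", published_hash), where published_hash is the value shared by the other validators and the home path is replaced by whatever your setup really uses. Printing the resolved path is the point of the exercise: it shows which file you hashed, so you can compare it against the home directory your validator is started with.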

2) Never use ‘unsafe-reset-all’ in response to a chain halt

The state file should never be deleted under any circumstances. Why not?

According to Hyung (B-Harvest), a halted chain raises the chance of double-signing. The state file is your validator’s record of what it last signed. If a validator operator mistakenly deletes the entire data directory, including the state file, that record is gone, and a double-sign (equivocation) violation can then occur upon resync.
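As one possible precaution, here is a rough sketch that copies the state file somewhere safe before you touch anything. It assumes the state file is data/priv_validator_state.json under the gaiad home directory, which you should confirm for your own version and setup. The function name and the backup location are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_state_file(home: Path, backup_dir: Path) -> Path:
    """Copy the validator state file to a timestamped backup and return its path.

    ASSUMPTION: the state file is <home>/data/priv_validator_state.json.
    The original file is only read, never moved or deleted.
    """
    state = home / "data" / "priv_validator_state.json"
    if not state.is_file():
        raise FileNotFoundError(f"no state file found at {state}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / f"priv_validator_state.{int(time.time())}.json"
    shutil.copy2(state, dest)  # copy2 also preserves the file's timestamps
    return dest
```

A copy like this only reflects the moment it was taken. If the validator signs anything afterwards, the copy is stale, and restoring a stale copy can itself lead to double-signing. Treat it as something to show a Tendermint developer when asking for advice, not as a green light to reset.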

Key Management Systems (KMS)

If you use a key management system (KMS), I’m assuming that you’re an expert validator operator and that you know what you’re doing. Since each validator’s KMS architecture is unique, KMS is beyond the scope of this article.

Wishing You Way More Than Luck

While I’m excited for the Cosmos Hub 3 launch, let’s balance that with a healthy dose of caution beforehand. Let’s get our validator set safely through this upgrade and celebrate on the other side.

There’s a lot at stake here, so I wish you all way more than luck! Lots to look forward to 🙂

Special thanks to Aleks (All in Bits) and Hyung (B-Harvest) for educating me about these risks.

Hopefully you found this useful. Feedback is always welcome! I’m on Twitter.