Sat 28. Aug 2021
This week there was an Ethereum node split due to outdated clients. This resulted in the following comment on the ETHSecurity Community channel:
Personally, I think this is true. I also think there is currently little effort put into documenting instructions or approaches for running nodes. This becomes even more essential with Ethereum 2. As such, I think it makes sense for the community to share approaches for a more resilient network. What follows in the next sections is my attempt.
One way to receive software updates is to rely on Docker container repositories. Instead of downloading new releases and patching our client software by hand, we separate the concerns of persistent storage and the actual process. In my case, I run the erigon client, which receives periodic updates on Docker Hub at docker.io/thorax/erigon. I update these images roughly once a week by running /bin/podman pull docker.io/thorax/erigon:stable. I do this via the following systemd service:
[Unit]
Description=Ethereum 1 mainnet client
Requires=network-online.target
After=network-online.target

[Service]
Restart=always
RestartSec=5s
User=core
Type=simple
ExecStartPre=-/bin/podman kill erigon erigon-rpcdaemon lighthouse lighthouse-vc
ExecStartPre=-/bin/podman rm erigon erigon-rpcdaemon lighthouse lighthouse-vc
ExecStartPre=/bin/podman pull docker.io/thorax/erigon:stable
ExecStart=/bin/podman run \
    --name erigon \
    -v /var/mnt/ssdraid/eth1/erigon-mainnet:/data:z \
    docker.io/thorax/erigon:stable erigon \
    --metrics --metrics.port=6060 \
    --pprof --pprof.port=6061 \
    --private.api.addr=localhost:4090 \
    --datadir /data \
    --chain mainnet

[Install]
WantedBy=multi-user.target
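Assuming the unit above is saved as /etc/systemd/system/erigon.service (the path is illustrative), wiring it up is the usual systemd dance:

```shell
sudo systemctl daemon-reload               # pick up the new unit file
sudo systemctl enable --now erigon.service # start now and on every boot
journalctl -u erigon.service -f            # follow the client logs
```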
Here you can also see the separation of persistent data: the -v /var/mnt/ssdraid/eth1/erigon-mainnet:/data:z flag mounts a block device which holds the data the process needs. In theory, I only ever need to point this directory to the client process, and the client process can receive upgrades independent of the data it mounts.
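In other words, the image can be swapped while the data stays put. A rough manual equivalent of what the service does (paths from my setup, client flags abbreviated):

```shell
podman pull docker.io/thorax/erigon:stable   # fetch the new image
podman rm -f erigon                          # drop the old container; /data is untouched
podman run --name erigon \
    -v /var/mnt/ssdraid/eth1/erigon-mainnet:/data:z \
    docker.io/thorax/erigon:stable erigon --datadir /data --chain mainnet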
I can manually upgrade my client by running systemctl restart erigon, which kills all of its dependent containers, removes them, and then pulls the newest version. The update then propagates to the rpcdaemon and its lighthouse dependants via the Requires and After directives in their service files, which point back to erigon.service. For example, consider the erigon-rpcdaemon.service file:
[Unit]
Description=Ethereum 1 client rpcdaemon
Requires=erigon.service
After=erigon.service

[Service]
Restart=always
RestartSec=5s
User=core
Type=simple
ExecStart=/bin/podman run \
    --net=container:erigon \
    --pid=container:erigon \
    --ipc=container:erigon \
    -v /var/mnt/ssdraid/eth1/erigon-mainnet:/data:z \
    --name erigon-rpcdaemon \
    docker.io/thorax/erigon:stable rpcdaemon \
    --datadir /data \
    --private.api.addr=localhost:4090 \
    --http.api=eth,erigon,web3,net,debug,trace,txpool,shh \
    --http.addr=0.0.0.0

[Install]
WantedBy=multi-user.target
The [Unit] section of this service file has the following lines:
[Unit]
Description=Ethereum 1 client rpcdaemon
Requires=erigon.service
After=erigon.service
This means that erigon-rpcdaemon.service will only launch once erigon.service is running. The same pattern propagates to the Ethereum 2 client:
[Unit]
Description=Ethereum 2 mainnet client
Requires=erigon-rpcdaemon.service
After=erigon-rpcdaemon.service
...
And from there on to the validator:
[Unit]
Description=Ethereum 2 mainnet client validator
Requires=lighthouse.service
After=lighthouse.service
...
This way, a simple
systemctl restart erigon command will cause the whole stack to upgrade.
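You can sanity-check which units the restart will drag along by asking systemd for the reverse dependencies:

```shell
# Lists the units that depend on erigon.service
# (here: erigon-rpcdaemon, lighthouse, and the validator client)
systemctl list-dependencies --reverse erigon.service
```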
While the restart operation could be a cron job, in reality the nodes also run other software. This includes, but is not limited to, podman (the container runtime) and the kernel of the system. These updates are arguably as important as the client upgrades, to keep Shellshock-like vulnerabilities from accumulating over time.
To avoid dependency conflicts between different processes and thus risk the maintenance process cascading into chaos, I have taken the approach popularised by the CoreOS Linux distribution. Here, the general idea is that everything except for the kernel and the container runtime interface is a container. And while CoreOS as a company does not exist anymore, the distribution is kept alive by Fedora as Fedora CoreOS.
But how does this help? Well, CoreOS also auto-upgrades the kernel by periodically polling its release streams, and you can configure how these updates are rolled out. By enabling the service with systemctl enable erigon.service, the operating system triggers the process to start on each boot. And in CoreOS, each unattended reboot corresponds to a system upgrade, which, through the podman rm and podman pull operations in the service files, also upgrades the clients automatically. What is thus achieved is that roughly each week, when Fedora releases a new CoreOS version, the nodes download the patches, restart, and then upgrade the Ethereum client processes. This allows unattended upgrades across all the nodes that I maintain, effectively avoiding chain splits.
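On Fedora CoreOS, the rollout is handled by the Zincati update agent, which can be steered with a small TOML file. A sketch of a weekly maintenance window (the path and values here are illustrative, not my exact configuration):

```toml
# /etc/zincati/config.d/55-updates-strategy.toml
[updates]
strategy = "periodic"

# Only reboot into new CoreOS releases during this window
[[updates.periodic.window]]
days = [ "Sat" ]
start_time = "04:00"
length_minutes = 60
```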
In theory, this all works, but in practice backward compatibility with the mounted persistent storage is sometimes broken. And sometimes the command-line arguments are tweaked, which may also cause downtime. The way to catch this is by introducing monitoring to the cluster, for which process-specific approaches already exist via Prometheus and Grafana.
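For erigon, the unit file above already exposes metrics on port 6060, so a minimal Prometheus scrape job could look roughly like this (the metrics path is what I understand erigon to serve; verify it against your client version):

```yaml
scrape_configs:
  - job_name: 'erigon'
    metrics_path: /debug/metrics/prometheus  # erigon's Prometheus endpoint
    static_configs:
      - targets: ['localhost:6060']          # --metrics.port from the unit file
```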
But Prometheus and Grafana only help as long as the node itself can recover from configuration errors, which it often cannot, because there is no programmatic way to patch the filesystem and process arguments. To resolve this, we should first test that an upgrade does not cause downtime, and only then apply it. While I have not configured this to work automatically, the tools already exist in the form of Linux checkpoints.
Luckily, the checkpoint operations also apply to containers via CRIU. In essence, CRIU allows the containers to be stopped and pushed to a remote computer while the host itself tries to upgrade the system. Interestingly, this also applies to kernel upgrades via seamless kernel upgrades. This way, it could be possible to devise an approach that works roughly as follows:
1. Checkpoint the running containers with CRIU and move them aside.
2. Test whether the system comes up on the new kernel via kexec, and if so, boot into it and pull the latest container images.
3. Otherwise, notify the ops team (that's me!) that the upgrade breaks something.
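As a rough sketch of the building blocks, podman already exposes CRIU through its checkpoint and restore subcommands (the standby host here is hypothetical):

```shell
# Freeze the running client and export its full state to an archive
sudo podman container checkpoint --export /tmp/erigon-ckpt.tar.gz erigon

# Ship the state to a standby machine while this host upgrades
scp /tmp/erigon-ckpt.tar.gz standby:/tmp/

# On the standby machine: resume the container where it left off
sudo podman container restore --import /tmp/erigon-ckpt.tar.gz
```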