Running a Lightning node in a safe and redundant way is not trivial. Scheduled backups are out of the question since we always need the absolute latest channel states. Restoring from even a slightly outdated backup can lead to channel breaches and loss of funds.
So how do we recover our Lightning node from, say, a disk failure?
One compromise is a static channel backup. However, this assumes a certain level of trust in the channel partners. We also have the problem that the node is offline until we set up a new one and restore from the static channel backup.
This can be overcome with redundancy. We can run a cluster of 3 nodes that act as a single Lightning node, replicating the channel state between the nodes in real time. If one node fails, one of the remaining two will take over.
As I could not find a detailed guide online on how to run a highly available Lightning node, I decided to write one myself. Here is the result of my research and testing:
This repository provides an example of how to set up a highly available LND lightning node by running it as a 3-node cluster. The state is stored in a replicated etcd database. The active leader node is always accessible via the same floating IP address and a Tor hidden service.
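For orientation, here is a minimal sketch of the kind of lnd.conf each cluster member runs. The option names follow LND's etcd and leader-election documentation; the hostnames and cluster ID are placeholders, so check everything against your own setup and lnd version.

```
# lnd.conf (sketch) -- channel state lives in the replicated etcd cluster
db.backend=etcd
# placeholder etcd endpoint
db.etcd.host=etcd-1.internal:2379

# leader election: only the elected member runs as the active node
cluster.enable-leader-election=true
cluster.leader-elector=etcd
# placeholder etcd endpoint
cluster.etcd.host=etcd-1.internal:2379
# unique identifier per cluster member
cluster.id=lnd-node-1
```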
101 sats \ 2 replies \ @leo 17 Aug
Is it really wise to connect multiple LND instances to Bitcoin Core via ZMQ? Have you considered RPC polling as an alternative? Thank you for publishing this!
reply
I'm not sure I understand why polling would be a better option than LND being notified via ZMQ. Also, there is only ever one instance of LND (the leader) connected to the Bitcoin node at any given time.
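For context, a rough sketch of the relevant settings (all addresses and ports are placeholders): every cluster member can carry the same bitcoind ZMQ endpoints in its config, because only the elected leader is actually running and subscribed at any given time.

```
# bitcoin.conf on the Bitcoin Core host (placeholder ports)
zmqpubrawblock=tcp://0.0.0.0:28332
zmqpubrawtx=tcp://0.0.0.0:28333

# lnd.conf, identical on all three cluster members
bitcoin.node=bitcoind
bitcoind.zmqpubrawblock=tcp://bitcoind.internal:28332
bitcoind.zmqpubrawtx=tcp://bitcoind.internal:28333
```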
reply
ah, that explains it, thank you
reply
I've been curious about the etcd-backed lnds. Does etcd scale well with large amounts of data?
I suspect I'd prefer their postgres-backed storage option to support master election via an advisory lock. I've been meaning to open an issue.
reply
They've been moving to SQL because etcd is too limited for proper HA, iiuc... and I don't think lnd has an active/passive mode yet, so this would seem to rely on the load balancer to handle active/passive
OP can you break this down a bit? At a glance this seems more perilous than simply running lnd on VMs over ZFS
reply
> OP can you break this down a bit? At a glance this seems more perilous than simply running lnd on VMs over ZFS
ZFS is designed to run on a single server only. So if that server fails, the node will be down. If we are trying to achieve high availability, we need a distributed system where ideally every server has its own uninterruptible power supply.
I explain the setup in more detail in the linked guide.
reply
Yea I meant something like Ceph over it
reply
Ceph and cnpg (Cloud Native Postgres) would be a nice fit.
Can you wrap that single server as a Docker container? If so, putting it in Kubernetes would solve many issues... I will check OP's repo to get more info, but you have a nice idea there.
reply
I did consider trying bbolt on top of Ceph, but since etcd support is already built into lnd, using etcd seemed like the more native approach. But I am planning to compare this to a setup with Ceph and do some benchmarks.
reply
Cool, I'll be following. It's been too long with LND as the only implementation that's somewhat production-ready while still not having HA or even a SQL backend... would also like to know more about the cluster awareness so a passive node doesn't broadcast something
reply
LND has actually had support for leader election for at least 3 years already. Some documentation on it can be found here: https://docs.lightning.engineering/lightning-network-tools/lnd/leader_election
But during my testing I did manage to get two nodes to become active at the same time, which is bad. I described it in this issue: https://github.com/lightningnetwork/lnd/issues/8913
This was an LND bug, where it would not resign from its leader role. etcd was working as it should.
Two weeks later the bug got fixed with this pull request: https://github.com/lightningnetwork/lnd/pull/8938
With the patch applied, healthcheck.leader.interval set to 60 seconds and cluster.leader-session-ttl set to 100 seconds, I could no longer produce a situation where multiple nodes were active at the same time.
With this configuration, each lnd node creates an etcd lease with a time-to-live of 100 seconds. This lease is kept alive at intervals of one third of the initial time-to-live, so in this case every 33 seconds. When a node loses its connection to the rest of the cluster, it takes 27-60 seconds to initiate a shutdown, and it takes 66-100 seconds for another node to take over. In this configuration the two windows cannot overlap, so there is no chance of two nodes being active at the same time.
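For concreteness, the timing settings described above would look roughly like this in lnd.conf (a sketch; the exact value formats may differ between lnd versions, so verify against the sample config):

```
# etcd lease TTL for the leadership key, in seconds
cluster.leader-session-ttl=100
# how often the active node checks that it still holds leadership
healthcheck.leader.interval=60s
```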
17 sats \ 1 reply \ @k00b 17 Aug
By passive do you mean not participating in state updates?
reply
Exactly, iirc LND isn't cluster-aware
reply
deleted by author
reply
> Is leader election supported for Postgres?
>
> No, leader election is not supported by Postgres itself since it doesn't have a mechanism to reliably determine a leading node. It is, however, possible to use Postgres as the LND database backend while using an etcd cluster purely for the leader election functionality.
This is wrong though. You can construct something like an expiring lock in Postgres.
reply
I would guess it’s more common for application developers to reach for a tool more specific to the use case, like etcd, zookeeper, or consul.
You’re right that LND could potentially use advisory locks, which might make sense to eliminate an entire dependency when postgres is used as the backend.
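For illustration, a rough sketch of what leader election via a session-level advisory lock could look like (the lock key 1234 is an arbitrary placeholder, and this is not something lnd supports today):

```sql
-- Every candidate node tries to take the same application-defined lock key.
-- Only one session can hold it; the others get 'false' and stay passive.
SELECT pg_try_advisory_lock(1234) AS i_am_leader;

-- The winner keeps its session open for as long as it acts as the leader.
-- If that connection dies (crash, or network partition plus TCP timeout),
-- Postgres releases the lock and another candidate's next attempt succeeds.

-- Optional: release explicitly on a clean shutdown.
SELECT pg_advisory_unlock(1234);
```

The lease-like behaviour comes from the lock being tied to the database session rather than to a timer, so how quickly a failed leader is replaced depends on how fast Postgres notices the dead connection.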
reply
Nice find.
If Postgres is already a single point of failure in the cluster, there’s not much sense in also having etcd as a dependency.
reply
I haven't compared this to postgres yet, but sending a 1000-part payment to a channel partner with the etcd backend slowed the node to a crawl.
I plan to do some benchmarks under different configurations and also compare it to a postgres backend. I also plan to benchmark lnd with a bbolt database running on a 3-node Ceph RBD.
reply
This setup will help to achieve a highly available Lightning Network node cluster, but keep in mind that managing such a system requires ongoing attention and expertise in both Lightning Network specifics and general server management.
reply