
I run c-otto.de and, despite my best efforts, sometimes have to pay for force-closes. Because the one from last night was rather expensive, I'd like to share my insights.
2023-12-28 15:12:35.748: My node accepts a forward request coming in from bfx-lnd0, requesting to send to Gravity21 🌎☄️:
2023-12-28 15:12:35.748 [DBG] HSWC: ChannelLink(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): queueing keystone of ADD open circuit: (Chan ID=798901:1550:1, HTLC ID=1259)->(Chan ID=805200:3121:1, HTLC ID=16830)
2023-12-28 15:12:35.820 [DBG] HSWC: ChannelLink(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): removing Add packet (Chan ID=798901:1550:1, HTLC ID=1259) from mailbox
Sadly, my peer didn't respond to this, leaving my node with a stuck HTLC. After five minutes, lnd decided to disconnect:
2023-12-28 15:17:36.415 [INF] PEER: Peer(03238001dec7155a367248ed7f9a1e6940f3f372f4d6f2586b31c91ae32cc1628f): disconnecting 03238001dec7155a367248ed7f9a1e6940f3f372f4d6f2586b31c91ae32cc1628f
I believe that lnd should not have added the HTLC to a peer that doesn't respond, which is an issue discussed in https://github.com/lightningnetwork/lnd/issues/2992 (from 2019!). If lnd knows that a peer is offline, it would not add a new HTLC, and instead fail the payment back to the sender. Sadly, that's not what lnd currently does.
As lnd does not know whether my peer received the HTLC, it has to wait for a timeout. This timeout ended in block 823615 (2023-12-30 23:12:43). My node uses some crude scripts to reconnect to peers that are offline, even disconnecting from those that appear to be offline if there's some stuck HTLC in one of their channels. Despite these efforts, my peer did not reconnect and, thus, did not settle/fail the HTLC in time, forcing my node to reclaim the funds on-chain:
2023-12-30 23:12:43.051 [DBG] CNCT: ChannelArbitrator(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): new block (height=823615) examining active HTLC's
2023-12-30 23:12:43.055 [DBG] CNCT: ChannelArbitrator(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): checking commit chain actions at height=823615, in_htlc_count=0, out_htlc_count=1
2023-12-30 23:12:43.058 [INF] CNCT: ChannelArbitrator(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): go to chain for outgoing htlc 6ac792db9b79f262bb941dec776ac676980c2bd7c294443316330c742c2ac706: timeout=823615, blocks_until_expiry=0, broadcast_delta=0
2023-12-30 23:12:43.076 [DBG] CNCT: ChannelArbitrator(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): attempting state step with trigger=chainTrigger from state=StateBroadcastCommit
2023-12-30 23:12:43.077 [INF] CNCT: ChannelArbitrator(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): force closing chan
2023-12-30 23:12:43.103 [INF] CNCT: Broadcasting force close transaction bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42, ChannelPoint(43a63045a6b7fc2be07424afd5eda561ea01009c057ab8693b2d2212a4a015d5:1): [...]
As you can see, it includes the two anchor outputs (worth 330 sats each), the balances (52k and 4.9M), and one additional output (13k sats), which is the HTLC in question. As my peer opened the channel, the fees (7.7k sats) were deducted from their channel reserve and I did not pay for this, even though my node sent out the close transaction.
Warning: If there were other pending HTLCs in the channel at this time, they'd also be included in the transaction, no matter how recently they were received! Make sure to limit the number of in-flight HTLCs (per channel), as a single HTLC timeout might cause an extremely large/costly force-close if many other (healthy?) HTLCs have to be included.
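To keep an eye on both risks (an HTLC approaching its timeout, and many HTLCs piling up in one channel), a monitoring script could look roughly like the following. This is only a minimal sketch, not the scripts I actually run. It assumes the input is the parsed JSON of `lncli listchannels`, and that each channel entry carries `remote_pubkey`, `channel_point`, `active` and a `pending_htlcs` list with an `expiration_height` per HTLC, which is how I understand lnd's output. The function name, the `warn_blocks` threshold and the sample data are made up for illustration.

```python
from typing import Any


def htlc_risk_report(list_channels: dict[str, Any],
                     current_height: int,
                     warn_blocks: int = 144) -> list[dict[str, Any]]:
    """Summarise pending HTLCs per channel, most urgent channel first."""
    report = []
    for chan in list_channels.get("channels", []):
        htlcs = chan.get("pending_htlcs", [])
        if not htlcs:
            continue  # nothing in flight, nothing that could time out
        # The HTLC expiring first decides when a force-close would happen.
        first_expiry = min(int(h["expiration_height"]) for h in htlcs)
        blocks_left = first_expiry - current_height
        report.append({
            "remote_pubkey": chan["remote_pubkey"],
            "channel_point": chan["channel_point"],
            "active": bool(chan.get("active", False)),
            # Every one of these would end up in the commitment transaction.
            "pending_htlcs": len(htlcs),
            "blocks_until_expiry": blocks_left,
            "urgent": blocks_left <= warn_blocks,
        })
    report.sort(key=lambda entry: entry["blocks_until_expiry"])
    return report


# Hypothetical sample data in the assumed shape (not real channels).
sample = {"channels": [
    {"remote_pubkey": "peer_a", "channel_point": "aa:1", "active": False,
     "pending_htlcs": [{"expiration_height": 823615}]},
    {"remote_pubkey": "peer_b", "channel_point": "bb:0", "active": True,
     "pending_htlcs": []},
    {"remote_pubkey": "peer_c", "channel_point": "cc:0", "active": True,
     "pending_htlcs": [{"expiration_height": 824000},
                       {"expiration_height": 823900}]},
]}
report = htlc_risk_report(sample, current_height=823600)
```

Peers flagged as urgent (and especially those with `active` set to false) are the ones a reconnect script would try first, for example via `lncli disconnect` followed by `lncli connect`.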
Sadly, the fee rate of 24 sat/vByte did not suffice to get the close transaction confirmed in time. As explained in Elle Moutin's excellent blog series (https://ellemouton.com/posts/htlc-deep-dive/), my peer can always reveal the HTLC's preimage and claim the funds. However, my peer on the incoming side (bfx-lnd0) can also claim the funds after some timeout, which is why my node needs to make sure Gravity21 🌎☄️ cannot claim the funds after I already lost them to bfx-lnd0. To prepare for this, lnd needed to get the close transaction confirmed and, thus, "pulled" the anchor output (CPFP):
2023-12-31 03:44:16.708 [INF] SWPR: Creating sweep transaction e65d88492b1344be0bf7f2b4f435e671dfa6fd68bf374b82105878c598ff00ed for 3 inputs (bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:0 (CommitmentAnchor), 0529999cd28817dd9666804d28e88c1badf55a260fd7e996677ee6df9dd7c22a:0 (TaprootPubKeySpend), 39fcf06df92450afb3766911d41689007af95fb36f0c386ba05419ed6c530d30:0 (TaprootPubKeySpend)) using 45093 sat/kw, tx_weight=956, tx_fee=0.00093401 BTC, parents_count=1, parents_fee=0.00007787 BTC, parents_weight=1288
Note that my node has to pay the fees for this sweep (93k sats!).
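The numbers in that log line fit together if the 45093 sat/kw is read as the target fee rate for the whole package (close transaction plus sweep), with the sweep paying whatever the parent is missing. That reading is my inference from the arithmetic, not something taken from lnd's source code. A small sketch, with function names that are purely illustrative:

```python
def sat_per_kw_to_sat_per_vbyte(sat_per_kw: float) -> float:
    """1 vByte equals 4 weight units, and 1 kw equals 1000 weight units."""
    return sat_per_kw * 4 / 1000


def cpfp_child_fee(target_sat_per_kw: int, child_weight: int,
                   parent_weight: int, parent_fee: int) -> int:
    """Fee the child must pay so that parent + child reach the target rate."""
    package_fee = target_sat_per_kw * (child_weight + parent_weight) // 1000
    return package_fee - parent_fee


# Values copied from the SWPR log line above.
parent_fee, parent_weight = 7787, 1288   # the force-close transaction
child_weight = 956                       # the anchor sweep
target_sat_per_kw = 45093

# What the close transaction paid on its own, roughly 24 sat/vByte.
parent_sat_per_vbyte = sat_per_kw_to_sat_per_vbyte(
    parent_fee * 1000 / parent_weight)

# What the sweep has to add to lift the package to the target rate.
child_fee = cpfp_child_fee(target_sat_per_kw, child_weight,
                           parent_weight, parent_fee)
package_sat_per_vbyte = sat_per_kw_to_sat_per_vbyte(target_sat_per_kw)
```

With these inputs `child_fee` comes out at 93401 sats, which matches the logged tx_fee=0.00093401 BTC, and the package rate is roughly 180 sat/vByte.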
This sweep transaction got mined a few minutes later:
2023-12-31 03:47:17.917 [INF] LNWL: Marking unconfirmed transaction e65d88492b1344be0bf7f2b4f435e671dfa6fd68bf374b82105878c598ff00ed mined in block 823653
Now lnd wants to claim the HTLC funds and fail the payment upstream to bfx-lnd0 (also to avoid a force-close from bfx-lnd0!). For this, an HTLC timeout transaction is created, which spends the HTLC output to an intermediate address, from which only my node can take funds (leaving the issue of revocation aside, nobody is cheating here):
2023-12-31 03:47:43.333 [INF] SWPR: Creating sweep transaction ee070ca1a7c29ba96bb38eb4b6ac1ff76e3229033eb833e9a7b7ed7db30e15cf for 3 inputs (bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2 (HtlcOfferedTimeoutSecondLevelInputConfirmed), e65d88492b1344be0bf7f2b4f435e671dfa6fd68bf374b82105878c598ff00ed:0 (TaprootPubKeySpend), ea3a137ce1beb1c5e53937d3fff1a80cb041dd1c35ffd0452da20bd3b5178202:0 (TaprootPubKeySpend)) using 37114 sat/kw, tx_weight=1300, tx_fee=0.00048248 BTC, parents_count=0, parents_fee=0 BTC, parents_weight=0
2023-12-31 03:47:43.347 [INF] NTFN: Found input bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2, spent in ee070ca1a7c29ba96bb38eb4b6ac1ff76e3229033eb833e9a7b7ed7db30e15cf
2023-12-31 03:47:43.347 [DBG] CNCT: Found mempool spend of HTLC output bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2 in tx=ee070ca1a7c29ba96bb38eb4b6ac1ff76e3229033eb833e9a7b7ed7db30e15cf
2023-12-31 03:47:43.347 [DBG] CNCT: HTLC output bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2 spent doesn't reveal preimage
2023-12-31 03:51:54.669 [INF] NTFN: Found input bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2, spent in ee070ca1a7c29ba96bb38eb4b6ac1ff76e3229033eb833e9a7b7ed7db30e15cf
2023-12-31 03:51:54.966 [INF] NTFN: Dispatching confirmed spend notification for outpoint=bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2, script=0 49ae7fe296198a7dcc8b50c6f6e613afc5810d49783926bdc43b3dc7a6d5948c at current height=823654: ee070ca1a7c29ba96bb38eb4b6ac1ff76e3229033eb833e9a7b7ed7db30e15cf[0] spending bfea88acb1b00ac48cbe6e571f41d8016012ba31813f1b7e94105a349c890d42:2 at height=823654
Some observations:
  • even though lnd created this transaction, other parts of the code still see it added to the mempool and check for any preimage being revealed
  • my node has to pay the fees for this HTLC timeout transaction (48k sats!)
  • the HTLC is only worth 13k sats, so paying 140k sats to avoid losing it seems pointless. However, as not claiming timed out HTLCs could lead to peers stealing lots of small-ish amounts, being strict seems to be the better approach overall.
Now, after spending 93k + 48k sats, my funds are safe, but locked. In around two weeks my node will be able to claim both the channel balance and the HTLC amount, but this isn't urgent (and I tweaked my node to not spend a whole lot of sats for this rather optional step, which I believe is an ongoing issue in lnd).
As you can see, even though I didn't open the channel and my node was up and running the whole time, having an unreliable peer still caused me to spend 140k+ sats.
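For anyone who wants to check the total, here is the arithmetic as a short sketch. The fee rate, weight and fee values are copied from the two SWPR log lines above. The HTLC value is only the rounded "13k sats" from the text, so the ratio is approximate, and the helper name is made up.

```python
def fee_for(weight: int, sat_per_kw: int) -> int:
    """Fee in sats for a transaction of the given weight at a sat/kw rate."""
    return weight * sat_per_kw // 1000


anchor_sweep_fee = 93_401                   # logged tx_fee=0.00093401 BTC
htlc_timeout_fee = fee_for(1300, 37_114)    # logged tx_fee=0.00048248 BTC
total_fees = anchor_sweep_fee + htlc_timeout_fee

htlc_value = 13_000                         # rounded value from the text
fee_to_value_ratio = total_fees / htlc_value
```

This gives 141,649 sats in fees, a bit less than eleven times the value of the HTLC that was at stake.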
You can use https://github.com/lightningequipment/circuitbreaker to limit the number of pending HTLCs and "push" the financial risk towards the (potentially spamming) sender.
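To illustrate the idea (this is not circuitbreaker's actual implementation or configuration format), a per-peer limit can combine a cap on simultaneously pending HTLCs with a cap on HTLCs accepted per hour. The class and field names below are hypothetical, and time is passed in explicitly to keep the sketch easy to test.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PeerLimit:
    """Hypothetical per-peer limiter: pending cap plus hourly cap."""
    max_pending: int
    max_hourly: int
    pending: int = 0
    # Timestamps (in seconds) of HTLCs accepted within the last hour.
    recent: deque = field(default_factory=deque)

    def try_add(self, now: float) -> bool:
        """Return True and record the HTLC if both limits allow it."""
        # Forget accepted HTLCs that are at least one hour old.
        while self.recent and now - self.recent[0] >= 3600:
            self.recent.popleft()
        if self.pending >= self.max_pending:
            return False  # too many HTLCs in flight, fail back immediately
        if len(self.recent) >= self.max_hourly:
            return False  # hourly budget for this peer is used up
        self.pending += 1
        self.recent.append(now)
        return True

    def resolve(self) -> None:
        """Call when a pending HTLC is settled or failed."""
        if self.pending > 0:
            self.pending -= 1


# Example with made-up limits: a second HTLC is refused while one is pending.
strict = PeerLimit(max_pending=1, max_hourly=10)
first_accepted = strict.try_add(now=0.0)
second_accepted = strict.try_add(now=1.0)
```

A rejected HTLC is failed back right away, so the sender (and not my node) bears the cost of retrying, while a low pending cap keeps the commitment transaction small if a force-close does happen.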
reply
Do you have your limits set to 1 pending per peer, or are there some peers that you trust more and give higher pending limits? Or is 1 excessively limited?
reply
I have limits, but higher than one. I also have limits like "X attempts per hour", and I also have higher limits for some trusted peers (which I could reach out to, if need be).
reply
Do you have a spec on reasonable thresholds? Seems like something everyone should do the right way. I kicked on circuit breaker with the defaults (5 pending / 3600 per hour) and racked up 400 fails on one of my channels. Eased back there but that was unexpected. Using Queue mode now as well...
reply
I'm currently manually setting low pending/max hourly rates for a few nodes that I have noticed went offline several times over the last week.
Does trusting a node also mean that you know they have circuit breaker logic that you are matching? Seems like an otherwise trusted node could have no breaker or too high of settings...
reply
This reminds me how little I know about Lightning, thanks for sharing!
reply
Great read
reply
the HTLC is only worth 13k sats, so paying 140k sats to avoid losing it seems pointless. However, as not claiming timed out HTLCs could lead to peers stealing lots of small-ish amounts, being strict seems to be the better approach overall.
Can you elaborate on this? You mean that if they see you not claiming it once, they might repeat it over and over again until you decide to enforce the timed-out HTLCs?
Did you have the option at some point within LND to make this a conscious choice? Or did this all happen under the hood?
Thanks for the write up by the way.
reply
Let's say I tweaked my node (or maybe it's part of an lnd release?) so that force-closes for HTLC timeouts only happen if it's "worth it". In this case, lnd would just fail the payment back to bfx-lnd0, and keep the channel open without the force-close dilemma described here (with the HTLC still pending).
Now, what if my peer is malicious and knows about (or at least assumes) my node's response to the timeout? They could reveal the preimage and claim the funds, effectively "stealing" (I'm not sure if it's real theft, hence the quotes) 11k sats from me. They could do this again and again, possibly hiding behind other pubkeys. Note that bfx-lnd0 would never give the funds to me, because I already failed the payment (and they failed it upstream towards the original sender).
There are tradeoffs, but I believe it's better to have a "it works 100%" atmosphere than "let's just hope my peer isn't a thief" kind of trust dilemma.
I did not tweak lnd, and the "safe" approach is the default. Right now I don't think there's an easy way to use the "unsafe-but-maybe-cheaper" approach.
reply
I had a force close on a channel opened by me to a peer whom I know, and we talked about it at the time. We tried to figure out who triggered the force close. As neither of us did it manually, one of the nodes must have done it automatically somehow. The peer node had other force closes at the time, all within a short period, while I had only this one, so we concluded that it was initiated by my peer's node somehow. However, the fee for the force close was taken from my side. It happened during an extremely high fee environment, and the fee was even significantly higher than what the network needed for getting into the next block. It was something like 200k sats or more, I don't remember exactly.
I never used any circuit breaker or automation, but I haven't had many other issues so far besides a force close, done on purpose by me, that got stuck (and still is) due to 12ksat/vB fee and non-anchor channel type. I hope to solve it eventually with help from the other peer or by paying a transaction accelerator like the ones ViaBTC or Binance are offering.
reply
If lnd knows that a peer is offline, it would not add a new HTLC, and instead fail the payment back to the sender. Sadly, that's not what lnd currently does.
Wow, that's pretty sad TBH. How many network wide force closes have happened due to this common source of negligence?
reply
I had a closer look at my logs and it seems that, in this case, my peer wasn't offline at the time of adding the HTLC (otherwise the HTLC would not appear in the TX published by my node). If you have a closer look at the linked GitHub issue, it seems several improvements have been made already. It's not 100% fixed, but I wouldn't say lots of FCs nowadays happen because of this.
reply
Not really related, but I would really love to see a release where the LN node scans the mempool for any transactions affecting a channel and refuses to use the channel if such a TX exists. I've had many payment failures where a channel closed in the middle of my LN transaction and caused my sats to be in limbo. Luckily, these cases usually resolve after a very long CLTV timeout without any FC. I think it makes sense to avoid risky LN transactions that involve channels that could be confirmed-closed at any moment.
reply
PS: Using https://github.com/c-Otto/lnd-manageJ/ you can easily see how much you paid, especially for force-closed channels and the associated sweep transactions. Figuring out the meaning (and owner) of each output can be quite challenging, which is why I automated this step.
reply
Sorry for the typo, it's "Elle Mouton".
reply
There is a huge difference between running a routing node and just running your node to plug into the rest of the network. The former is indeed more demanding, but I think the latter is still very doable for a pleb.
I ran a pretty profitable node more than a year ago, even though I am not that technical. But back then, mistakes were not that costly... so in the current environment, I agree that you'd better know your shit, or learn quickly from your mistakes.
reply
The tools are getting better. I don't like the current ordinal and other shitcoining on Bitcoin, but the high fees are pushing people to be more mindful and hopefully, some other much-needed improvements will be accelerated because of this.
When I left, channel management could already mostly be automated using tools such as LNDg. So, once channel instability, stuck HTLCs, force closes, etc. get less likely, I believe things will again be possible for amateurs, even if the big routing nodes will always have an edge, just because disposable liquidity is key after all.
reply