The original complaint which caused my investigation was the 100% CPU
consumption of connectd, which we traced to the queue to gossipd.
However, the issue is not really connectd's overproduction, but
gossipd's underconsumption, probably caused by its own queueing issues
with the trace messages to lightningd, which the prior patch fixed.
Nonetheless, gossipd *can* get busy, and if we were to ask multiple
nodes for full gossip, we could see a few hundred thousand messages
come it at once. Hence I'm increasing the warning limit to 250,000
messages.
This commit is also where we attach the Changelog message, even
though it's really "common/msg_queue: use membuf for greater efficiency."
and "gossipd: fix excessive msg_queue length from status_trace()" which
solved the problem.
Here's the backtrace from a previous debug patch:
```
lightning_connectd: msg_queue length excessive (version v24.08.1-17-ga780ad4-modded)
0x5580534051f0 send_backtrace
common/daemon.c:33
0x55805340bd5b do_enqueue
common/msg_queue.c:66
0x55805340bde5 msg_enqueue
common/msg_queue.c:82
0x5580534057ce daemon_conn_send
common/daemon_conn.c:161
0x5580533fe3ff handle_gossip_in
connectd/multiplex.c:624
0x5580533ff23b handle_message_locally
connectd/multiplex.c:763
0x5580533ff2d6 read_body_from_peer_done
connectd/multiplex.c:1112
```
Reported-by: https://github.com/JssDWt
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd` and `gossipd` message queues are much more efficient.
When this (very spammy) "handle_recv_gossip" message was changed
from debug to trace, the suppression code wasn't updated: we suppress
overly active debug messages, but not trace messages.
This is the backtrace from an earlier version of the "too large queue"
patch:
```
lightning_gossipd: msg_queue length excessive (version v24.08.1-17-ga780ad4-modded)
0x557e521e833f send_backtrace
common/daemon.c:33
0x557e521eefb9 do_enqueue
common/msg_queue.c:66
0x557e521ef043 msg_enqueue
common/msg_queue.c:82
0x557e521e891d daemon_conn_send
common/daemon_conn.c:161
0x557e521f14f0 status_send
common/status.c:90
0x557e521f1804 status_vfmt
common/status.c:169
0x557e521f1433 status_fmt
common/status.c:180
0x557e521de7c6 handle_recv_gossip
gossipd/gossipd.c:206
0x557e521de9f5 connectd_req
gossipd/gossipd.c:307
0x557e521e862d handle_read
common/daemon_conn.c:35
```
This is like `sendonion` but unwraps the onion as the first hop,
avoiding nasty special cases for blinded paths which start with this
node, and also self-pay.
Tests split into multiple ones after Christian's review.
Changelog-Added: JSON-RPC: `injectpaymentonion` for initiating an HTLC like a peer would do.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `decode` now used modern BOLT 4 language for blinded paths, `first_path_key`.
Changelog-Deprecated: JSON-RPC: `decode` `blinding` in blinded path: use `first_path_key`.
Changelog-Added: Plugins: `onion_message_recv` and `onion_message_recv_secret` hooks now used modern BOLT 4 language for blinded paths, `first_path_key`.
Changelog-Deprecated: JSON-RPC: `onion_message_recv` and `onion_message_recv_secret` hooks `blinding` in blinded path: use `first_path_key`.
No code changes, just catching up with the BOLT changes which rework our
blinded path terminology (for the better!).
Another patch will sweep the rest of our internal names, this tries only to
make things compile and fix up the BOLT quotes.
1. Inside payload: current_blinding_point -> current_path_key
2. Inside update_add_htlc TLV: blinding_point -> blinded_path
3. Inside blinded_path: blinding -> first_path_key
4. Inside onion_message: blinding -> path_key.
5. Inside encrypted_data_tlv: next_blinding_override -> next_path_key_override
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is obsolete (since modern onions) and so removed from spec.
We should not set it, and don't need to handle it specially.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we should support it by default.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: `option_quiesce` enabled by default.
Changelog-Deprecated: Config: --experimental-quiesce: it's now the default.
In particular, this lets you find the exact htlc_maximum_msat/htlc_minimum_msat
values.
This means we actually create real channel_updates for local mods, which
requires a second "local" scratch region.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we don't compact the gossmap on the fly (FIXME!) we can
easily surpass 4GB in the gossmap, and 32 bit offsets are not
sufficient.
I'm a bit surprised we don't crash immediately, but we've definitely
seen issues.
Changelog-Fixed: gossipd: crash errors with large gossip_store (>4MB) growth on longer-running nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This minimizes the need to convert back and forth from and to sat
values, and it also removes a new instance of sats in the public
interface (`channel_hints`).
Suggested-By: Rusty Russell <@rustyrussell>
We need to know the overall channel capacity, i.e., the amount_msat
that the channel was funded with, in order to relax the channel_hint
to refill over time.
It was weird not to have a capacity associated with localmods channels, and
fixing it has some very nice side effects.
Now the gossmap_chan_get_capacity() call never fails (we prevented reading
of channels from gossmap in the partially-written case already), so we
make it return the capacity. We do this in msat, because that's what
all the callers want.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what we want in several places: to only override one or
two fields in a channel_update.
We add a gossmap_local_setchan() with a similar API to the old
gossmap_local_updatechan(), for the case where we want to set every
field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We allow adding them, but crash when we remove the localmods. Yet
this could theoretically happen if a channel we modified was removed
from the gossmap, anyway.
Reported-by: Lagrang3 <lagrang3@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than making the callers do this, make the invoice decoder perform
the various sanity checks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I used `amount_msat_eq(x, AMOUNT_MSAT(0))` because I forgot this
function existed. I probably missed it because the name is surprising,
so add "is" in there to make it clear it's a boolean function.
You'll note almost all the places which did use it are Eduardo's and
Lisa's code, so maybe it's just me.
Fix up a few places which I could use it, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If I put in X, how much can I get out after fees are subtracted?
This was inspired by Eduardo's channel_maximum_forward in renepay, which
is basically the same thing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) We can't simply cast away const to manipulate a string, the compiler can assume
we don't. The type must be made non-const.
2) cisspace() is nicer to use than isspace() (no cast required!)
3) Simply place a NUL terminator instead of using memmove to set it.
4) Use cast_const to add const to char **, where necessary.
5) Add Changelog line, for CHANGELOG.md
Changelog-Fixed: Config: whitespace at the end of (most) options is now ignored, not complained about.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't properly notifying that a channel output has been spent in
the case of it being spent in a splice. This fixes the notification side
of the equation, however there's still some issues remaining for the
bookkeeper side (to come).
Changelog-Fixed: We now send a `coin_movement` notification for splice confirmations of channel funding outpoint spends.
Changelog-Added: pay: The pay plugin now checks whether we have enough spendable capacity before computing a route, returning a clear error message if we don't
This simplifies the callers significantly: all channel_announcements now
have an amount, so gossmap_chan_get_capacity() only fails on a local
modification.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a final sweep to match the current BOLT12 text:
1563d13999d342680140c693de0b9d65aa522372 ("More bolt12 test vectors.")
Only two code changes, to change the order of checks to match the bolt,
and to give a warning on decode if a path is empty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's an internal difference, so doesn't actually break compatibility
(it would if we tried to prove we owned an old invoicerequest, but we
don't have infrastructure for that anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current interface, if given a tweak, uses a *different secret key*
and tweaks it. This was an early experiment: we will switch to using
a secret tweak for invoice_requests like we do for invoice path ids.
To make sure there's no funny business, *hsmd* hashes to form the
tweak (i.e. no zero tweaks!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For now we only use a fake id for requesting invoices (as a payer_key), but we
will eventually use this generically, and we want plugins to be able to map them
too, so use the same scheme as path_id: a generated secret using the makesecret
API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
invoice_path_id is actually a generic path_id thing, so rename it.
We're going to use the same scheme for path secrets and the tweak to
node_id when we create a fake pubkey for invoice_requests, so a new
header is appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Missing offer_description iff offer_amount also missing.
2. Missing offer_issuer_id iff offer_paths is present.
3. Short channel id on introduction point.
4. Experimental range.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>