It's actually the only one that uses it. We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of making a copy.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
No checksum check: 194.52s
Copying for checksum check: 202.81s
Zero-copy checksum check: 194.40s
But these numbers proved noisy. Still, doesn't hurt.
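For illustration only, here is a minimal Python sketch of the difference between the two approaches timed above. The store code itself is C; the buffer layout, the use of the timestamp as the CRC seed and `zlib.crc32` as the checksum are all assumptions made for this example, not the real gossip_store format.

```python
import zlib


def crc_with_copy(buf: bytes, off: int, msg_len: int, timestamp: int) -> int:
    """Checksum a record by first copying the message bytes out of the buffer."""
    msg = buf[off:off + msg_len]  # slicing bytes allocates and copies msg_len bytes
    return zlib.crc32(msg, timestamp)


def crc_zero_copy(buf: bytes, off: int, msg_len: int, timestamp: int) -> int:
    """Checksum the same bytes in place through a memoryview (no copy)."""
    view = memoryview(buf)[off:off + msg_len]  # a window onto buf, not a copy
    return zlib.crc32(view, timestamp)


# Hypothetical store contents: 8 bytes of header, one message, trailing data.
payload = b"channel_announcement payload"
store = b"\x00" * 8 + payload + b"trailing"
```

Both functions return the same value for the same record; only the second avoids allocating a temporary buffer per record.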
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume if it's incorrect, we simply need to wait. If this proves incorrect,
we will see a stream of BROKEN log messages.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
Before: 194.52s
After: 202.81s
So it's marginal.
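As a quick worked check of "marginal", the relative overhead implied by the two timings above can be computed directly. This is a single run on one laptop, so it is an indication rather than a benchmark.

```python
before_s = 194.52  # "Before" timing quoted above
after_s = 202.81   # "After" timing quoted above

overhead_s = after_s - before_s
overhead_pct = 100.0 * overhead_s / before_s
print(f"checksum check adds {overhead_s:.2f}s ({overhead_pct:.1f}%)")
```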
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While this shouldn't happen, it does (pending other fixes), and we stop reading the
gossip store until next time. The result is partial gossip, demonstrated beautifully
by NicolasDorier's report:
```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!"
```
Gossipd stalled there and didn't make any more progress. So gossipd itself
doesn't see the entire gossip_store.
Then things get really batshit:
```
2025-02-04T05:53:28.582Z DEBUG gossipd: Store compact time: 1429910 msec
```
This took 1429 seconds to process. Why?
Because it hasn't been processing the gossip store fully, gossipd kept adding "new" records to the end:
```
2025-02-04T05:53:28.583Z DEBUG gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```
It has 31GB of gossip in there! No wonder it took so long...
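To put the two log lines above in proportion, this small sketch parses them and derives the record count and effective throughput. The regular expressions are written against the lines exactly as quoted here and are not a general log parser.

```python
import re

# The two log lines quoted above, with the timestamp and level prefix dropped.
compact_line = "gossipd: Store compact time: 1429910 msec"
read_line = (
    "gossipd: gossip_store: Read 62716143/1739952/5158256/0 "
    "cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, "
    "now 31634458440 bytes (populated=true)"
)

msec = int(re.search(r"compact time: (\d+) msec", compact_line).group(1))
m = re.search(r"Read (\d+)/(\d+)/(\d+)/(\d+) .* in (\d+) bytes", read_line)
cannounce, cupdate, nannounce, deleted, size = map(int, m.groups())

records = cannounce + cupdate + nannounce + deleted
seconds = msec / 1000
print(f"{records} records, {size / 1e9:.1f} GB, {size / seconds / 1e6:.1f} MB/s")
print(f"channel_announcements per channel_update: {cannounce / cupdate:.1f}")
```

The store holds far more channel_announcements than channel_updates, which fits the picture of the same records being re-added as "new" again and again.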
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
Default goes to stderr for LOG_UNUSUAL and higher.
We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.
In fact, it fixes a race if we don't have HAVE_PWRITEV.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a report of this happening under ZFS. We cannot do much if
this really is a problem where we can't read back what we write, but
this avoids the immediate crash.
Fixes: https://github.com/ElementsProject/lightning/issues/7971
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
The bug fixed in 4e7ba96729 (lightningd: don't kill onchaind
if we are forcing a disconnect.) was actually something which happened
in our generate examples script.
This updates that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
plugins/Makefile has target/${RUST_PROFILE}/cln-grpc depend on the
generated files via $(MSGGEN_GENALL), but cln-rpc/Makefile adds to
that variable, so needs to be included first.
Here's an example build error:
```
Combining schemas from /home/rusty/lightning-ltest/doc/schemas into /home/rusty/lightning-ltest/contrib/msggen/msggen/schema.json
Created /home/rusty/lightning-ltest/contrib/msggen/msggen/schema.json from 2 files
error: failed to run custom build command for `cln-grpc v0.3.0 (/home/rusty/lightning-ltest/cln-grpc)`
Caused by:
process didn't exit successfully: `/home/rusty/lightning-ltest/target/debug/build/cln-grpc-95489e3ba33c0ab3/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=proto/node.proto
cargo:rerun-if-changed=proto
--- stderr
thread 'main' panicked at cln-grpc/build.rs:7:10:
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "protoc failed: node.proto:134:52: \"AskreneageResponse\" is not defined.\nnode.proto:135:23: \"GetroutesRequest\" is not defined.\nnode.proto:135:50: \"GetroutesResponse\" is not defined.\nnode.proto:136:32: \"AskrenedisablenodeRequest\" is not defined.\nnode.proto:136:68: \"AskrenedisablenodeResponse\" is not defined.\nnode.proto:137:34: \"AskreneinformchannelRequest\" is not defined.\nnode.proto:137:72: \"AskreneinformchannelResponse\" is not defined.\nnode.proto:138:34: \"AskrenecreatechannelRequest\" is not defined.\nnode.proto:138:72: \"AskrenecreatechannelResponse\" is not defined.\nnode.proto:139:34: \"AskreneupdatechannelRequest\" is not defined.\nnode.proto:139:72: \"AskreneupdatechannelResponse\" is not defined.\nnode.proto:140:32: \"AskrenebiaschannelRequest\" is not defined.\nnode.proto:140:68: \"AskrenebiaschannelResponse\" is not defined.\nnode.proto:141:37: \"AskrenelistreservationsRequest\" is not defined.\nnode.proto:141:78: \"AskrenelistreservationsResponse\" is not defined.\nnode.proto:142:32: \"InjectpaymentonionRequest\" is not defined.\nnode.proto:142:68: \"InjectpaymentonionResponse\" is not defined.\nnode.proto:143:18: \"XpayRequest\" is not defined.\nnode.proto:143:40: \"XpayResponse\" is not defined.\nnode.proto:145:33: \"StreamBlockAddedRequest\" is not defined.\nnode.proto:145:74: \"BlockAddedNotification\" is not defined.\nnode.proto:146:40: \"StreamChannelOpenFailedRequest\" is not defined.\nnode.proto:146:88: \"ChannelOpenFailedNotification\" is not defined.\nnode.proto:147:36: \"StreamChannelOpenedRequest\" is not defined.\nnode.proto:147:80: \"ChannelOpenedNotification\" is not defined.\nnode.proto:148:30: \"StreamConnectRequest\" is not defined.\nnode.proto:148:68: \"PeerConnectNotification\" is not defined.\nnode.proto:149:32: \"StreamCustomMsgRequest\" is not defined.\nnode.proto:149:72: \"CustomMsgNotification\" is not defined.\n" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
make: *** [plugins/Makefile:305: target/debug/cln-grpc] Error 101
make: *** Waiting for unfinished jobs....
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We didn't update this when we extended the timeout to 120 seconds in
ee3133f198 ("lightningd: increase
startup time for plugins to 120 seconds.")
```
def test_failing_plugins(directory):
fail_plugins = [
os.path.join(os.getcwd(), 'contrib/plugins/fail/failtimeout.py'),
os.path.join(os.getcwd(), 'contrib/plugins/fail/doesnotexist.py'),
]
for p in fail_plugins:
> with pytest.raises(subprocess.CalledProcessError):
E Failed: DID NOT RAISE <class 'subprocess.CalledProcessError'>
tests/test_plugin.py:420: Failed
----------------------------- Captured stdout call -----------------------------
{'github_repository': 'ElementsProject/lightning', 'github_sha': '83dca18c5e9610bfaac766f957387b9a1ec48f50', 'github_ref': 'refs/pull/7887/merge', 'github_ref_name': 'HEAD', 'github_run_id': 13253210143, 'github_head_ref': 'guilt/bolt-updates-after-24.11', 'github_run_number': 12237, 'github_base_ref': 'master', 'github_run_attempt': '2', 'testname': 'test_failing_plugins', 'start_time': 1739239278, 'end_time': 1739239340, 'outcome': 'fail'}
=========================== short test summary info ============================
FAILED tests/test_plugin.py::test_failing_plugins - Failed: DID NOT RAISE <class 'subprocess.CalledProcessError'>
============= 1 failed, 80 passed, 2 skipped in 855.37s (0:14:15) ==============
```
We can actually delete it before counters are updated:
```
wait_for(lambda: len(l3.rpc.listinvoices()['invoices']) == 2)
> assert l3.rpc.autoclean_status()['autoclean']['expiredinvoices']['cleaned'] == 3
E assert 1 == 3
tests/test_plugin.py:3266: AssertionError
```
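A minimal sketch of the race and one way to make such a test robust, assuming a polling helper in the style of the `wait_for` used above. `FakeAutoclean` is a hypothetical stand-in for the plugin, not real test code: it makes the deletion visible before it bumps its counter.

```python
import threading
import time


def wait_for(predicate, timeout=10.0, interval=0.05):
    """Poll predicate() until it is truthy, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(interval)


class FakeAutoclean:
    """Hypothetical stand-in: deletes first, bumps its counter a little later."""

    def __init__(self):
        self.invoices = 5
        self.cleaned = 0

    def clean(self, n):
        self.invoices -= n   # the deletion becomes visible first...
        time.sleep(0.2)
        self.cleaned += n    # ...and the counter catches up afterwards


plugin = FakeAutoclean()
worker = threading.Thread(target=plugin.clean, args=(3,))
worker.start()

wait_for(lambda: plugin.invoices == 2)  # can succeed while cleaned is still 0
wait_for(lambda: plugin.cleaned == 3)   # so wait for the counter as well
worker.join()
```

Asserting on the counter immediately after the first `wait_for` is what makes the test flaky; waiting on the counter itself removes the ordering assumption.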
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We thought it was a good idea to terminate channel subds, but we included onchaind as well!
```
2025-01-29T21:31:46.053Z UNUSUAL 03…00-channeld-chan#255683: Adding HTLC 1737 too slow: killing connection
2025-01-29T21:31:46.053Z INFO 03…00-chan#255683: Peer transient failure in CHANNELD_NORMAL: Adding HTLC timed out: killed connection
2025-01-29T21:31:46.053Z DEBUG 03…00-channeld-chan#255683: Status closed, but not exited. Killing
2025-01-29T21:31:46.054Z DEBUG 03…00-chan#255683: Failing HTLC 1738 due to peer death
2025-01-29T21:31:46.058Z DEBUG 03…00-chan#255683: Failing HTLC 1737 due to peer death
2025-01-29T21:31:46.058Z DEBUG 03…00-chan#255673: Forcing disconnect due to One channel had an error
2025-01-29T21:31:46.059Z DEBUG 03…00-onchaind-chan#255673: Status closed, but not exited. Killing
2025-01-29T21:31:46.093Z DEBUG 03…00-connectd: disconnect_peer
2025-01-29T21:31:46.093Z DEBUG 03…00-lightningd: peer_disconnect_done
```
Reported-by: @whitslack
Fixes: https://github.com/ElementsProject/lightning/issues/8055
Changelog-Fixed: onchaind: don't die if we fail an unrelated channel with the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A Raspberry Pi with bitcoind running and the full gossip topology may actually
hit this, and we have a report of it in practice.
Note that the comment is wrong, so fix that too.
Fixes: https://github.com/ElementsProject/lightning/issues/7724
Reported-by: m-schmoock
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently, pyln tests fail if the `lightning-` prefix is removed from schema/*.json files. In this release, we will update pyln to remove its reliance on this prefix, and in the next release, we will remove the prefixes from the files as well.
Changelog-None.
Test flake where the balance for lightning-2 went negative
```
> assert account_balance(l2, channel_id) == 0
tests/test_closing.py:1314:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/utils.py:183: in account_balance
m_sum -= Millisatoshi(m['debit_msat'])
contrib/pyln-client/pyln/client/lightning.py:193: in __sub__
return Millisatoshi(int(self) - int(other))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = -10000msat, v = -10000
```
This led me to look into the test. lightning-2 should go negative since we
roll back the amounts it's received by going to a prior database state.
Rather than trying to do the right thing with obviously broken node
records, instead we just stop trying to account for them correctly
(impossible).
I also noticed that the anchor tests were failing the utxo output
matchup, which we should be asserting on. The HTLC RBF that our
anchor code creates was causing an issue by creating another wallet
deposit utxo under the HTLC output. We now optionally add this utxo
in the case that anchors are turned on.
Changelog-None: Fix test flake
It's really hard to tell what on earth went wrong when a coin movement
check fails, since we don't return good error info.
Here we replace almost every `assert` with a proper check + error with
message to help make debugging easier.
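A minimal sketch of the pattern, assuming coin movements are compared as lists of dicts. The helper name `check_moves` and the field names in the usage note are hypothetical and only illustrate the idea of raising with context instead of a bare `assert`.

```python
def check_moves(expected, actual):
    """Compare coin movements, raising a descriptive error on the first mismatch."""
    if len(expected) != len(actual):
        raise ValueError(
            f"expected {len(expected)} moves, got {len(actual)}: {actual}"
        )
    for i, (exp, act) in enumerate(zip(expected, actual)):
        for key, want in exp.items():
            if act.get(key) != want:
                raise ValueError(
                    f"move {i}: {key} is {act.get(key)!r}, expected {want!r} "
                    f"(full move: {act})"
                )
```

With a bare `assert exp == act` the failure only says that two dicts differ; here the message names the move index, the field and both values.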
cc @rustyrussell
Changelog-None: improve failure messages
If a user tries to do a splice without signing their inputs we now provide them with a nice error message and cancel the RPC since that wouldn’t be productive for the user anyway.
We also add a helpful message if they do the opposite — try to sign a PSBT where they did not add any inputs.
Changelog-Changed: Update prevents users from trying to splice unsigned PSBTs — protecting against potential issues.
Sure, we have to convert to and from the db set-of-pairs, but that's far simpler
than dealing with the current structure now we want to add code to *remove* from
the blacklist.
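A sketch of the conversion in question, assuming the db stores inclusive (start, end) pairs. This message does not say which in-memory structure replaces them, so a plain Python set is used here purely for illustration.

```python
def pairs_to_ids(pairs):
    """Expand inclusive (start, end) pairs into a set of individual rune ids."""
    ids = set()
    for start, end in pairs:
        ids.update(range(start, end + 1))
    return ids


def ids_to_pairs(ids):
    """Collapse a set of ids back into sorted, inclusive (start, end) pairs."""
    pairs = []
    for i in sorted(ids):
        if pairs and pairs[-1][1] == i - 1:
            pairs[-1] = (pairs[-1][0], i)  # extend the current run
        else:
            pairs.append((i, i))           # start a new run
    return pairs


blacklist = pairs_to_ids([(0, 4), (7, 9)])
blacklist.discard(2)  # removal is a one-liner on the flat form
print(ids_to_pairs(blacklist))
```

Removing an id from the middle of a range splits it in two when converting back, which is exactly the bookkeeping that is awkward to do directly on the pairs.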
Changelog-Changed: JSON-RPC: `blacklistrune` no longer supports runes over id 100,000,000.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This migration was introduced in dccbccf8f2 (pre 23.08), so the only way they
would need this is if they migrate straight from 23.05 to 25.02. And then
the solution is to upgrade to a prior release first, but I'll bet good money
we never, ever see this message:
Commando runes still present? Migration removed in v25.02: call Rusty!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When building from a release zipfile with no `git` present, the version number lacks a ‘v’.
This causes the `is_released_version` check in db.c to fail, because the version does not start with a ‘v’, and so the database upgrade is rejected.
Issue reported by TonyV on Discord:
“Hey guys, I just upgraded to the latest release and am now getting
```
Refusing to irreversibly upgrade db from version 219 to 261 in non-final version 24.11.1 (use --database-upgrade=true to override)
```
I have the db backed up... but am I good to make those changes without breaking my channels?”
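The failure mode described above can be sketched as follows. Both functions are hypothetical Python stand-ins: the real check lives in db.c, and neither its exact rules nor the way the fix is implemented are reproduced here.

```python
def is_released_version(version: str) -> bool:
    """Hypothetical check: release tags look like 'v24.11.1', with no suffix."""
    if not version.startswith("v"):
        return False
    # Assumption: a '-' marks a non-final build such as a release candidate.
    return "-" not in version


def normalize_version(version: str) -> str:
    """Add the leading 'v' that a build without git would otherwise omit."""
    return version if version.startswith("v") else "v" + version
```

Under these assumptions `24.11.1` is treated as non-final and the upgrade is refused, while `v24.11.1` is accepted.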
Changelog-Changed: Fix for people upgrading using source release zip packages.
test_closing_different_fees fails:
```
2024-10-14T08:43:30.2733614Z
2024-10-14T08:43:30.2734133Z # Now wait for them all to hit normal state, do payments
2024-10-14T08:43:30.2735205Z > l1.daemon.wait_for_logs(['update for channel .* now ACTIVE'] * num_peers
2024-10-14T08:43:30.2736233Z + ['to CHANNELD_NORMAL'] * num_peers)
2024-10-14T08:43:30.2736725Z
2024-10-14T08:43:30.2736903Z tests/test_closing.py:230:
...
2024-10-14T08:43:30.2761325Z E TimeoutError: Unable to find "[re.compile('update for channel .* now ACTIVE')]" in logs.
```
For some reason one of the channel_update injections does *not* evoke this message
from gossipd...
Changelog-None: debug!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
- Moved the `Usage` section further down in `createrune` and `commando-rune` for improved UX.
- Added a new example for creating a rune with `read-only` restrictions, extending it to allow only payments of `less than 100,000 sats per day` using the `pay` or `xpay` methods.
- Adjusted formatting by appending an extra space after the `dependentUpon` condition, fixing `[*start* [*end*]][*relist*]` to `[*start* [*end*]] [*relist*]`.
- Relocated `Examples` from the expandable section to a standard heading, as examples are now already placed at the end of the page.
Changelog-None.
After merging the Rust-based `clnrest` plugin into master, all three reproducible build scripts failed with the following error:
```
error: package `socketioxide v0.15.1` cannot be built because it requires rustc 1.75.0 or newer, while the currently active rustc version is 1.73.0
Either upgrade to rustc 1.75.0 or newer, or use
cargo update -p socketioxide@0.15.1 --precise ver
where `ver` is the latest version of `socketioxide` supporting rustc 1.73.0
make: *** [plugins/Makefile:304: target/release/clnrest] Error 101
```
To resolve this, we can either downgrade `socketioxide` to `v0.11.1`, which is compatible with `Rust >=v1.67`, or upgrade Rust to `v1.75`.
Since the latest Rust version is `1.84`, upgrading to `1.75` seems like a reasonable choice, as it is already 13 months old.
Changelog-None.
If you re-run a node several times, the log file fills with info from
previous runs. To avoid looking at old logs, only parse the most recent
run's logs when looking for the magic CLN rest startup/deactivated
strings.
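The idea can be sketched in Python as below (the actual script is shell). The marker and status strings in the sample log are placeholders invented for this example, not the real strings the script greps for.

```python
def latest_run(lines, start_marker):
    """Return only the lines from the last occurrence of start_marker onwards."""
    last = 0
    for i, line in enumerate(lines):
        if start_marker in line:
            last = i
    return lines[last:]


def plugin_active(lines, start_marker, up, down):
    """True if `up` appears in the latest run and `down` does not."""
    run = latest_run(lines, start_marker)
    return any(up in l for l in run) and not any(down in l for l in run)


# Hypothetical log: the plugin came up in the first run but not in the second.
log = [
    "node starting",
    "clnrest: REST server running",
    "node starting",
    "clnrest: plugin deactivated",
]
```

Searching the whole file would find the startup string from the first run and wrongly report the plugin as active; restricting the search to the last run does not.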
Changelog-Fixed: startup-regtest.sh now only inspects the most recent run's logs for the active status of the clnrest plugin
We used to not print what happened with an HTLC in the `pay`
plugin. This meant that to follow the HTLCs we'd have to map the `pay`
HTLCs to the `lightningd` HTLCs, and then trace that. By having `pay`
print the outcome as it sees it, we can make that tracking much
simpler, even allowing for tooling to do it for us.
Changelog-None: This is a log-only change
Adding the newly introduced RPCs, `editdescriptionbyoutpoint` and `editdescriptionbypaymentid`, to the `Makefile` for generating the corresponding `.md` files required for the documentation portal.
Changelog-None.
I'm not sure why this happens, and suspect it is caused by an issue elsewhere, so
add some verbose debugging, don't crash.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8017