We were getting the following message in test_feerate_stress:
```
2024-07-08T02:15:45.5663941Z lightningd-2 2024-07-08T02:13:45.696Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Peer did not close, forcing close
```
I can reproduce it locally if I run the test enough, and finally found
the issue by printing the status of the fd when we time it out (using
routines from connectd.c).
The peer fd alternates between reading and writing. When we go to
discard it, we wake the write queue, so write_to_peer() get called.
It won't shutdown the socket if there are still subds attached, and
will wait again for a read.
The last subd exit has to also wake the write queue if we're draining,
so it can do the io_sock_shutdown. Otherwise, we hit the timeout,
causing the message above.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Theoretical only, but we could leak an fd if we closed a conn before
the fd was sent. This doesn't happen in our current codebase because
we only hand fds to connectd, which only closes at shutdown.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only hand fds to connectd for now, so this doesn't happen (we don't
destroy that queue except on shutdown). But it's still a potential issue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the peer sends you their signatures before we return from the *first*
openchannel_update then the state would still be in
MULTIFUNDCHANNEL_STARTED (not UPDATED) and we'd incorrectly switch the
state to MULTIFUNDCHANNEL_SIGNED, which would plunge us headfirst into
`check_sigs_ready`.
However the `mfc->txid` hadn't been set yet, so we'd crash when trying
to confirm that we've got sigs for the correct txid. Oops.
To fix this, we simply invert the check such that the *only* state that
will move into SIGNED is SECURED
Interestingly, this patch and the last take my total test time on my
build machine down by about 10%. From 403.93s (0:06:43) to 366.64s (0:06:06)
though there were some unrelated errors.
Changelog-Changed: connectd: I/O optimizations to significantly speed up larger nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A fairly simple change: ccan/io will now call the underlying I/O
routines repeatedly until they indicate they are unfinished, *or* fail
with EAGAIN. This should make a significant difference to large
nodes, which currently spend far too much time calling poll() to
discover a single fd is still writable (mainly, for streaming gossip).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: now should use far less CPU on large nodes.
common/version.o depends on common/version_gen.h explicitly already,
and header_versions_gen.h is only used by lightningd, so we can make
that dependency explicit.
This means when version changes (i.e. different git commit) we only
relink, not recompile.
Before:
```
real 0m6.578s
user 0m14.705s
sys 0m13.621s
```
After:
```
real 0m2.098s
user 0m6.763s
sys 0m4.844s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This slows compilation somewhat when we make small changes (e.g. making the tree dirty).
Instead, simply mark them as "pre-XXX" or "XXX".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Their dependencies seem correct, so we shouldn't have to build them every time.
Before:
```shell
$ time make -s
real 0m5.183s
user 0m5.134s
sys 0m11.715s
```
After:
```shell
$ time make -s
real 0m0.784s
user 0m0.775s
sys 0m1.159s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The file `cln-rpc/src/notifications.rs` is generated by `msggen`.
The `Makefile` has tooling to update the file when needed
- `make check-gen-updated` should error if file isn't properly updated
- `make gen` updates the file
This comit fixes both behaviors mentioned above.
As suggested by @cdecker, I compared `poetry show --tree` results from before/after `cd plugins/clnrest/ && poetry export --output requirements.txt && pip install -r requirements.txt` to check for any significant difference. There is no difference in the tree.
But poetry export removed coincurve & other dependencies from clnrest's requirements.txt.
Changelog-None.
clnrest installation instruction on the markdown tries to install `flask_restx` not `flask-restx`.
Reference: clnrest plugin complains of flask_restx missing (#7383)
Changelog-None.
Documents which are deleted from lightning schema were still listed on readme server.
This script was adding or updating the list but delete action was missing.
Changelog-None.
Shahana decided this was the optimal UX path, though I insisted that we still
report the actual problem directly when in dev mode, as a compromise.
Suggested-by: https://github.com/Amperstrand
Changelog-Changed: JSON-RPC: Do not return the contents of invalid parameters in error messages, refer to logs (use 'check' to get full error messages)
Fixes: https://github.com/ElementsProject/lightning/issues/7338
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It doesn't actually log, just gets the `struct logger`, so this name is better.
We're also going to implement command_log() as an actual logging
function in next commit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
2024-06-24T05:09:32.5233603Z assert l1.rpc.listfunds()["channels"][0]["state"] == "ONCHAIN"
2024-06-24T05:09:32.5234187Z > assert l2.rpc.listfunds()["channels"][0]["state"] == "ONCHAIN"
2024-06-24T05:09:32.5234917Z E AssertionError: assert 'FUNDING_SPEND_SEEN' == 'ONCHAIN'
2024-06-24T05:09:32.5235464Z E - ONCHAIN
2024-06-24T05:09:32.5235773Z E + FUNDING_SPEND_SEEN
2024-06-24T05:09:32.5236096Z
2024-06-24T05:09:32.5236242Z tests/test_misc.py:2907: AssertionError
```
It's possible that l2 hasn't seen the onchain tx yet. What we really want to do is
wait for it, then test that l1 really can spend the output it has recovered.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
2024-06-24T05:14:14.9939033Z > l1.rpc.call('dev-feerate', [l2.info['id'], rate - 5])
2024-06-24T05:14:14.9939354Z
2024-06-24T05:14:14.9939466Z tests/test_connection.py:3439:
...
2024-06-24T05:14:14.9967617Z > raise RpcError(method, payload, resp['error'])
2024-06-24T05:14:14.9968833Z E pyln.client.lightning.RpcError: RPC call failed: method: dev-feerate, payload: ['022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 2290], error: {'code': -1, 'message': 'Peer bad state'}
```
The disconnect can actually take a while: wait for both sides to
believe they're reconnected. Also wait a little to make sure the
feerate change doesn't create a bad signature.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightningd-3 penalizes lightningd-2 but then it can see the preimage for the HTLC which
has already been timed out:
```
2024-06-24T02:41:29.4633900Z lightningd-3 2024-06-24T02:33:54.073Z **BROKEN** 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-onchaind-chan#1: HTLC already resolved by THEIR_HTLC_TIMEOUT_TO_THEM when we found preimage
```
This is fair: the test deliberately takes l3 offline for long enough
that the HTLC can get timed out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Found by very slow CI: the two io_breaks() can combine into one, so we
block waiting for the second initialization which never happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
checkchain_timer is run by chaintopology, so why have the pointer in
bitcoind?
And since we free the timers, we don't need them to self-disable by
checking the stopped flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Handling half in main() and half here was a mess. And the name
"max_blockheight" was poor: it was the max in the db, or UINT32_MAX,
but then we changed it depending on what height we wanted to start at.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/6924
Changelog-Changed: lightningd: we wait for bitcoind if it has somehow gone backwards (as long as header height is still ok).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current code is confusing: there are polling loops, but we wait for
them to run once.
Be explicit: make the calls once, then start the loops in begin_topology.
This removes various chain_topology struct members which only exist for
startup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>