Be more graceful in shutting down: this should fix the issue where
bookkeeper gets upset that its commands are rejected during shutdown,
and generally make things more graceful.
1. Stop any new RPC connections.
2. Stop any per-peer daemons (channeld, etc).
3. Shut down plugins.
4. Stop all existing RPC connections.
5. Stop global daemons.
6. Free up peer, chanen HTLC datastructures.
7. Close database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: RPC operations are now still available during shutdown.
AFAICT we should have been doing this since we started sending and
receiving it, but didn't.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now advertize the `option_channel_type` feature (which we actually supported since v0.10.2)
We were waiting too long for the reconnect to happen (60s default),
which caused this test to timeout.
When testing, let's speed up the reconnect.
L2 tried to reconnect but didn't have connection information in its
gossip -- is there a way to ask/save connection data from a node you're
making a channel with that doesn't rely on their node_announcement?
Most unexpected ones are still 1, but there are a few recognizable error codes
worth documenting.
Rename the HSM ones to put ERRCODE_ at the front, since we have non-HSM ones
too now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If you get the wrong hsm_secret, your node_id will change, and
peers won't know who you are, bitcoind will reject your transaction
signatures, and other madness.
Catch this as soon as it happens, by storing our node_id in the db.
Suggested-by: @cdecker, @fiatjaf
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Config: `lightningd` will refuse to start with the wrong node_id (i.e. hsm_secret changes).
Complete implementation of BOLT1 port derivation proposal https://github.com/lightning/bolts/pull/968
Changelog-Added: rpc: use the standard port derivation in connect command when the port is not specified.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't. This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.
Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.
```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509== at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509== by 0x22D531: tal_bytelen (tal.c:637)
==895509== by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509== by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509== by 0x1497EC: main (lightningd.c:1090)
==895509==
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I've wanted this for a while: the ability to log to multiple places
at once.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: lightningd: `log-file` option specified multiple times opens multiple log files.
Mostly comments and docs: some places are actually paths, which
I have avoided changing. We may migrate them slowly, particularly
when they're user-visible.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I removed these prematurely: we *haven't* had a release since
introducing them!
This consists of reverting d15d629b8b
"plugins/fetchinvoice: remove obsolete string-based API." and
plugins/fetchinvoice: remove obsolete string-based
API. "onion_messages: remove obs2 support."
Some minor changes due to updated fromwire_tlv API since they
were removed, but not much.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: REVERT: Removed backwards compat with onion messages from v0.10.1.
Activate means a specific thing now (connectd said something), so avoid
confusing it with this function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise we get weird effects, as htlcs are being freed:
```
2022-01-26T05:07:37.8774610Z lightningd-1: 2022-01-26T04:47:48.770Z DEBUG 030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Failing HTLC 18446744073709551615 due to peer death
2022-01-26T05:07:37.8775287Z lightningd-1: 2022-01-26T04:47:48.770Z **BROKEN** 030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Neither origin nor in?
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to stream gossip through this, but currently connectd treats the
fd as synchronous. While we work on getting rid of that, it's easiest to
have two fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also no longer strip the type off: everyone handles both forms, and
Eclair doesn't strip (and it's easier!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is in preparation for gossipd feeding us the latest channel_updates,
rather than having lightningd and channeld query gossipd when it wants
to send an onion error with an update included.
This means gossipd will start telling us the updates, so we need the
channels loaded first.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightningd would race with the subd destructor to do the waitpid(),
resulting in UNUSUAL log messages, but also us missing if a plugin
was killed via a signal.
We can also get rid of the gratuitous waitpid() in test_subdaemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Get rid of the 'movement_idx', since we replay events now.
Since we're removing a field from the 'coin_movement' event emission, we
bump the version type.
Changelog-Updated: `coin_movements` events have been revamped and are now on version 2.
The only thing that needs ld->wallet after this is destroy_invoices_waiter (off jsonrpc)
Could not find any other destructors (destroy_*) that need wallet or db access after this.
Any db access would now segfault.
The idea is to have different default ports for different networks.
Current default port is `9735` for everything. Let's use it for
the mainnet and reuse the difference added to the default port
from `rpc_port` values in `bitcoin/chainstate.c`.
Testnet would be `19735` (adding rpc_port - 8332 = `10000`).
Signet would be `39735` (adding rpc_port - 8332 = `30000`).
Regtest would be `19846` (adding rpc_port - 8332 = `10111`).
With Vincenzo's kind pair-programming help over tmate.
Two other commits were squashed into this one so that bisecting
never ends up in half-baked state:
1. chainparams: Fix regtest default rpc_port
bitcoind -help says this:
-rpcport=<port>
Listen for JSON-RPC connections on <port> (default: 8332, testnet:
18332, signet: 38332, regtest: 18443)
2. test_gossip: Default port for regtest
hex: 2607 is now .... (could be 4d86 but Elements uses another port)
dec: 9735 is now any port (could be 19846 ^^ but now is for any port)
The lines which were binding to default port were removed as the
default port is different on each network.
NOTE: Remember not to modify gossip_store tests which loads everything raw
including the checksums.
Changelog-Changed: If the port is unspecified, the default port is chosen according to used network similarly to Bitcoin Core.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Setting SIGCHLD back to default (i.e. ignored) makes waitpid hang on an
old SIGCHLD that was still in the pipe?
This happens running test_important_plugin with developer=1:
(or with dev=0 and build-in plugins subscribed to "shutdown")
0 0x00007ff8336b6437 in __GI___waitpid (pid=-1, stat_loc=0x0, options=1) at ../sysdeps/unix/sysv/linux/waitpid.c:30
1 0x000055fb468f733a in sigchld_rfd_in (conn=0x55fb47c7cfc8, ld=0x55fb47bdce58) at lightningd/lightningd.c:785
2 0x000055fb469bcc6b in next_plan (conn=0x55fb47c7cfc8, plan=0x55fb47c7cfe8) at ccan/ccan/io/io.c:59
3 0x000055fb469bd80b in do_plan (conn=0x55fb47c7cfc8, plan=0x55fb47c7cfe8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
4 0x000055fb469bd849 in io_ready (conn=0x55fb47c7cfc8, pollflags=1) at ccan/ccan/io/io.c:417
5 0x000055fb469bfa26 in io_loop (timers=0x55fb47c41198, expired=0x7ffdf4be9028) at ccan/ccan/io/poll.c:453
6 0x000055fb468f1be9 in io_loop_with_timers (ld=0x55fb47bdce58) at lightningd/io_loop_with_timers.c:21
7 0x000055fb46924817 in shutdown_plugins (ld=0x55fb47bdce58) at lightningd/plugin.c:2114
8 0x000055fb468f7c92 in main (argc=22, argv=0x7ffdf4be9228) at lightningd/lightningd.c:1195
since PR #3867 utxos are unreserved by height, destroy_utxos and
related functions are not used anymore so clean them up also
However free(ld->jsonrpc) still needs to happen before free(ld) because its
destructors need list_head pointers from ld
because:
- shutdown_subdaemons can trigger db write, comments in that function say so at least
- resurrecting the main event loop with subdaemons still running is counter productive
in shutting down activity (such as htlc's, hook_calls etc.)
- custom behavior injected by plugins via hooks should be consistent, see test
in previous commmit
IDEA:
in shutdown_plugins, when starting new io_loop:
- A plugin that is still running can return a jsonrpc_request response, this triggers
response_cb, which cannot be handled because subdaemons are gone -> so any response_cb should be blocked/aborted
- jsonrpc is still there, so users (such as plugins) can make new jsonrpc_request's which
cannot be handled because subdaemons are gone -> so new rpc_request should also be blocked
- But we do want to send/receive notifications and log messages (handled in jsonrpc as jsonrpc_notification)
as these do not trigger subdaemon calls or db_write's
Log messages and notifications do not have "id" field, where jsonrpc_request *do* have an "id" field
PLAN (hypothesis):
- hack into plugin_read_json_one OR plugin_response_handle to filter-out json with
an "id" field, this should
block/abandon any jsonrpc_request responses (and new jsonrpc_requests for plugins?)
Q. Can internal (so not via plugin) jsonrpc_requests called in the main io_loop return/revive in
the shutdown io_loop?
A. No. All code under lightningd/ returning command_still_pending depends on either a subdaemon, timer or
plugin. In shutdown loop the subdaemons are dead, timer struct cleared and plugins will be taken
care of (in next commits).
fixup: we can only io_break the main io_loop once
This loads up 20MB of plugins temporarily; we seem to be getting OOM
killed under CI and I wonder if this is contributing.
Doesn't significantly reduce runtime here, but I have lots of memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #4868
ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`. We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
This also inadvertently fixes a latent bug: before this patch, in the
`subd` function in `lightningd/subd.c`, we would close `execfail[1]`
*before* doing an `exec`.
We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the
fd is marked CLOEXEC), and otherwise use it to pump `errno` to the
parent.
The intent is that this fd should be kept open until `exec`, at which
point CLOEXEC triggers and close that fd and sends the EOF, *or* if
`exec` fails we can send the `errno` to the parent process vua that
pipe-end.
However, in the previous version, we end up closing that fd *before*
reaching `exec`, either in the loop which `dup2`s passed-in fds (by
overwriting `execfail[1]` with a `dup2`) or in the "close everything"
loop, which does not guard against `execfail[1]`, only
`dev_disconnect_fd`.
After recent header files clean-up it was not possible to
build c-lightning 7401b2682. This patch fixes it both for
Alpine Linux and OpenBSD.
Proposed-by: nathanael <nathanael@dalliard.ch>
Changelog-None
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That was quick!
We remove the 50% test, since the default is now to use quickclose.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now perform quick-close if the peer supports it.