Commit Graph

134 Commits

Author SHA1 Message Date
niftynei
9b8909e507 dual-fund: keep track of aborted requests, seamlessly restart daemon
Clean restart of daemon after a tx-abort is a nice way to work around
the 'persistent' disconnect that we t-bast noticed.

Changelog-Fixed: `dualopend`: Fix behavior for tx-aborts. No longer hangs, appropriately continues re-init of RBF requests without reconnction msg exchange.
2023-07-30 15:20:04 +09:30
Rusty Russell
5148fcaeed lightningd: fix false memleak report (test flake)!
We get intermittant reports of subd->conn being leaked, but I could never find it.
That's because it's actually subd which is not referenced any more: subd->conn
gets reported because it's subd's tal_parent (and, except for the reference in
subd, not referenced either).

The real issue is that the channel->owner is reassigned to the new subdaemon,
and the old one is still exiting.  During that time, we can see a "leak".

```
- Node /tmp/ltests-hkr089bp/test_sql_1/lightning-3/ has memory leaks: [
   {
       "backtrace": [
           "ccan/ccan/tal/tal.c:477 (tal_alloc_)",
           "ccan/ccan/io/io.c:91 (io_new_conn_)",
           "lightningd/subd.c:774 (new_subd)",
           "lightningd/subd.c:828 (new_channel_subd_)",
           "lightningd/dual_open_control.c:3662 (peer_restart_dualopend)",
           "lightningd/peer_control.c:1161 (connect_activate_subd)",
           "lightningd/peer_control.c:1273 (peer_connected_hook_final)",
           "lightningd/plugin_hook.c:213 (plugin_hook_callback)",
           "lightningd/plugin.c:591 (plugin_response_handle)",
           "lightningd/plugin.c:702 (plugin_read_json_one)",
           "lightningd/plugin.c:747 (plugin_read_json)",
           "ccan/ccan/io/io.c:59 (next_plan)",
           "ccan/ccan/io/io.c:407 (do_plan)",
           "ccan/ccan/io/io.c:417 (io_ready)",
           "ccan/ccan/io/poll.c:453 (io_loop)",
           "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)",
           "lightningd/lightningd.c:1249 (main)"
       ],
       "label": "ccan/ccan/io/io.c:91:struct io_conn",
       "parents": [
           "lightningd/lightningd.c:107:struct lightningd"
       ],
       "value": "0x556c63c859f8"
   }
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-24 16:30:31 +02:00
Rusty Russell
2bf0b922ca lightningd: add "has_io_logging" helper.
Rather than exposing the filtering internals.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-19 19:13:57 +09:30
Rusty Russell
c074fe050f lightningd/log: clean up nomenclature.
`struct log` becomes `struct logger`, and the member which points to the
`struct log_book` becomes `->log_book` not `->lr`.

Also, we don't need to keep the log_book in struct plugin, since it has
access to ld's log_book.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-19 19:13:57 +09:30
Rusty Russell
375215a141 lightningd: more graceful shutdown.
Be more graceful in shutting down: this should fix the issue where
bookkeeper gets upset that its commands are rejected during shutdown,
and generally make things more graceful.

1. Stop any new RPC connections.
2. Stop any per-peer daemons (channeld, etc).
3. Shut down plugins.
4. Stop all existing RPC connections.
5. Stop global daemons.
6. Free up peer, chanen HTLC datastructures.
7. Close database.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: RPC operations are now still available during shutdown.
2022-09-12 14:00:41 +02:00
Rusty Russell
f6f1844e15 options: let log-level subsystem filter also cover nodeid.
That's useful for "tell me everything about this node" debugging.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5348
Changelog-Added: lightningd: `log-level=debug:<partial-nodeid>` supported to get debug-level logs for everything about a peer.
2022-07-09 09:59:52 +09:30
Rusty Russell
e120b4afd6 lightningd: add more information should subd send wrong message.
I saw this once, but could not track it down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
3f98cf3fce lightningd: track weird CI crash in test_important_plugin
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't.  This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.

Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.

```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509==    at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509==    by 0x22D531: tal_bytelen (tal.c:637)
==895509==    by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509==    by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509==    by 0x1497EC: main (lightningd.c:1090)
==895509== 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
5c949e3116 subd: make channel/peer own the subd.
We get some memleak reports because ld owns the subd, but once
the peer/channel is freed, there's no reference for the brief time
until the subd exits.

This happens for both opening and closingd.  For openingd, the
peer owns it, for others (including dualopend) the channel owns it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-30 06:27:52 +10:30
Rusty Russell
00bb6f07d7 lightningd: simplify memleak code.
Instead of doing this weird chaining, just call them all at once and
use a reference counter.

To make it simpler, we return the subd_req so we can hang a destructor
off it which decrements after the request is complete.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-10 09:40:09 +10:30
niftynei
ce12d2b8a9 database: pull out database code into a new module
We're going to reuse the database controllers for the accounting plugin
2022-03-05 15:03:34 +10:30
Rusty Russell
0c24334738 lightningd: clean up subds before freeing HTLCs.
Otherwise we get weird effects, as htlcs are being freed:

```
2022-01-26T05:07:37.8774610Z lightningd-1: 2022-01-26T04:47:48.770Z DEBUG   030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Failing HTLC 18446744073709551615 due to peer death
2022-01-26T05:07:37.8775287Z lightningd-1: 2022-01-26T04:47:48.770Z **BROKEN** 030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Neither origin nor in?
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell
3c5d27e3e9 subdaemons: remove gossipd fd from per-peer daemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell
1c71c9849b connectd: handle custom messages.
This is neater than what we had before, and slightly more general.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
2022-02-08 11:15:52 +10:30
Rusty Russell
4a4f85dd3f subd: fix waitpid properly.
lightningd would race with the subd destructor to do the waitpid(),
resulting in UNUSUAL log messages, but also us missing if a plugin
was killed via a signal.

We can also get rid of the gratuitous waitpid() in test_subdaemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-25 06:26:52 +10:30
Rusty Russell
d51fb5207a msg_queue: don't allow magic MSG_PASS_FD message for peers.
msg_queue was originally designed for inter-daemon comms, and so it has
a special mechanism to mark that we're trying to send an fd.  Unfortunately,
a peer could also send such a message, confusing us!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell
741f44725a patch lightningd-peer-fds.patch 2022-01-20 15:24:06 +10:30
Rusty Russell
6115ed02e8 subdaemons: don't stream gossip_store at all.
We now let gossipd do it.

This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell
7a514112ec connectd: do dev_disconnect logic.
As connectd handles more packets itself, or diverts them to/from gossipd,
it's the only place we can implement the dev_disconnect logic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell
967ffbfbcb global: use tal_dup_or_null().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-30 14:36:55 +10:30
Rusty Russell
90b669857e lightningd: handle channel cleanups more explicitly.
1. Freeing an unconfirmed channel already releases the subd, so don't
   do that explicitly.
2. Use channel->owner to transfer ownership where possible, using
   channel_set_owner() which handles all the cases.

This simplifies the code and makes it more readable, IMHO.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-30 09:50:40 +10:30
Rusty Russell
18526a3a5b lightningd: close one more fd for subdaemons.
Noticed by stracing for an unrelated problem.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-30 09:50:40 +10:30
Rusty Russell
4ffda340d3 check: make sure all files outside contrib/ include "config.h" first.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).

config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-06 10:05:39 +10:30
Rusty Russell
d8c59fca77 lightningd: fix compilation error on OpenBSD
```

cc lightningd/subd.c
lightningd/subd.c:216:7: error: expected identifier or '('
                int stdout = STDOUT_FILENO, stderr = STDERR_FILENO;
                    ^
/usr/include/stdio.h:198:17: note: expanded from macro 'stdout'
                 ^
lightningd/subd.c:216:7: error: expected ')'
/usr/include/stdio.h:198:17: note: expanded from macro 'stdout'
                 ^
lightningd/subd.c:216:7: note: to match this '('
/usr/include/stdio.h:198:16: note: expanded from macro 'stdout'
                ^
lightningd/subd.c:224:12: error: cannot take the address of an rvalue of type 'FILE *' (aka 'struct __sFILE *')
                fds[1] = &stdout;
                         ^~~~~~~
lightningd/subd.c:225:12: error: cannot take the address of an rvalue of type 'FILE *' (aka 'struct __sFILE *')
                fds[2] = &stderr;
                         ^~~~~~~
4 errors generated.
gmake: *** [Makefile:279: lightningd/subd.o] Error 1
```

Changelog-None: introduced since last release.
Fixes: #4914
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> ```
2021-11-17 14:53:30 +10:30
Rusty Russell
78ebddeece subd: clean up our fd shuffling logic.
It's both complex and flawed, as ZmnSCPxj points out.  Make a generic
fd ordering routine, and use it.

Plus, test it!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-11-10 10:27:15 +10:30
ZmnSCPxj jxPCSnmZ
5356267f15 *: Use new closefrom module from ccan.
This also inadvertently fixes a latent bug: before this patch, in the
`subd` function in `lightningd/subd.c`, we would close `execfail[1]`
*before* doing an `exec`.
We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the
fd is marked CLOEXEC), and otherwise use it to pump `errno` to the
parent.
The intent is that this fd should be kept open until `exec`, at which
point CLOEXEC triggers and close that fd and sends the EOF, *or* if
`exec` fails we can send the `errno` to the parent process vua that
pipe-end.

However, in the previous version, we end up closing that fd *before*
reaching `exec`, either in the loop which `dup2`s passed-in fds (by
overwriting `execfail[1]` with a `dup2`) or in the "close everything"
loop, which does not guard against `execfail[1]`, only
`dev_disconnect_fd`.
2021-10-22 13:17:37 +02:00
Rusty Russell
7401b26824 cleanup: remove unneeded includes in C files.
Before:
 Ten builds, laptop -j5, no ccache:

```
real	0m36.686000-38.956000(38.608+/-0.65)s
user	2m32.864000-42.253000(40.7545+/-2.7)s
sys	0m16.618000-18.316000(17.8531+/-0.48)s
```

 Ten builds, laptop -j5, ccache (warm):

```
real	0m8.212000-8.577000(8.39989+/-0.13)s
user	0m12.731000-13.212000(12.9751+/-0.17)s
sys	0m3.697000-3.902000(3.83722+/-0.064)s
```

After:
 Ten builds, laptop -j5, no ccache: 8% faster

```
real	0m33.802000-35.773000(35.468+/-0.54)s
user	2m19.073000-27.754000(26.2542+/-2.3)s
sys	0m15.784000-17.173000(16.7165+/-0.37)s
```

 Ten builds, laptop -j5, ccache (warm): 1% faster

```
real	0m8.200000-8.485000(8.30138+/-0.097)s
user	0m12.485000-13.100000(12.7344+/-0.19)s
sys	0m3.702000-3.889000(3.78787+/-0.056)s
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-09-17 09:43:22 +09:30
Rusty Russell
f97a51cc0f lightningd: don't send other messages until we've received version.
This avoids subdaemons complaining about malformed messages from us,
or doing the completely wrong thing, if they are really the wrong
version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-04-24 13:56:58 +09:30
Rusty Russell
32d650f9df lightningd: don't abort on incorrect versions, but try to re-exec.
You still shouldn't do this (you could get some transient failures),
but at least you have a decent chance if you reinstall over a running
daemon, instead of getting confusing internal errors if message
formats have changed.

Changelog-Added: lightningd: we now try to restart if subdaemons are upgraded underneath us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #4346
2021-04-24 13:56:58 +09:30
Rusty Russell
b36e9fe473 status: new message for subdaemons to tell us their versions.
For this patch we simply abort if it's wrong.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-04-24 13:56:58 +09:30
niftynei
4c9a4250bf subd: remove "swap" methods
only needed for moving the subd->channel from an uncommitted_channel to
a channel; we removed uncommitted_channel from dual_open so it's no
longer necessary
2021-03-03 16:19:04 -06:00
niftynei
de3599e98a subd: remove ctype (channel_type)
We only needed the type check for dual_open, since it was the only
subdaemon path that used two 'types' in the subd->channel field.
2021-03-03 16:19:04 -06:00
Rusty Russell
d14e273b04 common: treat all "all-channels" errors as if they were warnings.
This is in line with the warnings draft, where all-zeroes in a
channel_id is no longer special (i.e. it will be ignored).

But gossipd would send these if it got upset with us, so it's best
practice to ignore them for now anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we treat error messages from peer which refer to "all channels" as warnings, not errors.
2021-02-04 12:02:52 +10:30
niftynei
bf49bcfa90 subd: keep track of 'channel's type
Back in the days before dual-funding, the `channel` struct on subd was
only every one type per daemon (either struct channel or struct
uncommitted_channel)

The RBF requirement on dualopend means that dualopend's channel,
however, can now be two different things -- either channel or
uncommitted_channel.

To track the difference/disambiguate, we now track the channel type on a
flag on the subd. It gets updated when we swap out the channel.
2021-01-10 13:44:04 +01:00
niftynei
c8aa6d4a55 subd: swap out the channel + error callback
dual funding now swaps out the subdaemon's 'channel' struct in the
middle of daemon existence, so we update the channel and error callback
here.
2021-01-10 13:44:04 +01:00
Rusty Russell
8150d28575 Makefile: use generic rules to make spec-derived sources.
Now we use the same Makefile rules for all CSV->C generation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-31 21:33:26 -05:00
Rusty Russell
3e52d4100d common: convert to new wire generation style.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-25 12:53:13 +09:30
Ken Sedgwick
5fd0ed79f4
lightningd: Added --subdaemon command to allow alternate subdaemons.
Changelog-Added: lightningd: Added --subdaemon command to allow alternate subdaemons.

[ Wow, that was mammoth; 44 comments over 12 commits. Feels almost unfair to squash it into one commit, so I wanted to note @ksedgwic's perseverence here! --RR ]
2020-02-04 10:44:13 +10:30
Christian Decker
f08447d624 subd: Allow sending common messages to subdaemons 2020-01-28 23:50:52 +01:00
Rusty Russell
4fc498f901 lightningd: enable io logging on subdaemons iff we're going to print it.
This simplifies our tests, too, since we don't need a magic option to
enable io logging in subdaemons.

Note that test_bad_onion still takes too long, due to a separate minor
bug, so that's marked and left dev-only for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-18 04:50:22 +00:00
Rusty Russell
ef7a820ab1 log: make formatting more consistent.
1. Printed form is always "[<nodeid>-]<prefix>: <string>"
2. "jcon fd %i" becomes "jsonrpc #%i".
3. "jsonrpc" log is only used once, and is removed.
4. "database" log prefix is use for db accesses.
5. "lightningd(%i)" becomes simply "lightningd" without the pid.
6. The "lightningd_" prefix is stripped from subd log prefixes, and pid removed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-changed: Logging: formatting made uniform: [NODEID-]SUBSYSTEM: MESSAGE
Changelog-removed: `lightning_` prefixes removed from subdaemon names, including in listpeers `owner` field.
2019-11-18 04:50:22 +00:00
Rusty Russell
86fb54a33b lightningd: remove per-peer log book.
We had a separate logbook for each peer, and copy log entries above
the printable log level into the master logbook.  This didn't always
work well, since we didn't dump it on crash for example.

Keep a single global logbook instead, and remove this infrastructure.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-18 04:50:22 +00:00
Rusty Russell
e433d4ddc1 lightningd: have logging include an optional node_id for each entry.
A log can have a default node_id, which can be overridden on a per-entry
basis.  This changes the format of logging, so some tests need rework.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-18 04:50:22 +00:00
Rusty Russell
4fa7b30836 lightningd: have optional node_id associated with subdaemons.
We'll use this for logging it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-18 04:50:22 +00:00
Rusty Russell
a3273d4c84 developer: IFDEV() macro.
There are some more #if DEVELOPER one-liners coming, this makes them
clear, but still lets them stand out.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
Christian Decker
803007ecdf db: Make the db struct private and provide accessors instead
We will soon generalize the DB, so directly reaching into the `struct db`
instance to talk to the sqlite3 connection is bad anyway. This increases
flexibility and allows us to tailor the actual implementation to the
underlying DB.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
2019-09-05 23:41:05 +00:00
Rusty Russell
0954feddc7 json: speed up shutdown.
We currently end up sleeping for 1 second for channeld and gossipd:
better to use a normal blocking waitpid and an alarm to wake us in
case they don't exit.

This speeds up `lightning-cli stop` on my machine from 2.008s to 0.008s:
a 286 times speedup!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-22 01:34:03 +00:00
Rusty Russell
dd79813a75 common: add peer_error flag to treat this error as "soft".
The spec says to close the channel if they send us an error, but we
need to be more lenient to preserve channels with other
implementations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-26 03:53:03 +00:00
Rusty Russell
38d2899fbb common/per_per_state: generalize lightningd/peer_comm Part 1
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
ZmnSCPxj
37440e9447 lightningd/subd.c: Return NULL from subd_shutdown.
And set pointers to shut down daemons as NULL in lightningd.
2019-05-31 15:01:58 +02:00