This has been here for a while: self_id hangs around while we're
calling the hook, but now it triggers sometimes.
```
E ValueError:
E Node errors:
E Global errors:
E - Node /tmp/ltests-3mcyp67u/test_dev_rawrequest_1/lightning-1/ has memory leaks: [
E {
E "backtrace": [
E "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
E "gossipd/gossipd_wiregen.c:528 (fromwire_gossipd_got_onionmsg_to_us)",
E "lightningd/onion_message.c:152 (handle_onionmsg_to_us)",
E "lightningd/gossip_control.c:137 (gossip_msg)",
E "lightningd/subd.c:548 (sd_msg_read)",
E "ccan/ccan/io/io.c:59 (next_plan)",
E "ccan/ccan/io/io.c:407 (do_plan)",
E "ccan/ccan/io/io.c:417 (io_ready)",
E "ccan/ccan/io/poll.c:453 (io_loop)",
E "lightningd/io_loop_with_timers.c:21 (io_loop_with_timers)",
E "lightningd/lightningd.c:1164 (main)"
E ],
E "label": "gossipd/gossipd_wiregen.c:528:struct secret",
E "parents": [
E "lightningd/onion_message.c:149:struct onion_message_hook_payload",
E "lightningd/plugin_hook.c:81:struct hook_instance *[]"
E ],
E "value": "0x55cf3cbc9458"
E }
E ]
```
As of 2b923a0367c5f9154fcec706e3302cc4658dd889.
Recurrence quotes need to be marked separately, since they're no longer
in offers main bolt.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This builds on the enctlv vectors, but actually goes all the way
to creating a modern onionmessage.
Thanks to Thomas H for corrections!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's very similar to the previous, but there are a few changes:
1. The enctlv fields are numbered differently.
2. The message itself is a different number.
The onionmsg_path type is the same, however, so we keep that constant
at least.
The result is a lot of cut & paste, but we will delete the old one
next release.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is from 6e99c5feaf60cb797507d181fe583224309318e9
We renamed the enctlv field to encrypted_recipient_data in the spec, and the
new onion_message is message 513. We don't handle it until the next patch.
Two renames:
1. blinding_seed -> blinding_point.
2. enctlv -> encrypted_recipient_data.
We don't do a compat cycle for our JSON APIs for these experimental
features only used by our own plugins, we just rename.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Temporarily disable sendpay_blinding test which uses obsolete onionmsg;
there's still some debate on the PR about how blinded HTLCs will work.
Changelog-EXPERIMENTAL: onionmessage: removed support for v0.10.1 onion messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not all plugins depended on their headers. Keep it simple: all
plugins depend on all plugin headers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #4785Fixes: #4883
Changelog-Changed: Plugins: `shutdown` notification is now send when lightningd is almost completely shutdown, RPC calls then fail with error code -5.
No idea why TCSAFLUSH was used, could not find anything in PR comments.
Also cannot explain exactly what causes the problem, but the hang can be reproduced
*with* TCSAFLUSH and not with TCSANOW.
According to termios doc:
TCSANOW
the change occurs immediately.
TCSAFLUSH
the change occurs after all output written to the object referred by fd has been
transmitted, and all input that has been received but not read will be discarded
before the change is made.
Setting SIGCHLD back to default (i.e. ignored) makes waitpid hang on an
old SIGCHLD that was still in the pipe?
This happens running test_important_plugin with developer=1:
(or with dev=0 and build-in plugins subscribed to "shutdown")
0 0x00007ff8336b6437 in __GI___waitpid (pid=-1, stat_loc=0x0, options=1) at ../sysdeps/unix/sysv/linux/waitpid.c:30
1 0x000055fb468f733a in sigchld_rfd_in (conn=0x55fb47c7cfc8, ld=0x55fb47bdce58) at lightningd/lightningd.c:785
2 0x000055fb469bcc6b in next_plan (conn=0x55fb47c7cfc8, plan=0x55fb47c7cfe8) at ccan/ccan/io/io.c:59
3 0x000055fb469bd80b in do_plan (conn=0x55fb47c7cfc8, plan=0x55fb47c7cfe8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
4 0x000055fb469bd849 in io_ready (conn=0x55fb47c7cfc8, pollflags=1) at ccan/ccan/io/io.c:417
5 0x000055fb469bfa26 in io_loop (timers=0x55fb47c41198, expired=0x7ffdf4be9028) at ccan/ccan/io/poll.c:453
6 0x000055fb468f1be9 in io_loop_with_timers (ld=0x55fb47bdce58) at lightningd/io_loop_with_timers.c:21
7 0x000055fb46924817 in shutdown_plugins (ld=0x55fb47bdce58) at lightningd/plugin.c:2114
8 0x000055fb468f7c92 in main (argc=22, argv=0x7ffdf4be9228) at lightningd/lightningd.c:1195
shutdown_subdaemons frees the channel and calls destroy_close_command_on_channel_destroy, see gdb:
0 destroy_close_command_on_channel_destroy (_=0x55db6ca38e18, cc=0x55db6ca43338) at lightningd/closing_control.c:94
1 0x000055db6a8181b5 in notify (ctx=0x55db6ca38df0, type=TAL_NOTIFY_FREE, info=0x55db6ca38e18, saved_errno=0) at ccan/ccan/tal/tal.c:237
2 0x000055db6a8186bb in del_tree (t=0x55db6ca38df0, orig=0x55db6ca38e18, saved_errno=0) at ccan/ccan/tal/tal.c:402
3 0x000055db6a818a47 in tal_free (ctx=0x55db6ca38e18) at ccan/ccan/tal/tal.c:486
4 0x000055db6a73fffa in shutdown_subdaemons (ld=0x55db6c8b4ca8) at lightningd/lightningd.c:543
5 0x000055db6a741098 in main (argc=21, argv=0x7ffffa3e8048) at lightningd/lightningd.c:1192
Before this PR, there was no io_loop after shutdown_subdaemons and client side raised a
general `Connection to RPC server lost.`
Now we test the more specific `Channel forgotten before proper close.`, which is good!
BTW, this test was added recently in PR #4599.
since PR #3867 utxos are unreserved by height, destroy_utxos and
related functions are not used anymore so clean them up also
However free(ld->jsonrpc) still needs to happen before free(ld) because its
destructors need list_head pointers from ld
because:
- shutdown_subdaemons can trigger db write, comments in that function say so at least
- resurrecting the main event loop with subdaemons still running is counter productive
in shutting down activity (such as htlc's, hook_calls etc.)
- custom behavior injected by plugins via hooks should be consistent, see test
in previous commmit
IDEA:
in shutdown_plugins, when starting new io_loop:
- A plugin that is still running can return a jsonrpc_request response, this triggers
response_cb, which cannot be handled because subdaemons are gone -> so any response_cb should be blocked/aborted
- jsonrpc is still there, so users (such as plugins) can make new jsonrpc_request's which
cannot be handled because subdaemons are gone -> so new rpc_request should also be blocked
- But we do want to send/receive notifications and log messages (handled in jsonrpc as jsonrpc_notification)
as these do not trigger subdaemon calls or db_write's
Log messages and notifications do not have "id" field, where jsonrpc_request *do* have an "id" field
PLAN (hypothesis):
- hack into plugin_read_json_one OR plugin_response_handle to filter-out json with
an "id" field, this should
block/abandon any jsonrpc_request responses (and new jsonrpc_requests for plugins?)
Q. Can internal (so not via plugin) jsonrpc_requests called in the main io_loop return/revive in
the shutdown io_loop?
A. No. All code under lightningd/ returning command_still_pending depends on either a subdaemon, timer or
plugin. In shutdown loop the subdaemons are dead, timer struct cleared and plugins will be taken
care of (in next commits).
fixup: we can only io_break the main io_loop once
Incase the error handling happening after the quted line is non-critical:
```
return tal_fmt(NULL, "Unable to parse address '%s': %s", arg, err_msg);
```
we should not expand the proposed_listen_announce array without adding
a proposed_wireaddr. So we move the expand of proposed_listen_announce
to the location where we also expand the proposed_wireaddr.
Changelog-None
This include the following commits:
- review 1/2: move from tab to space, and remove the exp. prop from doc;
- review 2/2: remove experimental features;
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
It runs 6 nodes: under valgrind this ends up consuming 5.3 GB RSS. By
stopping nodes between, we peak about 1G RSS.
Measured using:
(while true; do echo $(for i in 4 5 6; do ps uh | tr -s ' ' | cut -d\ -f$i | tally; done); sleep 5; done)&
(Which measures my other processes as well, but that's only about 100M).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happened in my tal_dump(), and I couldn't see how we ended up
with object having more than one "backtrace". Adding asserts that we
never added a second backtrace didn't trigger.
Finally I wondered if we were tal_steal() backtraces, and sure enough
we do that blinding in one place: libwally wrapping. So fix that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Do proper refcounting on log prefixes; previously we kept them around,
which is fine, but the extra notleak() and backtrace in developer mode
could get quite heavy (I ended up with 1G of backtraces!). This is
mainly due to creating one on every JSONRPC command, and running
clboss.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: lightningd: remove slow memory leak in DEVELOPER builds.