These are very noisy at the debug level:
DEBUG lightningd: Calling rpc_command hook of plugin clboss
DEBUG lightningd: Plugin clboss returned from rpc_command hook call
As seen in https://github.com/ZmnSCPxj/clboss/issues/194
Covered up by notleak() :(
Changelog-Fixed: lightningd: slow memory leak when using plugin hooks fixed (introduced in v23.11)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
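For illustration, a self-contained sketch (no lightningd names) of how a notleak()-style annotation masks rather than fixes a leak: it only hides the allocation from the leak detector, it frees nothing.
```c
#include <stdbool.h>
#include <stdlib.h>

struct hook_req {
	bool leak_ok;		/* our pretend notleak() marker */
	/* ... payload ... */
};

/* Pretend notleak(): the leak detector skips marked allocations. */
static struct hook_req *pretend_notleak(struct hook_req *r)
{
	r->leak_ok = true;
	return r;
}

static void call_hook(void)
{
	struct hook_req *r = pretend_notleak(malloc(sizeof(*r)));
	/* ... run the hook ... */
	/* Without this free, every hook call leaks one request, and the
	 * marker keeps it out of the leak report: memory grows silently. */
	free(r);
}

int main(void)
{
	for (int i = 0; i < 1000; i++)
		call_hook();
	return 0;
}
```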
```
Already in transaction from lightningd/plugin.c:727
```
There are two callers, and one didn't disable transactions, so do it in plugin_exclusive_loop.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
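A sketch of the shape of the fix (stubbed declarations, not lightningd's actual code): suspend any open transaction inside plugin_exclusive_loop itself, so no caller can get it wrong.
```c
#include <stdbool.h>

struct db;
bool db_in_transaction(struct db *db);
void db_begin_transaction(struct db *db);
void db_commit_transaction(struct db *db);
void run_loop_until_plugin_done(void *plugin);

void plugin_exclusive_loop_sketch(struct db *db, void *plugin)
{
	bool was_in_tx = db_in_transaction(db);

	/* Don't hold a transaction across the nested io_loop. */
	if (was_in_tx)
		db_commit_transaction(db);

	run_loop_until_plugin_done(plugin);

	/* Restore what the caller expects on return. */
	if (was_in_tx)
		db_begin_transaction(db);
}
```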
Now that the internal code generates a "PLUGIN_TERMINATED" response
when a plugin dies, we can handle it in plugin_hook.
But we can also simplify it by turning the snapshot of hooks into
a simple array: this means we are robust against any combination of plugins
exiting at any time.
Note: this reveals an issue with test_rpc_command_hook where we run
the request hook again (unexpectedly), so we disable that for the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
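A self-contained sketch of that snapshot (names illustrative): copy the registrations into a flat array at call time, null out a slot on PLUGIN_TERMINATED, and skip dead slots while iterating.
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct hook_instance {
	void *plugin;		/* NULL once that plugin has terminated */
};

struct hook_call {
	struct hook_instance *snap;	/* flat copy taken at call time */
	size_t num, next;
};

/* Snapshot current registrations: later changes can't corrupt iteration. */
static struct hook_call *hook_call_new(const struct hook_instance *cur,
				       size_t n)
{
	struct hook_call *c = malloc(sizeof(*c));
	c->snap = malloc(n * sizeof(*c->snap));
	memcpy(c->snap, cur, n * sizeof(*c->snap));
	c->num = n;
	c->next = 0;
	return c;
}

/* On a PLUGIN_TERMINATED response, just mark the slot dead... */
static void hook_plugin_died(struct hook_call *c, const void *plugin)
{
	for (size_t i = 0; i < c->num; i++)
		if (c->snap[i].plugin == plugin)
			c->snap[i].plugin = NULL;
}

/* ...and iteration skips dead slots, whatever the exit order was. */
static void *hook_call_next(struct hook_call *c)
{
	while (c->next < c->num) {
		void *p = c->snap[c->next++].plugin;
		if (p)
			return p;
	}
	return NULL;
}
```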
It was always a bit weird that they weren't, and having the callbacks
do this themselves seems a premature optimization.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`struct log` becomes `struct logger`, and the member which points to the
`struct log_book` becomes `->log_book` not `->lr`.
Also, we don't need to keep the log_book in struct plugin, since it has
access to ld's log_book.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Usually the calls are spontaneous, so it's just "cln:<method>#NNN", but
json_invoice() calls listincoming, and json_checkmessage calls
listnodes, so those become "cli:invoice-<pid>/cln:listincoming#NNN".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
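A runnable illustration of the id shapes above (make_id is a hypothetical helper, not the real code):
```c
#include <stdio.h>

/* make_id is hypothetical: prefix with the caller's id when one exists. */
static void make_id(char *buf, size_t len, const char *caller_id,
		    const char *method, unsigned long long n)
{
	if (caller_id)
		snprintf(buf, len, "%s/cln:%s#%llu", caller_id, method, n);
	else
		snprintf(buf, len, "cln:%s#%llu", method, n);
}

int main(void)
{
	char id[256];
	make_id(id, sizeof(id), NULL, "listnodes", 42);
	printf("%s\n", id);	/* cln:listnodes#42 */
	make_id(id, sizeof(id), "cli:invoice-12345", "listincoming", 7);
	printf("%s\n", id);	/* cli:invoice-12345/cln:listincoming#7 */
	return 0;
}
```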
Be more graceful in shutting down: this should fix the issue where
bookkeeper gets upset that its commands are rejected during shutdown,
and generally make the shutdown sequence more orderly (see the sketch
after the changelog line below).
1. Stop any new RPC connections.
2. Stop any per-peer daemons (channeld, etc).
3. Shut down plugins.
4. Stop all existing RPC connections.
5. Stop global daemons.
6. Free up peer, channel and HTLC datastructures.
7. Close database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: RPC operations are now still available during shutdown.
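For illustration, the ordering above as code (all names stubbed; these are not lightningd's actual functions). The key point is that plugins are stopped before the remaining RPC connections, so RPC keeps answering while plugins shut down:
```c
struct lightningd;
void stop_new_rpc_connections(struct lightningd *ld);
void stop_per_peer_daemons(struct lightningd *ld);
void shutdown_plugins(struct lightningd *ld);
void stop_existing_rpc_connections(struct lightningd *ld);
void stop_global_daemons(struct lightningd *ld);
void free_peer_channel_htlc_structures(struct lightningd *ld);
void close_database(struct lightningd *ld);

static void shutdown_sequence(struct lightningd *ld)
{
	stop_new_rpc_connections(ld);		/* 1 */
	stop_per_peer_daemons(ld);		/* 2: channeld etc. */
	shutdown_plugins(ld);			/* 3: RPC still answers here */
	stop_existing_rpc_connections(ld);	/* 4 */
	stop_global_daemons(ld);		/* 5 */
	free_peer_channel_htlc_structures(ld);	/* 6 */
	close_database(ld);			/* 7 */
}
```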
We have them split over common/param.c, common/json.c,
common/json_helpers.c, common/json_tok.c and common/json_stream.c.
Change that to:
* common/json_parse (all the json_to_xxx routines)
* common/json_parse_simple (the simplest JSON parsing routines, usable by the cli too)
* common/json_stream (all the json_add_xxx routines)
* common/json_param (all the param and param_xxx routines)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
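Illustrated as includes (paths from the list above; the comments summarize what each file holds):
```c
#include <common/json_param.h>        /* param() and the param_* helpers */
#include <common/json_parse.h>        /* the json_to_* routines */
#include <common/json_parse_simple.h> /* minimal parsing, light enough for the cli */
#include <common/json_stream.h>       /* the json_add_* routines */
```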
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't. This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.
Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.
```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509== at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509== by 0x22D531: tal_bytelen (tal.c:637)
==895509== by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509== by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509== by 0x1497EC: main (lightningd.c:1090)
==895509==
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
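A sketch of the hardening (io_loop_run stands in for the real io_loop()):
```c
#include <assert.h>

void *io_loop_run(void);	/* stand-in for ccan/io's io_loop() */

static void wait_for(void *expected_breaker)
{
	void *ret = io_loop_run();
	/* If anything else io_break()s us early, fail loudly right here,
	 * instead of crashing later on uninitialized data in main. */
	assert(ret == expected_breaker);
}
```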
The only thing that needs ld->wallet after this is destroy_invoices_waiter (off jsonrpc).
I could not find any other destructors (destroy_*) that need wallet or db access after this.
Any db access would now segfault.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
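The required order, illustrated (strcasestr() is just one example of a _GNU_SOURCE-gated declaration):
```c
#include "config.h"	/* sets _GNU_SOURCE: must come before any libc header */
#include <string.h>	/* now exposes GNU extensions such as strcasestr() */
```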
because:
- shutdown_subdaemons can trigger a db write; comments in that function say so, at least
- resurrecting the main event loop with subdaemons still running is counterproductive
  to shutting down activity (such as HTLCs, hook calls etc.)
- custom behavior injected by plugins via hooks should be consistent, see the test
  in the previous commit
IDEA:
in shutdown_plugins, when starting new io_loop:
- A plugin that is still running can return a jsonrpc_request response; this triggers
  response_cb, which cannot be handled because the subdaemons are gone -> so any response_cb should be blocked/aborted
- jsonrpc is still there, so users (such as plugins) can make new jsonrpc_requests, which
  cannot be handled because the subdaemons are gone -> so new rpc_requests should also be blocked
- But we do want to send/receive notifications and log messages (handled in jsonrpc as jsonrpc_notification),
  as these do not trigger subdaemon calls or db_writes
Log messages and notifications do not have an "id" field, whereas jsonrpc_requests *do*.
PLAN (hypothesis):
- hack into plugin_read_json_one OR plugin_response_handle to filter out json with
  an "id" field; this should block/abandon any jsonrpc_request responses (and new
  jsonrpc_requests from plugins?)
Q. Can internal (so not via plugin) jsonrpc_requests called in the main io_loop return/revive in
the shutdown io_loop?
A. No. All code under lightningd/ returning command_still_pending depends on either a subdaemon, timer or
plugin. In the shutdown loop the subdaemons are dead, the timer struct is cleared, and plugins will be taken
care of (in the next commits).
fixup: we can only io_break the main io_loop once
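A sketch of that filter (helper and flag names here are hypothetical; the token type stands in for jsmn's):
```c
#include <stdbool.h>
#include <stddef.h>

typedef struct jsmntok jsmntok_t;	/* stand-in for the jsmn token type */
const jsmntok_t *json_get_member(const char *buf, const jsmntok_t *obj,
				 const char *label);
extern bool shutting_down;		/* hypothetical state flag */

/* Candidate filter for plugin_read_json_one(): during shutdown, drop
 * anything carrying an "id" (requests and responses), but let id-less
 * notifications and log messages through. */
static bool drop_during_shutdown(const char *buf, const jsmntok_t *obj)
{
	if (!shutting_down)
		return false;
	return json_get_member(buf, obj, "id") != NULL;
}
```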
When we are calling hooks, we track them via a linked list. As they
execute, we pop them off the list in plugin_hook_killed().
When we kill a plugin, we have a destructor which removes its entry from the linked list: plugin_hook_killed.
If it's at the head of the list, that means the plugin died while
processing the hook, so instead of just deleting it, we call
plugin_hook_killed() which behaves as if it said "result: continue".
But plugin_hook_killed() just returns if we're shutting down; this
leaves the link (which is then freed) on the list, and the *next* plugin
tries to unlink from the list, accessing the previously-freed entry.
The fix is simple: unlink from the list in plugin_hook_killed() even
if we're shutting down.
```
Valgrind error file: valgrind-errors.78570
==78570== Invalid write of size 8
==78570== at 0x174B55: list_del_ (list.h:328)
==78570== by 0x174FCC: plugin_hook_killed (plugin_hook.c:135)
==78570== by 0x21DC3F: notify (tal.c:240)
==78570== by 0x21E156: del_tree (tal.c:402)
==78570== by 0x21E1A8: del_tree (tal.c:412)
==78570== by 0x21E4F2: tal_free (tal.c:486)
==78570== by 0x16EBD1: plugin_kill (plugin.c:345)
==78570== by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570== by 0x20F1A5: destroy_conn (poll.c:244)
==78570== by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570== by 0x21DC3F: notify (tal.c:240)
==78570== by 0x21E156: del_tree (tal.c:402)
==78570== Address 0x6aee688 is 40 bytes inside a block of size 72 free'd
==78570== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570== by 0x21E224: del_tree (tal.c:421)
==78570== by 0x21E1A8: del_tree (tal.c:412)
==78570== by 0x21E4F2: tal_free (tal.c:486)
==78570== by 0x16EBD1: plugin_kill (plugin.c:345)
==78570== by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570== by 0x20F1A5: destroy_conn (poll.c:244)
==78570== by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570== by 0x21DC3F: notify (tal.c:240)
==78570== by 0x21E156: del_tree (tal.c:402)
==78570== by 0x21E4F2: tal_free (tal.c:486)
==78570== by 0x20D7B6: io_close (io.c:450)
==78570== Block was alloc'd at
==78570== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570== by 0x21DCAD: allocate (tal.c:250)
==78570== by 0x21E26E: tal_alloc_ (tal.c:428)
==78570== by 0x175599: plugin_hook_call_ (plugin_hook.c:259)
==78570== by 0x13616F: plugin_hook_call_onion_message_blinded (onion_message.c:126)
==78570== by 0x13643B: handle_obs_onionmsg_to_us (onion_message.c:187)
==78570== by 0x138BBD: gossip_msg (gossip_control.c:140)
==78570== by 0x178AEC: sd_msg_read (subd.c:495)
==78570== by 0x20CA00: next_plan (io.c:59)
==78570== by 0x20D608: do_plan (io.c:407)
==78570== by 0x20D64A: io_ready (io.c:417)
==78570== by 0x20F8F1: io_loop (poll.c:445)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
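The shape of the fix, as a self-contained sketch (list implementation inlined; names are illustrative):
```c
#include <stdbool.h>
#include <stddef.h>

struct link { struct link *prev, *next; };
extern bool shutting_down;
void hook_say_continue(struct link *l);	/* the "result: continue" path */

static void list_del_(struct link *l)
{
	l->prev->next = l->next;
	l->next->prev = l->prev;
	l->prev = l->next = NULL;
}

static void plugin_hook_killed_sketch(struct link *l)
{
	/* The fix: unlink unconditionally.  Returning early during
	 * shutdown used to leave this (soon-freed) entry linked, so the
	 * next plugin's unlink wrote into freed memory -- the invalid
	 * write in list_del_ that valgrind reports above. */
	list_del_(l);
	if (shutting_down)
		return;
	hook_say_continue(l);
}
```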
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We previously registered hooks in who-replies-to-getmanifest-first
order, but then if any had dependencies it would scatter that order.
This allows users to manually set dependencies that developers have
forgotten, by specifying the plugins explicitly in their configuration
or on the cmdline. This was an excellent consideration by @mschmook.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patch will use these to order the hooks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: plugins: hooks can now specify that they must be called 'before' or 'after' other plugins.
Changelog-Removed: JSON API: The hook `db_write` can no longer return `true` (deprecated in 0.8.1)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
One is called on every plugin return, and tells us whether to continue;
the other is only called if every plugin says ok.
This works for things like payload replacement, where we need to process
the results from each plugin, not just the final one!
We should probably turn everything into a chained callback next
release.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
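Sketched as types (signatures are illustrative, not the exact ones):
```c
#include <stdbool.h>

struct hook_payload;

/* Called after *every* plugin replies; returning false stops the chain.
 * This is where payload replacement can examine each plugin's result. */
typedef bool (*hook_chain_cb)(struct hook_payload *payload,
			      const char *plugin_reply);

/* Called exactly once, and only if every plugin said to continue. */
typedef void (*hook_final_cb)(struct hook_payload *payload);
```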
The callback must take ownership of the payload (almost all do, but
now it's explicit).
And since the payload and cb_arg arguments to plugin_hook_call_() are
always identical, make them a single parameter.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
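Before/after of the call shape (illustrative signatures):
```c
struct plugin_hook;
struct hook_payload;

/* before: payload and cb_arg were always the same pointer... */
void plugin_hook_call_old(const struct plugin_hook *hook,
			  struct hook_payload *payload, void *cb_arg);

/* ...after: a single parameter, which the callback now owns (and must
 * eventually free, e.g. via tal_free). */
void plugin_hook_call_new(const struct plugin_hook *hook,
			  struct hook_payload *payload);
```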
It was a pointer into the list of plugins for the hook, but it was rather
unstable: if a plugin exits after handling the event we could end up skipping
a later plugin. We now rely on the much more stable `call_chain` list, so we
can clean up that useless field.
We attach a destructor to notify us when the plugin exits, but we also
need to clear it once the request has been handled correctly, so we
don't call the destructor when the plugin exits later.
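Sketched with ccan/tal's destructor pairing (the surrounding types are stubs):
```c
#include <ccan/tal/tal.h>

struct call_link;
void plugin_hook_killed(struct call_link *l);

static void on_call_start(struct call_link *l)
{
	/* Fires if the plugin dies while this call is in flight. */
	tal_add_destructor(l, plugin_hook_killed);
}

static void on_call_done(struct call_link *l)
{
	/* Handled correctly: a later plugin exit must not re-trigger it. */
	tal_del_destructor(l, plugin_hook_killed);
}
```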
We make the current state of `lightningd` explicit so we don't have to
identify a shutdown by its side-effects. We then use this to prevent
the killing and freeing of plugins from continuing down the chain of
registered plugins.
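A sketch of the explicit state (enum and field names are assumptions):
```c
#include <stdbool.h>

enum lightningd_state_sketch {
	LD_STATE_RUNNING,
	LD_STATE_SHUTDOWN,
};

struct ld_sketch {
	enum lightningd_state_sketch state;
};

/* Check the state directly, instead of inferring shutdown from
 * side-effects; used to stop advancing the hook chain while plugins
 * are being killed and freed. */
static bool may_continue_hook_chain(const struct ld_sketch *ld)
{
	return ld->state == LD_STATE_RUNNING;
}
```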
This used to be necessary because we allocated the `plugin_hook_request` off
of the plugin instance (the only tal-allocated object we could grab at that
time). Now that the plugin was replaced by a list, which is itself
tal-allocated, that workaround is pointless, or even wrong once we have
multiple plugins registering for that hook.
We will be using `plugin_hook_call_next` as part of the loop to traverse all
plugins that registered the hook, so group initialization in the init function
and move the per-plugin logic into `plugin_hook_call_next`.
We are about to call multiple plugins, and we'll have to pass the payload into
each call. Sadly the serialized stream gets consumed during the call, so keep
the unserialized payload around.
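A sketch of the loop step (stubbed helpers, not the real ones):
```c
struct hook_req {
	const void *payload;	/* kept unserialized, reused per plugin */
	/* ...iteration state lives here too... */
};

void *next_registered_plugin(struct hook_req *r);
void send_serialized_payload(void *plugin, const void *payload);
void hook_chain_done(struct hook_req *r);

static void plugin_hook_call_next_sketch(struct hook_req *r)
{
	void *plugin = next_registered_plugin(r);
	if (!plugin) {
		hook_chain_done(r);	/* end of the chain */
		return;
	}
	/* Serialize afresh for each plugin: each send consumes the stream. */
	send_serialized_payload(plugin, r->payload);
}
```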
Switch from having a single plugin to a list of plugins. If the hook is of
type single we will enforce that constraint on the number of registered
plugins when attempting to add.
The newly introduced type is used to determine what the call semantics of the
hook are. We have `single` corresponding to the old behavior, as well as
`chain` which allows multiple plugins to register for the hook, and they are
then called sequentially (if all plugins return `{"result": "continue"}`) or
exit the chain if the hook event was handled.
Changelog-Changed: The hooks `db_write`, `invoice_payment`, and `rpc_command` now accept `{ "result": "continue" }` to mean "do default action", in addition to `true` (`db_write`), `{}` (`invoice_payment`), and `{"continue": true}` (`rpc_command`). The older "default" indicators are now deprecated and are now recognized only if `--deprecated-apis` is set.
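The two semantics, sketched (registration check stubbed; enum names are illustrative):
```c
#include <stdbool.h>
#include <stddef.h>

enum plugin_hook_type_sketch {
	HOOK_SINGLE,	/* old behavior: at most one registrant */
	HOOK_CHAIN,	/* call each in turn while "result" is "continue" */
};

struct hook_sketch {
	enum plugin_hook_type_sketch type;
	size_t num_registered;
};

/* "single" hooks refuse a second registration at add time. */
static bool hook_may_register(const struct hook_sketch *h)
{
	return h->type == HOOK_CHAIN || h->num_registered == 0;
}
```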
This increments the `data_version` upon committing dirty transactions, reads
the last data_version upon startup, and tracks the number in memory in
parallel to the DB (see next commit for rationale).
Changelog-Changed: JSON-RPC: Added a `data_version` field to the `db_write` hook which returns a numeric transaction counter.
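A sketch of the counter's lifecycle (stubbed db layer, not the real one): bump only on dirty commits, mirror the value in memory, and reload it on startup.
```c
#include <stdbool.h>
#include <stdint.h>

struct db_sketch {
	uint32_t data_version;	/* in-memory mirror of the stored value */
	bool dirty;		/* any writes in the current transaction? */
};

void db_store_data_version(struct db_sketch *db, uint32_t v);
uint32_t db_load_data_version(const struct db_sketch *db);

static void db_commit_sketch(struct db_sketch *db)
{
	if (!db->dirty)
		return;		/* read-only transactions don't bump it */
	db->data_version++;
	db_store_data_version(db, db->data_version);	/* passed to the db_write hook */
	db->dirty = false;
}

static void db_startup_sketch(struct db_sketch *db)
{
	/* Resume from the last committed value. */
	db->data_version = db_load_data_version(db);
}
```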