It was using a trick that shut down only the first node and forgot
about the others. This could leave processes running and cause test
failures because the directory wasn't cleaned up properly.
Now we use the executor to shut down as many nodes as possible in parallel.
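As a rough illustration of the parallel shutdown described above, here is a minimal sketch using a thread pool. `FakeNode` and `shutdown_all` are hypothetical names made up for this example; they are not the actual pyln-testing classes, and the real fixture may collect failures differently.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeNode:
    """Hypothetical stand-in for a test node; only stop() matters here."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.stopped = False

    def stop(self):
        if self.fail:
            raise RuntimeError(f"{self.name} did not stop cleanly")
        self.stopped = True


def shutdown_all(nodes, max_workers=8):
    """Stop every node in parallel; return the names of nodes that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every stop() first, so one slow or failing node does not
        # prevent the others from being asked to stop.
        futures = [(node, executor.submit(node.stop)) for node in nodes]
        for node, future in futures:
            try:
                future.result()
            except Exception:
                failed.append(node.name)
    return failed
```

Collecting failures instead of raising on the first one means every node still gets its shutdown attempt before the test reports an error.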
We split into quick smoke-tests, normal-tests, and valgrind-tests. The
first provides a quick abort, so we don't waste resources on what
will be a failed run anyway.
We'd like to have some more structure to the tests, with smoke-tests
going first, followed by valgrind and normal tests running in
parallel. So to cut down on the copy-pasta we extract the scripts from
the yaml, into their own files from where they can be easily invoked.
network was missing regtest as an option.
The example was too simple, so I added a second one to show how the k option works.
The explanation of how k works should let the user know that in some cases lightningd interprets an argument by its position, which can lead to a weird response from lightningd.
Let users know that using null is not encouraged.
@cdecker said "I think we should definitely not encourage null, since that is very brittle and forces us to keep argument ordering, and deprecating arguments is practically impossible."
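To make the difference concrete, here is a small sketch of how positional and keyword parameters look in a JSON-RPC request. The method name `somecommand` and the parameter names `first` and `third` are hypothetical placeholders, not real lightningd commands or arguments.

```python
import json


def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request; params is a list (positional) or a dict (keyword)."""
    if not isinstance(params, (list, dict)):
        raise TypeError("params must be a list or a dict")
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


# Positional: skipping an optional argument needs a null placeholder,
# and the meaning of each value depends on argument order.
positional = build_request("somecommand", ["value-a", None, "value-c"])

# Keyword: only the arguments that matter are named, independent of order.
keyword = build_request("somecommand", {"first": "value-a", "third": "value-c"})
```

The keyword form keeps working if optional arguments are added or reordered later, which is the brittleness the null placeholder suffers from.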
In the case where you want a PSBT and also want the output to be added
as a change address, use `excess_as_change` = true.
This generates a change address to use. If you want to pay the excess
elsewhere, you will have to add that output separately.
Changelog-Added: JSON-RPC: Add new parameter `excess_as_change` to fundpsbt+utxopsbt
Since we turned many errors into warnings, we want our tests to fail
when they happen unexpectedly. We make WARNING clear in the strings
we print, too, to help out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No more sending "all-channel" errors; in particular, gossipd now only
sends warnings (which make us hang up), not errors, and peer_connected
rejections are warnings (and disconnect), not errors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `peer_connected` rejections now send a warning, not an error, to the peer.
And make all the callers choose which one. In general, I prefer warn,
which lets them reconnect and try again; however, in some places the
spec itself states that they must be errors, and in openingd we
abandon the channel when we close the connection anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we now send warning messages and close the connection, except on unrecoverable errors.
This is in line with the warnings draft, where all-zeroes in a
channel_id is no longer special (i.e. it will be ignored).
But gossipd would send these if it got upset with us, so it's best
practice to ignore them for now anyway.
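A minimal sketch of the rule described above, assuming the only input is the 32-byte channel_id carried by the peer's error message. The function name and the string results are hypothetical, chosen for illustration; the real handling lives in the C daemons.

```python
# An all-zero channel_id used to mean "all channels".
ALL_CHANNELS = bytes(32)


def classify_peer_error(channel_id: bytes) -> str:
    """Return "warning" for an all-channels error message, "error" otherwise."""
    if len(channel_id) != 32:
        raise ValueError("channel_id must be exactly 32 bytes")
    # All-zeroes no longer closes every channel: downgrade it to a warning.
    return "warning" if channel_id == ALL_CHANNELS else "error"
```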
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we treat error messages from peer which refer to "all channels" as warnings, not errors.
This takes from the draft spec at https://github.com/lightningnetwork/lightning-rfc/pull/834
Note that if this draft does not get included, the peer will simply
ignore the warning message (we always close the connection afterwards
anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now report the new (draft) warning message.
We construct the route manually so we may not have the
channel_announcement yet. But we can get an update from the error
packet, which can lead to:
```
2021-01-29T01:38:23.4767334Z ValueError:
2021-01-29T01:38:23.4767987Z Node errors:
2021-01-29T01:38:23.4768767Z - lightningd-1: had bad gossip messages
2021-01-29T01:38:23.4769512Z Global errors:
2021-01-29T01:38:23.4770300Z
2021-01-29T01:38:23.4771109Z contrib/pyln-testing/pyln/testing/fixtures.py:197: ValueError
...
2021-01-29T01:38:23.7820197Z lightningd-1: 2021-01-29T01:26:57.460Z DEBUG gossipd: Extracted channel_update 01027217b3086ad9f3dee1fa55b94c5fd2a4b0637bec70ba727ba4151a8de5173ddc749db3502d41ab0ae164addc8fd013d2088b6a12a2f478ae0affa94d76d8845c06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000067000001000160136459010000060000000000000000000000010000000a000000003b023380 from onionreply 100d0000007500887217b3086ad9f3dee1fa55b94c5fd2a4b0637bec70ba727ba4151a8de5173ddc749db3502d41ab0ae164addc8fd013d2088b6a12a2f478ae0affa94d76d8845c06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000067000001000160136459010000060000000000000000000000010000000a000000003b023380
2021-01-29T01:38:23.7837450Z lightningd-1: 2021-01-29T01:26:57.461Z DEBUG gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 103x1x1/0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we create a separate set of local mods, and apply and unapply it.
This is more efficient than the previous approach, since we can do
some work up-front. It's also more graceful (and well-defined) when a
local modification overlaps an existing one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should actually be including this (as it may define _GNU_SOURCE
etc) before any system headers. But where we include <assert.h> we
often didn't, because check-includes would complain that the headers
included it too.
Weaken that check, and include config.h in C files before assert.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This 'fixes' the `wait_for` helper by removing a pointless final
`time.sleep()`, thus potentially making the method return quicker.
The old code could have had three final states:
- success() := True
- Timeout and success() := True
- Timeout and success() := False
The new code has just two final states:
- success() := True
- Timeout and success() := False
It ensures the final `time.sleep()` lasts only until the timeout.
More importantly, it makes the code more readable :-)
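For illustration, a helper with exactly those two final states could look like the sketch below. This is not the actual pyln-testing code; the `interval` parameter and the exception message are assumptions made for the example.

```python
import time


def wait_for(success, timeout=60, interval=0.25):
    """Poll success() until it returns True, or raise once the timeout expires."""
    deadline = time.monotonic() + timeout
    while not success():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Final state: timeout and success() is False.
            raise ValueError("Timeout while waiting for {}".format(success))
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
    # Final state: success() is True.
```

Because `success()` is always re-checked after the last sleep, there is no path where the helper times out even though the condition already holds.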
Changelog-None
We fix up the test by using pay, instead of sendpay (and making pay log
the expected message).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: sendpay no longer extracts updates from errors; the caller should do it from the `raw_message`.
This overcomes the internal spam filter on updates, which can be useful
if we're actually trying to send through such a node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: always accept channel_updates from errors, even if they'd otherwise be rejected as spam.
Fixes: #4300
Took me a while (stressing under valgrind) to reproduce this,
then longer to figure out how it happened.
Turns out io_new_conn() can fail if the init function fails.
In our case, this can happen if connect() immediately returns
an error (inside io_connect). But we've already set the finish
function, which (if this was the last address) will free connect,
making the assignment `connect->conn = ...` write to a freed address.
Either way, if it fails, try_connect_one_addr() has taken care to
update connect->conn, or free connect, and the caller should not do it.
Here's the valgrind trace:
```
==384981== Invalid write of size 8
==384981== at 0x11127C: try_connect_one_addr (connectd.c:880)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1400B3: do_plan (io.c:401)
==384981== by 0x140134: io_ready (io.c:423)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
==384981== Address 0x4d67020 is 64 bytes inside a block of size 160 free'd
==384981== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x14A020: del_tree (tal.c:421)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x1110C5: try_connect_one_addr (connectd.c:806)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1405DC: io_connect_ (io.c:345)
==384981== Block was alloc'd at
==384981== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x149CF1: allocate (tal.c:250)
==384981== by 0x14A3C6: tal_alloc_ (tal.c:428)
==384981== by 0x1114F2: try_connect_peer (connectd.c:1526)
==384981== by 0x111717: connect_to_peer (connectd.c:1558)
==384981== by 0x1124F5: recv_req (connectd.c:1627)
==384981== by 0x1188B2: handle_read (daemon_conn.c:31)
==384981== by 0x13FBCB: next_plan (io.c:59)
==384981== by 0x140076: do_plan (io.c:407)
==384981== by 0x140113: io_ready (io.c:417)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Occasional crash in connectd due to use-after-free
Fixes: #4343
From #clightning:
(11:24:10) andytoshi: hiya, i'm trying to set up a new lightningd node, and when i run lightningd --network=bitcoin --log-level=debug --daemon
(11:24:17) andytoshi: i get errors of the form fetchinvoice: Malformed JSON reply '2021-01-25T00:51:16.655Z DEBUG plugin-offers: disabled itself at init: offers not enabled in config
(11:24:43) andytoshi: there are a couple variants of this, but always some form of "something: failed to parse <a log line> as json"
Indeed, we close stdout, and it ends up being reused for some plugin.
But the real problem is that we log to stdout by default, which doesn't
make sense. If they really want to discard logs, they can use
--log-file=/dev/null.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON failures when --daemon is used without --log-file.
Rearranges the `peer_connected_hook_payload` definition to the location
where it is used in the file.
Fixes certain blank lines and line breaks to make the code look nicer.
Nit: The underscore in "openchannel_hook" is wrong, because the name of
the hook is just "openchannel". The "_hook" implied this to be part of
the name.
Changelog-None