Since we turned many errors into warnings, we want our tests to fail
when they happen unexpectedly. We make WARNING clear in the strings
we print, too, to help out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No more sending "all-channel" errors; in particular, gossipd now only
sends warnings (which make us hang up), not errors, and peer_connected
rejections are warnings (and disconnect), not errors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `peer_connected` rejections now send a warning, not an error, to the peer.
And make all the callers choose which one. In general, I prefer warn,
which lets them reconnect and try again, however some places are either
stated that they must be errors in the spec itself, or in openingd
where we abandon the channel when we close the connection anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we now send warning messages and close the connection, except on unrecoverable errors.
This is in line with the warnings draft, where all-zeroes in a
channel_id is no longer special (i.e. it will be ignored).
But gossipd would send these if it got upset with us, so it's best
practice to ignore them for now anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we treat error messages from peer which refer to "all channels" as warnings, not errors.
This takes from the draft spec at https://github.com/lightningnetwork/lightning-rfc/pull/834
Note that if this draft does not get included, the peer will simply
ignore the warning message (we always close the connection afterwards
anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now report the new (draft) warning message.
We construct the route manually so we may not have the
channel_announcement yet. But we can get an update from the error
packet, which can lead to:
```
2021-01-29T01:38:23.4767334Z ValueError:
2021-01-29T01:38:23.4767987Z Node errors:
2021-01-29T01:38:23.4768767Z - lightningd-1: had bad gossip messages
2021-01-29T01:38:23.4769512Z Global errors:
2021-01-29T01:38:23.4770300Z
2021-01-29T01:38:23.4771109Z contrib/pyln-testing/pyln/testing/fixtures.py:197: ValueError
...
2021-01-29T01:38:23.7820197Z lightningd-1: 2021-01-29T01:26:57.460Z DEBUG gossipd: Extracted channel_update 01027217b3086ad9f3dee1fa55b94c5fd2a4b0637bec70ba727ba4151a8de5173ddc749db3502d41ab0ae164addc8fd013d2088b6a12a2f478ae0affa94d76d8845c06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000067000001000160136459010000060000000000000000000000010000000a000000003b023380 from onionreply 100d0000007500887217b3086ad9f3dee1fa55b94c5fd2a4b0637bec70ba727ba4151a8de5173ddc749db3502d41ab0ae164addc8fd013d2088b6a12a2f478ae0affa94d76d8845c06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000067000001000160136459010000060000000000000000000000010000000a000000003b023380
2021-01-29T01:38:23.7837450Z lightningd-1: 2021-01-29T01:26:57.461Z DEBUG gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 103x1x1/0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we create a separate set of local mods, and apply and unapply it.
This is more efficient than the previous approach, since we can do
some work up-front. It's also more graceful (and well-defined) when a
local modification overlaps an existing one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should actually be including this (as it may define _GNU_SOURCE
etc) before any system headers. But where we include <assert.h> we
often didn't, because check-includes would complain that the headers
included it too.
Weaken that check, and include config.h in C files before assert.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This 'fixes' the `wait_for` helper by removing a pointless final
`time.sleep()`, thus potentially making the method return quicker.
The old code could have had three final states:
- success() := True
- Timeout and success() := True
- Timeout and success() := False
The new code has just two final state:
- success() := True
- Timeout and success() := False
It ensures the final `time.sleep()` is just the right amount before timeout.
And more importantly making it more readable :-)
Changelog-None
We fix up the test by using pay, instead of sendpay (and making pay log
the expected message).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: sendpay no longer extracts updates from errors, the caller should do it from the `raw_message`.
This overcomes the internal spam filter on updates, which can be useful
if we're actually trying to send through such a node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: always accept channel_updates from errors, even they'd otherwise be rejected as spam.
Fixes: #4300
Took me a while (stressing under valgrind) to reproduce this,
then longer to figure out how it happened.
Turns out io_new_conn() can fail if the init function fails.
In our case, this can happen if connect() immediately returns
an error (inside io_connect). But we've already set the finish
function, which (if this was the last address), will free connect,
making the assignment `connect->conn = ...` write to a freed address.
Either way, if it fails, try_connect_one_addr() has taken care to
update connect->conn, or free connect, and the caller should not do it.
Here's the valgrind trace:
```
==384981== Invalid write of size 8
==384981== at 0x11127C: try_connect_one_addr (connectd.c:880)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1400B3: do_plan (io.c:401)
==384981== by 0x140134: io_ready (io.c:423)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
==384981== Address 0x4d67020 is 64 bytes inside a block of size 160 free'd
==384981== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x14A020: del_tree (tal.c:421)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x1110C5: try_connect_one_addr (connectd.c:806)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1405DC: io_connect_ (io.c:345)
==384981== Block was alloc'd at
==384981== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x149CF1: allocate (tal.c:250)
==384981== by 0x14A3C6: tal_alloc_ (tal.c:428)
==384981== by 0x1114F2: try_connect_peer (connectd.c:1526)
==384981== by 0x111717: connect_to_peer (connectd.c:1558)
==384981== by 0x1124F5: recv_req (connectd.c:1627)
==384981== by 0x1188B2: handle_read (daemon_conn.c:31)
==384981== by 0x13FBCB: next_plan (io.c:59)
==384981== by 0x140076: do_plan (io.c:407)
==384981== by 0x140113: io_ready (io.c:417)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Occasional crash in connectd due to use-after-free
Fixes: #4343
From #clightning:
(11:24:10) andytoshi: hiya, i'm trying to set up a new lightningd node, and when i run lightningd --network=bitcoin --log-level=debug --daemon
(11:24:17) andytoshi: i get errors of the form fetchinvoice: Malformed JSON reply '2021-01-25T00:51:16.655Z DEBUG plugin-offers: disabled itself at init: offers not enabled in config
(11:24:43) andytoshi: there are a couple variants of this, but always some form of "something: failed to parse <a log line> as json"
Indeed, we close stdout, and it ends up being reused for some plugin.
But the real problem is that we log to stdout by default, which doesn't
make sense. If they really want to discard logs, they can use
--log-file=/dev/null.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON failures when --daemon is used without --log-file.
rearranges the`peer_connected_hook_payload` definition to the location
where this is used in the file.
Fixes certain blanklines and linebreaks to make the code look nicer.
Nit: The underscore in "openchannel_hook" is wrong, bcause the name of
the hook is just "openchannel". The "_hook" implied this to be part of
the name.
Changelog-None
We are printing `repr(obj)` which is not pretty-printed, hard to read,
and can't even be copied and inspected to JSON tools. We now print the
JSONified and indented calls and responses for easier debugging based
on solely the logs (useful for CI!).
Changelog-Added: pyln-testing: The RPC client will now pretty-print requests and responses to facilitate log-based debugging.
We don't have a good way of referring to the configuration that
failed, so let's give them a numberic ID. Particularly useful for the
artifacts that'd be overwritten otherwise.
The test was not considering that concurrent sendrawtx of the same tx
is not stable, and either endpoint will submit it first. Now just
checking state transitions and the mempool.
If we're quick (or the node is slow) we end up reconnecting before our
counterparty has realized the state transition, resulting in an
unexpected re-establish.