These tests have proven to be:
a) very expensive, as they spin up many nodes and perform long setup
b) not tests of anything specific: they just fuzz functionality
that is already tested otherwise
c) of no help in pinpointing an issue in living memory
d) very flaky, making for really bad signal-to-noise, so much so
that devs usually just restart without even looking at the logs
e) irreproducible even if we did look at the logs, due to the
inherent randomness involved in these tests
f) really noisy neighbors, causing other tests to flake as well,
further muddying the water
All in all, these tests are a waste of time and a source of
frustration.
[ Cleaned up unused Python imports --RR ]
Changelog-None
client_read_next(…) calls io_read_wire(…), passing &c->msg_in as the
address of a pointer that will be set to the address of the buffer that
io_read_wire_(…) will allocate, and passing c (a pointer to the struct
client instance) as the parent for the new allocation. As long as the
struct client instance eventually gets freed, the allocated message
buffer will be freed too, so there is no "leak" in the strict sense of
the term, but the freeing of the buffer may not occur for an arbitrarily
long time after the buffer has become disused, and indeed many millions
of message buffers may be allocated within the lifetime of one struct
client instance.
handle_client(…) ultimately hands off c->msg_in to one of several
message-type-specific handler functions, and those functions neither
mark their message buffer parameters as TAKES or STEALS nor free their
message buffer arguments. Consequently, each successive call to
client_read_next(…) causes io_read_wire_(…) to overwrite the c->msg_in
pointer with the address of a newly allocated message buffer, and the
old buffer is left hanging off of the struct client instance
indefinitely.
Fix this by initializing c->msg_in to NULL in new_client(…) and then
having client_read_next(…) do `c->msg_in = tal_free(c->msg_in)` prior to
calling io_read_wire(…). That way, the previous message buffer will be
freed just before beginning to read the next message. The same strategy
is already employed in common/daemon_conn.c, albeit without nulling out
dc->msg_in after freeing it.
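A minimal sketch of the fix (names as above; the surrounding code in
hsmd/hsmd.c is elided and simplified here):
```
/* Sketch only: struct client reduced to the one field that matters. */
struct client {
	u8 *msg_in;	/* new_client(…) now initializes this to NULL */
	/* ... */
};

static struct io_plan *handle_client(struct io_conn *conn,
				     struct client *c);

static struct io_plan *client_read_next(struct io_conn *conn,
					struct client *c)
{
	/* tal_free(NULL) is a no-op and tal_free() returns NULL, so
	 * this frees the previous message buffer (if any) and nulls
	 * the pointer before io_read_wire(…) stores the address of a
	 * fresh allocation, parented on c, through &c->msg_in. */
	c->msg_in = tal_free(c->msg_in);
	return io_read_wire(conn, c, &c->msg_in, handle_client, c);
}
```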
Fixes: #5035
Changelog-Fixed: hsmd: Fixed a significant memory leak
This restores the behaviour prior to `lightningd: use our cached
channel_update for errors instead of asking gossipd.`, where gossipd
would refuse to give us channel_updates for unannounced channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It looks like decode_c doesn't set have_c, unlike the other decode_
methods. At the start of the function, decode_c checks have_c to see
if it's already set, but since it is never set, that check can never
fire. This could allow duplicate c tags, which is probably not
intended.
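For illustration, a hedged sketch of the check-and-set pattern (the
signature here is hypothetical and reduced; the real decode_c lives in
common/bolt11.c, and the 'c' tag carries min_final_cltv_expiry):
```
static const char *decode_c(struct bolt11 *b11, bool *have_c,
			    u64 min_final_cltv_expiry)
{
	/* This duplicate-tag check is dead code unless... */
	if (*have_c)
		return "duplicate 'c' tag";
	b11->min_final_cltv_expiry = min_final_cltv_expiry;
	/* ...the flag is actually set on success (the missing line): */
	*have_c = true;
	return NULL;
}
```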
Signed-off-by: William Casarin <jb55@jb55.com>
If this field is missing for whatever reason (weird db state?),
clightning will crash when listing invoices.
Signed-off-by: William Casarin <jb55@jb55.com>
Unfortunately we can't do any smart parsing here, since wiregen does
not yet support switch/type cases for different substructure unions.
So just give us a pointer we can use.
But this requires a watch-only wallet, and python-bitcoinlib doesn't
support multiple wallets, so we need to unload the original one. But
then, when we need to generate a block, we can't generate a new
address to mine to, so generate_block needs an address argument.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We detect whether the Rust tooling (mainly `cargo`) is available, and
enable or disable the Rust libraries, plugins and examples
accordingly. Since the rest of the Makefiles assume that executables
have an associated header and C source file, we also needed to add a
target to which we can add non-C binaries.
We build an in-memory model of what the API should look like, which
will later be used to generate a variety of bindings. In this PR we
will use the model to build structs corresponding to the requests and
responses for the various methods.
The JSON-RPC schemas serve as ground truth; however, they are missing
a bit of context: the methods, and the request-response matching (as
well as a higher-level grouping we'll call a Service). I'm tempted to
create a new document that describes this behavior, and we could even
generate the rather repetitive JSON schemas from that document.
Furthermore, it'd allow us to add some required metadata, such as gRPC
field numbering, once we generate those bindings.
Changelog-Added: JSON-RPC: A new `msggen` library allows easy generation of language bindings for the JSON-RPC from the JSON schemas
I've tried automatically parsing the docs, and these inconsistencies
made it harder to do so. (I tried this for a project I can't share
yet; I'm not sure it'll even work.)
We really need our own lnprototest tests for packet-based stuff;
these message-based tests are inherently delicate and awkward.
In particular, connectd now does dev-disconnect, so the socket is not
immediately closed after a dev-disconnect command. In this case, the
WIRE_SHUTDOWN has often already been written from connectd to channeld.
But it sometimes works, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the HTLCs are completely negotiated, we can get a channel break when
we mine a pile of blocks. This is mainly seen with Postgres, due to the db
speed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we call update_channel_from_inflight *twice* with the same inflight, we
will get bad results. Using tal_steal() here was a premature optimization:
```
Valgrind error file: valgrind-errors.496395
==496395== Invalid read of size 8
==496395== at 0x22A9D3: to_tal_hdr (tal.c:174)
==496395== by 0x22B4B5: tal_steal_ (tal.c:498)
==496395== by 0x16A13D: update_channel_from_inflight (peer_control.c:1225)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== by 0x1772D9: plugin_read_json (plugin.c:735)
==496395== Address 0x89fbb08 is 24 bytes inside a block of size 104 free'd
==496395== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22B193: del_tree (tal.c:421)
==496395== by 0x22B461: tal_free (tal.c:486)
==496395== by 0x16A123: update_channel_from_inflight (peer_control.c:1223)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== Block was alloc'd at
==496395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22AC1C: allocate (tal.c:250)
==496395== by 0x22B1DD: tal_alloc_ (tal.c:428)
==496395== by 0x22B3A6: tal_alloc_arr_ (tal.c:471)
==496395== by 0x22C094: tal_dup_ (tal.c:805)
==496395== by 0x12B274: new_inflight (channel.c:187)
==496395== by 0x136D4C: wallet_commit_channel (dual_open_control.c:1260)
==496395== by 0x13B084: handle_commit_received (dual_open_control.c:2839)
==496395== by 0x13B6AF: dual_opend_msg (dual_open_control.c:2976)
==496395== by 0x1809FF: sd_msg_read (subd.c:553)
==496395== by 0x218F5D: next_plan (io.c:59)
==496395== by 0x219B65: do_plan (io.c:407)
```
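The shape of the problem, as a hedged sketch (the field name `sig` is
hypothetical; the real fields live in struct channel and struct
channel_inflight, updated in update_channel_from_inflight()):
```
/* First call: after tal_steal(), channel->sig and inflight->sig
 * point at the same object, now owned by channel. */
tal_free(channel->sig);
channel->sig = tal_steal(channel, inflight->sig);

/* Second call with the same inflight: the tal_free() above frees
 * the shared object, leaving inflight->sig dangling, so tal_steal()
 * reads a freed tal header (the invalid read in to_tal_hdr above).
 * Copying sidesteps the shared ownership entirely: */
channel->sig = tal_dup(channel, secp256k1_ecdsa_signature,
		       inflight->sig);
```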
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise we get weird effects, as htlcs are being freed:
```
2022-01-26T05:07:37.8774610Z lightningd-1: 2022-01-26T04:47:48.770Z DEBUG 030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Failing HTLC 18446744073709551615 due to peer death
2022-01-26T05:07:37.8775287Z lightningd-1: 2022-01-26T04:47:48.770Z **BROKEN** 030eeb52087b9dbb27b7aec79ca5249369f6ce7b20a5684ce38d9f4595a21c2fda-chan#8: Neither origin nor in?
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`hc` is never NULL, since it is set to `&chan->half[direction]`; we
really meant "is it initialized?", and valgrind under CI finally
caught it:
```
==69243== Conditional jump or move depends on uninitialised value(s)
==69243== at 0x11C595: handle_local_channel_update (gossip_generation.c:758)
==69243== by 0x115254: recv_req (gossipd.c:986)
==69243== by 0x128F8D: handle_read (daemon_conn.c:31)
==69243== by 0x16BEE1: next_plan (io.c:59)
==69243== by 0x16CAE9: do_plan (io.c:407)
==69243== by 0x16CB2B: io_ready (io.c:417)
==69243== by 0x16EE1E: io_loop (poll.c:453)
==69243== by 0x1154DA: main (gossipd.c:1089)
==69243==
```
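In sketch form (surrounding code in gossip_generation.c elided, using
gossipd's is_halfchan_defined() helper):
```
struct half_chan *hc = &chan->half[direction];

/* Buggy: hc points into chan->half[], so it is never NULL, and we
 * went on to read fields that may never have been initialized: */
if (!hc)
	return;

/* Intended: ask whether this direction has actually been filled in: */
if (!is_halfchan_defined(hc))
	return;
```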
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we fund a channel between two nodes and then mine all the blocks
needed to announce it, other nodes may see the announcement before the
blocks, causing CI to complain about "bad gossip":
```
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Ignoring future channel_announcment for 113x1x1 (current block 112)
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/0
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/1
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_NODE_ANNOUNCEMENT before announcement 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e
```
Add a new helper for this case, and use it wherever there are more
than 2 nodes. This cleans up test_routing_gossip and a few other
places that did this manually.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were relying on the fee update to create an additional tx. That's
ugly; do an actual payment instead, and make sure we definitely
complete a new tx by waiting for the payment and *then* both
revoke_and_acks.
(Without this, we could get a unilateral close instead of a penalty.)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd now simply gets told by channeld when peers arrive or leave
(it only needs to know for the seeker).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently die when gossipd vanishes, even though our direct
connection will go away anyway. We then complain if the node is
shutting down while we're talking to hsmd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>