Originally I (incorrectly?) assumed that since TX_COMMITMENT_SIGNED
always came before TX_SIGNATURES, we would always receive a response
from openchannel_update (w/ commitment_secured = true) before getting
notification of receipt of the peer's signatures.
But the logs of hung tests show that this assumption is in fact wrong: the
notification for the tx_sigs arrives at our spender plugin before the
callback from our openchannel_update RPC. This mis-ordering causes a hang.
Luckily we're pretty much set up to handle this race already with states
etc., minus actually calling the method to advance the plot in case we're
ready.
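Roughly, the shape of the fix, as a minimal Python sketch (the real
spender plugin is C, and all names here are illustrative):
```python
# Minimal sketch (illustrative names; the real spender plugin is C):
# record both events and advance once both have happened, in either order.
class Dest:
    def __init__(self):
        self.commitment_secured = False   # openchannel_update replied
        self.peer_sigs_received = False   # openchannel_peer_sigs notification seen

    def maybe_advance(self):
        # Advance the plot only when *both* events are in, whatever the order.
        if self.commitment_secured and self.peer_sigs_received:
            self.send_tx_signatures()

    def on_openchannel_update_reply(self, result):
        if result.get('commitment_secured'):
            self.commitment_secured = True
        self.maybe_advance()

    def on_peer_sigs_notification(self):
        self.peer_sigs_received = True
        self.maybe_advance()  # previously this path never advanced the plot

    def send_tx_signatures(self):
        raise NotImplementedError
```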
```
2022-07-26T05:37:59.4529095Z lightningd-1 2022-07-26T05:10:07.395Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-dualopend-chan#2: peer_in WIRE_COMMITMENT_SIGNED
2022-07-26T05:37:59.4530452Z lightningd-1 2022-07-26T05:10:07.396Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-hsmd: Got WIRE_HSMD_VALIDATE_COMMITMENT_TX
2022-07-26T05:37:59.4530719Z lightningd-1 2022-07-26T05:10:07.396Z DEBUG hsmd: Client: Received message 35 from client
2022-07-26T05:37:59.4531386Z lightningd-1 2022-07-26T05:10:07.396Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-dualopend-chan#2: billboard: channel open: commitment received, sending to lightningd to save
2022-07-26T05:37:59.4531856Z lightningd-1 2022-07-26T05:10:07.398Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-dualopend-chan#2: peer_in WIRE_TX_SIGNATURES
>>> 2022-07-26T05:37:59.4532553Z lightningd-1 2022-07-26T05:10:07.400Z DEBUG plugin-spenderp: mfc 60:`openchannel_peer_sigs` notice received for channel 9d145e763f08ee6f715ba7677f869cbb9580c7406f4d0b0ff3a0987efe501e13 <<<< THIS ONE WAS ASSUMED TO COME AFTER openchannel_update (next line)
2022-07-26T05:37:59.4533048Z lightningd-1 2022-07-26T05:10:07.400Z DEBUG plugin-spenderp: mfc 60, dest 0: openchannel_update 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d returned.
2022-07-26T05:37:59.4554292Z lightningd-1 2022-07-26T05:10:07.400Z DEBUG plugin-spenderp: mfc 60: parallel `openchannel_update`.
2022-07-26T05:37:59.4555485Z lightningd-1 2022-07-26T05:10:07.400Z DEBUG plugin-spenderp: mfc 60: funding tx 50425e20dbf0ca6fe112a8811b8048edb5bfa8d2922079668c5f353b859b45cb
2022-07-26T05:37:59.4557934Z lightningd-1 2022-07-26T05:10:07.508Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
2022-07-26T05:37:59.4558244Z lightningd-1 2022-07-26T05:10:07.508Z DEBUG hsmd: Client: Received message 3 from client
2022-07-26T05:37:59.4558738Z lightningd-3 2022-07-26T05:11:03.234Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: seeker: startup peer finished
2022-07-26T05:37:59.4559209Z lightningd-3 2022-07-26T05:11:03.234Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: seeker: state = PROBING_SCIDS Seeking scids 1 - 105
```
(The last two log messages (from a different node) are >1 min after the last
log line from lightningd-1, because lightningd-1 hung.)
I hacked lightningd up to test this (forcing the notification to always be
sent before the RPC response); with the patch, it works as intended.
There's a race between disconnecting and returning a successful RPC call
for openchannel_signed: if we disconnect quickly, we get an RPC error
back; if the disconnect is slower, it returns without an error.
This needs to be cleaned up as a whole, work that I'm planning to get
into as part of a funder re-write. For now, let's just let the test
continue whether this call succeeds or fails.
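A minimal sketch of what "continue either way" looks like in the test
(assuming pyln-testing fixtures; not the literal diff):
```python
from pyln.client import RpcError

# Accept both outcomes of openchannel_signed: a fast disconnect surfaces
# as an RpcError, a slower one returns normally.
try:
    l1.rpc.openchannel_signed(chan_id, signed_psbt)
except RpcError:
    pass  # peer disconnected before the RPC completed; fine for this test
```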
The cmd was getting free'd, but something was still holding a reference
to it, which caused problems:
```
==523280== Invalid read of size 8
==523280== at 0x1B3E14: del_notifier_property (tal.c:326)
==523280== by 0x1B3E14: tal_del_notifier_ (tal.c:569)
==523280== by 0x1123E7: handle_rpc_reply (libplugin.c:671)
==523280== by 0x1123E7: rpc_read_response_one (libplugin.c:866)
==523280== by 0x1123E7: rpc_conn_read_response (libplugin.c:886)
==523280== by 0x1A7B53: next_plan (io.c:59)
==523280== by 0x1A7B53: do_plan (io.c:407)
==523280== by 0x1A7B53: io_ready (io.c:417)
==523280== by 0x1A9BDB: io_loop (poll.c:453)
==523280== by 0x1141D0: plugin_main (libplugin.c:1708)
==523280== by 0x10D7E4: main (commando.c:937)
==523280== Address 0x52de928 is 8 bytes inside a block of size 40 free'd
==523280== at 0x483F0C3: free (vg_replace_malloc.c:872)
==523280== by 0x1B2CDD: del_tree (tal.c:419)
==523280== by 0x1B37BB: tal_free (tal.c:486)
==523280== by 0x1B37BB: tal_free (tal.c:474)
==523280== by 0x110CB2: command_complete (libplugin.c:255)
==523280== by 0x110CB2: command_done_err (libplugin.c:390)
==523280== by 0x10F511: handle_reply (commando.c:560)
==523280== by 0x10F511: handle_custommsg (commando.c:609)
==523280== by 0x113877: ld_command_handle (libplugin.c:1441)
==523280== by 0x113877: ld_read_json_one (libplugin.c:1491)
==523280== by 0x113877: ld_read_json (libplugin.c:1511)
==523280== by 0x1A7B53: next_plan (io.c:59)
==523280== by 0x1A7B53: do_plan (io.c:407)
==523280== by 0x1A7B53: io_ready (io.c:417)
==523280== by 0x1A9BDB: io_loop (poll.c:453)
==523280== by 0x1141D0: plugin_main (libplugin.c:1708)
==523280== by 0x10D7E4: main (commando.c:937)
==523280== Block was alloc'd at
==523280== at 0x483C855: malloc (vg_replace_malloc.c:381)
==523280== by 0x1B3BBD: allocate (tal.c:250)
==523280== by 0x1B3BBD: add_notifier_property (tal.c:303)
==523280== by 0x1B3BBD: tal_add_destructor2_ (tal.c:529)
==523280== by 0x110725: jsonrpc_request_start_ (libplugin.c:181)
==523280== by 0x10E0EA: send_more_cmd (commando.c:643)
==523280== by 0x11243C: handle_rpc_reply (libplugin.c:696)
==523280== by 0x11243C: rpc_read_response_one (libplugin.c:866)
==523280== by 0x11243C: rpc_conn_read_response (libplugin.c:886)
==523280== by 0x1A7B53: next_plan (io.c:59)
==523280== by 0x1A7B53: do_plan (io.c:407)
==523280== by 0x1A7B53: io_ready (io.c:417)
==523280== by 0x1A9BDB: io_loop (poll.c:453)
==523280== by 0x1141D0: plugin_main (libplugin.c:1708)
==523280== by 0x10D7E4: main (commando.c:937)
==523280==
{
<insert_a_suppression_name_here>
```
```
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.493330
==493330== Conditional jump or move depends on uninitialised value(s)
==493330== at 0x154051: opt_add_addr_withtype (options.c:275)
==493330== by 0x154406: opt_add_announce_addr (options.c:302)
==493330== by 0x2696E6: parse_one (parse.c:121)
==493330== by 0x25CFB5: opt_parse (opt.c:228)
==493330== by 0x155DB6: handle_opts (options.c:1413)
==493330== by 0x127317: main (lightningd.c:994)
==493330==
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:opt_add_addr_withtype
fun:opt_add_announce_addr
fun:parse_one
fun:opt_parse
fun:handle_opts
fun:main
}
--------------------------------------------------------------------------------
Leaving base_dir /tmp/ltests-iyf2dw3n intact, it still has test sub-directories with failure details: ['test_announce_dns_without_port_1']
====================================== short test summary info ======================================
ERROR tests/test_gossip.py::test_announce_dns_without_port - ValueError:
```
We reset counters every minute, so ratelimit tests can flake since we
might hit that boundary.
Instead, wait for the reset, then test explicitly, assuming that takes
less than 60 seconds.
```
    for rune, cmd, params in failures:
        print("{} {}".format(cmd, params))
        with pytest.raises(RpcError, match='Not authorized:') as exc_info:
            l2.rpc.call(method='commando',
                        payload={'peer_id': l1.info['id'],
                                 'rune': rune['rune'],
                                 'method': cmd,
>                                'params': params})
E           Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
...
DEBUG:root:Calling commando with payload {'peer_id': '0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518', 'rune': 'O8Zr-ULTBKO3_pKYz0QKE9xYl1vQ4Xx9PtlHuist9Rk9NCZwbnVtPTAmcmF0ZT0zJnJhdGU9MQ==', 'method': 'getinfo', 'params': {}}
```
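A minimal sketch of the explicit approach (the `wait_for` helper is from
pyln-testing; the rate=1 rune and the surrounding setup are assumptions):
```python
import time
import pytest
from pyln.client import RpcError

# Exhaust the limit, then wait for the per-minute counter reset, which is
# observable because a call is allowed again.  The explicit check then
# runs inside a fresh window, so it can't straddle a reset boundary.
def allowed():
    try:
        l2.rpc.call('commando', {'peer_id': l1.info['id'],
                                 'rune': rate1_rune['rune'],
                                 'method': 'getinfo', 'params': {}})
        return True
    except RpcError:
        return False

wait_for(allowed)                 # reset happened; that call used up rate=1
start = time.time()
with pytest.raises(RpcError, match='Not authorized:'):
    l2.rpc.call('commando', {'peer_id': l1.info['id'],
                             'rune': rate1_rune['rune'],
                             'method': 'getinfo', 'params': {}})
assert time.time() - start < 60   # must finish within the same window
```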
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As the comment in set_feerates says: "Technically, this waits until
it's called, not until it's processed."
And the wait_for() line doesn't work, since that condition is already true.
```
@unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different")
@unittest.skipIf(
    not DEVELOPER or DEPRECATED_APIS, "Without DEVELOPER=1 we snap to "
    "FEERATE_FLOOR on testnets, and we test the new API."
)
def test_feerates(node_factory):
    l1 = node_factory.get_node(options={'log-level': 'io',
                                        'dev-no-fake-fees': True}, start=False)
    l1.daemon.rpcproxy.mock_rpc('estimatesmartfee', {
        'error': {"errors": ["Insufficient data or no feerate found"], "blocks": 0}
    })
    l1.start()

    # All estimation types
    types = ["opening", "mutual_close", "unilateral_close", "delayed_to_us",
             "htlc_resolution", "penalty"]

    # Try parsing the feerates, won't work because can't estimate
    for t in types:
        with pytest.raises(RpcError, match=r'Cannot estimate fees'):
            feerate = l1.rpc.parsefeerate(t)

    # Query feerates (shouldn't give any!)
    wait_for(lambda: len(l1.rpc.feerates('perkw')['perkw']) == 2)
    feerates = l1.rpc.feerates('perkw')
    assert feerates['warning_missing_feerates'] == 'Some fee estimates unavailable: bitcoind startup?'
    assert 'perkb' not in feerates
    assert feerates['perkw']['max_acceptable'] == 2**32 - 1
    assert feerates['perkw']['min_acceptable'] == 253
    for t in types:
        assert t not in feerates['perkw']

    wait_for(lambda: len(l1.rpc.feerates('perkb')['perkb']) == 2)
    feerates = l1.rpc.feerates('perkb')
    assert feerates['warning_missing_feerates'] == 'Some fee estimates unavailable: bitcoind startup?'
    assert 'perkw' not in feerates
    assert feerates['perkb']['max_acceptable'] == (2**32 - 1)
    assert feerates['perkb']['min_acceptable'] == 253 * 4
    for t in types:
        assert t not in feerates['perkb']

    # Now try setting them, one at a time.
    # Set CONSERVATIVE/2 feerate, for max
    l1.set_feerates((15000, 0, 0, 0), True)
    wait_for(lambda: len(l1.rpc.feerates('perkw')['perkw']) == 2)
    feerates = l1.rpc.feerates('perkw')
    assert feerates['warning_missing_feerates'] == 'Some fee estimates unavailable: bitcoind startup?'
    assert 'perkb' not in feerates
>   assert feerates['perkw']['max_acceptable'] == 15000 * 10
E   assert 4294967295 == (15000 * 10)

tests/test_misc.py:1392: AssertionError
```
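The key problem is the `wait_for()` right before the failing assertion:
```python
# This condition was already true before l1.set_feerates() took effect,
# so wait_for() returns immediately and the assertion races with the
# feerate update.
wait_for(lambda: len(l1.rpc.feerates('perkw')['perkw']) == 2)

# A wait that cannot pass early must watch for the new value itself
# (a sketch of the idea, not necessarily the exact fix applied):
wait_for(lambda: l1.rpc.feerates('perkw')['perkw'].get('max_acceptable') == 15000 * 10)
```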
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On fast machines, we sometimes don't get failures on commando commands.
(*But* we still got "New cmd replacing old" messages, which is how I
realized we weren't freeing them promptly, hence the previous fix).
```
# Should have exactly one discard msg from each discard
> nodes[0].daemon.wait_for_logs([r"New cmd from .*, replacing old"] * discards)
tests/test_plugin.py:2839:
...
> raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E TimeoutError: Unable to find "[]" in logs.
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was weird. Here is the message (with \n turned into real new lines):
```
2022-07-24T07:20:08.9144998Z Plugin '/home/runner/work/lightning/lightning/tests/plugins/dblog.py' returned an invalid response to the db_write hook: {"jsonrpc": "2.0", "id": 40, "error": {"code": -32600, "message": "Error while processing db_write: UNIQUE constraint failed: shachain_known.shachain_id, shachain_known.pos", "traceback": "Traceback (most recent call last):
File \"/home/runner/work/lightning/lightning/contrib/pyln-client/pyln/client/plugin.py\", line 631, in _dispatch_request
result = self._exec_func(method.func, request)
File \"/home/runner/work/lightning/lightning/contrib/pyln-client/pyln/client/plugin.py\", line 616, in _exec_func
return func(*ba.args, **ba.kwargs)
File \"/home/runner/work/lightning/lightning/tests/plugins/dblog.py\", line 45, in db_write
plugin.conn.execute(c)
sqlite3.IntegrityError: UNIQUE constraint failed: shachain_known.shachain_id, shachain_known.pos
"}}
```
Finally, I realized that we *kill* l2: this means it has updated the
plugin db but not the real db. This is expected: a real backup plugin
would handle this case.
Simply disable the test for this case.
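The shape of that change, as a hypothetical sketch (names are not from
the actual diff):
```python
# Skip the strict comparison when the node was killed: the backup plugin
# can legitimately be one write ahead of lightningd's own database.
if not node_was_killed:
    assert plugin_db_rows(l2) == real_db_rows(l2)
```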
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I wondered how `tests/plugins/dblog.py` worked, since it is supposed to fail
unless the `dblog-file` arg is set:
```
@plugin.init()
def init(configuration, options, plugin):
    if not plugin.get_option('dblog-file'):
        raise RpcError("No dblog-file specified")
```
But it was set to "null". That's because `None` in Python is turned into a
literal JSON `null`, and we take that as the default value.
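The pyln pattern that triggers it is simply a `None` default (a sketch of
the gist, not the exact dblog.py line):
```python
from pyln.client import Plugin

plugin = Plugin()

# default=None is serialized as JSON null in the getmanifest reply;
# lightningd then treated that as a real default value, "null",
# rather than as "no default".
plugin.add_option('dblog-file', None, 'The db log file')
```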
We also clean up the popt->description double-assignment (a leftover
from when this was optional).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: plugins: setting the default value of a parameter to `null` is the same as not setting it (pyln plugins did this!).
As we'll see in the next patch, this wasn't *supposed* to work without
dblog-file, but it did, creating a dblog called "null".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't log_io final messages twice (multiplex_final_message already does
this, so it's confusing to see us send e.g. WIRE_ERROR twice!).
And report that the peer has failed to connect out *before* telling
lightningd, otherwise we get a very confusing ordering, e.g.:
```
2022-07-23T05:17:36.096Z DEBUG 027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Reconnecting in 4 seconds
2022-07-23T05:17:36.096Z DEBUG 027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Will try reconnect in 4 seconds
2022-07-23T05:17:36.096Z DEBUG 027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-connectd: Failed connected out:
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a rune contains invalid UTF-8, offers (which implements decode) would
produce JSON with invalid UTF-8, which causes lightningd to complain and
kill it, and then die itself because offers is an important plugin.
So don't decode invalid UTF-8!
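In Python terms, the guard amounts to this (the actual plugin is C and
uses its own UTF-8 validation):
```python
# Only treat bytes as a displayable string if they are valid UTF-8;
# otherwise return None so the caller omits the field instead of
# emitting JSON that lightningd will reject.
def maybe_utf8(b: bytes):
    try:
        return b.decode('utf-8')
    except UnicodeDecodeError:
        return None
```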
Reported-by: @jb55
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The message written to stderr is fine; the one written to the log gets
corrupted, like so:
```
2022-07-24T07:20:08.6250702Z lightningd-2 2022-07-24T06:49:19.494Z **BROKEN** lightningd: Plugin '????UH??SH??8H?}?H?u?H?U?H?M?H?M?H?E?H?????' returned an invalid response to the db_write hook: (F???U
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If compilation failed, we didn't stop (though, except for fuzzing, we
would fail when we tried to run the tests).
Also use `make -s` instead of redirecting make output.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were waiting too long for the reconnect to happen (60s default),
which caused this test to time out.
When testing, let's speed up the reconnect.
L2 tried to reconnect but didn't have connection information in its
gossip -- is there a way to ask/save connection data from a node you're
making a channel with that doesn't rely on their node_announcement?
This gives a nice way to ensure your secret is the correct one.
Also, we don't need to suppress VALGRIND for this test, now that the
output races are fixed.
Changelog-Added: `hsmtool`: new command `checkhsm` to check BIP39 passphrase against hsm_secret.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some tests need to inspect it, but most don't, and I suspect I'm missing some
error messages due to this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The error routine returns a string literal in this case, which we can't take().
Reported-by: @jb55
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. fromwire now allocates TLVs, so this was actually a leak.
2. We can simply hand "NULL" to towire_, since that is the same as
this empty tlv.
```
...89221a0054c11c1e3ca31d59-dualopend-chan#1: MEMLEAK: 0x56148649c458
...89221a0054c11c1e3ca31d59-dualopend-chan#1: label=wire/peer_exp_wiregen.c:1041:struct tlv_channel_reestablish_tlvs
...89221a0054c11c1e3ca31d59-dualopend-chan#1: backtrace:
...89221a0054c11c1e3ca31d59-dualopend-chan#1: /home/rusty/devel/cvs/lightning/ccan/ccan/tal/tal.c:442 (tal_alloc_)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: /home/rusty/devel/cvs/lightning/wire/peer_exp_wiregen.c:1041 (tlv_channel_reestablish_tlvs_new)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: /home/rusty/devel/cvs/lightning/openingd/dualopend.c:3536 (do_reconnect_dance)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: /home/rusty/devel/cvs/lightning/openingd/dualopend.c:3955 (main)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: ../csu/libc-start.c:392 (__libc_start_main_impl)
...89221a0054c11c1e3ca31d59-dualopend-chan#1: parents:
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>