core-lightning

mirror of https://github.com/ElementsProject/lightning.git synced 2024-12-27 09:04:40 +01:00

Author	SHA1	Message	Date
Rusty Russell	9ab7c8aed3	connected/test: fix memleak in test. ``` VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all connectd/test/run-netaddress > /dev/null ==2483395== 16 bytes in 1 blocks are still reachable in loss record 1 of 15 ==2483395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==2483395== by 0x10D59A: autodata_register_ (autodata.c:20) ==2483395== by 0x10EB26: register_autotype_type_to_string (type_to_string.h:77) ==2483395== by 0x10EB6B: register_one_type_to_string0 (type_to_string.c:8) ==2483395== by 0x188C0C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/connectd/test/run-netaddress) ==2483395== by 0x4A3A00F: (below main) (libc-start.c:264) ==2483395== ==2483395== 40 bytes in 1 blocks are still reachable in loss record 2 of 15 ==2483395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ... ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-29 21:07:42 +09:30
Rusty Russell	fd90e5746b	connectd: don't keep around more than one old connection. This was fixed in `1c495ca5a8` ("connectd: fix accidental handling of old reconnections.") and then reverted by the rework in "connectd: avoid use-after-free upon multiple reconnections by a peer". The latter made the race much less likely, since we cleaned up the reconnecting struct once the connection was hung up by the remote node, but it's still theoretically possible. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:47:27 +09:30
Rusty Russell	afbddcf7f3	lightningd: fix crash on rapid reconnect. Happens occasionally when running `tests/test_connection.py::test_mutual_reconnect_race` (which is too flaky to add, without more fixes): ``` lightningd: lightningd/peer_control.c:1252: peer_active: Assertion `!channel->owner' failed. lightningd: FATAL SIGNAL 6 (version v0.11.0.1-38-g4f167da) 0x5594a41f8f45 send_backtrace common/daemon.c:33 0x5594a41f8fef crashdump common/daemon.c:46 0x7f7cb585c08f ??? /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 0x7f7cb585c00b __GI_raise ../sysdeps/unix/sysv/linux/raise.c:51 0x7f7cb583b858 __GI_abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79 0x7f7cb583b728 __assert_fail_base /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92 0x7f7cb584cfd5 __GI___assert_fail /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101 0x5594a41b45ca peer_active lightningd/peer_control.c:1252 0x5594a418794c connectd_msg lightningd/connect_control.c:457 0x5594a41cd457 sd_msg_read lightningd/subd.c:556 0x5594a41ccbe5 read_fds lightningd/subd.c:357 0x5594a4269fc2 next_plan ccan/ccan/io/io.c:59 0x5594a426abca do_plan ccan/ccan/io/io.c:407 0x5594a426ac0c io_ready ccan/ccan/io/io.c:417 0x5594a426ceff io_loop ccan/ccan/io/poll.c:453 0x5594a41930d9 io_loop_with_timers lightningd/io_loop_with_timers.c:22 0x5594a4199293 main lightningd/lightningd.c:1181 0x7f7cb583d082 __libc_start_main ../csu/libc-start.c:308 0x5594a416e15d ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:47:27 +09:30
Matt Whitlock	83c825945c	connectd: avoid use-after-free upon multiple reconnections by a peer `peer_reconnected` was freeing a `struct peer_reconnected` instance while a pointer to that instance was registered to be passed as an argument to the `retry_peer_connected` callback function. This caused a use-after-free crash when `retry_peer_connected` attempted to reparent the instance to the temporary context. Instead, never have `peer_reconnected` free a `struct peer_reconnected` instance, and only ever allow such an instance to be freed after the `retry_peer_connected` callback has finished with it. To ensure that the instance is freed even if the connection is closed before the callback can be invoked, parent the instance to the connection rather than to the daemon. Absent the need to free `struct peer_reconnected` instances outside of the `retry_peer_connected` callback, there is no use for the `reconnected` hashtable, so remove it as well. See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255 Fixes: #5282 Fixes: #5284 Changelog-Fixed: connectd no longer crashes when peers reconnect.	2022-06-28 13:47:27 +09:30
Rusty Russell	4ee55acc71	connectd: don't start connecting in parallel in peer_conn_closed. The crash below from @zerofeerouting left me confused. The invalid value in fmt_wireaddr_internal is a telltale sign of use-after-free. This backtrace shows us destroying the conn twice: what's happening? Well, tal carefully protects against destroying twice: it's not that unusual to free something in a destructor which has already been freed. So this indicates that there are two io_conn hanging off one struct connecting, which isn't supposed to happen! We deliberately call try_connect_one_addr() initially, then inside the io_conn destructor. But due to races in connectd vs lightningd connection state, we added a fix which allows a connect command to sit around while the peer is cleaning up (`6cc9f37cab`) and get fired off when it's done. But what if, in the chaos, we are already connecting again? Now we'll end up with two connections. Fortunately, we have a `conn` pointer inside struct connecting, which (with a bit of additional care) we can ensure is only non-NULL while we're actually trying to connect. This lets us check that before firing off a new connection attempt in peer_conn_closed. ``` lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939) 0x5614a4915ae8 send_backtrace common/daemon.c:33 0x5614a4915b72 crashdump common/daemon.c:46 0x7ffa14fcd72f ??? ???:0 0x7ffa14dc87bb ??? ???:0 0x7ffa14db3534 ??? ???:0 0x5614a491fc71 fmt_wireaddr_internal common/wireaddr.c:255 0x5614a491fc7a fmt_wireaddr_internal_ common/wireaddr.c:257 0x5614a491ea6b type_to_string_ common/type_to_string.c:32 0x5614a490beaa destroy_io_conn connectd/connectd.c:754 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953928 del_tree ccan/ccan/tal/tal.c:412 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4908b7a try_connect_one_addr connectd/connectd.c:870 0x5614a490bef1 destroy_io_conn connectd/connectd.c:759 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4948f08 io_close ccan/ccan/io/io.c:450 0x5614a4948f59 do_plan ccan/ccan/io/io.c:401 0x5614a4948fe1 io_ready ccan/ccan/io/io.c:417 0x5614a494a8e6 io_loop ccan/ccan/io/poll.c:453 0x5614a490c12f main connectd/connectd.c:2164 0x7ffa14db509a ??? ???:0 0x5614a4904e99 ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Fixes: #5339 Changelog-Fixed: connectd: occasional crash when we reconnect to a peer quickly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:46:59 +09:30
Rusty Russell	8e2dcc1167	doc: document the [] IPv6 address hack. Fixes: #5317 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:46:41 +09:30
Vincenzo Palazzo	2754e30269	devtools: revert changes and make sure chainparams always set. [Also fixes crash with -h, since chainparams was not set! --RR]	2022-06-28 06:09:01 +09:30
Vincenzo Palazzo	2c9d7484a2	test: introduce the default port function and remove hard coded port	2022-06-28 06:09:01 +09:30
Vincenzo Palazzo	7ff62b4a00	lightnind: remove`DEFAULT_PORT` global definition Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-28 06:09:01 +09:30
Vincenzo Palazzo	c07d44b4d4	docs: update the docs in according with the new code Suggested-by: @rustyrussell Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> # Title: Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-28 06:09:01 +09:30
Vincenzo Palazzo	4b11f968ad	lightningd: change the regtest default port according with the tests This doesn't make sense, but for now we just keep it Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-28 06:09:01 +09:30
Vincenzo Palazzo	cc7a405ca4	lightningd: use the standard port derivation in connect command Complete implementation of BOLT1 port derivation proposal https://github.com/lightning/bolts/pull/968 Changelog-Added: rpc: use the standard port derivation in connect command when the port is not specified. Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-28 06:09:01 +09:30
Rusty Russell	9152b8c424	pytest: fix test_multifunding_feerates There's a 1 in 256 chance that our signature on the transaction is 70, not 71 bytes long. This changes the freerate. So fix up the weight in this case, to be the expected weight. ``` @unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different") @pytest.mark.developer("uses dev-fail") @pytest.mark.openchannel('v1') # v2 the weight calculation is off by 3 deftest_multifunding_feerates(node_factory, bitcoind): ''' Test feerate parameters for multifundchannel ''' funding_tx_feerate = '10000perkw' commitment_tx_feerate_int = 2000 commitment_tx_feerate = str(commitment_tx_feerate_int) + 'perkw' l1, l2, l3 = node_factory.get_nodes(3, opts={'log-level': 'debug'}) l1.fundwallet(1 << 26) def_connect_str(node): return'{}@localhost:{}'.format(node.info['id'], node.port) destinations = [{"id": _connect_str(l2), 'amount': 50000}] res = l1.rpc.multifundchannel(destinations, feerate=funding_tx_feerate, commitment_feerate=commitment_tx_feerate) entry = bitcoind.rpc.getmempoolentry(res['txid']) weight = entry['weight'] expected_fee = int(funding_tx_feerate[:-5]) * weight // 1000 > assert expected_fee == entry['fees']['base'] * 10 ** 8 E AssertionError: assert 7000 == (Decimal('0.00007010') * (10 ** 8)) tests/test_connection.py:1982: AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	25699994e5	pytest: fix flake in test_node_reannounce Again, our new behaviour of sending our own gossip even before they ask can confuse our gossip query tests. In this case, simply eliminate duplicates. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	a677ea404b	pytest: fix flake in test_query_short_channel_id Again, our new behaviour of sending our own gossip even before they ask can confuse our gossip query tests. Create a new node, attach it, and perform queries on it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	4970704d24	pytest: fix gossipwith flake harder. Even though we generally wait until a node has seen the gossip, that doesn't mean that connectd has processed it! This means when we connect it may still send us "old" gossip. So we set the OPT_GOSSIP_QUERIES bit, which means don't send until we ask. But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs to filter that out. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	e120b4afd6	lightningd: add more information should subd send wrong message. I saw this once, but could not track it down. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	3f98cf3fce	lightningd: track weird CI crash in test_important_plugin Looks like we woke one of the startup io_loops early, and thus we thought we'd finished connectd_activate and we hadn't. This caused us to use an uninitialized ld->announceable array, and finally caused an assert fail in the main loop. Make every loop assert that it was exited for the correct reason, so if it happens again, we can maybe figure out what part of the code to look at. ``` lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed. lightningd: FATAL SIGNAL 6 (version 4df66fa) ... ------------------------------- Valgrind errors -------------------------------- Valgrind error file: valgrind-errors.895509 ==895509== Conditional jump or move depends on uninitialised value(s) ==895509== at 0x22C58E: to_tal_hdr_or_null (tal.c:184) ==895509== by 0x22D531: tal_bytelen (tal.c:637) ==895509== by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100) ==895509== by 0x13AC6E: gossip_init (gossip_control.c:254) ==895509== by 0x1497EC: main (lightningd.c:1090) ==895509== ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	9137ea262b	pytest: don't assume that join_nodes needs only check ends. I saw another "only_one()" fail on alias checking: it's not entirely clear to me with the more aggressive sending of own gossip, that we necessarily process in order, so we might not have actually seen all channels just because we saw the farthest one. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	3958d59e39	pytest: fix test_channel_lease_post_expiry flake. Make sure bitcoind gets tx before mining blocks! ``` # l1<->l2 mutual close should work chan = l1.get_channel_scid(l2) l2.rpc.connect(l1.info['id'], 'localhost', l1.port) l1.rpc.close(chan) l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE') bitcoind.generate_block(2) sync_blockheight(bitcoind, [l1, l2]) > l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE') tests/test_closing.py:851: ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	70b091d9f6	lightningd: fix transient leak report when openingd shutting down. ``` > raiseValueError(str(errors)) E ValueError: E Node errors: E Global errors: E - Node /tmp/ltests-x5sfpiwp/test_openchannel_hook_chaining_1/lightning-2/ has memory leaks: [ E { E "backtrace": [ E "ccan/ccan/tal/tal.c:442 (tal_alloc_)", E "ccan/ccan/io/io.c:91 (io_new_conn_)", E "lightningd/subd.c:773 (new_subd)", E "lightningd/subd.c:827 (new_channel_subd_)", E "lightningd/opening_control.c:870 (peer_start_openingd)", E "lightningd/peer_control.c:1307 (peer_active)", E "lightningd/connect_control.c:457 (connectd_msg)", E "lightningd/subd.c:556 (sd_msg_read)", E "lightningd/subd.c:357 (read_fds)", E "ccan/ccan/io/io.c:59 (next_plan)", E "ccan/ccan/io/io.c:407 (do_plan)", E "ccan/ccan/io/io.c:417 (io_ready)", E "ccan/ccan/io/poll.c:453 (io_loop)", E "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)", E "lightningd/lightningd.c:1181 (main)", E "../csu/libc-start.c:308 (__libc_start_main)" E ], E "label": "ccan/ccan/io/io.c:91:struct io_conn", E "parents": [ E "lightningd/lightningd.c:107:struct lightningd" E ], E "value": "0x2b5a898" E } E ] ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	fcff21fae5	pytest: allow more time for test_waitblockheight !DEVELOPER. It actually timed out with the default 60 seconds, just before it saw the block: ``` 2022-06-07T02:16:05.2557049Z bitcoind.generate_block(1) 2022-06-07T02:16:05.2557300Z sync_blockheight(bitcoind, [node]) 2022-06-07T02:16:05.2557594Z fut1.result(5) 2022-06-07T02:16:05.2557912Z assert not fut2.done() 2022-06-07T02:16:05.2558121Z 2022-06-07T02:16:05.2558370Z # Trigger two blocks. 2022-06-07T02:16:05.2558689Z bitcoind.generate_block(1) 2022-06-07T02:16:05.2558941Z sync_blockheight(bitcoind, [node]) 2022-06-07T02:16:05.2559219Z > fut2.result(5) 2022-06-07T02:16:05.2559350Z 2022-06-07T02:16:05.2559508Z tests/test_misc.py:2138: ... 2022-06-07T02:16:05.2586947Z elif "error" in resp: 2022-06-07T02:16:05.2587398Z > raise RpcError(method, payload, resp['error']) 2022-06-07T02:16:05.2588026Z E pyln.client.lightning.RpcError: RPC call failed: method: waitblockheight, payload: {'blockheight': 103}, error: {'code': 2000, 'message': 'Timed out.'} 2022-06-07T02:16:05.2588325Z 2022-06-07T02:16:05.2588563Z contrib/pyln-client/pyln/client/lightning.py:387: RpcError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	b4820d6706	lightningd: don't run off end of buffer if db_hook returns nonsense. It shouldn't return nonsense, but it did, and we segfaulted. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	6221ac621d	pytest: fix timeout in test_option_types We were waiting for the start to timeout waiting for the "public key" message. Instead, start manually. Before, this took (with TIMEOUT=30) 97 seconds, after 8 seconds. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	61d8eb5fa8	pyln-testing: increase default daemon.wait() timeout. Valgrind in CI is slow: ``` def test_bitcoin_failure(node_factory, bitcoind): ... # Ignore BROKEN log message about blocksonly mode. l2 = node_factory.get_node(start=False, expect_fail=True, allow_broken_log=True) l2.daemon.start(wait_for_initialized=False) # Will exit with failure code. > assert l2.daemon.wait() == 1 tests/test_misc.py:114: ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	d2952576cd	pyln-testing: restore proper streaming behaviour for lightningd. Get it to log direct to stdout, so we see what's happening as it happens rather than as we read it. We could restore the thread we were using before, but that added more problems than it solved. This means that we need the hsm password prompts in the log though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	a0e0dbf229	pyln-testing: use files for stdout and stderr, not threads. Some flakes are caused by weird races in this code. Plus, if we get things to write straight to files, we might see things in there on post-mortem which happen after the python runner exits. It's a bit less efficient, but much simpler. Let's see if it helps! Some tests need a rework now, since we don't get a failure (except eventual timeout), but they're simpler. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	bcd050a610	pytest: fix test_ping_timeout flake. We can hang up due to ping while waiting for channel to become active. But we don't even need an active channel for this test, so simplify. ``` ______________________________ test_ping_timeout _______________________________ [gw1] linux -- Python 3.7.13 /opt/hostedtoolcache/Python/3.7.13/x64/bin/python3 node_factory = <pyln.testing.utils.NodeFactory object at 0x7f132cca8a90> @pytest.mark.developer("dev-disconnect required") def test_ping_timeout(node_factory): # Disconnects after this, but doesn't know it. l1_disconnects = ['xWIRE_PING'] l1, l2 = node_factory.line_graph(2, opts=[{'dev-no-reconnect': None, 'disconnect': l1_disconnects}, > {}]) tests/test_connection.py:3826: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ contrib/pyln-testing/pyln/testing/utils.py:1493: in line_graph self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels) contrib/pyln-testing/pyln/testing/utils.py:1470: in join_nodes nodes[i].wait_channel_active(scids[i]) contrib/pyln-testing/pyln/testing/utils.py:1003: in wait_channel_active wait_for(lambda: self.is_channel_active(chanid)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ success = <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830> timeout = 900 def wait_for(success, timeout=TIMEOUT): start_time = time.time() interval = 0.25 while not success(): time_left = start_time + timeout - time.time() if time_left <= 0: > raise ValueError("Timeout while waiting for {}", success) E ValueError: ('Timeout while waiting for {}', <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>) ```	2022-06-27 17:21:35 +09:30
Rusty Russell	517828adb2	lightningd: don't print nasty message when onchaind fails partially-failed HTLC 1. We set an outgoing htlc's `failonion` when we get a commitment_signed. 2. We don't transfer it to the corresponding incoming HTLC until we send commitment_signed and receive revoke_and_ack (meaning, outgoing htlc is completely dead). 3. If between these steps we go onchain, onchaind (after 3 blocks) tells us to fail the HTLC. 4. hout->failonion is set, but hout->hin has not been failed yet. We do a sanity check and print a nasty message, and fail it with WIRE_PERMANENT_CHANNEL_FAILURE instead of relaying the error. So handle this case explicitly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	353361a05c	pytest: test to demonstrate BROKEN message on onchaind htlc close. We assumed that if the outgoing htlc has failed, we should have always failed the incoming one. This is not true, as this testcase demonstrates: ``` lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-onchaind-chan#2: Sending 1 missing htlc messages lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: onchain_failed_our_htlc lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 failonion = 0x55cb8bc045d8, failmsg = (nil), preimage = (nil) lightningd-2: 2022-05-25T04:38:31.449Z BROKEN 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 already complete, but ->in not resolved! failonion = 206816a391d3a7666ddd5213914cbb68f5da1fc4a0937e729de5a1990c94d26312caa5f2778e8da0c6bdefc68dd1a3bc28b5b5650fc0bdb3c2247ecca94ed0bbb224c8448c2c71eb1656a8740cadb301bd1ee1c1e774a8fef817352f502e4217b11e93aa6877b88b37afab0e4d4e49ed0385be9ab9a1ab1ac0e3460e41cfafb30ed896cea96e346041919a6c524ce56c3e5f27c7cd78a36b6df221e90a1c6e048c72b4146a5a51885fb70649037fe7ace77a016ae3ec8aee97960d0e5f0582713f671df79d8dee11b82708b6d882ee5adbb328db1938e824110f57ead1b27410bc6f775c7bb4ae40c1768d77a166c9bfda8f634ba0ac4f8a9fe199894dd3754c5ce41c9694544c805ffc177517661f11221dd8dffac60ce1c8c5bf54cda8e5ea44d8ec6b, failmsg = (null), preimage = (null) lightningd-2: 2022-05-25T04:38:31.449Z BROKEN 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: MISSING incoming fail for 0: failing incoming now lightningd-2: 2022-05-25T04:38:31.449Z BROKEN 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: MISSED incoming fail for 0: failing now lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	11de721ba9	gossipd: fix gossmap race. When upgrading a channel from private to public, we would delete the private channel then add the public one. However, this is visible in the gossmap! In particular, the node may be removed from the gossmap and then re-added, so it may temporarily vanish! I was seeing an occasional assertion inside node_factory.line_graph: ``` @pytest.mark.developer def test_reconnect_remote_sends_no_sigs(node_factory): """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect. """ > l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True}) tests/test_connection.py:870: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ contrib/pyln-testing/pyln/testing/utils.py:1467: in line_graph self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels) contrib/pyln-testing/pyln/testing/utils.py:1460: in join_nodes wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes'])) contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for while not success(): contrib/pyln-testing/pyln/testing/utils.py:1460: in <lambda> wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes'])) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ arr = [] def only_one(arr): """Many JSON RPC calls return an array; often we only expect a single entry """ > assert len(arr) == 1 E AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	2b7915359c	gossmap: handle case where private channel turns into public. Usually we won't see this, since private is deleted. But we could have already read the private channel before that. Handle it properly. (Tested by removing the gossip_store deletion code and making sure this worked). We have to fix up the test, which announces a channel twice! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	c1a111b68c	pytest: wait for stderr, rather than asserting. Two almost identical assertion fails under CI. Try waiting. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	2f27aad8d0	pytest: fix flake in test_wallet.py::test_hsm_secret_encryption Under valgrind, at least, this test fails regularly, and this sleep fixes it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	b1f393f355	pytest: fix race in test_multichan We can fail to use larger channel if it's not ready yet: ``` 2022-05-23T01:20:05.5325600Z # Check it used the larger channel! 2022-05-23T01:20:05.5326376Z > assert before[chan23a_idx]['to_us_msat'] == after[chan23a_idx]['to_us_msat'] 2022-05-23T01:20:05.5326961Z E assert 1000000000msat == 900000000msat 2022-05-23T01:20:05.5327240Z 2022-05-23T01:20:05.5327621Z tests/test_connection.py:3896: AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	6247bee11e	pytest: try to fix bitcoind socket timeout in test_channel_lease_unilat_closes Split harder! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> ``` # This can timeout, so do it in four easy stages. for i in range(4): > bitcoind.generate_block(4032 // 4) tests/test_closing.py:971: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ contrib/pyln-testing/pyln/testing/utils.py:475: in generate_block return self.rpc.generatetoaddress(numblocks, to_addr) contrib/pyln-testing/pyln/testing/utils.py:372: in f res = proxy._call(name, args) /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:233: in _call response = self._get_response() /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:263: in _get_response http_response = self.__conn.getresponse() /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:1373: in getresponse response.begin() /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:319: in begin version, status, reason = self._read_status() /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:280: in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <socket.SocketIO object at 0x7f1a4b3702d0> b = <memory at 0x7f1a4ac51600> def readinto(self, b): """Read up to len(b) bytes into the writable buffer b* and return the number of bytes read. If the socket is non-blocking and no bytes are available, None is returned. If b is non-empty, a 0 return value indicates that the connection was shutdown at the other end. """ self._checkClosed() self._checkReadable() if self._timeout_occurred: raise OSError("cannot read from timed out object") while True: try: > return self._sock.recv_into(b) E socket.timeout: timed out ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	eb25e08039	pytest: fix port allocation race when nodes restart. We use ephemeral_port_reserve to grab ports, but this can fail when we restart a node, since the port can be reallocated at that point. Attempt to overcome this using a global reserved list (is there a neater way?). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	a1b8b40d13	connectd: fix debug message on bind fail. It doesn't get the right errno, and it says "create" not "bind". ``` 2022-05-20T03:04:46.498Z DEBUG connectd: Failed to create 2 socket: Success 2022-05-20T03:04:46.500Z DEBUG connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds 2022-05-20T03:04:46.501Z DEBUG connectd: connectd_init_done 2022-05-20T03:04:46.503Z BROKEN connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	71b1eaf2fe	pyln-testing: try harder if cleaning directory fails. This happens under CI, but it's not informative. Sleep and retry. Also, "except (OSError, Exception)" does not seem to do what you'd think: this clause never gets run. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	d274de2bf4	pytest: fix flake in test_htlc_too_dusty_outgoing We can sneak the HTLC add in before we do the update_fee which fails: ``` # the channel should start warning -- too much dust inv = l2.rpc.invoice(htlc_val_msat, str(num_dusty_htlcs + 1), str(num_dusty_htlcs + 1)) with pytest.raises(RpcError, match=r'WIRE_TEMPORARY_CHANNEL_FAILURE'): > l1.rpc.sendpay(route, inv['payment_hash'], payment_secret=inv['payment_secret']) E Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'> tests/test_pay.py:2654: Failed ``` From the logs: ``` lightningd-1: 2022-05-20T02:16:40.008Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 14 = SENT_ADD_HTLC/RCVD_ADD_HTLC lightningd-1: 2022-05-20T02:16:40.010Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 14 amount=10000000msat cltv=113 gave CHANNEL_ERR_ADD_OK lightningd-1: 2022-05-20T02:16:40.012Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC lightningd-1: 2022-05-20T02:16:40.015Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds lightningd-1: 2022-05-20T02:16:40.026Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_WARNING lightningd-1: 2022-05-20T02:16:40.029Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: billboard perm: Too much dust to update fee (Desired feerate update 20000) lightningd-1: 2022-05-20T02:16:40.035Z INFO 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: Too much dust to update fee (Desired feerate update 20000) lightningd-1: 2022-05-20T02:16:40.039Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: Will try reconnect in 60 seconds ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	930860c781	pytest: fix flakes in test_gossip_query_channel_range Nodes now spray their own gossip at the start; this can race with our query, and they can give us gossip we didn't ask for. The fix is to always query an uninvolved node: in this case, we only connected l3 and l4 to test zlib, which is gone, so now we can remove that and simply always query l4. ``` 2022-05-23T01:32:53.4695809Z # reply_channel_range == 264 2022-05-23T01:32:53.4696416Z > assert msgs == ['0108' 2022-05-23T01:32:53.4696835Z # blockhash 2022-05-23T01:32:53.4697328Z + genesis_blockhash 2022-05-23T01:32:53.4697841Z # first_blocknum, number_of_blocks, complete 2022-05-23T01:32:53.4698685Z + format(0, '08x') + format(65535, '08x') + '01' 2022-05-23T01:32:53.4699245Z # encoded_short_ids 2022-05-23T01:32:53.4699859Z + format(len(encoded) // 2, '04x') 2022-05-23T01:32:53.4700255Z + encoded] 2022-05-23T01:32:53.4700799Z E AssertionError: assert ['0100194b419...700000010000'] == ['010806226e4...700000010000'] 2022-05-23T01:32:53.4710678Z E At index 0 diff: '0100194b4190af98beedf231901aeda2dcee0e757dd5327219aa007ea6027e769c98665f49e0f99e03b91e80896ed01cd9c4b157c77e908852097d2ba753c6ca01ae41a90a81c9f492c770d9dd35010c702c7c564045cf85a4bf6efaf25cceb9b81b56d1526429d3c60bf1c7ed6a835720730fd7ad1851a937d18aecf8bc3400dd4e0a30d5c4d65999ac39ae6eb3b98d27e3157973598256d2dd4718cd39010a3612580e7f3bba9eda0fb595976f955e345b6affa62fd556f7257a30d54dd87743dc38585e7c31e1ca80b42cf803bd4c1aaa3127351c95de8a4198c46eb0bd7b4ce175a4b51fcaf9676b01806d237758494fcf3d1558279060a0c3b84806a3330c41000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f0000700000010000035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de19903c47b1b2afcfd68c7b86c0976adb9a4f0835bc78242fced745d78433497a867d6021c29746e4136ddff456483df3980c2d0d5f31c93ef5ded564f7294a10d7414aa' != '010806226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000000000000ffff01001900000068000001000100006900000100000000700000010000' ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	aefcea44a3	pytest: fix flake in test_gossip_timestamp_filter We didn't get the second channel_announcement in the filter: ``` # Now choose range which will only give second update. msgs = l4.query_gossip('gossip_timestamp_filter', genesis_blockhash, after_12 - backdate, after_23 - after_12 + 1, filters=['0109', '0012']) # 0x0100 = channel_announcement # 0x0102 = channel_update # (Node announcement may have any timestamp) types = Counter([m[0:4] for m in msgs]) > assert types['0100'] == 1 E assert 0 == 1 ``` Looking at the gossip store: ``` 1089: t=1653008725 channel_announcement: 0100420c13f233d9b461b035257b77704e0003a0280efdbb8f8964bf4c5bd22b8f8b2e205a3c57d7d5f4dcd812cd41be3d8f84c60046c1fd31abefd0985c2263f8f5074bb7dff548ac106dc10ef4485f5921d9b252d094f9518b44a842c1a2c7f98759fa2f8803febeb59e50b65a8f258877e18f6224262e218de38f21438b2112ef0425d64174092bcd8d757f18982c3fb83c8ae2fb5ee88363248e51bdea247cef055a01a17580b60735b82b88fea87dccb724a743beaf4bf3cf578b0da445afcc786fe3b061429f3288e6970819236e97134898ce900b7b61159148d6785e26932e21f1c337cb6f86a16e967e27396b168a19b0d1478c4be2579a9b96aa0125aa000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e0000010000022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d02d595ae92b3544c3250fb772f214ad8d4c51425033740a5bcc357190add6d7e7a02d6063d022691b2490ab454dee73a57c6ff5d308352b461ece69f3c284f2c2412 1533: channel_amount: 100000sat 1555: t=1653008725 channel_update: 01022b98e7d680ffa8096d0a2bda62c0b8b3fc975a5f91c0cff0a42a76b37285276c0156581f85c0be2ee0ed7e7003cb67f5d476a7e3714960836b882bb3ad78097406226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e955010000060000000000000000000000010000000a0000000005e69ec0 1705: t=1653008729 node_announcement: 010175f8ed9f48db511d43f1da96aa307ccc53a521dcc2f8fb732c6938650e6cc00f672e772134a29e16b723a1e9cf626cf942323bd3fa294befd25430a9395c1fa40007800000080269a26286e959035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d035d2b484f5050494e47464952452d34646636366661000000000000000000000000000000 1866: t=1653008720 channel_update: 010266385083f1d6e554e1701f2456c24212c5aa937f7b090d32d243678b2fe78f7f52ed3626d6cdd52703bf78013bb9d1a621ae5b7d68fc965d29bb94574e340b5606226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e950010100060000000000000000000000010000000a00000 ``` It's not clear what we times we were querying, but let's make it clearer by putting a full 10 seconds between them, and making our queries more exact. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	575b94c1ef	pytest: Remove all trace of python's "flaky" module. Over time, it has cost us more developer cycles than it has gained. It has hidden intermittant bugs, and allowed cruft to accumulate: when we eventually tried to figure out what was going wrong, the actual change which caused it was now stale and forgotten. This was a particular bane during the connectd rewrite, and I worked through some issues which had occurred before, but were not more likely. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
Rusty Russell	56dde2cb77	lightningd: multiple log-file options allow more than one log output. I've wanted this for a while: the ability to log to multiple places at once. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: lightningd: `log-file` option specified multiple times opens multiple log files.	2022-06-27 17:21:35 +09:30
Christian Decker	92f10f2c34	pyln: Fix relative path dependencies when publishing to PyPI So this was quite a journey: - We want relative depdendencies (using the `path` argument) whenever developing locally. Otherwise we would have to install each dependency every time we change a single character, which undoubtedly would cause us to waste time trying to debug an issue just because we forgot to install. - When publishing however we want to rely on the version number, since the repo context gets lost upon publishing, and path dependencies cause failures. The solution then it seems is to use `dev-dependencies` (not that surprising once you find it) with relative paths, so that `poetry install` uses these over the normal dependencies (no idea how they dedup them) and use `dependencies` when publishing. The paths are still in there when publishing, but `pip install` ignores them. I checked that `poetry install` from an unrelated project doesn't accidentally use the path dependencies, even when adding them as dev-dependencies. This should hopefully also allow installing them as a repo link, though I can't test that right now.	2022-06-26 13:54:01 +09:30
Rusty Russell	a196c77fe6	common: downgrade LND 'internal error' properly. Thanks to @zerofeerouting for another report. "desc" here is the sanitized message, eg: "ERROR error channel 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef: internal error" Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-26 13:52:14 +09:30
Rusty Russell	e48c0dda85	common: downgrade "internal error" errors from lnd. Prior to 0.11.0 we had cases where we would treat errors as warnings: regretfully, this is still needed. This message in particular has been widely reported, and it now causes channel force closes. Downgrade and log. I did insert some snarky log message earlier, but hey, I'm sure CLN has done worse things to our peers! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: Protocol: treat LND "internal error" as warnings, not force close events (as we did in v0.10).	2022-06-26 13:52:14 +09:30
Christian Decker	5444a843b6	git: Ignore some more generated files and the cln-grpc plugin	2022-06-24 16:27:13 +02:00
Vincenzo Palazzo	d4bc4f6460	signmessage: improve the UX of the rpc command when zbase is not a valid one Changelog-Fixed: signmessage: improve the UX of the rpc command when zbase is not a valid one Stacktrace generated with a bad `zbase` ``` lightningd: lightningd/signmessage.c:59: from_zbase32: Assertion `len == tal_bytelen(u8arr)' failed lightningd: FATAL SIGNAL 6 (version v0.11.1) 0x55b9b1b4e617 send_backtrace [...] ``` Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-24 14:02:04 +09:30
Dustin Dettmer	49c6459148	Update hsm error formatting. Changelog-None	2022-06-24 13:57:17 +09:30

... 15 16 17 18 19 ...

12177 Commits