mirrors/core-lightning

mirror of https://github.com/ElementsProject/lightning.git synced 2025-02-24 23:18:25 +01:00

Author	SHA1	Message	Date
Rusty Russell	5192eebef9	lightningd: wire channel closing tx through channel_fail_permanent. Cleans up the API: we have two functions now, one which is explicitly for "I'm failing this because I saw this tx onchain". Now we can correctly report the tx which closed the channel (previously we would always report our own tx(s)!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: JSON-RPC: `close` now correctly reports the txid of the remote onchain unilateral tx if it races with a peer close. Changelog-Fixed: Protocol: we no longer try to spend anchors if a commitment tx is already mined (reported by @niftynei). Fixes: #7526	2024-11-25 20:23:21 +10:30
Rusty Russell	bfb94fe0c3	lightningd: make channel_set_state string arg const. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 20:23:21 +10:30
Rusty Russell	656ac34756	lightningd: make close_txs parameter to resolve_close_command const. We don't need to change these txs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 20:23:21 +10:30
niftynei	193b4425ab	nit: spelling fix	2024-11-25 20:23:21 +10:30
Rusty Russell	5701123209	pytest: fix flake in test_gossip_force_broadcast_channel_msgs Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	7cdf45bb00	pytest: fix flake in test_ping_timeout The seeker can send a full gossip query, which means the ping doesn't happen (it needs 14-45 seconds of quiet!). We disable the gossip_queries feature, so it doesn't ask. ``` def test_ping_timeout(node_factory): # Disconnects after this, but doesn't know it. l1_disconnects = ['xWIRE_PING'] l1, l2 = node_factory.get_nodes(2, opts=[{'dev-no-reconnect': None, 'disconnect': l1_disconnects}, {'dev-no-ping-timer': None}]) l1.rpc.connect(l2.info['id'], 'localhost', l2.port) # This can take 10 seconds (dev-fast-gossip means timer fires every 5 seconds) l1.daemon.wait_for_log('seeker: startup peer finished', timeout=15) # Ping timers runs at 15-45 seconds, but only fires if also 60 seconds # after previous traffic. > l1.daemon.wait_for_log('dev_disconnect: xWIRE_PING', timeout=60 + 45 + 5) tests/test_connection.py:4194: ... > raise TimeoutError('Unable to find "{}" in logs.'.format(exs)) E TimeoutError: Unable to find "[re.compile('dev_disconnect: xWIRE_PING')]" in logs. ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	faf7ae6ad4	pytest: add test for connection ratelimiting. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	3d294f813d	connectd: limit to 10 connections at once. We wait until a connection fails, or a subd is connected to the peer, before letting another one through. This should prevent us from overwhelming lightningd on large nodes, but unlike the previous back-off, it's based on how fast lightningd is, not an arbitrary time. We also let one through each second, in case we're connecting to many, but not doing anything but gossip (e.g. 100 explicit connect commands). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: Reconnecting to peers at startup should be significantly faster (dependent on machine speed).	2024-11-25 15:39:13 +10:30
Rusty Russell	3587afeaa2	connectd: remove transient flag. The important flag replaces it, and now we can be more intelligent about eviction in overload. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	73b9812178	pytest: restore test_sendpay_grouping test. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	15950bb7d4	connectd: reconnect for non-transient connections. Rather than have lightningd call us repeatedly to try to connect, have it tell us what peers are transient and aren't, and connectd will automatically try to maintain that connection. There's a new "downgrade_peer" message to tell it a peer is now transient: to make it non-transient we simply tell connectd to connect as a non-transient. The first time, I missed that dual_open_control does its own state transitions :( Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: `connectd` now handles maintaining/reconnecting to important peers, and we remember the last successful address we connected to.	2024-11-25 15:39:13 +10:30
Rusty Russell	ff290b19c9	recovery: save last_known_addr for peer if we know it. This is more useful than the last address, which may be it connecting to us. And use it when we restore it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	22a481fbaa	common: routine to make wireaddr_internal from wireaddr. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	68feb55dbf	wallet: save last known address. If we connected out, remember that address. We always remember the last address, but that may be an incoming address. This is explicitly the last outgoing address which worked. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	64af5db45c	lightningd: generalize peer_any_channel to filter on entire channel, not just state. We're going to use this to ask if there are any channels which make it important to reconnect to the peer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	4ee59e7a49	connectd: expose --dev-no-reconnect and --dev-fast-reconnect options. Once connectd is controlling reconnections, it'll need these. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	c6fce50951	gossipd: don't tell connectd what address to connect to. In fact, only 951 of 17419 (5%) of node announcements are missing an address (and gossipd doesn't know if we can connect to Tor addresses anyway) so just check it has a node_announcement. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	23dc10cf81	connectd: get our own addresses to contact node from node_announcements. Let lightningd feed us hints to try first, but we can extract the addresses from node_announcement messages ourselves. (Lightningd used to ask gossipd on our behalf: this is far simpler!) One side effect of this is that we don't hand back address hints given to us by lightningd: it would use these again for reconnecting. This breaks test_sendpay_grouping, so we disable it temporarily. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	5b92383b02	connectd: send self-advertizing gossip rather than having gossipd do it. It's now trivial for us to do this ourselves, since we have gossmap. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Alex Myers	11580dfd43	pyln-testing: disable seeker autoconnect by default This avoids test flakes, but can be explicitly set if needed. Changelog-None	2024-11-24 12:03:16 +10:30
Christian Decker	c596550de1	common: Make trace debugging output configurable at compile time Just added a couple of compile-time guards and sprinkled the invariant checking in a couple of places (disabled if compile time guard is unset).	2024-11-24 10:24:31 +10:30
Christian Decker	557142627f	common: Fix a potential cycle in the trace structure It turns out that under some circumstances we end up clearing the pointee of `current` but not the pointer. Thus when we select the next slot we can end up reusing the same slot, making it its own parent. We forcefully break these cycles by enforcing that `current` should never be returned and be set as its own parent. Changelog-None	2024-11-24 10:24:31 +10:30
Christian Decker	a6c81d4174	common: Add a tree checker for trace spans Trace spans form a tree, but we don't actually check that the structure doesn't break. Breakage can for example come if we use the same key accidentally, making a new span its own ancestor.	2024-11-24 10:24:31 +10:30
Christian Decker	2e59ab8f15	common: Remove the recursive parent resolution in traces We have the space in memory set aside anyway, so let's just copy the `trace_id` into the span itself, rather than resolving the `root` at time of emission.	2024-11-24 10:24:31 +10:30
Christian Decker	57b9648d30	common: Resume the startup trace after exiting `io_loop` This was a bit harder to identify: during an `io_loop` run we suspend the current span before handing over to `io_loop`, and later when a callback is called we resume the span again. Depending on how we return from the `io_loop` instance that is used to drive the startup, we either have resumed the last span, or we don't. Since we start a span before `io_loop` and want it to be emitted afterwards, we need to take care of the case where we returned from a callback that did not resume, and therefore the current context is empty. Making `trace_span_resume` idempotent means we can just resume it manually. Ideally we'd push the suspend / resume logic down into `io_loop` itself, and then we'd have just one place. Maybe suspend and resume callbacks that can be configured in `io_loop`?	2024-11-24 10:24:31 +10:30
Christian Decker	1900dd53bf	db: Fix a broken span context pointer	2024-11-24 10:24:31 +10:30
Christian Decker	4f3ea8c048	common: Add some debugging capabilities to the trace subsystem After adding the DB query instrumentation we ran into a couple of issues, with spans not being resumed correctly, and it was rather hard to identify the problem. This adds debug statements so we can trace the tracing (traception if you will). Changelog-None	2024-11-24 10:24:31 +10:30
Matt Whitlock	7a2006842f	lightningd/test/Makefile: add missing dependency on header_versions_gen.h lightningd/test/run-find_my_abspath.c includes ../lightningd.c, which includes header_versions_gen.h, a generated header file. lightningd/Makefile correctly declares that lightningd/lightningd.o depends on header_versions_gen.h, but lightningd/test/Makefile lacks any such declaration regarding lightningd/test/run-find_my_abspath.c, which leads to build failure: In file included from lightningd/test/run-find_my_abspath.c:5: lightningd/test/../lightningd.c:64:10: fatal error: header_versions_gen.h: No such file or directory 64 | #include <header_versions_gen.h> | ^~~~~~~~~~~~~~~~~~~~~~~ Declare the missing dependency in lightningd/test/Makefile so that Make will ensure that header_versions_gen.h is generated before it attempts to build lightningd/test/run-find_my_abspath.o. Changelog-None	2024-11-23 13:03:00 +01:00
daywalker90	54e7ac6872	startup_regtest: remove experimental-offers flag Changelog-None	2024-11-23 10:48:32 +10:30
Michael Cho	94c5695d6f	Makefile: fix defines on ARM macOS Due to Darwin-arm64 conditional setting of `CPPFLAGS`, the subsequent `CPPFLAGS +=` is resolved earlier on ARM macOS which results in empty paths being used. Changelog-None	2024-11-23 10:47:32 +10:30
Rusty Russell	dba9746d21	pytest: fix flake in test_gossip_pruning. If the first one doesn't use the entire timeout, the second might need longer (I used TIMEOUT=10 normally): ``` FAILED tests/test_gossip.py::test_gossip_pruning - TimeoutError: Unable to find "[re.compile('Pruning channel 103x1x0 from network view')]" in logs. ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-23 10:20:30 +10:30
Rusty Russell	90ab9325a1	xpay: give an additional block "slack" for CLTV values. pay does this, xpay does not. Which means if a block comes in (or you're behind), you get gratuitous failures: ``` def test_xpay_simple(node_factory): l1, l2, l3, l4 = node_factory.get_nodes(4, opts={'may_reconnect': True}) node_factory.join_nodes([l1, l2, l3], wait_for_announce=True) node_factory.join_nodes([l3, l4], announce_channels=False) # BOLT 11, direct peer b11 = l2.rpc.invoice('10000msat', 'test_xpay_simple', 'test_xpay_simple bolt11')['bolt11'] > ret = l1.rpc.xpay(b11) tests/test_xpay.py:148: ... if not isinstance(resp, dict): raise TypeError("Malformed response, response is not a dictionary %s." % resp) elif "error" in resp: > raise RpcError(method, payload, resp['error']) E pyln.client.lightning.RpcError: RPC call failed: method: xpay, payload: ('lnbcrt100n1pn5qu7csp53rp0mfwtfsyyy8gzsggepnxgslyalwvz3jkg9ptmqq452ln2nmgqpp58ak9nmfz9l93r0fpm266ewyjrhurhatrs05nda0r03p82cykp0vsdp9w3jhxazl0pcxz72lwd5k6urvv5sxymmvwscnzxqyjw5qcqp99qxpqysgqa798258yppu2tlfj8herr3zuz0zgux79zvtx6z57cmfzs2wdesmr4nvnkcmyssyu6k64ud54eg0v45c3mcw342jj6uy7tu202p6klrcp6ljc9w',), error: {'code': 203, 'message': "Destination said it doesn't know invoice: incorrect_or_unknown_payment_details"} ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-None: xpay is new this release.	2024-11-23 10:20:30 +10:30
Rusty Russell	4186591a70	pyln-client: restore backwards compatibility with CLN prior to 24.08 24.05 and before requires a "description" field. We should not have removed it here until that was EOL! Changelog-Fixed: pyln-client: plugins now compatible with CLN <= 24.05 (broken in 24.08) Reported-by: Christian Decker Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 09:12:27 +01:00
Rusty Russell	d5c0d21db8	gossipd: hand gossmap to gossmap_manage_get_node_addresses, not gossmap_manage. We don't want to refresh the gossmap internally: this could invalidate the gossmap held by the current callers. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 15:21:45 +10:30
Rusty Russell	69c252e06f	gossmap: implement gossmap_random_node(), use it in gossipd. It's easy for gossmap, since it has access to the htable. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 15:21:45 +10:30
Alex Myers	363b721cd3	gossipd: use autoconnect-seeker-peers setting	2024-11-22 15:21:45 +10:30
Alex Myers	dc878dc937	lightningd: add option for minimum seeker autoconnect peers Changelog-added: Added option --autoconnect-seeker-peers, allowing seeker to reach out to new nodes for additional gossip.	2024-11-22 15:21:45 +10:30
Alex Myers	f2243e6013	pytest: Add seeker autoconnect test	2024-11-22 15:21:45 +10:30
Alex Myers	dff5c893e7	gossipd: seeker: select random peer and tell lightningd This does not validate a node announcement and address, but it does select a node at random from the gossmap and asks lightningd to attempt a connection to it.	2024-11-22 15:21:45 +10:30
Alex Myers	7fc214a67f	gossipd: add request to connect to new gossip peer Gossipd uses this to ask lightningd -> connectd to initiate a connection to a new gossip peer. This can be used when there are insufficient peers already connected to gossip with. Changelog-Changed: Gossipd can now request connections to additional nodes for improved gossip sync	2024-11-22 15:21:45 +10:30
Alex Myers	9cba417ed0	gossipd: seeker: allocate gossiper array at init This will let us change the default gossipers at runtime	2024-11-22 15:21:45 +10:30
Rusty Russell	9295b4f77e	common/test: fix -O3 compile error with gcc-12 (Ubuntu 12.3.0-17ubuntu1) 12.3.0 ``` common/test/run-splice_script.c: In function ‘main’: common/test/run-splice_script.c:349:17: error: ‘%.*s’ directive argument is null [-Werror=format-overflow=] 349 | printf("%.*s\n", (int)len, str); | ^~~~ cc1: all warnings being treated as errors make: *** [Makefile:297: common/test/run-splice_script.o] Error 1 make: *** Waiting for unfinished jobs.... ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 15:21:45 +10:30
Rusty Russell	8566370087	pytest: fix flake in test_gossip_force_broadcast_channel_msgs We can get more gossip_filter messages now. And we can also go over max-messages, so increase that too. ``` del tally['query_short_channel_ids'] del tally['query_channel_range'] del tally['ping'] > assert tally == {'channel_announce': 1, 'channel_update': 3, 'node_announce': 1, 'gossip_filter': 1} E AssertionError: assert {'channel_ann..._announce': 1} == {'channel_ann..._announce': 1} E Omitting 2 identical items, use -vv to show E Differing items: E {'gossip_filter': 2} != {'gossip_filter': 1} E {'channel_update': 2} != {'channel_update': 3} E Full diff: E { E 'channel_announce': 1,... E E ...Full output truncated (10 lines hidden), use '-vv' to show tests/test_gossip.py:2326: AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 14:01:44 +10:30
Rusty Russell	a295099ace	pytest: fix flake in test_onchaind_replay. We actually mine 300 blocks, not 200, and if timing is right l1 can have mined the txid before mine_txid_or_rbf() checks the mempool: ``` def test_onchaind_replay(node_factory, bitcoind): disconnects = ['+WIRE_REVOKE_AND_ACK', 'permfail'] # Feerates identical so we don't get gratuitous commit to update them l1, l2 = node_factory.line_graph(2, opts=[{'watchtime-blocks': 201, 'cltv-delta': 101, 'disconnect': disconnects, 'feerates': (7500, 7500, 7500, 7500)}, {'watchtime-blocks': 201, 'cltv-delta': 101}], wait_for_announce=True) inv = l2.rpc.invoice(108, 'onchaind_replay', 'desc') rhash = inv['payment_hash'] routestep = { 'amount_msat': 108 - 1, 'id': l2.info['id'], 'delay': 101, 'channel': first_scid(l1, l2) } l1.rpc.sendpay([routestep], rhash, payment_secret=inv['payment_secret']) l1.daemon.wait_for_log('sendrawtx exit 0') bitcoind.generate_block(1, wait_for_mempool=1) # Wait for nodes to notice the failure, this search needle is after the # DB commit so we're sure the tx entries in onchaindtxs have been added l1.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent") l2.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent") # We should at least have the init tx now assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0 assert len(l2.db_query("SELECT * FROM channeltxs;")) > 0 # Generate some blocks so we restart the onchaind from DB (we rescan # last_height - 100) bitcoind.generate_block(100) sync_blockheight(bitcoind, [l1, l2]) # l1 should still have a running onchaind assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0 l2.rpc.stop() l1.restart() # Can't wait for it, it's after the "Server started" wait in restart() assert l1.daemon.is_in_log(r'Restarting onchaind \(ONCHAIN\): closed in block 109') # l1 should still notice that the funding was spent and that we should react to it _, txid, blocks = l1.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET', 'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US') assert blocks == 200 bitcoind.generate_block(200) # Could be RBF! > l1.mine_txid_or_rbf(txid) tests/test_closing.py:1864: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ contrib/pyln-testing/pyln/testing/utils.py:1375: in mine_txid_or_rbf wait_for(lambda: rbf_or_txid_broadcast(txids)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ success = <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550> timeout = 180 def wait_for(success, timeout=TIMEOUT): start_time = time.time() interval = 0.25 while not success(): time_left = start_time + timeout - time.time() if time_left <= 0: > raise ValueError("Timeout while waiting for {}".format(success)) E ValueError: Timeout while waiting for <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550> ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 14:01:44 +10:30
Rusty Russell	8132d19ab5	configure: make configuration with address sanitizer find zlib. The test program has a leak, so address sanitizer complains and makes it "fail" the zlib detection test! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-22 14:01:44 +10:30
Jesse de Wit	a90d9c9f4f	tests: add pay test over unannounced channels This test fails with cln v24.08.2. Add this test, so it doesn't happen again. Changelog-None	2024-11-21 11:22:26 +01:00
Rusty Russell	2c9023ee25	pytest: reenable askrene bias test. We can fix the median calc by removing the (unused) reverse edges. Also analyze the failure case in test_real_data: it's a real edge case, so hardcode that one as "ok". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-21 16:17:52 +10:30
Lagrang3	05514b46e3	Askrene: change median factor to 1. The ratio of the median of the fees and probability cost is overall not a bad factor to combine these two features. This is what the test_real_data shows. Changelog-None Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	2b3fd67dfb	askrene: don't skip fee_fallback test The fee_fallback test would fail after fixing the computation of the median. Now we can restore it by making the probability cost factor 1000x higher than the ratio of the median. This shows how hard it is to combine fee and probability costs and why the current approach is so fragile. Changelog-None Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	9fdcc26d1d	askrene: bugfix queue overflow Changelog-none Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
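
Note on 3d294f813d ("connectd: limit to 10 connections at once"): the policy described in that commit message can be modelled in a few lines. What follows is only a rough sketch under stated assumptions, not the connectd implementation (which is C). The class `ConnectThrottle`, its methods, and the choice to let the once-per-second release exceed the limit are all hypothetical, chosen for illustration; only the limit of 10, the "free a slot on failure or subd attach" rule, and the one-per-second release come from the commit message.

```python
from collections import deque


class ConnectThrottle:
    """Toy model of a connection limiter; every name here is hypothetical."""

    def __init__(self, max_in_flight=10):
        self.max_in_flight = max_in_flight
        self.in_flight = set()   # peers whose connection attempt is in progress
        self.waiting = deque()   # peers queued until a slot frees up
        self.started = []        # order in which attempts were let through

    def _start(self, peer):
        self.in_flight.add(peer)
        self.started.append(peer)

    def request(self, peer):
        """Ask to connect: start now if a slot is free, otherwise queue."""
        if len(self.in_flight) < self.max_in_flight:
            self._start(peer)
        else:
            self.waiting.append(peer)

    def finished(self, peer):
        """The attempt failed, or a subdaemon now owns the peer: free its slot."""
        self.in_flight.discard(peer)
        if self.waiting and len(self.in_flight) < self.max_in_flight:
            self._start(self.waiting.popleft())

    def tick(self):
        """Called once per second: let one waiting peer through regardless.

        Assumption: this release may push in_flight above the limit.
        """
        if self.waiting:
            self._start(self.waiting.popleft())


# Example: 100 explicit connect requests arrive at once.
throttle = ConnectThrottle()
for i in range(100):
    throttle.request(f"peer{i}")
after_burst = len(throttle.started)

throttle.finished("peer0")   # a slot frees up, so the next queued peer starts
throttle.tick()              # one more is released by the per-second timer
```

In this model the pace is set by how quickly `finished()` is called, which mirrors the commit's point that throughput depends on how fast lightningd is rather than on an arbitrary back-off time; `tick()` only guarantees progress when nothing is completing.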


15932 commits