After analyzing various weird cases where we ended up with duplicate
gossip_store entries, they could all be explained by us not fully
processing the gossip store.
It's not clear that my assumption that we would always see our own writes
is true: technically this may require an fsync(). So we now add a check,
and if it fails, do an fsync() and try again.
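A minimal sketch of the read-back check, using a hypothetical helper and plain
POSIX calls (the real logic lives in gossipd's store-writing code):
```
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the actual gossipd function: append a record at
 * `off`, then check we can read back what we just wrote; if not, fsync()
 * and check once more before giving up. */
static bool store_write_verified(int fd, off_t off, const void *rec, size_t len)
{
	char *buf;
	bool ok;

	if (pwrite(fd, rec, len, off) != (ssize_t)len)
		return false;

	buf = malloc(len);
	if (!buf)
		return false;
	ok = pread(fd, buf, len, off) == (ssize_t)len
		&& memcmp(buf, rec, len) == 0;
	if (!ok) {
		/* Maybe our own write isn't visible yet: flush and retry. */
		fsync(fd);
		ok = pread(fd, buf, len, off) == (ssize_t)len
			&& memcmp(buf, rec, len) == 0;
	}
	free(buf);
	return ok;
}
```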
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
Do the checksum check in place, instead of making a copy.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
No checksum check: 194.52s
Copying for checksum check: 202.81s
Zero-copy checksum check: 194.40s
But these numbers proved noisy. Still, doesn't hurt.
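In case it helps, a sketch of the zero-copy shape (assuming a crc32c-style
checksum; exactly what the checksum covers, and its seed, is whatever
gossip_store defines):
```
#include <ccan/crc32c/crc32c.h>
#include <ccan/short_types/short_types.h>
#include <stdbool.h>
#include <stddef.h>

/* Run the CRC directly over the mmap'd bytes rather than memcpy'ing the
 * record into a scratch buffer first. */
static bool csum_ok_in_place(const u8 *mapped, size_t off, size_t msglen,
			     u32 stored_csum)
{
	return crc32c(0, mapped + off, msglen) == stored_csum;
}
```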
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume if it's incorrect, we simply need to wait. If this proves incorrect,
we will see a stream of BROKEN log messages.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
Before: 194.52s
After: 202.81s
So it's marginal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While this shouldn't happen, it does (pending other fixes), and we stop reading the
gossip store until next time. The result is partial gossip, demonstrated beautifully
by NicolasDorier's report:
```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!
```
Gossipd stalls there and doesn't make more progress. So gossipd itself
doesn't see the entire gossip_store.
Then things get really batshit:
```
2025-02-04T05:53:28.582Z DEBUG gossipd: Store compact time: 1429910 msec
```
This took 1429 seconds to process. Why?
Because it hadn't been processing the gossip store fully, gossipd kept appending "new" records to the end:
```
2025-02-04T05:53:28.583Z DEBUG gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```
It has 31GB of gossip in there! No wonder it took so long...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
The default goes to stderr for LOG_UNUSUAL and higher.
We do have to whitelist more cases in map_catchup, though, so we don't spam
the logs with perfectly-expected (but ignored) messages.
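For illustration, a hypothetical shape of that default (the real callback
signature is whatever common/gossmap.h declares):
```
#include <stdio.h>
#include <common/status_levels.h>

/* Hypothetical default: LOG_UNUSUAL and above go to stderr, anything
 * quieter is dropped. */
static void default_logcb(void *unused, enum log_level level, const char *msg)
{
	if (level >= LOG_UNUSUAL)
		fprintf(stderr, "gossmap: %s\n", msg);
}
```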
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our ZFS problem
anyway.
In fact, it fixes a race if we don't have HAVE_PWRITEV.
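For context, a sketch of the hazard (assuming the non-pwritev fallback is
lseek() followed by write(), as the HAVE_PWRITEV guard suggests):
```
#include <unistd.h>

/* The seek and the write are two separate calls on the shared fd, so
 * anyone else using that fd can move the file offset in between them.
 * Separate fds (or a real pwritev()) make the position independent. */
static ssize_t emulated_pwrite(int fd, const void *buf, size_t len, off_t off)
{
	if (lseek(fd, off, SEEK_SET) != off)
		return -1;
	return write(fd, buf, len);
}
```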
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a report of this happening under ZFS. We cannot do much if
this really is a problem where we can't read back what we write, but
this avoids the immediate crash.
Fixes: https://github.com/ElementsProject/lightning/issues/7971
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.
This is a big patch, but mainly mechanical.
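For illustration (treat the exact macro spellings as an assumption about the
updated CCAN API):
```
#include <ccan/htable/htable_type.h>

/* Where we previously wrote
 *	HTABLE_DEFINE_TYPE(struct thing, thing_key, thing_hash, thing_eq, thing_map);
 * we now have to state the duplicate policy explicitly: */
HTABLE_DEFINE_NODUPS_TYPE(struct thing, thing_key, thing_hash, thing_eq, thing_map);
```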
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, this lets you find the exact htlc_maximum_msat/htlc_minimum_msat
values.
This means we actually create real channel_updates for local mods, which
requires a second "local" scratch region.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we don't compact the gossmap on the fly (FIXME!) we can
easily surpass 4GB in the gossmap, and 32 bit offsets are not
sufficient.
I'm a bit surprised we don't crash immediately, but we've definitely
seen issues.
Changelog-Fixed: gossipd: crash errors with large gossip_store (>4GB) growth on longer-running nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It was weird not to have a capacity associated with localmods channels, and
fixing it has some very nice side effects.
Now the gossmap_chan_get_capacity() call never fails (we prevented reading
of channels from gossmap in the partially-written case already), so we
make it return the capacity. We do this in msat, because that's what
all the callers want.
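A sketch of a caller after the change (assuming the updated function simply
returns a struct amount_msat):
```
#include <common/amount.h>
#include <common/gossmap.h>

/* No failure path and no out-parameter: the capacity is just a value. */
static bool chan_large_enough(const struct gossmap *map,
			      const struct gossmap_chan *chan,
			      struct amount_msat amount_to_send)
{
	struct amount_msat cap = gossmap_chan_get_capacity(map, chan);
	return amount_msat_greater_eq(cap, amount_to_send);
}
```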
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what we want in several places: to only override one or
two fields in a channel_update.
We add a gossmap_local_setchan() with a similar API to the old
gossmap_local_updatechan(), for the case where we want to set every
field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We allow adding them, but crash when we remove the localmods. Yet
this could theoretically happen anyway, if a channel we modified was
removed from the gossmap.
Reported-by: Lagrang3 <lagrang3@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies the callers significantly: all channel_announcements now
have an amount, so gossmap_chan_get_capacity() only fails on a local
modification.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we needed to iterate forward to find a timestamp (which only happens if we
have gossip older than 2 hours), we never exited the loop, as the iteration
didn't actually move the offset.
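Schematically (all names hypothetical, not the real gossmap.c code):
```
/* The scan over too-old records never advanced the offset it was testing,
 * so the loop could not terminate; the fix is to actually move forward and
 * stop once we stop making progress. */
while (timestamp_too_old(map, off)) {
	size_t next = next_record_off(map, off);
	if (next == off || next == 0)
		break;
	off = next;
}
```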
Fixes: https://github.com/ElementsProject/lightning/issues/7462
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to be happening to some people, so don't panic. Unfortunately we
don't have a good error callback here, so we just print a message to stderr.
Fixes: https://github.com/ElementsProject/lightning/issues/7249
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only write these in two places: one where we get a message from lightningd about
our own channel, and one where we get a reply from lightningd about a txout check.
In the former case we explicitly check that we don't already have it in
gossmap, so add the same check to the latter case, and give verbose detail
if it is found.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a u64; we should pass it by copy. This is a big sweeping change,
but mainly mechanical (change one, compile, fix breakage, repeat).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Wrote a test program which passed num_channel_updates_rejected as NULL
(which we don't usually do), and valgrind complained:
```
==1048302== Conditional jump or move depends on uninitialised value(s)
==1048302== at 0x118B90: update_channel (gossmap.c:550)
==1048302== by 0x119EEE: map_catchup (gossmap.c:663)
==1048302== by 0x11A299: load_gossip_store (gossmap.c:726)
==1048302== by 0x11A352: gossmap_load (gossmap.c:1052)
==1048302== by 0x125362: main (run-route-infloop.c:90)
```
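A hypothetical illustration of the fix shape (not the exact gossmap.c change):
```
/* Keep the running count in a value that is always initialised, and only
 * write through the caller's pointer if one was actually supplied. */
static void report_rejects(size_t rejected_so_far,
			   size_t *num_channel_updates_rejected)
{
	if (num_channel_updates_rejected)
		*num_channel_updates_rejected = rejected_so_far;
}
```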
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Thanks to amazing debugging assistance from grubles, we figured out
that indeed, my memory was correct: write and mmap are not consistent
on all platforms. The easiest fix is to disable mmap on OpenBSD for now:
the better fix is to do in-place updates using the mmap, and only rely
on write() for append (which always causes a remap anyway before it's accessed).
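A sketch of the "disable mmap" approach (field names approximate, and assuming
gossmap already falls back to pread() when there is no mapping):
```
#ifdef __OpenBSD__
	/* Never map the file: every read goes through the pread() fallback,
	 * which stays consistent with what write() appended. */
	map->mmap = NULL;
#else
	map->mmap = mmap(NULL, map->map_size, PROT_READ, MAP_SHARED, map->fd, 0);
	if (map->mmap == MAP_FAILED)
		map->mmap = NULL;
#endif
```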
Fixes: https://github.com/ElementsProject/lightning/issues/7109
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never enabled it, because we seemed to be eliminating valid
channels. We discard zombie-marked records on loading.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, allow callers to see unknown records we ignore (and let
them fail as a result), and get called if we can't pack a
channel_update into our internal format.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only way you'll see private channel_updates is if you put them
there yourself with localmods.
I also renamed the confusing gossmap_chan_capacity to gossmap_chan_has_capacity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Doesn't happen on x86, but struct gossmap_chan defines:
```
u32 private: 1;
u32 plus_scid_off: 31;
```
And valgrind complains when we initialize plus_scid_off and access it later:
```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all plugins/renepay/test/run-mcf > /dev/null
==186886== Conditional jump or move depends on uninitialised value(s)
==186886== at 0x10076388: chan_iter (gossmap.c:1098)
==186886== by 0x100797F3: gossmap_next_chan (gossmap.c:1112)
==186886== by 0x1008C5AF: main (run-mcf.c:309)
==186886== Uninitialised value was created by a heap allocation
==186886== at 0x40F0A44: malloc (vg_replace_malloc.c:431)
==186886== by 0x10072BAF: allocate (tal.c:256)
==186886== by 0x100737A7: tal_alloc_ (tal.c:463)
==186886== by 0x100738DF: tal_alloc_arr_ (tal.c:506)
==186886== by 0x10079507: load_gossip_store (gossmap.c:690)
==186886== by 0x10079667: gossmap_load (gossmap.c:978)
==186886== by 0x1008C4AF: main (run-mcf.c:295)
```
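A rough illustration of why valgrind minds, and the hypothetical fix shape:
```
/* Storing into the 31-bit field is a read-modify-write of the whole
 * underlying word, so the neighbouring `private` bit stays undefined on a
 * fresh allocation.  Defining every bit before the word is read keeps
 * valgrind quiet: */
chan->private = false;		/* previously left uninitialised */
chan->plus_scid_off = scid_off;
```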
Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #6557
This will fix a crash that I caused on armv7.
By looking inside the coredump with gdb (after adding an assert that n must
not be null), I get the following stacktrace:
```
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x0043a038 in send_backtrace (why=0xbe9e3600 "FATAL SIGNAL 11") at common/daemon.c:36
#2 0x0043a0ec in crashdump (sig=11) at common/daemon.c:46
#3 <signal handler called>
#4 0x00406d04 in node_announcement (map=0x938ecc, nann_off=495146) at common/gossmap.c:586
#5 0x00406fec in map_catchup (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:643
#6 0x004073a4 in load_gossip_store (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:697
#7 0x00408244 in gossmap_load (ctx=0x0, filename=0x4e16b8 "gossip_store", num_channel_updates_rejected=0xbe9e3a40) at common/gossmap.c:976
#8 0x0041a548 in init (p=0x93831c, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., config=0x939cdc) at plugins/topology.c:622
#9 0x0041e5d0 in handle_init (cmd=0x938934, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., params=0x939c8c)
at plugins/libplugin.c:1208
#10 0x0041fc04 in ld_command_handle (plugin=0x93831c, toks=0x939bec) at plugins/libplugin.c:1572
#11 0x00420050 in ld_read_json_one (plugin=0x93831c) at plugins/libplugin.c:1667
#12 0x004201bc in ld_read_json (conn=0x9391c4, plugin=0x93831c) at plugins/libplugin.c:1687
#13 0x004cb82c in next_plan (conn=0x9391c4, plan=0x9391d8) at ccan/ccan/io/io.c:59
#14 0x004cc67c in do_plan (conn=0x9391c4, plan=0x9391d8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
#15 0x004cc6dc in io_ready (conn=0x9391c4, pollflags=1) at ccan/ccan/io/io.c:417
#16 0x004cf8cc in io_loop (timers=0x9383c4, expired=0xbe9e3ce4) at ccan/ccan/io/poll.c:453
#17 0x00420af4 in plugin_main (argv=0xbe9e3eb4, init=0x41a46c <init>, restartability=PLUGIN_STATIC, init_rpc=true, features=0x0, commands=0x6167e8 <commands>, num_commands=4, notif_subs=0x0, num_notif_subs=0, hook_subs=0x0, num_hook_subs=0, notif_topics=0x0, num_notif_topics=0) at plugins/libplugin.c:1891
#18 0x0041a6f8 in main (argc=1, argv=0xbe9e3eb4) at plugins/topology.c:679
```
I do not know if this is the right solution, because I do not know
when we can parse a node announcement for a node that
is no longer in the gossip map.
So, I hope this is at least useful for @rustyrussell.
Changelog-Fixed: fixes `FATAL SIGNAL 11` on gossmap node announcement parsing.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
It's actually two separate u16 fields, so treat it as
such!
Cleans up zombie handling code a bit too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Though BOLT 7 says a channel may be pruned when one side becomes inactive
and fails to refresh their channel_update, in practice, the
channel_announcement can be difficult to recover if deleted entirely.
Here the channel_announcement is tagged as zombie such that gossip_store
consumers may safely ignore it, but it may be retained should the channel
come back online in the future. Node_announcements and channel_updates may
also be retained in such a fashion until the channel is ready to be
resurrected.
Changelog-Fixed: Pruned channels are more reliably restored.
This is needed for offers to generate blinded paths.
No documentation changes since listincoming is an undocumented
internal hack interface which topology presents for production
of routehints.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We will now simply reject old-style ones as invalid. Turns out the
only trace we could find is a channel between two nodes unconnected to
the rest of the network.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: We now require all channel_update messages include htlc_maximum_msat (as per latest BOLTs)
Many changes to gossmap (including the pending ones!) don't actually
concern readers, as long as they obey certain rules:
1. Ignore unknown messages.
2. Treat all 16 upper bits of length as flags, ignore unknown ones.
So now we split the version byte into MAJOR and MINOR, and you can
ignore MINOR changes.
We don't expose the internal version (for creating the map)
programmatically: you should really hardcode what major version you
understand!
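A sketch of what a reader does (the real constants and bit split live in
gossip_store.h; assume here the top bits are MAJOR and the low bits MINOR):
```
#define STORE_MAJOR(verbyte)	((verbyte) >> 5)
#define STORE_MINOR(verbyte)	((verbyte) & 0x1F)

/* Refuse only on a MAJOR we don't know; an unknown MINOR is fine, because
 * rules 1 and 2 above make those changes safe to ignore. */
if (STORE_MAJOR(version) != STORE_MAJOR(version_we_understand))
	return false;
```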
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They are surprisingly expensive!
Running `time ./plugins/renepay/test/run-not_mcf-gossmap gossip_store-sgl.rustcorp.com.au-2022-04-19 024b9a1fa8e006f1e3937f65f66c408e6da8e1ca728ea43222a7381df1cc449605 02ebb3b8a2316b3e876ea3f3d8124a3ab97f30b128f619608eb06b5251235dc2d9 10000000000 0.1`:
Before (-Og):
real 0m1.495s
Before (no opt):
real 0m2.552s
After (-Og):
real 0m0.579s
After (no opt):
real 0m1.061s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>