This looked like a test flake, but was real:
```
l1.daemon.wait_for_log("closing soon due to the funding outpoint being spent")
# We won't gossip the dead channel any more (but we still propagate node_announcement). But connectd is not explicitly synced, so wait for "a bit".
time.sleep(1)
> assert len(get_gossip(l1)) == 2
E assert 4 == 2
```
We can see that two channel_updates come in *after* we mark it dying:
```
gossipd: channel 103x1x0 closing soon due to the funding outpoint being spent
gossipd: REPLY WIRE_GOSSIPD_NEW_BLOCKHEIGHT_REPLY with 0 fds
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/0 now DISABLED
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/1 now DISABLED
```
We should keep marking channel_updates the same way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't actually delete them for 12 blocks, but we can't avoid
propagating them. We don't mark node_announcements, which is a bit
weird, but avoids us tracking logic to un-dying them if a channel is
opened.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported in #6270, there was an attempt to delete gossip overrunning
the end of the gossip_store. This logs the gossip type that was attempted to be deleted and avoids an immediate crash (tombstones would be fine to
skip over at least.)
Changelog-None
The push bit was convenient for connectd to send our own gossip
to peers upon connecting by naively traversing the gossip_store
and sending anything flagged `push`. This function is now
performed by gossipd leaving no use for the push bit.
Changelog-Changed: `gossipd`: gossip_store PUSH bit is no longer set.
Loading the gossip_store would not create a pending node announcement
when the node already had a zombie channel. This would cause the node
announcement to attempt to be loaded, but fail because it had no
broadcastable channels. Accepting a pending node announcement as when
normally loading from the channel corrects this.
`node_has_public_channels` taking into account zombie channels enables
this behavior.
Separately, node_announcements were still being flagged as zombies
in the gossip store despite that feature being removed.
Changelog-None
Without inheriting zombie status, gossipd would allow regular channel updates
into the store until the pruning cycle hits (and the channel is properly
flagged) which is 3.5 days. Applying zombie status when reading channel
updates from the store prevents this.
Changelog-None
This simplifies things (we'll get node_announcement if they ever
rebroadcast), since we clearly have an issue with node_announcement for
zombie nodes.
Changes:
1. Remove now-unused gossip_store_mark_nannounce_zombie and resurrect_nannouncements.
2. Don't consider zombie channels to count when deciding whether to move node_announcement
(node_announcement must be preceded by at least one broadcastable channel_announcement).
3. Treat incoming node_announcement where we have all-zombie channels the same as if
we had no channels.
4. Remove node_announcement whenever we have no announcable channels (not just zombies).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's actually two separate u16 fields, so actually treat it as
such!
Cleans up zombie handling code a bit too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Though BOLT 7 says a channel may be pruned when one side becomes inactive
and fails to refresh their channel_update, in practice, the
channel_announcement can be difficult to recover if deleted entirely.
Here the channel_announcement is tagged as zombie such that gossip_store
consumers may safely ignore it, but it may be retained should the channel
come back online in the future. Node_announcements and channel_updates may
also be retained in such a fashion until the channel is ready to be
resurrected.
Changelog-Fixed: Pruned channels are more reliably restored.
This adds a new "chan_dying" message to the gossip_store, but since we
already changed the minor version in this PR, we don't bump it again.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now delay forgetting funding-spent channels for 12 blocks (as per latest BOLTs, to support splicing in future).
Many changes to gossmap (including the pending ones!) don't actually
concern readers, as long as they obey certain rules:
1. Ignore unknown messages.
2. Treat all 16 upper bits of length as flags, ignore unknown ones.
So now we split the version byte into MAJOR and MINOR, and you can
ignore MINOR changes.
We don't expose the internal version (for creating the map)
programmatically: you should really hardcode what major version you
understand!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If they really upgrade directly from 0.9.2, it will simply delete the
store and re-fetch it.
We still update from v9 (which could be v0.11), since it's a noop.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
routing.c fixed to properly remove rate-limited gossip_store entries
when channels are closed. This caused gossipd to crash on a subsequent
gossip_store_load. Also corrects an overzealous limit of one gossip_store
entry per message (should now allow one broadcastable and one
rate-limited). Addresses issues 5387, 5395.
Changelog-None
routing.c now flags rate-limited gossip as it enters the gossip_store but
makes use of it in updating the routing graph. Flagged gossip is not
rebroadcast to gossip peers.
Changelog-Changed: gossipd: now accepts spam gossip, but squelches it for
peers.
This will be used to decouple internal use of gossip from what is
passed to gossip peers. Updates GOSSIP_STORE_VERION to 10.
Changelog-Changed: gossip_store updated to version 10.
@whitslack complained of large CPU usage by connectd at startup;
I ran perf record on connectd on my machine (which sees a little spike, only)
and I see the cost of reading and discarding the entries:
```
- 95.52% 5.24% lightning_conne lightning_connectd [.] gossip_store_next
- 90.28% gossip_store_next
+ 40.27% tal_alloc_arr_
+ 22.78% tal_free
+ 11.74% crc32c
+ 9.32% fromwire_peektype
+ 4.10% __libc_pread64 (inlined)
1.70% be32_to_cpu
```
Much of this is caused by the search for our own gossip: keeping this separately
would be even better, but this fix is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: reduce initial CPU load when connecting to peers.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is better than using the previous "keep statting the file" approach,
since we can also tell you how long the replacement is, to avoid a
gratitous load.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They're not defined to be, though we've not seen this on Linux (testing
showed that it is page-level atomic, which means it can still happen across
page boundaries though!). This was pointed out by whitslack in
https://github.com/ElementsProject/lightning/issues/4288
In practice, this just means not complaining when it happens, and also
not trying to get tricky to use it on MacOS (we can safely seek & write,
since we're single-threaded).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Removed bogus UNUSUAL log about gossip_store 'short test'.
If gossip_store is an incorrect version, we will recreate it: with an incorrect
version! This means we never get persistent gossip, *and* the pay plugin will
fail to map the gossip_store. Everyone will be sad.
Debugged-by: Matt Whitlock
Fixes: #4376Fixes: #4288
Typing-by: Rusty Russell <rusty@rustcorp.com.au>
This overcomes the internal spam filter on updates, which can be useful
if we're actually trying to send through such a node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: always accept channel_updates from errors, even they'd otherwise be rejected as spam.
Fixes: #4300
Instead of a boutique message, use a "real" channel_announcement for
private channels (with fake sigs and pubkeys). This makes it far
easier for gossmap to handle local channels.
Backwards compatible update, since we update old stores.
We also fix devtools/dump-gossipstore to know about the tombstone markers.
Since we increment our channel_announce count for local channels now,
the stats in the tests changed too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids overwriting the ones in git, and generally makes things neater.
We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a better fix than doing it manually, which turned out
to do it in the wrong order (node_announcement followed by
channel_announcement) anyway.
Should fix many "Bad gossip" messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we have to validate, there can be a delay (and peer might
vanish) between receiving the gossip and actually confirming it, hence
the use of softref.
We will use this information to check that the peers are making progress
as we start asking them for specific information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the modified-time of the file. We have to store it internally
since we overwrite the gossip file with compaction on startup.
This means the "are we behind on gossip?" heuristic is no longer inside
gossip_store.c, which is cleaner.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We've been slack, but it's going to be important for testing
ratelimiting. And it currently has a minor memory leak.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>