We now *never* consider asking them anything, even if they are
capable (e.g. enabling full gossip stream). This aligns with
the latest spec proposal, where lack of `gossip_queries` means
"we don't have anything useful to say".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a u64, we should pass by copy. This is a big sweeping change,
but mainly mechanical (change one, compile, fix breakage, repeat).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We add a temporary stub gossmap_manage constructor, which simply opens
the gossmap and doesn't do anything else.
Then seeker uses this, rather than routing.c, to probe.
We optimize our "get random node announcements" a bit by traversing a
random set of nodes directly, and seeing if we have no
node_announcement, then querying their first channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most nodes don't really care about exact timestamps on gossip filters,
so just keep a flag on whether we have anything in the gossip_store,
and use that to determine whether we ask peers for everything.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is easier for future callers, which don't have a convenient peer
structure: in particular, asynchronous processing of gossip for peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep several peer pointers, but we just add a hook to NULL them
manually when a peer dies, rather than using voodoo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume if they set this to 0 (which nobody did previously), they're
using it as a modern flag and use it to indicate when they're
finished. Otherwise, we count how many blocks they've sent and use
that to determine whether they've finished.
See: https://github.com/lightningnetwork/lightning-rfc/pull/826
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we use `sync_complete` for gossip range query replies, with detection for older spec nodes.
It's not (yet?) compulsory to have the timestamps, but handing them around
together makes sense (a missing timestamp has the same effect as a zero
timestamp).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec (since d4bafcb67dcf1e4de4d16224ea4de6b543ae73bf in March
2020) requires that reply_channel_range be in order (and all
implementations did this anyway).
But when I tried this, I found that LND doesn't (always) obey this,
since don't divide on block boundaries. So we have to loosen the
constraints here a little.
We got rid of the old LND compat handling though, since everyone should
now be upgraded (there are CVEs out for older LNDs).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Support for receiving full gossip from ancient LND nodes.
This avoids overwriting the ones in git, and generally makes things neater.
We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was wondering why TAGS was missing some functions, and finally
tracked it down: PRINTF_FMT() confuses etags if it's at the start
of a function, and it ignores the rest of the file.
So we put PRINTF_FMT at the end, but that doesn't work for
*definitions*, only *declarations*. So we remove it from definitions
and add gratuitous declarations in the few static places.1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Asking for the last few blocks was logical, but my node is missing
most gossip in practice.
For the moment, simply ask a peer for every channel it knows, once
we're started up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When probing, no point probing for before lightning became cool. Current
logic means we often probe below block 500,000, and think things are OK
because there are no short_channel_ids.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always ended up sending an empty query when we had stale scids!
And it turns out we consider such a query invalid:
Bad query_short_channel_ids query_flags 010506226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000100010100
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only chose 3 peers to gossip with us (down from 8 last release).
There's no justification for this number, or reason to believe that
it is sufficient to keep us in sync.
Be more conservative for now; we can always decrease it later once
we have more data.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
polling the last 32 is fairly useless in practice, since they tend to
be recent nodes; it won't detect long-forgotten ones.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I thought LND had a bug, but turns out it doesn't like out-of-order
short_channel_ids: in fact, the spec says they have to be in order!
This means we use uintmap instead of a htable for unknown_scids and
stale_scids so they're nicely ordered.
But our nodes-missing-announcements probe is harder since they can
also contain duplicates: we switch that to iterate through channels
rather than nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't supposed to do any gossiping until we were synced (and thus
knew blockheight), but our seeker_check() didn't wait for it! Move the
waiting all into seeker.c, so it can handle it all consistently.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On testing, I found a node which would hang up every time we asked it
for query_short_channel_ids (despite it offering features 0x81, meaning
it should handle this message).
Then it would reconnect, and we'd choose it again as our
PROBING_NANNOUNCES peer!
Instead, leave finding another peer to the once-a-minute
seeker_check() function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By combining set_state() with selected_peer() we can give a single log
line describing what we're asking for, from whom.
We also add more verbosity to a few key areas, such as gossip rotation
and when gossipd tells a peer to send an error. And move a comment which
was above the wrong function (due to rebase?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This should give more reliable results, though it risks us getting
suckered into always consulting the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume that the time for gossip propagation is < 10 minutes, so by
going back that far from last gossip we won't miss anything,
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's simple: if we wouldn't accept the timestamp we see, don't put
the channel in the stale_scid_map.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just try to choose another. Under Travis, this causes many failures due
to slowness (they only get 10 seconds in -dev mode).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We eliminate the "need peer" states and instead check if the
random_peer_softref has been cleared.
We can also unify our restart handlers for all these cases; even the
probe_scids case, by giving gossip credit for the scids as they come
in (at a discount, since scids are 8 bytes vs the ~200 bytes for
normal gossip messages).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Build up a map of short_channel_ids which we have old info for (only
if peer supports gossip_query_ex).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>