Commit Graph

116 Commits

Author SHA1 Message Date
Rusty Russell
9dadcc858b common/gossip_store: avoid fd pass for new store, use end marker.
This is also simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-05-22 17:53:04 +09:30
Rusty Russell
0d4f014021 gossip_store: create end marker for EOF.
This is better than using the previous "keep statting the file" approach,
since we can also tell you how long the replacement is, to avoid a
gratitous load.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-05-22 17:53:04 +09:30
Rusty Russell
7fbf728a34 gossipd: assert we're not blatting the version header.
Suggested-by: whitslack
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-03-31 12:26:21 +10:30
Rusty Russell
7b853d0fa5 gossip_store: don't make bogus assumption that writes are atomic wrt readers.
They're not defined to be, though we've not seen this on Linux (testing
showed that it is page-level atomic, which means it can still happen across
page boundaries though!).  This was pointed out by whitslack in
https://github.com/ElementsProject/lightning/issues/4288

In practice, this just means not complaining when it happens, and also
not trying to get tricky to use it on MacOS (we can safely seek & write,
since we're single-threaded).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Removed bogus UNUSUAL log about gossip_store 'short test'.
2021-03-31 12:26:21 +10:30
Rusty Russell
b689d33e97 gossipd: fix rolling corruption.
If gossip_store is an incorrect version, we will recreate it: with an incorrect
version!  This means we never get persistent gossip, *and* the pay plugin will
fail to map the gossip_store.  Everyone will be sad.

Debugged-by: Matt Whitlock
Fixes: #4376
Fixes: #4288
Typing-by: Rusty Russell <rusty@rustcorp.com.au>
2021-03-26 12:13:51 +10:30
Rusty Russell
f1c599516e gossipd: add an internal flag to force a channel update
This overcomes the internal spam filter on updates, which can be useful
if we're actually trying to send through such a node.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: always accept channel_updates from errors, even they'd otherwise be rejected as spam.
Fixes: #4300
2021-02-02 13:44:01 +01:00
Rusty Russell
eadf2c91fe libplugin-pay: incorporate gossip store.
So we can use this for routing determinations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-10-21 08:58:34 +10:30
Rusty Russell
83aea6b2bb gossip_store: make private channels more similar to channel_announcement
Instead of a boutique message, use a "real" channel_announcement for
private channels (with fake sigs and pubkeys).  This makes it far
easier for gossmap to handle local channels.

Backwards compatible update, since we update old stores.

We also fix devtools/dump-gossipstore to know about the tombstone markers.

Since we increment our channel_announce count for local channels now,
the stats in the tests changed too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-10-21 08:58:34 +10:30
Rusty Russell
bb9ad57a03 gossip_store: don't copy old delete markers on startup compact.
So we don't have to handle them at load time, either.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-10-21 08:58:34 +10:30
Rusty Russell
c34c055d82 Makefile: use completely separate spec-derived files for EXPERIMENTAL_FEATURES
This avoids overwriting the ones in git, and generally makes things neater.

We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-09-08 09:42:00 +09:30
Rusty Russell
8150d28575 Makefile: use generic rules to make spec-derived sources.
Now we use the same Makefile rules for all CSV->C generation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-31 21:33:26 -05:00
Rusty Russell
7dd6f8f2b5 gossipd: add tombstone when we remove a channel.
For those following along at home.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-28 10:56:50 +09:30
Rusty Russell
dffbf8de85 gossipd: convert wire to new scheme.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-25 12:53:13 +09:30
Rusty Russell
855debcfe1 gossipd: upgrade v7 gossip_store to v8.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-05-07 08:44:58 +09:30
Rusty Russell
eed654f684 connectd, gossipd: use per-peer logging.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-18 04:50:22 +00:00
Rusty Russell
7f45e55d84 gossipd: set the push marker for our own messages.
This is a better fix than doing it manually, which turned out
to do it in the wrong order (node_announcement followed by
channel_announcement) anyway.

Should fix many "Bad gossip" messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-04 17:50:58 +01:00
Rusty Russell
bb370e66a8 gossipd: handle a "push" marker into the gossip_store.
This tells clients to ignore any timestamp_filter and always
send this message when it sees it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-11-04 17:50:58 +01:00
Rusty Russell
a1644c1b6e seeker: start doing a channel probe if we see unknown node_announcement msgs.
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-10-10 21:48:52 -05:00
Rusty Russell
0091300ee3 gossipd: track what peer gave us gossip msgs so we can credit it.
Since we have to validate, there can be a delay (and peer might
vanish) between receiving the gossip and actually confirming it, hence
the use of softref.

We will use this information to check that the peers are making progress
as we start asking them for specific information.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-10-10 21:48:52 -05:00
Rusty Russell
296868daf4 gossipd: have gossip_store_load() return a timestamp.
This is the modified-time of the file.  We have to store it internally
since we overwrite the gossip file with compaction on startup.

This means the "are we behind on gossip?" heuristic is no longer inside
gossip_store.c, which is cleaner.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-10-10 21:48:52 -05:00
Rusty Russell
2577ad87d5 gossipd: use gossip_time_now() everywhere.
We've been slack, but it's going to be important for testing
ratelimiting.  And it currently has a minor memory leak.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
Rusty Russell
afbed94a6c gossipd: work around missing pwritev().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-11 05:58:36 +00:00
darosior
0b0ad4c22d transition from status_trace() to status_debug 2019-09-10 02:02:51 +00:00
Rusty Russell
f9ecc76d99 gossipd: check that we don't try to access a deleted gossip entry.
We ignored this before, which meant that the DEVELOPER-mode check that we
delete the correct record didn't check that it wasn't already deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-09 08:58:05 +02:00
Rusty Russell
f57f068592 gossipd: don't use O_APPEND on the gossip_store.
We always know the length, so we don't need it.  It causes much extra work
when we want to delete a record, which I suspect may cause issues amongst
some users who've been seeing gossip_store corruption.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-03 12:50:51 +02:00
Rusty Russell
c95b4eedf4 gossipd: fail clearly if we can't open/create gossip_store.
Otherwise we fail at the write, and then it's not clear
*why* we couldn't open file:

lightning_gossipd: Writing version to store: Bad file descriptor (version v0.7.1-16-g7ea5c5c)
0x560dcf1a3779 send_backtrace common/daemon.c:40
0x560dcf1a634d status_failed common/status.c:192
0x560dcf19726a gossip_store_new gossipd/gossip_store.c:195
0x560dcf199fd0 new_routing_state gossipd/routing.c:177
0x560dcf1a098b gossip_init gossipd/gossipd.c:2113
0x560dcf1a197a recv_req gossipd/gossipd.c:2946
0x560dcf1a38cd handle_read common/daemon_conn.c:31
0x560dcf1bae2c next_plan ccan/ccan/io/io.c:59
0x560dcf1bb314 do_plan ccan/ccan/io/io.c:407
0x560dcf1bb341 io_ready ccan/ccan/io/io.c:417
0x560dcf1bcb13 io_loop ccan/ccan/io/poll.c:445
0x560dcf1a1ba0 main gossipd/gossipd.c:3073

Reported-by: @JavierRSobrino
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-04 16:10:20 +02:00
Rusty Russell
c303d7d534 gossipd: only do (automatic) store compaction at startup.
Rewriting the gossip_store is much more trivial when we don't have
any pointers into it, so add some simple offline compaction code
and disable the automatic compaction code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 20:03:10 -05:00
Rusty Russell
c15d9ed37c gossip_store: make copy of corrupt gossip_store on failure.
This should help debugging vastly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell
8928f0b5f9 gossipd: remove gossip entirely if we hit a problem on load.
The crashes in #2750 are mostly caused by us trying to partially truncate
the store.  The simplest fix for release is to discard the whole thing if
we detect a problem.

This is a workaround: it'd be far nicer to try to recover.

Fixes: #2750
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell
47b5f2e837 gossipd: truncate gossip_store.tmp for compaction.
If something went wrong and there was an old one, we were
appending to it!

Reported-by: @SimonVrouwe
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-20 02:53:52 +00:00
Rusty Russell
5e3690b3c5 gossipd: delete channel_amount from the store when we delete channel_announcement.
Otherwise we slowly build up cruft: compaction simply moves them since
they're not deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell
10c503b4b4 gossip_store: clean up a truncated store.
We might have channel_announcements which have no channel_update: normally
these don't get written into the store until there is one, but if the
store was truncated it can happen.  We then get upset on compaction, since
we don't have an in-memory representation of the channel_announcement.

Similarly, we leave the node_announcement pending until after that
channel_announcement, leading to a similar case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell
24cc371cdf gossipd: gossip_store errors after rewrite are fatal.
We can't continue, since we've moved the indexes.  We'll just crash
anyway, as seen from bugs #2742 and #2743.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell
eb5cc47bdd gossipd: count deleted records correctly when loading gossip_store.
The result of an incorrect count was that we failed on next compaction.

Fixes: #2743
Fixes: #2742
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell
21c920a8e8 gossipd: note if loaded store seems reasonably up-to-date.
If not, we can ask peers for full gossip (for now we just set a flag).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell
0d2a4830ed ccan: update to faster and correct crc32c implementation.
I decided to try a faster implementation, only to find our crc32c was
not correct!  Ouch.

I removed the crc32c functions from ccan/crc, and added a new crc32c
module which has the Mark Adler x86-64-optimized variants.

We bump gossip_store version again, since csums have changed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:40:10 +00:00
Rusty Russell
628b65fb40 gossip_store: don't leave dangling channel_announce if we truncate.
(Or, if we crashed before we got to write out the channel_update).
It's a corner case, but one reported by @darosior and reproduced
on my test node (both with bad gossip_store due to previous iterations
of this patchset!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
3e733afb2b gossipd: remove broadcast map altogether.
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.

Before this series: (--disable-developer, -Og)
    store_load_msec:20669-20902(20822.2+/-82)
    vsz_kb:439704-439712(439706+/-3.2)
    listnodes_sec:0.890000-1.000000(0.92+/-0.04)
    listchannels_sec:11.960000-13.380000(12.576+/-0.49)
    routing_sec:3.070000-5.970000(4.814+/-1.2)
    peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)

After: (--disable-developer, -Og)
    store_load_msec:19722-20124(19921.6+/-1.4e+02)
    vsz_kb:288320
    listnodes_sec:0.860000-0.980000(0.912+/-0.056)
    listchannels_sec:10.790000-12.260000(11.65+/-0.5)
    routing_sec:2.540000-4.950000(4.262+/-0.88)
    peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
dd83453b2f gossipd/gossip_store: fix compacting, don't use broadcast ordering.
We have a problem: if we get halfway through writing the compacted store
and run out of disk space, we've already changed half the indexes.

This changes it so we do nothing until writing is finished: then we
iterate through and update indexes.  It also weans us off broadcast
ordering, which we can now eliminated.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
5161b79bfc gossipd/gossip_store: keep count of deleted entries, don't use bs->count.
We didn't count some records before, so we could compare the two counters.
This is much simpler, and avoids reliance on bs.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
948490ec58 gossipd: add timestamp in gossip store header.
(We don't increment the gossip_store version, since there are only a
few commits since the last time we did this).

This lets the reader simply filter messages; this is especially nice since
the channel_announcement timestamp is *derived*, not in the actual message.

This also creates a 'struct gossip_hdr' which makes the code a bit
clearer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
bad9734dc7 gossip_store: remove redundant copy_message.
The single caller can easily use transfer_store_msg instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
5591c0b5d8 gossipd: don't send gossip stream, let per-peer daemons read it themselves.
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project.  But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.

This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.

We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
4399faf57c gossipd: make writes to gossip_store atomic.
There's a corner case where otherwise a reader could see the header and
not the body of a message.  It could handle that in various ways,
but simplest (and most efficient) is to avoid it happening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
df00f20e4a gossipd: erase old entries from the store, don't just append.
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.

Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast.  Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
696dc6b597 gossipd: disable gossip_store upgrade.
We're about to bump version again, and the code to upgrade it was
quite hairy (and buggy!).  It's not worthwhile for such a
poorly-tested path: I will just add code to limit how much incoming
gossip we get to avoid flooding when we upgrade, however.

I also use a modern gossip_store version in our test_gossip_store_load
test, instead of relying on the upgrade path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
43f2cbd250 gossipd: track gossip_store locations of local channels.
We currently don't care, but the next patch means we have to find them
again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
180a552fba gossip_store: mark private updates separately from normal ones.
They're really gossipd-internal, and we don't want per-peer daemons
to confuse them with normal updates.

I don't bump the gossip_store version; that's coming with another update
anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
763697eb4c gossipd: fix gossip_store calling delete.
Now we handle node_announcements properly, we have a failure case where we
try to move them when a channel is deleted while loading the store.

We're going to remove this soon, in favor of in-place delete, so
workaround this for now to avoid an assert() when we try to write to
the store while loading.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 11:04:25 -07:00
Rusty Russell
c233fc5063 gossipd: fix spurious unused error with gcc-9 -O3.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 00:07:11 +00:00