This will be used when we want to specify these in a route. But for now, they
only alter gossipd, which always sets them to NULL.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's almost always "their_features" and "our_features" respectively, so
make those names clear.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turns out that unnecessary: all callers can access the feature_set,
so make it much more like a normal primitive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even without optimization, it's faster to walk all the channels than
ping another daemon and wait for the response.
Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The idea is that gossipd can give us the cupdate we need for an error, and
we wire things up so that we ask for it (async) just before we send the
error to the subdaemon.
I tried many other things, but they were all too high-risk.
1. We need to ask gossipd every time, since it produces these lazily
(in particular, it doesn't actually generate an offline update unless
the channel is used).
2. We can't do async calls in random places, since we'll end up with
an HTLC in limbo. What if another path tries to fail it at the same time?
3. This allows us to use a temporary_node_failure error, and upgrade it
when gossipd replies. This doesn't change any existing assumptions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common thing to do, so create a macro.
Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before this patch we used to send `double`s over the wire by just
copying them. This is not portable because the internal represenation
of a `double` is implementation specific.
Instead of this, multiply any floating-point numbers that come from
the outside (e.g. JSONs) by 1 million and round them to integers when
handling them.
* Introduce a new param_millionths() that expects a floating-point
number and returns it multipled by 1000000 as an integer.
* Replace param_double() and param_percent() with param_millionths()
* Previously the riskfactor would be allowed to be negative, which must
have been unintentional. This patch changes that to require a
non-negative number.
Changelog-None
I hadn't realized that lightningd asks gossipd every time we forward
a payment. But I'm going to abuse it here to get the latest channel_update,
otherwise (as lightningd takes over error message generation) lightningd
needs to do an async request at various painful points.
So have gossipd tell us the lastest update (stripped so compatible with
the strange in-onion-error format).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us do more flexible filtering in the next patch. But it also
keeps some weird logic out of gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This prevents a gratuitous lookup of we get a late channel_announce,
but even better, it suppresses the "bad gossip" messages in case of
a late channel_update, which have plagued Travis (especially since we
got aggressive in pushing our own updates).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a better fix than doing it manually, which turned out
to do it in the wrong order (node_announcement followed by
channel_announcement) anyway.
Should fix many "Bad gossip" messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a report of a 0.7.2 user whose node hadn't appeared on 1ml. Their
node_announcement wasn't visible to my node, either.
I suspect this is a consequence of recent version reducing the amount of
gossip they send, as well as large nodes increasingly turning off gossip
altogether from some peers (as we do). We should ignore timestamp filters
for our own channels: the easiest way to do this is to push them out
directly from gossipd (other messages are sent via the store).
We change channeld to wrap the local channel_announcements: previously
we just handed it to gossipd as for any other gossip message we received
from our peer. Now gossipd knows to push it out, as it's local.
This interferes with the logic in tests/test_misc.py::test_htlc_send_timeout
which expects the node_announcement message last, so we generalize
that too.
[ Thanks to @trueptolmy for bugfix! ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is mainly an internal-only change, especially since we don't
offer any globalfeatures.
However, LND (as of next release) will offer global features, and also
expect option_static_remotekey to be a *global* feature. So we send
our (merged) feature bitset as both global and local in init, and fold
those bitsets together when we get an init msg.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't supposed to do any gossiping until we were synced (and thus
knew blockheight), but our seeker_check() didn't wait for it! Move the
waiting all into seeker.c, so it can handle it all consistently.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume that the time for gossip propagation is < 10 minutes, so by
going back that far from last gossip we won't miss anything,
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We eliminate the "need peer" states and instead check if the
random_peer_softref has been cleared.
We can also unify our restart handlers for all these cases; even the
probe_scids case, by giving gossip credit for the scids as they come
in (at a discount, since scids are 8 bytes vs the ~200 bytes for
normal gossip messages).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of a linear array which is fairly inefficient if it turns out
we know nothing at all.
We remove the gossip_missing() call by changing the api to
remove_unknown_scid() to include a flag as to whether the scid turned
out to be real or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The seeker starts by asking a peer (the first peer!) for all gossip
since a minute before the modified time of the gossip store.
This algorithm is enhanced in successive patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we have to validate, there can be a delay (and peer might
vanish) between receiving the gossip and actually confirming it, hence
the use of softref.
We will use this information to check that the peers are making progress
as we start asking them for specific information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the modified-time of the file. We have to store it internally
since we overwrite the gossip file with compaction on startup.
This means the "are we behind on gossip?" heuristic is no longer inside
gossip_store.c, which is cleaner.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a pointer to chainparams, that fails valgrind if we do anything
chain-specific before setting it.
Suggested-by: Rusty Russell <@rustyrussell>
We do this by keeping a current and an old map, and moving the current to old
every hour or 10,000 entries.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was seeing some accidental pruning under load / Travis, and in
particular we stopped accepting channel_updates because they were 103
seconds old. But making it too long makes the prune test untenable,
so restore a separate flag that this test can use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only real change is dump_gossip() used to call
maybe_create_next_scid_reply(), but now I've simply renamed
that to maybe_send_query_responses() and we call it directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The first one means we don't discard channels just because we're not
synced, and the second is implied by the spec: don't accept
channel_announcement if the channel isn't 6 deep. Since LND defers in
such cases, we do too (unless it's newer than the current block, in
which case we simply discard). Otherwise there's a risk that a slow
node might discard valid gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will let gossipd be more intelligent about gossiping before we're
synced, and also it might know how far behind we are.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's generally clearer to have simple hardcoded numbers with an
#if DEVELOPER around it, than apparent variables which aren't, really.
Interestingly, our pruning test was always kinda broken: we have to pass
two cycles, since l2 will refresh the channel once to avoid pruning.
Do the more obvious thing, and cut the network in half and check that
l1 and l3 time out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make update_local_channel use a timer if it's too soon to make another
update.
1. Implement cupdate_different() which compares two updates.
2. make update_local_channel() take a single arg for timer usage.
3. Set timestamp of non-disable update back 5 minutes, so we can
always generate a disable update if we need to.
4. Make update_local_channel() itself do the "unchanged update" suppression.
gossipd: clean up local channel updates.
5. Keep pointer to the current timer so we override any old updates with
a new one, to avoid a race.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Write helpers to split it into non-timestamp, non-signature parts,
and simply compare those. We extract a helper to do channel_update, too.
This is more generic than our previous approach, and simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We've been slack, but it's going to be important for testing
ratelimiting. And it currently has a minor memory leak.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than reaching into data structures, let them register their own
callbacks. This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.
Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fortunately, again, only happens with EXPERIMENTAL_FEATURES.
If the query causes us not to actually send anything, we won't
get called again. This can validly happen if they only asked for
the node_announcements, for example.
(Found by protocol tests).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our "are we finished?" logic was wrong: it tested if there are no more
node_announcements, but it's possible that there were no node_announcements
for either end of the channel whose information we sent.
This is actually quite unusual on the real network: looking at mainnet
statis from last May, 4301 of 4337 nodes have node_announcements.
However, with query flags it's much more likely, since they might not
ask for node announcements at all.
(Found by gossip protocol tests)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These both allow us to reproduce the test vectors in the next patch. But
using Z_DEFAULT_COMPRESSION is a reasonable idea anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the TLV element a simple array. This is a bit neater, in fact, and
makes the test vectors in that 557 PR work.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In fact, we always generate them, we only send them if asked. And we set
the flags to 0 if not --enable-experimental-features, so we never send in
that case.
Generating checksums involves pulling the channel_update from the
gossip_store, which is suboptimal: there's a FIXME to store the
checksum in memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to use the for gossip extended info too, which *don't* put
the encoding byte at the beginning of the data stream. So this removes
some "scids" from function names and separates out the "prepend a byte"
case from the "external encoding_type" case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These indicate what fields we are to return. If there's now TLV, or we
haven't got --enable-experimental-features, it's set to all 1s so behaviour
is unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
updates the bolt version to 6639cef095a2ecc7b8f0c48c6e7f2f906fbfbc58.
this requires us to use the new bolt parser at generate-bolt.py
and updates to all of the type specifications (ie. from u8 -> byte)
We hit the timestamp assert on #2750; it shouldn't happen, but crashing
doesn't leave much information.
Reported-by: @m-schmook
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We seek a certain number of peers at each level of gossip; 3 "flood"
if we're missing gossip, 2 at 24 hours past to catch recent gossip, and
8 with current gossip. The rest are given a filter which causes them
not to gossip to us at all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The first sign that we're missing gossip is that we get a channel_update
for an unknown channel. The peer might be wrong (or lying), but if it turns
out to be a real channel, we were definitely missing something.
This patch does two things: queries when we get an unknown channel_update,
and then notes that a channel_announcement was from such an update when
it's finally processed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, we'll need to know the short_channel_id if a
channel_update is unknown (implies we're missing a channel), and whether
processing a pending channel_announcement was successful (implies that
the channel was real).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until now we only generated these in dev mode for testing. Hoist
into common code, turn counter into a flag (we're only allowed one!)
and note if query is internal or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id. In the default case,
it means we don't charge ourselves fees on the route.
This means we can spend the full channel balance.
We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things. So we add the first-hop fee to the *risk* value instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.
Before this series: (--disable-developer, -Og)
store_load_msec:20669-20902(20822.2+/-82)
vsz_kb:439704-439712(439706+/-3.2)
listnodes_sec:0.890000-1.000000(0.92+/-0.04)
listchannels_sec:11.960000-13.380000(12.576+/-0.49)
routing_sec:3.070000-5.970000(4.814+/-1.2)
peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)
After: (--disable-developer, -Og)
store_load_msec:19722-20124(19921.6+/-1.4e+02)
vsz_kb:288320
listnodes_sec:0.860000-0.980000(0.912+/-0.056)
listchannels_sec:10.790000-12.260000(11.65+/-0.5)
routing_sec:2.540000-4.950000(4.262+/-0.88)
peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself. The rest of the semantics are fairly
simple however.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They already send *us* gossip messages, so they have to be distinct anyway.
Why make us both do extra work?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.
Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast. Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each destructor2 costs 40 bytes, and struct chan is only 120 bytes. So
this drops our memory usage quite a bit:
MCP bench results change:
-vsz_kb:580004-580016(580006+/-4.8)
+vsz_kb:533148
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This has two effects: most importantly, it avoids the problem where
lightningd creates a 800MB JSON blob in response to listchannels,
which causes OOM on the Raspberry Pi (our previous max allocation was
832MB). This is because lightning-cli can start draining the JSON
while we're filling the buffer, so we end up with a max allocation of
68MB.
But despite being less efficient (multiple queries to gossipd), it
actually speeds things up due to the parallelism:
MCP with -O3 -flto before vs after:
-listchannels_sec:8.980000-9.330000(9.206+/-0.14)
+listchannels_sec:7.500000-7.830000(7.656+/-0.11)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a test blockchain for MCP which has the correct channels,
so this is not needed.
Also fix a benchmark script bug where 'mv "$DIR"/log
"$DIR"/log.old.$$' would fail if you log didn't exist from a previous run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of reading the store ourselves, we can just send them an
offset. This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).
MCP bench results:
store_load_msec:28509-31001(29206.6+/-9.4e+02)
vsz_kb:580004-580016(580006+/-4.8)
store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
listnodes_sec:1.790000-1.880000(1.83+/-0.032)
listchannels_sec:21.180000-21.950000(21.476+/-0.27)
routing_sec:2.210000-11.160000(7.126+/-3.1)
peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)
Signficant savings in streaming gossip:
-peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
+peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>