We are logging way too much from gossipd, causing noisy logs. This PR reduces
logs for incoming messages to those that actually caused a change in our
internal state (duplicate and old messages are just dropped silently now).
Changelog-Changed: gossipd: The `gossipd` is now a lot quieter, and will log only when a message changed our network topology.
Note that other directories were explicitly depending on the generated
file, instead of relying on their (already existing) dependency on
$(LIGHTNINGD_HSM_CLIENT_OBJS), so we remove that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
See https://github.com/lightningnetwork/lightning-rfc/pull/767
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: channels now pruned after two weeks unless both peers refresh it (see lightning-rfc#767)
It's not all that rare to do these operations, and requiring annotations
for it is a little painful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There were no channel updates in my log; because sendonion doesn't know the
actual node_ids or channel_ids, we can't tell gossipd what node/channel it was
so it can no longer remove them on PERM errors.
However, we can tell it the error message so it can apply the update.
Fixes: #3877
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The main change here is that the previously-optional open/accept
fields and reestablish fields are now compulsory (everyone was
including them anyway). In fact, the open/accept is a TLV
because it was actually the same format.
For more details, see lightning-rfc/f068dd0d8dfa5ae75feedd99f269e23be4777381
Changelog-Removed: protocol: support for optioned form of reestablish messages now compulsory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we now over-write the wally malloc/free functions, we need to do
so for tests as well. Here we pull up all of the common setup/teardown
logic into a separate place, and update the tests that use libwally to
use the new common_setup core
Changelog-None
This saves us keeping it in memory (so far, no channels have features), but
lets us optimize that case so we don't need to hit the disk for most of the
channels in listchannels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we have only a single member in a TLV (e.g. an optional u64),
wrapping it in a struct is awkward. This changes it to directly
access those fields.
This is not only more elegant (60 fewer lines), it would also be
more cache friendly. That's right: cache hot singles!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will be used when we want to specify these in a route. But for now, they
only alter gossipd, which always sets them to NULL.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use `LC_ALL=C sort` instead of `sort` so that mocks get sorted in
the same way on all developers' environments.
Re-record the result of `make update-mocks`.
Changelog-None
It's almost always "their_features" and "our_features" respectively, so
make those names clear.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turns out that unnecessary: all callers can access the feature_set,
so make it much more like a normal primitive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is to prepare for dynamic features, including making plugins first
class citizens at setting them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
clang10 -DBINTOPKGLIBEXECDIR="\"../libexec/c-lightning\"" -Wall -Wundef -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wold-style-definition -Werror -std=gnu11 -g -fstack-protector -I ccan -I external/libwally-core/include/ -I external/libwally-core/src/secp256k1/include/ -I external/jsmn/ -I external/libbacktrace/ -I external/libbacktrace-build -I . -I/usr/local/include -DCCAN_TAKE_DEBUG=1 -DCCAN_TAL_DEBUG=1 -DCCAN_JSON_OUT_DEBUG=1 -DSHACHAIN_BITS=48 -DJSMN_PARENT_LINKS -DBUILD_ELEMENTS=1 -DBINTOPKGLIBEXECDIR="\"../libexec/c-lightning\"" -c -o gossipd/routing.o gossipd/routing.c
gossipd/routing.c:651:10: error: implicit conversion from 'unsigned long' to 'double' changes value
from 18446744073709551615 to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
if (r > UINT64_MAX)
~ ^~~~~~~~~~
```
It is ok to change the values because they are approximate anyway. Thus,
explicitly typecast to `double` to silence the warning without changing
behavior.
Changelog-None
Even without optimization, it's faster to walk all the channels than
ping another daemon and wait for the response.
Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The idea is that gossipd can give us the cupdate we need for an error, and
we wire things up so that we ask for it (async) just before we send the
error to the subdaemon.
I tried many other things, but they were all too high-risk.
1. We need to ask gossipd every time, since it produces these lazily
(in particular, it doesn't actually generate an offline update unless
the channel is used).
2. We can't do async calls in random places, since we'll end up with
an HTLC in limbo. What if another path tries to fail it at the same time?
3. This allows us to use a temporary_node_failure error, and upgrade it
when gossipd replies. This doesn't change any existing assumptions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common thing to do, so create a macro.
Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
common should not include specific per-daemon files. Turns out this
caused a lot of indirect includes to be exposed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before this patch we used to send `double`s over the wire by just
copying them. This is not portable because the internal represenation
of a `double` is implementation specific.
Instead of this, multiply any floating-point numbers that come from
the outside (e.g. JSONs) by 1 million and round them to integers when
handling them.
* Introduce a new param_millionths() that expects a floating-point
number and returns it multipled by 1000000 as an integer.
* Replace param_double() and param_percent() with param_millionths()
* Previously the riskfactor would be allowed to be negative, which must
have been unintentional. This patch changes that to require a
non-negative number.
Changelog-None
I hadn't realized that lightningd asks gossipd every time we forward
a payment. But I'm going to abuse it here to get the latest channel_update,
otherwise (as lightningd takes over error message generation) lightningd
needs to do an async request at various painful points.
So have gossipd tell us the lastest update (stripped so compatible with
the strange in-onion-error format).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have json_stream in common/, we can move all the related
helpers from lightningd/json to common/json. This way everyone can
benefit of them (including libplugin, the plugins themselves,
potentially lightning-cli), not lightningd alone!
Note that the Makefile of the common/test/ had to be modified, because
the new helpers make use of common/wireaddr... Which turns out to
\#include <lightingd/lightningd.h> ! So we couldnt just include the .c
and add mocks if we redefined some structs (hello run-param).
This makes it clear we're dealing with a message which is a wrapped error
reply (needing unwrap_onionreply), not an already-wrapped one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us do more flexible filtering in the next patch. But it also
keeps some weird logic out of gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We've been sending them errors for invalid replies; instead, this works
around it.
Changelog-Added: Workaround LND's reply_channel_range issues instead of sending error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is ignored in subdaemons which are per-peer, but very useful for
multi-peer daemons like connectd and gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It really, really doesn't matter. But we were dramatically reducing
our view of the network:
In my gossip_store (mainnet):
channel_announcement: 30349
channel_update: 55119
node_announcment: 1783
Changelog-Fixed: No longer discard most node_announcements (fixes#3194)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The flat feature PR changes the rules so these are OK to propagate.
That makes sense: the unsupported features means there's something
unsupported about the *node* or *channel*, not the msg itself
(for that we'd use a different message type).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This prevents a gratuitous lookup of we get a late channel_announce,
but even better, it suppresses the "bad gossip" messages in case of
a late channel_update, which have plagued Travis (especially since we
got aggressive in pushing our own updates).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a better fix than doing it manually, which turned out
to do it in the wrong order (node_announcement followed by
channel_announcement) anyway.
Should fix many "Bad gossip" messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was wondering why TAGS was missing some functions, and finally
tracked it down: PRINTF_FMT() confuses etags if it's at the start
of a function, and it ignores the rest of the file.
So we put PRINTF_FMT at the end, but that doesn't work for
*definitions*, only *declarations*. So we remove it from definitions
and add gratuitous declarations in the few static places.1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Asking for the last few blocks was logical, but my node is missing
most gossip in practice.
For the moment, simply ask a peer for every channel it knows, once
we're started up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When probing, no point probing for before lightning became cool. Current
logic means we often probe below block 500,000, and think things are OK
because there are no short_channel_ids.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always ended up sending an empty query when we had stale scids!
And it turns out we consider such a query invalid:
Bad query_short_channel_ids query_flags 010506226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000100010100
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only chose 3 peers to gossip with us (down from 8 last release).
There's no justification for this number, or reason to believe that
it is sufficient to keep us in sync.
Be more conservative for now; we can always decrease it later once
we have more data.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
polling the last 32 is fairly useless in practice, since they tend to
be recent nodes; it won't detect long-forgotten ones.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we get a channel_update while we're still verifying the channel_announcement
we didn't set the peer pointer, so it didn't get credit. As a result, the
seeker tended to think we were done gossiping sooner than we were.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a report of a 0.7.2 user whose node hadn't appeared on 1ml. Their
node_announcement wasn't visible to my node, either.
I suspect this is a consequence of recent version reducing the amount of
gossip they send, as well as large nodes increasingly turning off gossip
altogether from some peers (as we do). We should ignore timestamp filters
for our own channels: the easiest way to do this is to push them out
directly from gossipd (other messages are sent via the store).
We change channeld to wrap the local channel_announcements: previously
we just handed it to gossipd as for any other gossip message we received
from our peer. Now gossipd knows to push it out, as it's local.
This interferes with the logic in tests/test_misc.py::test_htlc_send_timeout
which expects the node_announcement message last, so we generalize
that too.
[ Thanks to @trueptolmy for bugfix! ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is mainly an internal-only change, especially since we don't
offer any globalfeatures.
However, LND (as of next release) will offer global features, and also
expect option_static_remotekey to be a *global* feature. So we send
our (merged) feature bitset as both global and local in init, and fold
those bitsets together when we get an init msg.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I thought LND had a bug, but turns out it doesn't like out-of-order
short_channel_ids: in fact, the spec says they have to be in order!
This means we use uintmap instead of a htable for unknown_scids and
stale_scids so they're nicely ordered.
But our nodes-missing-announcements probe is harder since they can
also contain duplicates: we switch that to iterate through channels
rather than nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't supposed to do any gossiping until we were synced (and thus
knew blockheight), but our seeker_check() didn't wait for it! Move the
waiting all into seeker.c, so it can handle it all consistently.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On testing, I found a node which would hang up every time we asked it
for query_short_channel_ids (despite it offering features 0x81, meaning
it should handle this message).
Then it would reconnect, and we'd choose it again as our
PROBING_NANNOUNCES peer!
Instead, leave finding another peer to the once-a-minute
seeker_check() function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By combining set_state() with selected_peer() we can give a single log
line describing what we're asking for, from whom.
We also add more verbosity to a few key areas, such as gossip rotation
and when gossipd tells a peer to send an error. And move a comment which
was above the wrong function (due to rebase?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This should give more reliable results, though it risks us getting
suckered into always consulting the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume that the time for gossip propagation is < 10 minutes, so by
going back that far from last gossip we won't miss anything,
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's simple: if we wouldn't accept the timestamp we see, don't put
the channel in the stale_scid_map.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just try to choose another. Under Travis, this causes many failures due
to slowness (they only get 10 seconds in -dev mode).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We eliminate the "need peer" states and instead check if the
random_peer_softref has been cleared.
We can also unify our restart handlers for all these cases; even the
probe_scids case, by giving gossip credit for the scids as they come
in (at a discount, since scids are 8 bytes vs the ~200 bytes for
normal gossip messages).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Build up a map of short_channel_ids which we have old info for (only
if peer supports gossip_query_ex).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This asks peers to append the timestamps or checksums: if it has
gossip_query_ex support, it will, otherwise it's ignored.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the peer supports `gossip_query_ex` we can use query_flags to simply
request the node_announcements when probing for nodes, rather than
getting everything. If a peer doesn't support `gossip_query_ex` then
it's harmless to add it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We pick some nodes which don't seem to have node_announcements and we
ask a channel associated with them. Again, if this reveals more
node_announcements, we probe for twice as many next time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we have any unknown short_channel_ids, we ask a random peer for
those channels. Once it responds, we probe again for a small random
range in case more are missing, again enlarging if we find some.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of a linear array which is fairly inefficient if it turns out
we know nothing at all.
We remove the gossip_missing() call by changing the api to
remove_unknown_scid() to include a flag as to whether the scid turned
out to be real or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we've finished streaming gossip from the first peer, we ask a
random peer (maybe the same one) for all short_channel_ids in the last
6 blocks from the latest channel we know about.
If this reveals new channels we didn't know about, we expand the probe
by a factor of 2 each time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The seeker starts by asking a peer (the first peer!) for all gossip
since a minute before the modified time of the gossip store.
This algorithm is enhanced in successive patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>