They're really gossipd-internal, and we don't want per-peer daemons
to confuse them with normal updates.
I don't bump the gossip_store version; that's coming with another update
anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we handle node_announcements properly, we have a failure case where we
try to move them when a channel is deleted while loading the store.
We're going to remove this soon, in favor of in-place delete, so
work around this for now to avoid an assert() when we try to write to
the store while loading.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we first receive a channel_update, we write both the
channel_announcement and that channel_update to the store: we need
that first update so we can set the channel_announcement timestamp.
However, the channel_update can be replaced later. This means we can
have a channel_announcement, a node_announcement which relies on it, then
the channel_update later.
So move the "this applies to a pending announcement" check lower, where
gossip_store can use it too. Has a nice side-effect of avoiding
one lookup of the node id.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix a path where tal_free is called on an uninitialized variable
If the first `goto bad_total` executes, that path has an
uninitialized `short_route`, but bad_total falls through to `out`,
whose first call is tal_free(short_route).
This was noticed by a maybe-uninitialized heuristic on gcc 7.4.0:
gossipd/routing.c: In function ‘find_shorter_route’:
gossipd/routing.c:1096:2: error: ‘short_route’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
tal_free(short_route);
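A minimal reduction of the fix (illustrative code, not the real
find_shorter_route): initialize the pointer, since free(NULL), like
tal_free(NULL), is a no-op:

#include <stdlib.h>

static void find_route_sketch(int bad_total)
{
	char *short_route = NULL;	/* the fix: was uninitialized */

	if (bad_total)
		goto out;		/* analogous to "goto bad_total" */

	short_route = malloc(16);
	/* ... build and use the route ... */
out:
	free(short_route);		/* safe even on the early path */
}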
Reported-by: @ZmnSCPxj <https://github.com/ElementsProject/lightning/pull/2674#issuecomment-495617253>
Signed-off-by: William Casarin <jb55@jb55.com>
Each destructor2 costs 40 bytes, and struct chan is only 120 bytes. So
this drops our memory usage quite a bit:
MCP bench results change:
-vsz_kb:580004-580016(580006+/-4.8)
+vsz_kb:533148
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a test blockchain for MCP which has the correct channels,
so this is not needed.
Also fix a benchmark script bug where 'mv "$DIR"/log
"$DIR"/log.old.$$' would fail if the log didn't exist from a previous run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to store the channel capacity for channel_announcement: hand it
in directly rather than having the gossip_store code do a lookup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we can benchmark, and remove 500 bytes per node.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35093-37907(36146+/-1.1e+03)
vsz_kb:555168
store_rewrite_sec:12.120000-13.750000(12.7+/-0.6)
listnodes_sec:1.270000-1.370000(1.322+/-0.039)
listchannels_sec:29.770000-31.600000(30.82+/-0.64)
routing_sec:0.00
peer_write_all_sec:63.630000-67.850000(65.432+/-1.7)
MCP notable changes from pre-Dijkstra (>1 stddev):
-vsz_kb:577456
+vsz_kb:555168
-routing_sec:60.70
+routing_sec:12.04
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Do it inside the can_reach() function, which is less optimal for BFG
(which does 20 ops on the same channel), but fine for Dijkstra.
This does have a measurable cost, so we might want to use
non-cryptographic fuzz in future:
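For illustration, such non-cryptographic fuzz could look something like
this (a hypothetical helper, not the current siphash-based code):

#include <stdint.h>

/* Derive a deterministic fee multiplier in [1 - fuzz, 1 + fuzz] from
 * the channel id and seed with a cheap integer mix. */
static double fee_fuzz(uint64_t scid, uint64_t seed, double fuzz)
{
	uint64_t h = (scid ^ seed) * 0x9e3779b97f4a7c15ULL;

	h ^= h >> 33;
	/* Map the low 32 bits onto [-1.0, 1.0], then scale by fuzz. */
	return 1.0 + fuzz * ((double)(uint32_t)h / 2147483647.5 - 1.0);
}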
$ gossipd/test/run-bench-find_route 100000 100:
Before:
100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)
After:
100 (100 succeeded) routes in 100000 nodes in 113381 msec (1133813412 nanoseconds per route)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a route is too long, we try to bias Dijkstra towards choosing a
shorter route by adding a per-hop cost. We do a naive "shortest path"
pass, then using that cost as a ceiling on per-hop cost, we do a
binary search.
There are some subtleties: we use risk rather than total as our
counter field (we normally bias this by 1 anyway, so it's easy to make
that a variable), and we set riskfactor to a minimal value once we're
iterating. It's good enough to get a solution, we don't need to do a
2-dimensional search on riskfactor and riskbias.
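A sketch of that binary search (names and the toy cost model are
illustrative, not the actual routing.c):

#include <stdint.h>
#include <stddef.h>

/* Stand-in for "run Dijkstra with this per-hop riskbias and return the
 * hop count"; a toy model where more bias monotonically shortens routes. */
static size_t route_len_with_bias(uint64_t riskbias)
{
	return riskbias >= 1000 ? 3 : 6;
}

/* `hi` comes from the naive shortest-path pass and is known to give a
 * route of at most max_hops; find the smallest bias that still does. */
static uint64_t find_min_bias(uint64_t hi, size_t max_hops)
{
	uint64_t lo = 0;	/* invariant: lo gives too long a route */

	while (lo + 1 < hi) {
		uint64_t mid = lo + (hi - lo) / 2;

		if (route_len_with_bias(mid) <= max_hops)
			hi = mid;	/* short enough: try a cheaper bias */
		else
			lo = mid;	/* still too long: need more bias */
	}
	return hi;
}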
Of course, this is extremely slow if we hit it on our benchmark,
though it doesn't happen in a more realistic network:
$ gossipd/test/run-bench-find_route 100000 100:
Before:
100 (79 succeeded) routes in 100000 nodes in 25341 msec (253412314 nanoseconds per route)
After:
100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our uintmap can be a little slow with all the reallocation, so leave
NULL entries and walk to find the first one. Since we don't clean
them up, keep a cache of where the min non-all-NULL value is in the
heap.
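The shape of the trick, illustratively (not the actual gossipd code):

#include <stddef.h>

/* Deleted entries stay NULL, and min_idx caches a lower bound on the
 * first live slot so repeated pops don't rescan from zero. */
struct lazy_minheap {
	void **slot;		/* slot[i] == NULL means "deleted" */
	size_t n;
	size_t min_idx;		/* everything below here is NULL */
};

static void *lazy_minheap_pop(struct lazy_minheap *h)
{
	while (h->min_idx < h->n) {
		void *v = h->slot[h->min_idx];

		if (v != NULL) {
			h->slot[h->min_idx] = NULL;
			return v;
		}
		h->min_idx++;	/* skip stale NULLs; the cache persists */
	}
	return NULL;
}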
The benefit is clearer on really large tests, so here's 1M nodes:
Comparison using gossipd/test/run-bench-find_route 1000000 10:
Before:
10 (10 succeeded) routes in 1000000 nodes in 91995 msec (9199532898 nanoseconds per route)
After:
10 (10 succeeded) routes in 1000000 nodes in 20605 msec (2060539287 nanoseconds per route)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a uintmap as our minheap.
Note that Dijkstra can give overlength routes, so some checks are disabled.
Comparison using gossipd/test/run-bench-find_route 100000 10:
Before:
10 (10 succeeded) routes in 100000 nodes in 120087 msec (12008708402 nanoseconds per route)
After:
10 (10 succeeded) routes in 100000 nodes in 2269 msec (226925462 nanoseconds per route)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we compact the store, we need to adjust the broadcast index for
peers so they know where they're up to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:36034-37853(37109.8+/-5.9e+02)
vsz_kb:577456
store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
listnodes_sec:1.250000-1.480000(1.364+/-0.09)
listchannels_sec:30.820000-31.480000(31.068+/-0.24)
routing_sec:26.940000-27.990000(27.616+/-0.39)
peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1202316
+vsz_kb:577456
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:39207-45089(41374.6+/-2.2e+03)
vsz_kb:1202316
store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
listnodes_sec:1.290000-3.790000(1.938+/-0.93)
listchannels_sec:30.190000-32.120000(31.31+/-0.69)
routing_sec:28.220000-31.340000(29.314+/-1.2)
peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35107-37944(36686+/-1e+03)
+store_load_msec:39207-45089(41374.6+/-2.2e+03)
-vsz_kb:1218036
+vsz_kb:1202316
-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
+listchannels_sec:30.190000-32.120000(31.31+/-0.69)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally. So a `struct chan` without a channel_announcement
in the store is private; any other is public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reload them from disk if they do listnodes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35390-38659(37336.4+/-1.3e+03)
vsz_kb:1780516
store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
listnodes_sec:1.280000-1.530000(1.382+/-0.096)
listchannels_sec:28.700000-30.440000(29.34+/-0.68)
routing_sec:30.120000-31.080000(30.526+/-0.35)
peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1792996
+vsz_kb:1780516
-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
+listnodes_sec:1.280000-1.530000(1.382+/-0.096)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently create a struct chan when we receive a `channel_announcement`,
but we can only broadcast once we have a `channel_update` (since that
provides the timestamp).
This means a `struct chan` can be in a weird state where it exists,
but is unusable (can't use without an update), and also means we need to
keep the channel_announcement message around until an update arrives, so
we can put it in the gossip_store.
Instead, keep track of these "unupdated" channels separately, and check
for them in all the places we search for a specific channel to update.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:30640-33236(32202+/-8.7e+02)
vsz_kb:1812956
store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
listnodes_sec:0.590000-0.660000(0.62+/-0.033)
listchannels_sec:28.140000-29.560000(28.816+/-0.56)
routing_sec:29.530000-32.590000(30.352+/-1.1)
peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1812904
+vsz_kb:1812956
-store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
+store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
-listnodes_sec:1.120000-1.230000(1.176+/-0.044)
+listnodes_sec:0.590000-0.660000(0.62+/-0.033)
-listchannels_sec:38.900000-50.580000(44.716+/-3.9)
+listchannels_sec:28.140000-29.560000(28.816+/-0.56)
-routing_sec:45.080000-48.160000(46.814+/-1.1)
+routing_sec:29.530000-32.590000(30.352+/-1.1)
-peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
+peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we have a channel_announcement, we catch any node_announcement for
either end while we validate the channel_announcement. But if we have
multiple channel_announcements and the first one failed to verify, it
would remove this catch, meaning we'd discard following node_announcements
even though there was a pending channel_announcement.
The answer is to use a simple reference count, and as a further
optimization, only place the `pending_node_announce` if there's no
node already.
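A sketch of the reference count (struct and function names illustrative,
not the exact gossipd code):

#include <stdlib.h>

/* Each pending channel_announcement naming this node takes a reference,
 * so one failed verification no longer drops the stash early. */
struct pending_node_announce {
	unsigned char node_id[33];
	unsigned char *node_announcement;	/* stashed msg, may be NULL */
	unsigned int refcount;
};

static void pending_node_announce_ref(struct pending_node_announce *pna)
{
	pna->refcount++;
}

static void pending_node_announce_unref(struct pending_node_announce *pna)
{
	/* Only when the last pending channel_announcement resolves do we
	 * drop the stashed node_announcement. */
	if (--pna->refcount == 0) {
		free(pna->node_announcement);
		pna->node_announcement = NULL;
	}
}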
We also move the process_pending_node_announcement() calls lower down,
so *any* new channel creation checks it. This is more robust, and
will prove useful for the next patch, where we can use the same
mechanism to handle node_announcements on channel_announcements which
are verified, but don't yet have a channel_update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is currently done higher up, in handle_channel_update(), but
that's one reason why handle_channel_update() has to do a channel
lookup. Moving the check down means handle_channel_update() can do a
minimal "get node id for this channel" so it can check the signature.
This helps, because the chan lookup semantics are changing in the next
few patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we need the payload, pull it from the gossip store.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:30189-52561(39416.4+/-8.8e+03)
vsz_kb:1812904
store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
listnodes_sec:1.120000-1.230000(1.176+/-0.044)
listchannels_sec:38.900000-50.580000(44.716+/-3.9)
routing_sec:45.080000-48.160000(46.814+/-1.1)
peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2288784
+vsz_kb:1812904
-store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
+store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
-listnodes_sec:0.750000-0.850000(0.794+/-0.042)
+listnodes_sec:1.120000-1.230000(1.176+/-0.044)
-listchannels_sec:30.740000-31.760000(31.096+/-0.35)
+listchannels_sec:38.900000-50.580000(44.716+/-3.9)
-routing_sec:29.600000-33.560000(30.472+/-1.5)
+routing_sec:45.080000-48.160000(46.814+/-1.1)
-peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
+peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of an arbitrary counter, we can use the file offset for our
partial ordering, removing a field. It takes some care when we compact
the store, however, as this field changes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34271-35283(34789.6+/-3.3e+02)
vsz_kb:2288784
store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
listnodes_sec:0.750000-0.850000(0.794+/-0.042)
listchannels_sec:30.740000-31.760000(31.096+/-0.35)
routing_sec:29.600000-33.560000(30.472+/-1.5)
peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35685-38538(37090.4+/-9.1e+02)
+store_load_msec:34271-35283(34789.6+/-3.3e+02)
-vsz_kb:2288768
+vsz_kb:2288784
-peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
+peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more compact, but also required once we replace the arbitrary
"index" with an actual offset into the gossip store. That will let us
remove the in-memory variants entirely.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35685-38538(37090.4+/-9.1e+02)
vsz_kb:2288768
store_rewrite_sec:35.530000-41.230000(37.904+/-2.3)
listnodes_sec:0.720000-0.810000(0.762+/-0.041)
listchannels_sec:30.750000-35.990000(32.704+/-2)
routing_sec:29.570000-34.010000(31.374+/-1.8)
peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2621808
+vsz_kb:2288768
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used an s64 so we could use -1 and save a check, but that's just
silly as we have adjacent non-u64 fields: it wastes 7 bytes per node
and 16 per channel.
Interestingly, this seemed to make us a little slower for some reason.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35569-38776(37169.8+/-1.2e+03)
vsz_kb:2621808
store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
listnodes_sec:0.740000-0.800000(0.768+/-0.023)
listchannels_sec:29.820000-32.730000(30.972+/-0.99)
routing_sec:30.110000-30.590000(30.346+/-0.18)
peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:32825-36365(34615.6+/-1.1e+03)
+store_load_msec:35569-38776(37169.8+/-1.2e+03)
-vsz_kb:2637488
+vsz_kb:2621808
-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
+store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
+listnodes_sec:0.740000-0.800000(0.768+/-0.023)
-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
+peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't turn them to/from pubkeys implicitly. This means nodeids in the store
don't get converted, but bitcoin keys still do.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:33934-35251(34531.4+/-5e+02)
vsz_kb:2637488
store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
listnodes_sec:1.020000-1.290000(1.146+/-0.086)
listchannels_sec:51.110000-58.240000(54.826+/-2.5)
routing_sec:30.000000-33.320000(30.726+/-1.3)
peer_write_all_sec:50.370000-52.970000(51.646+/-1.1)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:46184-47474(46673.4+/-4.5e+02)
+store_load_msec:33934-35251(34531.4+/-5e+02)
-vsz_kb:2638880
+vsz_kb:2637488
-store_rewrite_sec:46.750000-48.280000(47.512+/-0.51)
+store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Allocating a htable is overkill for most nodes; we can fit 11 pointers
in the same space (10, since we use 1 to indicate we're using an array).
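One way to picture the layout (illustrative, not the exact struct):

struct chan;	/* forward declarations: we only store pointers */
struct chan_map;

#define NUM_IMMEDIATE 10

/* The same bytes hold either an inline array of up to NUM_IMMEDIATE
 * channels or a pointer to a full map, with one pointer slot spent as
 * the tag that says which form is in use. */
union channel_storage {
	struct {
		const void *tag;	/* non-NULL: "inline array" form */
		struct chan *arr[NUM_IMMEDIATE];
	} immediate;
	struct chan_map *map;		/* used once the array overflows */
};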
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45947-47016(46683.4+/-4e+02)
vsz_kb:2639240
store_rewrite_sec:46.950000-49.830000(48.048+/-0.95)
listnodes_sec:1.090000-1.350000(1.196+/-0.095)
listchannels_sec:48.960000-57.640000(53.358+/-2.8)
routing_sec:29.990000-33.880000(31.088+/-1.4)
peer_write_all_sec:49.360000-53.210000(51.338+/-1.4)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2641316
+vsz_kb:2639240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes the next step easier.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45791-46917(46330.4+/-3.6e+02)
vsz_kb:2641316
store_rewrite_sec:47.040000-48.720000(47.684+/-0.57)
listnodes_sec:1.140000-1.340000(1.2+/-0.072)
listchannels_sec:50.970000-54.250000(52.698+/-1.3)
routing_sec:29.950000-31.010000(30.332+/-0.37)
peer_write_all_sec:51.570000-52.970000(52.1+/-0.54)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either private or simply not enough confirms. They would have been added
on reconnect, but that's not ideal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us benchmark without a valid blockchain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_gossipd__dev_option_to_allow_unknown_channels.patch':
fixup! gossipd: dev option to allow unknown channels.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For giant nodes, it seems we spend a lot of time memmoving this array.
Normally we'd go for a linked list, but that's actually hard: each
channel has two nodes, so needs two embedded list pointers, and when
iterating there's no good way to figure out which embedded pointer
we'd be using.
So we (ab)use htable; we don't really need an index, but it's good for
cache-friendly iteration (our main operation). We can actually change
to a hybrid later to avoid the extra allocation for small nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we asked `bitcoind` for a txout and it failed we were not storing that
information anywhere, meaning that when we see the channel announcement the
next time we'd be reaching out to `lightningd` and `bitcoind` again, just to
see it fail again. This adds an in-memory cache for these failures so we can
just ignore these the next time around.
Fixes #2503
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We need to do it in various places, but we shouldn't do it lightly:
the primitives are there to help us get overflow handling correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically we tell it that every field ending in '_msat' is a struct
amount_msat, and 'satoshis' is an amount_sat. The exceptions are
channel_update's fee_base_msat which is a u32, and
final_incorrect_htlc_amount's incoming_htlc_amt which is also a
'struct amount_msat'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until now, riskfactor was useless due to implementation bugs, and
the default setting was wrong (too low to have an effect on
reasonable payment scenarios).
Let's simplify the definition (by assuming that P(failure) of a node
is 1), to make it a simple percentage. I examined the current network
fees to see what would work, and under this definition, a default of
10 seems reasonable (equivalent to 1000 under the old definition).
It is *this* change which finally fixes our test case! The riskfactor
is now 40msat (1500000 * 14 * 10 / 5259600 = 39.9), comparable with the
worst-case fuzz of 50msat (1001 * 0.05 = 50).
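Spelling out that arithmetic (the BLOCKS_PER_YEAR of 52596 is an
assumption here, matching the 5259600 denominator above as
blocks-per-year * 100):

#include <stdint.h>
#include <stdio.h>

#define BLOCKS_PER_YEAR 52596	/* assumed: 5259600 above is this * 100 */

int main(void)
{
	uint64_t amount_msat = 1500000;
	uint64_t cltv_delay = 14;
	uint64_t riskfactor = 10;	/* the new default, a percentage */
	uint64_t risk_msat = amount_msat * cltv_delay * riskfactor
			     / ((uint64_t)BLOCKS_PER_YEAR * 100);

	/* Prints 39 msat, i.e. the ~40msat quoted above. */
	printf("risk = %llu msat\n", (unsigned long long)risk_msat);
	return 0;
}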
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were only comparing by total msatoshis.
Note, this *still* isn't sufficient to fix our indirect problem, as
our risk values are all 1 (the minimum):
lightning_gossipd(25480): 2 hop solution: 1501990 + 2
lightning_gossipd(25480): 3 hop solution: 1501971 + 3
...
lightning_gossipd(25480): => chose 3 hop solution
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a seed, which is for (future!) unit testing consistency. This
makes it change every time, so our pay_direct_test is more useful.
I tried restarting the node around the loop, but it tended to fail
to rebind to the same port, for some reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a general rule, lightningd shouldn't parse user packets. We move the
parsing into gossipd, and have it respond only to permanent failures.
Note that we should *not* unconditionally remove a channel on
WIRE_INVALID_ONION_HMAC, as this can be triggered (and we do!) by
feeding sendpay a route with an incorrect pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently only used by gossipd for channel elimination.
Also print them in canonical form (/[01]), so tests need to be
changed.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian and I both unwittingly used it in the form:
*tal_arr_expand(&x) = tal(x, ...)
Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().
The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.
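To illustrate the hazard and the shape of the reworked helper (a
simplified sketch, not the exact macro):

#include <ccan/tal/tal.h>

/* The problem: in
 *
 *	*tal_arr_expand(&x) = tal(x, struct thing);
 *
 * '=' is not a sequence point, so `x` may be read for tal()'s parent
 * argument before tal_arr_expand() resizes (and moves) the array.
 *
 * In the reworked form, the resize is sequenced before the element
 * expression is evaluated, so any use of the array pointer in `expr`
 * sees the post-resize value. */
#define TAL_ARR_EXPAND(p, expr)				\
	do {						\
		size_t n_ = tal_count(*(p));		\
		tal_resize((p), n_ + 1);		\
		(*(p))[n_] = (expr);			\
	} while (0)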
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is mainly just copying over the copy-editing from the
lightning-rfc repository.
[ Split to just perform changes after the UNKNOWN_PAYMENT_HASH change --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
This is mainly just copying over the copy-editing from the
lightning-rfc repository.
[ Split to just perform changes prior to the UNKNOWN_PAYMENT_HASH change --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
We keep a chain_hash in struct daemon, because otherwise we end up with
`&peer->daemon->rstate->chainparams->genesis_blockhash` which is a bit
ridiculous.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids some very ugly switch() statements which mixed the two,
but we also take the chance to rename 'towire_gossip_' to
'towire_gossipd_' for those inter-daemon messages; they're messages to
gossipd, not gossip messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Messages from a peer may be invalid in many ways: we send an error
packet in that case. Rather than internally calling peer_error,
however, we make it explicit by having the handle_ functions return
NULL or an error packet.
Messages from the daemon itself should not be invalid: if one is, we
log an error and close the fd to the daemon. Previously we logged an
error but didn't kill it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If another channel has set the optional `htlc_maximum_msat` field,
we should correctly parse that field and respect it when drawing up
routes for payments.
And use ARRAY_SIZE() everywhere, which will break the compile if it's not
a literal array, plus assertions that it's the same length.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a lot of infrastructure to delay local channel_updates to
avoid spamming on each peer reconnect; we had to keep track of
pending ones though, in case we needed the very latest for sending an
error when failing an HTLC.
Instead, it's far simpler to set the local_disabled flag on a channel
when we disconnect, but only send a disabling channel_update if we
actually fail an HTLC.
Note: handle_channel_update() TAKES update (due to tal_arr_dup), but we
didn't use that before. Now we do, so add the annotation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We trade channel_update before channel_announce makes the channel
public, and currently forget them when we finally get the
channel_announce. We should instead apply them, and not rely on
retransmission (which we remove in the next patch!).
This earlier channel_update means test_gossip_jsonrpc triggers too
early, so have that wait for node_announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's simpler and more robust to just check that it's not yet announced
(the broadcast index will be 0).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
BOLT 7's been updated to split the flags field in `channel_update`
into two: `channel_flags` and `message_flags`. This changeset does the
minimal necessary to get to building with the new flags.
If we receive a channel_announce but not a channel_update, we store the announce
but don't put it in the broadcast map.
When we delete a channel, we check if the node_announcement broadcast
now precedes all channel_announcements, and if so, we move it to the
end of the map. However, with a channel_announcement at index '0',
this test fails.
This is at least one potential cause of the node map getting out of order.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These happen after we compact the store; every log I've seen of a
restart on a real node has a message about truncating the store,
because node_announcements predate channel_announcements.
I extracted one such case from testnet, and reduced it to test here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As pointed out by @rustyrussell, the capacity is now always defined, so we can
fold that into the construction of the channel itself.
Reported-by: Rusty Russell <@rustyrussell>
Signed-off-by: Christian Decker <@cdecker>
The `htlc_minimum_msat` parameter was ignored so far, and we'd be attempting to
pay and hitting a brick wall by doing so. This patch just skips channels that
are not eligible anyway.
We know the total channel capacity after checking for its existence on-chain, so
we can actually make use of that information to discard channels that don't have
a sufficient capacity anyway, reducing the number of failed attempts.
We were adding channels without their capacity, and eventually annotated them
when we exchanged `channel_update`s. This worked as long as we weren't
considering the channel capacity, but would result in local-only channels
being unusable once we start checking.
'cursor < ser + max' isn't valid because we reduce 'max' as we go! Effectively
we'll stop once we're past halfway, which can only happen with ipv6 + a torv2
address.
The fix is one line, but we rename 'max' to 'len', which makes its purpose
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.
We shim tal_bytelen() for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to just manually set ROUTING_FLAGS_DISABLED, but that meant we
then suppressed the real channel_update because we thought it was a
duplicate!
So use a local flag: set it for the channel when the peer disconnects,
and clear it when channeld sends a local update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't do anything for us now, since we actually tend to produce
DISABLE/ENABLE update pairs. But the infrastructure is useful for the
next patch.
We also add more details to the trace message in the core update code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
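Usage looks roughly like this (mirroring how short_channel_id_eq can
now be defined):

#include <stdint.h>
#include <ccan/structeq/structeq.h>

struct short_channel_id {
	uint64_t u64;
};

/* Defines short_channel_id_eq(a, b): the macro compares the named
 * members one by one, so struct padding (if any) can never leak into
 * the comparison the way a raw memcmp()/structeq() could. */
STRUCTEQ_DEF(short_channel_id, 0, u64);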
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wrap it in 'struct pubkey' for typesafety and consistency, and the
next patch takes advantage of that when we move to pubkey_eq.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes the exposed interface much smaller and cleaner, and will allow
us to just replay gossip messages from the broadcast.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Two cases:
1. Node no longer has any public channels: remove node_announcement.
2. Node's node_announcement now precedes all the channel_announcements:
move node_announcement to the end.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us detect whether a node announce precedes a channel announce
once we delete the node announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We *accept* a node_announce if we have a channel_announce, but we
can't queue it until we queue the channel_announce, which we only do
once we have received a channel_update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is kind of orthogonal to the other changes, but makes sense: if we
would instantly or never prune the message, don't accept it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In general, we need to only publish node announcements after
publishing channel announcements, though we can accept node
announcements as soon as we see channel announcements. So we keep a
flag for those node_announcement which haven't been broadcast yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
handle_pending_cannouncement might not actually add the announcement,
as it could be waiting for a channel_update. We need to wait for
the actual announcement before considering announcing our node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We generate new ones anyway; removing this code simplifies the fixes
coming up, which now only need to change one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We erroneously create updates with the same timestamps when tests run
quickly, and the second one is ignored.
We've already noted that this should be fixed: gossipd should generate
all the updates, as it already has to do the case where channeld
crashed, for example. But that's a bigger change.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker points out that in test_forward, where we manually create a route,
we get an error back which contains an update for an unknown channel.
We should still note this, but it's not an error for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is something which generally shouldn't happen, but we didn't
notice it previously.
We ignore this warning in the case where a channel was deleted: this
happens because one side can send an update while the other notices
that the channel is closed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store for nodes which have current
channels, but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If we have a channel_announcement, the channel is public, otherwise
it's not. Not all channels are public, as they can be local: those
have a NULL channel_announcement.
2. If we don't have a channel_update, we know nothing about that half
of the channel, and no other fields are valid.
3. We can tell if a half channel is disabled by the flags field directly.
Note that we never send halfchannels without an update over
gossip_getchannels_reply so that marshalling/unmarshalling can be
vastly simplified.
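Those rules in miniature (illustrative field names, not the exact
gossipd struct; the disable bit is BOLT 7's bit 1):

#include <stdbool.h>
#include <stdint.h>

#define ROUTING_FLAGS_DISABLED	(1 << 1)	/* BOLT 7 "disable" bit */

struct half_chan {
	const uint8_t *channel_update;	/* NULL: nothing known (rule 2) */
	uint16_t flags;			/* valid only with an update */
};

static bool halfchan_known(const struct half_chan *hc)
{
	return hc->channel_update != NULL;
}

static bool halfchan_disabled(const struct half_chan *hc)
{
	/* Rule 3: the flags field tells us directly... but only once
	 * rule 2 says the field is meaningful at all. */
	return halfchan_known(hc) && (hc->flags & ROUTING_FLAGS_DISABLED);
}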
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the update/announce messages own the element in the broadcast map
not the other way around.
Then we keep a pointer to the message, and when we free it
(eg. channel closed, update replaces it), it gets freed from the
broadcast map automatically.
The result is much nicer!
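The mechanism is tal's destructor hook; roughly (names illustrative,
not the actual broadcast code):

#include <ccan/tal/tal.h>

struct broadcast_state;		/* the broadcast map (opaque here) */
struct queued_message { const unsigned char *payload; };

/* Hypothetical removal helper; the real map type differs. */
static void broadcast_del(struct broadcast_state *bstate,
			  struct queued_message *msg)
{
	/* ... remove msg's entry from bstate's map ... */
	(void)bstate; (void)msg;
}

/* Runs automatically when the message is tal_free'd (channel closed,
 * update replaced), unlinking it from the map. */
static void destroy_queued_message(struct queued_message *msg,
				   struct broadcast_state *bstate)
{
	broadcast_del(bstate, msg);
}

static void queue_message(struct broadcast_state *bstate,
			  struct queued_message *msg)
{
	/* ... insert msg into bstate's map, then tie the lifetimes: */
	tal_add_destructor2(msg, destroy_queued_message, bstate);
}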
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, if we don't have an announcement for the channel, stash it,
and once we get an announcement, replay if necessary.
Fixes: #1485
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Someone could try to announce an internal address, and we might probe
it.
This breaks tests, so we add '--dev-allow-localhost' for our tests, so
we don't eliminate that one. Of course, now we need to skip some more
tests in non-developer mode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if channeld dies just before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remove the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If something goes (fatally) wrong, we won't add it to the store.
This reveals a latent bug in routing_add_channel_announcement() and
friends, which did a take() on msg which they don't own. TAKES means
that it will take ownership IF the caller requests, not an unconditional
ownership transfer (which is an antipattern).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We enter nodes in the map when we create channels, but those channels
could be local and unannounced. This triggered a failure in
test_gossip_persistence, since the store truncated when it saw that the
first thing was a node_announce.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
This was a tricky one to find: it turns out that some nodes are sending
node_announcements even if they don't have a channel announced yet. If they
are a peer and the channel is currently verifying, then we'll have a local
channel in the network view, hence accept the node_announcement; but when
replaying, the node_announcement will be replayed and we won't have a
channel yet. This just skips such node_announcements, which is always safe.
Reported-by: @laszlohanyecz
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Messages from peers and messages from the gossip_store now have completely
different entrypoints, so we don't need to trace their origin around the message
handling code any longer.
This stores and reads the channel_announcements in the wrapping message,
which allows us to store associated data with the raw channel_announcements.
The gossip_store applies channel_announcements directly but it also returns it,
and it gets discarded as a duplicate. In the next commit we'll have gossip_store
apply all changes, bypassing verification, so the duplication is only temporary.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Moves any modifications based on an incoming gossip message into its own
function separate from the message verification. This allows us to skip
verification when reading messages from a trusted source, e.g., the
gossip_store, speeding up the gossip replay.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
When we read from the gossip_store we set store=false so that we don't duplicate
messages in the store.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
As we add more features, the current code is insufficient.
1. Keep an array of single feature bits, for easy switching on and off.
2. Create feature_offered() which checks for both compulsory and optional
variants.
3. Invert requires_unsupported_features() and unsupported_features()
which tend to be double-negative, all_supported_features() and
features_supported().
4. Move single feature definition from wire/peer_wire.h to common/features.h.
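For instance, feature_offered() can be sketched like this (BOLT 9
layout: big-endian bitfield, even bit compulsory, odd bit optional;
a sketch, not the exact common/features.c code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COMPULSORY_FEATURE(f)	((f) & ~1u)	/* even bit */
#define OPTIONAL_FEATURE(f)	((f) | 1u)	/* odd bit */

/* Feature bitfields are big-endian: bit 0 is the least-significant bit
 * of the *last* byte. */
static bool test_bit(const uint8_t *features, size_t len, size_t bit)
{
	if (bit / 8 >= len)
		return false;
	return features[len - 1 - bit / 8] & (1 << (bit % 8));
}

/* Set in either the compulsory (even) or optional (odd) position
 * counts as offered. */
static bool feature_offered(const uint8_t *features, size_t len,
			    size_t feature)
{
	return test_bit(features, len, COMPULSORY_FEATURE(feature))
		|| test_bit(features, len, OPTIONAL_FEATURE(feature));
}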
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently keep two copies; one in the broadcast structure to send
in order, and one in the routing information. Since we already keep
the broadcast index in the routing information, use that.
Conveniently, a zero index is the same as the old NULL test.
Rename struct node's announcement_idx to node_announce_msgidx to
make it match the other users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>