Now we can benchmark, and remove 500 bytes per node.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35093-37907(36146+/-1.1e+03)
vsz_kb:555168
store_rewrite_sec:12.120000-13.750000(12.7+/-0.6)
listnodes_sec:1.270000-1.370000(1.322+/-0.039)
listchannels_sec:29.770000-31.600000(30.82+/-0.64)
routing_sec:0.00
peer_write_all_sec:63.630000-67.850000(65.432+/-1.7)
MCP notable changes from pre-Dijkstra (>1 stddev):
-vsz_kb:577456
+vsz_kb:555168
-routing_sec:60.70
+routing_sec:12.04
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a uintmap as our minheap.
Note that Dijkstra can give overlength routes, so some checks are disabled.
Comparison using gossipd/test/run-bench-find_route 100000 10:
Before:
10 (10 succeeded) routes in 100000 nodes in 120087 msec (12008708402 nanoseconds per route)
After:
10 (10 succeeded) routes in 100000 nodes in 2269 msec (226925462 nanoseconds per route)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we compact the store, we need to adjust the broadast index for
peers so they know where they're up to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:36034-37853(37109.8+/-5.9e+02)
vsz_kb:577456
store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
listnodes_sec:1.250000-1.480000(1.364+/-0.09)
listchannels_sec:30.820000-31.480000(31.068+/-0.24)
routing_sec:26.940000-27.990000(27.616+/-0.39)
peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1202316
+vsz_kb:577456
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:39207-45089(41374.6+/-2.2e+03)
vsz_kb:1202316
store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
listnodes_sec:1.290000-3.790000(1.938+/-0.93)
listchannels_sec:30.190000-32.120000(31.31+/-0.69)
routing_sec:28.220000-31.340000(29.314+/-1.2)
peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35107-37944(36686+/-1e+03)
+store_load_msec:39207-45089(41374.6+/-2.2e+03)
-vsz_kb:1218036
+vsz_kb:1202316
-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
+listchannels_sec:30.190000-32.120000(31.31+/-0.69)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally. So a `struct chan` without a channel_announcement
in the store is private, and other is public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reload them from disk if they do listnodes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35390-38659(37336.4+/-1.3e+03)
vsz_kb:1780516
store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
listnodes_sec:1.280000-1.530000(1.382+/-0.096)
listchannels_sec:28.700000-30.440000(29.34+/-0.68)
routing_sec:30.120000-31.080000(30.526+/-0.35)
peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1792996
+vsz_kb:1780516
-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
+listnodes_sec:1.280000-1.530000(1.382+/-0.096)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently create a struct chan when we receive a `channel_announcement`,
but we can only broadcast once we have a `channel_update` (since that
provides the timestamp).
This means a `struct chan` can be in a weird state where it exists,
but is unusable (can't use without an update), and also means we need to
keep the channel_announcement message around until an update arrives, so
we can put it in the gossip_store.
Instead, keep track of these "unupdated" channels separately, and check
for them in all the places we search for a specific channel to update.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:30640-33236(32202+/-8.7e+02)
vsz_kb:1812956
store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
listnodes_sec:0.590000-0.660000(0.62+/-0.033)
listchannels_sec:28.140000-29.560000(28.816+/-0.56)
routing_sec:29.530000-32.590000(30.352+/-1.1)
peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1812904
+vsz_kb:1812956
-store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
+store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
-listnodes_sec:1.120000-1.230000(1.176+/-0.044)
+listnodes_sec:0.590000-0.660000(0.62+/-0.033)
-listchannels_sec:38.900000-50.580000(44.716+/-3.9)
+listchannels_sec:28.140000-29.560000(28.816+/-0.56)
-routing_sec:45.080000-48.160000(46.814+/-1.1)
+routing_sec:29.530000-32.590000(30.352+/-1.1)
-peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
+peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of an arbitrary counter, we can use the file offset for our
partial ordering, removing a field. It takes some care when we compact
the store, however, as this field changes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34271-35283(34789.6+/-3.3e+02)
vsz_kb:2288784
store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
listnodes_sec:0.750000-0.850000(0.794+/-0.042)
listchannels_sec:30.740000-31.760000(31.096+/-0.35)
routing_sec:29.600000-33.560000(30.472+/-1.5)
peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35685-38538(37090.4+/-9.1e+02)
+store_load_msec:34271-35283(34789.6+/-3.3e+02)
-vsz_kb:2288768
+vsz_kb:2288784
-peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
+peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more compact, but also required once we replace the arbitrary
"index" with an actual offset into the gossip store. That will let us
remove the in-memory variants entirely.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35685-38538(37090.4+/-9.1e+02)
vsz_kb:2288768
store_rewrite_sec:35.530000-41.230000(37.904+/-2.3)
listnodes_sec:0.720000-0.810000(0.762+/-0.041)
listchannels_sec:30.750000-35.990000(32.704+/-2)
routing_sec:29.570000-34.010000(31.374+/-1.8)
peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2621808
+vsz_kb:2288768
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used an s64 so we could use -1 and save a check, but that's just
silly as we have adjacent non-u64 fields: wastes 7 bytes per node
and 16 per channel.
Interestingly, this seemed to make us a little slower for some reason.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35569-38776(37169.8+/-1.2e+03)
vsz_kb:2621808
store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
listnodes_sec:0.740000-0.800000(0.768+/-0.023)
listchannels_sec:29.820000-32.730000(30.972+/-0.99)
routing_sec:30.110000-30.590000(30.346+/-0.18)
peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:32825-36365(34615.6+/-1.1e+03)
+store_load_msec:35569-38776(37169.8+/-1.2e+03)
-vsz_kb:2637488
+vsz_kb:2621808
-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
+store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
+listnodes_sec:0.740000-0.800000(0.768+/-0.023)
-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
+peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Allocating a htable is overkill for most nodes; we can fit 11 pointers
in the same space (10, since we use 1 to indicate we're using an array).
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45947-47016(46683.4+/-4e+02)
vsz_kb:2639240
store_rewrite_sec:46.950000-49.830000(48.048+/-0.95)
listnodes_sec:1.090000-1.350000(1.196+/-0.095)
listchannels_sec:48.960000-57.640000(53.358+/-2.8)
routing_sec:29.990000-33.880000(31.088+/-1.4)
peer_write_all_sec:49.360000-53.210000(51.338+/-1.4)
MCP notable changes from previous patch (>1 stddev):
- vsz_kb:2641316
+ vsz_kb:2639240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes the next step easier.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45791-46917(46330.4+/-3.6e+02)
vsz_kb:2641316
store_rewrite_sec:47.040000-48.720000(47.684+/-0.57)
listnodes_sec:1.140000-1.340000(1.2+/-0.072)
listchannels_sec:50.970000-54.250000(52.698+/-1.3)
routing_sec:29.950000-31.010000(30.332+/-0.37)
peer_write_all_sec:51.570000-52.970000(52.1+/-0.54)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us benchmark without a valid blockchain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_gossipd__dev_option_to_allow_unknown_channels.patch':
fixup! gossipd: dev option to allow unknown channels.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For giant nodes, it seems we spend a lot of time memmoving this array.
Normally we'd go for a linked list, but that's actually hard: each
channel has two nodes, so needs two embedded list pointers, and when
iterating there's no good way to figure out which embedded pointer
we'd be using.
So we (ab)use htable; we don't really need an index, but it's good for
cache-friendly iteration (our main operation). We can actually change
to a hybrid later to avoid the extra allocation for small nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we asked `bitcoind` for a txout and it failed we were not storing that
information anywhere, meaning that when we see the channel announcement the
next time we'd be reaching out to `lightningd` and `bitcoind` again, just to
see it fail again. This adds an in-memory cache for these failures so we can
just ignore these the next time around.
Fixes#2503
Signed-off-by: Christian Decker <decker.christian@gmail.com>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a seed, which is for (future!) unit testing consistency. This
makes it change every time, so our pay_direct_test is more useful.
I tried restarting the noed around the loop, but it tended to fail
rebinding to the same port for some reason?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a general rule, lightningd shouldn't parse user packets. We move the
parsing into gossipd, and have it respond only to permanent failures.
Note that we should *not* unconditionally remove a channel on
WIRE_INVALID_ONION_HMAC, as this can be triggered (and we do!) by
feeding sendpay a route with an incorrect pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently only used by gossipd for channel elimination.
Also print them in canonical form (/[01]), so tests need to be
changed.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep a chain_hash in struct daemon, becayse otherwise we end up with
`&peer->daemon->rstate->chainparams->genesis_blockhash` which is a bit
ridiculous.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids some very ugly switch() statements which mixed the two,
but we also take the chance to rename 'towire_gossip_' to
'towire_gossipd_' for those inter-daemon messages; they're messages to
gossipd, not gossip messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Messages from a peer may be invalid in many ways: we send an error
packet in that case. Rather than internally calling peer_error,
however, we make it explicit by having the handle_ functions return
NULL or an error packet.
Messages from the daemon itself should not be invalid: we log an error
and close the fd to them if it is. Previously we logged an error but
didn't kill them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If another channel has set the optional `htlc_maximum_msat` field,
we should correctly parse that field and respect it when drawing up
routes for payments.
And use ARRAY_SIZE() everywhere which will break compile if it's not a
literal array, plus assertions that it's the same length.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a lot of infrastructure to delay local channel_updates to
avoid spamming on each peer reconnect; we had to keep tracking of
pending ones though, in case we needed the very latest for sending an
error when failing an HTLC.
Instead, it's far simpler to set the local_disabled flag on a channel
when we disconnect, but only send a disabling channel_update if we
actually fail an HTLC.
Note: handle_channel_update() TAKES update (due to tal_arr_dup), but we
didn't use that before. Now we do, add annotation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
BOLT 7's been updated to split the flags field in `channel_update`
into two: `channel_flags` and `message_flags`. This changeset does the
minimal necessary to get to building with the new flags.
If we receive a channel_announce but not a channel_update, we store the announce
but don't put it in the broadcast map.
When we delete a channel, we check if the node_announcement broadcast
now preceeds all channel_announcements, and if so, we move it to the
end of the map. However, with a channel_announcement at index '0',
this test fails.
This is at least one potential cause of the node map getting out of order.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These happen after we compact the store; every log I've seen of a
restart on a real node has a message about truncating the store,
because node_announcements predate channel_announcements.
I extracted one such case from testnet, and reduced it to test here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As pointed out by @rustyrussell the capacity is now always defined, so we can
fold that into the construction of the channel itself.
Reported-by: Rusty Russell <@rustyrussell>
Signed-off-by: Christian Decker <@cdecker>
We used to just manually set ROUTING_FLAGS_DISABLED, but that means we
then suppressed the real channel_update because we thought it was a
duplicate!
So use a local flag: set it for the channel when the peer disconnects,
and clear it when channeld sends a local update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wrap it in 'struct pubkey' for typesafety and consistency, and the
next patch takes advantage of that when we move to pubkey_eq.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets detect if a node announce preceeds a channel announce once we
delete the node announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In general, we need to only publish node announcements after
publishing channel announcements, though we can accept node
announcements as soon as we see channel announcements. So we keep a
flag for those node_announcement which haven't been broadcast yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
handle_pending_cannouncement might not actually add the announcment,
as it could be waiting for a channel_update. We need to wait for
the actual announcement before considering announcing our node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker points out that in test_forward, where we manually create a route,
we get an error back which contains an update for an unknown channel.
We should still note this, but it's not an error for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store if they have current channels,
but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If we have a channel_announcement, the channel is public, otherwise
it's not. Not all channels are public, as they can be local: those
have a NULL channel_announcement.
2. If we don't have a channel_update, we know nothing about that half
of the channel, and no other fields are valid.
3. We can tell if a half channel is disabled by the flags field directly.
Note that we never send halfchannels without an update over
gossip_getchannels_reply so that marshalling/unmarshalling can be
vastly simplified.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the update/announce messages own the element in the broadcast map
not the other way around.
Then we keep a pointer to the message, and when we free it
(eg. channel closed, update replaces it), it gets freed from the
broadcast map automatically.
The result is much nicer!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, if we don't have an announcement for the channel, stash it,
and once we get an announcement, replay if necessary.
Fixes: #1485
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Someone could try to announce an internal address, and we might probe
it.
This breaks tests, so we add '--dev-allow-localhost' for our tests, so
we don't eliminate that one. Of course, now we need to skip some more
tests in non-developer mode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
This was a tricky one to find, it turns out that some nodes are sending
node_announcements even if they don't have a channel announced yet. If they are
a peer and the channel is currently verifying then we'll have a local channel in
the network view, hence accept the node_announcement, but when replaying, the
node_announcement will be replayed and we won't have a channel yet. This just
skips node_announcements, which is always safe.
Reported-by: @laszlohanyecz
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Messages from peers and messages from the gossip_store now have completely
different entrypoints, so we don't need to trace their origin around the message
handling code any longer.
Moves any modifications based on an incoming gossip message into its own
function separate from the message verification. This allows us to skip
verification when reading messages from a trusted source, e.g., the
gossip_store, speeding up the gossip replay.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
When we read from the gossip_store we set store=false so that we don't duplicate
messages in the store.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We currently keep two copies; one in the broadcast structure to send
in order, and one in the routing information. Since we already keep
the broadcast index in the routing information, use that.
Conveniently, a zero index is the same as the old NULL test.
Rename struct node's announcement_idx to node_announce_msgidx to
make it match the other users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per BOLT #7.
We don't do this for channel_update which are queued because the
channel_announcement is pending though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We already have 'struct node', so rename 'struct routing_channel' to
'struct chan', and 'struct node_connection' to 'struct half_chan'.
Other minor changes:
1. rstate->channels -> rstate->chanmap.
2. 'connections' -> 'half'.
3. connection_to -> half_chan_to
4. connection_from -> half_chan_from
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The containing `struct routing_channel` contains src and dst, so
remove them. However, the channel_update msgidx does belong int
`struct node_connection` along with the channel_update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To remove the redundant fields in `struct node_connection` (ie. 'src'
and 'dst' pointers) we need to deal with `struct routing_channel`.
This means we get a series of channels, from which the direction is
implied, so it's a bit more complex to decode. We add a helper
`other_node` to help with this, and since we're the only user of
`connection_to` we change that function to return the index.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Failure and pruning were the two places where a node_connection could
be freed; now they both deal with entire channels, we can remove the
NULL checks, and the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is twice the 'update_channel_interval' we get handed.
We delete the non-existent channel_add_connection and delete_connection
declarations from the header too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make new_routing_channel() populate both connections
(active=false), so local_add_channel becomes simpler. We also
suppress listchannels output of active=false unannounced channels, to
avoid breaking tests (also, these are unusable, so it makes sense to
omit them)
It also seems the logic in add_channel_direction is legacy: a
channel_announce cannot replace the scid (that would be a different
channel), we don't allow duplicate announcements, and the announcement
is never NULL.
And since we disallow repeated channel_announce already, I believe
'forward' is always true, greatly simplifying the logic in
handle_pending_cannouncement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes 'routing_channel' the primary object in the system; it can have
one or two 'node_connection's attached, and points to two nodes.
The nodes are freed when no more routing_channel refer to them. The
routing_channel are freed when they contain no more 'node_connection'.
This fixes#1072 which I surmise was caused by a dangling
routing_channel after pruning.
Each node contains a single array of 'routing_channel's, not one for
each direction. The 'routing_channel' itself orders nodes in key
order (conveniently the index is equal to the direction flag we use),
and 'node_connection' with source in the same order.
There are helpers to assist with common questions like "which
'node_connection' leads out of this node?".
There are now two ways to find a channel:
1. Direct scid lookup via rstate->channels map.
2. Node key lookup, followed by channel traversal.
Several FIXMEs are inserted for where we can now do things more optimally.
Fixes: #1072
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make it a first-class citizen, and pending routing_channel
are not real ones (in particular, we don't want to create pending nodes).
We had a linked list called rstate->pending_cannouncement which we didn't
actually use, so put that back for now and add a FIXME to use a faster
data structure.
We need to check that list now in handle_channel_update, but we never
have a real routing_channel and a pending, unless the routing_channel
isn't public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit a57a2dcb86 introduced a time_t
in routing.h. So also move the time.h include to the header. This
fixes the build on FreeBSD.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>