Commit Graph

19 Commits

Author SHA1 Message Date
Rusty Russell
1d0c433dc4 channeld: treat all incoming errors as "soft", so we retry.
We still close the channel if we *send* an error, but we seem to have hit
another case where LND sends an error which seems transient, so this will
make a best-effort attempt to preserve our channel in that case.

Some test have to be modified, since they don't terminate as they did
previously :(

Changelog-Changed: quirks: We'll now reconnect and retry if we get an error on an established channel. This works around lnd sending error messages that may be non-fatal.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-12-13 16:36:18 +01:00
Rusty Russell
728bb4e662 common/gossip_store: handle timestamp filtering.
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself.  The rest of the semantics are fairly
simple however.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
38d2899fbb common/per_per_state: generalize lightningd/peer_comm Part 1
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
f1b4b14be5 channeld: don't queue gossip msgs while waiting for foreign_channel_update.
We ask gossipd for the channel_update for the outgoing channel; any other
messages it sends us get queued for later processing.

But this is overzealous: we can shunt those msgs to the peer while
we're waiting.  This fixes a nasty case where we have to handle
WIRE_GOSSIPD_NEW_STORE_FD messages by queuing the fd for later.

This then means that WIRE_GOSSIPD_NEW_STORE_FD can be handled
internally inside handle_gossip_msg(), since it's always dealt with
the same, simplifying all callers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell
b4e7b198e6 common/read_peer_msg: fix header comment.
Reported-by: @trueptolemy
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell
f5a218f9d1 gossipd: send per-peer daemons offsets into gossip store.
Instead of reading the store ourselves, we can just send them an
offset.  This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).

MCP bench results:
   store_load_msec:28509-31001(29206.6+/-9.4e+02)
   vsz_kb:580004-580016(580006+/-4.8)
   store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
   listnodes_sec:1.790000-1.880000(1.83+/-0.032)
   listchannels_sec:21.180000-21.950000(21.476+/-0.27)
   routing_sec:2.210000-11.160000(7.126+/-3.1)
   peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)

Signficant savings in streaming gossip:
   -peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
   +peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell
d8db4e871f gossipd: provide new fd to per-peer daemons when we compact it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell
13717c6ebb gossipd: hand a gossip_store_fd to all subdaemons.
This will let them read from the gossip store directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell
136f10e4a3 common/read_peer_msg: remove.
Also means we simplify the handle_gossip_msg() since everyone wants it to
use sync_crypto_write().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-05 02:03:58 +00:00
Rusty Russell
0b08601951 sync_crypto_write/sync_crypto_read: just fail, don't return NULL.
There's only one thing the caller ever does, just do that internally.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-05 02:03:58 +00:00
Rusty Russell
09cce4a9c7 common/read_peer_msg: deconstruct into individual helper routines.
The One Big API is confusing, and has enough corner cases that we should
ditch it rather than add more.

See: https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction

In particular, when openingd is changed to chat to peers even when
it's not actively opening a channel, it wants to handle (most) errors
by continuing, not calling peer_failed().

This exposes the constituent parts.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-05 02:03:58 +00:00
Rusty Russell
b5fcd54ef0 channeld: don't read from gossipfd while we're reconnecting.
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.

Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().

Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-25 02:13:52 +00:00
Rusty Russell
2d533dc82e channeld: don't manually disable channel.
gossipd will do it when peer dies anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-07 16:07:53 +02:00
Rusty Russell
b68fb24758 read_peer_msg: handle incoming gossip from gossipd.
This means that openingd and closingd now forward our gossip.  But the real
reason we want to do this is that it gives an easy way for gossipd to kill
any active daemon, by closing its fd: previously closingd and openingd didn't
read the fd, so tended not to notice.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-04-26 05:47:57 +00:00
Rusty Russell
ab9d9ef3b8 gossipd: drain fd instead of passing around gossip index.
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).

In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).

This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.

This turns out to be quite simple, and is probably how I should have
done it originally :(

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-04-26 05:47:57 +00:00
Rusty Russell
cfa50d393a openingd: use peer_failed like normal instead of boutique negotiation_failed.
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd.  We don't need that any more.

This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell
f76ff90485 status: split off error messages into a new 'peer_status' type.
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers.  The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.

We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction. 

We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell
cc9ca82821 status: separate types for peer failure vs "impossible" failures.
Ideally we'd rename status_failed() to status_fatal(), but that's
too much churn for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-08 19:07:12 +01:00
Rusty Russell
ef9b16399c common: read_peer_msg helpers for per-peer subds.
It's a bit ugly because each caller has slightly different needs, but
we have three hooks and standard helpers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-01 05:57:56 +00:00