Connectd already does this when we *receive* an error or warning; now
we do it on send as well.  This causes a slight behavior change: we no
longer disconnect when we close a channel, for example (our behavior
here has been inconsistent across versions, depending on the code).
When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc. That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to avoid lost messages in the common cases.
This generalizes our drain code: give the subds each 5 seconds to
close themselves, while continuing to let them send us traffic (if the
peer is still connected) and continuing to send them traffic.
We continue to send traffic *out* to the peer (if it's still
connected) until all subds are gone.  We still have a 5-second timer
to close the connection to the peer.
On reconnects we don't do this "drain period": we kill immediately.
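Roughly, the drain idea is (a standalone sketch with hypothetical names,
using blocking poll() for brevity; the real connectd is event-driven):

    #include <poll.h>
    #include <stdbool.h>
    #include <time.h>
    #include <unistd.h>

    #define DRAIN_SECONDS 5

    /* Relay any pending bytes subd <-> peer; return false once the subd
     * has closed its side (partial writes ignored: it's only a sketch). */
    static bool relay_once(int subd_fd, int peer_fd)
    {
        char buf[4096];
        struct pollfd pfd[2] = {
            { .fd = subd_fd, .events = POLLIN },
            { .fd = peer_fd, .events = POLLIN },
        };

        if (poll(pfd, 2, 100) <= 0)
            return true;

        if (pfd[0].revents & (POLLIN|POLLHUP)) {
            ssize_t r = read(subd_fd, buf, sizeof(buf));
            if (r <= 0)
                return false;   /* subd is done: finished draining */
            if (write(peer_fd, buf, r) < 0)
                return false;
        }
        if (pfd[1].revents & POLLIN) {
            ssize_t r = read(peer_fd, buf, sizeof(buf));
            if (r > 0 && write(subd_fd, buf, r) < 0)
                return false;
        }
        return true;
    }

    /* Keep traffic flowing for up to DRAIN_SECONDS, then give up. */
    static void drain_subd(int subd_fd, int peer_fd)
    {
        time_t deadline = time(NULL) + DRAIN_SECONDS;

        while (time(NULL) < deadline && relay_once(subd_fd, peer_fd))
            ;
        close(subd_fd);
    }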
We fix up one test which was looking for the "disconnect" message
explicitly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says the peer sent something about a channel we didn't tell it about, with an fd (peer_active)
Now:
1. connectd says it's connected (peer_connected)
2. we start all appropriate subds and tell connectd which channels/fds to use (peer_connect_subd).
3. if connectd says the peer sent something about a channel we didn't tell it about, we either
tell it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).
This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions. Many cleanups are
possible, since the new flow is so simple.
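For illustration, the socket-pair trick is just (hypothetical helper,
not the actual lightningd code):

    #include <sys/socket.h>

    /* Make a local pair so the existing "hand the subd an fd" functions
     * can be reused unchanged: one end goes to the subdaemon as "the
     * peer", the other stays with connectd, which shuffles bytes between
     * it and the real peer connection. */
    static int make_subd_fd(int *connectd_end)
    {
        int fds[2];

        if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0)
            return -1;

        *connectd_end = fds[0];  /* connectd relays peer traffic here */
        return fds[1];           /* passed to the subdaemon */
    }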
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have separate peer draining logic, we can simply use it
when connectd tells us to release the peer, without waiting.  (We
could simply free the peer, but that's a bit rude, as messages can get
lost.)
This removes various complex flags and logic we had before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is neater than what we had before, and slightly more general.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
We don't need to log msgs from subds, but we do need to log our own, and we weren't.
1. Rename queue_peer_msg to inject_peer_msg for clarity, make it do logging
2. In the one place where we're relaying, call msg_queue() directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would sometimes lose packets due to this previously, but it
doesn't happen over localhost so our tests didn't notice.  However,
now that connectd is the sole thing talking to peers, we can do a more
elegant shutdown, which should fix closing.
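The usual way to get queued bytes flushed is to stop writing and drain
the socket before closing; roughly (an illustration of the general
technique, not necessarily connectd's exact code):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Stop sending, then keep reading until the peer sees our FIN and
     * closes, so queued outgoing data (e.g. a final error packet) isn't
     * discarded by an abrupt close(). */
    static void flush_and_close(int fd)
    {
        char buf[4096];

        shutdown(fd, SHUT_WR);  /* kernel still flushes queued data */
        while (read(fd, buf, sizeof(buf)) > 0)
            ;                   /* discard until EOF (or error) */
        close(fd);
    }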
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Always flush sockets to increase the chance that final messages get to the peer (esp. error packets).
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only affects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
channeld can't do it any more: it's using local sockets. Connectd
can do it, and simply does it by type.
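That per-type handling only needs the first two bytes of each message,
since wire messages start with a big-endian 16-bit type (BOLT #1); a
trivial sketch (hypothetical helper name):

    #include <stddef.h>
    #include <stdint.h>

    /* Extract the BOLT #1 message type: first two bytes, big-endian. */
    static uint16_t wire_type(const uint8_t *msg, size_t len)
    {
        if (len < 2)
            return 0;
        return ((uint16_t)msg[0] << 8) | msg[1];
    }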
Amazingly, on my machine the timing change *always* caused
test_channel_receivable() to fail, due to a latent race.
Includes feedback from @cdecker.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We temporarily hack the sync_crypto_write/sync_crypto_read functions to
not do any crypto, and do it all in connectd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of passing the incoming socket to lightningd for the
subdaemon, create a new one and simply shuffle data between them,
keeping connectd in the loop.
For the moment, we don't decrypt at all, just shuffle. This means our
buffer code is kind of a hack, but that goes away once we start
actually decrypting and understanding message boundaries.
This implementation is naive: it closes the socket to the local daemon
as soon as the peer closes the socket to us.  This is fixed in a
subsequent patch series (along with many other similar issues).
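A rough standalone sketch of that naive shuffle (hypothetical, blocking
I/O for brevity; the real connectd is event-driven):

    #include <poll.h>
    #include <unistd.h>

    /* Copy bytes both ways between the peer's socket and the local
     * socket to the subdaemon, with no decryption and no message
     * framing.  Naive as described above: it closes both sides as soon
     * as either side closes or errors. */
    static void shuffle(int peer_fd, int local_fd)
    {
        char buf[4096];
        struct pollfd pfd[2] = {
            { .fd = peer_fd, .events = POLLIN },
            { .fd = local_fd, .events = POLLIN },
        };

        for (;;) {
            if (poll(pfd, 2, -1) < 0)
                break;
            if (pfd[0].revents & (POLLIN|POLLHUP)) {
                ssize_t r = read(peer_fd, buf, sizeof(buf));
                if (r <= 0 || write(local_fd, buf, r) != r)
                    break;
            }
            if (pfd[1].revents & (POLLIN|POLLHUP)) {
                ssize_t r = read(local_fd, buf, sizeof(buf));
                if (r <= 0 || write(peer_fd, buf, r) != r)
                    break;
            }
        }
        close(peer_fd);
        close(local_fd);
    }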
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>