Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would return success from connect even though the peer was closing;
this is technically correct but fairly undesirable. Better is to pass
every connect attempt to connectd, and have it block if the peer is
exiting (and retry), otherwise tell us it's already connected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The message from lightningd simply acknowleges that we are allowed to
discard the peer (because no subdaemons are talking to it anymore).
This difference becomes more stark once connectd holds on to idle
peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use tmpctx, rather than freeing manually everywhere (proof: next patch
added a branch and forgot to free it!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happens when we send a warning or lightningd tells us to send a
final message then close. Normally io logging is done by the
subdaemon that creates it, but this is a special case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Example request that is dying:
NEW REQUEST! lightning_websocketd:main [1955685] <-- bad request from safari
read 507
write_all 1
-> websocket_to_lightningd
-> read_payload_header
read 2
read_all 1
read -11 <--- This tried to read a part of the header, is this -EAGAIN?
read_all 0 should we be blocking on these reads?
*dies*
Fixes#5089
Changelog-Fixed: `experimental-websocket` intermittent read errors fixed
Signed-off-by: William Casarin <jb55@jb55.com>
When compiled without DEVELOPER this will now filter out `remote_addr` that
come from localhost. The testcase checks for DEVELOPER to test for correct
function of `remote_addr`.
Also, I renamed "test_connect" to "test_connect_basic" so it can be started
without all the other tests in that file that start with "test_connect..."
For example, if you do:
```
./lightningd/lightningd --network=regtest --experimental-websocket-port=19846
```
Then you're trying to reuse the normal port as the websocket port, but this
only fails at *listen* time, when we activate connectd. Catch this too.
Fixes incorrect fatal() message, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By accessing `addr` after the loop, it's possible that it's one which
failed, in complex scenarios.
Also gives us a chance to warn if they specify a websocket but don't
actually end up advertizing it (you *must* advertize a normal addr as
well).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always added to both arrays, might as well just keep one.
We make mayfail an explicit flag, rather than relying on the presence
of errstr, which is never NULL now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us give a better error message if listen fails, and also
moved the callback closer to where it's needed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd now simply gets told by channeld when peers arrive or leave.
(it only needs to know for the seeker).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is neater than what we had before, and slightly more general.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
We don't need to log msgs from subds, but we do our own, and we weren't.
1. Rename queue_peer_msg to inject_peer_msg for clarity, make it do logging
2. In the one place where we're relaying, call msg_queue() directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to stream gossip through this, but currently connectd treats the
fd as synchronous. While we work on getting rid of that, it's easiest to
have two fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to shut down peers atomically, but now we flush the
connections there's a delay. If we are asked to connect in that time,
we ignore it, as we are already connected, but that's wrong: we need
to remember that we were told to connect and reconnect.
This should solve a few weird test failures where "connect" would hang
indefinitely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is critical in the common case where peer sends an error and
hangs up: we almost never get to relay the error to the subd in time.
This also applies in the other direction: we need to flush the queue
to the peer when the subd closes. Note we only free the actual peer
struct when lightningd reaps us with connectd_peer_disconnected().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would lose packets sometimes due to this previously, but it
doesn't happen over localhost so our tests didn't notice. However,
now we have connectd being sole thing talking to peers, we can do
a more elegant shutdown, which should fix closing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Always flush sockets to increase chance that final message get to peer (esp. error packets).
msg_queue was originally designed for inter-daemon comms, and so it has
a special mechanism to mark that we're trying to send an fd. Unfortunately,
a peer could also send such a message, confusing us!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
dev_blackhole_fd was a hack, and doesn't work well now we are async
(it worked for sync comms in per-peer daemons, but now we could sneak
through a read before we get to the next write).
So, make explicit flags and use them. This is much easier now we
have all peer comms in one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's weird to have connectd ask gossipd, when lightningd can just do it
and hand all the addresses together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now let gossipd do it.
This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only effects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
channeld can't do it any more: it's using local sockets. Connectd
can do it, and simply does it by type.
Amazingly, on my machine the timing change *always* caused
test_channel_receivable() to fail, due to a latent race.
Includes feedback from @cdecker.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As connectd handles more packets itself, or diverts them to/from gossipd,
it's the only place we can implement the dev_disconnect logic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>