However fast we can handle them, it's antisocial to allow others to
make us spam the rest of the network.
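Roughly the shape such a limiter takes (a hypothetical sketch, not the
actual connectd code; the burst size here is invented):

    /* Hypothetical per-peer limiter: at most 4 onion messages per
     * second, with a small burst allowance.  Burst size invented. */
    #include <stdbool.h>
    #include <time.h>

    #define ONION_MSGS_PER_SEC 4.0
    #define ONION_BURST 8.0

    struct onion_limiter {
            double tokens;          /* messages we may still accept */
            struct timespec last;   /* last refill time */
    };

    static bool onion_msg_allowed(struct onion_limiter *l)
    {
            struct timespec now;
            double elapsed;

            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - l->last.tv_sec)
                      + (now.tv_nsec - l->last.tv_nsec) / 1e9;
            l->last = now;

            l->tokens += elapsed * ONION_MSGS_PER_SEC;
            if (l->tokens > ONION_BURST)
                    l->tokens = ONION_BURST;
            if (l->tokens < 1.0)
                    return false;   /* over the rate: drop it */
            l->tokens -= 1.0;
            return true;
    }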
Changelog-Protocol: onion messages: we limit incoming to 4 per second, allowing a little burst.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This basically means moving the code from gossipd to connectd to handle
these queries.
This gives connectd finer control over rate-limiting them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more efficient in a few ways:
1. It's trivial to get to the end of the gossip_store: we don't have
   to iterate.
2. It tends to be mmapped, so we don't have to call pread() (see the
   sketch below).
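To illustrate why (a standalone sketch, not the gossip_store code
itself):

    /* Why an mmap'd store is cheap: the end is just the file size,
     * and reads are memory accesses rather than pread() syscalls. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
            struct stat st;
            const uint8_t *map;
            int fd = open("gossip_store", O_RDONLY);

            if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
                    return 1;
            map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (map == MAP_FAILED)
                    return 1;

            /* End of store: no record-by-record iteration needed. */
            printf("store ends at offset %lld\n", (long long)st.st_size);
            /* A read at any offset is just a memory access. */
            printf("version byte: %u\n", (unsigned)map[0]);

            munmap((void *)map, st.st_size);
            close(fd);
            return 0;
    }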
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently stream gossip as fast as we can, even if the peer starts
at timestamp 0. Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).
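A minimal sketch of such a filter (names and the burst cap are
assumptions, not the actual connectd code):

    /* Hypothetical per-peer token bucket pacing outgoing gossip.
     * The one-second burst cap is an assumption. */
    #include <stddef.h>
    #include <time.h>

    #define GOSSIP_BYTES_PER_SEC 1000000.0  /* 500 under test, per above */

    struct gossip_bucket {
            double tokens;          /* bytes we may still send */
            struct timespec last;
    };

    /* How many of `want` bytes may be sent now; the caller queues
     * the rest until more tokens accumulate. */
    static size_t gossip_bytes_allowed(struct gossip_bucket *b, size_t want)
    {
            struct timespec now;
            double elapsed;
            size_t ok;

            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - b->last.tv_sec)
                      + (now.tv_nsec - b->last.tv_nsec) / 1e9;
            b->last = now;

            b->tokens += elapsed * GOSSIP_BYTES_PER_SEC;
            if (b->tokens > GOSSIP_BYTES_PER_SEC)
                    b->tokens = GOSSIP_BYTES_PER_SEC;

            ok = (b->tokens < (double)want) ? (size_t)b->tokens : want;
            b->tokens -= (double)ok;
            return ok;
    }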
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
Currently, anything which doesn't have a live channel is considered transient.
We free these first under stress, and also those which are still
connecting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a crude heuristic: if we were trying to contact them, it's a
"deliberate" connection, and should be preserved.
Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I thought I was going to want a convenient way of counting these,
but it turns out to be unnecessary. Still, this is slightly more
efficient and simpler, so I am including it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still refuse to run dev commands if lightningd sends them to us
when we're not in developer mode, but that's mainly paranoia.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This also requires us to expose memleak tracking when !DEVELOPER;
however, we only ever used it when the LIGHTNINGD_DEV_MEMLEAK
environment variable was set, so keep that condition.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of this is piping the flag through so we know it's a websocket!
Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:
1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because the subd is already connected.
This way connectd can tell if lightningd is talking about the previous
connection, and ignore it.
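One way to implement that detection (a sketch under the assumption
that each connection carries a counter which lightningd echoes back;
names invented):

    /* Hypothetical: bump a counter on every (re)connection and have
     * lightningd echo the value it last saw in its messages. */
    #include <stdbool.h>
    #include <stdint.h>

    struct peer {
            uint64_t connection_counter;    /* bumped on each connect */
    };

    static bool msg_refers_to_current_conn(const struct peer *peer,
                                           uint64_t echoed_counter)
    {
            /* A stale value means lightningd hasn't yet seen our
             * latest disconnect/reconnect, so ignore the message. */
            return echoed_counter == peer->connection_counter;
    }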
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer: we can now simply free the old one
and add the new.
Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have separate peer-draining logic, we can simply use it
when connectd tells us to release the peer, without waiting. (We
could simply free the peer, but that's a bit rude, as messages could
get lost.)
This removes various complex flags and logic we had before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
This removes it from the hashtable, and forces it to do nothing but
send out any remaining packets, then close.
It is, in effect, reduced to a stub, with no further interactions
with the rest of the system (all subds are freed already).
This also removes the need for an explicit "final_msg".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Got complaints about us hanging up on some nodes because they don't respond
to pings in a timely manner (e.g. ACINQ?), but that turned out to be something
else.
Nonetheless, we've had reports in the past of LND badly prioritizing gossip
traffic, and thus important messages can get queued behind gossip dumps!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: give busy peers more time to respond to pings.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".
The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.
Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.
Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.
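A minimal sketch of that ownership change, using ccan/tal (the
hierarchical allocator connectd uses); `struct conn` stands in for
the real connection object, and the fields are elided:

    #include <ccan/tal/tal.h>
    #include <stdio.h>

    struct conn { int fd; };                     /* stand-in */
    struct peer_reconnected { int dummy; };      /* fields elided */

    int main(void)
    {
            struct conn *conn = tal(NULL, struct conn);

            /* Parented to the connection, not the daemon: if the peer
             * hangs up before retry_peer_connected runs, freeing the
             * connection frees this too, so nothing dangles and no
             * hashtable bookkeeping is needed. */
            struct peer_reconnected *pr =
                    tal(conn, struct peer_reconnected);
            pr->dummy = 1;

            tal_free(conn);     /* pr freed along with its parent */
            printf("conn and its children freed\n");
            return 0;
    }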
See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
We had multiple reports of channels being unilaterally closed because
it seemed like the peer was sending old revocation numbers.
Turns out, it was actually old reestablish messages! When we have a
reconnection, we would put the new connection aside, and tell lightningd
to close the current connection: when it did, we would restart
processing of the initial reconnection.
However, we could end up with *multiple* "reconnecting" connections,
while waiting for an existing connection to close. Though the
connections were long gone, there could still be messages queued
(particularly the channel_reestablish message, which comes early on).
Eventually, a normal reconnection would cause us to process one of
these reconnecting connections, and channeld would see the (perhaps
very old!) messages, and get confused.
(I have a test which triggers this, but it also hangs the connect
command, due to other issues we will fix in the next release...)
Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to prevent broad propagation, due to LND not allowing it. See
https://github.com/lightningnetwork/lnd/issues/6432
We still announce it if you disable deprecated-apis, so tests still work,
and hopefully we can re-enable it in the future.
Fixes: #5196
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The message from lightningd simply acknowledges that we are allowed to
discard the peer (because no subdaemons are talking to it anymore).
This difference becomes more stark once connectd holds on to idle
peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always added to both arrays, so we might as well keep just one.
We make mayfail an explicit flag, rather than relying on the presence
of errstr, which is never NULL now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd now simply gets told by channeld when peers arrive or leave.
(It only needs to know for the seeker.)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to stream gossip through this, but currently connectd treats the
fd as synchronous. While we work on getting rid of that, it's easiest to
have two fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would lose packets sometimes due to this previously, but it
doesn't happen over localhost so our tests didn't notice. However,
now that connectd is the sole thing talking to peers, we can do a
more elegant shutdown, which should fix closing.
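The usual shape of such a shutdown, sketched (not the actual connectd
code): stop sending, let the kernel flush what's queued, and wait for
the peer's close before ours.

    /* Sketch: flush-friendly close.  shutdown() queues a FIN after
     * any pending data; we then drain until the peer closes, rather
     * than risking close() resetting the connection while our final
     * messages are still queued. */
    #include <sys/socket.h>
    #include <unistd.h>

    static void drain_and_close(int fd)
    {
            char buf[256];

            shutdown(fd, SHUT_WR);
            while (read(fd, buf, sizeof(buf)) > 0)
                    ;               /* discard until EOF from peer */
            close(fd);
    }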
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Always flush sockets to increase the chance that the final message gets to the peer (esp. error packets).
dev_blackhole_fd was a hack, and doesn't work well now that we are async
(it worked for sync comms in per-peer daemons, but now we could sneak
through a read before we get to the next write).
So, make explicit flags and use them. This is much easier now that we
have all peer comms in one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only affects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@thestick613 noticed that since tor versions below 0.3.2.2-alpha do
not support V3 ed25519 address formats, the error message from the
CLI is not very helpful.
So now we add a hint.
Changelog-None:
Signed-off-by: Saibato <saibato.naga@pm.me>
connectd/connectd.h: Add helper function to update the conn error list
Signed-off-by: Saibato <saibato.naga@pm.me>
It's almost always "their_features" and "our_features" respectively, so
make those names clear.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will help with the next patch, where we wean ourselves off using
a global for features: connectd.c has access to the feature bits.
Since connectd might now want to send a message, it needs the crypto_state
non-const, which makes this less trivial than it would otherwise be.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>