Long ago, we used the `no_connection_possible` to signal that a
peer has some unknown feature set or some other condition prevents
us from ever connecting to the given peer. In that case we'd
automatically force-close all channels with the given peer. This
was somewhat surprising to users so we removed the automatic
force-close, leaving the flag serving no LDK-internal purpose.
Distilling the concept of "can we connect to this peer again in the
future" to a simple flag turns out to be ripe with edge cases, so
users actually using the flag to force-close channels would likely
cause surprising behavior.
Thus, there's really not a lot of reason to keep the flag,
especially given its untested and likely to be broken in subtle
ways anyway.
Now that we allow `handle_channel_announcement` to (indirectly)
spawn async tasks which will complete later, we have to ensure it
can apply backpressure all the way up to the TCP socket to ensure
we don't end up with too many buffers allocated for UTXO
validation.
We do this by adding a new method to `RoutingMessageHandler` which
allows it to signal if there are "many" checks pending and
`channel_announcement` messages should be delayed. The actual
`PeerManager` implementation thereof is done in the next commit.
The `chain::Access` trait (and the `chain::AccessError` enum) is a
bit strange - it only really makes sense if users import it via the
`chain` module, otherwise they're left with a trait just called
`Access`. Worse, for bindings users its always just called
`Access`, in part because many downstream languages don't have a
mechanism to import a module and then refer to it.
Further, its stuck dangling in the `chain` top-level mod.rs file,
sitting in a module that doesn't use it at all (it's only used in
`routing::gossip`).
Instead, we give it its full name - `UtxoLookup` (and rename the
error enum `UtxoLookupError`) and put it in the a new
`routing::utxo` module, next to `routing::gossip`.
Also swaps `PublicKey` for `NodeId` in `get_next_node_announcement`
and `InitSyncTracker` to avoid unnecessary deserialization that came
from changing `UnsignedNodeAnnouncement`.
Secrets should not be exposed in-memory at the interface level as it
would be impossible the implement it against a hardware security
module/secure element.
We have a number of debug assertions which are expected to never
fire when running in a single thread. This is just fine in tests,
and gives us good coverage of our lockorder requirements, but is
not-irregularly surprising to users, who may run with their own
debug assertions in test environments.
Instead, we gate these checks by the `cfg(test)` setting as well as
the `_test_utils` feature, ensuring they run in our own tests, but
not downstream tests.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the
`lightning-net-tokio` crate.
As we remove the concept of a global "known/supported" feature set
in LDK, we should also remove the concept of a global "required"
feature set. This does so by moving the checks for specific
required features into handlers.
Specifically, it allows the handler `peer_connected` method to
return an `Err` if the peer should be disconnected. Only one such
required feature bit is currently set - `static_remote_key`, which
is required in `ChannelManager`.
When we broadcast a node announcement, the features we support are really a
combination of all the various features our different handlers support. This
commit captures this concept by OR'ing our NodeFeatures across both our channel
and routing message handlers.
When we go to send an Init message to new peers, the features we
support are really a combination of all the various features our
different handlers support. This commit captures this concept by
OR'ing our InitFeatures across both our Channel and Routing
handlers.
Note that this also disables setting the `initial_routing_sync`
flag in init messages, as was intended in
e742894492, per the comment added on
`clear_initial_routing_sync`, though this should not be a behavior
change in practice as nodes which support gossip queries ignore the
initial routing sync flag.
Like we now do for `NodeFeatures`, this converts to asking our
registered `ChannelMessageHandler` for our `InitFeatures` instead
of hard-coding them to the global LDK known set.
This allows handlers to set different feature bits based on what
our configuration actually supports rather than what LDK supports
in aggregate.
Some `NodeFeatures` will, in the future, represent features which
are not enabled by the `ChannelManager`, but by other message
handlers handlers. Thus, it doesn't make sense to determine the
node feature bits in the `ChannelManager`.
The simplest fix for this is to change to generating the
node_announcement in `PeerManager`, asking all the connected
handlers which feature bits they support and simply OR'ing them
together. While this may not be sufficient in the future as it
doesn't consider feature bit dependencies, support for those could
be handled at the feature level in the future.
This commit moves the `broadcast_node_announcement` function to
`PeerHandler` but does not yet implement feature OR'ing.
Adds the boilerplate needed for PeerManager and OnionMessenger to work
together, with some corresponding docs and misc updates mostly due to the
PeerManager public API changing.
Instead of backfilling gossip by buffering (up to) ten messages at
a time, only buffer one message at a time, as the peers' outbound
socket buffer drains. This moves the outbound backfill messages out
of `PeerHandler` and into the operating system buffer, where it
arguably belongs.
Not buffering causes us to walk the gossip B-Trees somewhat more
often, but avoids allocating vecs for the responses. While its
probably (without having benchmarked it) a net performance loss, it
simplifies buffer tracking and leaves us with more room to play
with the buffer sizing constants as we add onion message forwarding
which is an important win.
Note that because we change how often we check if we're out of
messages to send before pinging, we slightly change how many
messages are exchanged at once, impacting the
`test_do_attempt_write_data` constants.
This avoids any extra calls to `read_event` after a write fails to
flush the write buffer fully, as is required by the PeerManager
API (though it isn't critical).
Unlike very ancient versions of lightning-net-tokio, this does not
rely on a single global process_events future, but instead has one
per connection. This could still cause significant contention, so
we'll ensure only two process_events calls can exist at once in
the next few commits.
I recently saw the following panic on one of my test nodes:
```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()`
on an `Err` value: Os { code: 107, kind: NotConnected, message:
"Transport endpoint is not connected" }',
rust-lightning/lightning-net-tokio/src/lib.rs:250:38
```
Presumably what happened is somehow the connection was closed in
between us accepting it and us going to start processing it. While
this is a somewhat surprising race, its clearly reachable. The fix
proposed here is quite trivial - simply don't `unwrap` trying to
fetch our peer's socket address, instead treat the peer address as
`None` and discover the disconnection later when we go to read.