Historically, LDK has considered the "set of known/supported
feature bits" to be an LDK-level thing. Increasingly this doesn't
make sense - different message handlers may provide or require
different feature sets.
In a previous PR, we began the process of transitioning with
feature bits sent to peers being sourced from the attached message
handler.
This commit makes further progress by moving the concept of which
feature bits are supported by our ChannelManager into
channelmanager.rs itself, via the new `provided_*_features`
methods, rather than in features.rs via the `known_channel_features`
and `known` methods.
As we remove the concept of a global "known/supported" feature set
in LDK, we should also remove the concept of a global "required"
feature set. This does so by moving the checks for specific
required features into handlers.
Specifically, it allows the handler `peer_connected` method to
return an `Err` if the peer should be disconnected. Only one such
required feature bit is currently set - `static_remote_key`, which
is required in `ChannelManager`.
For non-gossip-broadcast messages, our current flow is to first
serialize the message into a `Vec`, and then allocate a new `Vec`
into which we write the encrypted+MAC'd message and header.
This is somewhat wasteful, and its rather simple to instead
allocate only one buffer and encrypt the message in-place.
In 47e818f198, forwarding broadcasted
gossip messages was split into a separate per-peer message buffer.
However, both it and the original regular-message queue are
encrypted immediately when the messages are enqueued. Because the
lightning P2P encryption algorithm is order-dependent, this causes
messages to fail their MAC checks as the messages from the two
queues may not be sent to peers in the order in which they were
encrypted.
The fix is to simply queue broadcast gossip messages unencrypted,
encrypting them when we add them to the regular outbound buffer.
Similar to how we OR our InitFeaures and NodeFeatures across both our channel
and routing message handlers, we also want to OR the features of our onion
message handler.
When we broadcast a node announcement, the features we support are really a
combination of all the various features our different handlers support. This
commit captures this concept by OR'ing our NodeFeatures across both our channel
and routing message handlers.
When `ChannelMessageHandler` implementations wish to return an
`InitFeatures` which contain all the known flags that are relevant
to channel handling, but not gossip handling, they currently need
to do so by manually constructing an InitFeatures with all known
flags and then clearing the ones they dont want.
Instead of spreading this logic out across the codebase, this
consolidates such construction to one place in features.rs.
When we go to send an Init message to new peers, the features we
support are really a combination of all the various features our
different handlers support. This commit captures this concept by
OR'ing our InitFeatures across both our Channel and Routing
handlers.
Note that this also disables setting the `initial_routing_sync`
flag in init messages, as was intended in
e742894492, per the comment added on
`clear_initial_routing_sync`, though this should not be a behavior
change in practice as nodes which support gossip queries ignore the
initial routing sync flag.
Like we now do for `NodeFeatures`, this converts to asking our
registered `ChannelMessageHandler` for our `InitFeatures` instead
of hard-coding them to the global LDK known set.
This allows handlers to set different feature bits based on what
our configuration actually supports rather than what LDK supports
in aggregate.
Some `NodeFeatures` will, in the future, represent features which
are not enabled by the `ChannelManager`, but by other message
handlers handlers. Thus, it doesn't make sense to determine the
node feature bits in the `ChannelManager`.
The simplest fix for this is to change to generating the
node_announcement in `PeerManager`, asking all the connected
handlers which feature bits they support and simply OR'ing them
together. While this may not be sufficient in the future as it
doesn't consider feature bit dependencies, support for those could
be handled at the feature level in the future.
This commit moves the `broadcast_node_announcement` function to
`PeerHandler` but does not yet implement feature OR'ing.
When we connect to a new peer, immediately send them any
channel_announcement and channel_update messages for any public
channels we have with other peers. This allows us to stop sending
those messages on a timer when they have not changed and ensures
we are sending messages when we have peers connected, rather than
broadcasting at startup when we have no peers connected.
In this commit, we check if a peer's outbound buffer has room for onion
messages, and if so pulls them from an implementer of a new trait,
OnionMessageProvider.
Makes sure channel messages are prioritized over OMs, and OMs are prioritized
over gossip.
The onion_message module remains private until further rate limiting is added.
Adds the boilerplate needed for PeerManager and OnionMessenger to work
together, with some corresponding docs and misc updates mostly due to the
PeerManager public API changing.
This allows us to better prioritize channel messages over gossip broadcasts and
lays groundwork for rate limiting onion messages more simply, since they won't
be competing with gossip broadcasts for space in the main message queue.
Instead of backfilling gossip by buffering (up to) ten messages at
a time, only buffer one message at a time, as the peers' outbound
socket buffer drains. This moves the outbound backfill messages out
of `PeerHandler` and into the operating system buffer, where it
arguably belongs.
Not buffering causes us to walk the gossip B-Trees somewhat more
often, but avoids allocating vecs for the responses. While its
probably (without having benchmarked it) a net performance loss, it
simplifies buffer tracking and leaves us with more room to play
with the buffer sizing constants as we add onion message forwarding
which is an important win.
Note that because we change how often we check if we're out of
messages to send before pinging, we slightly change how many
messages are exchanged at once, impacting the
`test_do_attempt_write_data` constants.
This consolidates our various checks on peer buffer space into the
`Peer` impl itself, making the thresholds at which we stop taking
various actions on a peer more readable as a whole.
This commit was primarily authored by `Valentine Wallace
<vwallace@protonmail.com>` with some amendments by `Matt Corallo
<git@bluematt.me>`.
In 4703d4e725 we changed
PeerManager::socket_disconnected to no longer require that sockets
which the PeerManager decided to disconnect not be disconnected.
However, we forgot to remove the scary warnings on the
`new_{inbound,outbound}_connection` functions which warned of the
old behavior.
We do so here.
In the next commit we add lockorder testing based on the line each
mutex was created on rather than the particular mutex instance.
This causes some additional test failure because of lockorder
inversions for the same mutex across different tests, which is
fixed here.
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
Instead of including a `Secp256k1` context per
`PeerChannelEncryptor`, which is relatively expensive memory-wise
and nontrivial CPU-wise to construct, we should keep one for all
peers and simply reuse it.
This is relatively trivial so we do so in this commit.
Since its trivial to do so, we also take this opportunity to
randomize the new PeerManager context.
Because we handle messages (which can take some time, persisting
things to disk or validating cryptographic signatures) with the
top-level read lock, but require the top-level write lock to
connect new peers or handle disconnection, we are particularly
sensitive to writer starvation issues.
Rust's libstd RwLock does not provide any fairness guarantees,
using whatever the OS provides as-is. On Linux, pthreads defaults
to starving writers, which Rust's RwLock exposes to us (without
any configurability).
Here we work around that issue by blocking readers if there are
pending writers, optimizing for readable code over
perfectly-optimized blocking.
Only one instance of PeerManager::process_events can run at a time,
and each run always finishes all available work before returning.
Thus, having several threads blocked on the process_events lock
doesn't accomplish anything but blocking more threads.
Here we limit the number of blocked calls on process_events to two
- one processing events and one blocked at the top which will
process all available events after the first completes.
Because the peers write lock "blocks the world", and happens after
each read event, always taking the write lock has pretty severe
impacts on parallelism. Instead, here, we only take the global
write lock if we have to disconnect a peer.
Users are required to only ever call `read_event` serially
per-peer, thus we actually don't need any locks while we're
processing messages - we can only be processing messages in one
thread per-peer.
That said, we do need to ensure that another thread doesn't
disconnect the peer we're processing messages for, as that could
result in a peer_disconencted call while we're processing a
message for the same peer - somewhat nonsensical.
This significantly improves parallelism especially during gossip
processing as it avoids waiting on the entire set of individual
peer locks to forward a gossip message while several other threads
are validating gossip messages with their individual peer locks
held.
This adds the required locking to process messages from different
peers simultaneously in `PeerManager`. Note that channel messages
are still processed under a global lock in `ChannelManager`, and
most work is still processed under a global lock in gossip message
handling, but parallelizing message deserialization and message
decryption is somewhat helpful.