This adds testing of the `futures` feature to CI. In order to avoid
introducing a dependency on `std`, we switch to using the `futures-util`
crate directly, enabling only a subset of its features. To this end, we also
switch to using the `select_biased!` macro, which is equivalent to
`select!` except that it doesn't choose ready futures pseudo-randomly
at runtime.
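A minimal sketch of `select_biased!` usage (the function and futures here
are illustrative, not taken from the change itself):

```rust
use futures_util::{pin_mut, select_biased, FutureExt};

// Runs two futures, resolving as soon as either completes. With
// select_biased!, ready branches are checked in declaration order,
// so no runtime RNG is needed (select! shuffles pseudo-randomly).
async fn run_until_either<A, B>(a: A, b: B)
where
    A: core::future::Future<Output = ()>,
    B: core::future::Future<Output = ()>,
{
    // select_biased! requires fused, pinned futures.
    let a = a.fuse();
    let b = b.fuse();
    pin_mut!(a, b);
    select_biased! {
        _ = a => { /* first future finished first */ },
        _ = b => { /* second future finished first */ },
    }
}
```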
The `NetworkGraph::node_failed` method was never implemented. The
scorer now handles non-permanent failures by downgrading nodes, so
we don't want the graph to handle those at all.
The method is renamed to `node_failed_permanent` to explicitly indicate
that this is the only case it handles. We also add tracking in the form
of two maps as fields of `NetworkGraph`, namely, `removed_nodes` and
`removed_channels`. We track these removed graph entries to ensure we
don't just resync them right away from gossip soon after removing them.
We stop tracking these removed entries whenever `remove_stale_channels_and_tracking()`
is called and the entries have been tracked for longer than
`REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS`, which is currently set to
one week.
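A rough sketch of the tracking, with simplified key and value types (the
real maps key off node ids / short channel ids and store richer data):

```rust
use std::collections::HashMap;

const REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS: u64 = 60 * 60 * 24 * 7; // one week

struct GraphTrackingSketch {
    removed_nodes: HashMap<u64, u64>,    // entry -> unix time of removal
    removed_channels: HashMap<u64, u64>, // entry -> unix time of removal
}

impl GraphTrackingSketch {
    fn remove_stale_channels_and_tracking(&mut self, now: u64) {
        // Drop tracking entries older than the limit, after which the
        // removed nodes/channels may be re-learned from fresh gossip.
        self.removed_nodes
            .retain(|_, removed_at| now < *removed_at + REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS);
        self.removed_channels
            .retain(|_, removed_at| now < *removed_at + REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS);
    }
}
```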
We do this to enable users to create routers that do not need a scorer.
This can be useful if they are running a node that delegates pathfinding.
* Move the `Score` type parameterization from `InvoicePayer` and `Router` to
`DefaultRouter`.
* Add a new field, `scorer`, to `DefaultRouter`.
* Move `AccountsForInFlightHtlcs` to `DefaultRouter`, which we
will use to wrap the new `scorer` field, so scoring only happens in
`DefaultRouter` explicitly.
* Add scoring-related functions to the `Router` trait that we used to call
directly from `InvoicePayer`.
* Instead of parameterizing `scorer` in `find_route`, we replace it with
`inflight_map` so `InvoicePayer` can pass information about in-flight
HTLCs to the router.
* Introduce a new tuple struct, `InFlightHtlcs`, that wraps functionality
for querying used liquidity; a rough sketch follows below.
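A sketch of the shape described in the last item (the key type and method
name are simplified assumptions, not the exact API):

```rust
use std::collections::HashMap;

/// Wraps a map from (short channel id, direction) to the amount, in
/// msats, currently in flight over that channel from pending payments.
pub struct InFlightHtlcs(HashMap<(u64, bool), u64>);

impl InFlightHtlcs {
    /// Returns the liquidity already in use on the given channel and
    /// direction, if any.
    pub fn used_liquidity_msat(&self, scid: u64, source_is_first_hop: bool) -> Option<u64> {
        self.0.get(&(scid, source_is_first_hop)).copied()
    }
}
```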
As we move towards specifying supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the
`lightning-background-processor` and `lightning-persister` crate
tests.
Some `NodeFeatures` will, in the future, represent features which
are not enabled by the `ChannelManager`, but by other message
handlers. Thus, it doesn't make sense to determine the
node feature bits in the `ChannelManager`.
The simplest fix for this is to switch to generating the
node_announcement in `PeerManager`, asking all the connected
handlers which feature bits they support and simply OR'ing them
together. While this may not be sufficient in the future as it
doesn't consider feature bit dependencies, support for those could
be handled at the feature level in the future.
This commit moves the `broadcast_node_announcement` function to
`PeerHandler` but does not yet implement feature OR'ing.
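As a sketch of the OR'ing step, assuming each handler exposes its feature
flags as raw bytes (a simplification of the real feature types):

```rust
// OR together the feature byte-vectors advertised by each message handler.
// Feature flags are bitmasks, so a byte-wise OR takes the union of the sets.
fn or_feature_flags(handler_flags: &[Vec<u8>]) -> Vec<u8> {
    let mut combined: Vec<u8> = Vec::new();
    for flags in handler_flags {
        if flags.len() > combined.len() {
            combined.resize(flags.len(), 0);
        }
        for (i, byte) in flags.iter().enumerate() {
            combined[i] |= *byte;
        }
    }
    combined
}
```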
Adds the boilerplate needed for PeerManager and OnionMessenger to work
together, with some corresponding docs and misc updates mostly due to the
PeerManager public API changing.
If a user restores from a backup that they know is stale, they'd
like to force-close all of their channels (or at least the ones
they know are stale) *without* broadcasting the latest state,
asking their peers to do so instead. This simply adds methods to do
so, renaming the existing `force_close_channel` and
`force_close_all_channels` methods to disambiguate further.
BackgroundProcessor can take an optional P2PGossipSync and an optional
RapidGossipSync, but doing so may be easy to misuse. Each has a
reference to a NetworkGraph, which could be different between the two,
but only one is actually used.
Instead, allow passing one object wrapped in a GossipSync enum. Also,
fix a bug where the NetworkGraph is not persisted on shutdown if only a
RapidGossipSync is given.
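A rough sketch of the resulting shape (variant and parameter names are
illustrative):

```rust
// Exactly one gossip sync variant is passed to the background processor,
// so there is only ever one NetworkGraph reference in play.
enum GossipSync<P, R> {
    /// Full P2P gossip sync, wrapping a P2PGossipSync.
    P2P(P),
    /// Rapid gossip sync, wrapping a RapidGossipSync.
    Rapid(R),
    /// No gossip sync; nothing to prune or persist.
    None,
}
```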
Instead of implementing EventHandler for P2PGossipSync, implement it on
NetworkGraph. This allows RapidGossipSync to handle events, too, by
delegating to its NetworkGraph.
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
Create a wrapper struct for rapid gossip sync that can be passed to
BackgroundProcessor's start method, allowing it to only start pruning
the network graph upon rapid gossip sync's completion.
The main loop of the background processor has this line:
`peer_manager.process_events(); // Note that this may block on ChannelManager's locking`
which does, indeed, sometimes block waiting on the `ChannelManager`
to finish whatever it's doing. Specifically, it's the only place in
the background processor loop where we block waiting on the
`ChannelManager`, so if the `ChannelManager` is relatively busy, we
may end up being blocked there most of the time.
This should be fine, except today we had a user whose node was
particularly slow in processing some channel updates, resulting in
the background processor being blocked there (as expected). Then,
when the channel updates were completed (and persisted) the next
thing the background processor did was hand the user events to
process, creating yet more channel updates. Ultimately, the user's
node crashed before finishing the event processing. This left us
with an updated monitor on disk and an outdated manager, and they
lost the channel on startup.
Here we simply move the above quoted line to after the normal event
processing, ensuring the next thing we do after blocking on
`ChannelManager` locks is persist the manager, prior to event
handling.
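In pseudocode, the new ordering looks roughly like this (the closure
names are illustrative stand-ins for the real internals):

```rust
// Each closure stands in for one step of the background processor loop.
fn background_loop(
    mut handle_events: impl FnMut(),
    mut process_peer_events: impl FnMut(), // may block on ChannelManager locks
    mut persist_manager: impl FnMut(),
    mut should_exit: impl FnMut() -> bool,
) {
    while !should_exit() {
        // Handle user events first; these may create more channel updates.
        handle_events();
        // Then take the one step that blocks on ChannelManager locks...
        process_peer_events();
        // ...so the very next thing we do is persist the manager, before
        // the next round of event handling.
        persist_manager();
    }
}
```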
Instead of creating a separate trait for persisting `NetworkGraph`, use
and rename the existing `ChannelManagerPersister` to handle them both.
`persist_graph` is then called on removal of stale channels and on exit.
Because many lightning nodes can take quite some time to respond to
pings, the five second ping timer can sometimes cause spurious
disconnects even though a peer is online. However, in part as a
response to mobile users where a connection may be lost as result
of only a short time with the app in a "paused" state, we had a
rather aggressive ping time to ensure we would disconnect quickly.
However, since we now just use a fixed time for the "went to
sleep" detection, we can somewhat increase the ping timer. We still
want to be fairly aggressive to avoid sending HTLCs to a peer that
is offline, but the tradeoff between spurious disconnections and
stuck payments likely doesn't need to be quite as aggressive.
In the sample client (and likely other downstream users), event
processing may block on slow operations (e.g. Bitcoin Core RPCs)
and ChannelManager persistence may take some time. This should be
fine, except that we consider this a case of possible backgrounding
and disconnect all of our peers when it happens.
Instead, here we avoid counting event processing time toward the
time between PeerManager events.
We update the `Channel::update_time_counter` field (which is copied
into `ChannelUpdate::timestamp`) only when the channel is
initialized or closes, and when a new block is connected. However,
if a peer disconnects or reconnects, we may wish to generate
`ChannelUpdate` updates in between new blocks. In such a case, we
need to make sure the `timestamp` field is newer than any previous
updates' `timestamp` fields, which we do here by simply
incrementing it when the channel status is changed.
As a side effect of this we have to update
`test_background_processor` to ensure it eventually succeeds even
if the serialization of the `ChannelManager` changes after the test
begins.
I realized on my own node that I don't have any visibility into how
long a monitor or manager persistence call takes, potentially
blocking other operations. This makes it much clearer by adding
relevant `log_trace!()` prints immediately before and immediately
after persistence.
Scorer uses time to determine how much to penalize a channel after a
failure occurs. Parameterizing it by time cleans up the code such that
no-std support is in a single AlwaysPresent struct, which implements the
Time trait. Time is implemented for std::time::Instant when std is
available.
This parameterization also allows for deterministic testing since a
clock could be devised to advance forward as needed.
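A rough sketch of the shape described above (method signatures are our
assumptions; the real trait may differ):

```rust
use core::time::Duration;

pub trait Time {
    fn now() -> Self;
    fn elapsed(&self) -> Duration;
}

// With std, a real monotonic clock is available.
#[cfg(feature = "std")]
impl Time for std::time::Instant {
    fn now() -> Self { std::time::Instant::now() }
    fn elapsed(&self) -> Duration { std::time::Instant::elapsed(self) }
}

/// The no-std stand-in: time never advances, so penalties never decay.
pub struct AlwaysPresent;

impl Time for AlwaysPresent {
    fn now() -> Self { AlwaysPresent }
    fn elapsed(&self) -> Duration { Duration::from_secs(0) }
}
```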
NetworkGraph is owned by NetGraphMsgHandler, but DefaultRouter requires
a reference to it. Introduce shared ownership to NetGraphMsgHandler so
that both can use the same NetworkGraph.
Upon receiving a PaymentPathFailed event, the failing payment may be
retried on a different path. To avoid using the channel responsible for
the failure, a scorer should be notified of the failure before being
used to find a new route.
Add a payment_path_failed method to routing::Score and call it in
InvoicePayer's event handler. Introduce a LockableScore parameterization
to InvoicePayer so the scorer is locked only once before calling
find_route.
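A minimal sketch of the shapes involved (signatures simplified; the real
`payment_path_failed` also receives the full failing path):

```rust
use std::sync::{Mutex, MutexGuard};

pub trait Score {
    /// Penalize the channel that caused a payment path to fail, so the
    /// next find_route call avoids it.
    fn payment_path_failed(&mut self, short_channel_id: u64);
}

/// Yields exclusive access to a Score, letting InvoicePayer lock the
/// scorer exactly once per find_route call.
pub trait LockableScore<'a> {
    type Locked: Score + 'a;
    fn lock(&'a self) -> Self::Locked;
}

impl<'a, S: Score + 'a> LockableScore<'a> for Mutex<S> {
    type Locked = MutexGuard<'a, S>;
    fn lock(&'a self) -> MutexGuard<'a, S> {
        Mutex::lock(self).unwrap()
    }
}

impl<'a, S: Score> Score for MutexGuard<'a, S> {
    fn payment_path_failed(&mut self, short_channel_id: u64) {
        (**self).payment_path_failed(short_channel_id)
    }
}
```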
When we detect that a channel `is_shutdown()`, or when we call
`force_shutdown()` on it, we notify the user with an
`Event::ChannelClosed` informing them of the channel id and closure
reason.
Decorate the user-supplied EventHandler with NetGraphMsgHandler in
the BackgroundProcessor. The resulting handler will intercept
PaymentFailed events in order to update the NetworkGraph in the
background before delegating to the user's event handler.
Previously we'd been expecting to implement anchor outputs before
shipping 0.1, thus reworking our channel fee update process
entirely and leaving it as a future task. However, due to the
difficulty of working with on-chain anchor pools, we are now likely
to ship 0.1 without requiring anchor outputs.
In either case, there isn't a lot of reason to require that users
call an explicit "prevailing feerates have changed" function now
that we have a timer method which is called regularly. Further, we
really should be the ones deciding on the channel feerate in terms
of the users' FeeEstimator, instead of requiring users implement a
second fee-providing interface by calling an update_fee method.
Finally, there is no reason for an update_fee method to be
channel-specific, as we should be updating all (outbound) channel
fees at once.
Thus, we move the update_fee handling to the background, calling it
on the regular 1-minute timer. We also update the regular 1-minute
timer to fire on startup as well as every minute to ensure we get
fee updates even on mobile clients that are rarely, if ever, open
for more than one minute.
It is easy for users to have a bug where they drop a
`BackgroundProcessor` immediately, causing it to start and then
immediately stop. Instead, add a `#[must_use]` tag to provide a
compiler warning for such instances.
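Roughly (the attribute message is our own wording):

```rust
// Dropping the value returned by BackgroundProcessor::start now warns:
#[must_use = "BackgroundProcessor will stop immediately on drop; store it until shutdown"]
pub struct BackgroundProcessor {
    // fields elided in this sketch
}
```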
The previous commit wraps the background thread's JoinHandle in an
Option. Providing a dedicated method to join hides this implementation
detail from users.
Without stopping the thread when BackgroundProcessor is dropped, it will
run free. In the context of language bindings, it is difficult to know
how long references held by the thread should live. Implement Drop to
stop the thread just as is done when explicitly calling stop().
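Taken together, the last two notes amount to roughly the following shape
(simplified from the real BackgroundProcessor):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::JoinHandle;

pub struct BackgroundProcessor {
    stop_thread: Arc<AtomicBool>,
    // Option lets stop()/join()/drop take ownership of the handle.
    thread_handle: Option<JoinHandle<Result<(), std::io::Error>>>,
}

impl BackgroundProcessor {
    /// Joins the background thread, hiding the Option wrapping from users.
    pub fn join(mut self) -> Result<(), std::io::Error> {
        self.stop_and_join_thread(false)
    }

    /// Signals the background thread to stop and joins it.
    pub fn stop(mut self) -> Result<(), std::io::Error> {
        self.stop_and_join_thread(true)
    }

    fn stop_and_join_thread(&mut self, stop: bool) -> Result<(), std::io::Error> {
        if stop {
            self.stop_thread.store(true, Ordering::Release);
        }
        match self.thread_handle.take() {
            Some(handle) => handle.join().expect("background thread panicked"),
            None => Ok(()),
        }
    }
}

impl Drop for BackgroundProcessor {
    fn drop(&mut self) {
        // Stop the thread on drop just as an explicit stop() would, so it
        // never runs free after the processor goes out of scope.
        let _ = self.stop_and_join_thread(true);
    }
}
```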
The specific error from the ChannelManager persister is not asserted for
in test_persist_error. Rather, any error will do. Update the test to use
BackgroundProcessor::stop and assert for the expected value.
Currently the base fee we apply is always the expected cost to
claim an HTLC on-chain in case of closure. This results in
significantly higher than market rate fees [1], and doesn't really
match the actual forwarding trust model anyway - as long as
channel counterparties are honest, our HTLCs shouldn't end up
on-chain no matter what the HTLC sender/recipient do.
While some users may wish to use a feerate that implies they will
not lose funds even if they go to chain (assuming no flood-and-loot
style attacks), they should do so by calculating fees themselves;
since they're already charging well above market-rate,
over-estimating some won't have a large impact.
Worse, we currently re-calculate fees at forward time, not based on
the fee we set in the channel_update. This means that the fees
others expect to pay us (and on which they calculate their routes)
are not what we actually want to charge, and that any attempt to
forward through us is inherently racy.
This commit adds a configuration knob to set the base fee
explicitly, defaulting to 1 sat, which appears to be market-rate
today.
[1] Note that due to an msat-vs-sat bug we currently actually
charge 1000x *less* than the calculated cost.
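A sketch of the knob; the field name here is our assumption of the shape,
not necessarily the exact ChannelConfig field:

```rust
// Illustrative config struct: the flat forwarding fee is now user-settable.
pub struct ChannelConfigSketch {
    /// Flat fee, in millisatoshis, charged per forwarded HTLC regardless
    /// of the payment amount.
    pub forwarding_fee_base_msat: u32,
}

impl Default for ChannelConfigSketch {
    fn default() -> Self {
        // 1 sat = 1000 msat, roughly market rate per the note above.
        Self { forwarding_fee_base_msat: 1000 }
    }
}
```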
If we are a public node and have a private channel, our
counterparty needs to know the fees which we will charge to forward
payments to them. Without sending them a channel_update, they have
no way to learn that information, resulting in the channel being
effectively useless for outbound-from-us payments.
This commit fixes our lack of channel_update messages to private
channel counterparties, ensuring we always send them a
channel_update after the channel funding is confirmed.