Unfortunately, the RAII types used by `RwLock` are not `Send`, which is
why they can't be held over `await` boundaries. In order to allow
asynchronous events processing in multi-threaded environments, we here
allow to process events without holding the `total_consistency_lock`.
This fixes two potential panics within the test if the
`BackgroundProcessor` for `nodes[0]` consumed the `ChannelPending` event
prior to us consuming it manually in `end_open_channel`. The first panic
would happen within the event handler, since `ChannelPending` was not
being handled. The second panic would happen upon expecting the
`ChannelPending` event after handling `nodes[1]`'s `funding_signed` if
the `BackgroundProcessor` handled the event first. To ensure we still
reliably receive a `ChannelPending` event once possible, we let the
`BackgroundProcessor` consume the event and notify it.
If the user's sleep future passed to an async background processor
only returns true for exiting once and then reverts back to false,
we should exit anyway when we get a chance to. We do to this here
by always ensuring we check the exit flag even when only polling
sleep futures with no intent to (yet) exit. This is utilized in the
tests added in the coming commit(s).
If `ChannelManager` is persistable before the async background
processor even starts, it may not even get around to overwriting
the `should_exit` flag before testing it, and the default value is
(incorrectly) true, causing an immediate unconditional exit.
The default value should simply be false.
Fixes#2140
Instead of asserting a `Result` `is_ok`, we should always simply
`unwrap` to get a backgrace, and we should avoid doing so if the
thread is already panicking.
Currently, users don't have good way of being notified when channel open
negotiations have succeeded and new channels are pending confirmation on
chain. To this end, we add a new `ChannelPending` event that is emitted
when send or receive a `funding_signed` message, i.e., at the last
moment before waiting for the confirmation period.
We track whether the event had previously been emitted in `Channel` and
remove it from `internal_funding_created` entirely. Hence, we now
only emit the event after ChannelMonitorUpdate completion, or upon
channel reestablish. This mitigates a race condition where where we
wouldn't persist the event *and* wouldn't regenerate it on restart,
therefore potentially losing it, if async CMU wouldn't complete before
ChannelManager persistence.
Some users have suggested that waking every 100ms can be
CPU-intensive in deployments with hundreds or thousands of nodes
all running on the same machine. Thus, we add an option to the
futures-based `background-processor` to avoid waking every 100ms to
check for iOS having backgrounded our app and cut our TCP sockets.
This cuts the normal sleep time down from 100ms to 10s, for those
who turn it on.
If the `ChainMonitor` gets an async monitor update completion, this
means the `ChannelManager` needs to be polled for event processing.
Here we wake it using the new multi-`Future`-await `Sleeper`, or
the existing `select` block in the async BP.
Fixes#2052.
Rather than having three ways to await a `ChannelManager` being
persistable, this moves to just exposing the awaitable `Future` and
having sleep functions on that.
As `futures` apparently makes no guarantees on MSRVs even in patch
releases we really can't rely on it at all, and while it currently
has an acceptable MSRV without the macros feature, its best to just
remove it wholesale.
Luckily, removing it is relatively trivial, even if it requires
the most trivial of unsafe tags.
`futures` recently broke our MSRV by bumping the `syn` major
version in a patch release. This makes it impractical for us to
use, instead here we replace the usage of its `select_biased` macro
with a trivial enum.
Given its simplicity we likely should have done this without ever
taking the dependency.
This is largely motivated by some follow-up work for anchors that will
introduce an event handler for `BumpTransaction` events, which we can
now include in this new top-level `events` module.
`poll`ing completed futures invokes undefined behavior in Rust
(panics, etc, obviously not memory corruption as its not unsafe).
Sadly, in our futures-based version of
`lightning-background-processor` we have one case where we can
`poll` a completed future - if the timer for the network graph
prune + persist completes without a network graph to prune +
persist we'll happily poll the same future over and over again,
likely panicing in user code.
This field was previous useful in manual retries for users to know when all
paths of a payment have failed and it is safe to retry. Now that we support
automatic retries in ChannelManager and no longer support manual retries, the
field is no longer useful.
For backwards compat, we now always write false for this field. If we didn't do
this, previous versions would default this field's value to true, which can be
problematic because some clients have relied on the field to indicate when a
full payment retry is safe.
Forcing users to pass a genesis block hash has ended up being
error-prone largely due to byte-swapping questions for bindings
users. Further, our API is currently inconsistent - in
`ChannelManager` we take a `Bitcoin::Network` but in `NetworkGraph`
we take the genesis block hash.
Luckily `NetworkGraph` is the only remaining place where we require
users pass the genesis block hash, so swapping it for a `Network`
is a simple change.
The `chain::Access` trait (and the `chain::AccessError` enum) is a
bit strange - it only really makes sense if users import it via the
`chain` module, otherwise they're left with a trait just called
`Access`. Worse, for bindings users its always just called
`Access`, in part because many downstream languages don't have a
mechanism to import a module and then refer to it.
Further, its stuck dangling in the `chain` top-level mod.rs file,
sitting in a module that doesn't use it at all (it's only used in
`routing::gossip`).
Instead, we give it its full name - `UtxoLookup` (and rename the
error enum `UtxoLookupError`) and put it in the a new
`routing::utxo` module, next to `routing::gossip`.
Secrets should not be exposed in-memory at the interface level as it
would be impossible the implement it against a hardware security
module/secure element.
This makes `background-processor` build without `std` at all. This
isn't particularly useful in the general no-std case as
`background-processor` is only useful with the `futures` feature,
and async will generally need `std` in some way or another. Still,
it ensures we don't end up reintroducing a dependency on the
current time, which breaks `wasm` use-cases.
`background-processor` does a number of jobs on various timers.
Instead of doing those by interrogating `std::time::Instant`, this
change swaps to using the existing user-provided sleep future.
Fixes#1864.
`background-processor` does a number of jobs on various timers.
Currently, those are all done by checking the timers every 100ms
by interrogating `std::time::Instant`. This is fine for the
threaded version, but we'd like more flexibility in the `futures`-
based `background-processor`.
Here we swap the `std::time::Instant` interrogation for a lambda
which we will switch out to the user-provided sleeper in the next
commit.
As of HEAD the `ChannelManager` is parametrized by a `Router`, while
`InvoicePayer` also owns a `Router`. In order to allow for a single
object being reused, we make the `InvoicePayer` side `Deref`.
This is purely a refactor that does not change the InitFeatures
advertised by a ChannelManager. This allows users to configure which
features should be advertised based on the values of `UserConfig`. While
there aren't any existing features currently leveraging this behavior,
it will be used by the upcoming anchors_zero_fee_htlc_tx feature.
The UserConfig dependency on provided_init_features caused most
callsites of the main test methods responsible for opening channels to
be updated. This commit foregos that completely by no longer requiring
the InitFeatures of each side to be provided to these methods. The
methods already require a reference to each node's ChannelManager to
open the channel, so we use that same reference to obtain their
InitFeatures. A way to override such features was required for some
tests, so a new `override_init_features` config option now exists on
the test harness.
If no network graph is provided to the `BackgroundProcessor`, we
log every time the processor loop goes around (at least every
100ms, if not more) which fille up logs with useless indications
that we have no network graph.
This adds testing of the `futures` feature to CI. In order to avoid
introducing a dependency on `std`, we switch to using the `futures-util`
crate directly enabling only a subset of features. To this end, we also
switch to using the `select_biased!` macro, which is equivalent to
`select!` except that it doesn't choose ready futures pseudo-randomly
at runtime.
We never had the `NetworkGraph::node_failed` method implemented. The
scorer now handles non-permanent failures by downgrading nodes, so we
don't want that implemented.
The method is renamed to `node_failed_permanent` to explicitly indicate
that this is the only case it handles. We also add tracking in the form
of two maps as fields of `NetworkGraph`, namely, `removed_nodes` and
`removed_channels`. We track these removed graph entries to ensure we
don't just resync them right away from gossip soon after removing them.
We stop tracking these removed nodes whenever `remove_stale_channels_and_tracking()`
is called and the entries have been tracked for longer than
`REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS` which is currently set to one
week.
We do this to enable users to create routers that do not need a scorer.
This can be useful if they are running a node the delegates pathfinding.
* Move `Score` type parameterization from `InvoicePayer` and `Router` to
`DefaultRouter`
* Adds a new field, `scorer`, to `DefaultRouter`
* Move `AccountsForInFlightHtlcs` to `DefaultRouter`, which we
will use to wrap the new `scorer` field, so scoring only happens in
`DefaultRouter` explicitly.
* Add scoring related functions to `Router` trait that we used to call
directly from `InvoicePayer`.
* Instead of parameterizing `scorer` in `find_route`, we replace it with
inflight_map so `InvoicePayer` can pass on information about inflight
HTLCs to the router.
* Introduced a new tuple struct, InFlightHtlcs, that wraps functionality
for querying used liquidity.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the
`lightning-background-processor` and `lightning-persister` crate
tests.
Some `NodeFeatures` will, in the future, represent features which
are not enabled by the `ChannelManager`, but by other message
handlers handlers. Thus, it doesn't make sense to determine the
node feature bits in the `ChannelManager`.
The simplest fix for this is to change to generating the
node_announcement in `PeerManager`, asking all the connected
handlers which feature bits they support and simply OR'ing them
together. While this may not be sufficient in the future as it
doesn't consider feature bit dependencies, support for those could
be handled at the feature level in the future.
This commit moves the `broadcast_node_announcement` function to
`PeerHandler` but does not yet implement feature OR'ing.
Adds the boilerplate needed for PeerManager and OnionMessenger to work
together, with some corresponding docs and misc updates mostly due to the
PeerManager public API changing.
If a user restores from a backup that they know is stale, they'd
like to force-close all of their channels (or at least the ones
they know are stale) *without* broadcasting the latest state,
asking their peers to do so instead. This simply adds methods to do
so, renaming the existing `force_close_channel` and
`force_close_all_channels` methods to disambiguate further.
BackgroundProcessor can take an optional P2PGossipSync and an optional
RapidGossipSync, but doing so may be easy to misuse. Each has a
reference to a NetworkGraph, which could be different between the two,
but only one is actually used.
Instead, allow passing one object wrapped in a GossipSync enum. Also,
fix a bug where the NetworkGraph is not persisted on shutdown if only a
RapidGossipSync is given.
Instead of implementing EventHandler for P2PGossipSync, implement it on
NetworkGraph. This allows RapidGossipSync to handle events, too, by
delegating to its NetworkGraph.
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
Create a wrapper struct for rapid gossip sync that can be passed to
BackgroundProcessor's start method, allowing it to only start pruning
the network graph upon rapid gossip sync's completion.
The main loop of the background processor has this line:
`peer_manager.process_events(); // Note that this may block on ChannelManager's locking`
which does, indeed, sometimes block waiting on the `ChannelManager`
to finish whatever its doing. Specifically, its the only place in
the background processor loop that we block waiting on the
`ChannelManager`, so if the `ChannelManager` is relatively busy, we
may end up being blocked there most of the time.
This should be fine, except today we had a user who's node was
particularly slow in processing some channel updates, resulting in
the background processor being blocked there (as expected). Then,
when the channel updates were completed (and persisted) the next
thing the background processor did was hand the user events to
process, creating yet more channel updates. Ultimately, the users'
node crashed before finishing the event processing. This left us
with an updated monitor on disk and an outdated manager, and they
lost the channel on startup.
Here we simply move the above quoted line to after the normal event
processing, ensuring the next thing we do after blocking on
`ChannelManager` locks is persist the manager, prior to event
handling.
Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.
Because many lightning nodes can take quite some time to respond to
pings, the five second ping timer can sometimes cause spurious
disconnects even though a peer is online. However, in part as a
response to mobile users where a connection may be lost as result
of only a short time with the app in a "paused" state, we had a
rather aggressive ping time to ensure we would disconnect quickly.
However, since we now just used a fixed time for the "went to
sleep" detection, we can somewhat increase the ping timer. We still
want to be fairly aggressive to avoid sending HTLCs to a peer that
is offline, but the tradeoff between spurious disconnections and
stuck payments is likely doesn't need to be quite as aggressive.
In the sample client (and likely other downstream users), event
processing may block on slow operations (e.g. Bitcoin Core RPCs)
and ChannelManager persistence may take some time. This should be
fine, except that we consider this a case of possible backgrounding
and disconnect all of our peers when it happens.
Instead, we here avoid considering event processing time in the
time between PeerManager events.
We update the `Channel::update_time_counter` field (which is copied
into `ChannelUpdate::timestamp`) only when the channel is
initialized or closes, and when a new block is connected. However,
if a peer disconnects or reconnects, we may wish to generate
`ChannelUpdate` updates in between new blocks. In such a case, we
need to make sure the `timestamp` field is newer than any previous
updates' `timestamp` fields, which we do here by simply
incrementing it when the channel status is changed.
As a side effect of this we have to update
`test_background_processor` to ensure it eventually succeeds even
if the serialization of the `ChannelManager` changes after the test
begins.
I realized on my own node that I don't have any visibility into how
long a monitor or manager persistence call takes, potentially
blocking other operations. This makes it much more clear by adding
a relevant log_trace!() print immediately before and immediately
after persistence.
Scorer uses time to determine how much to penalize a channel after a
failure occurs. Parameterizing it by time cleans up the code such that
no-std support is in a single AlwaysPresent struct, which implements the
Time trait. Time is implemented for std::time::Instant when std is
available.
This parameterization also allows for deterministic testing since a
clock could be devised to advance forward as needed.
NetworkGraph is owned by NetGraphMsgHandler, but DefaultRouter requires
a reference to it. Introduce shared ownership to NetGraphMsgHandler so
that both can use the same NetworkGraph.