We had some user confusion on how the probabilistic scorer works,
especially in reference to the half-life parameter. This attempts
to clarify how the bounds work, and how they are decayed.
We never had the `NetworkGraph::node_failed` method implemented. The
scorer now handles non-permanent failures by downgrading nodes, so we
don't want that implemented.
The method is renamed to `node_failed_permanent` to explicitly indicate
that this is the only case it handles. We also add tracking in the form
of two maps as fields of `NetworkGraph`, namely, `removed_nodes` and
`removed_channels`. We track these removed graph entries to ensure we
don't just resync them right away from gossip soon after removing them.
We stop tracking these removed nodes whenever `remove_stale_channels_and_tracking()`
is called and the entries have been tracked for longer than
`REMOVED_ENTRIES_TRACKING_AGE_LIMIT_SECS` which is currently set to one
week.
After a `Notifier` has been `notify`'d, attempts to `get_future`
should return a `Future` which is pre-completed, however this was
not the case.
This commit simply fixes the behavior, adding a test to demonstrate
the issue.
To avoid scoring based on incredibly old historical liquidity data,
we add a new half-life here which is used to (very slowly) decay
historical liquidity tracking buckets.
This simplifies adding additional half lives in
DirectedChannelLiquidity by simply storing a reference to the full
parameter set rather than only the single current half-life.
Our current `ProbabilisticScorer` attempts to build a model of the
current liquidity across the payment channel network. This works
fine to ignore channels we *just* tried to pay through, but it
fails to remember patterns over longer time horizons.
Specifically, there are *many* channels within the network that are
very often either fully saturated in one direction, or are
regularly rebalanced and rarely saturated. While our model may
discover that, when it decays its offsets or if there is a
temporary change in liquidity, it effectively forgets the "normal"
state of the channel.
This causes substantially suboptimal routing in practice, and
avoiding discarding older knowledge when new datapoints come in is
a potential solution to this.
Here, we implement one such design, using the decaying buckets
added in the previous commit to calculate a probability of payment
success based on a weighted average of recent liquidity estimates
for a channel.
For each min/max liquidity bucket pair (where the min liquidity is
less than the max liquidity), we can calculate the probability that
a payment succeeds using our traditional `amount / capacity`
formula. From there, we weigh the probability by the number of
points in each bucket pair, calculating a total probability for the
payment, and assigning a penalty using the same log-probability
calculation used for the non-historical penalties.
This introduces two new fields to the `ChannelLiquidity` struct -
`min_liquidity_offset_history` and `max_liquidity_offset_history`,
both an array of 8 `u16`s. Each entry represents the proportion of
time that we spent with the min or max liquidity offset in the
given 1/8th of the channel's liquidity range. ie the first bucket
in `min_liquidity_offset_history` represents the proportion of time
we've thought the channel's minimum liquidity is lower than 1/8th's
the channel's capacity.
Each bucket is stored, effectively, as a fixed-point number with
5 bits for the fractional part, which is incremented by one (ie 32)
each time we update our liquidity estimates and decide our
estimates are in that bucket. We then decay each bucket by
2047/2048.
Thus, memory of a payment sticks around for more than 8,000
data points, though the majority of that memory decays after 1,387
data points.
When we log liquidity updates, if we decline to update anything as
the new bounds are already within the old bounds, the
directionality of the log entries was reversed.
While we're at it, we also log the old values.
If we receive a monitor event from a forwarded-to channel which
contains a preimage for an HTLC, we have to propogate that preimage
back to the forwarded-from channel monitor. However, once we have
that update, we're running in a relatively unsafe state - we have
the preimage in memory, but if we were to crash the forwarded-to
channel monitor will not regenerate the update with the preimage
for us. If we haven't managed to write the monitor update to the
forwarded-from channel by that point, we've lost the preimage, and,
thus, money!
When a `chain::Watch` `ChannelMonitor` update method is called, the
user has three options:
(a) persist the monitor update immediately and return success,
(b) fail to persist the monitor update immediately and return
failure,
(c) return a flag indicating the monitor update is in progress and
will complete in the future.
(c) is rather harmless, and in some deployments should be expected
to be the return value for all monitor update calls, but currently
requires returning `Err(ChannelMonitorUpdateErr::TemporaryFailure)`
which isn't very descriptive and sounds scarier than it is.
Instead, here, we change the return type used to be a single enum
(rather than a Result) and rename `TemporaryFailure`
`UpdateInProgress`.
While we could, in theory, add support to the bindings logic to map
`Box<dyn Trait>`, there isn't a whole lot of use doing so when its
incredibly trivial to do directly.
This adds a trivial wrapper around `Future::register_callback` that
is only built in bindings and which is linked in the
`register_callback` docs for visibility.
`hashbrown` depends on `ahash` which depends on `once_cell`. Sadly,
in https://github.com/matklad/once_cell/issues/201 the `once_cell`
maintainer decided they didn't want to do the work of having an
MSRV policy for `once_cell`, making `ahash`, and thus `hashbrown`
require the latest compiler. I've reached out to `ahash` to suggest
they drop the dependency (as they could trivially work around not
having it), but until then we simply downgrade `hashbrown`.
`rust-bitcoin` also requires an older `hashbrown` so we're actually
reducing our total `no-std` code here anyway.
In the bindings, we don't directly export any `std` types. Instead,
we provide trivial wrappers around them which expose only a
bindings-compatible API (and which is code in-crate, where the
bindings generator can see it).
We never quite finished this for `MultiThreadedLockableScore` - due
to some limitations in the bindings generator and the way the
scores are used in `lightning-invoice`, the scoring API ended up
being further concretized in patches for the bindings. Luckily the
scoring interface has been rewritten somewhat, and it so happens
that we can now generate bindings with the upstream code.
The final piece of the puzzle is done here, where we add a struct
which wraps `MutexGuard` so that we can expose the lock for
`MultiThreadedLockableScore`.
The `forward_htlc` was prior to this commit only held at the same time
as the `channel_state` lock during the write process of the
`ChannelManager`. This commit removes the lock order dependency, by
taking the `channel_state`lock temporarily during the write process.
As we are eventually removing the `channel_state` lock, this commit
moves the `forward_htlcs` map out of the `channel_state` lock, to ease
that process.
`cargo bench` sets `#[cfg(test)]` so our current checks for
enabling our lockorder debugging end up matching when we're trying
to build performance benchmarks.
This adds explicit checks to our debug_lockorder logic to filter
out `feature = "_bench_unstable"` builds.
See doc updates for more info on the edge case this prevents, and
there isn't really a strong reason why we would need to broadcast
the latest state immediately. Specifically, in the case of HTLC
claims (the most important reason to ensure we have state on chain
if it cannot be persisted), we will still force-close if there are
HTLCs which need claiming and are going to expire.
Surprisingly, there were no tests which failed as a result of this
change, but a new one has been added.
Now that the features contexts track the full set of all known
features, rather than the set of supported features, all defined
features should be listed in the context definition macro.
This adds a compile-time assertion to check that all bits for known
features are set in the context known set.
Now that the `*Features::known` constructor has been removed, there
is no reason to define feature bits as either optional required in
`features.rs` - that logic now belongs in the modules that are
responsible for the given features.
Instead, we only list all features in each context.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we (finally) remove the `known` constructor entirely,
modifying tests in the `features` module as required.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
In anticipation of removing the `known` constructor, this commit
removes all remaining references to it outside of features.rs.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the channel modules.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the
functional_test_utils module.
As we move towards specify supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the `routing` module,
which was only used in tests.
Historically, LDK has considered the "set of known/supported
feature bits" to be an LDK-level thing. Increasingly this doesn't
make sense - different message handlers may provide or require
different feature sets.
In a previous PR, we began the process of transitioning with
feature bits sent to peers being sourced from the attached message
handler.
This commit makes further progress by moving the concept of which
feature bits are supported by our ChannelManager into
channelmanager.rs itself, via the new `provided_*_features`
methods, rather than in features.rs via the `known_channel_features`
and `known` methods.
Each test featuring HTLCs had a minimum and maximum feerate case. This
is no longer necessary for the zero HTLC transaction anchors variant as
the commitment feerate does not impact whether HTLCs can be trimmed or
not, only the dust limit does.