Since a node may announce that the htlc_maximum_msat of a channel is
zero, adding one to the denominator in the bucket formulas prevents a
divide-by-zero panic from ever happening. While the routing algorithm may never
select such a channel to score, this precaution may still be useful in
case the algorithm changes or if the scorer is used with a different
routing algorithm.
ProbabilisticScorer takes a ChannelUsage when computing a penalty for a
channel. The formula for calculating the liquidity penalty reduces the
maximum capacity by the amount of in-flight HTLCs (yielding the available
capacity) and adds one to prevent division by zero.
However, since the available capacity is passed to
DirectedChannelLiquidity as the capacity, other penalty formulas may use
the available (i.e., reduced) capacity inadvertently. In practice, this
has two ramifications for the historical liquidity penalty computation:
1. The bucket formula doesn't have a consistent denominator for a given
channel.
2. The bucket formula may divide by zero when the in-flight HTLC amount
equals or exceeds the effective capacity.
Fixing this involves only using the available capacity when appropriate.
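As a rough illustration of both points (the bucket count and widths here are assumptions, not the exact ProbabilisticScorer code), the bucket index should use the channel's full capacity and a +1 in the denominator:

```rust
/// Rough sketch, not the exact LDK formula: map an amount into one of 8
/// equal-width liquidity buckets. The `+ 1` keeps this from panicking when a
/// channel announces an `htlc_maximum_msat` of zero, and the denominator is
/// the *full* channel capacity rather than the in-flight-reduced "available"
/// capacity, so a given channel always uses a consistent denominator.
fn historical_bucket_idx(amount_msat: u64, full_capacity_msat: u64) -> usize {
    core::cmp::min(
        amount_msat.saturating_mul(8) / full_capacity_msat.saturating_add(1),
        7,
    ) as usize
}
```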
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus, it's
generally unsafe, and we should avoid it where we can.
To discourage doing this, here we default to panicking on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.
This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
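A minimal sketch of that shape, assuming a test-only per-thread set of held lock classes (this is illustrative, not LDK's actual debug_sync code; un-tracking when guards drop is omitted):

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::sync::{Mutex, MutexGuard};

thread_local! {
    // Lock classes currently held on this thread (test-only bookkeeping;
    // removing entries when guards drop is omitted for brevity).
    static HELD: RefCell<HashSet<&'static str>> = RefCell::new(HashSet::new());
}

struct TestMutex<T> {
    inner: Mutex<T>,
    class: &'static str,
}

impl<T> TestMutex<T> {
    /// Default path: panic if another instance of the same lock class is
    /// already held, since we cannot (trivially) check a total lockorder.
    fn lock(&self) -> MutexGuard<'_, T> {
        HELD.with(|held| {
            assert!(held.borrow_mut().insert(self.class),
                "Two instances of the same mutex taken; use unsafe_lock()");
        });
        self.inner.lock().unwrap()
    }

    /// "Unsafe" only in the lockorder sense: the caller promises a total
    /// lockorder across the instances being locked.
    fn unsafe_lock(&self) -> MutexGuard<'_, T> {
        self.inner.lock().unwrap()
    }
}
```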
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that already holds a read
lock and is attempting to take it again.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
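As an illustration (not code from the tree) of why recursive reads are dangerous, consider a writer-preferring `RwLock`, where a queued writer blocks new readers:

```rust
use std::sync::RwLock;
use std::thread;

// Illustration only: on writer-preferring platforms (and with our
// `FairRwLock`) this pattern can deadlock.
fn recursive_read_deadlock(lock: &'static RwLock<u32>) {
    let _first_read = lock.read().unwrap();
    let writer = thread::spawn(move || {
        // Queued behind `_first_read`, blocking any *new* readers...
        let _write = lock.write().unwrap();
    });
    // ...so this recursive read blocks behind the queued writer, which is
    // itself waiting for `_first_read` to drop: a classic deadlock.
    let _second_read = lock.read().unwrap();
    writer.join().unwrap();
}
```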
fbc08477e8 purported to "move" the
`final_cltv_expiry_delta` field to `PaymentParameters` from
`RouteParameters`. However, for naive backwards-compatibility
reasons it left the existing one in place and only added a new,
redundant field in `PaymentParameters`.
It turns out there's really no reason for this - if we take a more
critical eye towards backwards compatibility we can figure out the
correct value in every `PaymentParameters` while deserializing.
We do this here - making `PaymentParameters` a `ReadableArgs`
taking a "default" `cltv_expiry_delta` when it goes to read. This
allows existing `RouteParameters` objects to pass the read
`final_cltv_expiry_delta` field in to be used if the new field
wasn't present.
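Conceptually (this is a simplified sketch, not LDK's TLV-macro-based deserialization code), the read path boils down to:

```rust
// Simplified sketch: the caller (e.g. a `RouteParameters` read) passes the
// legacy `final_cltv_expiry_delta` it just read as the default, which is
// only used when the serialized `PaymentParameters` predates the new field.
fn resolve_final_cltv_expiry_delta(
    serialized_field: Option<u32>,
    default_final_cltv_expiry_delta: u32,
) -> u32 {
    serialized_field.unwrap_or(default_final_cltv_expiry_delta)
}
```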
When we read a `Route` (or a list of `RouteHop`s), we should never
have zero paths or zero `RouteHop`s in a path. As such, it's fine to
simply reject these at deserialization-time. Technically this could
lead to something which we can generate not round-tripping
serialization, but that seems okay here.
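A sketch of the deserialization-time check (the surrounding read logic is elided; `DecodeError::InvalidValue` is LDK's usual error for malformed data):

```rust
use lightning::ln::msgs::DecodeError;

// Reject a `Route` read with zero paths, or with any path containing zero
// hops, since we should never generate either.
fn check_route_paths<H>(paths: &[Vec<H>]) -> Result<(), DecodeError> {
    if paths.is_empty() || paths.iter().any(|path| path.is_empty()) {
        return Err(DecodeError::InvalidValue);
    }
    Ok(())
}
```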
When using lower-level macros such as `read_tlv_stream`, `upgradable_required`
fields have been treated as regular options. This is incorrect; they should
either be `upgradable_option`s or treated as required fields.
Forcing users to pass a genesis block hash has ended up being
error-prone largely due to byte-swapping questions for bindings
users. Further, our API is currently inconsistent - in
`ChannelManager` we take a `bitcoin::Network` but in `NetworkGraph`
we take the genesis block hash.
Luckily `NetworkGraph` is the only remaining place where we require
users pass the genesis block hash, so swapping it for a `Network`
is a simple change.
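Internally the change amounts to deriving the genesis hash from the `Network`; a sketch of that mapping using rust-bitcoin (the helper name here is made up):

```rust
use bitcoin::blockdata::constants::genesis_block;
use bitcoin::hash_types::BlockHash;
use bitcoin::network::constants::Network;

// Derive the genesis block hash from a `Network` so callers never have to
// handle (and potentially byte-swap) raw hashes themselves.
fn genesis_hash_for(network: Network) -> BlockHash {
    genesis_block(network).header.block_hash()
}
```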
We're no longer supporting manual retries since
`ChannelManager::send_payment_with_retry` can be parameterized by a retry
strategy.
This commit also updates all docs related to `retry_payment` and `abandon_payment`.
Since these docs frequently overlap with changes in preceding commits where we
start abandoning payments on behalf of the user, all the docs are updated in
one go.
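For example (hedged: the import path is given from memory and the other `send_payment_with_retry` arguments are elided), retrying up to three times now just means passing a strategy:

```rust
use lightning::ln::channelmanager::Retry;

// `Retry::Attempts` caps how many times LDK will internally retry the
// payment before abandoning it on our behalf; no manual retry loop needed.
fn retry_strategy() -> Retry {
    // Passed as the final argument to `send_payment_with_retry`
    // (other arguments elided here).
    Retry::Attempts(3)
}
```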
Now that we allow `handle_channel_announcement` to (indirectly)
spawn async tasks which will complete later, we have to ensure it
can apply backpressure all the way up to the TCP socket to ensure
we don't end up with too many buffers allocated for UTXO
validation.
We do this by adding a new method to `RoutingMessageHandler` which
allows it to signal if there are "many" checks pending and
`channel_announcement` messages should be delayed. The actual
`PeerManager` implementation thereof is done in the next commit.
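The added hook has roughly this shape (sketched here as a standalone trait; the method name is how I recall it landing and should be treated as an assumption):

```rust
/// Sketch of the backpressure hook added to the gossip message handler.
pub trait RoutingMessageHandlerBackpressure {
    /// Returns true when "many" UTXO checks are pending, telling the
    /// `PeerManager` to pause reading further `channel_announcement`s from
    /// the socket until the backlog drains.
    fn processing_queue_high(&self) -> bool;
}
```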
Gossip messages which were verified against the chain
asynchronously should still be forwarded to peers, but must now go
out via a new `P2PGossipSync` parameter in the
`AccessResolver::resolve` method, allowing us to wire them up to
the `P2PGossipSync`'s `MessageSendEventsProvider` implementation.
If we have a `channel_announcement` which is waiting on a UTXO
lookup before we can process it, and we receive a `channel_update`
or `node_announcement` for the same channel or a node which is a
part of the channel, we have to wait until the lookup completes
before we can decide if we want to accept the new message.
Here, we store the new message in the pending lookup state and
process it asynchronously like the original `channel_announcement`.
If we receive two `channel_announcement`s for the same channel at
the same time, we shouldn't spawn a second UTXO lookup for an
identical message. This case likely isn't too rare - if we start syncing
the graph from two peers at the same time, it's quite likely that
we'll receive the same messages around the same time.
In order to avoid this we keep a hash map of all the pending
`channel_announcement` messages by SCID and simply ignore duplicate
message lookups.
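Schematically (names here are illustrative, not LDK's internal types), the bookkeeping is a map keyed by SCID whose entries also hold any `channel_update` or `node_announcement` that arrived while the lookup was in flight:

```rust
use std::collections::hash_map::{Entry, HashMap};

struct PendingLookup {
    // The original `channel_announcement`, plus any `channel_update` or
    // `node_announcement` received while the UTXO lookup is in flight.
}

struct PendingChecks {
    pending: HashMap<u64 /* SCID */, PendingLookup>,
}

impl PendingChecks {
    /// Returns false if a lookup for this SCID is already pending, in which
    /// case the duplicate `channel_announcement` is simply ignored rather
    /// than spawning a second identical UTXO lookup.
    fn start_lookup(&mut self, short_channel_id: u64) -> bool {
        match self.pending.entry(short_channel_id) {
            Entry::Occupied(_) => false,
            Entry::Vacant(entry) => {
                entry.insert(PendingLookup {});
                true
            }
        }
    }
}
```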
For those operating in an async environment, requiring
`ChainAccess::get_utxo` return information about the requested UTXO
synchronously is incredibly painful. Requesting information about a
random UTXO is likely to go over the network, and likely to be a
rather slow request.
Thus, here, we change the return type of `get_utxo` to have both a
synchronous and asynchronous form. The asynchronous form requires
the user to construct an `AccessFuture` which they `clone` and pass
back to us. Internally, an `AccessFuture` has an `Arc` to the
`channel_announcement` message which we need to process. When the
user completes their lookup, they call `resolve` on their
`AccessFuture` which we pull the `channel_announcement` from and
then apply to the network graph.
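In rough outline (following the names used above; the exact types in the tree may differ), the return type splits into:

```rust
use std::sync::{Arc, Mutex};
use bitcoin::TxOut;

enum AccessError { UnknownChain, UnknownTx }

enum UtxoResult {
    /// The lookup completed immediately (e.g. against a local full node's
    /// UTXO set).
    Sync(Result<TxOut, AccessError>),
    /// The lookup will complete later: the user clones the future, performs
    /// the (possibly network-bound) query, and calls `resolve` with the
    /// result, at which point the stored `channel_announcement` is applied
    /// to the network graph.
    Async(AccessFuture),
}

#[derive(Clone)]
struct AccessFuture {
    // Shared state holding the pending `channel_announcement` (and, per the
    // later commits, any messages queued behind it).
    state: Arc<Mutex<PendingAnnouncementState>>,
}

struct PendingAnnouncementState { /* elided */ }
```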
The `chain::Access` trait (and the `chain::AccessError` enum) is a
bit strange - it only really makes sense if users import it via the
`chain` module, otherwise they're left with a trait just called
`Access`. Worse, for bindings users its always just called
`Access`, in part because many downstream languages don't have a
mechanism to import a module and then refer to it.
Further, it's stuck dangling in the `chain` top-level mod.rs file,
sitting in a module that doesn't use it at all (it's only used in
`routing::gossip`).
Instead, we give it its full name - `UtxoLookup` (and rename the
error enum `UtxoLookupError`) and put it in a new
`routing::utxo` module, next to `routing::gossip`.
Also swaps `PublicKey` for `NodeId` in `get_next_node_announcement`
and `InitSyncTracker` to avoid unnecessary deserialization that came
from changing `UnsignedNodeAnnouncement`.
Adds the macro `get_pubkey_from_node_id`
to parse `PublicKey`s back from `NodeId`s for signature
verification, as well as `make_funding_redeemscript_from_slices`
to avoid parsing back and forth between types.
`PaymentParams` is all about the parameters for a payment, i.e. the
parameters which are static across all the paths of a payment.
`RouteParameters` is about the information specific to a given
`Route` (i.e. a set of paths, among multiple potential sets of
paths for a payment). The CLTV delta thus doesn't belong in
`RouteParameters` but instead in `PaymentParameters`.
Worse, because `RouteParameters` is built from the information in
the last hops of a `Route`, when we deliberately inflate the CLTV
delta in path-finding, retries of the payment will have the final
CLTV delta double-inflated as it inflates starting from the final
CLTV delta used in the last attempt.
By moving the CLTV delta to `PaymentParameters` we avoid this
issue, leaving only the sought amount in the `RouteParameters`.
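A worked example with made-up numbers (the 36-block shadow offset is purely illustrative) shows the compounding:

```rust
fn main() {
    // The recipient's invoice asks for a final CLTV delta of 40 blocks and
    // path-finding adds a 36-block shadow offset for privacy.
    let invoice_final_cltv_delta = 40u32;
    let shadow_offset = 36u32;

    // First attempt: the route's last hop ends up with 40 + 36 = 76.
    let first_attempt = invoice_final_cltv_delta + shadow_offset;

    // If `RouteParameters` is rebuilt from that last hop, a retry starts
    // from 76 and inflates again: 112 instead of the intended 76, and so on
    // for every further retry.
    let retried = first_attempt + shadow_offset;
    assert_eq!((first_attempt, retried), (76, 112));
}
```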
When calculating whether the historical data tracker has enough data
to assign a score once the not-yet-applied decays are accounted for,
we previously calculated the decayed point totals while walking the
buckets, since we don't otherwise use the decayed buckets (to avoid
losing precision). That is fine, except that, as written, it decayed
individual buckets multiple times.
Instead, here we actually calculate the full set of decayed buckets
and use those to decide if we have valid points. This adds some
additional stack space and may in fact be slower, but will be
useful in the next commit and shouldn't be a huge change.
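A rough sketch of the new shape (the bucket count, halving-based decay, and validity threshold are all assumptions here, not the scorer's exact constants):

```rust
/// Decay every bucket exactly once up front.
fn decayed_buckets(buckets: &[u16; 8], required_decays: u32) -> [u16; 8] {
    let mut decayed = [0u16; 8];
    for (i, bucket) in buckets.iter().enumerate() {
        // Each pending decay halves the recorded points; shifting by the
        // full count applies all of them in one step per bucket.
        decayed[i] = bucket.checked_shr(required_decays).unwrap_or(0);
    }
    decayed
}

/// Decide whether the decayed totals still carry enough weight to score.
fn has_valid_points(min_buckets: &[u16; 8], max_buckets: &[u16; 8], required_decays: u32) -> bool {
    let min = decayed_buckets(min_buckets, required_decays);
    let max = decayed_buckets(max_buckets, required_decays);
    let total: u64 = min.iter().chain(max.iter()).map(|b| *b as u64).sum();
    total >= 32 // assumed minimum-points threshold
}
```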
Often when we call `compute_fees` we really just want it to
saturate and we deal with `u64::max_value` later. In that case,
we're much better off doing the saturating in `compute_fees` itself, as
it can use CMOVs rather than branching at each step and then
`unwrap_or`ing at the callsite.
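A saturating variant looks roughly like this (`RoutingFees` has `base_msat` and `proportional_millionths`, both `u32`; the function name is illustrative):

```rust
use lightning::routing::gossip::RoutingFees;

/// Compute the fee for forwarding `amount_msat`, saturating to
/// `u64::max_value()` instead of returning an `Option` for every caller to
/// `unwrap_or` individually.
fn compute_fees_saturating(amount_msat: u64, fees: RoutingFees) -> u64 {
    amount_msat
        .checked_mul(fees.proportional_millionths as u64)
        .map(|prop| prop / 1_000_000)
        .unwrap_or(u64::max_value())
        .saturating_add(fees.base_msat as u64)
}
```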
Our network graph has to be iterable in a deterministic order and
with the ability to iterate over a specific range. Thus,
historically, we've used a `BTreeMap` to do the iteration. This is
fine, except our map needs to also provide high performance lookups
in order to make route-finding fast. Sadly, `BTreeMap`s are quite
slow due to the branching penalty.
Here we replace the `BTreeMap`s in the network graph with a dummy wrapper.
In the next commit the internals thereof will be replaced with a
`HashMap`-based implementation.
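The direction is roughly a `HashMap` for lookups plus a separately maintained sorted key list for deterministic, range-capable iteration; an illustrative sketch (not the real wrapper, whose name and internals may differ):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Illustrative sketch: `HashMap` lookups for route-finding, with a sorted
/// key list kept alongside for deterministic iteration and range queries
/// (e.g. gossip sync by SCID range).
struct RangeMap<K: Hash + Ord + Clone, V> {
    map: HashMap<K, V>,
    sorted_keys: Vec<K>,
}

impl<K: Hash + Ord + Clone, V> RangeMap<K, V> {
    fn get(&self, key: &K) -> Option<&V> {
        // Hot path during route-finding: no BTree branching penalty.
        self.map.get(key)
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        let old = self.map.insert(key.clone(), value);
        if old.is_none() {
            // New key: keep the iteration order deterministic.
            let idx = self.sorted_keys.binary_search(&key).unwrap_err();
            self.sorted_keys.insert(idx, key);
        }
        old
    }

    /// Iterate all entries with keys >= `start`, in key order.
    fn range_from<'a>(&'a self, start: &K) -> impl Iterator<Item = (&'a K, &'a V)> + 'a {
        let idx = self.sorted_keys.binary_search(start).unwrap_or_else(|i| i);
        self.sorted_keys[idx..].iter().map(move |k| (k, &self.map[k]))
    }
}
```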
As evidenced by the previous commit, it appears our A* router
does worse than a more naive approach. This isn't super surprising,
as the A* heuristic calculation requires a map lookup, which is
relatively expensive.
```
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 169,991,943 ns/iter (+/- 30,838,048)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer ... bench: 122,144,987 ns/iter (+/- 61,708,911)
test routing::router::benches::generate_routes_with_probabilistic_scorer ... bench: 48,546,068 ns/iter (+/- 10,379,642)
test routing::router::benches::generate_routes_with_zero_penalty_scorer ... bench: 32,898,557 ns/iter (+/- 14,157,641)
```
Historically we've had various bugs in keeping the
`lowest_inbound_channel_fees` field in `NodeInfo` up-to-date as we
go. This leaves the A* routing less efficient as it can't prune
hops as aggressively.
In order to get accurate benchmarks, this commit updates the
minimum-inbound-fees field on load. This is not the most efficient
way of doing so, but suffices for fetching benchmarks and will be
removed in the coming commits.
Note that this is *slower* than the non-updating version in the
previous commit. While I haven't dug into this incredibly deeply,
the graph snapshot in use has min-fee info for only 9,618 of
20,818 nodes. Thus, it is my guess that with the graph snapshot
as-is the branch predictor is able to largely remove the A*
heuristic lookups, but with this change it is forced to wait for
A* heuristic map lookups to complete, causing a performance
regression.
```
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 182,980,059 ns/iter (+/- 32,662,047)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer ... bench: 151,170,457 ns/iter (+/- 75,351,011)
test routing::router::benches::generate_routes_with_probabilistic_scorer ... bench: 58,187,277 ns/iter (+/- 11,606,440)
test routing::router::benches::generate_routes_with_zero_penalty_scorer ... bench: 41,210,193 ns/iter (+/- 18,103,320)
```
The previous copy was more than one and a half years old; the
lightning network has changed a lot since!
As of this commit, performance on my Xeon W-10885M with an
SK hynix Gold P31 storing a BTRFS volume is as follows:
```
test ln::channelmanager::bench::bench_sends ... bench: 5,896,492 ns/iter (+/- 512,421)
test routing::gossip::benches::read_network_graph ... bench: 1,645,740,604 ns/iter (+/- 47,611,514)
test routing::gossip::benches::write_network_graph ... bench: 234,870,775 ns/iter (+/- 8,301,775)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 166,155,032 ns/iter (+/- 30,206,162)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer ... bench: 136,843,661 ns/iter (+/- 67,111,218)
test routing::router::benches::generate_routes_with_probabilistic_scorer ... bench: 52,954,598 ns/iter (+/- 11,360,547)
test routing::router::benches::generate_routes_with_zero_penalty_scorer ... bench: 37,598,126 ns/iter (+/- 17,262,519)
test bench::bench_sends ... bench: 37,760,922 ns/iter (+/- 5,179,123)
test bench::bench_reading_full_graph_from_file ... bench: 25,615 ns/iter (+/- 1,149)
```