Commit graph

8468 commits

YI
988e5aad83 Simplify some code in channel.rs 2024-12-23 18:01:46 +08:00
Matt Corallo
463e432e92
Merge pull request #3495 from TheBlueMatt/2024-12-optimal-score-params
Tweak historical scoring model PDF and default penalties
2024-12-20 18:47:26 +00:00
Matt Corallo
adb0afc523 Raise bucket weights to the power four in the historical model
Utilizing the results of probes sent once a minute to a random node
in the network, each for a random amount (within a reasonable
range), we were able to analyze the accuracy of our resulting
success probability estimation with various PDFs across the
historical and live-bounds models.

For each candidate PDF (as well as other parameters, including the
histogram bucket weight), we used the
`min_zero_implies_no_successes` fudge factor in
`success_probability` as well as a total probability multiple fudge
factor to get both the historical success model and the a priori
model to be neither too optimistic nor too pessimistic (as measured
by the relative log-loss between succeeding and failing hops in our
sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

Somewhat surprisingly (to me at least), the (fairly strongly)
preferred model was one where the bucket weights in the historical
histograms are exponentiated. In the current design, the weights
are effectively squared as we multiply the minimum- and maximum-
histogram buckets together before adding the weight*probabilities
together.

Here we multiply the weights yet again before addition. While the
simulation runs seemed to prefer a slightly stronger weight than
the 4th power used here, the difference wasn't substantial
(log-loss 0.5058 vs. 0.4941), so we stick with the simpler single
extra multiply.

Note that if we did this naively we'd run out of bits in our
arithmetic operations - we have 16-bit buckets, which when raised
to the 4th power can fully fill a 64-bit int ((2^16)^4 = 2^64).
Additionally, when looking at the 0th min-bucket we occasionally
add up to 32 weights together before multiplying by the
probability, requiring an additional five bits (2^5 = 32).

Instead, we move to using floats during our histogram walks, which
also avoids some float -> int conversions, as we can retain the
floats we're already using to calculate probability.
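
A rough sketch of what such a float-based walk might look like
(bucket count, names, and the per-pair probability are assumptions,
not the actual `ProbabilisticScorer` internals):

```rust
// Illustrative sketch only; not the real scorer code.
fn historical_success_probability(
	min_buckets: &[u16; 32], max_buckets: &[u16; 32],
) -> f64 {
	let mut weighted_prob = 0.0f64;
	let mut total_weight = 0.0f64;
	for (min_idx, &min_bucket) in min_buckets.iter().enumerate() {
		for (max_idx, &max_bucket) in max_buckets.iter().enumerate() {
			// The old weight was min_bucket * max_bucket (degree two in
			// the bucket values); multiplying that product by itself
			// gives the 4th power. In u64 this could overflow, since
			// (2^16)^4 == 2^64 and summing up to 32 such weights needs
			// another five bits - hence the move to floats.
			let w = min_bucket as f64 * max_bucket as f64;
			let weight = w * w;
			total_weight += weight;
			weighted_prob += weight * bucket_pair_probability(min_idx, max_idx);
		}
	}
	if total_weight == 0.0 { 0.0 } else { weighted_prob / total_weight }
}

// Hypothetical stand-in for the per-bucket-pair success probability.
fn bucket_pair_probability(_min_idx: usize, _max_idx: usize) -> f64 { 0.5 }
```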

Across the last handful of commits, the increased pessimism more
than makes up for the increased runtime complexity, leading to a
40-45% pathfinding speedup on a Xeon Silver 4116 and a 25-45%
speedup on a Xeon E5-2687W v3.

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
2024-12-19 20:15:15 +00:00
Matt Corallo
85afe25e72 Split success_probability calculation into two separate methods
In the next commit we'll want to return floats or ints from
`success_probability` depending on the callsite, so instead of
duplicating the calculation logic, here we split the linear (which
always uses int math) and nonlinear (which always uses float math)
calculations into separate methods, allowing us to write trivial
`success_probability` wrappers that return the desired type.
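
A sketch of the split being described (signatures and the placeholder
nonlinearity are assumptions, not the exact LDK code):

```rust
// Assumes min_liquidity_msat <= amount_msat <= max_liquidity_msat.
fn linear_success_probability(
	amount_msat: u64, min_liquidity_msat: u64, max_liquidity_msat: u64,
) -> (u64, u64) {
	// Linear model: pure integer math, returned as (numerator, denominator).
	(max_liquidity_msat - amount_msat, max_liquidity_msat - min_liquidity_msat)
}

fn nonlinear_success_probability(
	amount_msat: u64, min_liquidity_msat: u64, max_liquidity_msat: u64,
) -> f64 {
	// Nonlinear model: float math throughout (placeholder nonlinearity here).
	let p = (max_liquidity_msat - amount_msat) as f64
		/ (max_liquidity_msat - min_liquidity_msat) as f64;
	p * p * p
}

// A trivial wrapper can then return whichever type a callsite wants.
fn success_probability_as_float(
	amount_msat: u64, min_msat: u64, max_msat: u64, linear: bool,
) -> f64 {
	if linear {
		let (num, den) = linear_success_probability(amount_msat, min_msat, max_msat);
		num as f64 / den as f64
	} else {
		nonlinear_success_probability(amount_msat, min_msat, max_msat)
	}
}
```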
2024-12-19 20:15:14 +00:00
Matt Corallo
cd33ade75b Reduce the default channel bounds half-life
Utilizing the results of probes sent once a minute to a random node
in the network, each for a random amount (within a reasonable
range), we were able to analyze the accuracy of our resulting
success probability estimation with various PDFs across the
historical and live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

Across the simulation runs, for a given PDF and other parameters,
we nearly always did better with a shorter half-life (even as short
as 1ms, i.e. only learning per-probe rather than across probes).
While this likely makes sense for nodes which do live probing, not
all nodes do, and thus we should avoid over-biasing on the dataset
we have.

While it may make sense to only learn per-payment and not across
payments, I can't fully rationalize this result and thus want to
avoid over-tuning, so here we reduce the half-life from 6 hours to
30 minutes.
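
For reference, the half-life drives a simple exponential decay of the
learned bounds; a minimal sketch, with hypothetical names:

```rust
// Decay a learned liquidity offset by elapsed time, given a half-life.
// With the new default, half_life_secs would be 30 * 60.
fn decayed_offset_msat(offset_msat: u64, elapsed_secs: u64, half_life_secs: u64) -> u64 {
	let half_lives = elapsed_secs as f64 / half_life_secs as f64;
	(offset_msat as f64 * 0.5f64.powf(half_lives)) as u64
}
```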
2024-12-19 20:04:34 +00:00
Matt Corallo
6582f29f06 Skip calculation steps when the liquidity penalty multipliers are 0
If the liquidity penalty multipliers in the scoring config are both
0 (as is now the default), the corresponding liquidity penalties
will be 0. Thus, we should avoid doing the work to calculate them
if we're ultimately just going to get a value of zero anyway, which
we do here.
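
A minimal sketch of the early-out (struct and field names are
stand-ins for the real scoring parameters):

```rust
struct LiquidityPenaltyParams {
	liquidity_penalty_multiplier_msat: u64,
	liquidity_penalty_amount_multiplier_msat: u64,
}

fn liquidity_penalty_msat(params: &LiquidityPenaltyParams, success_probability: f64) -> u64 {
	if params.liquidity_penalty_multiplier_msat == 0
		&& params.liquidity_penalty_amount_multiplier_msat == 0
	{
		// Both multipliers are zero, so the penalty is zero: skip the
		// log-probability math entirely.
		return 0;
	}
	let negative_log10 = -success_probability.log10();
	// (The amount-based term is elided in this sketch.)
	(params.liquidity_penalty_multiplier_msat as f64 * negative_log10) as u64
}
```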
2024-12-19 20:04:34 +00:00
Matt Corallo
e65900254e Update the default scoring parameters to use historical model only
Utilizing the results of probes sent once a minute to a random node
in the network, each for a random amount (within a reasonable
range), we were able to analyze the accuracy of our resulting
success probability estimation with various PDFs across the
historical and live-bounds models.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

In every case, the historical model performed substantially better
than the live-bounds model, so here we simply disable the
live-bounds model by default and use only the historical model.
Further, we use the calculated total probability multiple fudge
factor (0.7886892844179266) to choose the ratio between the
historical model and the per-hop penalty (as multiplying each hop's
probability by 78% is equivalent to adding a per-hop penalty of
log10(0.78) of our probabilistic penalty).
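
(To spell out the equivalence: with the probabilistic penalty
proportional to `-log10(p)` for hop success probability `p`, we have
`-log10(0.78 * p) = -log10(p) - log10(0.78) ≈ -log10(p) + 0.108`,
i.e. a constant per-hop addition.)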

We take this opportunity to bump the penalties up a bit as well, as
anecdotally LDK users are willing to pay more than they do today to
get more successful paths.

Fixes #3040
2024-12-19 20:04:34 +00:00
Matt Corallo
e422ab207e Use a new PDF for our channel liquidity estimation when scoring
Utilizing the results of probes sent once a minute to a random node
in the network, each for a random amount (within a reasonable
range), we were able to analyze the accuracy of our resulting
success probability estimation with various PDFs.

For each candidate PDF (as well as other parameters, to be tuned in
the coming commits), we used the `min_zero_implies_no_successes`
fudge factor in `success_probability` as well as a total
probability multiple fudge factor to get both the historical
success model and the a priori model to be neither too optimistic
nor too pessimistic (as measured by the relative log-loss between
succeeding and failing hops in our sample data).

We then compared the resulting log-loss for the historical success
model and selected the candidate PDF with the lowest log-loss,
skipping a few candidates with similar resulting log-loss but with
more extreme constants (such as a power of 11 with a higher
`min_zero_implies_no_successes` penalty).

This resulted in a PDF of `128 * (1/256 + 9*(x - 0.5)^8)` with a
`min_zero_implies_no_successes` probability multiplier of 64/78.
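
(As a sanity check, this is properly normalized on [0, 1]: the
antiderivative of the PDF is `128 * (x/256 + (x - 0.5)^9)`, which
across the interval evaluates to
`128 * (1/256 + (0.5)^9 + (0.5)^9) = 128 * (2/256) = 1`.)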

Thanks to @twood22 for being a sounding board and helping analyze
the resulting PDF.
2024-12-19 20:00:52 +00:00
Matt Corallo
d414ba9128
Merge pull request #3493 from tnull/2024-12-3436-followup
Follow-ups to #3436
2024-12-19 16:25:01 +00:00
Elias Rohrer
dd91418463
Add notes to docs/README to indicate beta status of service-side integration
As a few things are missing (most importantly persistence), we add notes
that the service-side integration is currently considered 'beta'.
2024-12-19 17:12:09 +01:00
Elias Rohrer
6e06262935
Update best_block field in Confirm::best_block_updated
Previously, we wouldn't set the field as we weren't yet making use of it.
Here, we start setting the field. To this end, we make `best_block` an
`RwLock<Option<BestBlock>>` rather than `Option<RwLock<BestBlock>>`.
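
The flip matters because an `Option<RwLock<_>>` can only go from
`None` to `Some` with exclusive access, while an `RwLock<Option<_>>`
can be populated through a shared reference. A simplified sketch
(types and names are assumptions):

```rust
use std::sync::RwLock;

struct BestBlock { block_hash: [u8; 32], height: u32 }

struct Listener {
	// With RwLock<Option<BestBlock>>, the field can be set via `&self`,
	// which Option<RwLock<BestBlock>> would not allow once shared.
	best_block: RwLock<Option<BestBlock>>,
}

impl Listener {
	fn best_block_updated(&self, block_hash: [u8; 32], height: u32) {
		*self.best_block.write().unwrap() = Some(BestBlock { block_hash, height });
	}
}
```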
2024-12-18 10:12:53 +01:00
Elias Rohrer
8825fc387b
Drop disconnecting peers from the list of ignored peers
When a peer misbehaves/sends bogus data we reply with an error message
and insert it into the ignored list.

Here, we avoid having this list grow unboundedly over time by removing
peers again once they disconnect, allowing them a second chance upon
reconnection.
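
A minimal sketch of the pattern (the peer key type is a stand-in):

```rust
use std::collections::HashSet;

type PeerId = [u8; 33];

struct MessageHandler { ignored_peers: HashSet<PeerId> }

impl MessageHandler {
	fn on_bogus_data(&mut self, peer: PeerId) {
		// ... reply with an error message, then remember the peer:
		self.ignored_peers.insert(peer);
	}
	fn on_peer_disconnected(&mut self, peer: PeerId) {
		// Bounds the set and gives the peer a second chance on reconnect.
		self.ignored_peers.remove(&peer);
	}
}
```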
2024-12-18 09:54:05 +01:00
Matt Corallo
42cc4e7f4c
Merge pull request #3486 from TheBlueMatt/2024-12-async-sign
Remove the async_signing cfg flag
2024-12-17 19:41:50 +00:00
Matt Corallo
d2172e389c Update signer docs to describe which methods are async
We should update the return types on the signing methods here as
well, but we should at least start by documenting which methods are
async and which are not.

Once we complete async support for `get_per_commitment_point`, we
can change the return types as most things in the channel signing
traits will be finalized.
2024-12-17 18:36:39 +00:00
valentinewallace
0e6f47e23f
Merge pull request #3490 from TheBlueMatt/2024-12-nits
Fix fuzz deadlock
2024-12-17 13:29:15 -05:00
Matt Corallo
a91bf02b24 Fix validation of payment success in chanmon_consistency fuzzer
Prior to bcaba29f92, the
`chanmon_consistency` fuzzer checked that payments sent either
succeeded or failed as expected by looking at the `APIError` which
we received as a result of calling the send method.

bcaba29f92 removed the legacy send
method during fuzzing and attempted to replicate the old logic by
checking for events which contained the legacy `APIError` value.
While this was plenty thorough, it was somewhat brittle in that it
made assumptions about the event state of a `ChannelManager` which
turned out not to be true.

Instead, here, we validate the send correctness by examining the
`RecentPaymentDetails` list from a `ChannelManager` immediately
after sending.
2024-12-17 15:37:43 +00:00
Elias Rohrer
4cb4d66e1b
Address doc nits
We address some minor mistakes that made it into the docs before.
2024-12-17 11:06:15 +01:00
Matt Corallo
177a6122e6 Correct route construction in chanmon_consistency fuzz
bcaba29f92 started returning
pre-built `Route`s from the router in the `chanmon_consistency`
fuzzer. In doing so, it didn't properly fill in the `route_params`
field which is expected to match the requested parameters. This
caused a debug assertion failure when sending.

Here we fix this by setting the correct `route_params`.
2024-12-17 04:14:11 +00:00
Matt Corallo
351d414457 Fix deadlock in chanmon_consistency fuzzer
bcaba29f92 introduced a deadlock in
the `chanmon_consistency` fuzzer by holding a lock on the route
expectations before sending a payment, which ultimately tries to
lock the route expectations. Here we fix this deadlock.
2024-12-17 03:55:03 +00:00
Matt Corallo
54725ceee4 Marginally expand do_test_data_migration coverage
This marginally expands the test coverage in our persister
`do_test_data_migration` test.
2024-12-17 03:31:32 +00:00
Matt Corallo
dcc531ebed
Merge pull request #3481 from tnull/2024-12-add-kvstore-migration-ext
Add `MigratableKVStore` trait
2024-12-17 02:39:00 +00:00
Matt Corallo
6ad40f996a
Merge pull request #3436 from tnull/2024-12-add-lightning-liquidity-crate
Add `lightning-liquidity` crate to the workspace
2024-12-17 02:02:55 +00:00
Elias Rohrer
b0af39faaf
Add test for FilesystemStore-to-FilesystemStore migration 2024-12-16 20:07:57 +01:00
Elias Rohrer
4c1f6bdf34
Add migrate_kv_store_data util method
.. allowing users to migrate data from one store to another.
2024-12-16 20:07:56 +01:00
Elias Rohrer
96f7bfda8c
Implement MigratableKVStore for FilesystemStore
We implement the new interface on `FilesystemStore`, in particular
`list_all_keys`.
2024-12-16 20:07:56 +01:00
Elias Rohrer
d69a5ee460
Prefactor: DRY up dir entry checks and key extraction to helper methods
.. which we'll be reusing shortly.
2024-12-16 20:07:56 +01:00
Matt Corallo
f53a09d6dc
Merge pull request #3485 from dunxen/2024-12-cfgflagdualfunding
Reintroduce cfg(dual_funding) for handling of open_channel2 messages
2024-12-16 17:51:14 +00:00
Valentine Wallace
96db8aa3d2
Add onion message AsyncPaymentsContext for inbound payments
This context is included in a static invoice's blinded message paths, provided
back to us in HeldHtlcAvailable onion messages for blinded path authentication.
In future work, we will check if this context is valid and respond with a
ReleaseHeldHtlc message to release the upstream payment if so.

We also add creation methods for the HMAC used for authenticating said
blinded path.
2024-12-16 11:30:23 -05:00
Matt Corallo
c55f54802c
Merge pull request #3414 from TheBlueMatt/2024-09-async-persist-claiming-from-closed-chan
Support async `ChannelMonitorUpdate` persist for claims against closed channels
2024-12-16 16:06:48 +00:00
Elias Rohrer
f68c6c5be1
LSPS2: Limit the total number of peers
While LDK/`ChannelManager` should already introduce an upper bound on
the number of peers, here we assert that our `PeerState` map can't
grow unboundedly. To this end, we simply return an `Internal error` and
abort when we would hit the limit of 100000 peers.
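
A sketch of the shape of that guard (names are assumptions; the real
handler responds with an LSPS-level `Internal error` message rather
than a bare `Err`):

```rust
use std::collections::HashMap;

const MAX_TOTAL_PEERS: usize = 100_000;

struct PeerState { /* pending requests, in-flight channels, ... */ }

fn get_or_insert_peer_state(
	per_peer_state: &mut HashMap<[u8; 33], PeerState>, peer: [u8; 33],
) -> Result<&mut PeerState, ()> {
	if !per_peer_state.contains_key(&peer) && per_peer_state.len() >= MAX_TOTAL_PEERS {
		return Err(()); // at the limit: abort rather than grow unboundedly
	}
	Ok(per_peer_state.entry(peer).or_insert(PeerState {}))
}
```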
2024-12-16 16:19:27 +01:00
Elias Rohrer
7a8952110c
LSPS2: Include channels pending initial payment in the per-peer limit
We include any `OutboundJITChannel` that has not made it further than
`PendingInitialPayment` in the per-peer request limit, and will of
course prune it once it expires.
2024-12-16 16:19:27 +01:00
Elias Rohrer
440962e4fe
LSPS2: Prune empty PeerStates
In addition to pruning expired requests on peer disconnection, we
also regularly prune for all peers on block connection, and remove
the entire `PeerState` if it's empty after pruning (i.e., it has no
pending requests or in-flight channels left).
2024-12-16 16:19:27 +01:00
Elias Rohrer
776ede44cb
LSPS2: Also prune expired OutboundJITChannels pending initial payments
We're now also pruning any expired `OutboundJITChannels` if we haven't
seen any related HTLCs.
2024-12-16 16:19:26 +01:00
Elias Rohrer
b39c8b09ba
LSPS2: Prune expired buy requests on disconnection
.. we clean up any pending buy requests that hit their `valid_until`
time when the counterparty disconnects.
2024-12-16 16:19:24 +01:00
Duncan Dean
76608f7c29
Modify release notes for PR 3137
We will not support accepting V2 channels in the v0.1 release, but
we do need to document the API change for push_msats -> channel_negotiation_type.
2024-12-16 06:07:05 +02:00
Duncan Dean
6f8328ebd1
Reintroduce cfg(dual_funding) for handling of open_channel2 messages 2024-12-16 06:06:48 +02:00
Matt Corallo
47ca19d39e Remove the async_signing cfg flag
Now that the core features required for `async_signing` are in
place, we can go ahead and expose it publicly (rather than behind a
`cfg`-flag). We still don't have full async support for
`get_per_commitment_point`, but only one case in channel
reconnection remains. The overall logic may still have some
hiccups, but it's been in use in production at a major LDK user for
some time now. Thus, it doesn't really make sense to hide it behind
a `cfg`-flag, even if the feature is only 99% complete. Further, the
newly exposed paths are restricted to signing operations that run
async, so the risk for existing users should be incredibly low.
2024-12-16 00:39:39 +00:00
Matt Corallo
79190adcc1 DRY the pre-startup ChannelMonitorUpdate handling
This DRYs up the common `if during_startup { push background event }
else { apply ChannelMonitorUpdate }` pattern by simply inlining it
in `handle_new_monitor_update`.
2024-12-16 00:27:13 +00:00
Matt Corallo
41f703c6d5 Support async ChannelMonitorUpdates to closed chans at startup
One of the largest gaps in our async persistence functionality has
been preimage (claim) updates to closed channels. Here we finally
implement support for this (for updates which are generated during
startup).

Thanks to all the work we've built up over the past many commits,
this is a fairly straightforward patch, removing the
immediate-completion logic from `claim_mpp_part` and adding the
required in-flight tracking logic to
`apply_post_close_monitor_update`.

Like in the during-runtime case in the previous commit, we sadly
can't use the `handle_new_monitor_update` macro wholesale, as it
also handles the `Channel` resumption, which we don't do here.
2024-12-16 00:27:13 +00:00
Matt Corallo
260f8759b0 Don't double-claim MPP payments that are pending on multiple chans
On startup, we walk the preimages and payment HTLC sets on all our
`ChannelMonitor`s, re-claiming all payments which we recently
claimed. This ensures all HTLCs in any claimed payments are claimed
across all channels.

In doing so, we expect to see the same payment multiple times;
after all, it may have been received as multiple HTLCs across
multiple channels. In such cases, there's no reason to redundantly
claim the same set of HTLCs again and again. In the current code,
doing so may lead to redundant `PaymentClaimed` events, and in a
coming commit will instead cause an assertion failure.
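
A sketch of the dedup idea (types and names are stand-ins):

```rust
use std::collections::HashSet;

type PaymentHash = [u8; 32];

// Illustrative startup walk: re-claim each recently-claimed payment
// only once, even when its HTLCs appear in several `ChannelMonitor`s.
fn replay_startup_claims(per_monitor_claims: &[Vec<PaymentHash>]) {
	let mut replayed = HashSet::new();
	for claims in per_monitor_claims {
		for payment_hash in claims {
			// `insert` returns false on duplicates, skipping the redundant
			// re-claim (and thus redundant `PaymentClaimed` events).
			if replayed.insert(*payment_hash) {
				// ... re-claim this payment's HTLCs across all channels ...
			}
		}
	}
}
```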
2024-12-16 00:27:13 +00:00
Matt Corallo
e938ed74bb Support async ChannelMonitorUpdates to closed chans at runtime
One of the largest gaps in our async persistence functionality has
been preimage (claim) updates to closed channels. Here we finally
implement support for this (for updates at runtime).

Thanks to all the work we've built up over the past many commits,
this is a well-contained patch within `claim_mpp_part`, pushing
the generated `ChannelMonitorUpdate`s through the same pipeline we
use for open channels.

Sadly we can't use the `handle_new_monitor_update` macro wholesale,
as it also handles the `Channel` resumption, which we don't do
here.
2024-12-16 00:27:13 +00:00
Matt Corallo
3395938771 Add an additional variant to handle_new_monitor_update!
In d1c340a0e1 we added support in
`handle_new_monitor_update!` for handling updates without dropping
locks.

In the coming commits we'll start handling `ChannelMonitorUpdate`s
"like normal" for updates against closed channels. Here we set up
the first step by adding a new `POST_CHANNEL_CLOSE` variant on
`handle_new_monitor_update!` which attempts to handle the
`ChannelMonitorUpdate` and handles completion actions if it
finishes immediately, just like the pre-close variant.
2024-12-16 00:27:13 +00:00
Matt Corallo
1481216793 Set closed chan mon upd update_ids at creation not application
In c99d3d785d we added a new
`apply_post_close_monitor_update` method which takes a
`ChannelMonitorUpdate` (possibly) for a channel which has been
closed, sets the `update_id` to the right value to keep our updates
well-ordered, and then applies it.

Setting the `update_id` at application time here is fine - updates
don't really have an order after the channel has been closed, they
can be applied in any order - and was done for practical reasons
as calculating the right `update_id` at generation time takes a
bit more work on startup, and was impossible without new
assumptions during claim.

In the previous commit we added exactly the new assumption we need
at claiming (as it's required for the next few commits anyway), so
now the only thing stopping us is the extra complexity.

In the coming commits, we'll move to tracking post-close
`ChannelMonitorUpdate`s as in-flight like any other updates, which
requires having an `update_id` at generation-time so that we know
what updates are still in-flight.

Thus, we go ahead and eat the complexity here, creating
`update_id`s when the `ChannelMonitorUpdate`s are generated for
closed-channel updates, like we do for channels which are still
live.

We also ensure that we always insert `ChannelMonitorUpdate`s in the
pending updates set when we push the background event, avoiding a
race where we push an update as a background event, then, while it
is being processed, another update finishes and the post-update
actions get run.
2024-12-16 00:27:13 +00:00
Matt Corallo
c62cd1551a
Merge pull request #3109 from alecchendev/2024-06-async-commit-point-funding
Handle fallible per commitment point in channel establishment
2024-12-15 23:08:17 +00:00
Alec Chen
d1e94bd5ee Add test for async open and accept channel
Here we make a test that disables a channel signer's ability
to return commitment points when they are first derived for a channel.

We also fit in a couple cleanups: removing a comment referencing a
previous design with a `HolderCommitmentPoint::Uninitialized` variant,
as well as adding coverage for updating channel maps in async closing
signed.
2024-12-13 13:06:31 -08:00
Alec Chen
e64af019f3 Handle fallible commitment point when getting channel_ready
Here we handle the case where our signer is pending the next commitment
point when we try to send channel ready. We set a flag to remember to
send this message when our signer is unblocked. This follows the same
general pattern as everywhere else where we're waiting on a commitment
point from the signer in order to send a message.
2024-12-13 13:06:31 -08:00
Alec Chen
8058a600d0 Handle fallible commitment point when getting accept_channel
Similar to `open_channel`, if a signer cannot provide a commitment point
immediately, we set a flag to remember we're waiting for a point to send
`accept_channel`. We make sure to get the first two points before moving
on, so when we advance our commitment we always have a point available.
2024-12-13 13:06:31 -08:00
Alec Chen
08251ca14b Move setting signer flags to get_funding_created_msg
For all of our async signing logic in channel establishment v1, we set
signer flags in the method where we create the raw lightning message
object. To keep things consistent, this commit moves setting the signer
flags to where we create funding_created, since this was being set
elsewhere before.

While we're doing this cleanup, this also slightly refactors our
funding_signed method to move some code out of an indent, as well
as removing a log to fix a nit from #3152.
2024-12-13 13:06:31 -08:00
Alec Chen
5026d7114c Handle fallible commitment point for open_channel message
In the event that a signer cannot provide a commitment point
immediately, we set a flag to remember we're waiting for this before we
can send `open_channel`. We make sure to get the first two commitment
points, so when we advance commitments, we always have a commitment
point available.

When initializing a context, we set the `signer_pending_open_channel`
flag to false, and leave setting this flag for where we attempt to
generate a message.

When checking which messages to send once the signer is unblocked, we
must handle both the case where we haven't gotten any commitment point
and the case where we've gotten the first but not the second point.
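
A sketch of the two-points-ahead shape (illustrative only; the real
`HolderCommitmentPoint` differs in detail):

```rust
type Point = [u8; 33];

enum HolderCommitmentPoint {
	// We have the current point but the signer hasn't produced the next yet.
	PendingNext { current: Point },
	// We have both, so advancing always leaves a point available.
	Available { current: Point, next: Point },
}

impl HolderCommitmentPoint {
	fn try_advance(&mut self, get_next_point: impl Fn() -> Option<Point>) {
		if let HolderCommitmentPoint::Available { next, .. } = self {
			let current = *next;
			*self = match get_next_point() {
				Some(next) => HolderCommitmentPoint::Available { current, next },
				// Signer is pending: remember we're waiting for the next point.
				None => HolderCommitmentPoint::PendingNext { current },
			};
		}
	}
}
```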
2024-12-13 13:06:31 -08:00
Alec Chen
2de72f3490 Remove holder commitment point from channel context
Following a previous commit adding `HolderCommitmentPoint` elsewhere, we
make the transition to use those commitment points and remove the
existing one.
2024-12-13 13:06:28 -08:00