When we forward a payment and receive an `update_fulfill_htlc`
message from the downstream channel, we immediately claim the HTLC
on the upstream channel, before even doing a `commitment_signed`
dance on the downstream channel. This implies that our
`ChannelMonitorUpdate`s "go out" in the right order - first we
ensure we'll get our money by writing the preimage down, then we
write the update that resolves our giving money away on the
downstream channel.
This is safe as long as `ChannelMonitorUpdate`s complete in the
order in which they are generated, but of course looking forward we
want to support asynchronous updates, which may complete in any
order.
Thus, here, we enforce the correct ordering by blocking the
downstream `ChannelMonitorUpdate` until the upstream one completes.
As with the `PaymentSent` event handling, we do so only for the
`revoke_and_ack` `ChannelMonitorUpdate`, ensuring the
preimage-containing upstream update has a full RTT to complete
before we actually manage to slow anything down.
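As a rough sketch of the ordering constraint (names here are hypothetical, not LDK's actual types), the `revoke_and_ack` update on the downstream channel can simply be parked until the upstream, preimage-carrying update reports completion:

```rust
// Hypothetical sketch, not LDK's actual API: the downstream (HTLC-removal)
// update is held back until the upstream (preimage-containing) update has
// completed persisting.
struct MonitorUpdate { update_id: u64 }

#[derive(Default)]
struct DownstreamChannel {
    blocked_updates: Vec<MonitorUpdate>,
    upstream_preimage_update_complete: bool,
}

impl DownstreamChannel {
    // Returns the update only if it is safe to persist it immediately.
    fn queue_raa_update(&mut self, update: MonitorUpdate) -> Option<MonitorUpdate> {
        if self.upstream_preimage_update_complete {
            Some(update)
        } else {
            self.blocked_updates.push(update);
            None
        }
    }

    // Once the upstream update completes, release everything held back,
    // in the order it was generated.
    fn upstream_update_completed(&mut self) -> Vec<MonitorUpdate> {
        self.upstream_preimage_update_complete = true;
        std::mem::take(&mut self.blocked_updates)
    }
}
```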
In 0ad1f4c943 we fixed a nasty bug
where a failure to persist a `ChannelManager` faster than a
`ChannelMonitor` could result in the loss of a `PaymentSent` event,
eventually resulting in a `PaymentFailed` instead!
As noted in that commit, there's still some risk, though it's been
substantially reduced - if we receive an `update_fulfill_htlc`
message for an outbound payment, and persist the initial removal
`ChannelMonitorUpdate`, then respond with our own
`commitment_signed` + `revoke_and_ack`, followed by receiving our
peer's final `revoke_and_ack`, and then persist the
`ChannelMonitorUpdate` generated from that, all prior to completing
a `ChannelManager` persistence, we'll still forget the HTLC and
eventually trigger a `PaymentFailed` rather than the correct
`PaymentSent`.
Here we fully fix the issue by delaying the final
`ChannelMonitorUpdate` persistence until the `PaymentSent` event
has been processed and document the fact that a spurious
`PaymentFailed` event can still be generated for a sent payment.
The original fix in 0ad1f4c943 is
still incredibly useful here, allowing us to avoid blocking the
first `ChannelMonitorUpdate` until the event processing completes,
as this would cause us to add event-processing delay to our general
commitment update latency. Instead, we ultimately race the user
handling the `PaymentSent` event with how long it takes our
`revoke_and_ack` + `commitment_signed` to make it to our
counterparty and receive the response `revoke_and_ack`. This should
give the user plenty of time to handle the event before we need to
make progress.
Sadly, because we change our `ChannelMonitorUpdate` semantics, this
change requires a number of test changes, avoiding checking for a
post-RAA `ChannelMonitorUpdate` until after we process a
`PaymentSent` event. Note that this does not apply to payments we
learned the preimage for on-chain - ensuring `PaymentSent` events
from such resolutions will be addressed in a future PR. Thus, tests
which resolve payments on-chain switch to a direct call to the
`expect_payment_sent` function with the claim-expected flag unset.
When reloading a node in the test framework, we end up with a new
`ChannelManager` that has references to various test util structs.
In order for the tests to compile reliably in the face of unrelated
changes, those test structs need to always be initialized before
both the new and the original `ChannelManager`.
Here we make that change.
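A tiny, self-contained illustration of why the ordering matters (generic names, not the actual test utils): locals drop in reverse declaration order, so anything borrowed by either `ChannelManager` must be declared before both of them.

```rust
// Generic illustration only: the borrowed utility must be declared before
// every manager that borrows it, including one created later on "reload",
// so that it is dropped after both of them.
struct TestUtil(u64);
struct Manager<'a> { _util: &'a TestUtil }

fn main() {
    let util = TestUtil(42); // declared first => dropped last
    let original = Manager { _util: &util };
    let reloaded = Manager { _util: &util };
    let _ = (&original, &reloaded);
}
```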
Makes it easier to add new arguments without a ton of resulting test changes.
Useful for route blinding testing because we need to check for malformed HTLCs,
which is not currently supported by `reconnect_nodes`. It also makes it easier to
tell what is being checked in relevant tests.
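A rough sketch of the pattern (field names purely illustrative, not the exact helper): bundling the flags into a struct with defaults means new checks can be added without touching every existing call site.

```rust
// Illustrative-only sketch of an arguments struct for a test helper.
#[derive(Default)]
struct ReconnectArgs {
    send_channel_ready: (bool, bool),
    pending_htlc_adds: (usize, usize),
    pending_htlc_claims: (usize, usize),
    expect_malformed_htlc_fails: (usize, usize),
}

fn reconnect_nodes(args: ReconnectArgs) {
    // ... drive the reconnection and assert on the expectations in `args` ...
    let _ = args;
}

fn main() {
    // Callers only name the expectations they care about.
    reconnect_nodes(ReconnectArgs { pending_htlc_claims: (1, 0), ..Default::default() });
}
```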
If a `ChannelMonitorUpdate` completes being persisted, but the
`ChannelManager` isn't informed thereof (or isn't persisted) before
shutdown, on startup we may still have it listed as in-flight. When
we compare the available `ChannelMonitor` with the in-flight set,
we'll notice it completed and remove it; however, this may leave
some post-update actions dangling which need to complete.
Here we handle this with a new `BackgroundEvent` indicating we need
to handle any post-update action(s) for a given channel.
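Roughly, the shape of such an event and its startup handling (types simplified here, not the exact definition):

```rust
// Simplified sketch: a startup-time background event that lets us replay
// post-update actions for a channel whose monitor update had completed but
// was never marked complete before shutdown.
enum BackgroundEvent {
    MonitorUpdatesComplete {
        counterparty_node_id: [u8; 33],
        channel_id: [u8; 32],
    },
}

fn process_background_events(events: Vec<BackgroundEvent>) {
    for event in events {
        match event {
            BackgroundEvent::MonitorUpdatesComplete { channel_id, .. } => {
                // Look up the channel and run any deferred post-update
                // actions (events to emit, HTLCs to forward or fail).
                let _ = channel_id;
            }
        }
    }
}
```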
This is a first step for simplifying the channel state and introducing
new unfunded channel types that hold similar state before being promoted
to funded channels.
Essentially, we want the outer `Channel` type (and upcoming channel types)
to wrap the context so we can apply typestate patterns to that wrapper
while also deduplicating code for common state and other internal fields.
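A minimal sketch of the wrapper idea (names illustrative): shared state lives in a context struct, and each channel phase wraps it so only operations valid for that phase are exposed.

```rust
// Illustrative typestate sketch: common fields are deduplicated in a
// context, and promotion from an unfunded to a funded channel simply moves
// the context into the next wrapper.
struct ChannelContext {
    channel_id: [u8; 32],
    channel_value_satoshis: u64,
}

struct UnfundedChannel { context: ChannelContext }
struct FundedChannel { context: ChannelContext }

impl UnfundedChannel {
    // Only the unfunded phase can be promoted; funded-only operations live
    // solely on FundedChannel.
    fn into_funded(self) -> FundedChannel {
        FundedChannel { context: self.context }
    }
}

impl FundedChannel {
    fn value_sats(&self) -> u64 { self.context.channel_value_satoshis }
}
```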
This was a fairly old introduction to the spec to allow nodes to indicate
to their peers what chains they are interested in (i.e. will open channels
and gossip for).
We don't do any of the handling of this message in this commit and leave
that to the very next commit, so the behaviour is effectively the same
(ignore networks preference).
`rust-bitcoin v0.30.0` introduces concrete variants for data members of
block `Header`s. To avoid having to update these across every use, we
introduce new helpers to create dummy blocks and headers, such that the
update process is a bit more straightforward.
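For example, such a helper might look roughly like the following (written against rust-bitcoin 0.30-style types; exact paths and constructors are assumptions and may differ slightly between versions):

```rust
use bitcoin::blockdata::block::{Header, Version};
use bitcoin::hash_types::{BlockHash, TxMerkleNode};
use bitcoin::hashes::Hash;
use bitcoin::pow::CompactTarget;

// Sketch of a dummy-header helper: the concrete values only need to be
// well-formed, not meaningful, since tests rarely care about PoW validity.
fn dummy_header(prev_blockhash: BlockHash, time: u32) -> Header {
    Header {
        version: Version::from_consensus(0),
        prev_blockhash,
        merkle_root: TxMerkleNode::all_zeros(),
        time,
        bits: CompactTarget::from_consensus(42),
        nonce: 42,
    }
}
```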
In the coming commits, we need to delay `ChannelMonitorUpdate`s
until future actions (specifically `Event` handling). However,
because we should only notify users once of a given
`ChannelMonitorUpdate` and they must be provided in-order, we need
to track which ones have or have not been given to users and, once
updating resumes, release the ones that haven't already made it to
users.
To do this we simply add a `bool` alongside each `ChannelMonitorUpdate`
in the set stored in the `Channel`, indicating whether that update has
been released to users, and decline to provide new updates back to the
`ChannelManager` if any pending update has not yet been released.
Further, because we'll now be releasing `ChannelMonitorUpdate`s
which were already stored in the pending list, we now need to
support getting a `Completed` result for a monitor which isn't the
only pending monitor (or even out of order), thus we also rewrite
the way monitor updates are marked completed.
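In sketch form (hypothetical types, not the actual fields), the released-tracking looks something like this:

```rust
// Hypothetical sketch: each queued update carries a flag recording whether
// it has already been handed to the user; new updates are only released
// immediately if everything queued before them already went out.
struct MonitorUpdate { update_id: u64 }

#[derive(Default)]
struct PendingUpdates {
    updates: Vec<(MonitorUpdate, bool)>, // (update, released-to-user)
}

impl PendingUpdates {
    // Queue a new update, returning whether it may be released right away.
    fn push(&mut self, update: MonitorUpdate) -> bool {
        let all_prior_released = self.updates.iter().all(|(_, released)| *released);
        self.updates.push((update, all_prior_released));
        all_prior_released
    }

    // Called when updating resumes: mark and return, in order, the ids of
    // every update that never made it out, so each is surfaced exactly once.
    fn resume(&mut self) -> Vec<u64> {
        self.updates
            .iter_mut()
            .filter(|(_, released)| !*released)
            .map(|(update, released)| {
                *released = true;
                update.update_id
            })
            .collect()
    }
}
```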
Now that we guarantee `claim_payment` will always succeed, we have
to let the user know what the deadline is. We still fail payments
if they haven't been claimed in time, and we now expose that
deadline in `PaymentClaimable`.
This moves the public payment sending API from passing an explicit
`PaymentSecret` to a new `RecipientOnionFields` struct (which
currently only contains the `PaymentSecret`). This gives us
substantial additional flexibility as we look to add support for
`PaymentMetadata`, a new (well, year-or-two-old) BOLT11 invoice
extension for providing additional data sent to the recipient.
In the future, we should also add the ability to add custom TLV
entries in the `RecipientOnionFields` struct.
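A hedged sketch of the new call shape (import paths reflect recent LDK versions and may differ slightly): callers now build a `RecipientOnionFields` rather than handing the secret to the send path directly.

```rust
use lightning::ln::channelmanager::RecipientOnionFields;
use lightning::ln::PaymentSecret;

// Today this only carries the payment secret; payment metadata (and,
// later, custom TLVs) are intended to be added to the same struct.
fn onion_fields_for(payment_secret: PaymentSecret) -> RecipientOnionFields {
    RecipientOnionFields::secret_only(payment_secret)
}
```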
Currently, users don't have good way of being notified when channel open
negotiations have succeeded and new channels are pending confirmation on
chain. To this end, we add a new `ChannelPending` event that is emitted
when we send or receive a `funding_signed` message, i.e., at the last
moment before waiting for the confirmation period.
We track whether the event had previously been emitted in `Channel` and
remove it from `internal_funding_created` entirely. Hence, we now
only emit the event after `ChannelMonitorUpdate` completion, or upon
channel reestablish. This mitigates a race condition where we
wouldn't persist the event *and* wouldn't regenerate it on restart,
therefore potentially losing it, if an async `ChannelMonitorUpdate`
didn't complete before the `ChannelManager` was persisted.
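A hedged sketch of consuming the new event (module path as in current LDK; the fields shown are a subset, matched loosely with `..`):

```rust
use lightning::events::Event;

fn handle_event(event: Event) {
    match event {
        Event::ChannelPending { channel_id, counterparty_node_id, funding_txo, .. } => {
            // Funding is negotiated and signed; the channel now just awaits
            // confirmation of `funding_txo` before it becomes usable.
            let _ = (channel_id, counterparty_node_id, funding_txo);
        },
        _ => {},
    }
}
```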
This is largely motivated by some follow-up work for anchors that will
introduce an event handler for `BumpTransaction` events, which we can
now include in this new top-level `events` module.
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus,
it's generally unsafe to do if we can avoid it.
To discourage doing this, here we default to panicking on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.
This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
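A minimal, self-contained illustration of the hazard (plain `std` types; the second read is left commented out so the example terminates everywhere):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let lock = Arc::new(RwLock::new(0u32));

    // Thread A (main) takes and holds a read lock.
    let guard = lock.read().unwrap();

    // Thread B blocks waiting for the write lock.
    let writer_lock = Arc::clone(&lock);
    let writer = thread::spawn(move || {
        let mut value = writer_lock.write().unwrap();
        *value += 1;
    });

    // On writer-priority implementations (and on the in-tree FairRwLock),
    // a second read here can block behind the pending writer, which in turn
    // is blocked behind `guard` -- a deadlock. Commented out deliberately:
    // let _second = lock.read().unwrap();

    drop(guard);
    writer.join().unwrap();
}
```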
We currently have two codepaths on most channel update functions -
most methods return a set of messages to send a peer iff the
`ChannelMonitorUpdate` succeeds, but if it does not we push the
messages back into the `Channel` and then pull them back out when
the `ChannelMonitorUpdate` completes and send them then. This adds
a substantial amount of complexity in very critical codepaths.
Instead, here we swap all our channel update codepaths to
immediately set the channel-update-required flag and only return a
`ChannelMonitorUpdate` to the `ChannelManager`. Internally in the
`Channel` we store a queue of `ChannelMonitorUpdate`s, which will
become critical in future work to surface pending
`ChannelMonitorUpdate`s to users at startup so they can complete.
This leaves some redundant work in `Channel` to be cleaned up
later. Specifically, we still generate the messages which we will
now ignore and regenerate later.
This commit updates the `ChannelMonitorUpdate` pipeline across all
the places we generate them.
Long ago, we used the `no_connection_possible` flag to signal that
a peer has some unknown feature set or that some other condition
prevents us from ever connecting to the given peer. In that case we'd
automatically force-close all channels with the given peer. This
was somewhat surprising to users so we removed the automatic
force-close, leaving the flag serving no LDK-internal purpose.
Distilling the concept of "can we connect to this peer again in the
future" to a simple flag turns out to be rife with edge cases, so
users actually using the flag to force-close channels would likely
cause surprising behavior.
Thus, there's really not a lot of reason to keep the flag,
especially given it's untested and likely to be broken in subtle
ways anyway.
In the next commit(s) we'll start holding `ChannelMonitorUpdate`s
that are being persisted in `Channel`s until they're done
persisting. In order to do that, switch to applying the updates by
reference instead of value.
This is purely a refactor that does not change the `InitFeatures`
advertised by a `ChannelManager`. This allows users to configure which
features should be advertised based on the values of `UserConfig`. While
there aren't any existing features currently leveraging this behavior,
it will be used by the upcoming `anchors_zero_fee_htlc_tx` feature.
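A rough sketch of the config-driven pattern (names here are illustrative of the approach, not the exact LDK signature):

```rust
// Illustrative sketch: the advertised feature set is computed from the
// user's config rather than being a fixed constant, so a config toggle can
// flip a feature bit on or off.
struct UserConfig {
    negotiate_anchors_zero_fee_htlc_tx: bool,
}

struct InitFeatures {
    anchors_zero_fee_htlc_tx: bool,
}

fn provided_init_features(config: &UserConfig) -> InitFeatures {
    InitFeatures {
        anchors_zero_fee_htlc_tx: config.negotiate_anchors_zero_fee_htlc_tx,
    }
}
```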
The `UserConfig` dependency on `provided_init_features` caused most
callsites of the main test methods responsible for opening channels to
be updated. This commit forgoes that completely by no longer requiring
the `InitFeatures` of each side to be provided to these methods. The
methods already require a reference to each node's `ChannelManager` to
open the channel, so we use that same reference to obtain their
`InitFeatures`. A way to override such features was required for some
tests, so a new `override_init_features` config option now exists on
the test harness.
Since `ChannelMonitor`s will now re-derive signers rather than
persisting them, we can no longer use the `OnlyReadsKeysInterface`
concrete implementation.
When we process a `channel_reestablish` message we free the HTLC
update holding cell as things may have changed while we were
disconnected. However, some time ago, to handle freeing from the
holding cell when a monitor update completes, we added a holding
cell freeing check in `get_and_clear_pending_msg_events`. This
leaves the in-`channel_reestablish` holding cell clear redundant,
as doing it immediately or in `get_and_clear_pending_msg_events` is
not a user-visible difference.
Thus, we remove the redundant code here, substantially simplifying
`handle_chan_restoration_locked` while we're at it.
Asserting that specific log entries were printed isn't all that
useful; we should really be focusing on the expected messages (or,
when a monitor update fails, the lack thereof). In the next commit
one of these log checks would otherwise break due to the particular
time a monitor update fails changing, but I also plan on reworking
the monitor update flows substantially soon, breaking lots of them.
In c986e52ce8, an `MppId` was added
to `HTLCSource` objects as a way of correlating HTLCs which belong
to the same payment when the `ChannelManager` sees an HTLC
succeed/fail. This allows it to have awareness of the state of all
HTLCs in a payment when it generates the ultimate user-facing
payment success/failure events. This was used in the same PR to
avoid generating duplicative success/failure events for a single
payment.
Because the field was only used as an internal token to correlate
HTLCs, and retries were not supported, it was generated randomly by
calling the `KeysInterface`'s 32-byte random-fetching function.
This also provided a backwards-compatibility story as the existing
HTLC randomization key was re-used for older clients.
In 28eea12bbe `MppId` was renamed to
the current `PaymentId`, which was then used to expose the
`retry_payment` interface, allowing users to send new HTLCs which
are considered a part of an existing payment.
At no point has the payment-sending API seriously considered
idempotency, a major drawback which leaves the API unsafe in most
deployments. Luckily, there is a simple solution - because the
`PaymentId` must be unique, and because payment information for a
given payment is held for several blocks after a payment
completes/fails, it represents an obvious idempotency token.
Here we simply require that the user provide the `PaymentId`
directly in `send_payment`, allowing them to use whatever token they
may already have for a payment as its idempotency token.
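For instance, a caller that doesn't track its own identifiers can derive a perfectly good idempotency token from data it already has (the types below are LDK's 32-byte newtypes; treating the payment hash as the token is just one reasonable choice):

```rust
use lightning::ln::channelmanager::PaymentId;
use lightning::ln::PaymentHash;

// PaymentId is an opaque 32-byte token; reusing the payment hash bytes
// makes retries of the same payment naturally idempotent.
fn idempotency_token(payment_hash: &PaymentHash) -> PaymentId {
    PaymentId(payment_hash.0)
}
```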