`BackgroundEvent` was used to store `ChannelMonitorUpdate`s which
result in a channel force-close, avoiding relying on
`ChannelMonitor`s having been loaded while `ChannelManager`
block-connection methods are called during startup.
In the coming commit(s) we'll also generate non-channel-closing
`ChannelMonitorUpdate`s during startup, which will need to be
replayed prior to any other `ChannelMonitorUpdate`s generated from
normal operation.
In the next commit we'll address that by processing `BackgroundEvent`s
immediately after locking the `total_consistency_lock`.
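A rough sketch of the shape this takes, using simplified stand-in types and field names rather than the actual LDK API:

```rust
use std::collections::VecDeque;
use std::sync::{Mutex, RwLock};

// Hypothetical, simplified stand-ins for the real LDK types.
struct ChannelMonitorUpdate { update_id: u64 }
enum BackgroundEvent {
    // A monitor update generated on startup that must be applied before any
    // updates produced by normal operation.
    MonitorUpdateRegeneratedOnStartup(ChannelMonitorUpdate),
}

struct Manager {
    total_consistency_lock: RwLock<()>,
    pending_background_events: Mutex<VecDeque<BackgroundEvent>>,
}

impl Manager {
    // Drain the queued background events right after taking the consistency
    // lock, so startup-generated monitor updates are replayed first.
    fn process_background_events(&self) {
        let _read_guard = self.total_consistency_lock.read().unwrap();
        let events: Vec<BackgroundEvent> =
            self.pending_background_events.lock().unwrap().drain(..).collect();
        for event in events {
            match event {
                BackgroundEvent::MonitorUpdateRegeneratedOnStartup(update) => {
                    // Hand the update to the chain watcher here (elided).
                    let _ = update.update_id;
                }
            }
        }
    }
}
```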
Rather than letting `AChannelManager`'s bounds require everything to
be `Sized`, we make them explicitly `?Sized`. We also make the
trait no longer test-only as it will be used in a coming commit.
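A minimal illustration of the change, using stand-in trait names rather than the real `AChannelManager` bounds:

```rust
// With a plain bound, each associated type gets an implicit `Sized`
// requirement; adding `?Sized` relaxes that so trait objects and other
// unsized types qualify.
trait Watch {}
trait Broadcaster {}

trait AChannelManager {
    type Watcher: Watch + ?Sized;
    type B: Broadcaster + ?Sized;
}
```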
In the coming commits we'll need the counterparty node_id when
handling a background monitor update as we may need to resume
normal channel operation as a result. Thus, we go ahead and pipe it
through from the shutdown end, as it makes the codepaths
consistent.
Sadly, the monitor-originated shutdown case can't require a
counterparty node_id, as some versions of LDK didn't store it in the
`ChannelMonitor`.
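A rough sketch of the two closure paths, with simplified stand-in types: the manager-originated path always has the counterparty, while the monitor-originated path may not.

```rust
type PublicKey = [u8; 33];

fn close_from_manager(channel_id: [u8; 32], counterparty_node_id: PublicKey) {
    // Normal path: the node_id is always known, so later commits can use it
    // to resume normal channel operation once the monitor update completes.
    let _ = (channel_id, counterparty_node_id);
}

fn close_from_monitor(channel_id: [u8; 32], counterparty_node_id: Option<PublicKey>) {
    // Monitors written by some past LDK versions lack the counterparty
    // node_id, so it must stay optional here.
    let _ = (channel_id, counterparty_node_id);
}
```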
This makes it much clearer, at the sites generating such events, that
they will be lost on restart, reducing the risk of bugs creeping in
due to lost monitor updates.
In d4810087c1 we added logic to apply `ChannelMonitorUpdate`s which
were part of a channel closure asynchronously via a background queue,
to address some startup issues. When we did that we persisted those
updates to ensure we replayed them on the next startup.
However, there was no reason to: if we persisted and then restarted
without those monitor updates, we'd find a monitor without a channel,
which we'd tell to broadcast the latest commitment transaction to
force-close.
Since adding that logic, we've used the same background queue for
several purposes.
The previous commits set up the ability for us to hold
`ChannelMonitorUpdate`s which are pending until we're ready to pass
them to users and have them be applied. However, if the
`ChannelManager` is persisted while we're waiting to give the user
a `ChannelMonitorUpdate` we'll be confused on restart - seeing our
latest `ChannelMonitor` state as stale compared to our
`ChannelManager` - a critical error.
Luckily the solution is trivial: we simply need to store the
pending `ChannelMonitorUpdate` state and load it with the
`ChannelManager` data, allowing stale monitors on load as long as
we have the missing pending updates between where we are and the
latest `ChannelMonitor` state.
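A hypothetical, simplified sketch of the approach (struct and field names are illustrative): the manager persists the per-channel updates it hasn't yet handed to the user, and on load replays any the stale monitor is missing.

```rust
struct ChannelMonitorUpdate { update_id: u64 }

struct ChannelMonitor { latest_update_id: u64 }

struct PersistedChannel {
    // Updates generated before shutdown which the user never got to apply.
    monitor_pending_updates: Vec<ChannelMonitorUpdate>,
}

// A monitor may be older than the manager on load, as long as every missing
// update is present in the persisted pending list.
fn replay_missing_updates(monitor: &mut ChannelMonitor, chan: &PersistedChannel) {
    for update in &chan.monitor_pending_updates {
        if update.update_id > monitor.latest_update_id {
            // Apply the update to the monitor (application details elided).
            monitor.latest_update_id = update.update_id;
        }
    }
}
```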
This adds handling of the new `EventCompletionAction`s after
`Event`s are handled, letting `ChannelMonitorUpdate`s which were
blocked fly after a relevant `Event`.
This will allow us to block `ChannelMonitorUpdate`s on `Event`
processing in the next commit.
Note that this gets dangerously close to breaking forwards
compatibility - if we have an `Event` with an
`EventCompletionAction` tied to it, we persist a new, even, TLV in
the `ChannelManager`. Hopefully this should be uncommon, as it
implies an `Event` was delayed until after a full round-trip to a
peer.
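A simplified sketch (names are illustrative, not the exact LDK enum) of pairing each queued event with an optional completion action that only runs after the event has been handled:

```rust
enum Event { PaymentClaimed { amount_msat: u64 } }

enum EventCompletionAction {
    // Unblock a ChannelMonitorUpdate that was held back pending this event.
    ReleaseBlockedMonitorUpdate,
}

fn process_pending_events<H: FnMut(Event)>(
    pending: Vec<(Event, Option<EventCompletionAction>)>,
    mut handler: H,
) {
    for (event, action) in pending {
        handler(event);
        if let Some(EventCompletionAction::ReleaseBlockedMonitorUpdate) = action {
            // The user has now seen the event, so the blocked
            // ChannelMonitorUpdate is free to fly (details elided).
        }
    }
}
```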
In the coming commits, we need to delay `ChannelMonitorUpdate`s
until future actions (specifically `Event` handling). However,
because we should only notify users once of a given
`ChannelMonitorUpdate` and they must be provided in-order, we need
to track which ones have or have not been given to users and, once
updating resumes, fly the ones that haven't already made it to
users.
To do this we simply add a `bool` to each entry in the
`ChannelMonitorUpdate` set stored in the `Channel` which indicates
whether an update has flown, and decline to provide new updates back
to the `ChannelManager` if any updates have their flown bit unset.
Further, because we'll now be releasing `ChannelMonitorUpdate`s
which were already stored in the pending list, we now need to
support getting a `Completed` result for a monitor which isn't the
only pending monitor (or even out of order), thus we also rewrite
the way monitor updates are marked completed.
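A rough sketch, with invented field names, of tracking which pending updates have already been handed to the user ("flown") and only releasing new ones once everything ahead of them has flown:

```rust
struct ChannelMonitorUpdate { update_id: u64 }

struct PendingUpdate {
    update: ChannelMonitorUpdate,
    flown: bool, // has this update been given to the user yet?
}

struct Channel { pending_monitor_updates: Vec<PendingUpdate> }

impl Channel {
    // New updates are only released if no earlier update is still unflown.
    fn can_release_new_update(&self) -> bool {
        self.pending_monitor_updates.iter().all(|u| u.flown)
    }

    // Completion can now arrive for any pending update, not just the oldest,
    // so we drop every entry at or below the completed update_id.
    fn monitor_update_completed(&mut self, completed_update_id: u64) {
        self.pending_monitor_updates
            .retain(|u| u.update.update_id > completed_update_id);
    }
}
```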
While these transactions were still valid, we incorrectly assumed that
they would propagate with a locktime of `current_height + 1`, when in
reality, only those with a locktime strictly lower than the next height
in the chain are allowed to enter the mempool.
In a future commit, we plan to correctly enforce, in
`TestBroadcaster::broadcast_transaction`, that the spending
transaction has a valid locktime relative to the chain of the node
broadcasting it. We catch up these test node instances to their
expected height here, such that we do not fail said enforcement.
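A minimal illustration of the mempool rule referenced above: a transaction whose `lock_time` is a block height is only accepted once it is strictly lower than the next height to be mined, i.e. at most the current height.

```rust
fn locktime_is_final(tx_lock_time: u32, current_height: u32) -> bool {
    let next_height = current_height + 1;
    // `lock_time < next_height` (equivalently `lock_time <= current_height`)
    // is required for the transaction to enter the mempool now.
    tx_lock_time < next_height
}

fn main() {
    let current_height = 100;
    assert!(locktime_is_final(100, current_height));  // can broadcast now
    assert!(!locktime_is_final(101, current_height)); // must wait one more block
}
```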
Unfortunately, the RAII types used by `RwLock` are not `Send`, which is
why they can't be held over `await` boundaries. In order to allow
asynchronous event processing in multi-threaded environments, we here
allow events to be processed without holding the `total_consistency_lock`.
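A sketch of the issue and the workaround, assuming a plain std `RwLock` as a stand-in for the real lock:

```rust
use std::sync::RwLock;

async fn some_async_handler() {}

// This version would fail a `Send` bound on the returned future, because the
// non-`Send` read guard would live across the `.await`:
//
// async fn process_holding_lock(lock: &RwLock<()>) {
//     let _guard = lock.read().unwrap();
//     some_async_handler().await;
// }

// Instead, the lock is only taken briefly and released before awaiting the
// user's event handler (simplified):
async fn process_without_holding_lock(lock: &RwLock<()>) {
    {
        let _guard = lock.read().unwrap();
        // collect pending events under the lock...
    } // guard dropped here
    some_async_handler().await; // handlers awaited without the lock held
}
```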
This finally completes the piping of the `payment_metadata` from
the BOLT11 invoice on the sending side all the way through the
onion sending + receiving ends to the user in the receive-side events.
When we receive an HTLC, we want to pass the `payment_metadata`
through to the `PaymentClaimable` event. This does most of the
internal refactoring required to do so - storing a
`RecipientOnionFields` in the inbound HTLC tracking structs,
including the `payment_metadata`.
In the future this struct will allow us to do MPP keysend receipts
(as it now stores an optional `payment_secret` for all inbound
payments) as well as custom TLV receipts (as the struct is
extensible to store additional fields and the internal API supports
filtering for fields which are consistent across HTLCs).
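A simplified sketch (field names are illustrative, not the exact LDK struct layout) of keeping the recipient-onion data with each inbound HTLC and filtering it for consistency across MPP parts:

```rust
#[derive(PartialEq)]
struct PaymentSecret([u8; 32]);

struct RecipientOnionFields {
    payment_secret: Option<PaymentSecret>,
    payment_metadata: Option<Vec<u8>>,
    // Extensible: custom TLVs or other future fields can live here too.
}

struct ClaimableHTLC {
    amount_msat: u64,
    onion_fields: RecipientOnionFields,
}

impl RecipientOnionFields {
    // Keep only fields that match across two HTLCs of the same payment,
    // mirroring the "consistent across HTLCs" filtering described above.
    fn check_merge(&mut self, other: &Self) -> Result<(), ()> {
        if self.payment_secret != other.payment_secret { return Err(()); }
        if self.payment_metadata != other.payment_metadata {
            self.payment_metadata = None;
        }
        Ok(())
    }
}
```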
If we receive an HTLC and are processing it as a potential MPP part,
we always continue in the per-HTLC loop if we call the `fail_htlc`
macro, thus it's nice to actually do the `continue` therein rather
than at the callsites.
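A toy illustration of the pattern: putting the `continue` inside the macro means every failure path in the per-HTLC loop behaves identically.

```rust
macro_rules! fail_htlc {
    ($htlc:expr) => {{
        // queue a failure back for $htlc (elided)...
        let _ = &$htlc;
        continue; // always move on to the next HTLC part
    }};
}

fn process_parts(parts: Vec<u64>) {
    for amt_msat in parts {
        if amt_msat == 0 {
            fail_htlc!(amt_msat);
        }
        // ...normal MPP-part handling...
    }
}
```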
If we add an entry to `claimable_payments` we have to ensure we
actually accept the HTLC we're considering, otherwise we'll end up
with an empty `claimable_payments` entry.
This adds the new `payment_metadata` to `RecipientOnionFields`,
passing the metadata from BOLT11 invoices through the send pipeline
and finally copying it into the onion when sending HTLCs.
This completes send-side support for the new payment metadata
feature.
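A hypothetical usage sketch of the send side: the invoice's `payment_metadata` rides along in the onion fields and is copied into the final hop's payload (names here are illustrative, not the exact LDK encoding).

```rust
#[derive(Clone, Copy)]
struct PaymentSecret([u8; 32]);

struct RecipientOnionFields {
    payment_secret: Option<PaymentSecret>,
    payment_metadata: Option<Vec<u8>>,
}

struct FinalHopPayload {
    amt_msat: u64,
    payment_secret: Option<[u8; 32]>,
    payment_metadata: Option<Vec<u8>>,
}

fn build_final_payload(amt_msat: u64, onion: &RecipientOnionFields) -> FinalHopPayload {
    FinalHopPayload {
        amt_msat,
        payment_secret: onion.payment_secret.map(|s| s.0),
        // Copied into the final onion hop payload when the HTLC is sent.
        payment_metadata: onion.payment_metadata.clone(),
    }
}
```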
We correctly send out a gossip channel disable update after being
down for one full timer tick (1-2 minutes). This is pretty nice in
that it avoids nodes trying to route through our nodes too often
while they're down. Other nodes have a much longer time window,
causing them to have much less aggressive channel disables. Sadly,
at one minute it's not super uncommon for Tor nodes to get disabled
(once a day or so on two nodes I looked at), and this causes the
Lightning Terminal scorer to consider the LDK node unstable (even
though it's the one doing the disabling, so it is online). This
causes user frustration and makes LDK look bad (even though it's
probably failing fewer payments).
Given this, and future switches to block-based `channel_update`
timestamp fields, it makes sense to go ahead and switch to delaying
channel disable announcements for 10 minutes. This puts us more in
line with other implementations and reduces gossip spam, at the
cost of less reliable payments.
Fixes #2175, at least the currently visible parts.
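A sketch of the tick-counting approach, with a hypothetical constant and field names: rather than disabling gossip after a single timer tick of disconnection, the channel is only marked disabled after ~10 ticks (roughly 10 minutes).

```rust
const DISABLE_GOSSIP_TICKS: u8 = 10; // timer ticks occur roughly once a minute

struct ChannelState {
    peer_connected: bool,
    disconnect_ticks: u8,
    gossip_disabled: bool,
}

fn timer_tick(chan: &mut ChannelState) {
    if chan.peer_connected {
        chan.disconnect_ticks = 0;
        // If we had disabled the channel, an enabled channel_update would be
        // broadcast here (elided).
        chan.gossip_disabled = false;
    } else {
        chan.disconnect_ticks = chan.disconnect_ticks.saturating_add(1);
        if chan.disconnect_ticks >= DISABLE_GOSSIP_TICKS && !chan.gossip_disabled {
            chan.gossip_disabled = true;
            // Broadcast a channel_update with the disabled bit set (elided).
        }
    }
}
```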
When generating a `channel_update` either in response to a fee
configuration change or an HTLC failure, we currently poll the
channel to check if the peer is connected when setting the disabled
bit in the `channel_update`. This could cause cases where we set
the disabled bit even though the peer *just* disconnected, and never
generate a follow-up broadcast `channel_update` with the disabled
bit unset.
While a node generally shouldn't rebroadcast a `channel_update` it
received in an onion, there's nothing inherently stopping them from
doing so. Obviously in the fee-update case we expect the message to
propagate.
Luckily, since we already "stage" disable-changed updates, we can
check the staged state and use that to set the disabled bit in all
`channel_update` cases.
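One plausible mapping, with illustrative enum and function names, of consulting the staged enable/disable state instead of polling connectivity directly:

```rust
enum UpdateStatus {
    Enabled,
    DisabledStaged, // peer disconnected, disable not yet broadcast
    EnabledStaged,  // peer reconnected, enable not yet broadcast
    Disabled,
}

fn disabled_bit_for_update(status: &UpdateStatus) -> bool {
    // Mirror the state we last broadcast (or have committed to broadcasting),
    // so a momentary disconnect doesn't leak a disabled channel_update.
    matches!(status, UpdateStatus::Disabled | UpdateStatus::EnabledStaged)
}
```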