When a payment path fails, it may be retried. Typically, this means
re-computing the route after updating the NetworkGraph and channel
scores in order to avoid the failing hop. The last hop in
PaymentPathFailed's path field contains the pubkey, amount, and CLTV
values that need to be passed to get_route. However, it does not
contain the payee's features or the route hints from the invoice.
Include the entire set of parameters in PaymentPathRetry and add it to
the PaymentPathFailed event. Add a get_retry_route wrapper around
get_route that takes PaymentPathRetry. This allows an EventHandler to
retry failed payment paths using the payee's route hints and features.
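As a rough illustration of the shape this takes, here is a minimal
sketch using stand-in types; the RouteHint, InvoiceFeatures, and Route
definitions and the exact field names are placeholders rather than
LDK's actual definitions:

```rust
/// Stand-in types; real code uses secp256k1::PublicKey, InvoiceFeatures, etc.
type PublicKey = [u8; 33];
#[derive(Clone)]
struct InvoiceFeatures;
#[derive(Clone)]
struct RouteHint;
struct Route;

/// Everything needed to re-run route finding for a failed path.
#[derive(Clone)]
struct PaymentPathRetry {
    payee_pubkey: PublicKey,
    amount_msat: u64,
    final_cltv_expiry_delta: u32,
    payee_features: Option<InvoiceFeatures>,
    route_hints: Vec<RouteHint>,
}

/// Simplified stand-in for the router entry point.
fn get_route(
    _payee: &PublicKey, _features: Option<&InvoiceFeatures>, _hints: &[RouteHint],
    _amount_msat: u64, _final_cltv: u32,
) -> Result<Route, ()> {
    Ok(Route)
}

/// Thin wrapper letting an EventHandler retry a path straight from the
/// retry data carried in the PaymentPathFailed event.
fn get_retry_route(retry: &PaymentPathRetry) -> Result<Route, ()> {
    get_route(
        &retry.payee_pubkey,
        retry.payee_features.as_ref(),
        &retry.route_hints,
        retry.amount_msat,
        retry.final_cltv_expiry_delta,
    )
}
```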
A payee can be identified by a pubkey and optionally have an associated
set of invoice features and route hints. Use this in get_route instead
of three separate parameters. This may later be included in
PaymentPathFailed for use when finding a new route.
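A minimal sketch of that grouping, again with stand-in types and
illustrative field names:

```rust
type PublicKey = [u8; 33];
struct InvoiceFeatures;
struct RouteHint;

/// One argument bundling what were previously three separate
/// get_route parameters.
struct Payee {
    pubkey: PublicKey,
    features: Option<InvoiceFeatures>,
    route_hints: Vec<RouteHint>,
}
```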
This implements the channel type negotiation, though as we currently
only support channels with only static_remotekey set, it doesn't
implement the negotiation explicitly.
Its semantics are somewhat different from existing features, though
not different enough to merit an entirely separate struct.
Specifically, it only supports required features (if you send a
channel_type, the counterparty has to accept it wholesale or try
again, it cannot select only a subset of the flags) and it is
serialized differently (only appearing in TLVs).
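A hedged sketch of the "accept wholesale or try again" rule, with an
illustrative struct in place of the real feature-bit machinery:

```rust
/// Illustrative stand-in for the TLV-serialized channel_type bits.
#[derive(PartialEq, Eq)]
struct ChannelTypeFeatures {
    static_remote_key: bool,
    // Further bits would appear here as new channel types are defined.
}

/// The proposed type must match a type we support exactly; unlike
/// ordinary feature negotiation, we cannot select a subset of flags.
fn accept_channel_type(proposed: &ChannelTypeFeatures) -> bool {
    *proposed == ChannelTypeFeatures { static_remote_key: true }
}
```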
If we go to send a payment, add the HTLC(s) to the channel(s),
commit the ChannelMonitor updates to disk, and then crash, we'll
come back up with no pending payments but HTLC(s) ready to be
claimed/failed.
This makes it rather impractical to write a payment sender/retryer,
as you cannot guarantee atomicity - you cannot guarantee you'll
have retry data persisted even if the HTLC(s) are actually pending.
Because ChannelMonitors are *the* atomically-persisted data in LDK,
we lean on their current HTLC data to figure out what HTLC(s) are a
part of an outbound payment, rebuilding the pending payments list
on reload.
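Roughly, the reload path looks like the following sketch; the types,
fields, and the idea of monitors exposing their outbound HTLCs this
directly are stand-ins for the real data:

```rust
use std::collections::{HashMap, HashSet};

type PaymentId = [u8; 32];
type HtlcId = u64;

/// Stand-in for a ChannelMonitor's view of its outbound HTLCs.
struct ChannelMonitor {
    outbound_htlcs: Vec<(PaymentId, HtlcId)>,
}

/// Rebuild the pending-payments map purely from monitor data on reload.
fn rebuild_pending_payments(
    monitors: &[ChannelMonitor],
) -> HashMap<PaymentId, HashSet<HtlcId>> {
    let mut pending: HashMap<PaymentId, HashSet<HtlcId>> = HashMap::new();
    for monitor in monitors {
        for (payment_id, htlc_id) in &monitor.outbound_htlcs {
            // Any HTLC still present in a monitor implies its payment
            // is still in flight and must be tracked again.
            pending.entry(*payment_id).or_default().insert(*htlc_id);
        }
    }
    pending
}
```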
In the next commit, we will reload lost pending payments from
ChannelMonitors during restart. However, in order to avoid
re-adding pending payments which have already been fulfilled, we
must ensure that we do not fully remove pending payments until all
HTLCs for the payment have been fully removed from their
ChannelMonitors.
We do so here, introducing a new PendingOutboundPayment variant
called `Completed` which only tracks the set of pending HTLCs.
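A sketch of the variant split, with illustrative field shapes:

```rust
use std::collections::HashSet;

type SessionPrivKey = [u8; 32];

enum PendingOutboundPayment {
    /// A payment that may still be retried; keeps what is needed to
    /// re-send, alongside its pending HTLCs.
    Retryable {
        session_privs: HashSet<SessionPrivKey>,
        total_msat: u64,
    },
    /// A payment whose outcome is known but which still has HTLCs
    /// present in one or more ChannelMonitors; only those HTLCs are
    /// tracked, and the entry is dropped once they are all removed.
    Completed {
        session_privs: HashSet<SessionPrivKey>,
    },
}
```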
When an HTLC has been failed, we track it up until the point there
exists no broadcastable commitment transaction which has the HTLC
present, at which point Channel returns the HTLCSource back to the
ChannelManager, which fails the HTLC backwards appropriately.
When an HTLC is fulfilled, however, we fulfill on the backwards path
immediately. This is great for claiming upstream HTLCs, but when we
want to track pending payments, we need to ensure we can check with
ChannelMonitor data to rebuild pending payments. In order to do so,
we need an event similar to the HTLC failure event, but for
fulfills instead.
Specifically, if we force-close a channel, we remove its off-chain
`Channel` object entirely, at which point, on reload, we may notice
HTLC(s) which are not present in our pending payments map (as they
may have received a payment preimage, but not fully committed to
it). Thus, we'd conclude we still have a retryable payment, which
is untrue.
This commit does so, informing the ChannelManager, where appropriate,
of the HTLCSource corresponding to the fulfilled HTLC via a new
return element.
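A hedged sketch of that return path, with stand-in types and a
hypothetical handler signature:

```rust
struct HTLCSource;
struct RevokeAndACK;

struct Channel {
    /// Fulfilled HTLCs that have just become absent from every
    /// broadcastable commitment transaction.
    finalized_claims: Vec<HTLCSource>,
}

impl Channel {
    /// Hypothetical message handler: alongside its usual results, it
    /// also hands back the sources of fulfilled HTLCs which are now
    /// fully removed, so ChannelManager can finalize payment tracking.
    fn revoke_and_ack(&mut self, _msg: &RevokeAndACK) -> Vec<HTLCSource> {
        std::mem::take(&mut self.finalized_claims)
    }
}
```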
This allows us to read a `HashMap` whose values may be skipped if
they are of some backwards-compatibility type.
We also take this opportunity to fail deserialization if keys are
duplicated.
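A minimal sketch of the read logic under those rules; the
MaybeReadable-style trait below is a stand-in rather than the exact
serialization trait:

```rust
use std::collections::HashMap;

trait MaybeReadable: Sized {
    /// Returns Ok(None) for values we recognize only well enough to
    /// skip over (e.g. an odd, backwards-compatibility type).
    fn read(bytes: &[u8]) -> Result<Option<Self>, ()>;
}

fn read_hash_map<V: MaybeReadable>(
    entries: &[(u64, Vec<u8>)],
) -> Result<HashMap<u64, V>, ()> {
    let mut map = HashMap::new();
    for (key, value_bytes) in entries {
        match V::read(value_bytes)? {
            Some(value) => {
                // A repeated key indicates a bogus serialization, so
                // fail deserialization outright.
                if map.insert(*key, value).is_some() {
                    return Err(());
                }
            }
            // Skippable backwards-compatibility value: drop it.
            None => {}
        }
    }
    Ok(map)
}
```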
If we have a `ChannelMonitor` update from an on-chain event which
returns a `TemporaryFailure`, we block `MonitorEvent`s from that
`ChannelMonitor` until the update is persisted. This prevents
duplicate payment send events to the user after payments get
reloaded from monitors on restart.
However, if the event being avoided isn't going to generate a
PaymentSent, but instead result in us claiming an HTLC from an
upstream channel (ie the HTLC was forwarded), then the result of a
user delaying the event is that we delay getting our money, not a
duplicate event.
Because user persistence may take an arbitrary amount of time, we
need to bound the amount of time we can possibly wait to return
events, which we do here by bounding it to 3 blocks.
Thanks to Val for catching this in review.
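A rough sketch of that gating rule; the constant and field names are
illustrative:

```rust
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3;

struct MonitorState {
    /// Whether an update generated by an on-chain event is still
    /// awaiting persistence.
    has_pending_onchain_update: bool,
    /// Height at which the oldest such unpersisted update was created.
    oldest_pending_update_height: u32,
}

/// MonitorEvents are held back while an on-chain update is
/// unpersisted, but never for more than the grace period.
fn should_release_monitor_events(state: &MonitorState, best_height: u32) -> bool {
    if !state.has_pending_onchain_update {
        return true;
    }
    best_height >= state.oldest_pending_update_height + LATENCY_GRACE_PERIOD_BLOCKS
}
```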
ChannelMonitors now require that they be re-persisted before
MonitorEvents be provided to the ChannelManager, the exact thing
that test_dup_htlc_onchain_fails_on_reload was testing for when it
*didn't* happen. As such, test_dup_htlc_onchain_fails_on_reload is
now testing that we behave correctly when the API guarantees are not
met, something we don't need to do.
Here, we adapt it to test the new API requirements through
ChainMonitor's calls to the Persist trait instead.
This resolves several user complaints (and issues in the sample
node) where startup is substantially delayed as we're always
waiting for the chain data to sync.
Further, in an upcoming PR, we'll be reloading pending payments
from ChannelMonitors on restart, at which point we'll need the
change here which avoids handling events until after the user
has confirmed the `ChannelMonitor` has been persisted to disk.
It will avoid a race where we
* send a payment/HTLC (persisting the monitor to disk with the
HTLC pending),
* force-close the channel, removing the channel entry from the
ChannelManager entirely,
* persist the ChannelManager,
* connect a block which contains a fulfill of the HTLC, generating
a claim event,
* handle the claim event while the `ChannelMonitor` is being
persisted,
* persist the ChannelManager (before the ChannelMonitor is
persisted fully),
* restart, reloading the HTLC as a pending payment in the
ChannelManager, which now has no references to it except from
the ChannelMonitor which still has the pending HTLC,
* replay the block connection, generating a duplicate PaymentSent
event.
In the next commit, we'll be originating monitor updates both from
the ChainMonitor and from the ChannelManager, making simple
sequential update IDs impossible.
Further, the existing async monitor update API was somewhat hard to
work with - instead of being able to generate monitor_updated
callbacks whenever a persistence process finishes, you had to
ensure you only did so once all previous updates had also been
persisted.
Here we eat the complexity for the user by moving to an opaque
type for monitor updates, tracking which updates are in-flight for
the user and only generating monitor-persisted events once all
pending updates have been committed.
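A sketch of the in-flight tracking this implies; the names are
illustrative:

```rust
use std::collections::HashSet;

/// Opaque identifier handed to the persister for each update.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct MonitorUpdateId(u64);

#[derive(Default)]
struct MonitorHolder {
    pending_monitor_updates: HashSet<MonitorUpdateId>,
}

impl MonitorHolder {
    fn persistence_started(&mut self, id: MonitorUpdateId) {
        self.pending_monitor_updates.insert(id);
    }

    /// Returns true only once *all* in-flight updates have completed,
    /// i.e. when a single monitor-persisted event should be generated.
    fn persistence_completed(&mut self, id: MonitorUpdateId) -> bool {
        self.pending_monitor_updates.remove(&id);
        self.pending_monitor_updates.is_empty()
    }
}
```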
In the next commit we'll need ChainMonitor to "see" when a monitor
persistence completes, which means `monitor_updated` needs to move
to `ChainMonitor`. The simplest way to then communicate that
information to `ChannelManager` is via `MonitorEvent`s, which seems
to line up ok, even if they're now constructed by multiple
different places.
Expand routing::Score::channel_penalty_msat to include the source and
target node ids of the channel. This allows scorers to avoid certain
nodes altogether if desired.
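A sketch of the expanded method, with a simplified stand-in for the
node id type:

```rust
type NodeId = [u8; 33];

trait Score {
    /// Penalty, in msats, for routing through the given channel
    /// between `source` and `target`.
    fn channel_penalty_msat(
        &self,
        short_channel_id: u64,
        source: &NodeId,
        target: &NodeId,
    ) -> u64;
}
```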
In order to make the scoring tests easier to read, only check the
relevant RouteHop fields. The remaining fields are tested elsewhere.
Expand the test to show the path used without scoring.
Failed payments may be retried, but calling get_route may return a Route
with the same failing path. Add a routing::Score trait used to
parameterize get_route, which calls it to determine how much a channel
should be penalized, expressed as the amount in msats one is willing to
pay to avoid routing through the channel.
Also, add a Scorer struct that implements routing::Score with a
constant penalty. Subsequent changes will allow for more robust scoring
by feeding payment path success and failure back to the scorer via
event handling.
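A minimal sketch of the trait and the flat-penalty scorer; parameter
lists are simplified relative to the real API:

```rust
trait Score {
    /// Cost, in msats one is willing to pay, to avoid routing through
    /// the given channel.
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
}

/// Charges the same penalty for every channel.
struct Scorer {
    base_penalty_msat: u64,
}

impl Score for Scorer {
    fn channel_penalty_msat(&self, _short_channel_id: u64) -> u64 {
        self.base_penalty_msat
    }
}
```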
As ChainMonitor will need to see monitor update errors in a coming
PR, we need to return them via Persister so that our ChainMonitor
chain::Watch implementation sees them.
Previously, if a Persister returned a TemporaryFailure error when
we tried to persist a new channel, the ChainMonitor wouldn't track
the new ChannelMonitor at all, generating a PermanentFailure later
when monitor updating is restored.
This fixes that by correctly storing the ChannelMonitor on
TemporaryFailures, allowing later update restoration to happen
normally.
This is (indirectly) tested in the next commit where we use
Persister to return all monitor-update errors.
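A hedged sketch of the fix, using stand-in types rather than the real
chain::Watch signatures:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum ChannelMonitorUpdateErr { TemporaryFailure, PermanentFailure }
struct ChannelMonitor;
type OutPoint = ([u8; 32], u16);

struct ChainMonitor {
    monitors: HashMap<OutPoint, ChannelMonitor>,
}

impl ChainMonitor {
    fn watch_channel(
        &mut self,
        funding_outpoint: OutPoint,
        monitor: ChannelMonitor,
        persist_result: Result<(), ChannelMonitorUpdateErr>,
    ) -> Result<(), ChannelMonitorUpdateErr> {
        if let Err(ChannelMonitorUpdateErr::PermanentFailure) = persist_result {
            // Permanent failures force-close the channel; don't track.
            return Err(ChannelMonitorUpdateErr::PermanentFailure);
        }
        // Previously a TemporaryFailure dropped the monitor here,
        // yielding a spurious PermanentFailure once updating was later
        // restored. Track it regardless so restoration can succeed.
        self.monitors.insert(funding_outpoint, monitor);
        persist_result
    }
}
```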
test_simple_monitor_permanent_update_fail and
test_simple_monitor_temporary_update_fail both have a mode where
they use either chain::Watch or persister to return errors.
As we won't be doing any returns directly from the chain::Watch
wrapper in a coming commit, the chain::Watch-return form of the
test will no longer make sense.
Exposing a `RwLock<HashMap<>>` directly was always a bit strange,
and in upcoming changes we'd like to change the internal
datastructure in `ChainMonitor`.
Further, the use of `RwLock` and `HashMap` meant we weren't able
to expose the ChannelMonitors themselves to users in bindings,
leaving a bindings/rust API gap.
Thus, we take this opportunity to expose ChannelMonitors directly
via a wrapper, hiding the internals of `ChainMonitor` behind
getters. We also update tests to use the new API.
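A sketch of the getter-based shape this moves toward; the names and
signatures are illustrative rather than the exact new API:

```rust
use std::collections::HashMap;
use std::sync::{RwLock, RwLockReadGuard};

type OutPoint = ([u8; 32], u16);
struct ChannelMonitor;

struct ChainMonitor {
    /// The map is now an internal detail rather than an exposed field.
    monitors: RwLock<HashMap<OutPoint, ChannelMonitor>>,
}

/// Read-locked view of a single monitor, hiding the lock and map
/// internals (and thus exposable through bindings).
struct LockedChannelMonitor<'a> {
    guard: RwLockReadGuard<'a, HashMap<OutPoint, ChannelMonitor>>,
    outpoint: OutPoint,
}

impl<'a> LockedChannelMonitor<'a> {
    fn monitor(&self) -> &ChannelMonitor {
        self.guard.get(&self.outpoint).expect("checked on construction")
    }
}

impl ChainMonitor {
    /// List the funding outpoints of all monitored channels.
    fn list_monitors(&self) -> Vec<OutPoint> {
        self.monitors.read().unwrap().keys().cloned().collect()
    }

    /// Fetch a read-locked view of one monitor, if it is being watched.
    fn get_monitor(&self, outpoint: OutPoint) -> Option<LockedChannelMonitor<'_>> {
        let guard = self.monitors.read().unwrap();
        if guard.contains_key(&outpoint) {
            Some(LockedChannelMonitor { guard, outpoint })
        } else {
            None
        }
    }
}
```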