`ChannelManager::fail_htlc_backwards`' bool return value is quite
confusing - just because it returns false doesn't mean the payment
wasn't (already) failed. Worse, in some race cases around shutdown
where a payment was claimed before an unclean shutdown and then
retried on startup, `fail_htlc_backwards` could return true even
though (a duplicate copy of the same payment) was claimed, but the
claim event has not been seen by the user yet.
While its possible to use it correctly, its somewhat confusing to
have a return value at all, and definitely lends itself to misuse.
Instead, we should push users towards a model where they don't care
if `fail_htlc_backwards` succeeds - either they've locally marked
the payment as failed (prior to seeing any `PaymentReceived`
events) and will fail any attempts to pay it, or they have not and
the payment is still receivable until its timeout time is reached.
We can revisit this decision based on user feedback, but will need
to very carefully document the potential failure modes here if we
do.
As additional sanity checks, before claiming a payment, we check
that we have the full amount available in `claimable_htlcs` that
the payment should be for. Concretely, this prevents one
somewhat-absurd edge case where a user may receive an MPP payment,
wait many *blocks* before claiming it, allowing us to fail the
pending HTLCs and the sender to retry some subset of the payment
before we go to claim. More generally, this is just good
belt-and-suspenders against any edge cases we may have missed.
While the HTLC-claim process happens across all MPP parts under one
lock, this doesn't imply that they are claimed fully atomically on
disk. Ultimately, an application can crash after persisting one
`ChannelMonitorUpdate` out of multiple monitor updates needed for
the full claim.
Previously, this would leave us in a very bad state - because of
the all-channels-available check in `claim_funds` we'd refuse to
claim the payment again on restart (even though the
`PaymentReceived` event will be passed to the user again), and we'd
end up having partially claimed the payment!
The fix for the consistency part of this issue is pretty
straightforward - just check for this condition on startup and
complete the claim across all channels/`ChannelMonitor`s if we
detect it.
This still leaves us in a confused state from the perspective of
the user, however - we've actually claimed a payment but when they
call `claim_funds` we return `false` indicating it could not be
claimed.
This update also includes a minor refactor. The return type of
`pending_monitor_events` has been changed to a `Vec` tuple with the
`OutPoint` type. This associates a `Vec` of `MonitorEvent`s with a
funding outpoint.
We've also renamed `source/sink_channel_id` to `prev/next_channel_id` in
the favour of clarity.
When we fail an HTLC which was destined for a channel that the HTLC
sender didn't know the real SCID for, we should ensure we continue
to use the alias in the channel_update we provide them. Otherwise
we will leak the channel's real SCID to HTLC senders.
This makes tests slightly more realistic by delivering
`channel_update`s to `ChannelManager`s, ensuring we have
forwarding data stored locally for all channels, including public
ones.
... by calling it both before and after every chain event in
testing and fuzzing.
This requires fixing some blockchain inconsistencies in
`do_test_onchain_htlc_reorg`, `do_retry_with_no_persist`, and
`do_test_dup_htlc_onchain_fails_on_reload` where we'd connect
conflicting transactions in the same chain.
`cargo bench` sets `cfg(test)`, causing us to hit some test-only
code in the router when benchmarking, throwing off our benchmarks
substantially. Here we swap from the `unstable` feature to a more
clearly internal feature (`_bench_unstable`) and also checking for
it when enabling test-only code.
The spec actually requires we never send `announcement_signatures`
(and, thus, `channel_announcement`s) until after six confirmations.
However, we would happily have sent them prior to that as long as
we exchange `funding_locked` messages with our countarparty. Thanks
to re-broadcasting this issue is largely harmless, however it could
have some negative interactions with less-robust peers. Much more
importantly, this represents an important step towards supporting
0-conf channels, where `funding_locked` messages may be exchanged
before we even have an SCID to construct the messages with.
Because there is no ACK mechanism for `announcement_signatures` we
rely on existing channel updates to stop rebroadcasting them - if
we sent a `commitment_signed` after an `announcement_signatures`
and later receive a `revoke_and_ack`, we know our counterparty also
received our `announcement_signatures`. This may resolve some rare
edge-cases where we send a `funding_locked` which our counterparty
receives, but lose connection before the `announcement_signatures`
(usually the very next message) arrives.
Sadly, because the set of places where an `announcement_signatures`
may now be generated more closely mirrors where `funding_locked`
messages may be generated, but they are now separate, there is a
substantial amount of code motion providing relevant parameters
about current block information and ensuring we can return new
`announcement_signatures` messages.
When a payment fails, a payer needs to know when they can consider
a payment as fully-failed, and when only some of the HTLCs in the
payment have failed. This isn't possible with the current event
scheme, as discovered recently and as described in the previous
commit.
This adds a new event which describes when a payment is fully and
irrevocably failed, generating it only after the payment has
expired or been marked as expired with
`ChannelManager::mark_retries_exceeded` *and* all HTLCs for it
have failed. With this, a payer can more simply deduce when a
payment has failed and use that to remove payment state or
finalize a payment failure.
During event handling, ChannelManager methods may need to be called as
indicated in the Event documentation. Ensure that these calls are
idempotent for the same event rather than panicking. This allows users
to persist events for later handling without needing to worry about
processing the same event twice (e.g., if ChannelManager is not
persisted but the events were, the restarted ChannelManager would return
some of the same events).
A peer providing a channel_reserve_satoshis of 0 (or less than our
dust limit) is insecure, but only for them. Because some LSPs do it
with some level of trust of the clients (for a substantial UX
improvement), we explicitly allow it. Because its unlikely to
happen often in normal testing, we test it explicitly here.
A single PaymentSent event is generated when a payment is fulfilled.
This is occurs when the preimage is revealed on the first claimed HTLC.
For subsequent HTLCs, the event is not generated.
In order to score channels involved with a successful payments, the
scorer must be notified of each successful path involved in the payment.
Add a PaymentPathSuccessful event for this purpose. Generate it whenever
a part is removed from a pending outbound payment. This avoids duplicate
events when reconnecting to a peer.
In upcoming commits, we'll be making the payment secret and payment hash/preimage
derivable from info about the payment + a node secret. This means we don't
need to store any info about incoming payments and can eventually get rid of the
channelmanager::pending_inbound_payments map.
Scorer uses time to determine how much to penalize a channel after a
failure occurs. Parameterizing it by time cleans up the code such that
no-std support is in a single AlwaysPresent struct, which implements the
Time trait. Time is implemented for std::time::Instant when std is
available.
This parameterization also allows for deterministic testing since a
clock could be devised to advance forward as needed.