The `rejected_by_dest` field of the `PaymentPathFailed` event has
always been a bit of a misnomer, as it's really more about whether
to retry than about where a payment failed. Now is as good a time
as any to rename it.
This uses the various new tracking added in the prior commits to
expose a new `Balance` type - `CounterpartyRevokedOutputClaimable`.
Some nontrivial work is required, however, as we now have to track
HTLC outputs as spendable in a transaction that comes *after* an
HTLC-Success/HTLC-Timeout transaction, which we previously didn't
need to do. Thus, we have to check if an
`onchain_events_awaiting_threshold_conf` event spends a commitment
transaction's HTLC output while walking events. Further, because
we now need to track HTLC outputs after the
HTLC-Success/HTLC-Timeout confirms, and because we have to track
the counterparty's `to_self` output as a contentious output which
could be claimed by either party, we have to examine the
`OnchainTxHandler`'s set of outputs to spend when determining if
certain outputs are still spendable.
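As a rough sketch of how the new variant surfaces to users when
walking `ChannelMonitor::get_claimable_balances()` (treat the exact
field name as an assumption):
```rust
// Sketch only: sum up balances which exist solely because the counterparty
// broadcast a revoked commitment and we can punish them for it.
use lightning::chain::channelmonitor::Balance;

fn revoked_claimable_sats(balances: &[Balance]) -> u64 {
    balances.iter().map(|b| match b {
        Balance::CounterpartyRevokedOutputClaimable { claimable_amount_satoshis } =>
            *claimable_amount_satoshis,
        _ => 0,
    }).sum()
}
```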
Two new tests are added which exercise several different
transaction formats, and hopefully provide good test coverage of
the various revoked output paths.
Instead of backfilling gossip by buffering (up to) ten messages at
a time, only buffer one message at a time, as the peer's outbound
socket buffer drains. This moves the outbound backfill messages out
of `PeerHandler` and into the operating system buffer, where they
arguably belong.
Not buffering causes us to walk the gossip B-Trees somewhat more
often, but avoids allocating vecs for the responses. While it's
probably (without having benchmarked it) a net performance loss, it
simplifies buffer tracking and leaves us with more room to play
with the buffer sizing constants as we add onion message
forwarding, which is an important win.
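A minimal, purely illustrative sketch of the pattern (the `Peer`
fields and the `next_backfill_message` hook are stand-ins, not the
real `PeerHandler` internals):
```rust
use std::collections::VecDeque;

struct Peer { outbound_buffer: VecDeque<Vec<u8>>, gossip_sync_position: u64 }

// Stand-in for the gossip store walk; one message per call instead of ten.
trait GossipBackfill { fn next_backfill_message(&self, pos: &mut u64) -> Option<Vec<u8>>; }

fn maybe_queue_backfill<G: GossipBackfill>(peer: &mut Peer, gossip: &G) {
    // Only fetch another backfill message once the previous one has drained
    // from our buffer into the operating system's socket buffer.
    if peer.outbound_buffer.is_empty() {
        if let Some(msg) = gossip.next_backfill_message(&mut peer.gossip_sync_position) {
            peer.outbound_buffer.push_back(msg);
        }
    }
}
```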
Note that because we change how often we check if we're out of
messages to send before pinging, we slightly change how many
messages are exchanged at once, impacting the
`test_do_attempt_write_data` constants.
We add `HTLCHandlingFailedConditions` to express the failure parameters,
which will be enforced by a new macro, `expect_pending_htlcs_forwardable_conditions`.
When an HTLC fails, we currently rely on the scorer learning the
failed channel and assigning an infinite (`u64::max_value()`)
penalty to the channel so as to avoid retrying over the exact same
path (if there's only one available path). This is common when
trying to pay a mobile client behind an LSP if the mobile client is
currently offline.
This leads to the scorer being overly conservative in some cases -
returning `u64::max_value()` when a given path hasn't been tried
for a given payment may not be the best decision, even if that
channel failed 50 minutes ago.
By tracking channels which failed on a payment part level and
explicitly refusing to route over them we can relax the
requirements on the scorer, allowing it to make different decisions
on how to treat channels that failed relatively recently without
causing payments to retry the same path forever.
This does have the drawback that it could allow two separate parts
of a payment to traverse the same path even though that path just
failed; however, this should only occur if the payment is going to
fail anyway, at least as long as the scorer is properly learning.
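Conceptually, the router-side change is tiny; a hedged sketch (with
illustrative names, not the exact router API):
```rust
// Illustrative only: SCIDs which already failed for *this* payment are
// excluded outright during pathfinding, independent of the scorer's penalty.
fn channel_usable(scid: u64, previously_failed_channels: &[u64]) -> bool {
    !previously_failed_channels.contains(&scid)
}
```
The scorer is then free to hand back a finite penalty for a
recently-failed channel without risking an endless retry loop over
the same just-failed path.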
Closes #1241, superseding #1252.
In the next commit we add lockorder testing based on the line each
mutex was created on rather than the particular mutex instance.
This causes some additional test failures because of lockorder
inversions for the same mutex across different tests; these are
fixed here.
This is mostly motivated by the fact that payments may happen while the
latest `ChannelUpdate` indicating our new `ChannelConfig` is still
propagating throughout the network. By temporarily allowing the previous
config, we can help reduce payment failures across the network.
ChannelConfig now has its static fields removed. We introduce a new
LegacyChannelConfig struct that maintains the serialization as
previously defined by ChannelConfig to remain backwards compatible with
clients running 0.0.107 and earlier.
In the near future, we plan to allow users to update their
`ChannelConfig` after the initial channel handshake. In order to reuse
the same struct and expose it to users, we opt to move out all static
fields that cannot be updated after the initial channel handshake.
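Roughly, the split looks like this (the field sets shown are a
sketch, not the exhaustive lists):
```rust
// Sketch: the public ChannelConfig keeps only fields that may be updated
// after the initial channel handshake...
pub struct ChannelConfig {
    pub forwarding_fee_proportional_millionths: u32,
    pub forwarding_fee_base_msat: u32,
    pub cltv_expiry_delta: u16,
    // ...remaining updatable fields
}

// ...while the internal LegacyChannelConfig pairs it with the static,
// handshake-time fields and keeps the serialization format previously used
// by ChannelConfig, so clients on 0.0.107 and earlier can still read it.
pub(crate) struct LegacyChannelConfig {
    pub(crate) options: ChannelConfig,
    pub(crate) announced_channel: bool,
    pub(crate) commit_upfront_shutdown_pubkey: bool,
}
```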
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
A NetworkUpdate indicating ChannelClosed actually corresponds to a
channel failure as described in BOLT 4:
0x2000 (NODE): node failure (otherwise channel)
Rename the enum variant to ChannelFailure and rename NetworkGraph
methods close_channel_from_update and fail_node to channel_failed and
node_failed, respectively.
`ChannelManager::fail_htlc_backwards`' bool return value is quite
confusing - just because it returns false doesn't mean the payment
wasn't (already) failed. Worse, in some race cases around shutdown
where a payment was claimed before an unclean shutdown and then
retried on startup, `fail_htlc_backwards` could return true even
though a duplicate copy of the same payment had already been
claimed, just with the claim event not yet seen by the user.
While it's possible to use it correctly, it's somewhat confusing to
have a return value at all, and it definitely lends itself to
misuse.
Instead, we should push users towards a model where they don't care
if `fail_htlc_backwards` succeeds - either they've locally marked
the payment as failed (prior to seeing any `PaymentReceived`
events) and will fail any attempts to pay it, or they have not and
the payment is still receivable until its timeout time is reached.
We can revisit this decision based on user feedback, but will need
to very carefully document the potential failure modes here if we
do.
As additional sanity checks, before claiming a payment, we check
that the full amount the payment should be for is available in
`claimable_htlcs`. Concretely, this prevents one somewhat-absurd
edge case where a user may receive an MPP payment, wait many
*blocks* before claiming it, allowing us to fail the pending HTLCs
and the sender to retry some subset of the payment before we go to
claim. More generally, this is just good belt-and-suspenders
against any edge cases we may have missed.
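A hedged sketch of the check itself (illustrative types, not the
actual `ChannelManager` internals):
```rust
// Illustrative only: before claiming, make sure the HTLCs still pending in
// claimable_htlcs actually cover the full amount the payment is for.
struct ClaimableHTLC { value_msat: u64 }

fn ok_to_claim(htlcs: &[ClaimableHTLC], expected_amt_msat: u64) -> bool {
    let claimable_amt_msat: u64 = htlcs.iter().map(|h| h.value_msat).sum();
    // If we already failed some parts back (e.g. after waiting many blocks),
    // the sum falls short and we refuse to claim rather than claim partially.
    claimable_amt_msat >= expected_amt_msat
}
```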
While the HTLC-claim process happens across all MPP parts under one
lock, this doesn't imply that they are claimed fully atomically on
disk. Ultimately, an application can crash after persisting one
`ChannelMonitorUpdate` out of multiple monitor updates needed for
the full claim.
Previously, this would leave us in a very bad state - because of
the all-channels-available check in `claim_funds` we'd refuse to
claim the payment again on restart (even though the
`PaymentReceived` event will be passed to the user again), and we'd
end up having partially claimed the payment!
The fix for the consistency part of this issue is pretty
straightforward - just check for this condition on startup and
complete the claim across all channels/`ChannelMonitor`s if we
detect it.
This still leaves us in a confused state from the perspective of
the user, however - we've actually claimed a payment but when they
call `claim_funds` we return `false` indicating it could not be
claimed.
We have a bunch of fancy infrastructure to ensure we can connect
blocks using all our different connection interfaces, but we only
bother to use it in a few select tests.
This expands our use of `ConnectStyle` to most of our tests by
simply randomizing the style in each test. This makes our tests
non-deterministic, but we print the connection style at start so
that it's easy to reproduce a failure deterministically.
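The randomization itself is conceptually tiny; a self-contained
sketch (with a pared-down stand-in for the real `ConnectStyle`
enum) might look like:
```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Pared-down stand-in for the test framework's ConnectStyle enum.
#[derive(Clone, Copy, Debug)]
enum ConnectStyle { BestBlockFirst, TransactionsFirst, FullBlockViaListen }

fn random_connect_style() -> ConnectStyle {
    let styles = [ConnectStyle::BestBlockFirst, ConnectStyle::TransactionsFirst,
                  ConnectStyle::FullBlockViaListen];
    // Any cheap source of randomness works; RandomState avoids extra deps.
    let idx = RandomState::new().build_hasher().finish() as usize % styles.len();
    // Print the chosen style so a failing run can be reproduced by pinning it.
    eprintln!("Using block connection style: {:?}", styles[idx]);
    styles[idx]
}
```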
This update also includes a minor refactor. The return type of
`pending_monitor_events` has been changed to a `Vec` of tuples,
associating each `Vec` of `MonitorEvent`s with the `OutPoint` of
the funding transaction it belongs to.
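In signature terms the reshaped interface is roughly (the
surrounding trait and generics are elided; treat this as a sketch):
```rust
use lightning::chain::channelmonitor::MonitorEvent;
use lightning::chain::transaction::OutPoint;

// Each batch of MonitorEvents is now tagged with the funding OutPoint of the
// ChannelMonitor it came from. (Illustrative trait name.)
trait PendingMonitorEvents {
    fn pending_monitor_events(&self) -> Vec<(OutPoint, Vec<MonitorEvent>)>;
}
```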
We've also renamed `source/sink_channel_id` to `prev/next_channel_id`
for clarity.
When we fail an HTLC which was destined for a channel that the HTLC
sender didn't know the real SCID for, we should ensure we continue
to use the alias in the channel_update we provide them. Otherwise
we will leak the channel's real SCID to HTLC senders.
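A hedged sketch of the rule (method and parameter names are
illustrative, not the exact channel accessors):
```rust
// Illustrative only: when building the channel_update embedded in a failure
// message, echo back whichever SCID the sender addressed us with. If they
// only know our outbound alias, never substitute the real on-chain SCID.
fn scid_for_failure_update(real_scid: Option<u64>, outbound_alias: u64,
                           sender_used_alias: bool) -> u64 {
    match real_scid {
        Some(scid) if !sender_used_alias => scid,
        _ => outbound_alias,
    }
}
```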