We increase the `user_channel_id` type from `u64` to `u128`. In order to
maintain backwards compatibility, we have to de-/serialize it as two
separate `u64`s in `Event` as well as in the `Channel` itself.
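A minimal sketch of the split, using illustrative helper names rather than the actual serialization code:

```rust
// Split a u128 `user_channel_id` into two u64 halves for writing, and
// recombine them on read. Helper names here are illustrative only.
fn split_u128(id: u128) -> (u64, u64) {
    ((id >> 64) as u64, id as u64)
}

fn join_u128(high: u64, low: u64) -> u128 {
    ((high as u128) << 64) | (low as u128)
}

fn main() {
    let user_channel_id: u128 = (42u128 << 64) | 7;
    let (high, low) = split_u128(user_channel_id);
    assert_eq!(join_u128(high, low), user_channel_id);
}
```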
It was pointed out that it's quite confusing that
`AllFailedRetrySafe` does not allow you to call `retry_payment`,
though the documentation on it does specify this. Instead, we
simply rename it to `AllFailedResendSafe` to indicate that the
action that is safe to take is *resending*, not *retrying*.
In c986e52ce8, an `MppId` was added
to `HTLCSource` objects as a way of correlating HTLCs which belong
to the same payment when the `ChannelManager` sees an HTLC
succeed/fail. This allows it to have awareness of the state of all
HTLCs in a payment when it generates the ultimate user-facing
payment success/failure events. This was used in the same PR to
avoid generating duplicative success/failure events for a single
payment.
Because the field was only used as an internal token to correlate
HTLCs, and retries were not supported, it was generated randomly by
calling the `KeysInterface`'s 32-byte random-fetching function.
This also provided a backwards-compatibility story as the existing
HTLC randomization key was re-used for older clients.
In 28eea12bbe `MppId` was renamed to
the current `PaymentId`, which was then used to expose the
`retry_payment` interface, allowing users to send new HTLCs which
are considered a part of an existing payment.
At no point has the payment-sending API seriously considered
idempotency, a major drawback which leaves the API unsafe in most
deployments. Luckily, there is a simple solution - because the
`PaymentId` must be unique, and because payment information for a
given payment is held for several blocks after a payment
completes/fails, it represents an obvious idempotency token.
Here we simply require the user provide the `PaymentId` directly in
`send_payment`, allowing them to use whatever unique token they may
already have for a payment as its idempotency token.
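For illustration only (the helper below is hypothetical, not part of the API), a caller with an existing idempotency token can map it onto the 32 bytes a `PaymentId` wraps and pass the same value on every retry of the same payment:

```rust
use lightning::ln::channelmanager::PaymentId;

// Hypothetical helper: derive a `PaymentId` from a caller-side idempotency
// token by copying (truncating or zero-padding) it into 32 bytes. A real
// application would more likely hash its token down to 32 bytes.
fn payment_id_from_token(token: &[u8]) -> PaymentId {
    let mut bytes = [0u8; 32];
    let len = token.len().min(32);
    bytes[..len].copy_from_slice(&token[..len]);
    PaymentId(bytes)
}
```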
When a `chain::Watch` `ChannelMonitor` update method is called, the
user has three options:
(a) persist the monitor update immediately and return success,
(b) fail to persist the monitor update immediately and return
failure,
(c) return a flag indicating the monitor update is in progress and
will complete in the future.
(c) is rather harmless, and in some deployments should be expected
to be the return value for all monitor update calls, but currently
requires returning `Err(ChannelMonitorUpdateErr::TemporaryFailure)`,
which isn't very descriptive and sounds scarier than it is.
Instead, here, we change the return type to a single enum (rather
than a `Result`) and rename `TemporaryFailure`
`UpdateInProgress`.
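Roughly, the new shape looks like the sketch below; aside from `UpdateInProgress`, the enum and variant names here are placeholders rather than the exact API:

```rust
// Illustrative status enum returned directly from `chain::Watch` monitor
// update methods instead of a `Result`. Only `UpdateInProgress` is named in
// the text above; the other identifiers are placeholders.
enum MonitorUpdateStatus {
    /// (a) The update was persisted synchronously.
    Completed,
    /// (b) The update could not be persisted; the channel must be failed.
    PermanentFailure,
    /// (c) Persistence is in progress and will be signaled as complete later.
    UpdateInProgress,
}
```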
As we move towards specifying supported/required feature bits in the
module(s) where they are supported, the global `known` feature set
constructors no longer make sense.
Here we stop relying on the `known` method in the
functional_test_utils module.
In the next commit we'll enforce counterparty `InitFeatures`
matching our required set in `ChannelManager`, implying they must
be set for many tests where they previously did not need to be (as
they were enforced in `PeerManager`, which is not used in
functional tests).
Some `NodeFeatures` will, in the future, represent features which
are not enabled by the `ChannelManager`, but by other message
handlers. Thus, it doesn't make sense to determine the
node feature bits in the `ChannelManager`.
The simplest fix for this is to instead generate the
node_announcement in `PeerManager`, asking all the connected
handlers which feature bits they support and simply OR'ing them
together. While this may not be sufficient as it doesn't consider
feature bit dependencies, support for those could be handled at the
feature level in the future.
This commit moves the `broadcast_node_announcement` function to
`PeerHandler` but does not yet implement feature OR'ing.
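When that lands, the OR step could look roughly like this sketch (generic code for illustration, not the `PeerManager` implementation):

```rust
// Combine the feature bits reported by each connected handler by OR'ing the
// byte vectors together; the longest vector determines the result length.
fn or_feature_bits(per_handler_bits: &[Vec<u8>]) -> Vec<u8> {
    let mut combined = Vec::new();
    for bits in per_handler_bits {
        if bits.len() > combined.len() {
            combined.resize(bits.len(), 0);
        }
        for (i, byte) in bits.iter().enumerate() {
            combined[i] |= *byte;
        }
    }
    combined
}
```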
When we connect to a new peer, immediately send them any
channel_announcement and channel_update messages for any public
channels we have with other peers. This allows us to stop sending
those messages on a timer when they have not changed and ensures
we are sending messages when we have peers connected, rather than
broadcasting at startup when we have no peers connected.
The `rejected_by_dest` field of the `PaymentPathFailed` event has
always been a bit of a misnomer, as it's really more about retry
than where a payment failed. Now is as good a time as any to
rename it.
This uses the various new tracking added in the prior commits to
expose a new `Balance` type - `CounterpartyRevokedOutputClaimable`.
Some nontrivial work is required, however, as we now have to track
HTLC outputs as spendable in a transaction that comes *after* an
HTLC-Success/HTLC-Timeout transaction, which we previously didn't
need to do. Thus, we have to check if an
`onchain_events_awaiting_threshold_conf` event spends a commitment
transaction's HTLC output while walking events. Further, because
we now need to track HTLC outputs after the
HTLC-Success/HTLC-Timeout confirms, and because we have to track
the counterparty's `to_self` output as a contentious output which
could be claimed by either party, we have to examine the
`OnchainTxHandler`'s set of outputs to spend when determining if
certain outputs are still spendable.
Two new tests are added which test various different transaction
formats, and hopefully provide good test coverage of the various
revoked output paths.
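As a hedged sketch of how a user might consume the new variant (the field and method names here are best-effort assumptions, not guaranteed signatures):

```rust
use lightning::chain::channelmonitor::Balance;

// Sum the satoshis we expect to claim from revoked counterparty outputs out
// of a monitor's claimable balances. The field name is assumed, not verified.
fn sum_revoked_claimable(balances: &[Balance]) -> u64 {
    balances.iter().map(|balance| match balance {
        Balance::CounterpartyRevokedOutputClaimable { claimable_amount_satoshis, .. } =>
            *claimable_amount_satoshis,
        _ => 0,
    }).sum()
}
```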
Instead of backfilling gossip by buffering (up to) ten messages at
a time, only buffer one message at a time, as each peer's outbound
socket buffer drains. This moves the outbound backfill messages out
of `PeerHandler` and into the operating system buffer, where it
arguably belongs.
Not buffering causes us to walk the gossip B-Trees somewhat more
often, but avoids allocating vecs for the responses. While it's
probably (without having benchmarked it) a net performance loss, it
simplifies buffer tracking and leaves us with more room to play
with the buffer sizing constants as we add onion message forwarding,
which is an important win.
Note that because we change how often we check if we're out of
messages to send before pinging, we slightly change how many
messages are exchanged at once, impacting the
`test_do_attempt_write_data` constants.
We add `HTLCHandlingFailedConditions` to express the failure parameters
that will be enforced by a new macro, `expect_pending_htlcs_forwardable_conditions`.
When an HTLC fails, we currently rely on the scorer learning the
failed channel and assigning an infinite (`u64::max_value()`)
penalty to the channel so as to avoid retrying over the exact same
path (if there's only one available path). This is common when
trying to pay a mobile client behind an LSP if the mobile client is
currently offline.
This leads to the scorer being overly conservative in some cases -
returning `u64::max_value()` when a given path hasn't been tried
for a given payment may not be the best decision, even if that
channel failed 50 minutes ago.
By tracking channels which failed on a payment part level and
explicitly refusing to route over them we can relax the
requirements on the scorer, allowing it to make different decisions
on how to treat channels that failed relatively recently without
causing payments to retry the same path forever.
This does have the drawback that it could allow two separate parts
of a payment to traverse the same path even though that path just
failed, however this should only occur if the payment is going to
fail anyway, at least as long as the scorer is properly learning.
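The core idea can be sketched generically (illustrative names, not the router's actual data structures):

```rust
use std::collections::HashSet;

// Alongside the scorer's penalties, the router keeps, per payment, the set of
// short channel IDs that already failed for that payment and skips them
// outright when collecting candidate hops.
fn usable_candidates(candidates: &[u64], failed_for_this_payment: &HashSet<u64>) -> Vec<u64> {
    candidates
        .iter()
        .copied()
        .filter(|scid| !failed_for_this_payment.contains(scid))
        .collect()
}
```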
Closes #1241, superseding #1252.
In the next commit we add lockorder testing based on the line each
mutex was created on rather than the particular mutex instance.
This causes some additional test failures because of lockorder
inversions for the same mutex across different tests, which is
fixed here.
This is mostly motivated by the fact that payments may happen while the
latest `ChannelUpdate` indicating our new `ChannelConfig` is still
propagating throughout the network. By temporarily allowing the previous
config, we can help reduce payment failures across the network.
`ChannelConfig` now has its static fields removed. We introduce a new
`LegacyChannelConfig` struct that maintains the serialization as
previously defined by `ChannelConfig` to remain backwards compatible with
clients running 0.0.107 and earlier.
In the near future, we plan to allow users to update their
`ChannelConfig` after the initial channel handshake. In order to reuse
the same struct and expose it to users, we opt to move out all static
fields that cannot be updated after the initial channel handshake.
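A rough sketch of the resulting split, with example fields rather than the crate's exact definitions:

```rust
// Per-channel knobs that can change after the handshake stay in
// `ChannelConfig`; handshake-time settings move out. Field names here are
// examples, not the crate's exact fields.
struct ChannelConfig {
    forwarding_fee_proportional_millionths: u32,
    forwarding_fee_base_msat: u32,
    cltv_expiry_delta: u16,
}

struct LegacyChannelConfig {
    options: ChannelConfig,
    // Static, handshake-time settings kept here only for backwards-compatible
    // (de)serialization with 0.0.107 and earlier.
    announced_channel: bool,
    commit_upfront_shutdown_pubkey: bool,
}
```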
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
A NetworkUpdate indicating ChannelClosed actually corresponds to a
channel failure as described in BOLT 4:
0x2000 (NODE): node failure (otherwise channel)
Rename the enum variant to ChannelFailure and rename NetworkGraph
methods close_channel_from_update and fail_node to channel_failed and
node_failed, respectively.
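A sketch of the BOLT 4 check this naming follows (illustrative code, not the NetworkGraph methods themselves):

```rust
// BOLT 4 onion failure codes set the NODE bit (0x2000) when the failure is
// attributable to a node rather than a channel.
const NODE: u16 = 0x2000;

fn is_node_failure(failure_code: u16) -> bool {
    failure_code & NODE != 0
}
```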
`ChannelManager::fail_htlc_backwards`' bool return value is quite
confusing - just because it returns false doesn't mean the payment
wasn't (already) failed. Worse, in some race cases around shutdown
where a payment was claimed before an unclean shutdown and then
retried on startup, `fail_htlc_backwards` could return true even
though a duplicate copy of the same payment was claimed, but the
claim event has not been seen by the user yet.
While it's possible to use it correctly, it's somewhat confusing to
have a return value at all, and definitely lends itself to misuse.
Instead, we should push users towards a model where they don't care
if `fail_htlc_backwards` succeeds - either they've locally marked
the payment as failed (prior to seeing any `PaymentReceived`
events) and will fail any attempts to pay it, or they have not and
the payment is still receivable until its timeout time is reached.
We can revisit this decision based on user feedback, but will need
to very carefully document the potential failure modes here if we
do.
As additional sanity checks, before claiming a payment, we check
that we have the full amount available in `claimable_htlcs` that
the payment should be for. Concretely, this prevents one
somewhat-absurd edge case where a user may receive an MPP payment,
wait many *blocks* before claiming it, allowing us to fail the
pending HTLCs and the sender to retry some subset of the payment
before we go to claim. More generally, this is just good
belt-and-suspenders against any edge cases we may have missed.
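A minimal sketch of the check, with illustrative types rather than `ChannelManager` internals:

```rust
// Only claim if the HTLCs we are still holding for this payment add up to the
// full expected amount; otherwise some parts were already failed back.
struct ClaimableHtlc {
    amount_msat: u64,
}

fn can_claim(claimable_htlcs: &[ClaimableHtlc], expected_amount_msat: u64) -> bool {
    let total: u64 = claimable_htlcs.iter().map(|htlc| htlc.amount_msat).sum();
    total == expected_amount_msat
}
```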