In c02b6a3807 we moved the `payment_preimage` copy out of the macro,
which only runs if we are spending an output we know is an HTLC
output, and instead copy it for any script that matches our expected
length. This can panic if an inbound channel is created with a bogus
funding transaction that has a witness program of the
HTLC-Success/-Offered length but whose second-to-last witness element
is not 32 bytes.
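For illustration only, a minimal hypothetical sketch of length-checking
the witness element before copying (plain `Vec<u8>` elements stand in
for the real witness types; this is not the code from the commit):

```rust
fn extract_payment_preimage(witness: &[Vec<u8>]) -> Option<[u8; 32]> {
    // A bogus script can produce a witness whose second-to-last element is
    // not 32 bytes; treat that as "no preimage" instead of panicking on a
    // fixed-size copy.
    if witness.len() < 2 {
        return None;
    }
    let candidate = &witness[witness.len() - 2];
    if candidate.len() != 32 {
        return None;
    }
    let mut preimage = [0u8; 32];
    preimage.copy_from_slice(candidate);
    Some(preimage)
}
```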
Luckily this panic is relatively simple for downstream users to
work around - if an invalid-length-copy panic occurs, simply remove
the ChannelMonitor from the bogus channel on startup and run
without it. Because the channel must be funded by a bogus script in
order to reach this panic, the channel will already have closed by
the time the funding transaction is spent, and there can be no
local funds in such a channel, so removing the `ChannelMonitor`
wholesale is completely safe.
In order to test this we have to disable an in-line assertion that
checks that our transactions match expected scripts, which we do by
checking for the specific bogus script that we now use in
`test_invalid_funding_tx`.
Thanks to Eugene Siegel for reporting this issue.
Because downstream languages are often garbage-collected, having
the user directly allocate a `ReadOnlyNetworkGraph` and pass a
reference to it to `find_route` often results in holding a read
lock far longer than the `find_route` call itself. Worse, some
languages (like JavaScript) tend to only garbage-collect when other
code is not running, possibly leading to deadlocks.
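As a minimal, self-contained sketch of the lock-scoping concern (using
`std::sync::RwLock` stand-ins rather than the actual LDK types), the
read guard should only live for the duration of the routing call:

```rust
use std::sync::RwLock;

// Stand-ins for the real graph types, to illustrate the lock-scoping issue.
struct NetworkGraph {
    channels: RwLock<Vec<u64>>, // simplified channel set
}

fn find_route(channels: &[u64]) -> Option<u64> {
    // Routing only needs read access for the duration of this call.
    channels.first().copied()
}

fn route_once(graph: &NetworkGraph) -> Option<u64> {
    // The read guard lives only for this block and is released immediately,
    // rather than being tied to the lifetime of a garbage-collected wrapper
    // object in a downstream language.
    let channels = graph.channels.read().unwrap();
    find_route(&channels)
}
```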
When we receive a `channel_reestablish` with a `data_loss_protect`
that proves we're running with a stale state, instead of
force-closing the channel, we immediately panic. This lines up with
our refusal to run if we find a `ChannelMonitor` which is stale
compared to our `ChannelManager` during `ChannelManager`
deserialization. Ultimately both are an indication of the same
thing - that the API requirements on `chain::Watch` were violated.
In the "running with outdated state but ChannelMonitor(s) and
ChannelManager lined up" case specifically its likely we're running
off of an old backup, in which case connecting to peers with
channels still live is explicitly dangerous. That said, because
this could be an operator error that is correctable, panicking
instead of force-closing may allow for normal operation again in
the future (cc #1207).
In any case, we provide instructions in the panic message for how
to force-close channels prior to peer connection, as well as a note
on how to broadcast the latest state if users are willing to take
the risk.
Note that this is still somewhat unsafe until we resolve #1563.
If a user restores from a backup that they know is stale, they'd
like to force-close all of their channels (or at least the ones
they know are stale) *without* broadcasting the latest state,
asking their peers to do so instead. This simply adds methods to do
so, renaming the existing `force_close_channel` and
`force_close_all_channels` methods to disambiguate further.
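A hypothetical sketch of the resulting API shape - the real methods
live on `ChannelManager` and their exact names/signatures may differ:

```rust
// Hypothetical shape of the split described above; not LDK's exact API.
trait ForceClose {
    /// Force-close, broadcasting our latest local commitment transaction.
    fn force_close_broadcasting_latest_txn(&self, channel_id: &[u8; 32]) -> Result<(), ()>;
    /// Force-close *without* broadcasting, relying on the counterparty to
    /// broadcast their latest state. Intended for restoring from a backup
    /// that is known (or suspected) to be stale.
    fn force_close_without_broadcasting_txn(&self, channel_id: &[u8; 32]) -> Result<(), ()>;
}
```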
When we see a revoked counterparty commitment transaction on-chain
we shouldn't immediately queue up the HTLCs present in it for
resolution; instead, we should wait until we have spent the HTLC
outputs with some kind of claim transaction.
In order to do so, we first have to change the
`fail_unbroadcast_htlcs!()` call to provide it with the HTLCs which
are present in the (revoked) commitment transaction which was
broadcast. However, this is not sufficient - because all of those
HTLCs had their `HTLCSource` removed when the commitment
transaction was revoked, we also have to update
`fail_unbroadcast_htlcs` to check the payment hash and amount when
the `HTLCSource` is `None`.
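A simplified sketch of that matching rule, using stand-in types (the
real logic lives in the `fail_unbroadcast_htlcs!()` macro):

```rust
#[derive(PartialEq)]
struct HtlcSource(u64); // stand-in for the real `HTLCSource`

struct Htlc {
    payment_hash: [u8; 32],
    amount_msat: u64,
    source: Option<HtlcSource>,
}

/// Returns true if `tracked` (an HTLC we still consider live) corresponds to
/// `confirmed` (an HTLC present in the broadcast commitment transaction).
fn htlc_matches(tracked: &Htlc, confirmed: &Htlc) -> bool {
    match (&tracked.source, &confirmed.source) {
        // If both sides still carry a source, match on it directly.
        (Some(a), Some(b)) => a == b,
        // For a revoked commitment the `HTLCSource` was already removed, so
        // fall back to comparing payment hash and amount.
        _ => tracked.payment_hash == confirmed.payment_hash
            && tracked.amount_msat == confirmed.amount_msat,
    }
}
```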
Somewhat surprisingly, several tests explicitly tested for the old
behavior and had to be amended to pass with the new changes.
Finally, this adds a debug assertion when writing `ChannelMonitor`s
to ensure `HTLCSource`s do not leak.
This is mostly motivated by the fact that payments may happen while the
latest `ChannelUpdate` indicating our new `ChannelConfig` is still
propagating throughout the network. By temporarily allowing the previous
config, we can help reduce payment failures across the network.
We do this to prevent payment failures while the `ChannelUpdate` for the
new `ChannelConfig` still propagates throughout the network. In a follow
up commit, we'll honor forwarding HTLCs that were constructed based on
either the previous or current `ChannelConfig`.
To handle expiration (when we should stop allowing the previous config),
we rely on the ChannelManager's `timer_tick_occurred` method. After
enough ticks, the previous config is cleared from memory, and only the
current config applies moving forward.
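A minimal sketch of the tick-based expiry, assuming an illustrative
constant and field layout rather than LDK's actual internals:

```rust
// Illustrative constant; the real number of ticks may differ.
const EXPIRE_PREV_CONFIG_TICKS: usize = 5;

#[derive(Clone)]
struct ChannelConfig {
    forwarding_fee_base_msat: u32,
}

struct Channel {
    config: ChannelConfig,
    // The previous config plus the number of `timer_tick_occurred` calls
    // observed since it was replaced.
    prev_config: Option<(ChannelConfig, usize)>,
}

impl Channel {
    fn update_config(&mut self, new_config: ChannelConfig) {
        let old = std::mem::replace(&mut self.config, new_config);
        self.prev_config = Some((old, 0));
    }

    /// Called once per `ChannelManager::timer_tick_occurred`.
    fn maybe_expire_prev_config(&mut self) {
        let expired = match &mut self.prev_config {
            Some((_, ticks)) => {
                *ticks += 1;
                *ticks >= EXPIRE_PREV_CONFIG_TICKS
            }
            None => false,
        };
        if expired {
            // From here on, only the current config applies.
            self.prev_config = None;
        }
    }
}
```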
A new `update_channel_config` method is exposed on the `ChannelManager`
to update the `ChannelConfig` for a set of channels atomically. New
`ChannelUpdate` events are generated for each eligible channel.
Note that as currently implemented, a buggy and/or
auto-policy-management client could spam the network with updates as
there is no rate-limiting in place. This could already be done with
`broadcast_node_announcement`, though users are less inclined to update
that as frequently, since its data is mostly static.
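To illustrate the batch-update idea with self-contained stand-in types
(the real `update_channel_config` signature may differ):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct ChannelConfig {
    forwarding_fee_base_msat: u32,
}

struct BroadcastChannelUpdate {
    channel_id: [u8; 32],
}

struct Manager {
    channels: HashMap<[u8; 32], ChannelConfig>,
    pending_broadcasts: Vec<BroadcastChannelUpdate>,
}

impl Manager {
    /// Applies `config` to every channel in `channel_ids`, queueing a
    /// `ChannelUpdate` broadcast for each channel that was updated.
    fn update_channel_config(&mut self, channel_ids: &[[u8; 32]], config: &ChannelConfig) {
        for id in channel_ids {
            if let Some(existing) = self.channels.get_mut(id) {
                *existing = config.clone();
                self.pending_broadcasts.push(BroadcastChannelUpdate { channel_id: *id });
            }
        }
    }
}
```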
As we prepare to expose an API to update a channel's ChannelConfig,
we'll also want to expose this struct to consumers so that they have
insight into the current ChannelConfig applied to each channel.
Previously, while processing a confirmed revoked counterparty
commitment transaction, we'd populate `OnchainEvent`s for live
HTLCs with a `txid` source set to the txid of the latest counterparty
commitment transaction, not the confirmed revoked one. This meant
that, if the user is using `transaction_unconfirmed` to notify us
of reorg information, we'd end up not removing the entry if the
revoked commitment transaction was reorg'd out. This would
ultimately cause us to spuriously resolve the HTLC(s) as the chain
advanced, even though we were doing so based on a now-reorged-out
transaction.
Luckily the fix is simple - set the correct txid in the
`OnchainEventEntry`. We also take this opportunity to update
logging in a few places with the txid of the transaction causing an
event.
If the funding transaction is timelocked beyond the next block of
our best known chain tip, return an APIError instead of silently
failing when we attempt to broadcast it.
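A simplified sketch of that check, handling only height-based
locktimes and ignoring any headroom the real implementation may allow:

```rust
// `ApiError` here is a stand-in for LDK's `APIError`.
#[derive(Debug)]
struct ApiError(String);

/// Errors if a funding transaction with height-based locktime `tx_locktime`
/// could not be confirmed in the block following `best_block_height`.
fn check_funding_locktime(tx_locktime: u32, best_block_height: u32) -> Result<(), ApiError> {
    // A transaction with locktime L can only appear in blocks of height > L,
    // so it fits in the next block only if L <= best_block_height.
    if tx_locktime > best_block_height {
        return Err(ApiError(format!(
            "funding transaction locktime {} is beyond the next block (tip height {})",
            tx_locktime, best_block_height
        )));
    }
    Ok(())
}
```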
ChannelConfig now has its static fields removed. We introduce a new
LegacyChannelConfig struct that maintains the serialization as
previously defined by ChannelConfig to remain backwards compatible with
clients running 0.0.107 and earlier.
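A rough sketch of the wrapper pattern; the field set shown here is
illustrative rather than LDK's exact one:

```rust
struct ChannelConfig {
    // Only the fields that may be updated after the handshake remain here.
    forwarding_fee_proportional_millionths: u32,
    forwarding_fee_base_msat: u32,
    cltv_expiry_delta: u16,
}

struct LegacyChannelConfig {
    options: ChannelConfig,
    // Static fields that used to live directly in `ChannelConfig`; keeping
    // them here preserves the previous serialization layout.
    announced_channel: bool,
    commit_upfront_shutdown_pubkey: bool,
}
```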
As in the previous commit, `commit_upfront_shutdown_pubkey` is another
static field that cannot change after the initial channel handshake. We
therefore move it out of its existing place in `ChannelConfig`.
In the near future, we plan to allow users to update their
`ChannelConfig` after the initial channel handshake. In order to reuse
the same struct and expose it to users, we opt to move out all static
fields that cannot be updated after the initial channel handshake.
P2PGossipSync logs before delegating to NetworkGraph in its
EventHandler. In order to share this handling with RapidGossipSync,
NetworkGraph needs to take a logger so that it can implement
EventHandler instead.
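A self-contained sketch of the resulting shape, with stand-in types
for the logger and graph:

```rust
// The graph holds a logger so it can do its own logging when handling
// network updates; types here are stand-ins, not LDK's actual definitions.
trait Logger {
    fn log(&self, msg: &str);
}

struct NetworkGraph<L: Logger> {
    logger: L,
    // ... channel and node maps elided ...
}

impl<L: Logger> NetworkGraph<L> {
    fn new(logger: L) -> Self {
        NetworkGraph { logger }
    }

    /// Event handling that both P2PGossipSync and RapidGossipSync can reuse.
    fn channel_failed(&self, short_channel_id: u64, is_permanent: bool) {
        self.logger.log(&format!(
            "Marking channel {} failed (permanent: {})",
            short_channel_id, is_permanent
        ));
        // ... update internal channel state ...
    }
}
```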
NetGraphMsgHandler implements RoutingMessageHandler to handle gossip
messages defined in BOLT 7 and maintains a view of the network by
updating NetworkGraph. Rename it to P2PGossipSync, which better
describes its purpose, and to contrast with RapidGossipSync.
A NetworkUpdate indicating ChannelClosed actually corresponds to a
channel failure as described in BOLT 4:
0x2000 (NODE): node failure (otherwise channel)
Rename the enum variant to ChannelFailure and rename NetworkGraph
methods close_channel_from_update and fail_node to channel_failed and
node_failed, respectively.
If the user broadcasts a funding transaction before the
counterparty provides a `funding_signed` we will panic in
`check_get_channel_ready`. This is expected - the user did
something which may lead to loss of funds, and we *really* need to
let them know.
However, the fuzzer can do this and we shouldn't treat it as a bug;
it's a totally expected panic. Thus, we disable the panic in fuzz.
Thanks to Chaincode for providing fuzzing resources which managed
to hit this panic.
`ChannelManager::fail_htlc_backwards`' bool return value is quite
confusing - just because it returns false doesn't mean the payment
wasn't (already) failed. Worse, in some race cases around shutdown,
where a payment was claimed before an unclean shutdown and then
retried on startup, `fail_htlc_backwards` could return true even
though a duplicate copy of the same payment had been claimed and the
claim event had not yet been seen by the user.
While it's possible to use it correctly, it's somewhat confusing to
have a return value at all, and it definitely lends itself to misuse.
Instead, we should push users towards a model where they don't care
if `fail_htlc_backwards` succeeds - either they've locally marked
the payment as failed (prior to seeing any `PaymentReceived`
events) and will fail any attempts to pay it, or they have not and
the payment is still receivable until its timeout time is reached.
We can revisit this decision based on user feedback, but will need
to very carefully document the potential failure modes here if we
do.
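A sketch of that usage model with stand-in types (not LDK's actual
API surface):

```rust
use std::collections::HashSet;

struct PaymentHash([u8; 32]);

trait FailBackwards {
    // After this change there is no bool to check.
    fn fail_htlc_backwards(&self, payment_hash: &PaymentHash);
}

struct MyNode {
    locally_failed: HashSet<[u8; 32]>,
}

impl MyNode {
    fn stop_accepting_payment(&mut self, hash: &PaymentHash, manager: &impl FailBackwards) {
        // Mark the payment failed locally *before* acting on any further
        // `PaymentReceived` events for it.
        self.locally_failed.insert(hash.0);
        // Either the pending HTLCs are failed back, or the payment simply
        // remains receivable until it times out; we don't need to know which.
        manager.fail_htlc_backwards(hash);
    }
}
```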
As an additional sanity check, before claiming a payment, we verify
that the full amount the payment should be for is available in
`claimable_htlcs`. Concretely, this prevents one
somewhat-absurd edge case where a user may receive an MPP payment,
wait many *blocks* before claiming it, allowing us to fail the
pending HTLCs and the sender to retry some subset of the payment
before we go to claim. More generally, this is just good
belt-and-suspenders against any edge cases we may have missed.
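A simplified sketch of the pre-claim check, using stand-in types:

```rust
struct ClaimableHtlc {
    amount_msat: u64,
}

/// Returns true if the HTLCs we are currently holding for a payment cover
/// the full amount the payment is supposed to be for.
fn have_full_amount(htlcs: &[ClaimableHtlc], expected_amount_msat: u64) -> bool {
    htlcs.iter().map(|h| h.amount_msat).sum::<u64>() >= expected_amount_msat
}
```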
If we crashed during a payment claim and then detected a partial
claim on restart, we should ensure the user is aware that the
payment has been claimed. We do so here by using the new
partial-claim detection logic to create a `PaymentClaimed` event.
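A sketch of consuming such an event in a user event handler; the
`Event` enum here is a stand-in containing only the variant discussed:

```rust
enum Event {
    PaymentClaimed { payment_hash: [u8; 32], amount_msat: u64 },
    Other,
}

fn handle_event(event: Event) {
    match event {
        Event::PaymentClaimed { payment_hash, amount_msat } => {
            // Even if the claim completed only on restart after a crash, the
            // user still learns about it here.
            println!("claimed payment {:02x?} for {} msat", payment_hash, amount_msat);
        }
        Event::Other => {}
    }
}
```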