When receiving an update_fulfill_htlc message, we immediately
forward the claim backwards along the payment path before waiting
for a full commitment_signed dance. This is great, but can cause
duplicative claims if a node sends an update_fulfill_htlc message,
disconnects, reconnects, and then has to re-send its
update_fulfill_htlc message again.
While there was code to handle this, it treated it as a channel
error on the inbound channel, which is incorrect - this is an
expected, albeit incredibly rare, condition. Instead, we handle
these double-claims correctly, simply ignoring them.
With debug_assertions enabled, we also check that the previous
close of the same HTLC was a fulfill, and that we are not moving
from a HTLC failure to an HTLC claim after its too late.
A test is also added, which hits all three failure cases in
`Channel::get_update_fulfill_htlc`.
Found by the chanmon_consistency fuzzer.
This adds four new fields in `ChannelDetails`:
1. holder_selected_ and counterparty_selected_channel_reserve_delay
are useful to determine what amount of the channel is
unavailable for payments.
2. confirmations_required is useful when awaiting funding
confirmation to determine how long you will need to wait.
3. to_self_delay is useful to determine how long it will take to
receive funds after a force-close.
Fixes#983.
These fields are set with a dummy value, which we should generally
be avoiding since Rust gives us a nice `Option` type to use
instead.
Further, we stop rejecting channel_update messages outright when
the htlc_maximum_msat field includes the reserve values, which
nodes could reasonably do without it meriting a channel closure.
We had a user who pointed out that we weren't creating
`SpendableOutputs` events when we should have been after they
called `ChannelMonitor::best_block_updated` with a block well
after a CSV locktime and then called
`ChannelMonitor::transactions_confirmed` with the transaction which
we should have been spending (with a block height/hash a ways in
the past).
This was due to `ChannelMonitor::transactions_confirmed` only
calling `ChannelMonitor::block_confirmed` with the height at which
the transactions were confirmed, resulting in all checks being done
against that, not the current height.
Further, in the same scenario, we also would not fail-back and HTLC
where the HTLC-Timeout transaction was confirmed more than
ANTI_REORG_DELAY blocks ago.
To address this, we use the best block height for confirmation
threshold checks in `ChannelMonitor::block_confirmed` and pass both
the confirmation and current heights through to
`OnchainTx::update_claims_view`, using each as appropriate.
Fixes#962.
No matter the context, if we're told about a block which is
guaranteed by our API semantics to be on the best chain, and it has
a higher height than our current understanding of the best chain,
we should update our understanding. This avoids complexity
in `block_confirmed` by never having a height set which is *higher*
than our current best chain, potentially avoiding some bugs in the
rather-complicated code.
It also requires a minor test tweak as we in some cases now no
longer broadcast a conflicting transaction after the original has
reached the ANTI_REORG_DELAY.
We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.
In both of these cases, we should consider a channel `is_live()` when
they are pending a monitor update. Some clients may update monitors
asynchronously, thus we may simply be waiting a short duration for a
monitor update to complete, and shouldn't fail all forwarding HTLCs
during that time.
After #851, we always ensure any holding cells are free'd when
sending P2P messages, making this change much more trivially
correct - instead of having to ensure that we always free the holding
cell when a channel becomes live again after adding something to the
holding cell, we can simply rely on the fact that it always happens.
Fixes#661.
If we receive a `channel_update` message for a channel unrelated to
our own, we shouldn't trigger a persistence of our
`ChannelManager`. This avoids significant persistence traffic during
initial node startup.
ChannelMonitor and related log entries can generally lean towards
being higher log levels than they necessarily need to be, as they
should be exceedingly rare, if only because they require
confirmation of an on-chain transaction.
This updates a number of log sites in channel and channelmanager to
* Be a bit more verbose at the TRACE level,
* Move some error/useful messages to the ERROR/WARN/INFO level,
* Add new logs to always log once at the DEBUG level when we
send/receive a commitment_signed (with some extra data),
* Include the channel id being operated on in more log messages.
We very often receive duplicate gossip messages, which now causes us
to log at the DEBUG level, which is almost certainly not what a
user wants. Instead, we add a new form of ErrorAction which causes
us to only log at the TRACE level.
This much more consistently logs information about messages
sent/received, including logging the full messages being
sent/received at the TRACE log level. Many other log messages which
are more often of interest were moved to the DEBUG log level.
We had a client application which provided inconsistent monitor
state when deserializing a ChannelManager, resulting in opaque and
generic "InvalidData" deserialization failures. Instead, we log
some informative (and appropriately scary) warning messages in
such cases.
lnd has a long-standing bug where, upon reconnection, if the
channel is not yet confirmed they will not send a
channel_reestablish until the channel locks in. Then, they will
send a funding_locked *before* sending the channel_reestablish
(which is clearly a violation of the BOLT specs). We copy
c-lightning's workaround here and simply store the funding_locked
message until we receive a channel_reestablish.
See-also https://github.com/lightningnetwork/lnd/issues/4006Fixes#963
Previous to this PR, TLV serialization involved iterating from 0 to the highest
given TLV type. This worked until we decided to implement keysend, which has a
TLV type of ~5.48 billion.
So instead, we now specify the type of whatever is being (de)serialized (which
can be an Option, a Vec type, or a non-Option (specified in the serialization macros as "required").
When a peer sends us the routing graph, it may include gossip
messages for our channels, despite it not being a party to them.
This is completely fine, but we currently print a somewhat-scary
looking log messages in these cases, eg:
```
ERROR [lightning::ln::channelmanager:4104] Got a message for a channel from the wrong node!
TRACE [lightning::ln::peer_handler:1267] Handling SendErrorMessage HandleError event in peer_handler for node ... with message Got a message for a channel from the wrong node!
```
Instead, we should simply not consider this an "error" condition
and stay silent.
The only practical way to meet this requirement is to block
disconnect_socket until any pending events are fully processed,
leading to this trivial deadlock:
* Thread 1: select() woken up due to a read event
* Thread 2: Event processing causes a disconnect_socket call to
fire while the PeerManager lock is held.
* Thread 2: disconnect_socket blocks until the read event in
thread 1 completes.
* Thread 1: bytes are read from the socket and
PeerManager::read_event is called, waiting on the lock
still held by thread 2.
There isn't a trivial way to address this deadlock without simply
making the final read_event call return immediately, which we do
here. This also implies that users can freely call event methods
after disconnect_socket, but only so far as the socket descriptor
is different from any later socket descriptor (ie until the file
descriptor is re-used).
`Julian Knutsen <julianknutsen@users.noreply.github.com>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.
This was addressed in #456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.
Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
We can never assume that messages were reliably delivered whether
we placed them in the socket or not, so there isn't a lot of use in
explicitly handling the case that a peer was not connected when we
went to send it a message.
Two TODOs are left for the generation of a `FundingAbandoned` (or
similar) event, though it ultimately belongs in `ChannelManager`.