We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and, when sending an HTLC,
to decide whether a channel is available to forward through or send
over.
In both of these cases, we should still consider a channel `is_live()`
while it is pending a monitor update. Some clients may update monitors
asynchronously, thus we may simply be waiting a short duration for a
monitor update to complete, and shouldn't fail all forwarding HTLCs
during that time.
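A rough sketch of the behavioral change, using an invented struct rather than the real `Channel` internals:

```rust
// Hypothetical, simplified stand-in for Channel state; only the gating
// logic is the point here.
struct ChannelState {
    peer_connected: bool,
    monitor_update_pending: bool,
}

impl ChannelState {
    // Before: `self.peer_connected && !self.monitor_update_pending`.
    // After: a pending (possibly asynchronous) monitor update no longer
    // makes the channel unusable for list_usable_channels() or for
    // selecting a channel to forward/send an HTLC over.
    fn is_live(&self) -> bool {
        self.peer_connected
    }
}
```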
After #851, we always ensure any holding cells are freed when
sending P2P messages, making this change much more trivially
correct: instead of having to ensure that we always free the holding
cell when a channel becomes live again after adding something to it,
we can simply rely on the fact that this always happens.
Fixes #661.
If we receive a `channel_update` message for a channel unrelated to
our own, we shouldn't trigger a persistence of our
`ChannelManager`. This avoids significant persistence traffic during
initial node startup.
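A sketch of the idea, with invented names (the actual LDK mechanism and naming may differ): the message handler reports whether the `ChannelManager` actually changed, and persistence is only triggered when it did.

```rust
/// Whether handling a message changed ChannelManager state that must be
/// re-persisted. Names here are illustrative only.
enum NotifyOption {
    DoPersist,
    SkipPersist,
}

fn handle_channel_update(is_for_one_of_our_channels: bool) -> NotifyOption {
    if is_for_one_of_our_channels {
        // The update affects a channel we are a party to: persist.
        NotifyOption::DoPersist
    } else {
        // Pure routing-graph gossip: nothing in ChannelManager changed,
        // so skip the (potentially expensive) persistence round-trip.
        NotifyOption::SkipPersist
    }
}
```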
ChannelMonitor and related log entries can generally lean towards
higher log levels than strictly necessary, as they should be
exceedingly rare, if only because they require confirmation of an
on-chain transaction.
This updates a number of log sites in channel and channelmanager to
* Be a bit more verbose at the TRACE level,
* Move some error/useful messages to the ERROR/WARN/INFO level,
* Add new logs to always log once at the DEBUG level when we
send/receive a commitment_signed (with some extra data),
* Include the channel id being operated on in more log messages (see the sketch below).
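Roughly the intended style, shown with the standard `log` crate as a stand-in for LDK's internal logging macros (the message text and helper below are made up, not actual call sites):

```rust
use log::{debug, trace};

fn hex_str(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn on_commitment_signed(channel_id: &[u8; 32], num_htlc_updates: usize) {
    // One DEBUG line per commitment_signed sent/received, with some extra
    // data and the channel id being operated on.
    debug!("Received commitment_signed on channel {} covering {} HTLC updates",
           hex_str(channel_id), num_htlc_updates);
    // The noisier per-step detail stays at TRACE.
    trace!("Processing commitment_signed for channel {}", hex_str(channel_id));
}
```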
We very often receive duplicate gossip messages, which now causes us
to log at the DEBUG level; that is almost certainly not what a user
wants. Instead, we add a new form of ErrorAction which causes us to
only log at the TRACE level.
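A sketch of the shape of such an action (variant names are illustrative; the real `ErrorAction` enum may differ):

```rust
/// Illustrative subset of an ErrorAction-style enum: the new variant lets
/// a handler reject a message quietly, logging only at TRACE.
enum ErrorAction {
    /// Drop the message and say nothing.
    IgnoreError,
    /// Drop the message and log the reason, but only at the TRACE level,
    /// since duplicate gossip is entirely routine.
    IgnoreAndLogTrace(String),
    /// Something is actually wrong with this peer.
    DisconnectPeer,
}

fn on_duplicate_channel_announcement(short_channel_id: u64) -> ErrorAction {
    ErrorAction::IgnoreAndLogTrace(format!(
        "Ignoring duplicate channel_announcement for SCID {}", short_channel_id))
}
```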
This much more consistently logs information about messages
sent/received, including logging the full messages being
sent/received at the TRACE log level. Many other log messages which
are more often of interest were moved to the DEBUG log level.
We had a client application which provided inconsistent monitor
state when deserializing a ChannelManager, resulting in opaque and
generic "InvalidData" deserialization failures. Instead, we log
some informative (and appropriately scary) warning messages in
such cases.
lnd has a long-standing bug where, upon reconnection, if the
channel is not yet confirmed they will not send a
channel_reestablish until the channel locks in. Then, they will
send a funding_locked *before* sending the channel_reestablish
(which is clearly a violation of the BOLT specs). We copy
c-lightning's workaround here and simply store the funding_locked
message until we receive a channel_reestablish.
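A minimal sketch of the workaround, with invented types and fields rather than `Channel`'s real ones:

```rust
struct FundingLocked;        // placeholder for the funding_locked message
struct ChannelReestablish;   // placeholder for the channel_reestablish message

#[derive(Default)]
struct ReconnectState {
    awaiting_reestablish: bool,
    deferred_funding_locked: Option<FundingLocked>,
}

impl ReconnectState {
    fn on_funding_locked(&mut self, msg: FundingLocked) {
        if self.awaiting_reestablish {
            // Out of order per the BOLTs, but tolerated: hold on to the
            // message until the peer's channel_reestablish arrives.
            self.deferred_funding_locked = Some(msg);
        } else {
            // ...process funding_locked immediately...
        }
    }

    fn on_channel_reestablish(&mut self, _msg: ChannelReestablish) {
        self.awaiting_reestablish = false;
        if let Some(_deferred) = self.deferred_funding_locked.take() {
            // ...now process the stored funding_locked...
        }
    }
}
```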
See also https://github.com/lightningnetwork/lnd/issues/4006
Fixes #963.
Prior to this PR, TLV serialization involved iterating from 0 to the highest
given TLV type. This worked until we decided to implement keysend, which has a
TLV type of ~5.48 billion.
So instead, we now specify the type of whatever is being (de)serialized,
which can be an Option, a Vec type, or a non-Option (specified in the
serialization macros as "required").
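A toy illustration of the difference (hand-rolled encoding, not LDK's actual TLV macros; real TLV streams use BigSize varints for the type and length):

```rust
// Write only the records we actually have, in ascending type order, instead
// of scanning every type id from 0 up to the largest one present. With
// keysend's ~5.48-billion type id, the old 0..=max scan is a non-starter.
fn write_tlv_records(mut records: Vec<(u64, Vec<u8>)>, out: &mut Vec<u8>) {
    records.sort_by_key(|(tlv_type, _)| *tlv_type);
    for (tlv_type, value) in records {
        // Fixed-width encoding for brevity; the real format uses BigSize.
        out.extend_from_slice(&tlv_type.to_be_bytes());
        out.extend_from_slice(&(value.len() as u64).to_be_bytes());
        out.extend_from_slice(&value);
    }
}
```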
When a peer sends us the routing graph, it may include gossip
messages for our channels, despite it not being a party to them.
This is completely fine, but we currently print somewhat
scary-looking log messages in these cases, e.g.:
```
ERROR [lightning::ln::channelmanager:4104] Got a message for a channel from the wrong node!
TRACE [lightning::ln::peer_handler:1267] Handling SendErrorMessage HandleError event in peer_handler for node ... with message Got a message for a channel from the wrong node!
```
Instead, we should simply not consider this an "error" condition
and stay silent.
The only practical way to meet this requirement is to block
disconnect_socket until any pending events are fully processed,
leading to this trivial deadlock:
* Thread 1: select() woken up due to a read event
* Thread 2: Event processing causes a disconnect_socket call to
fire while the PeerManager lock is held.
* Thread 2: disconnect_socket blocks until the read event in
thread 1 completes.
* Thread 1: bytes are read from the socket and
PeerManager::read_event is called, waiting on the lock
still held by thread 2.
There isn't a trivial way to address this deadlock without simply
making the final read_event call return immediately, which we do
here. This also implies that users can freely call event methods
after disconnect_socket, but only as long as the socket descriptor
is different from any later socket descriptor (i.e. until the file
descriptor is re-used).
`Julian Knutsen <julianknutsen@users.noreply.github.com>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.
This was addressed in #456 by simply codifying the reentrancy, but
it's somewhat simpler to just drop the `do_attempt_write_data` call.
Ideally we could land most of Julian's work, but it's still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
We can never assume that messages were reliably delivered whether
we placed them in the socket or not, so there isn't a lot of use in
explicitly handling the case that a peer was not connected when we
went to send it a message.
Two TODOs are left for the generation of a `FundingAbandoned` (or
similar) event, though it ultimately belongs in `ChannelManager`.
VecReadWrapper is only used in TLVs so there is no need to prepend
a length before writing/reading the objects - we can instead simply
read until we reach the end of the TLV stream.
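Roughly the reading side of that idea, as a standalone sketch (not the actual `VecReadWrapper` code):

```rust
use std::io::Read;

// Because the TLV record header already gives the value's total length,
// elements can simply be read back-to-back until the value stream runs
// out, with no separate element-count prefix.
fn read_u32s_until_end<R: Read>(mut reader: R) -> std::io::Result<Vec<u32>> {
    let mut values = Vec::new();
    let mut buf = [0u8; 4];
    loop {
        match reader.read_exact(&mut buf) {
            Ok(()) => values.push(u32::from_be_bytes(buf)),
            // EOF here means we've consumed the whole length-delimited value.
            // (A truncated element also surfaces as UnexpectedEof; real
            // deserialization code distinguishes the two.)
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(values)
}
```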
Now that our MSRV supports the native methods, we have no need
for the helpers anymore. Because LLVM was already matching our
byte_utils methods as byteswap functions, this should have no
impact on generated (optimized) code.
This removes most of the byte_utils usage, though some remains to
keep the patch size reasonable.
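For illustration, the kind of swap involved (the `byte_utils` helper names here are quoted from memory and may not match exactly):

```rust
fn main() {
    let x: u64 = 0x0102030405060708;

    // Before (helper functions in byte_utils), roughly:
    //     let bytes = byte_utils::be64_to_array(x);
    //     let y = byte_utils::slice_to_be64(&bytes);
    // After, using the native methods LLVM was already pattern-matching:
    let bytes = x.to_be_bytes();
    let y = u64::from_be_bytes(bytes);
    assert_eq!(x, y);
}
```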
Channel serialization should happen "as if
remove_uncommitted_htlcs_and_mark_paused had just been called".
This is true for the most part, but outbound RemoteRemoved HTLCs
were being serialized as normal, even though
`remove_uncommitted_htlcs_and_mark_paused` resets them to
`Committed`.
This led to a bug identified by the `chanmon_consistency_target`
fuzzer wherein, if we receive an update_*_htlc message but not the
corresponding commitment_signed prior to a serialization roundtrip,
we'd force-close the channel due to the peer "attempting to
fail/claim an HTLC which was already failed/claimed".
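A sketch of the resulting serialization rule, with a simplified enum rather than `Channel`'s real outbound-HTLC state machine:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum OutboundHTLCState {
    Committed,
    // The peer sent update_fulfill/fail_htlc but we have not yet received
    // the corresponding commitment_signed.
    RemoteRemoved,
}

/// What to write to disk: "as if remove_uncommitted_htlcs_and_mark_paused
/// had just been called", i.e. an uncommitted remote removal rolls back to
/// Committed so the peer can legitimately re-fail/re-claim it after a
/// reload without us force-closing.
fn state_for_serialization(state: OutboundHTLCState) -> OutboundHTLCState {
    match state {
        OutboundHTLCState::RemoteRemoved => OutboundHTLCState::Committed,
        other => other,
    }
}
```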
This stores transaction templates temporarily until their locktime
is reached, avoiding broadcasting (or RBF bumping) transactions
prior to their locktime. For those broadcasting transactions
(potentially indirectly) via Bitcoin Core RPC, this ensures no
automated rebroadcast of transactions on the client side is
required to get transactions confirmed.
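A minimal sketch of the holding structure, assuming a simple height-indexed queue rather than whatever the real on-chain handler uses:

```rust
use std::collections::BTreeMap;

// Hold transaction templates keyed by the locktime at which they become
// broadcastable, and only hand them to the broadcaster once the chain tip
// reaches that height (the exact nLockTime finality rule is glossed over).
struct LocktimedBroadcastQueue<Tx> {
    pending: BTreeMap<u32, Vec<Tx>>,
}

impl<Tx> LocktimedBroadcastQueue<Tx> {
    fn new() -> Self {
        Self { pending: BTreeMap::new() }
    }

    /// Queue a template that only becomes broadcastable at `locktime`.
    fn queue(&mut self, locktime: u32, tx: Tx) {
        self.pending.entry(locktime).or_default().push(tx);
    }

    /// Return every template whose locktime is satisfied at tip `height`,
    /// leaving the rest queued for a later block.
    fn ready_at(&mut self, height: u32) -> Vec<Tx> {
        let still_pending = self.pending.split_off(&(height + 1));
        let ready = std::mem::replace(&mut self.pending, still_pending);
        ready.into_values().flatten().collect()
    }
}
```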