When we forward a payment and receive an `update_fulfill_htlc`
message from the downstream channel, we immediately claim the HTLC
on the upstream channel, before even doing a `commitment_signed`
dance on the downstream channel. This implies that our
`ChannelMonitorUpdate`s "go out" in the right order - first we
ensure we'll get our money by writing the preimage down, then we
write the update that resolves giving money on the downstream node.
This is safe as long as `ChannelMonitorUpdate`s complete in the
order in which they are generated, but of course looking forward we
want to support asynchronous updates, which may complete in any
order.
Here we add infrastructure to handle downstream
`ChannelMonitorUpdate`s which are blocked on an upstream
preimage-containing one. We don't yet actually do the blocking,
which will come in a future commit.
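A minimal sketch of the shape this infrastructure might take (all names here are illustrative, not LDK's actual API): each channel tracks the downstream updates held back until the upstream, preimage-containing update completes, then releases them in order.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a held-back `ChannelMonitorUpdate`.
struct BlockedMonitorUpdate {
    update_id: u64,
}

struct ChannelState {
    /// Downstream updates blocked on an upstream preimage-containing update.
    blocked_monitor_updates: VecDeque<BlockedMonitorUpdate>,
}

impl ChannelState {
    /// Called once the upstream update completes; the blocked downstream
    /// updates are released in the order in which they were generated.
    fn on_upstream_update_completed(&mut self) -> Vec<BlockedMonitorUpdate> {
        self.blocked_monitor_updates.drain(..).collect()
    }
}
```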
If a `ChannelMonitorUpdate` was created and given to the user but
left uncompleted when the `ChannelManager` is persisted prior to a
restart, the user likely lost the `ChannelMonitorUpdate`(s). Thus,
we need to replay them for the user, which we do here using the
new `BackgroundEvent::MonitorUpdateRegeneratedOnStartup` variant.
When we generated a `ChannelMonitorUpdate` during `ChannelManager`
deserialization, we must ensure that it gets processed before any
other `ChannelMonitorUpdate`s. The obvious hook for this is when
taking the `total_consistency_lock`, which makes it unlikely we'll
regress by forgetting this.
Here we add that call in the `PersistenceNotifierGuard`, with a
test-only atomic bool to check that this criterion is met.
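A rough sketch of the guard's shape (names and structure are assumptions for illustration): replaying pending startup updates whenever the consistency lock is taken, with a flag tests can assert on.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, RwLock, RwLockReadGuard};

/// Test-only in the real change; lets tests assert the replay happened.
static BACKGROUND_EVENTS_PROCESSED: AtomicBool = AtomicBool::new(false);

struct Manager {
    /// Stand-in for `ChannelMonitorUpdate`s regenerated at deserialization.
    pending_background_updates: Mutex<Vec<u64>>,
    total_consistency_lock: RwLock<()>,
}

impl Manager {
    /// Every operation takes this guard, so startup-generated updates are
    /// always replayed before any new `ChannelMonitorUpdate` is created.
    fn persistence_notifier_guard(&self) -> RwLockReadGuard<'_, ()> {
        let guard = self.total_consistency_lock.read().unwrap();
        for _update in self.pending_background_updates.lock().unwrap().drain(..) {
            // ...hand the regenerated update back to the `chain::Watch`...
        }
        BACKGROUND_EVENTS_PROCESSED.store(true, Ordering::Release);
        guard
    }
}
```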
`BackgroundEvent` was used to store `ChannelMonitorUpdate`s which
result in a channel force-close, avoiding relying on
`ChannelMonitor`s having been loaded while `ChannelManager`
block-connection methods are called during startup.
In the coming commit(s) we'll also generate non-channel-closing
`ChannelMonitorUpdate`s during startup, which will need to be
replayed prior to any other `ChannelMonitorUpdate`s generated from
normal operation.
In the next commit we'll handle that by handling `BackgroundEvent`s
immediately after locking the `total_consistency_lock`.
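A sketch of how the enum might grow (variant contents elided; beyond the variant name given above, the shapes are assumptions):

```rust
enum BackgroundEvent {
    /// Previously the only use: a monitor update which force-closes a
    /// channel, applied without the `ChannelMonitor` having been loaded.
    ClosingMonitorUpdate {},
    /// A non-closing update regenerated during deserialization, which must
    /// be replayed before any update generated by normal operation.
    MonitorUpdateRegeneratedOnStartup {},
}
```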
Rather than letting `AChannelManager` be bounded by all traits
being `Sized`, we make the bounds explicitly `?Sized`. We also make
the trait no longer test-only, as it will be used in a coming commit.
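For illustration, the kind of bound change involved (a simplified trait, not the actual `AChannelManager` definition):

```rust
trait AChannelManagerLike {
    // Associated types carry an implicit `Sized` bound; relaxing it with
    // `?Sized` permits trait objects such as `dyn SomeTrait` here.
    type Watch: ?Sized;
    type Persister: ?Sized;
}
```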
In the coming commits we'll need the counterparty node_id when
handling a background monitor update as we may need to resume
normal channel operation as a result. Thus, we go ahead and pipe it
through from the shutdown end, as it makes the codepaths
consistent.
Sadly, the monitor-originated shutdown case cannot require a
counterparty node_id, as some versions of LDK didn't store it in the
`ChannelMonitor`.
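A sketch of the resulting shape (placeholder types and names):

```rust
type PublicKey = [u8; 33]; // stand-in for `bitcoin::secp256k1::PublicKey`

/// Monitor-originated force-closes must tolerate a missing node_id, since
/// monitors written by older LDK versions never stored one.
fn handle_monitor_shutdown(counterparty_node_id: Option<PublicKey>) {
    match counterparty_node_id {
        // With a node_id we can resume normal channel operation for the peer.
        Some(_node_id) => { /* ... */ }
        // Legacy monitor: fall back to force-close-only handling.
        None => { /* ... */ }
    }
}
```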
Our `no-std` locks simply panic if a lock cannot be taken as there
should be no lock contention in a single-threaded environment.
However, the `held_by_thread` debug methods were delegating to the
lock methods which resulted in a panic when asserting that a lock
*is* held by the current thread.
Instead, they are updated here to call the relevant `RefCell`
testing methods.
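A sketch of the fix, modeled on a `RefCell`-backed no-std mutex (names are illustrative):

```rust
use core::cell::RefCell;

enum LockHeldState { HeldByThread, NotHeldByThread }

struct NoStdMutex<T> { inner: RefCell<T> }

impl<T> NoStdMutex<T> {
    // Previously this delegated to the lock method, which panics if the
    // lock is held; `try_borrow_mut` observes the state without panicking.
    fn held_by_thread(&self) -> LockHeldState {
        if self.inner.try_borrow_mut().is_err() {
            LockHeldState::HeldByThread
        } else {
            LockHeldState::NotHeldByThread
        }
    }
}
```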
To support receiving MPP keysends, we will add a new non-backward
compatible field to `PendingHTLCRouting::ReceiveKeysend`, which will
break deserialization of `ChannelManager` on downgrades, so we let
the user choose whether they want to do this. Note that this commit
only adds the config flag, while full MPP support is added in later
commits.
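The shape of the knob, roughly (the field name here is an assumption):

```rust
struct UserConfig {
    /// If set, received keysend payments may use the new (non-backward
    /// compatible) field on `PendingHTLCRouting::ReceiveKeysend`, at the
    /// cost of making the serialized `ChannelManager` unreadable by
    /// versions of LDK prior to this change.
    accept_mpp_keysend: bool,
}
```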
At times, we've noticed that channels with `lnd` counterparties do not
receive the messages we expect in a timely manner (or at all) after we
send them a `ChannelReestablish` upon reconnection, or a
`CommitmentSigned` message. This can block the channel state machine
from making progress, eventually leading to force closes, if any pending
HTLCs are committed and their expiration is met.
It seems common wisdom for `lnd` node operators to periodically restart
their node/reconnect to their peers, allowing them to start from a fresh
state such that the message we expect to receive hopefully gets sent. We
can achieve the same end result by disconnecting peers ourselves
(regardless of whether they're a `lnd` node), which we opt to implement
here by awaiting their response within two timer ticks.
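A rough sketch of the tick-based tracking (names are assumptions):

```rust
const MAX_AWAITING_RESPONSE_TICKS: u8 = 2;

struct Peer {
    /// Set when we send a message which demands a response, e.g. a
    /// `ChannelReestablish` or `CommitmentSigned`.
    awaiting_response_ticks: Option<u8>,
}

impl Peer {
    fn sent_message_expecting_response(&mut self) {
        self.awaiting_response_ticks = Some(0);
    }
    fn received_expected_response(&mut self) {
        self.awaiting_response_ticks = None;
    }
    /// Called on each timer tick; returns true if the peer should be
    /// disconnected so both sides restart from a fresh state.
    fn on_timer_tick(&mut self) -> bool {
        if let Some(ticks) = self.awaiting_response_ticks.as_mut() {
            *ticks += 1;
            return *ticks >= MAX_AWAITING_RESPONSE_TICKS;
        }
        false
    }
}
```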
`enqueue_message` simply adds the message to the outbound queue; it
still needs to be written to the socket with `do_attempt_write_data`.
However, since we immediately return an error causing the socket to be
closed, the message never actually gets sent.
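The fix, sketched with assumed signatures: flush the queue before returning the error that closes the socket.

```rust
struct Peer { outbound_queue: Vec<Vec<u8>> }

impl Peer {
    /// Only queues the message; nothing hits the socket here.
    fn enqueue_message(&mut self, msg: Vec<u8>) {
        self.outbound_queue.push(msg);
    }
    /// Writes as much of `outbound_queue` to the socket as it can.
    fn do_attempt_write_data(&mut self) { /* ... */ }
}

fn send_error_and_disconnect(peer: &mut Peer, err_msg: Vec<u8>) -> Result<(), ()> {
    peer.enqueue_message(err_msg);
    // Without this call the error message is silently dropped when the
    // returned error causes the socket to be closed.
    peer.do_attempt_write_data();
    Err(())
}
```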
If there's a thread currently handling `PeerManager` events, the
next thread which attempts to handle events will block on the first
and then handle events after the first completes (later threads
will return immediately to avoid blocking more than one thread).
This works fine as long as the user has a spare thread to leave
blocked, but if they don't (e.g. are running with a single-threaded
tokio runtime) this can lead to a full deadlock.
Instead, here, we never block waiting on another event processing
thread; we return immediately after signaling that the first thread
should start over once it's complete, ensuring all events are
handled.
While this could lead to starvation, as we cause one thread to go
around again and again, the risk of that should be relatively low as
event handling should be quite quick, and it's certainly better than
deadlocking.
Fixes https://github.com/lightningdevkit/rapid-gossip-sync-server/issues/32
Atomic lock simplification suggestion from @andrei-21
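One way to structure the non-blocking handoff, as a sketch (a single atomic with three states; this is illustrative, not the exact implementation):

```rust
use std::sync::atomic::{AtomicI32, Ordering};

const IDLE: i32 = 0;
const RUNNING: i32 = 1;
const RERUN_REQUESTED: i32 = 2;

struct EventProcessor { state: AtomicI32 }

impl EventProcessor {
    fn process_events(&self, mut handle_all_events: impl FnMut()) {
        loop {
            match self.state.compare_exchange(IDLE, RUNNING, Ordering::AcqRel, Ordering::Acquire) {
                // We're now the (only) processing thread.
                Ok(_) => break,
                // Ask the running thread for another pass, then return
                // immediately rather than blocking on it.
                Err(RUNNING) => {
                    if self.state.compare_exchange(RUNNING, RERUN_REQUESTED,
                        Ordering::AcqRel, Ordering::Acquire).is_ok() { return; }
                    // The state changed under us; retry from the top.
                }
                // A rerun is already requested; our events will be covered.
                Err(_) => return,
            }
        }
        loop {
            handle_all_events();
            match self.state.compare_exchange(RUNNING, IDLE, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return,
                // Another thread requested a pass while we ran: go around.
                Err(_) => self.state.store(RUNNING, Ordering::Release),
            }
        }
    }
}
```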
`NetworkGraph`'s `PartialEq` impl before this commit was deadlock-prone.
Similarly to `ChannelMonitor`'s `PartialEq` impl, we use position in
memory to establish a total lockorder. This relies on the assumption
that the objects cannot move within memory while the inner locks are
held.
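A sketch of the address-ordered locking, on a simplified inner state:

```rust
use std::sync::Mutex;

struct Graph { inner: Mutex<Vec<u64>> }

impl PartialEq for Graph {
    fn eq(&self, other: &Self) -> bool {
        if std::ptr::eq(self, other) { return true; }
        // Order lock acquisition by address to fix a total lockorder. The
        // objects cannot move in memory while their locks are held, so the
        // order is stable for the duration of the comparison.
        let (first, second) = if (self as *const Self as usize) < (other as *const Self as usize) {
            (&self.inner, &other.inner)
        } else {
            (&other.inner, &self.inner)
        };
        let a = first.lock().unwrap();
        let b = second.lock().unwrap();
        *a == *b
    }
}
```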
When calculating the amount available to send for the next HTLC, if
we over-count we may create routes which are not actually usable.
Historically this has been an issue, which we resolve over a few
commits.
Here we consider the number of in-flight HTLCs which we are allowed
to push towards a counterparty at once, setting the available
balance to zero if we cannot push any further HTLCs.
We also add some testing when sending to ensure that send failures
are accounted for in our balance calculations.
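A sketch of the count-based clamp (field names are illustrative):

```rust
struct ChannelView {
    outbound_capacity_msat: u64,
    pending_outbound_htlcs: usize,
    /// The counterparty's `max_accepted_htlcs`, negotiated on channel open.
    counterparty_max_accepted_htlcs: usize,
}

impl ChannelView {
    /// The most we can offer in the next single HTLC.
    fn next_outbound_htlc_limit_msat(&self) -> u64 {
        // If we've already offered as many HTLCs as the counterparty will
        // accept, no balance is available for the next HTLC at all.
        if self.pending_outbound_htlcs >= self.counterparty_max_accepted_htlcs {
            return 0;
        }
        self.outbound_capacity_msat
    }
}
```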
When calculating the amount available to send for the next HTLC, if
we over-count we may create routes which are not actually usable.
Historically this has been an issue, which we resolve over a few
commits.
Here we include the cost of the commitment transaction fee in our
calculation, subtracting the commitment tx fee cost from the
available balance, as we do in `send_payment`.
We also add some testing when sending to ensure that send failures
are accounted for in our balance calculations.
This commit is based on original work by
Gleb Naumenko <naumenko.gs@gmail.com> and modified by
Matt Corallo <git@bluematt.me>.
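A sketch of the fee-aware calculation (the constants are the BOLT 3 commitment weights; the names are illustrative):

```rust
// BOLT 3: commitment transaction base weight and per-HTLC weight.
const COMMITMENT_TX_BASE_WEIGHT: u64 = 724;
const COMMITMENT_TX_WEIGHT_PER_HTLC: u64 = 172;

fn commit_tx_fee_msat(feerate_per_kw: u64, num_htlcs: u64) -> u64 {
    let weight = COMMITMENT_TX_BASE_WEIGHT + num_htlcs * COMMITMENT_TX_WEIGHT_PER_HTLC;
    // feerate is in satoshis per 1000 weight units; scale up to msat.
    feerate_per_kw * weight / 1000 * 1000
}

/// As the channel funder, the next HTLC we offer must leave room to pay
/// the commitment fee with that HTLC included, as `send_payment` checks.
fn next_htlc_available_msat(balance_msat: u64, feerate_per_kw: u64, pending_htlcs: u64) -> u64 {
    balance_msat.saturating_sub(commit_tx_fee_msat(feerate_per_kw, pending_htlcs + 1))
}
```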
In the coming commits we redo our next-HTLC-available logic which
requires some minor test changes for tests which relied on
calculating routes which were not usable.
Here we do a minor preparatory refactor to simplify a test so that
it no longer requires changes in the later commits.
While it's nice to be able to push an HTLC which spends balance that
has been removed in our local commitment transaction but is still
awaiting an RAA from our peer for final removal, it's by no means a
critical feature. Because peers should really be sending RAAs quickly
after we send a commitment, this should be an exceedingly rare case,
and we already don't expose this as available balance when routing,
so this balance wasn't even usable when sending, only when
forwarding.
Note that `test_pending_claimed_htlc_no_balance_underflow` is
removed as it tested a case which was only possible because of this
and now is no longer possible.
Also get rid of some trailing whitespace because my text editor likes to do
that.
We'll next add a new variant for max_htlc provided in route hints, which will
be treated differently in scoring.
See https://github.com/lightning/bolts/pull/803
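A sketch of what the variant might look like (the shape is an assumption):

```rust
enum EffectiveCapacity {
    /// Capacity learned from gossip for a public channel.
    Total { capacity_msat: u64 },
    /// The `htlc_maximum_msat` from a route hint. Unlike `Total`, this
    /// bounds a single HTLC rather than estimating channel liquidity, so
    /// the scorer treats it differently.
    HintMaxHTLC { amount_msat: u64 },
}
```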
This protects the justice claim of a counterparty revoked output.
Otherwise, if all the revoked output claims are batched in a single
transaction, low-feerate HTLC transactions can delay our honest
justice claim transaction until BREAKDOWN_TIMEOUT expires.
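Sketching the aggregation rule (names illustrative): the balance-output justice claim is kept out of packages containing revoked HTLC outputs.

```rust
#[derive(Clone, Copy, PartialEq)]
enum RevokedClaim { BalanceOutput, HtlcOutput }

/// Never batch the revoked balance-output claim with revoked HTLC claims,
/// whose confirmation the counterparty can stall with low-feerate HTLC
/// transactions.
fn can_aggregate(a: RevokedClaim, b: RevokedClaim) -> bool {
    a == b
}
```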
Previously, we would panic when failing to construct onion messages in
certain circumstances. Here we opt to always return an error instead,
rather than panicking if something goes wrong during OM packet
construction.
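The shape of the change, roughly (the error type and size check are illustrative):

```rust
#[derive(Debug)]
enum SendError { TooBigPacket }

/// Before: a construction failure here hit a `panic!`/`unwrap()`.
/// After: the failure is surfaced to the caller as an `Err`.
fn construct_onion_packet(payloads: Vec<Vec<u8>>) -> Result<Vec<u8>, SendError> {
    let total: usize = payloads.iter().map(|p| p.len()).sum();
    if total > 32_768 {
        return Err(SendError::TooBigPacket);
    }
    Ok(payloads.concat())
}
```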