The previous test vectors contained recurrences and older TLV types, and
therefore couldn't be parsed. Update the tests with the latest test
vectors from the spec and stop ignoring the tests.
These were added to help debug an encoding issue. However, the encoding
code was moved to the blinded_path module. Additionally, the test vector
used an old TLV encoding.
When we forward a payment and receive an `update_fulfill_htlc`
message from the downstream channel, we immediately claim the HTLC
on the upstream channel, before even doing a `commitment_signed`
dance on the downstream channel. This implies that our
`ChannelMonitorUpdate`s "go out" in the right order - first we
ensure we'll get our money by writing the preimage down, then we
write the update that releases the money to the downstream node.
This is safe as long as `ChannelMonitorUpdate`s complete in the
order in which they are generated, but of course looking forward we
want to support asynchronous updates, which may complete in any
order.
Here we add infrastructure to handle downstream
`ChannelMonitorUpdate`s which are blocked on an upstream
preimage-containing one. We don't yet actually do the blocking, which
will come in a future commit.
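To illustrate the shape of that infrastructure, here is a minimal,
self-contained sketch (all type and field names are hypothetical, not
LDK's actual API):

```rust
use std::collections::VecDeque;

// Toy stand-in for the real update type.
struct ChannelMonitorUpdate { update_id: u64 }

/// Per-(downstream-)channel bookkeeping: updates that must wait until
/// the upstream, preimage-containing update has completed.
#[derive(Default)]
struct DownstreamChannel {
    blocked_monitor_updates: VecDeque<ChannelMonitorUpdate>,
    /// In-flight upstream updates we are still waiting on.
    blocking_updates_pending: usize,
}

impl DownstreamChannel {
    /// Returns the update if it may be released to the user immediately,
    /// otherwise queues it behind the pending upstream update(s).
    fn push_update(&mut self, update: ChannelMonitorUpdate) -> Option<ChannelMonitorUpdate> {
        if self.blocking_updates_pending == 0 {
            Some(update)
        } else {
            self.blocked_monitor_updates.push_back(update);
            None
        }
    }

    /// Called when an upstream preimage-containing update completes.
    /// Returns the updates that are now safe to release, in order.
    fn upstream_completed(&mut self) -> Vec<ChannelMonitorUpdate> {
        self.blocking_updates_pending -= 1;
        if self.blocking_updates_pending == 0 {
            self.blocked_monitor_updates.drain(..).collect()
        } else {
            Vec::new()
        }
    }
}
```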
If a `ChannelMonitorUpdate` was created and given to the user but
left uncompleted when the `ChannelManager` is persisted prior to a
restart, the user likely lost the `ChannelMonitorUpdate`(s). Thus,
we need to replay them for the user, which we do here using the
new `BackgroundEvent::MonitorUpdateRegeneratedOnStartup` variant.
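Schematically, the replay amounts to something like the following
sketch, with toy stand-ins for the real types (the real variant carries
more context than shown here):

```rust
// Toy types standing in for the real ones.
struct OutPoint;
struct ChannelMonitorUpdate;

enum BackgroundEvent {
    /// A `ChannelMonitorUpdate` the user may never have completed before
    /// the `ChannelManager` was persisted, regenerated on startup.
    MonitorUpdateRegeneratedOnStartup { funding_txo: OutPoint, update: ChannelMonitorUpdate },
}

trait Watch {
    fn update_channel(&self, funding_txo: OutPoint, update: &ChannelMonitorUpdate);
}

/// On startup, hand every regenerated update back to the monitor layer
/// before resuming normal operation.
fn replay_background_events(watch: &impl Watch, events: Vec<BackgroundEvent>) {
    for event in events {
        match event {
            BackgroundEvent::MonitorUpdateRegeneratedOnStartup { funding_txo, update } => {
                watch.update_channel(funding_txo, &update);
            }
        }
    }
}
```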
When we generated a `ChannelMonitorUpdate` during `ChannelManager`
deserialization, we must ensure that it gets processed before any
other `ChannelMonitorUpdate`s. The obvious hook for this is when
taking the `total_consistency_lock`, which makes it unlikely we'll
regress by forgetting this.
Here we add that call in the `PersistenceNotifierGuard`, with a
test-only atomic bool to test that this criterion is met.
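In outline, the guard looks something like this sketch (names other
than `total_consistency_lock` are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, RwLock, RwLockReadGuard};

struct PersistenceNotifierGuard<'a> { _read_guard: RwLockReadGuard<'a, ()> }

struct Manager {
    total_consistency_lock: RwLock<()>,
    pending_background_events: Mutex<Vec<u64>>, // toy event type
    /// Test-only flag asserting the events were processed before any
    /// other operation could generate a `ChannelMonitorUpdate`.
    #[cfg(test)]
    background_events_processed_since_startup: AtomicBool,
}

impl Manager {
    /// Every state-changing operation funnels through here, making it
    /// hard to forget to process startup-regenerated events first.
    fn persistence_guard(&self) -> PersistenceNotifierGuard<'_> {
        self.process_background_events();
        PersistenceNotifierGuard { _read_guard: self.total_consistency_lock.read().unwrap() }
    }

    fn process_background_events(&self) {
        for _event in self.pending_background_events.lock().unwrap().drain(..) {
            // Replay the regenerated `ChannelMonitorUpdate` here.
        }
        #[cfg(test)]
        self.background_events_processed_since_startup.store(true, Ordering::Release);
    }
}
```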
`BackgroundEvent` was used to store `ChannelMonitorUpdate`s which
result in a channel force-close, avoiding relying on
`ChannelMonitor`s having been loaded while `ChannelManager`
block-connection methods are called during startup.
In the coming commit(s) we'll also generate non-channel-closing
`ChannelMonitorUpdate`s during startup, which will need to be
replayed prior to any other `ChannelMonitorUpdate`s generated from
normal operation.
In the next commit we'll handle that by handling `BackgroundEvent`s
immediately after locking the `total_consistency_lock`.
Rather than letting `AChannelManager`'s bounds implicitly require
that every type be `Sized`, we make them explicitly `?Sized`. We also
make the trait no longer test-only as it will be used in a coming
commit.
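The change is just the usual `?Sized` relaxation; roughly, in a reduced
toy form:

```rust
use std::ops::Deref;
use std::sync::Arc;

trait Logger { fn log(&self, msg: &str); }

/// Sketch: the `Deref` target is explicitly `?Sized`, so trait objects
/// like `Arc<dyn Logger>` can satisfy the bound.
trait AChannelManager {
    type LoggerTarget: Logger + ?Sized;
    type L: Deref<Target = Self::LoggerTarget>;
    fn logger(&self) -> &Self::L;
}

struct MyNode { logger: Arc<dyn Logger> }

impl AChannelManager for MyNode {
    type LoggerTarget = dyn Logger; // unsized: only allowed via `?Sized`
    type L = Arc<dyn Logger>;
    fn logger(&self) -> &Self::L { &self.logger }
}
```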
In the coming commits we'll need the counterparty node_id when
handling a background monitor update as we may need to resume
normal channel operation as a result. Thus, we go ahead and pipe it
through from the shutdown end, as it makes the codepaths
consistent.
Sadly, the monitor-originated shutdown case doesn't allow for a
required counterparty node_id as some versions of LDK didn't have
it present in the ChannelMonitor.
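Concretely, the two origins differ in how certain we can be about the
counterparty, as in this toy sketch (names illustrative):

```rust
// Toy stand-in for a compressed secp256k1 public key.
type PublicKey = [u8; 33];

/// Where a force-close-generating monitor update came from.
enum MonitorUpdateOrigin {
    /// Normal shutdown paths always know the counterparty.
    ChannelManager { counterparty_node_id: PublicKey },
    /// `ChannelMonitor`s written by old LDK versions may never have
    /// stored the counterparty's node_id at all.
    ChannelMonitor { counterparty_node_id: Option<PublicKey> },
}
```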
Our `no-std` locks simply panic if a lock cannot be taken as there
should be no lock contention in a single-threaded environment.
However, the `held_by_thread` debug methods were delegating to the
lock methods which resulted in a panic when asserting that a lock
*is* held by the current thread.
Instead, they are updated here to call the relevant `RefCell`
testing methods.
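The idea in miniature (LDK's real lock wrappers carry more machinery):

```rust
use core::cell::{RefCell, RefMut};

pub enum LockHeldState { HeldByThread, NotHeldByThread }

pub struct Mutex<T> { inner: RefCell<T> }

impl<T> Mutex<T> {
    /// Single-threaded environment: contention is a bug, so `borrow_mut`
    /// panicking on a held lock is the behavior we want here.
    pub fn lock(&self) -> RefMut<'_, T> {
        self.inner.borrow_mut()
    }

    /// Debug helper: must *not* delegate to `lock()`, which panics
    /// exactly when the lock is held. `try_borrow_mut` only observes
    /// the borrow state.
    pub fn held_by_thread(&self) -> LockHeldState {
        if self.inner.try_borrow_mut().is_err() {
            LockHeldState::HeldByThread
        } else {
            LockHeldState::NotHeldByThread
        }
    }
}
```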
At times, we've noticed that channels with `lnd` counterparties do not
receive the messages we expect in a timely manner (or at all) after we
send them a `ChannelReestablish` upon reconnection, or a
`CommitmentSigned` message. This can block the channel state machine
from making progress, eventually leading to force closes, if any pending
HTLCs are committed and their expiry is reached.
It seems common wisdom for `lnd` node operators to periodically restart
their node/reconnect to their peers, allowing them to start from a fresh
state such that the message we expect to receive hopefully gets sent. We
can achieve the same end result by disconnecting peers ourselves
(regardless of whether they're a `lnd` node), which we opt to implement
here by disconnecting any peer whose expected response doesn't arrive
within two timer ticks.
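Schematically, the per-peer watchdog could look like this sketch
(constant and field names are illustrative):

```rust
const MAX_AWAITING_RESPONSE_TICKS: u8 = 2;

// Toy per-peer state.
struct Peer {
    /// Set when we send a message that demands a reply (e.g.
    /// `channel_reestablish` or `commitment_signed`).
    awaiting_response: bool,
    /// Timer ticks elapsed since we started waiting.
    ticks_awaiting_response: u8,
}

impl Peer {
    /// Called once per timer tick; returns true if the peer should be
    /// disconnected so that reconnection can reset the state machine.
    fn on_timer_tick(&mut self) -> bool {
        if !self.awaiting_response { return false; }
        self.ticks_awaiting_response += 1;
        self.ticks_awaiting_response >= MAX_AWAITING_RESPONSE_TICKS
    }

    fn on_expected_message_received(&mut self) {
        self.awaiting_response = false;
        self.ticks_awaiting_response = 0;
    }
}
```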
`enqueue_message` simply adds the message to the outbound queue; it
still needs to be written to the socket with `do_attempt_write_data`.
However, since we immediately return an error causing the socket to be
closed, the message never actually gets sent.
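A toy sketch of the bug and its fix (the real `PeerManager` plumbing is
more involved):

```rust
use std::collections::VecDeque;

struct Peer { outbound_queue: VecDeque<Vec<u8>> }

trait SocketDescriptor {
    /// Attempts to write `data`, returning how many bytes were accepted.
    fn send_data(&mut self, data: &[u8]) -> usize;
}

struct PeerHandleError;

fn enqueue_message(peer: &mut Peer, msg: &[u8]) {
    peer.outbound_queue.push_back(msg.to_vec());
}

fn do_attempt_write_data(descriptor: &mut impl SocketDescriptor, peer: &mut Peer) {
    while let Some(msg) = peer.outbound_queue.front() {
        let written = descriptor.send_data(msg);
        if written < msg.len() { break; } // socket buffer full; retry later
        peer.outbound_queue.pop_front();
    }
}

fn send_error_and_disconnect(
    peer: &mut Peer, descriptor: &mut impl SocketDescriptor, err_msg: &[u8],
) -> Result<(), PeerHandleError> {
    // The bug: enqueueing alone never hits the wire if we close the
    // socket right after. Flush before returning the fatal error.
    enqueue_message(peer, err_msg);
    do_attempt_write_data(descriptor, peer);
    Err(PeerHandleError)
}
```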
If there's a thread currently handling `PeerManager` events, the
next thread which attempts to handle events will block on the first
and then handle events after the first completes. (Later threads
return immediately to avoid blocking more than one thread.)
This works fine as long as the user has a spare thread to leave
blocked, but if they don't (e.g. are running with a single-threaded
tokio runtime) this can lead to a full deadlock.
Instead, here, we never block waiting on another event processing
thread, returning immediately after signaling that the first thread
should start over once it's complete to ensure all events are
handled.
While this could lead to starvation as we cause one thread to go
around and around and around again, the risk of that should be
relatively low as event handling should be pretty quick, and it's
certainly better than deadlocking.
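The scheme reduces to a small atomic counter; a sketch with
illustrative names:

```rust
use std::sync::atomic::{AtomicI32, Ordering};

pub struct EventProcessingState { state: AtomicI32 }

impl EventProcessingState {
    pub fn new() -> Self { Self { state: AtomicI32::new(0) } }

    /// Never blocks: either we become the handling thread, or we bump
    /// the counter so the current handler goes around again.
    pub fn process_events(&self, mut handle_all_events: impl FnMut()) {
        if self.state.fetch_add(1, Ordering::AcqRel) > 0 {
            return; // someone else is handling; our increment flags a re-run
        }
        loop {
            handle_all_events();
            if self.state.fetch_sub(1, Ordering::AcqRel) != 1 {
                // More requests arrived while we ran; collapse them into
                // one extra pass rather than one pass per request.
                self.state.store(1, Ordering::Release);
                continue;
            }
            break;
        }
    }
}
```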
Fixes https://github.com/lightningdevkit/rapid-gossip-sync-server/issues/32
Atomic lock simplification suggestion from @andrei-21
`NetworkGraph`'s `PartialEq` impl before this commit was deadlock-prone.
Similar to `ChannelMonitor`'s `PartialEq` impl, we use position in
memory for a total lockorder. This relies on the assumption that the
objects cannot move within memory while the inner locks are held.
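As a sketch of the pattern, with toy inner state standing in for the
real channel and node maps:

```rust
use std::sync::RwLock;

pub struct NetworkGraph { channels: RwLock<Vec<u64>> }

impl PartialEq for NetworkGraph {
    fn eq(&self, other: &Self) -> bool {
        // Comparing an object with itself: taking the same lock twice
        // could deadlock, so short-circuit.
        if std::ptr::eq(self, other) { return true; }
        // Order the two acquisitions by memory address so concurrent
        // `a == b` and `b == a` agree on a total lockorder. This assumes
        // neither object moves while the locks are held.
        let (first, second) =
            if (self as *const Self as usize) < (other as *const Self as usize) {
                (self, other)
            } else {
                (other, self)
            };
        let g1 = first.channels.read().unwrap();
        let g2 = second.channels.read().unwrap();
        *g1 == *g2 // equality is symmetric, so acquisition order is fine
    }
}
```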
See https://github.com/lightning/bolts/pull/803
This protects the justice claims of counterparty revoked outputs.
Otherwise, if all the revoked output claims are batched in a single
transaction, low-feerate HTLC transactions can delay our honest
justice claim transaction until BREAKDOWN_TIMEOUT expires.
Previously, we would panic when failing to construct onion messages in
certain circumstances. Here we opt to always return an error rather
than panicking if something goes wrong during OM packet construction.
Rather than using the std benchmark framework (which is unmaintained
and unlikely to see further development), we swap for criterion, which
at least gets us a variable number of test runs so our benchmarks
don't take forever.
We also fix the RGS benchmark to pass now that the data file in use is
stale compared to today's date.
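For reference, a minimal criterion harness is roughly of this shape
(benchmark name and body are placeholders):

```rust
// Cargo.toml: criterion as a dev-dependency, plus
// [[bench]] name = "bench" harness = false
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_find_route(c: &mut Criterion) {
    c.bench_function("find_route_100k_sats", |b| {
        b.iter(|| {
            // The route-finding call under test goes here.
        })
    });
}

criterion_group!(benches, bench_find_route);
criterion_main!(benches);
```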
When benchmarking our router, we previously only ever tested with
amounts under 1,000 sats, which is an incredibly small amount.
While this ensures we have the maximal number of available channels
to consider, it prevents our scorer from being exercised across its
range. Further, we only score the immediate path we are expecting
to send over, and not randomly but rather based on the amount sent.
Here we try to make the benchmarks a bit more realistic by adding
a new benchmark which attempts to send around 100K sats, which is
a reasonable amount to send over a channel today. We also convert
the scoring data to be randomized based on the seed as well as
attempt to (possibly) find a new route for a much larger value and
score based on that. This potentially allows us to score multiple
potential paths between the source and destination as the large
route-find may return an MPP result.
There are a few route tests which do the same thing as the benchmarks,
as they're also a good test. However, they didn't share code, which
is somewhat wasteful, so we fix that here.