Commit graph

15 commits

Matt Corallo
3acf7e2c9d Drop the dummy no-std Condvar which never sleeps
In `no-std`, we exposed `wait` functions which rely on a dummy
`Condvar` which never actually sleeps. This is somewhat nonsensical,
not to mention confusing to users. Instead, we simply remove the
`wait` methods in `no-std` builds.
2023-04-03 16:49:54 +00:00
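
A minimal sketch of the resulting shape, using a hypothetical `Notifier` type rather than LDK's actual wait machinery (and using `std` types throughout purely for brevity): the blocking `wait` simply does not exist unless `std` is enabled.

```rust
use std::sync::{Condvar, Mutex};

// Hypothetical stand-in for the real wait machinery.
pub struct Notifier {
    pending: Mutex<bool>,
    condvar: Condvar,
}

impl Notifier {
    pub fn new() -> Self {
        Notifier { pending: Mutex::new(false), condvar: Condvar::new() }
    }

    /// Wake any thread blocked in `wait`.
    pub fn notify(&self) {
        *self.pending.lock().unwrap() = true;
        self.condvar.notify_all();
    }

    /// Block until `notify` is called. Gated on `std` so `no-std` builds
    /// have no `wait` method at all, rather than one backed by a dummy
    /// `Condvar` that returns immediately.
    #[cfg(feature = "std")]
    pub fn wait(&self) {
        let mut pending = self.pending.lock().unwrap();
        while !*pending {
            pending = self.condvar.wait(pending).unwrap();
        }
        *pending = false;
    }
}
```
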
Matt Corallo
a1b5a1bba3 Add CondVar::wait_{timeout_,}while to debug_sync
These are useful, but we previously couldn't use them due to our
MSRV. Now that we can, we should use them, so we expose them via
our normal debug_sync wrappers.
2023-04-03 16:49:54 +00:00
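
For reference, a pass-through wrapper in the spirit of the above might look like the sketch below; the real `debug_sync` wrappers also do lockorder bookkeeping, which is omitted here, and the struct layout is an assumption.

```rust
use std::sync::{Condvar as StdCondvar, MutexGuard, WaitTimeoutResult};
use std::time::Duration;

pub struct Condvar {
    inner: StdCondvar,
}

impl Condvar {
    pub fn new() -> Self {
        Condvar { inner: StdCondvar::new() }
    }

    /// Forwards to `std`'s `Condvar::wait_while` (stable since Rust 1.42).
    pub fn wait_while<'a, T, F: FnMut(&mut T) -> bool>(
        &self, guard: MutexGuard<'a, T>, condition: F,
    ) -> MutexGuard<'a, T> {
        self.inner.wait_while(guard, condition).unwrap()
    }

    /// Forwards to `std`'s `Condvar::wait_timeout_while`.
    pub fn wait_timeout_while<'a, T, F: FnMut(&mut T) -> bool>(
        &self, guard: MutexGuard<'a, T>, dur: Duration, condition: F,
    ) -> (MutexGuard<'a, T>, WaitTimeoutResult) {
        self.inner.wait_timeout_while(guard, dur, condition).unwrap()
    }
}
```
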
Wilmer Paulino
2166c8a2c4
Ignore lockorder violation on same callsite with different construction
As long as the lock order on such locks is still valid, we should allow
them regardless of whether they were constructed at the same location or
not. Note that we can only really enforce this if we require one lock
call per line, or if we have access to symbol columns (as we do on Linux
and macOS). We opt for a smaller patch by relying on the latter.

This was previously triggered by some recent test changes to
`test_manager_serialize_deserialize_inconsistent_monitor`. When the
test ends and the nodes are dropped, causing us to persist each of
them, we'd detect
a possible lockorder violation deadlock across three different `Mutex`
instances that are held at the same location when serializing our
`per_peer_states` in `ChannelManager::write`.

The presumed lockorder violation happens because the first `Mutex` held
shares the same construction location with the third one, while the
second `Mutex` has a different construction location. When we hold the
second one, we consider the first as a dependency, and then consider the
second as a dependency when holding the third, causing a circular
dependency (since the third shares the same construction location as the
first).

As the inline comment notes, though, this isn't a lockorder violation
that could actually result in a deadlock, since we are under a
dependent write lock which no one else can have access to.
2023-03-28 17:27:47 -07:00
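
A hedged illustration of why column information helps (this uses `std::panic::Location` for simplicity; the actual implementation resolves backtrace symbols): keying a lock's identity by file, line and column keeps two locks constructed on one line distinct.

```rust
use std::panic::Location;
use std::sync::Mutex;

/// Identity of a lock's construction site, including the column so two
/// constructions on the same line are still told apart.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ConstructionSite {
    file: &'static str,
    line: u32,
    column: u32,
}

pub struct TrackedMutex<T> {
    site: ConstructionSite,
    inner: Mutex<T>,
}

impl<T> TrackedMutex<T> {
    #[track_caller]
    pub fn new(value: T) -> Self {
        let loc = Location::caller();
        TrackedMutex {
            site: ConstructionSite { file: loc.file(), line: loc.line(), column: loc.column() },
            inner: Mutex::new(value),
        }
    }

    /// Where this lock was constructed; a lockorder checker would key its
    /// dependency graph on this rather than on the line number alone.
    pub fn construction_site(&self) -> ConstructionSite {
        self.site
    }

    pub fn lock(&self) -> std::sync::MutexGuard<'_, T> {
        self.inner.lock().unwrap()
    }
}

// Two locks built on one line get different `column` values:
// let (a, b) = (TrackedMutex::new(0u8), TrackedMutex::new(0u8));
```
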
Matt Corallo
fac5373687
Merge pull request #2068 from jkczyz/2023-03-doc-fixes
Doc and build warning fixes
2023-03-03 22:19:59 +00:00
Jeffrey Czyz
1d1323a3d0
Fix build warnings
2023-03-03 14:23:18 -06:00
Matt Corallo
0ad1f4c943 Track claimed outbound HTLCs in ChannelMonitors
When we receive an update_fulfill_htlc message, we immediately try
to "claim" the HTLC against the HTLCSource. If there is one, this
works great, we immediately generate a `ChannelMonitorUpdate` for
the corresponding inbound HTLC and persist that before we ever get
to processing our counterparty's `commitment_signed` and persisting
the corresponding `ChannelMonitorUpdate`.

However, if there isn't one (and this is the first successful HTLC
for a payment we sent), we immediately generate a `PaymentSent`
event and queue it up for the user. Then, a millisecond later, we
receive the `commitment_signed` from our peer, removing the HTLC
from the latest local commitment transaction as a side-effect of
the `ChannelMonitorUpdate` applied.

If the user has processed the `PaymentSent` event by that point,
great, we're done. However, if they have not, and we crash prior to
persisting the `ChannelManager`, on startup we get confused about
the state of the payment. We'll force-close the channel for being
stale, and see an HTLC which was removed and is no longer present
in the latest commitment transaction (which we're broadcasting).
Because we claim corresponding inbound HTLCs before updating a
`ChannelMonitor`, we assume such HTLCs have failed - attempting to
fail after having claimed should be a noop. However, in the
sent-payment case we now generate a `PaymentFailed` event for the
user, allowing an HTLC to complete without giving the user a
preimage.

Here we address this issue by storing the payment preimages for
claimed outbound HTLCs in the `ChannelMonitor`, in addition to the
existing inbound HTLC preimages already stored there. This allows
us to fix the specific issue described by checking for a preimage
and switching the type of event generated in response. In addition,
it reduces the risk of future confusion by ensuring we don't fail
HTLCs which were claimed but not fully committed to before a crash.

It does not, however, fully fix the issue here: because the
preimages are removed after the HTLC has been fully removed from
available commitment transactions, we may still hit this issue if we
are substantially delayed in persisting the `ChannelManager` from the
time we receive the `update_fulfill_htlc` until after a full
commitment-signed dance completes. The full fix for this issue
is to delay the persistence of the `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed. This avoids the
issue entirely, ensuring we process the event before updating the
`ChannelMonitor`, the same as we ensure the upstream HTLC has been
claimed before updating the `ChannelMonitor` for forwarded
payments.

The full solution will be implemented in later work; however, this
change still makes sense at that point as well - if we were to
delay the initial `commitment_signed` `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed (which likely
requires a database update on the users' end), we'd hold our
`commitment_signed` + `revoke_and_ack` response for two DB writes
(i.e. `fsync()` calls), making our commitment transaction
processing a full `fsync` slower. By making this change first, we
can instead delay the `ChannelMonitorUpdate` from the
counterparty's final `revoke_and_ack` message until the event has
been processed, giving us a full network roundtrip to do so and
avoiding delaying our response as long as an `fsync` is faster than
a network roundtrip.
2023-03-03 17:19:03 +00:00
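
A simplified sketch of the resulting startup check (types and names here are stand-ins, not the actual `ChannelMonitor` API): an HTLC missing from the latest commitment transaction is only treated as failed if no preimage was stored for it.

```rust
use std::collections::HashMap;

type HtlcId = u64;
type PaymentPreimage = [u8; 32];

pub enum ResolutionEvent {
    Sent { preimage: PaymentPreimage },
    Failed,
}

/// Decide how to surface an outbound HTLC that is no longer present in the
/// latest commitment transaction.
pub fn resolve_missing_htlc(
    claimed_outbound_preimages: &HashMap<HtlcId, PaymentPreimage>,
    htlc_id: HtlcId,
) -> ResolutionEvent {
    match claimed_outbound_preimages.get(&htlc_id) {
        // The monitor recorded a preimage for this outbound HTLC, so the
        // payment actually succeeded even though the HTLC is gone.
        Some(preimage) => ResolutionEvent::Sent { preimage: *preimage },
        // No preimage stored: the HTLC really was failed back.
        None => ResolutionEvent::Failed,
    }
}
```
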
Matt Corallo
065dc6e689 Make sure individual mutexes are constructed on different lines
Our lockdep logic (on Windows) identifies a mutex based on which
line it was constructed on. Thus, if we have two mutexes
constructed on the same line it will generate false positives.
2023-02-28 01:06:35 +00:00
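
A hedged illustration of the workaround (the function below is purely for demonstration, not code from the tree): splitting the constructions across lines gives each mutex a distinct identity to the line-based tracking.

```rust
use std::sync::Mutex;

pub fn build_state() -> (Mutex<u64>, Mutex<u64>) {
    // Problematic for line-based lockdep tracking: both mutexes would be
    // identified by the same construction line.
    // let state = (Mutex::new(0), Mutex::new(0));

    // Preferred: one construction per line, so each gets its own identity.
    let first = Mutex::new(0);
    let second = Mutex::new(0);
    (first, second)
}
```
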
Matt Corallo
f082ad40b5 Disallow taking two instances of the same mutex at the same time
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus,
it's generally unsafe to do and we should avoid it where we can.

To discourage doing this, here we default to panicking on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.

This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
2023-02-28 01:06:35 +00:00
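
A sketch of the API shape only (the wrapper type is hypothetical and the actual check is elided): the normal `lock` is where the lockorder tests would panic on a second instance of the same mutex, while a loudly-named variant opts out for call sites that can guarantee a total order themselves.

```rust
use std::sync::{Mutex, MutexGuard};

pub struct CheckedMutex<T> {
    inner: Mutex<T>,
}

impl<T> CheckedMutex<T> {
    pub fn new(value: T) -> Self {
        CheckedMutex { inner: Mutex::new(value) }
    }

    /// Normal path. In lockorder-test builds this is where taking a second
    /// instance of the same mutex on one thread would panic.
    pub fn lock(&self) -> MutexGuard<'_, T> {
        self.inner.lock().unwrap()
    }

    /// Escape hatch for callers that guarantee a total lockorder across
    /// instances themselves (e.g. always locking in a fixed key order),
    /// named loudly so its use stands out in review.
    pub fn unsafe_well_ordered_double_lock_self(&self) -> MutexGuard<'_, T> {
        self.inner.lock().unwrap()
    }
}
```
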
Matt Corallo
9c08fbd435 Refuse recursive read locks in lockorder testing
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.

Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.

Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
2023-02-28 01:06:35 +00:00
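
A short sketch of the problematic interleaving (not code anyone should run): on a writer-priority `RwLock`, or our `FairRwLock`, the second `read` below can queue behind another thread's pending `write`, which is itself blocked on the first `read`, so neither thread ever makes progress.

```rust
use std::sync::RwLock;

pub fn recursive_read(lock: &RwLock<u32>) {
    let first = lock.read().unwrap();
    // If another thread requests a write lock at this point, a fair or
    // writer-priority implementation queues the next read behind it...
    let second = lock.read().unwrap();
    // ...so this thread waits on the writer while the writer waits on
    // `first`: a deadlock.
    let _ = (*first, *second);
}
```
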
Matt Corallo
6090d9e6a8 Test if a given mutex is locked by the current thread in tests
In anticipation of the next commit(s) adding threaded tests, we
need to ensure our lockorder checks work fine with multiple
threads. Sadly, currently we have tests in the form
`assert!(mutex.try_lock().is_ok())` to assert that a given mutex is
not locked by the caller of a function. With other threads running,
however, `try_lock` can fail simply because some other thread holds
the lock, telling us nothing about the current thread.

The fix is rather simple given we already track mutexes locked by a
thread in our `debug_sync` logic - simply replace the check with a
new extension trait which (for test builds) checks the locked state
by only looking at what was locked by the current thread.
2023-02-16 21:35:23 +00:00
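
A rough shape of the idea (the trait and method names below are illustrative guesses, and the body is a stand-in): an extension trait whose test implementation consults the per-thread set of held locks, so asserting "not held by the current thread" no longer misfires when some other thread holds the lock.

```rust
use std::sync::Mutex;

pub trait LockTestExt {
    /// Whether the *current thread* holds this lock.
    fn held_by_thread(&self) -> bool;
}

impl<T> LockTestExt for Mutex<T> {
    fn held_by_thread(&self) -> bool {
        // Stand-in: the real debug_sync version would look this lock up in
        // its thread-local tracking state rather than probing `try_lock`,
        // which fails whenever *any* thread holds the lock.
        false
    }
}

// Test assertions then become, roughly:
//     assert!(!mutex.held_by_thread());
// instead of:
//     assert!(mutex.try_lock().is_ok());
```
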
Matt Corallo
9422370dd2 Move fairrwlock to the sync module
2023-02-14 23:33:02 +00:00
Matt Corallo
ab46f6b988 Make debug_sync regex more robust
On Windows the symbol names sometimes appear to be truncated,
which causes the symbol name to not include the `::new` at the end.
This causes the regex to mis-match and track the wrong location
for the mutex construction, leading to bogus lockorder violations.

For example, in testing the following symbol name appeared on
Windows, without the function name itself:

`lightning::debug_sync::RwLock<std::collections::hash::map::HashMap<lightning::chain::transaction::OutPoint,lightning::chain::chainmonitor::MonitorHolder<lightning::util::enforcing_trait_impls::EnforcingSigner>,std::collections::hash::map::RandomState> >::`
2023-01-10 06:48:04 +00:00
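
A toy demonstration of the matching problem using the `regex` crate (the patterns below are made up, not the ones in `debug_sync`): a pattern that insists on a trailing `::new` fails on a truncated symbol, while one that makes the suffix optional still matches.

```rust
use regex::Regex;

fn main() {
    // Strict pattern: requires the symbol to end in `::new`.
    let strict = Regex::new(r"^lightning::debug_sync::.*::new$").unwrap();
    // More robust pattern: the `::new` suffix is optional.
    let robust = Regex::new(r"^lightning::debug_sync::.*?(::new)?$").unwrap();

    // Symbol as it can appear on Windows, truncated before the function name.
    let truncated = "lightning::debug_sync::RwLock<lightning::chain::transaction::OutPoint>::";

    assert!(!strict.is_match(truncated));
    assert!(robust.is_match(truncated));
    println!("the relaxed pattern still matches the truncated symbol");
}
```
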
Matt Corallo
230331f3e8 Move tests from debug_sync to a new submodule
This will allow us to change the module regex match in debug_sync
to make it more robust.
2023-01-10 06:48:04 +00:00
Matt Corallo
558bfa3fb3 Move debug_sync to the new sync folder
2023-01-10 06:31:13 +00:00
Matt Corallo
f66f720fa5 Move no-std sync implementations to a folder to clean up
2023-01-10 06:26:46 +00:00