Our lockdep logic (on Windows) identifies a mutex based on which
line it was constructed on. Thus, if we have two mutexes
constructed on the same line it will generate false positives.
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus,
it's generally unsafe and should be avoided where we can.
To discourage doing this, here we default to panicking on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.
This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
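
As an illustration of what "guaranteeing a total lockorder" can mean when two
instances of the same mutex must be held at once, here is a minimal sketch
that always locks in address order. The helper name `lock_both_ordered` is
hypothetical and is not the "unsafe"-labeled lock function this change adds.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical helper: locks two distinct mutexes of the same type in a
/// fixed global order (lowest address first), so two threads locking the
/// same pair in opposite argument order cannot deadlock on each other.
/// Guards are returned in argument order.
fn lock_both_ordered<'a, T>(
    a: &'a Mutex<T>,
    b: &'a Mutex<T>,
) -> (MutexGuard<'a, T>, MutexGuard<'a, T>) {
    let a_addr = a as *const Mutex<T> as usize;
    let b_addr = b as *const Mutex<T> as usize;
    // Locking the very same std Mutex twice on one thread must be avoided.
    assert_ne!(a_addr, b_addr, "cannot lock the same mutex twice");
    if a_addr < b_addr {
        let guard_a = a.lock().unwrap();
        let guard_b = b.lock().unwrap();
        (guard_a, guard_b)
    } else {
        let guard_b = b.lock().unwrap();
        let guard_a = a.lock().unwrap();
        (guard_a, guard_b)
    }
}
```

A comparison such as a `PartialEq` implementation over two objects that each
own a mutex is the kind of site where an ordering like this matters, since
`a == b` and `b == a` may run concurrently on different threads.
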
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
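
A rough sketch of the resulting rule, using a hypothetical per-thread tracker
rather than the actual lockorder test code: a lock already held by the current
thread is reported as a violation whether it was taken for reading or writing.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

thread_local! {
    // Hypothetical bookkeeping: identifiers of locks held by this thread.
    static HELD_LOCKS: RefCell<HashSet<usize>> = RefCell::new(HashSet::new());
}

/// Records that the current thread is taking `lock_id` (read or write).
/// Returns `Err(lock_id)` if this thread already holds it, with no special
/// case for recursive read locks.
fn note_lock_taken(lock_id: usize) -> Result<(), usize> {
    HELD_LOCKS.with(|held| {
        if held.borrow_mut().insert(lock_id) {
            Ok(())
        } else {
            Err(lock_id)
        }
    })
}

/// Records that the current thread released `lock_id`.
fn note_lock_released(lock_id: usize) {
    HELD_LOCKS.with(|held| {
        held.borrow_mut().remove(&lock_id);
    });
}
```
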
When handling a `ChannelMonitor` update via the new
`handle_new_monitor_update` macro, we always call the macro with
the `per_peer_state` read lock held and have the macro drop the
per-peer state lock. Then, when handling the resulting updates, we
may take the `per_peer_state` read lock again in another function.
In a coming commit, recursive read locks will be disallowed, so we
have to drop the `per_peer_state` read lock before calling
additional functions in `handle_new_monitor_update`, which we do
here.
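
The pattern can be sketched as follows. The `Manager` type, its field layout
and both method names are made up for illustration; only the idea of dropping
the read guard before calling code that takes the same lock comes from the
description above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Toy stand-in for a structure guarding per-peer state with an RwLock.
struct Manager {
    per_peer_state: RwLock<HashMap<u32, u64>>,
}

impl Manager {
    /// Takes the read lock itself, so callers must not already hold it.
    fn handle_completed_update(&self, peer: u32) -> Option<u64> {
        let peers = self.per_peer_state.read().unwrap();
        peers.get(&peer).copied()
    }

    /// Inspects state under the read lock, then releases it explicitly
    /// before calling a function that takes the same read lock again.
    fn handle_new_update(&self, peer: u32) -> Option<u64> {
        let peers = self.per_peer_state.read().unwrap();
        let known = peers.contains_key(&peer);
        // Without this drop, the call below would be a recursive read lock.
        drop(peers);
        if known {
            self.handle_completed_update(peer)
        } else {
            None
        }
    }
}
```
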
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.
Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.
Sadly, a user ended up hitting this deadlock in production in the
form of a call to `get_and_clear_pending_msg_events` which holds
the `ChannelManager::total_consistency_lock` before calling
`process_pending_monitor_events` and eventually
`channel_monitor_updated`, which tries to take the same read lock
again.
Luckily, the fix is trivial: simply remove the redundant read lock
in `channel_monitor_updated`.
Fixes #2000
We previously avoided holding the `total_consistency_lock` while
doing crypto operations to build onions. However, now that we've
abstracted out the outbound payment logic into a utility module,
ensuring the state is consistent at all times is now abstracted
away from code authors and reviewers, making it likely to break.
Further, because we now call `send_payment_along_path` both with,
and without, the `total_consistency_lock`, and because recursive
read locks may deadlock, it would now be quite difficult to figure
out which paths through `outbound_payment` need the lock and which
don't.
While it may slow writes somewhat, it's not really worth trying to
figure out this mess. Instead, we just take the
`total_consistency_lock` before going into `outbound_payment`
functions.
fbc08477e8 purported to "move" the
`final_cltv_expiry_delta` field to `PaymentParameters` from
`RouteParameters`. However, for naive backwards-compatibility
reasons it left the existing one in place and only added a new,
redundant field in `PaymentParameters`.
It turns out there's really no reason for this - if we take a more
critical eye towards backwards compatibility we can figure out the
correct value in every `PaymentParameters` while deserializing.
We do this here - making `PaymentParameters` a `ReadableArgs`
taking a "default" `cltv_expiry_delta` when it goes to read. This
allows existing `RouteParameters` objects to pass the read
`final_cltv_expiry_delta` field in to be used if the new field
wasn't present.
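
The shape of this fallback can be sketched with toy types. The trait
`ReadWithArgs`, the structs and the byte layout below are all hypothetical
simplifications, not the real `ReadableArgs` trait or the TLV encoding.

```rust
/// Hypothetical stand-in for a "read with extra arguments" trait.
trait ReadWithArgs<A>: Sized {
    fn read(data: &[u8], args: A) -> Option<Self>;
}

struct PaymentParams {
    final_cltv_expiry_delta: u32,
}

impl ReadWithArgs<u32> for PaymentParams {
    /// Toy encoding: either empty (new field absent, use the caller-supplied
    /// default) or exactly four big-endian bytes holding the delta.
    fn read(data: &[u8], default_delta: u32) -> Option<Self> {
        let final_cltv_expiry_delta = match data.len() {
            0 => default_delta,
            4 => u32::from_be_bytes([data[0], data[1], data[2], data[3]]),
            _ => return None,
        };
        Some(PaymentParams { final_cltv_expiry_delta })
    }
}

struct RouteParams {
    payment_params: PaymentParams,
}

/// Toy encoding: four bytes of legacy delta, then the inner payment params.
/// The legacy value is passed down as the default for the inner read.
fn read_route_params(data: &[u8]) -> Option<RouteParams> {
    if data.len() < 4 {
        return None;
    }
    let legacy_delta = u32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let payment_params = PaymentParams::read(&data[4..], legacy_delta)?;
    Some(RouteParams { payment_params })
}
```
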
This adds `required` support for trait-wrapped reading (e.g. for
objects read via `ReadableArgs`) as well as support for the
trait-wrapped reading syntax across the TLV struct/enum
serialization macros.
When we read a `Route` (or a list of `RouteHop`s), we should never
have zero paths or zero `RouteHop`s in a path. As such, it's fine to
simply reject these at deserialization-time. Technically this means
that something we can generate may fail to round-trip through
serialization, but that seems okay here.
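
A minimal sketch of such a check, using toy types in which a hop is just a
`u64`. The `Route` and `DecodeError` definitions here are simplified
stand-ins for illustration, not the real ones.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidValue,
}

#[derive(Debug, PartialEq)]
struct Route {
    // Each path is a list of hops; a hop is reduced to a u64 in this sketch.
    paths: Vec<Vec<u64>>,
}

/// Rejects a decoded route that has no paths, or any path with no hops.
fn validate_route(paths: Vec<Vec<u64>>) -> Result<Route, DecodeError> {
    if paths.is_empty() || paths.iter().any(|path| path.is_empty()) {
        return Err(DecodeError::InvalidValue);
    }
    Ok(Route { paths })
}
```
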
Because `lightning-transaction-sync` does not have an MSRV (and
because its dev-dependencies are huge), we can't build it by
default when devs run `cargo test`, so it is moved out of the
top-level workspace.
When using lower-level macros such as read_tlv_stream, upgradable_required
fields have been treated as regular options. This is incorrect: they should
either be upgradable_options or be treated as required fields.
For some reason rustc, at some point, decided that our optimization
of dependencies implies we want to also build with LTO. This causes
our test builds to take substantially longer to compile, even with
only a trivial change. By hard-disabling this (even keeping the
optimization of the test and in-tree libraries enabled) the time
required to build with only a trivial change to
`functional_tests.rs` goes from 0m25.635s wall clock/1m14.220s CPU
time to 0m17.841s wall clock/0m17.828s CPU time on my i7-13700K on
rustc 1.63.
The changes in test execution time appear to be within noise.
While we're at it, we also bump dependencies to build with -O2
because their build-time cost is now substantially reduced.
This field was previously useful in manual retries for users to know when all
paths of a payment have failed and it is safe to retry. Now that we support
automatic retries in ChannelManager and no longer support manual retries, the
field is no longer useful.
For backwards compat, we now always write false for this field. If we didn't do
this, previous versions would default this field's value to true, which can be
problematic because some clients have relied on the field to indicate when a
full payment retry is safe.
An overflow can occur when multiplying the offer amount by the requested
quantity when no amount is given in the request. Return an error instead
of overflowing.
An overflow can occur when multiplying the offer amount by the requested
quantity when checking if the given amount is enough. Return an error
instead of overflowing.
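
Both cases come down to replacing a plain multiplication with a checked one.
A minimal sketch, assuming amounts are `u64` millisatoshi values; the
function names and the `AmountError` type are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum AmountError {
    Overflow,
    Insufficient,
}

/// Amount implied by the offer when the request gives none: offer amount
/// times quantity, returning an error rather than wrapping on overflow.
fn expected_amount_msats(offer_amount_msats: u64, quantity: u64) -> Result<u64, AmountError> {
    offer_amount_msats
        .checked_mul(quantity)
        .ok_or(AmountError::Overflow)
}

/// Checks that an explicitly given amount covers offer amount times quantity.
fn check_amount(
    offer_amount_msats: u64,
    quantity: u64,
    given_msats: u64,
) -> Result<(), AmountError> {
    let expected = expected_amount_msats(offer_amount_msats, quantity)?;
    if given_msats < expected {
        Err(AmountError::Insufficient)
    } else {
        Ok(())
    }
}
```
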
Fuzz testing bech32 decoding along with deserializing the underlying
message can result in overly exhaustive searches. Instead, the message
deserializations are now fuzzed separately. Add fuzzing for bech32
decoding.
In order to fuzz test Bech32Encode parsing independent of the underlying
message deserialization, the trait needs to be exposed. Conditionally
expose it only for fuzzing.
An invoice is serialized as a TLV stream and encoded as bytes. Add a
fuzz test that parses the TLV stream and deserializes the underlying
Invoice. Then compare the original bytes with those obtained by
re-serializing the Invoice.
An invoice request is serialized as a TLV stream and encoded as bytes.
Add a fuzz test that parses the TLV stream and deserializes the
underlying InvoiceRequest. Then compare the original bytes with those
obtained by re-serializing the InvoiceRequest.
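
The round-trip property these fuzz tests check can be sketched with a toy
format. The one-byte type and one-byte length used here are a deliberate
simplification of a real TLV stream, and `Record`, `parse`, `serialize` and
`do_test` are illustrative names only.

```rust
/// One toy TLV record: a one-byte type, a one-byte length, then the value.
#[derive(Debug, PartialEq)]
struct Record {
    ty: u8,
    value: Vec<u8>,
}

/// Parses a whole buffer into records, or returns None if it is truncated.
fn parse(mut data: &[u8]) -> Option<Vec<Record>> {
    let mut records = Vec::new();
    while !data.is_empty() {
        if data.len() < 2 {
            return None;
        }
        let (ty, len) = (data[0], data[1] as usize);
        let rest = &data[2..];
        if rest.len() < len {
            return None;
        }
        records.push(Record { ty, value: rest[..len].to_vec() });
        data = &rest[len..];
    }
    Some(records)
}

/// Re-encodes records in the same toy format.
fn serialize(records: &[Record]) -> Vec<u8> {
    let mut out = Vec::new();
    for record in records {
        out.push(record.ty);
        // Lengths always fit in a byte because they were parsed from one.
        out.push(record.value.len() as u8);
        out.extend_from_slice(&record.value);
    }
    out
}

/// Fuzz-style entry point: any input that parses must re-serialize to
/// exactly the original bytes; inputs that fail to parse are ignored.
fn do_test(data: &[u8]) {
    if let Some(records) = parse(data) {
        assert_eq!(serialize(&records).as_slice(), data);
    }
}
```
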
Forcing users to pass a genesis block hash has ended up being
error-prone largely due to byte-swapping questions for bindings
users. Further, our API is currently inconsistent - in
`ChannelManager` we take a `Bitcoin::Network` but in `NetworkGraph`
we take the genesis block hash.
Luckily `NetworkGraph` is the only remaining place where we require
users pass the genesis block hash, so swapping it for a `Network`
is a simple change.
Prior to this, we returned PaymentSendFailure from auto retry send payment
methods. This implied that we might return a PartialFailure from them, which
has never been the case. So it makes sense to rework the errors to be a better
fit for the methods.
We're taking error handling in a totally different direction now to make it
more asynchronous; see send_payment_internal for more information.
The `Channel::get_shutdown` docs are very clear - if the channel
jumps to `Shutdown` as a result of not being funded when we go to
initiate shutdown we should not generate a `ChannelMonitorUpdate`
as there's no need to bother with the shutdown script - we're
force-closing anyway.
However, this wasn't actually implemented, potentially causing a
spurious monitor update for no reason.