Commit graph

4838 commits

Matt Corallo
1751f77edf Avoid polling completed futures in the background-processor
`poll`ing completed futures invokes undefined behavior in Rust
(panics, etc.; obviously not memory corruption, as it's not
`unsafe`). Sadly, in our futures-based version of
`lightning-background-processor` we have one case where we can
`poll` a completed future - if the timer for the network graph
prune + persist completes without a network graph to prune +
persist, we'll happily poll the same future over and over again,
likely panicking in user code.
2023-03-07 18:06:12 +00:00
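
A minimal Rust sketch, not the LDK code itself, of the kind of guard that prevents re-polling: once the wrapped future completes it is dropped, so later polls simply stay pending instead of poking a finished future (much like a fused future).

    use core::future::Future;
    use core::pin::Pin;
    use core::task::{Context, Poll};

    // Hypothetical wrapper, for illustration only.
    struct PollOnce<F: Future + Unpin> {
        inner: Option<F>,
    }

    impl<F: Future + Unpin> Future for PollOnce<F> {
        type Output = F::Output;
        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
            // Take the inner future out; it is put back only if still pending.
            let mut fut = match self.inner.take() {
                Some(fut) => fut,
                // Completed on an earlier poll; never touch it again.
                None => return Poll::Pending,
            };
            match Pin::new(&mut fut).poll(cx) {
                Poll::Ready(output) => Poll::Ready(output),
                Poll::Pending => {
                    self.inner = Some(fut);
                    Poll::Pending
                }
            }
        }
    }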
Matt Corallo
af76face12
Merge pull request #2065 from TheBlueMatt/2023-02-0.0.114
Cut 0.0.114
2023-03-04 01:00:06 +00:00
Matt Corallo
a361be3544 Update crate versions to 0.0.114/invoice 0.22 2023-03-04 00:06:46 +00:00
Matt Corallo
e5c1a6d4dd Add release notes for 0.0.114 2023-03-04 00:06:46 +00:00
Matt Corallo
4d2679fac6
Merge pull request #2072 from jkczyz/2023-01-fix-scoring-div-by-zero
Fix division by zero in `ProbabilisticScorer`
2023-03-04 00:06:29 +00:00
Matt Corallo
0188861585
Merge pull request #2071 from TheBlueMatt/2023-01-fix-fast-extra-ready-panic
Fix panic on receiving `channel_ready` after 1st commitment update
2023-03-03 23:32:42 +00:00
Matt Corallo
dfe9ea3762
Use ProbabilisticScorer in router fuzzing, to cover overflows there 2023-03-03 16:25:57 -06:00
Jeffrey Czyz
bfd1a57434
Guard against division by zero in scorer
Since a node may announce that the htlc_maximum_msat of a channel is
zero, adding one to the denominator in the bucket formulas will prevent
the panic from ever happening. While the routing algorithm may never
select such a channel to score, this precaution may still be useful in
case the algorithm changes or if the scorer is used with a different
routing algorithm.
2023-03-03 16:25:57 -06:00
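
A hedged sketch of the idea (the helper name and bucket count are illustrative, not the exact LDK formula): adding one to the denominator keeps the bucket computation well-defined even for a channel announcing an `htlc_maximum_msat` of zero.

    // Illustrative only: map an amount into one of eight liquidity buckets.
    fn bucket_idx(amount_msat: u64, capacity_msat: u64) -> u8 {
        // The `+ 1` guards against a zero capacity; widening to u128 avoids
        // overflow in the multiplication.
        let idx = (amount_msat as u128) * 8 / (capacity_msat as u128 + 1);
        core::cmp::min(idx, 7) as u8
    }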
Jeffrey Czyz
e44868681d
Update scoring history buckets when no change
Even when there is no change in min/max liquidity knowledge, tracking
should still be updated to include the additional data point.
2023-03-03 16:25:17 -06:00
Matt Corallo
fac5373687
Merge pull request #2068 from jkczyz/2023-03-doc-fixes
Doc and build warning fixes
2023-03-03 22:19:59 +00:00
Matt Corallo
852cf2b3f2
Merge pull request #2069 from TheBlueMatt/2023-03-no-tx-sync-auto-std
Do not auto-select the lightning `std` feature from tx-sync crate
2023-03-03 22:19:46 +00:00
Matt Corallo
eed4c62f36
Merge pull request #2070 from TheBlueMatt/2023-03-get-key
Expose the node secret key in `{Phantom,}KeysManager`
2023-03-03 22:19:37 +00:00
Jeffrey Czyz
1d1323a3d0
Fix build warnings 2023-03-03 14:23:18 -06:00
Jeffrey Czyz
8408b8d0e4
Add more documentation about BlockSourceError
Some BlockSource implementations provide more error details. Document
this in case users want to examine them further.
2023-03-03 14:23:17 -06:00
Matt Corallo
029b66540a Expose the node secret key in {Phantom,}KeysManager
When we removed the private keys from the signing interface we
forgot to re-add them in the public interface of our own
implementations, which users may need.
2023-03-03 20:11:41 +00:00
Matt Corallo
06f09eeafc
Merge pull request #2067 from benthecarman/fix-typos
Fix typos in lightning-transaction-sync
2023-03-03 20:00:12 +00:00
Matt Corallo
41bcd68a40
Merge pull request #2066 from TheBlueMatt/2023-02-no-enum-refs
Pass `FailureCode` to `fail_htlc_backwards` by ownership
2023-03-03 19:44:40 +00:00
Matt Corallo
a9e6341f79
Merge pull request #2048 from TheBlueMatt/2023-02-send-persist-order-a
Track claimed outbound HTLCs in ChannelMonitors
2023-03-03 19:37:31 +00:00
Matt Corallo
1a18b881bb Do not auto-select the lightning std feature from tx-sync crate
We have some downstream folks who are using LDK in wasm, compiled
via the normal Rust wasm path. To ensure nothing breaks, they want
to use `no-std` on the lightning crate, disabling time calls as
those panic. However, the HTTP logic in
`lightning-transaction-sync` gets automatically stubbed out by the
HTTP client crates when targeting wasm via `wasm_bindgen`, so it
works fine despite the `std` restrictions.

In order to make both work, `lightning-transaction-sync` can remain
`std`, but needs to not automatically enable the `std` feature on
the `lightning` crate, i.e., by setting `default-features = false`.
We do so here.
2023-03-03 19:26:14 +00:00
Matt Corallo
0987b32bed Pass FailureCode to fail_htlc_backwards by ownership
`FailureCode` is a trivial enum with no body, so we shouldn't be
passing it by reference. It's sufficiently strange that the Java
bindings aren't happy with it, which is fine; we should just fix it
here.
2023-03-03 17:20:58 +00:00
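
A short sketch of the pattern, with hypothetical variant and function names: a fieldless enum can derive `Copy`, so taking it by value is free and keeps the signature simpler for bindings generators than `&FailureCode` would.

    // Hypothetical, simplified enum for illustration.
    #[derive(Clone, Copy, Debug)]
    enum FailureCode {
        TemporaryNodeFailure,
        RequiredNodeFeatureMissing,
        IncorrectOrUnknownPaymentDetails,
    }

    // By value: no reference, no lifetime, and the enum is trivially copyable.
    fn fail_htlc_by_code(failure_code: FailureCode) {
        match failure_code {
            FailureCode::TemporaryNodeFailure => { /* ... */ }
            FailureCode::RequiredNodeFeatureMissing => { /* ... */ }
            FailureCode::IncorrectOrUnknownPaymentDetails => { /* ... */ }
        }
    }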
Matt Corallo
75527dbdf1 Remove unused compat block in provide_latest_holder_commitment_tx 2023-03-03 17:19:03 +00:00
Matt Corallo
0ad1f4c943 Track claimed outbound HTLCs in ChannelMonitors
When we receive an update_fulfill_htlc message, we immediately try
to "claim" the HTLC against the HTLCSource. If there is one, this
works great, we immediately generate a `ChannelMonitorUpdate` for
the corresponding inbound HTLC and persist that before we ever get
to processing our counterparty's `commitment_signed` and persisting
the corresponding `ChannelMonitorUpdate`.

However, if there isn't one (and this is the first successful HTLC
for a payment we sent), we immediately generate a `PaymentSent`
event and queue it up for the user. Then, a millisecond later, we
receive the `commitment_signed` from our peer, removing the HTLC
from the latest local commitment transaction as a side-effect of
the `ChannelMonitorUpdate` applied.

If the user has processed the `PaymentSent` event by that point,
great, we're done. However, if they have not, and we crash prior to
persisting the `ChannelManager`, on startup we get confused about
the state of the payment. We'll force-close the channel for being
stale, and see an HTLC which was removed and is no longer present
in the latest commitment transaction (which we're broadcasting).
Because we claim corresponding inbound HTLCs before updating a
`ChannelMonitor`, we assume such HTLCs have failed - attempting to
fail after having claimed should be a noop. However, in the
sent-payment case we now generate a `PaymentFailed` event for the
user, allowing an HTLC to complete without giving the user a
preimage.

Here we address this issue by storing the payment preimages for
claimed outbound HTLCs in the `ChannelMonitor`, in addition to the
existing inbound HTLC preimages already stored there. This allows
us to fix the specific issue described by checking for a preimage
and switching the type of event generated in response. In addition,
it reduces the risk of future confusion by ensuring we don't fail
HTLCs which were claimed but not fully committed to before a crash.

It does not, however, fully fix the issue here - because the
preimages are removed after the HTLC has been fully removed from
available commitment transactions, we may still hit this issue if
we are substantially delayed in persisting the `ChannelManager`
from the time we receive the `update_fulfill_htlc` until after a
full commitment-signed dance completes. The full fix for this issue
is to delay the persistence of the `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed. This avoids the
issue entirely, ensuring we process the event before updating the
`ChannelMonitor`, the same as we ensure the upstream HTLC has been
claimed before updating the `ChannelMonitor` for forwarded
payments.

The full solution will be implemented in later work; however, this
change still makes sense at that point as well - if we were to
delay the initial `commitment_signed` `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed (which likely
requires a database update on the user's end), we'd hold our
`commitment_signed` + `revoke_and_ack` response for two DB writes
(i.e. `fsync()` calls), making our commitment transaction
processing a full `fsync` slower. By making this change first, we
can instead delay the `ChannelMonitorUpdate` from the
counterparty's final `revoke_and_ack` message until the event has
been processed, giving us a full network roundtrip to do so and
avoiding delaying our response as long as an `fsync` is faster than
a network roundtrip.
2023-03-03 17:19:03 +00:00
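
A hedged sketch with hypothetical types of the check this commit enables: when an outbound HTLC turns out to have been removed from the latest commitment transaction on restart, the presence of a stored preimage in the `ChannelMonitor` decides whether to surface success or failure.

    // Hypothetical types for illustration only.
    enum OutboundHtlcOutcome {
        // The counterparty revealed the preimage before the crash, so surface
        // a payment-sent style event rather than a failure.
        Claimed { payment_preimage: [u8; 32] },
        // No preimage was ever stored; the HTLC genuinely failed.
        Failed,
    }

    fn resolve_missing_outbound_htlc(stored_preimage: Option<[u8; 32]>) -> OutboundHtlcOutcome {
        match stored_preimage {
            Some(payment_preimage) => OutboundHtlcOutcome::Claimed { payment_preimage },
            None => OutboundHtlcOutcome::Failed,
        }
    }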
Jeffrey Czyz
fbe9f47e12
Reference Router in ChannelManager docs 2023-03-03 09:48:25 -06:00
Jeffrey Czyz
a58b81aa1e
DRY up historical bucket_idx calculation 2023-03-03 09:41:52 -06:00
Jeffrey Czyz
10cfe5c973
Fix scorer panic when available capacity is zero
ProbabilisticScorer takes a ChannelUsage when computing a penalty for a
channel. The formula for calculating the liquidity penalty reduces the
maximum capacity by the amount of in-flight HTLCs (available capacity)
and adds one to prevent division by zero.

However, since the available capacity is passed to
DirectedChannelLiquidity as the capacity, other penalty formulas may use
the available (i.e., reduced) capacity inadvertently. In practice, this
has two ramifications for the historical liquidity penalty computation:

1. The bucket formula doesn't have a consistent denominator for a given
   channel.
2. The bucket formula may divide by zero when the in-flight HTLC amount
   equals or exceeds the effective capacity.

Fixing this involves only using the available capacity when appropriate.
2023-03-03 09:41:42 -06:00
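
A hedged sketch, with simplified names, of the distinction this commit relies on: the full announced capacity and the in-flight-reduced (available) capacity are different quantities, so each penalty formula has to pick the right one deliberately instead of receiving a single value for both.

    // Simplified for illustration only.
    struct ChannelUsage {
        inflight_htlc_msat: u64,
        capacity_msat: u64,
    }

    impl ChannelUsage {
        // Capacity net of in-flight HTLCs; this may be zero, so any formula
        // dividing by it must guard against that.
        fn available_capacity_msat(&self) -> u64 {
            self.capacity_msat.saturating_sub(self.inflight_htlc_msat)
        }
    }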
benthecarman
53788913c9
Fix typos in lightning-transaction-sync 2023-03-02 23:36:54 -06:00
Matt Corallo
5c6e36727c
Merge pull request #2060 from TheBlueMatt/2023-02-peers-disconnect-consistency
Remove peers from the node_id_to_descriptor even without init
2023-03-02 23:44:23 +00:00
Matt Corallo
33c36d007e Avoid removing stale preimages when hashes collide in fuzzing 2023-03-02 23:42:40 +00:00
Matt Corallo
48946d9add Fuzz rapid peer connection/disconnections in threads
This test fails on the bug fixed two commits ago when run with
the additional assertions added in the previous commit.
2023-03-02 21:44:43 +00:00
Matt Corallo
cc9aa45c7e Improve PeerHandler debug_assertions and checks
This removes two panics from `PeerHandler` which can trivially be
replaced with `debug_assert!(false); return Err;`, and adds another
debug assertion on internal state consistency during disconnect.
2023-03-02 21:15:58 +00:00
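
The pattern described, in a minimal hedged sketch (names are illustrative): assert loudly in debug builds, but degrade to an error return in release builds instead of panicking.

    // Illustrative only: a consistency check that panics in debug builds and
    // returns an error in release builds rather than taking the process down.
    fn check_peer_state_consistent(consistent: bool) -> Result<(), ()> {
        if !consistent {
            debug_assert!(false, "inconsistent peer state");
            return Err(());
        }
        Ok(())
    }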
Wilmer Paulino
6ddf69c93b
Merge pull request #2064 from TheBlueMatt/2023-03-debug-futures
Make waking after a future completes propagate to the next future
2023-03-02 11:49:54 -08:00
Jeffrey Czyz
f1145158fe
Merge pull request #2057 from johncantrell97/rpc-error-code
Surface bitcoind rpc error information
2023-03-02 08:26:29 -06:00
Matt Corallo
cd03cb6477 Make waking after a future completes propagate to the next future
In our `wakers`, if we first `notify` a future, which is then
`poll`ed complete, and then `notify` the same waker again before a
new future is fetched, that new future will be marked as
non-complete initially and wait for a third `notify`.

The fix is luckily rather trivial: when we `notify` a future, if it
is completed immediately, simply wipe the future state so that we
look at the pending-notify flag when we generate the next future.
2023-03-02 07:50:16 +00:00
Matt Corallo
04ce6de4eb
Merge pull request #2061 from TheBlueMatt/2023-02-114-upstream-bindings
`C-not exported` tags for 0.0.114
2023-03-01 18:57:12 +00:00
Matt Corallo
40fcfafef2 Tag types used for the TLV macros with (C-not exported)
Obviously bindings users can't use the Rust TLV-implementation
macros, so there's no reason to export types used exclusively by
them.
2023-03-01 00:21:28 +00:00
Matt Corallo
08cb01e6ef Mark IndexedMap types as (C-not exported)
While we could try to expose the type explicitly, we already have
alternative accessors for bindings, and mapping `Hash`, `Ord` and
the other requirements for `IndexedMap` would be a good chunk of
additional work.
2023-03-01 00:21:23 +00:00
Matt Corallo
2c15bb437e Remove peers from the node_id_to_descriptor even without init
When a peer has finished the noise handshake, but has not yet
completed the lightning `Init`-based handshake, they will be
present in the `node_id_to_descriptor` set, even though
`Peer::handshake_complete()` returns false. Thus, when we go to
disconnect such a peer, we must ensure that we remove it from the
descriptor set as well.

Failing to do so caused an `Inconsistent peers set state!` panic in
the C bindings network handler.
2023-02-28 21:40:20 +00:00
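
A hedged sketch with hypothetical types of the invariant being restored: on disconnect the node-id-to-descriptor entry is removed whether or not the `Init` handshake ever completed, since it is inserted as soon as the noise handshake finishes.

    use std::collections::HashMap;

    // Hypothetical, simplified state for illustration.
    fn remove_disconnected_peer(
        descriptor: u64,
        node_id: Option<[u8; 33]>,
        node_id_to_descriptor: &mut HashMap<[u8; 33], u64>,
    ) {
        if let Some(node_id) = node_id {
            // Remove the entry even if `Init` was never exchanged, but only if
            // it still points at this descriptor (a reconnect may have replaced it).
            if node_id_to_descriptor.get(&node_id) == Some(&descriptor) {
                node_id_to_descriptor.remove(&node_id);
            }
        }
    }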
Wilmer Paulino
0b1a64f12d
Merge pull request #2046 from TheBlueMatt/2023-02-rgs-robust-and-log
Do not fail to apply RGS updates for removed channels
2023-02-28 11:36:04 -08:00
John Cantrell
a7af24bf3c Surface bitcoind rpc error code
Users of the `RpcClient` had no way to access the error code
returned by bitcoind's RPC. We embed a new `RpcError` struct
as the inner error for the returned `io::Error`. Users can access
both the code and the message using this inner struct.
2023-02-28 14:03:28 -05:00
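
A hedged sketch of the approach (field names assumed for illustration): a small error type carrying bitcoind's code and message, attached as the inner error of the returned `io::Error` so callers can downcast to it.

    use std::error::Error;
    use std::fmt;
    use std::io;

    // Assumed shape, for illustration.
    #[derive(Debug)]
    struct RpcError {
        code: i64,
        message: String,
    }

    impl fmt::Display for RpcError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "RPC error {}: {}", self.code, self.message)
        }
    }

    impl Error for RpcError {}

    fn to_io_error(rpc: RpcError) -> io::Error {
        io::Error::new(io::ErrorKind::Other, rpc)
    }

    fn rpc_error_code(err: &io::Error) -> Option<i64> {
        // The inner error set via `io::Error::new` can be recovered by downcast.
        err.get_ref()
            .and_then(|inner| inner.downcast_ref::<RpcError>())
            .map(|rpc| rpc.code)
    }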
Matt Corallo
d2f5dc01d1 Add some basic logging to Rapid Gossip Sync processing 2023-02-28 17:56:16 +00:00
Matt Corallo
df20bcf188 Make log macros more usable outside of the lightning crate
... by using explicit paths rather than requiring imports.
2023-02-28 17:56:16 +00:00
Matt Corallo
b8eecd1fd8 Do not fail to apply RGS updates for removed channels
If we receive a Rapid Gossip Sync update for a channel where we are
missing the existing channel data, we should ignore the missing
channel. This can happen in a number of cases, whether because we
received updated channel information via an onion error from an
HTLC failure or because we've partially synced the graph from a
peer over the standard lightning P2P protocol.
2023-02-28 17:56:16 +00:00
Wilmer Paulino
8311581fe1
Merge pull request #2006 from TheBlueMatt/2023-02-no-recursive-read-locks
Refuse recursive read locks
2023-02-28 00:24:16 -08:00
Matt Corallo
b8bea74d6a
Merge pull request #2015 from TheBlueMatt/2023-02-no-dumb-redundant-fields 2023-02-28 07:55:56 +00:00
Matt Corallo
065dc6e689 Make sure individual mutexes are constructed on different lines
Our lockdep logic (on Windows) identifies a mutex based on which
line it was constructed on. Thus, if we have two mutexes
constructed on the same line, it will generate false positives.
2023-02-28 01:06:35 +00:00
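
For illustration (the checker itself is elided), the constraint this commit works around: a lockorder tracker keyed on the construction site cannot tell apart two mutexes built on one source line.

    use std::sync::Mutex;

    fn build_locks() -> (Mutex<u64>, Mutex<u64>) {
        // let pair = (Mutex::new(0), Mutex::new(0)); // same line: one lock id
        let a = Mutex::new(0); // distinct construction lines...
        let b = Mutex::new(0); // ...give distinct lock identities.
        (a, b)
    }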
Matt Corallo
22662efbc4 Export RUST_BACKTRACE=1 in --feature backtrace CI test
as this test often fails on Windows, which is hard to debug
locally for most contributors.
2023-02-28 01:06:35 +00:00
Matt Corallo
f082ad40b5 Disallow taking two instances of the same mutex at the same time
Taking two instances of the same mutex may be totally fine, but it
requires a total lockorder that we cannot (trivially) check. Thus,
it's generally unsafe to do if we can avoid it.

To discourage doing this, here we default to panicking on such locks
in our lockorder tests, with a separate lock function added that is
clearly labeled "unsafe" to allow doing so when we can guarantee a
total lockorder.

This requires adapting a number of sites to the new API, including
fixing a bug this turned up in `ChannelMonitor`'s `PartialEq` where
no lockorder was guaranteed.
2023-02-28 01:06:35 +00:00
Matt Corallo
9c08fbd435 Refuse recursive read locks in lockorder testing
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.

Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.

Thus, we shouldn't have any special handling for allowing recursive
read locks, so we simply remove it here.
2023-02-28 01:06:35 +00:00
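
For illustration, a hedged sketch of the shape of code the lockorder tests now reject: on a writer-preferring `RwLock`, the second `read()` below can queue behind a writer on another thread, which in turn waits on the first read guard, deadlocking both.

    use std::sync::RwLock;

    // Do not write code like this: a recursive read lock.
    fn recursive_read(lock: &RwLock<u64>) -> u64 {
        let outer = lock.read().unwrap();
        // If another thread calls `lock.write()` right now, a writer-preferring
        // implementation queues it ahead of new readers...
        let inner = lock.read().unwrap(); // ...so this read blocks behind it.
        *outer + *inner
    }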
Matt Corallo
2c8a26c6d2 Don't take per_peer_state read locks recursively in monitor updating
When handling a `ChannelMonitor` update via the new
`handle_new_monitor_update` macro, we always call the macro with
the `per_peer_state` read lock held and have the macro drop the
per-peer state lock. Then, when handling the resulting updates, we
may take the `per_peer_state` read lock again in another function.

In a coming commit, recursive read locks will be disallowed, so we
have to drop the `per_peer_state` read lock before calling
additional functions in `handle_new_monitor_update`, which we do
here.
2023-02-28 01:06:35 +00:00
Matt Corallo
38374dde42 Expect callers to hold read locks before channel_monitor_updated
Our existing lockorder tests assume that a read lock on a thread
that is already holding the same read lock is totally fine. This
isn't at all true. The `std` `RwLock` behavior is
platform-dependent - on most platforms readers can starve writers
as readers will never block for a pending writer. However, on
platforms where this is not the case, one thread trying to take a
write lock may deadlock with another thread that both already has,
and is attempting to take again, a read lock.

Worse, our in-tree `FairRwLock` exhibits this behavior explicitly
on all platforms to avoid the starvation issue.

Sadly, a user ended up hitting this deadlock in production in the
form of a call to `get_and_clear_pending_msg_events` which holds
the `ChannelManager::total_consistency_lock` before calling
`process_pending_monitor_events` and eventually
`channel_monitor_updated`, which tries to take the same read lock
again.

Luckily, the fix is trivial: simply remove the redundant read lock
in `channel_monitor_updated`.

Fixes #2000
2023-02-28 01:06:35 +00:00
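
A hedged sketch (types and names are illustrative) of the fix shape: the caller already holds the lock, so the callee no longer re-acquires it.

    use std::sync::RwLock;

    struct Manager {
        total_consistency_lock: RwLock<()>,
    }

    impl Manager {
        fn get_and_clear_pending_msg_events(&self) {
            let _read_guard = self.total_consistency_lock.read().unwrap();
            // ... eventually reaches channel_monitor_updated() with the read
            // lock above still held ...
            self.channel_monitor_updated();
        }

        fn channel_monitor_updated(&self) {
            // Before the fix this function took `total_consistency_lock.read()`
            // again, recursively; callers are now expected to already hold it,
            // so no lock is taken here.
        }
    }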