Useful for failing blinded payments back with a malformed error, and will also
be useful in the future when we move onion decoding into
process_pending_htlc_forwards, after which Channel::fail_htlc will be used for
all malformed HTLCs.
Currently it returns only update_fail, but we'll want it to be able to return
update_malformed as well in upcoming commits. We'll use this for correctly
failing blinded received HTLCs backwards with an update_malformed message and
the invalid_onion_blinding error, per BOLT 4.
For context, blinded HTLCs where we are not the intro node must always be
failed back with an update_malformed message and the invalid_onion_blinding
error, per BOLT 4.
Prior to supporting blinded payments, the only way for an update_malformed to
be returned from Channel was if an onion was actually found to be malformed
during initial update_add processing. This meant that any malformed HTLCs would
never live in the holding cell but instead would be returned directly upon
initial RAA processing.
Now, we need to be able to store these HTLCs in the holding cell because the
HTLC failure necessitating an update_malformed may come long after the RAA is
initially processed, and we may not be in a state to send the update_malformed
message at that time.
Therefore, add a new holding cell HTLC variant for blinded non-intro node
HTLCs, which will signal to Channel to fail with malformed and the correct
error code.
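As a rough sketch of the shape this variant might take (names and fields here
are illustrative, not LDK's exact definitions):

```rust
// Illustrative only: a holding-cell update enum extended with a variant that
// tells Channel to send update_fail_malformed_htlc rather than
// update_fail_htlc when the HTLC is finally failed back.
#[allow(dead_code)]
enum HTLCUpdateAwaitingACK {
    FailHTLC {
        htlc_id: u64,
        err_packet: Vec<u8>, // encrypted onion failure for normal failures
    },
    // New variant for blinded non-intro-node HTLCs: fail with malformed and
    // the invalid_onion_blinding error code (BADONION | PERM | 24).
    FailMalformedHTLC {
        htlc_id: u64,
        failure_code: u16,
        sha256_of_onion: [u8; 32],
    },
}
```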
Will be used in the next commit(s) to let us know to fail blinded received
HTLCs backwards with an update_malformed message and the invalid_onion_blinding
error, per BOLT 4.
When a peer is connected, OnionMessenger tracks it only if it supports
onion messages. On disconnect, we debug_assert that the peer was in the
ConnectedPeer state, failing when it is in the PendingConnection state.
However, we were mistakenly asserting for peers that we were not
tracking (i.e., that don't support onion messages). Relax the check to
not fail on the latter.
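A minimal sketch of the relaxed check (types simplified; the real
OnionMessenger state carries more per-peer data):

```rust
use std::collections::HashMap;

enum OnionMessageRecipient {
    PendingConnection,
    ConnectedPeer,
}

struct Messenger {
    // Only peers that support onion messages are tracked here at all.
    message_recipients: HashMap<[u8; 33], OnionMessageRecipient>,
}

impl Messenger {
    fn peer_disconnected(&mut self, peer_id: &[u8; 33]) {
        match self.message_recipients.remove(peer_id) {
            // Untracked peers (no onion message support) are fine to ignore.
            None | Some(OnionMessageRecipient::ConnectedPeer) => {},
            Some(OnionMessageRecipient::PendingConnection) => {
                debug_assert!(false, "tracked peer disconnected before being marked connected");
            },
        }
    }
}
```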
Historically, `ChannelMonitor`s had no idea who their counterparty
was. This was fine, until `ChannelManager` started indexing by
peer, at which point it needed to know the counterparty when it saw
a `ChannelMonitorUpdate` complete. To address this, a "temporary"
map from channel ID to peer was added, but no upgrade path was
created for existing `ChannelMonitor`s to not rely on this map.
This commit adds such an upgrade path, setting the
`counterparty_node_id` on all `ChannelMonitor`s as they're updated,
allowing us to eventually break backwards compatibility and remove
`ChannelManager::outpoint_to_peer`.
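Sketched very roughly (the types below stand in for the real monitor and
update structs), the upgrade path amounts to backfilling the field whenever an
update is applied:

```rust
// Illustrative types only.
struct ChannelMonitor {
    counterparty_node_id: Option<[u8; 33]>,
    // ... the rest of the monitor state ...
}

struct ChannelMonitorUpdate {
    counterparty_node_id: [u8; 33],
    // ... the actual update steps ...
}

fn update_monitor(monitor: &mut ChannelMonitor, update: &ChannelMonitorUpdate) {
    // Monitors written before this field existed have None here; set it on
    // the next update so that, once all monitors have been updated, the
    // outpoint_to_peer map can eventually be dropped.
    if monitor.counterparty_node_id.is_none() {
        monitor.counterparty_node_id = Some(update.counterparty_node_id);
    }
    // ... apply the update's steps as usual ...
}
```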
For backwards compatibility reasons, we need to track a mapping
from funding outpoints to peers. To reduce diff, this was
previously done with channel IDs, converting the `OutPoint`s to
channel IDs before using the map.
This worked fine, but is somewhat brittle - because we allow
redundant channel IDs across different peers, we had to avoid
insertion until we had a real channel ID, and thus also had to be
careful to avoid removal unless we were using a real channel ID,
rather than a temporary one.
This brittleness actually crept into the handling of errors in funding
acceptance, allowing a remote party to get us to remove an entry by
sending an overlapping temporary channel ID with a separate real
channel ID.
Luckily, this map is used relatively infrequently, only when we see a
monitor update completion from a rather ancient monitor which is
unaware of its counterparty node.
Even after this change, the channel -> peer tracking storage is
still somewhat brittle, as we rely on entries not being added until
we are confident no conflicting `OutPoint`s have been used across
channels, and similarly not removing unless that check has
completed.
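A rough sketch of the resulting shape, using hypothetical stand-in types for
the real map in `ChannelManager`:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for lightning's OutPoint and a peer's node id.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct OutPoint {
    txid: [u8; 32],
    index: u16,
}
type NodeId = [u8; 33];

struct PeerIndex {
    // Keyed by the real funding outpoint, which only exists once funding is
    // negotiated, so temporary channel IDs never add to (or remove from) it.
    outpoint_to_peer: HashMap<OutPoint, NodeId>,
}

impl PeerIndex {
    fn funding_negotiated(&mut self, funding_txo: OutPoint, counterparty: NodeId) {
        self.outpoint_to_peer.insert(funding_txo, counterparty);
    }

    fn channel_closed(&mut self, funding_txo: &OutPoint) {
        self.outpoint_to_peer.remove(funding_txo);
    }
}
```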
Currently, our holder commitment broadcast only goes through the
`OnchainTxHandler` for anchor outputs channels because we can actually
bump the commitment transaction fees with it. For non-anchor outputs
channels, we would just broadcast once directly via the
`ChannelForceClosed` monitor update, without going through the
`OnchainTxHandler`.
As we add support for async signing, we need to be tolerant of signing
failures. A signing failure of our holder commitment will currently
panic, but once the panic is removed, we must be able to retry signing
once the signer is available. We can easily achieve this via the
existing `OnchainTxHandler::rebroadcast_pending_claims`, but this
requires that we first queue our holder commitment as a claim. This
commit ensures we do so everywhere we need to broadcast a holder
commitment transaction, regardless of the channel type.
Co-authored-by: Rachel Malonson <rachel@lightspark.com>
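A sketch of the retry flow this enables, with all names below being
illustrative rather than LDK's internals:

```rust
// Each pending claim describes something we still need to get on-chain; for a
// force-close that now includes "sign and broadcast our holder commitment".
struct PendingClaim {
    description: &'static str,
}

struct ClaimQueue {
    pending: Vec<PendingClaim>,
}

impl ClaimQueue {
    // Called on force-close for every channel type, not just anchor channels.
    fn queue_holder_commitment(&mut self) {
        self.pending.push(PendingClaim { description: "holder commitment" });
    }

    // Called when we want another attempt, e.g. once an async signer becomes
    // available again. Claims stay queued, so a signing failure on an earlier
    // attempt is simply retried here instead of panicking.
    fn rebroadcast_pending_claims(&self, try_sign_and_broadcast: impl Fn(&PendingClaim)) {
        for claim in &self.pending {
            try_sign_and_broadcast(claim);
        }
    }
}
```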
Once a commitment transaction is broadcast or confirmed, we may need to
claim some of the HTLCs in it. These claims are sent as requests to the
`OnchainTxHandler`, which will bump their feerate as they remain
unconfirmed. When said commitment transaction becomes unconfirmed
though, and another commitment confirms instead, i.e., a reorg happens,
the `OnchainTxHandler` doesn't have any insight into whether these
claims are still valid or not, so it continues attempting to claim the
HTLCs from the previous commitment (now unconfirmed) forever, along with
the HTLCs from the newly confirmed commitment.
Implement the Display trait for OutPoint and use it in the codebase when
logging monitored outpoints.
Additionally, add log tracing for best_block_update and confirmed transactions.
Solves #2348.
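A minimal sketch of such a Display implementation (LDK's exact output format
may differ):

```rust
use std::fmt;

struct OutPoint {
    txid: [u8; 32],
    index: u16,
}

impl fmt::Display for OutPoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the txid as hex followed by the output index, e.g. "ab01...:1".
        for byte in self.txid.iter() {
            write!(f, "{:02x}", byte)?;
        }
        write!(f, ":{}", self.index)
    }
}

fn main() {
    let outpoint = OutPoint { txid: [0u8; 32], index: 1 };
    println!("Watching outpoint {}", outpoint);
}
```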
Ensure that if we call construct_onion_packet and friends with payloads that
are too large for the allotted packet length, we'll fail to construct. Previously,
senders would happily construct invalid packets by array-shifting the final
node's HMAC out of the packet when adding an intermediate onion layer, causing
the receiver to error with "final payload provided for us as an intermediate
node."
We previously assumed that the final node's payload would be ~93 bytes, and had
code to ensure that the filler encoded after that payload is not all 0s. Now
with custom TLVs and metadata supported, the final node's payload may take up
the entire onion packet, so we can't assume that there are 64 bytes of filler
to check.
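A sketch of the kind of up-front size check this implies (constants and
framing simplified; real hop payloads also carry a length prefix):

```rust
// The onion's hop-data area is a fixed 1300 bytes for payment onions; each
// hop's payload is followed by a 32-byte HMAC. If the payloads (including a
// possibly very large final payload with custom TLVs/metadata) don't fit,
// packet construction should fail instead of shifting the final HMAC out.
const HOP_DATA_LEN: usize = 1300;
const HMAC_LEN: usize = 32;

fn payloads_fit(payload_lens: &[usize]) -> bool {
    let needed: usize = payload_lens.iter().map(|len| len + HMAC_LEN).sum();
    needed <= HOP_DATA_LEN
}

fn main() {
    assert!(payloads_fit(&[50, 50, 1100]));
    // A final payload that nearly fills the packet leaves no room for more hops.
    assert!(!payloads_fit(&[50, 50, 1250]));
}
```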
`RouteGraphNode` currently recalculates scores in its `Ord`
implementation, wasting time while sorting the main Dijkstra's
heap.
Further, some time ago, when implementing the `htlc_maximum_msat`
amount reduction while walking the graph, we added
`PathBuildingHop::was_processed`, looking up the source node in
`dist` each time we popped an element off the binary heap.
As a result, we now have a reference to our `PathBuildingHop` when
processing a best-node's channels, leading to several fields in
`RouteGraphNode` being entirely redundant.
Here we drop those fields, but add a pre-calculated score field,
as well as force a suboptimal `RouteGraphNode` layout, retaining
its existing 64 byte size.
Without the suboptimal layout, performance is very mixed, but with
it performance is mostly improved, by around 10% in most tests.
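A simplified sketch of the idea, with fields and tie-breaking omitted:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

struct RouteGraphNode {
    node_counter: u32,
    // Computed once when the entry is pushed, so the heap's Ord no longer
    // redoes the scoring math on every comparison.
    score: u64,
}

impl PartialEq for RouteGraphNode {
    fn eq(&self, other: &Self) -> bool { self.score == other.score }
}
impl Eq for RouteGraphNode {}
impl PartialOrd for RouteGraphNode {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for RouteGraphNode {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reverse so BinaryHeap (a max-heap) pops the lowest score first.
        other.score.cmp(&self.score)
    }
}

fn main() {
    let mut heap = BinaryHeap::new();
    heap.push(RouteGraphNode { node_counter: 1, score: 42 });
    heap.push(RouteGraphNode { node_counter: 2, score: 7 });
    assert_eq!(heap.pop().unwrap().node_counter, 2);
}
```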
Given `PathBuildingHop` is now an even multiple of cache lines, we
can pick which fields "fall off" the cache line we have visible
when dealing with hops, which we do here.
We'd previously aggressively cached elements in the
`PathBuildingHop` struct (and its sub-structs), which resulted in a
rather bloated size. This implied cache misses as we read from and
write to multiple cache lines during processing of a single
channel.
Here, we reduce caching in `DirectedChannelInfo`, fitting the
`(NodeId, PathBuildingHop)` tuple in exactly 128 bytes. While this
should fit in two cache lines, it sadly does not generally lie
on only two lines, as glibc returns large buffers from `malloc`
which are very well aligned, plus 16 bytes (for its own allocation
tracking). Thus, we try to avoid reading from the last 16 bytes of
a `PathBuildingHop`, but luckily that isn't super hard.
Note that here we make accessing
`DirectedChannelInfo::effective_capacity` somewhat slower, but
that's okay as it's only ever done once per `DirectedChannelInfo`
anyway.
While our routing benchmarks are quite noisy, this appears to
result in between a 5% and 15% performance improvement in the
probabilistic scoring benchmarks.
This avoids bloating `CandidateRouteHop` with a full 33-byte
node_id (and avoids repeated public key serialization when we do
multiple pathfinding passes).
`TestRouter` tries to make scoring calls that mimic what an actual
router would do, but the changes in f0ecc3ec73
failed to make scoring calls for private route hints or when we take
a public hop as the last hop.
This fixes those regressions, though no tests currently depend on
this behavior.
Rather than calling `CandidateRouteHop::FirstHop::node_id` just
`node_id`, we should call it `payer_node_id` to provide more
context.
We also take this opportunity to make it a reference, avoiding
bloating `CandidateRouteHop`.
These are used in the performance-critical routing and scoring
operations, which may happen outside of our crate. Thus, we really
need to allow downstream crates to inline these accessors into
their code, which we do here.
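For instance, a sketch of the pattern on a hypothetical accessor:

```rust
pub struct ExampleHop {
    htlc_minimum_msat: u64,
}

impl ExampleHop {
    // #[inline] lets crates that link against this one inline the accessor
    // into their own hot routing/scoring loops without cross-crate LTO.
    #[inline]
    pub fn htlc_minimum_msat(&self) -> u64 {
        self.htlc_minimum_msat
    }
}
```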
Short channel "ID"s are not globally unique when they come from a
BOLT 11 route hint or a first hop (which can be an outbound SCID
alias). In those cases, it's rather confusing that we have a
`short_channel_id` method which mixes them all together, and even
more confusing that we have a `CandidateHopId` which is not, in
fact, a unique identifier.
In our routing logic this is mostly fine - the cost of a collision
isn't super high and we should still do just fine finding a route.
However, the same can't be said for downstream users, as they may or
may not rely on the apparent guarantees.
Thus, here, we privatise the SCID and id accessors.
f0ecc3ec73 introduced a regression in
the `remembers_historical_failures` test, and disabled it by simply
removing the `#[test]` annotation. This fixes the test and marks it
as a test again.
The only remaining step is to use the update_add blinding point in decoding
inbound onion payloads.
Error handling will be completed in upcoming commits.
When enqueuing a message for a node that is not yet connected as a
peer (i.e., one already awaiting a connection),
BufferedAwaitingConnection should be returned. However, it was only
returned when the first message was enqueued. Any messages enqueued
after that, but before a connection was established, incorrectly
returned Buffered.
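A simplified sketch of the corrected behavior, with types reduced to the
essentials:

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, PartialEq)]
enum SendSuccess {
    Buffered,
    BufferedAwaitingConnection,
}

enum Recipient {
    PendingConnection(VecDeque<Vec<u8>>),
    Connected(VecDeque<Vec<u8>>),
}

fn enqueue(
    recipients: &mut HashMap<[u8; 33], Recipient>, peer: [u8; 33], msg: Vec<u8>,
) -> SendSuccess {
    let entry = recipients
        .entry(peer)
        .or_insert_with(|| Recipient::PendingConnection(VecDeque::new()));
    match entry {
        // Return BufferedAwaitingConnection for *every* message queued while
        // the peer is still pending, not just the first one.
        Recipient::PendingConnection(queue) => {
            queue.push_back(msg);
            SendSuccess::BufferedAwaitingConnection
        },
        Recipient::Connected(queue) => {
            queue.push_back(msg);
            SendSuccess::Buffered
        },
    }
}

fn main() {
    let mut recipients = HashMap::new();
    let peer = [2u8; 33];
    assert_eq!(enqueue(&mut recipients, peer, vec![1]), SendSuccess::BufferedAwaitingConnection);
    // Previously this second enqueue incorrectly reported Buffered.
    assert_eq!(enqueue(&mut recipients, peer, vec![2]), SendSuccess::BufferedAwaitingConnection);
}
```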