When a new connection is established while the `Peer` is already connected,
the `Peer` switches to the new connection.
In the current implementation, upon receiving a `ConnectionReady`, the
`Peer` kills the current connection, then sends the `ConnectionReady`
message back to itself. In the meantime, the `PeerConnection` will happily
forward any incoming messages to the `Peer`.
This opens up a race in the `Peer`'s mailbox between the
`ConnectionReady` message, and any incoming messages from the new
connection. If the latter win, they will get dropped because the `Peer`
is in state `DISCONNECTED`. Typically those are `ChannelReestablish`
messages and channels will get stuck in state `SYNCING`.
This PR makes the `Peer` atomically switch to the new connection, without
going back to the `DISCONNECTED` state. As a result, we now have a
`CONNECTED`->`CONNECTED` transition.
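
The race and the fix can be illustrated without Akka, using a pure transition function over a simulated mailbox. This is only a sketch: `PeerSketch`, its states and events, and the `run` helper are hypothetical names, connections are plain strings, and eclair's actual `Peer` is an actor with a much richer state.

```scala
object PeerSketch {
  // Hypothetical, simplified model of the Peer state machine (no actors involved).
  sealed trait PeerState
  case object Disconnected extends PeerState
  final case class Connected(connection: String) extends PeerState

  sealed trait Event
  final case class ConnectionReady(connection: String) extends Event
  final case class Incoming(connection: String, payload: String) extends Event

  // Result: (next state, events re-sent to self, payloads delivered to channels)
  type Transition = (PeerState, Event) => (PeerState, List[Event], List[String])

  // Old behavior: go back to Disconnected and re-send ConnectionReady to self.
  val oldTransition: Transition = (state, event) => (state, event) match {
    case (Connected(_), ready: ConnectionReady) => (Disconnected, List(ready), Nil)
    case (Disconnected, ConnectionReady(c)) => (Connected(c), Nil, Nil)
    case (s @ Connected(c), Incoming(from, p)) if from == c => (s, Nil, List(p))
    case (s, _) => (s, Nil, Nil) // anything else is dropped
  }

  // New behavior: switch connections in a single Connected -> Connected step.
  val newTransition: Transition = (state, event) => (state, event) match {
    case (_, ConnectionReady(c)) => (Connected(c), Nil, Nil)
    case (s @ Connected(c), Incoming(from, p)) if from == c => (s, Nil, List(p))
    case (s, _) => (s, Nil, Nil) // anything else is dropped
  }

  // Processes the mailbox in order; re-sent events are appended at the back.
  @annotation.tailrec
  def run(transition: Transition, state: PeerState, mailbox: List[Event],
          delivered: List[String] = Nil): (PeerState, List[String]) = mailbox match {
    case Nil => (state, delivered)
    case event :: rest =>
      val (next, resent, out) = transition(state, event)
      run(transition, next, rest ++ resent, delivered ++ out)
  }
}
```

With a mailbox containing `ConnectionReady("new")` followed by a message from the new connection, `oldTransition` ends up connected but delivers nothing, while `newTransition` delivers the message.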
* Extract faulty channels selection from PaymentLifecycle
Move the logic of figuring out which channels/nodes should be ignored
when retrying after a payment failure out of the PaymentLifecycle.
We can figure this out looking only at the `PaymentFailure` generated,
and the multi-part logic could leverage these helpers.
* Refactor RouteResponse
It was useless to return `ignoreNodes` and `ignoreChannels`: it is the
responsibility of the caller (PaymentLifecycle) to store and update
these sets.
Preparing for the MPP move inside the router, we introduce a Route class
and let RouteResponse return a collection of Routes.
This creates some ugliness in PaymentLifecycle because of the `routePrefix`,
but this is just temporary: the `routePrefix` "hack" will be removed soon.
* Use channel capacity and balance in path-finding
The path finding algorithm uses channel capacity instead of htlcMaximumMsat.
It also takes into account channel balance when available and excludes
channels that don't have enough funds to relay the payment.
This change also fixes an off-by-one error in weight computation: we were
incorrectly applying a channel's fee to the amount that needs to be relayed
through that channel (whereas this is instead what the node needs to receive
to collect enough fee *before* relaying).
* Refactor Graph file
Add documentation, update comments, rename fields and reformat to (hopefully)
make the code clearer.
* Simplify path-finding implementation
There were a couple of confusing steps in the implementation of Yen's algorithm.
The first one was the computation of the `edgesToIgnore` and the specific
handling of the case i = 0. This specific case wasn't needed and made the
code a bit hard to read.
The second one was the weight provided to dijkstra for spur paths.
The weight of the root path was applied to the target node. It was probably
an attempt to take into account the fact that dijkstra wasn't computing
a complete path and that fees may not match, but it couldn't really work.
I removed that and added a fee check at the end of the path-finding.
* Update graph balance for duplicate channel_update
This case regularly happens after a restart: the router already has the
latest channel_update for that channel, but we want to update the graph's
balances because they are all at `None` after a restart.
* minor: catch harmless unhandled events
This prevents unnecessary warnings from clogging up the logs.
* fix race condition in test
Changing the fake ip address from 1.2.3.4:42000 to localhost:42000 in
32f15c85eb made the dummy connection fail much faster, creating a race in
the test. Reverting to the previous ip and increasing the timeout should
improve things a bit.
* Unlock transaction inputs if tx cannot be published
In some cases, funding a tx will work but publishing may fail (because mempool fees are not met for example).
In that case we need to make sure that the tx inputs are unlocked.
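
A minimal sketch of the fund/publish/unlock sequence described above. The `Wallet` trait and `fundAndPublish` are hypothetical and synchronous (using `Try`); eclair's real wallet API is asynchronous and has different signatures.

```scala
import scala.util.{Failure, Success, Try}

object UnlockSketch {
  final case class Tx(id: String, inputs: Set[String])

  // Hypothetical wallet interface, for illustration only.
  trait Wallet {
    def fund(amountSat: Long): Try[Tx]      // selects inputs and locks them
    def publish(tx: Tx): Try[String]        // may fail, e.g. mempool fees not met
    def unlock(inputs: Set[String]): Unit   // releases previously locked inputs
  }

  // Funds and publishes a tx; if publishing fails, the inputs locked by
  // `fund` are released before the failure is returned to the caller.
  def fundAndPublish(wallet: Wallet, amountSat: Long): Try[String] =
    wallet.fund(amountSat).flatMap { tx =>
      wallet.publish(tx) match {
        case ok @ Success(_) => ok
        case failure @ Failure(_) =>
          wallet.unlock(tx.inputs)
          failure
      }
    }
}
```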
We should validate the announcement signatures in the channel too.
We still validate them in the router, but it's a different layer and the
check should happen as soon as possible.
We don't want to keep the channel open if the peer is sending us garbage.
This allows decoupling the reconnection task from the actual client
creation.
Also improved tests.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
This test was randomly failing. There may be a rounding error or a bit of
randomness somewhere in Bitcoin Core 0.19.1, sometimes a feerate right
below the limit is accepted into the mempool.
We've been witnessing random test suite freezes for ages.
We've observed that when these freezes happen, there are usually a lot of
"too many open files" errors raised by the OS.
The backup handler is a likely culprit as the IntegrationSpec is running
multiple nodes and exchanging HTLCs at a fast rate.
Disabling it in tests won't hurt, and it will speed up the test suite.
We also increase the file limits in CI providers, when possible.
Instead of waiting for htlc-success txs to be confirmed, eclair also looks
at mempool txs to detect preimages as soon as possible.
This has been the case for a very long time, but our integration tests
didn't showcase this correctly.
Refactored common watcher test helpers and added tests to
ZmqWatcherSpec.
This is almost a drop-in replacement. I had to relax compiler
parameters to allow deprecated features though.
Main changes:
- relaxed compiler parameters to minimize impact (e.g. allow
deprecated features)
- `scala.collection.JavaConverters` -> `scala.jdk.CollectionConverters`
- `MultiMap` -> `MultiDict`
Compilation is 25% faster on my machine, and the compiler is a bit stricter
(it found an "invalid comparison" bug).
Do all the changes that will be required and are already possible to
minimize the diff:
- update dependencies
- `'something` -> `Symbol("something")`
- `BigDecimal.xValue()` -> `BigDecimal.xValue`
- `Map.filterKeys` -> `Map.filterKeys.toMap` (same for `Map.mapValues`)
- `def myMethod(...)` -> `def myMethod(...): Unit`
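
Some of the rewrites listed above, shown side by side in a small Scala 2.13 sketch. The object and method names are made up for illustration. Note that the `.view` call is an addition here: it is the 2.13-only spelling that avoids the deprecation warning, whereas the `filterKeys(...).toMap` form from the list compiles on both 2.12 and 2.13.

```scala
// Replaces `import scala.collection.JavaConverters._` (Scala 2.13+).
import scala.jdk.CollectionConverters._

object Migration213 {
  // Same `asScala` call site as before, only the import changed.
  def toScalaList(javaList: java.util.List[String]): List[String] =
    javaList.asScala.toList

  // Symbol literals ('something) are deprecated in 2.13.
  val sym: Symbol = Symbol("something")

  // filterKeys / mapValues return a lazy view in 2.13: `.toMap` forces a strict Map.
  def keepKeys(m: Map[Int, String], keys: Set[Int]): Map[Int, String] =
    m.view.filterKeys(keys).toMap

  def doubled(m: Map[String, Int]): Map[String, Int] =
    m.view.mapValues(_ * 2).toMap

  // Procedure syntax (`def log(msg: String) { ... }`) is deprecated:
  // the Unit return type is now written explicitly.
  def log(msg: String): Unit = println(msg)
}
```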
Router uses channel events (LocalChannelUpdate and AvailableBalanceChanged)
to track the balance of local channels.
This information will be used to improve path-finding, especially in the
MPP case.
The information is added to the graph structures, but it's not used yet in
path-finding. Some A/B testing will be needed before we can use those
safely for the path-finding algorithm.
When switching to a new connection while already connected, peer
immediately kills the current connection and sends back the
`PeerConnection.ConnectionReady` to itself. Since #1379, the sender of
this message is assumed to be the `PeerConnection` actor. If peer
doesn't preserve the sender by using a `forward` instead of a `tell`, it
will assume that it is itself the `PeerConnection`, which will break
everything.
Transaction generation functions used to throw exceptions.
We have a good TxGenerationSkipped type to express potential errors,
so these functions should return an Either to make the contract explicit.
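
A sketch of what such a contract can look like. Only the `TxGenerationSkipped` name comes from this commit; the two cases, `ClaimTx` and `makeClaimTx` are assumptions made for the example and may not match the real code.

```scala
object TxGenerationSketch {
  // Assumed error cases, for illustration only.
  sealed trait TxGenerationSkipped
  case object OutputNotFound extends TxGenerationSkipped
  case object AmountBelowDustLimit extends TxGenerationSkipped

  final case class ClaimTx(amountSat: Long)

  // Instead of throwing, the reason for skipping is part of the return type,
  // so callers have to handle it explicitly.
  def makeClaimTx(outputSat: Option[Long], feeSat: Long, dustLimitSat: Long): Either[TxGenerationSkipped, ClaimTx] =
    outputSat match {
      case None => Left(OutputNotFound)
      case Some(amount) if amount - feeSat < dustLimitSat => Left(AmountBelowDustLimit)
      case Some(amount) => Right(ClaimTx(amount - feeSat))
    }
}
```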
We update transaction fees at every block (i.e. roughly every 10 minutes). While this
works well when the remote peer is a node that's online for more than 10 minutes,
it's an issue for mobile wallets that usually come online for a few minutes
and then disconnect.
We want to make sure we send these wallet peers an update_fee when one
is needed, so we now check for feerate updates on reconnection.
Fixes #1381.
In case a channel has been pruned, and we receive a recent update, we
"unprune" it and immediately request the channel announcement again
(which will cause us to revalidate it). We also discard the update,
assuming that we will receive it again with the channel announcement.
We were using a `GossipDecision.Duplicate` rejection for the channel
update, which is inaccurate. This PR introduces a new
`GossipDecision.RelatedChannelPruned`.
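
The decision logic can be sketched roughly as follows. This is not the router's actual code: `Graph`, `Outcome`, `handleUpdate`, the `Accepted` and `Stale` cases and the timestamp-based staleness check are simplifications assumed for the example.

```scala
object GossipSketch {
  sealed trait GossipDecision
  object GossipDecision {
    case object Accepted extends GossipDecision
    case object Duplicate extends GossipDecision
    case object Stale extends GossipDecision
    case object RelatedChannelPruned extends GossipDecision
  }

  final case class ChannelUpdate(shortChannelId: Long, timestamp: Long)
  // latestUpdate maps a known channel to the timestamp of its latest update.
  final case class Graph(latestUpdate: Map[Long, Long], pruned: Set[Long])
  final case class Outcome(decision: GossipDecision, requestAnnouncement: Boolean)

  def handleUpdate(graph: Graph, update: ChannelUpdate, staleBefore: Long): Outcome = {
    import GossipDecision._
    if (update.timestamp < staleBefore)
      Outcome(Stale, requestAnnouncement = false)
    else if (graph.pruned.contains(update.shortChannelId))
      // Recent update for a pruned channel: discard it, but ask for the
      // channel announcement again so that the channel gets revalidated.
      Outcome(RelatedChannelPruned, requestAnnouncement = true)
    else if (graph.latestUpdate.get(update.shortChannelId).exists(_ >= update.timestamp))
      Outcome(Duplicate, requestAnnouncement = false)
    else
      Outcome(Accepted, requestAnnouncement = false)
  }
}
```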
This commit reverts #1278 where connecting to an Electrum server
would disable the SSL check. The correct way to handle that is to
allow users to choose their SSL behavior in the frontend applications.
If our latest successful connection attempt was less than 30 seconds ago,
we pick up the exponential back-off retry delay where we left it. The goal
is to address cases where the reconnection is successful, but we are
disconnected right away.
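
A minimal sketch of that rule. Only the 30-second threshold comes from this commit message; `InitialDelay`, `MaxDelay`, the doubling step and the function name are assumptions chosen for the example.

```scala
import scala.concurrent.duration._

object BackoffSketch {
  val InitialDelay: FiniteDuration = 1.second     // assumed value
  val MaxDelay: FiniteDuration = 1.hour           // assumed value
  val RecentThreshold: FiniteDuration = 30.seconds

  // Delay to use for the next reconnection attempt after a disconnection.
  // If we were connected only very recently, the previous connection did not
  // really hold, so the back-off continues instead of being reset.
  def delayAfterDisconnect(previousDelay: FiniteDuration, sinceLastSuccess: FiniteDuration): FiniteDuration =
    if (sinceLastSuccess < RecentThreshold) (previousDelay * 2).min(MaxDelay)
    else InitialDelay
}
```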
* Support additional user defined TLVs when sending a payment (both single-part and MPP)
* Allow encoding and decoding of even TLV types above the high range
* Add missing cases to PostRestart
When a channel is closed we want to remove its HTLCs from our
list of pending broken HTLCs (they are being resolved on-chain).
We should also ignore outgoing HTLCs that have already been
settled upstream (which can happen when downstream is closing).
* Watch for downstream HTLC resolved on-chain
When a downstream channel is closing, we can safely fail upstream the
HTLCs that were either timed out on-chain or not included in the
broadcast commit transaction.
Channels will not always raise events about those after a reboot, so we
need to inspect the channel state and detect such HTLCs.
* Add helper function to HTLC scripts
To extract the payment_hash or preimage from an HTLC script seen on-chain.
* Cleanup on-chain HTLC timeout handling for MPP
With MPP, it's possible that a channel contains multiple HTLCs for the
same payment hash, and potentially even for the same expiry and amount.
We add more fine-grained handling of HTLC timeouts that share the same
payment hash. This allows a cleaner handling after a restart, and makes
sure we correctly detect failure that should be propagated upstream.
Without this we wouldn't lose any money, but some channels might be closed
unnecessarily.
* Handle out-of-order htlc-timeout txs
It may happen that a commit tx and some htlc-timeout txs end up in the
same block. In that case, there is no guarantee on the order in which we'll receive
the confirmation events.
If any tx in a local/remoteCommitPublished is confirmed, that implicitly
means that the commit tx is confirmed (because it spends from it).
So we can consider the closing type known and forward the failure upstream.
* removed the `Direction` class
* improved the non-reg test for htlcs
- check actual content instead of only success and roundtrip
- use randomized data for all fields instead of all-zero
- check the remaining data, not only the decoded value (codecs are
chained so a regression here will cause the next codec to fail)
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Sort commit transaction outputs using BIP69 + CLTV as tie-breaker for offered HTLCs
* Type DirectedHtlc:
We now use a small hierarchy of classes to represent HTLC directions.
There is also a type alias for a collection of commitment output links.
* front now handles ping/sync
`Peer` has been split in two and now handles only channel-related stuff.
A new `PeerConnection` class is in charge of managing the BOLT 1 part
(init handshake, pings) and has the same lifetime as the underlying
connection.
Also, made `TransportHandler` be a child of `PeerConnection` by making
the `remoteNodeId` an attribute of the state of `PeerConnection` instead
of a constructor argument (since we cannot be sure of the remote nodeid
before the auth handshake is done). Now we don't need to worry about
cleaning up the underlying `TransportHandler` if the `PeerConnection`
dies.
* remove `Authenticator`
Instead of first authenticating a connection, then passing it to the
`PeerConnection` actor, we pass the connection directly to the
`PeerConnection` and let it handle the crypto handshake, before the LN
init. This removes a central point of management and makes things easier
to reason about. As a side effect, the `TransportHandler` actor is now a
child of `PeerConnection` which gives us a guarantee that it dies when
its parent dies.
* separated connection logic from `Peer`
The `ReconnectionTask` actor handles outgoing connections to a peer. The
goal is to free the `Peer` actor from the reconnection logic and have it
just react to already established connections, independently of whether
those connections are incoming or outgoing.
The base assumption is that the `Peer` will send its state transitions
to the `ReconnectionTask` actor.
This is more complicated than it seems and there are various corner
cases to consider:
- multiple available addresses
- concurrent outgoing connections and conflict between
automated/user-requested attempts
- concurrent incoming/outgoing connections and risk of reconnection
loops
- etc.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Refactor timed out HTLC helpers: directly take a DATA_CLOSING
and extract the relevant parts.
* ClosingStateSpec: test dust HTLCs
* Improve ClosingStateSpec
* Clean up usage of AddHtlcFailed
We were abusing AddHtlcFailed in some cases where an outgoing HTLC
was correctly added, but was later settled on-chain (fulfilled, timed
out or overridden by a different commit transaction).
These cases are now handled specifically by new Relayer.ForwardMessage types
dedicated to on-chain settling.
* Refactor Relayer's ForwardMessages
ForwardFail and ForwardFulfill are now traits.
Handle both on-chain and remote fail/fulfills.