manualTransition is a fix for old akka version where onTransition events
are not fired for same state transitions. Must not be used if states are
different, otherwise duplicate events are fired.
Light clients don't always validate channels by fetching the blockchain tx.
That means they don't have access to the exact capacity.
When that happens, we can fallback to htlc_maximum_msat if
available, or to a default capacity (otherwise path-finding will ignore
these edges).
There are several scenarii when phoenix starts up:
(a) peer sends a `channel_reestablish` for an unknown channel
This is the restore use case. Phoenix will decrypt the attached
`channel_data` and restore the channel
(b) peer sends a `channel_reestablish` for a known channel
Phoenix decrypts the attached `channel_data` and compares them to
the data it gots. Then:
(1) if the remote data is more recent than local ones, Phoenix will
use the remote data. This is the multi device use case.
(2) otherwise, Phoenix keeps its data. This is the normal use case.
(c) peer doesn't send a `channel_reestablish` for a known channel.
This would happen if the peer lost its data. In that case Phoenix
will wait forever for a `channel_reestablish` and the channel will
stay in OFFLINE. User can decide to force close the channel.
Until now, we were only handling (a) and (b)(2). (b)(1) would lead
to a force close of the channel, because Phoenix would enter the data
recovery scenario.
We now make Phoenix *react* to the peer's `channel_reestablish`, which
creates the (c) corner case, which seems acceptable. If the peer lost
its data, the only way out is to force close the channel anyway.
When we announce a new public channel, make sure we don't override the
balance information with None.
Clean up IntegrationSpec warnings.
Fix PaymentLifecycle test: this test was broken by the recent changes in
BaseRouterSpec.
* Electrum client: downgrade log-level for errors that happened before a proper handshake
We only care for errors that we received during or after the "handshake" (i.e. the exchange of "version" messages).
Other errors cause by dead/unresponsive servers... which are very common on testnet just add spams to the logs.
* Electrum client pool: downgrade log level for errors with our secondary servers
We care for errors with our master electrum servers.
* Update Electrum checkpoints
Move the maximum fee computation outside of `findRoute`: this should be
done earlier in the payment pipeline if we want to allow accurate fee control
for MPP retries.
Right now MPP uses approximations when retrying which can lead to payments
that exceed the maximum configured fees. This is a first step towards
ensuring that this situation cannot happen anymore.
* Replace ArrayDeque by Queue
This is clearly not a hot path. This collection will have less than
10 elements and the bottleneck is rather on the dijkstra call.
ArrayDeque doesn't exist in Scala 2.11 so it makes the merge to
eclair-mobile more cumbersome.
* Use standard Scala collections in Dijkstra
Collections performance has improved greatly in Scala 2.13.
This change yields a consistent 10% improvement compared to the previous
implementation on my laptop based on the mainnet graph.
Both `Client` and `TransportHandler` were watching the connection actor,
which resulted in undeterministic behavior during termination of
`PeerConnection`.
We now always return a message when a connection fails during
authentication.
Took the opportunity to add more typing (insert
deathtoallthestring.jpg).
When a new connection happens while already connected, the `Peer` will
switch to the new connection.
In the current implementation, upon receiving a `ConnectionReady`, the
`Peer` will kill the current connection, then sends back the
`ConnectionReady` message to itself. In the meantime, the
`PeerConnection` will happily forward any incoming messages to the `Peer`.
This opens up to a race in the `Peer`'s mailbox between the
`ConnectionReady` message, and any incoming messages from the new
connection. If the latter win, they will get dropped because the `Peer`
is in state `DISCONNECTED`. Typically those are `ChannelReestablish`
messages and channels will get stuck in state `SYNCING`.
This PR make the `Peer` atomically switch to the new connection, without
going back to the `DISCONNECTED` state. As a result, we now have a
`CONNECTED`->`CONNECTED` transition.
* Extract faulty channels selection from PaymentLifecycle
Move the logic of figuring out which channels/nodes should be ignored
when retrying after a payment failure out of the PaymentLifecycle.
We can figure this out looking only at the `PaymentFailure` generated,
and the multi-part logic could leverage these helpers.
* Refactor RouteResponse
It was useless to return `ignoreNodes` and `ignoreChannels`, it's rather
the responsibility of the caller (PaymentLifecycle) to store and update
these sets.
Preparing for the MPP move inside the router, we introduce a Route class
and let RouteResponse return a collection of Routes.
This creates some ugliness in PaymentLifecycle because of the `routePrefix`,
but this is just temporary: the `routePrefix` "hack" will be removed soon.
* Use channel capacity and balance in path-finding
The path finding algorithm uses channel capacity instead of htlcMaximumMsat.
It also takes into account channel balance when available and excludes
channels that don't have enough funds to relay the payment.
This change also fixes an off-by-one error in weight computation: we were
incorrectly applying a channel's fee to the amount that needs to be relayed
through that channel (whereas this is instead what the node needs to receive
to collect enough fee *before* relaying).
* Refactor Graph file
Add documentation, update comments, rename fields and reformat to (helpfully)
make the code clearer.
* Simplify path-fiding implementation
There were a couple confusing steps in the implementation of Yen's algorithm.
The first one was the computation of the `edgesToIgnore` and the specific
handling of the case i = 0. This specific case wasn't needed and made the
code a bit hard to read.
The second one was the weight provided to dijkstra for spur paths.
The weight of the root path was applied to the target node. It was probably
an attempt to take into account the fact that dijkstra wasn't computing
a complete path and that fees may not match, but it couldn't really work.
I removed that and added a fee check at the end of the path-finding.
* Update graph balance for duplicate channel_update
This case regularly happens after a restart: the router already has the
latest channel_update for that channel, but we want to update the graph's
balances because they are all at `None` after a restart.
* minor: catch harmless unhandled events
This prevents unnecessary warnings to clog up the logs.
* fix race condition in test
Changing the fake ip address from 1.2.3.4:42000 to localhost:42000
in 32f15c85eb made the dummy connection
fail much faster, creating a race in the test. Reverting to the previous
ip and increasing the timeout should improve things a bit.
* Unlock transaction inputs if tx cannot be published
In some cases, funding a tx will work but publishing may fail (because mempool fees are not met for example).
In that case we need to make sure that the tx inputs are unlocked.
We should validate the announcement signatures in the channel too.
We still validate them in the router, but it's a different layer and the
check should happen as soon as possible.
We don't want to keep the channel open if the peer is sending us garbage.
This allows decoupling the reconnection task from the actual client
creation.
Also improved tests.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
This test was randomly failing. There may be a rounding error or a bit of
randomness somewhere in Bitcoin Core 0.19.1, sometimes a feerate right
below the limit is accepted into the mempool.
We've been witnessing random test suites freezes (since ages).
We've observed that when these freezes happen, there are usually a lot of
"too many open files" errors raised by the OS.
The backup handler is a likely culprit as the IntegrationSpec is running
multiple nodes and exchanging HTLCs at a fast rate.
At least it won't hurt disabling it in tests, and will speed up the
test suite.
We also increase the file limits in CI providers, when possible.
* Electrum: refactor test
No functional changes, simulated test is re-written so that each test start with its own wallet and is independent from other tests.
* Electrum: add test that illustrates issue #1396
When the current tip is received again because of the comparison bug, wallet processes the new block and emits
a transaction confidence change event, but is should not do anyhting.
* Electrum: fix invalid header comparison (fixes#1396)