When we have a trusted relationship with some of our peers (business
relations, family members, our own mobile wallet, etc) it makes sense to
relax the feerate mismatch constraint.
This must be done per-node, to avoid leaving the gates open for attackers.
The `supportedMandatoryFeatures` set made less and less sense as we start
supporting more features and allowing users to turn features on and off
depending on the peers they connect to.
Instead of `Peer` watching `PeerConnection`, we have `PeerConnection`
notify `Peer` when it dies. Not relying on a watch allows for harder
security settings when enabling akka cluster.
In the revoked commit tx case, both nodes are competing to claim the
HTLC outputs from the commit tx.
The test incorrectly assumed that node F would always win that race.
When a channel gets confirmed while nodes are disconnected, recent lnd
nodes sometimes fail to send their channel_reestablish message and instead
send a funding_locked first (due to a race condition on their side).
When that happened, the channel stayed stuck in the SYNCING state.
To avoid that, we trigger a reconnection until the race condition is eventually
won by the channel_reestablish.
There is no good reason to completely disable `var_onion_optin` and it can
even be harmful: pending HTLCs after a restart may be lost if they used a
variable-length onion, causing a loss of funds and potential channel closes.
We were calling `nodeParams.features` from inside the channel, which is
problematic because we may have overridden those features for specific
peers. This is now fixed.
* Send payment through specific channels
While `sendtoroute` was letting you provide a custom route as a list of
nodes, it made it difficult to have fine-grain control over the channels
used by payments.
The API now allows choosing the exact channels that will be used for the
payment; in particular, this will be helpful when consolidating and
rebalancing multiple channels to a given node.
Fixes#1472
* Fix serialization regression
#1520 introduced a regression in the serialization of channel data,
impacted the `channels` and `channel` APIs.
We restrict the custom command response serializer to revert to the
previous behavior.
Features are now provided by the `switchboard`, in response to the
`PeerConnection.Authenticated` message.
The `switchboard` will also decide whether or not we sync with that
peer, depending on the `syncWhiteList` configuration.
* [ChannelRelayer] Expose Wrapped messages to tests
Exposing the private wrapped messages to tests allows removing the
dependency on capturing logs which felt very brittle.
* Prioritize low capacity channels during relay
This makes it more difficult for attackers to "squat" high-capacity channels
by sending HTLCs and then hodling them.
It results in less locked liquidity during this kind of attacks.
When using MPP, if we can't find a route, we need to add an entry to the
DB. Otherwise when users query their payment status, nothing will be
returned which is a bad UX.
Fixes#1512
We were previously only counting the additional HTLC at twice
the current feerate.
An HTLC in isolation doesn't make much sense: the feerate
applies to the whole commit tx. Our fee buffer needs to account
for a x2 feerate increase on the commit tx + an additional htlc.
Note that this introduces another subtlety. The commit tx fee at twice the
current feerate may actually be lower than the commit tx fee at the current
feerate (if the commit tx contains many htlcs that are only slightly above
the trim threshold).
Don't swallow bitcoind exceptions: we wrap them but preserve the
original one.
Allow configuring bitcoin core wallet: it makes sense to allow users
to use a different wallet from the default one.
There's one important caveat: once set, users shouldn't change it while
they have open channels. We mention it clearly in the documentation.
Fixes#1538
This change makes `IntegrationSpec` an abstract class trait that defines
common utilities used in most integration tests, and splits the actual tests
in two separate files that extend that class. Channel tests have themselves
been split between standard and anchor commitments to gain extra
parallelism.
The tests themselves aren't changed at all, except for the TCP ports they
use, and to remove nodes that have become useless in each of the specs. `expectMsg`
have been increased to 60 seconds to account for the increased parallelization.
Some tests have been refactored, cleaned up and block expiries have been reduced.
Allow plugins to register unknown features and message types they're able to handle.
This allows plugins to add new features independently of what eclair-core understands.
Plugins are able to receive and send arbitrary lightning messages, and advertise support
for non-standard features freely.
Change the static_remotekey behavior to use a wallet basepoint only when
using "standard" commitments; with anchor outputs we need to claim this
output (it doesn't directly spend to our wallet) so we must use
key-derivation with our lightning keys.
Correctly claim all outputs in unilateral cases, and add corresponding
test cases.
Anchor output commitments should now work end-to-end (but there's no support
for fee bumping yet).
We were watching actors for ignored watches (WatchLost), and printing
useless logs when ignoring duplicate watches.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
In some tests we are parsing logs to prevent race conditions. This
change adds more leeway to wait for logging events, because they may be
delayed when we run a lot of tests in parallel.
The replies are always ignored currently anyway. A new trait `NoReplyTo`
has been introduced. Those commands have a particular workflow because
they are persisted in the pending relay db.
* introduce a new relay identifier
In a channel relay, it will be unique for all retries.
In a trampoline relay, it is equal to what previously was the parent
payment id.
* moved and cleaned up remaining relayer tests
Now all relay-related tests are in the `relayer` package.
The only impactful change is that by default on regtest and testnet
fallback fee (used when there is not enough historical data to correctly
estimate the feerate) is now set to 0, whereas it was set to 0.0002 btc
in previous versions.
We set it manually in tests `bitcoin.conf` to preserve the previous behavior.
* typeify response to command messages
* Status.Failure(AddHtlcFailed)->RES_ADD_HTLC_FAILED
`AddHtlcFailed` is renamed `RES_ADD_FAILED` and it is just a particular
kind of `CommandError`
* remove Status.Failure from payment lifecycle
* use replyTo pattern in relayers
* return ForwardMessage in CommandResponse
Instead of having the channel send `Relayer.ForwardMessage` to the
relayer, we encapsulate the `Relayer.ForwardMessage` within a
`CommandResponse[CMD_ADD_HTLC]`.
It looks like a cosmetic change, but it's not: now when the relayer
sends a `CMD_ADD_HTLC` to the channel, it will receive one or more
`CommandResponse[CMD_ADD_HTLC]`, for example:
- success scenario:
- `RES_SUCCESS[CMD_ADD_HTLC]`
- `RES_ADD_COMPLETED[RelayBackward.RelayFulfill]`
- htlc failed by downstream
- `RES_SUCCESS[CMD_ADD_HTLC]`
- `RES_ADD_SETTLED[RelayBackward.RelayFail]`
- command rejected:
- `RES_ADD_FAILED[_]`
- peer disconnected before signing:
- `RES_SUCCESS[CMD_ADD_HTLC]`
- `RES_ADD_FAILED[ChannelUnavailable]`
In addition to that, `RelayMessage` have been slightly refactored, to
better separate between `RelayForward` and `RelayBackward`.
This paves the way for typing `CMD_ADD_HTLC.replyTo` to
`ActorRef[CommandResponse[CMD_ADD_HTLC]]` and have the channel send all
related messages to the `replyTo` actor. Note that the `RelayForward`
actor will always be sent to the relayer, which makes sense since there
was no related command (the htlc was sent by the peer).
NB: `CMD_ADD_HTLC` is a special case, for all other commands there is
exactly one `CommandResponse[CMD_*]`, either `RES_SUCCESS[CMD_*]` or
`RES_FAILURE[CMD_*]`.
* use replyTo pattern in payment lifecycle
We were already close to this pattern with constructs like `case class
WaitingForRoute(sender: ActorRef, ...`
* typeify Origin/Upstream classes
The relationship between `Origin` and `Upstream` was obscure and they were
defined in two different files.
`Upstream` is the source of the payment in the context of a chain
of htlcs. For example, in the case of a typical relayed payment, it
would be an incoming htlc.
`Origin` is the source of the payment in the application: it can be an
actor, or nothing if the reference was lost after a restart.
Instead of using an `Option[ActorRef]` to differentiate between
known/unknown origin, new `Hot/Cold` traits that extend `Origin` have
been introduced. This means that now the `PostRestartHtlcCleaner` only
deals with `Cold` origins, whereas the `NodeRelayer` only handles `Hot`
origins. The channel codec will encode from both `Hot/Cold` origins, but
will only decode to `Cold`.
* refactor response types
Generalize the `CommandResponse[Command]` pattern for all commands.
There seem to be something ambiguous about the way we deal with closing
commands during the initialization phase of the channel. We used to
conflate `CommandResponse[CMD_CLOSE]` and
`ChannelOpenResponse.ChannelClosed` but those are not sent to the same
actors.
It turns out our testing of the `EclairImpl` class is very weak. We
could use this class in `IntegrationSpec` instead of sending raw
messages to channels.
* handle channel-relay in the post-htlc-restart too
The downstream HTLC-timeout integration test was sometimes hanging waiting
for the local commit tx to appear inside the mempool.
The reason was that the remote peer was also trying to get its version of
the commit tx in the mempool, and when it won that race we weren't testing
the right thing.
Simply disconnecting the two nodes fixes the issue as it ensure only the
local node will be broadcasting his commit tx.
Currently, the fundee computes it after receiving the `open_channel`
message, by calling `ChannelVersion.pickChannelVersion`.
Instead, we call this method in the `Peer` and add the resulting channel
version to the initialization parameters.
In the end, the behavior is exactly the same, but:
- it is more consistent with how the funder works;
- it may make sense to compute the channel version a bit earlier in the
process, because we may initiate a different kind of channel actor in
the future for some particular versions (?)
* Correctly handle the case where a tx has both a sequence and locktime set.
* Add tx publish tests to ZmqWatcher and ElectrumWatcher.
* Add documentation on watcher types.
We were previously returning TemporaryNodeFailure for trampoline payments
to neighbour nodes with depleted liquidity. This prevented us from
finding alternative, indirect routes.
In such cases, we now roughly estimate whether the fee is "big enough" to
allow finding alternative routes; if not we ask the sender to raise the fee
before telling them we're lacking liquidity.
A more costly alternative that we may implement in the future would be to
run the path-finding to find a route without bounding the fee, and send
that information back to the sender.
* add replyTo to Register messages
In preparation of the migration to typed actors, we need to remove the
use of `sender`, which doesn't exist in typed actors (and even returns
dead letters when mixing typed/untyped actors).
This also means that, in the tests, we should stop using the
`TestProbe.send()` method, which also relies on the recipient replying
to `sender`. Instead, we should use `targetActor ! msg` which guarantees
that the sender is void.
Using the `replyTo` pattern doesn't mix well with the `ask` pattern,
because we don't know the reference to the temporary actor. To deal with
that, we set `replyTo = ActorRef.noSender` which is a bit hackish.
* don't use `Status.Failure` in register responses
Encapsulating error responses in a `Status.Failure` is convenient when
using the `ask` pattern because those messages are automatically
converted to a failed future.
It does however force us to use exceptions, and make things more
complicated, especially when moving to _typed_ actors.
* add replyTo to Register messages
In preparation of the migration to typed actors, we need to remove the
use of `sender`, which doesn't exist in typed actors (and even returns
dead letters when mixing typed/untyped actors).
This also means that, in the tests, we should stop using the
`TestProbe.send()` method, which also relies on the recipient replying
to `sender`. Instead, we should use `targetActor ! msg` which guarantees
that the sender is void.
Using the `replyTo` pattern doesn't mix well with the `ask` pattern,
because we don't know the reference to the temporary actor. To deal with
that, we set `replyTo = ActorRef.noSender` which is a bit hackish.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We don't need one node per force-close scenario, we can use different
channels to the same node which makes the spec simpler.
Force-close tests now have better isolation: they create the channel at
the beginning of the test, and the test ends with that channel closed.
Common parts have been refactored as well, which will make it easier to
add more tests for anchor outputs without duplicating too much code.
These tests have been heavily enriched and refactored; they previously
relied on many unwritten assumptions about event ordering that appeared
as soon as I tried updating them (e.g. to use push_msat to ensure both
sides had an output in the commit tx).
* Type fee rates info
Fixes#1188
* Fix vsize comment
This is an alternative to #1425.
This may not correctly represent what Bitcoin Core does, it's likely that
we can in fact use a value smaller than 253, but this shows how we choose
to err on the side of safety with that calculation.
* Add 1008 feerate block target
Fixes#1486
It can be useful to sign arbitrary messages with the key associated with our node_id (to prove ownership of a node).
Adds 2 new API commands:
eclair-cli signmessage --msg=${message}
eclair-cli verifymessage --msg=${message} --sig=${signature}
We previously refused to relay HTLCs that would result in a low expiry
delta for the next node. However it's not our decision to make, it's the
remote's. We should forward these HTLCs and let the remote decide whether
they fail them because the expiry is too close or fulfill them.
Allow activating anchor outputs and have fully operating channels
during normal operation (open, add/fulfill/fail htlcs, close).
Interop testing has been done with lnd, and there is only one pending
issue during mutual close, where they incorrectly compute the closing
amounts, which they should fix soon.
However, anchor outputs should NOT be activated yet as unilateral
close scenario are not fully handled yet.
We don't do any kind of automatic fee bumping either; this will be done
later, once we have PSBT support and once bitcoind offers the
`psbtbumpfee` RPC (see bitcoin/bitcoin#18654).
* allow to explicitly disable features from configuration
* improved Features.toString
* explicitely disable unlisted features
Co-authored-by: pm47 <pm.padiou@gmail.com>
Watching the blockchain is an asynchronous task, so it is "always late"
and it doesn't matter if we don't synchronously set the watch back when
the node is restarted.
It allows us to smoothen the load if needed at startup, because setting
tens of thousands of `watch-spent` all at once at startup is pretty
expensive.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Activate wumbo by default
This is safe as `max-funding-satoshis` is set to 16777215 sats, which is
the non-wumbo limit.
If users want to increase the maximum channel size, they can update this
configuration value.
* Update default minFinalCltvExpiryDelta
See https://github.com/lightningnetwork/lightning-rfc/pull/785
* Set minFinalCltvExpiryDelta in invoices
Our default fulfill-safety-window is now greater than the spec's default
min-final-expiry-delta in invoices, so we need to explicitly tell payers
what value they must use.
Otherwise we may end up closing channels if a block is produced while we're
waiting for our peer to accept an UpdateFulfillHtlc.
* Introduce transaction commitment format trait
This lets us re-use most of our existing transaction utilities for
anchor outputs.
Channels will always use the default commitment format (current spec),
but we will be able to change that by setting the `ChannelVersion` to
something appropriate for anchor outputs (and later other commitment formats).
* Clean up transaction tests
Remove obsolete claim-htlc tests: they used a different format from what
lightning uses and are redundant with existing tests in TransactionsSpec.
Add missing test cases in TransactionsSpec.
* Implement anchor outputs commitment transaction
Add support for creating an anchor outputs commitment transaction,
without any HTLC.
Support for the new HTLC format will be added in a separate commit.
* Refactor TestVectorsSpec
For anchor outputs, some values will not match the hard-coded ones.
We will instead need to read from the test vector.
* TestVectorsSpec filter unimplemented anchor outputs
Anchor outputs isn't fully implemented yet, so we need to ignore the tests
that are currently not passing.
* Implement anchor outputs HTLC transactions
Add support for HTLC transactions with a 1-block relative delay.
Add missing anchor outputs spec tests.
Add missing tests for some revoked paths.
* Always subtract both anchors from funder amount
For simplicity's sake, we always subtract both anchors from the funder's
main output, even if only one anchor materializes.
Shuffle methods around between ExtendedBitcoinClient and
BitcoinCoreWallet to help readability and separate concerns.
Add some documentation and fix harmless warnings.
Add bitcoin client tests.
* Non blocking bounded mailbox for backup handler
Akka 2.4 introduced a non-blocking unbounded mailbox.
It makes sense to upgrade the backup handler to that mailbox type.
We also add some backup-related metrics.
* Wrap DB calls to record metrics
We record the number of times each operation is executed and its
duration.
When we receive an `UpdateFulfillHtlc` from a downstream channel, it is
critical that we don't lose it because that is what allows us to pull
funds from the corresponding upstream channel(s). But this(ese) upstream
channel(s) may very well be currently OFFLINE and unable to update the
commitment right away, so we need to remember it for later. Same applies
to an `UpdateFailHtlc` (although it is less critical), because we don't
want the upstream htlc to timeout and cause a channel to force-close.
We were previously relying on a `CommandBuffer` actor, that uses a
"pending relay" database to store commands. Once the command is processed
by the target channel, it sends back an acknowledgment to the
`CommandBuffer`, which then removes the command from the database.
Unacknowledged commands are replayed at each reconnection or app restart.
This works well, but the flow is a little cumbersome and not easy to
understand.
With this PR, the sender (channel, payment handler, ...) is responsible for
storing commands to the pending relay db, instead of the command buffer,
which is completely removed. The target channel will acknowledge the
message and remove it from the pending relay db.
In the end, the logic is the same as before, we have just dropped the
intermediate `CommandBuffer`.
* Split feerate mismatch configuration
We want to be much stricter with feerates that are below our estimation
than feerates that are above it.
This also makes this configuration parameter easier to understand
for end users.
* Tolerate feerate mismatch while channel is unused
We can relax the conditions where we close a channel because of a feerate
mismatch: when the channel has no pending HTLCs, it's ok to temporarily
disagree on the feerate.
While we disagree on feerates, we don't use this channel to offer outgoing
HTLCs. If we receive an incoming HTLC, we have to close the channel because
that HTLC would be at risk (incorrect feerate).
This mechanism gives us time to adapt to feerate changes, hopefully reducing
the amount of unnecessary channel closures.
The channelstats API only returns results for the *outgoing* channels
used when relaying. We must also include the *incoming* channels, otherwise
it looks like they're inactive which doesn't reflect their real usage.
Fixes#1465
- Test was not executed (because the "tests" variable was an iterator that was emptied by the call to .size())
- HTLC regex had to be updated to skip over the HTLC number that was added to the reference test vectors
* Add metric to track onion payload format
This will be useful to decide when we can safely phase out support for
the legacy format.
* Add metric to track htlcs in flight
We track both the number of HTLCs and their amounts.
We track this at the channel level and globally.
If all trampoline retries fail, we should convert the error to a route
not found. We tried multiple fee targets and none of those was enough to
allow the trampoline node to find a satisfying route.
MPP lifecycle shares preimage as soon as received.
This allows removing the use of the node-relayer as a passthrough for
fulfills, it can now simply listen to this event.
Long term, this could be sent to the event stream to share with more actors.
* PaymentLifecycle cleanup
Remove temporary hooks added for first version of MPP (route prefix,
empty routes, etc).
Allow specifying the whole route (not only nodeIds) in SendPaymentToRoute.
* Rework MultiPartPaymentLifecycle
Use the Router's new MPP RouteRequest.
Remove the "blind" split based on channel balances.
Reactivate trampoline relay to MPP non-trampoline recipient.
* Add MPP payment metrics
* Activate MPP feature by default
This provider will save the feerates retrieved by another provider to
database.
This feature can be used to retrieve the last used feerates when starting
the node, which will save time. This can have a significant effect on nodes
running with a slow connection (e.g. mobile devices).
Note that this commit does not affect the current setup and does not
actually create the database, the feature must be implemented separately.
Fixes#1447
Legacy codecs are isolated in a separate file, with a visibility restricted to "package" in order to reduce the risk of using those codecs. Also codecs are restricted to `decodeOnly` for the same reason.
* localPaymentBasepoint->staticPaymentBasepoint
Use `getOrElse` on the option value to decide between static/dynamic
payment point, instead of the `ChannelVersion` bit.
* define explicit method to test channel features
Also reduce visibility of a few members of `ChannelVersion`, and some
cleanup.
* use `if/else` instead of `match` for version bit
Leverage Yen's k-shortest paths and a simple split algorithm
to move MPP entirely inside the Router.
This is currently unused, the multipart payment lifecycle needs
to be updated to leverage this new algorithm.
We go to the `CLOSING` if and only if the funding tx has been spent by
one transaction. The `require` is absolutely necessary. We could
probably enforce this constraint at the compilation level by more clever
typing but that's another matter.
It should return stats for all channels (funder and fundee).
The previous code was incorrectly filtering on channels for which we paid
an on-chain fees, excluding the channels for which we were fundee.
Fixes#1449.
Light clients don't always validate channels by fetching the blockchain tx.
That means they don't have access to the exact capacity.
When that happens, we can fallback to htlc_maximum_msat if
available, or to a default capacity (otherwise path-finding will ignore
these edges).
When we announce a new public channel, make sure we don't override the
balance information with None.
Clean up IntegrationSpec warnings.
Fix PaymentLifecycle test: this test was broken by the recent changes in
BaseRouterSpec.
* Electrum client: downgrade log-level for errors that happened before a proper handshake
We only care for errors that we received during or after the "handshake" (i.e. the exchange of "version" messages).
Other errors cause by dead/unresponsive servers... which are very common on testnet just add spams to the logs.
* Electrum client pool: downgrade log level for errors with our secondary servers
We care for errors with our master electrum servers.
* Update Electrum checkpoints
Move the maximum fee computation outside of `findRoute`: this should be
done earlier in the payment pipeline if we want to allow accurate fee control
for MPP retries.
Right now MPP uses approximations when retrying which can lead to payments
that exceed the maximum configured fees. This is a first step towards
ensuring that this situation cannot happen anymore.
* Replace ArrayDeque by Queue
This is clearly not a hot path. This collection will have less than
10 elements and the bottleneck is rather on the dijkstra call.
ArrayDeque doesn't exist in Scala 2.11 so it makes the merge to
eclair-mobile more cumbersome.
* Use standard Scala collections in Dijkstra
Collections performance has improved greatly in Scala 2.13.
This change yields a consistent 10% improvement compared to the previous
implementation on my laptop based on the mainnet graph.
Both `Client` and `TransportHandler` were watching the connection actor,
which resulted in undeterministic behavior during termination of
`PeerConnection`.
We now always return a message when a connection fails during
authentication.
Took the opportunity to add more typing (insert
deathtoallthestring.jpg).
When a new connection happens while already connected, the `Peer` will
switch to the new connection.
In the current implementation, upon receiving a `ConnectionReady`, the
`Peer` will kill the current connection, then sends back the
`ConnectionReady` message to itself. In the meantime, the
`PeerConnection` will happily forward any incoming messages to the `Peer`.
This opens up to a race in the `Peer`'s mailbox between the
`ConnectionReady` message, and any incoming messages from the new
connection. If the latter win, they will get dropped because the `Peer`
is in state `DISCONNECTED`. Typically those are `ChannelReestablish`
messages and channels will get stuck in state `SYNCING`.
This PR make the `Peer` atomically switch to the new connection, without
going back to the `DISCONNECTED` state. As a result, we now have a
`CONNECTED`->`CONNECTED` transition.
* Extract faulty channels selection from PaymentLifecycle
Move the logic of figuring out which channels/nodes should be ignored
when retrying after a payment failure out of the PaymentLifecycle.
We can figure this out looking only at the `PaymentFailure` generated,
and the multi-part logic could leverage these helpers.
* Refactor RouteResponse
It was useless to return `ignoreNodes` and `ignoreChannels`, it's rather
the responsibility of the caller (PaymentLifecycle) to store and update
these sets.
Preparing for the MPP move inside the router, we introduce a Route class
and let RouteResponse return a collection of Routes.
This creates some ugliness in PaymentLifecycle because of the `routePrefix`,
but this is just temporary: the `routePrefix` "hack" will be removed soon.
* Use channel capacity and balance in path-finding
The path finding algorithm uses channel capacity instead of htlcMaximumMsat.
It also takes into account channel balance when available and excludes
channels that don't have enough funds to relay the payment.
This change also fixes an off-by-one error in weight computation: we were
incorrectly applying a channel's fee to the amount that needs to be relayed
through that channel (whereas this is instead what the node needs to receive
to collect enough fee *before* relaying).
* Refactor Graph file
Add documentation, update comments, rename fields and reformat to (helpfully)
make the code clearer.
* Simplify path-fiding implementation
There were a couple confusing steps in the implementation of Yen's algorithm.
The first one was the computation of the `edgesToIgnore` and the specific
handling of the case i = 0. This specific case wasn't needed and made the
code a bit hard to read.
The second one was the weight provided to dijkstra for spur paths.
The weight of the root path was applied to the target node. It was probably
an attempt to take into account the fact that dijkstra wasn't computing
a complete path and that fees may not match, but it couldn't really work.
I removed that and added a fee check at the end of the path-finding.
* Update graph balance for duplicate channel_update
This case regularly happens after a restart: the router already has the
latest channel_update for that channel, but we want to update the graph's
balances because they are all at `None` after a restart.
* minor: catch harmless unhandled events
This prevents unnecessary warnings to clog up the logs.
* fix race condition in test
Changing the fake ip address from 1.2.3.4:42000 to localhost:42000
in 32f15c85eb made the dummy connection
fail much faster, creating a race in the test. Reverting to the previous
ip and increasing the timeout should improve things a bit.
* Unlock transaction inputs if tx cannot be published
In some cases, funding a tx will work but publishing may fail (because mempool fees are not met for example).
In that case we need to make sure that the tx inputs are unlocked.
We should validate the announcement signatures in the channel too.
We still validate them in the router, but it's a different layer and the
check should happen as soon as possible.
We don't want to keep the channel open if the peer is sending us garbage.
This allows decoupling the reconnection task from the actual client
creation.
Also improved tests.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
This test was randomly failing. There may be a rounding error or a bit of
randomness somewhere in Bitcoin Core 0.19.1, sometimes a feerate right
below the limit is accepted into the mempool.
We've been witnessing random test suites freezes (since ages).
We've observed that when these freezes happen, there are usually a lot of
"too many open files" errors raised by the OS.
The backup handler is a likely culprit as the IntegrationSpec is running
multiple nodes and exchanging HTLCs at a fast rate.
At least it won't hurt disabling it in tests, and will speed up the
test suite.
We also increase the file limits in CI providers, when possible.
Instead of waiting for htlc-success txs to be confirmed, eclair also looks
at mempool txs to detect preimages as soon as possible.
This has been the case for a very long time, but our integration tests
didn't showcase this correctly.
Refactored common watcher test helpers and added tests to
ZmqWatcherSpec.
This is almost a drop-in replacement. I had to relaxed compiler
parameters to allow deprecated features though.
Main changes:
- relaxed compiler parameters to minimize impact (e.g. allow
deprecated features)
- `scala.collection.JavaConverters` -> `scala.jdk.CollectionConverters`
- `MultiMap` -> `MultiDict`
Compilation is 25% faster on my machine, compiler is a bit more strict
(it found an "invalid comparison" bug).
Do all the changes that will be required and are already possible to
minimize the diff:
- update dependencies
- `'something` -> `Symbol("something")`
- `BigDecimal.xValue()` -> `BigDecimal.xValue`
- `Map.filterKeys` -> `Map.filterKeys.toMap` (same for `Map.mapValues`)
- `def myMethod(...)` -> `def myMethod(...): Unit`
Router uses channel events (LocalChannelUpdate and AvailableBalanceChanged)
to track the balance of local channels.
This information will be used in path-finding to improve path-finding,
especially in the MPP case.
The information is added to the graph structures, but it's not used yet in
path-finding. Some A/B testing will be needed before we can use those
safely for the path-finding algorithm.
When switching to a new connection while already connected, peer
immediately kills the current connection and sends back the
`PeerConnection.ConnectionReady` to itself. Since #1379, the sender of
this message is assumed to be the `PeerConnection` actor. If peer
doesn't preserve the sender by using a `forward` instead of a `tell`, it
will assume that itself is the `PeerConnection`, which will break
everything.
Transaction generation functions used to throw exceptions.
We have a good TxGenerationSkipped type to express potential errors,
so these functions should return an Either to make the contract explicit.
We update transaction fees at every block (ie every 10 minutes). While this
works well when the remote peer is a node that's online for more than 10 minutes,
it's an issue for mobile wallets that usually come online for a few minutes
and then disconnect.
We want to make sure we send these wallet peers an update_fee when one
is needed, so we now check for feerate updates on reconnection.
Fixes#1381.
In case a channel has been pruned, and we receive a recent update, we
"unprune" it and immediately request the channel announcement again
(which will cause us to revalidate it). We also discard the update,
assuming that we will receive it again with the channel announcement.
We were using a `GossipDecision.Duplicate` rejection for the channel
update, which is inaccurate. This PR introduces a new
`GossipDecision.RelatedChannelPruned`.
This commit reverts #1278 where connecting to an Electrum server
would disable the SSL check. The correct way to handle that is to
allow users to choose their SSL behavior in the frontend applications.
If our latest successful connection attempt was less
than 30 seconds ago, we pick up the exponential
back-off retry delay where we left it. The
goal is to address cases where the reconnection
is successful, but we are disconnected right away.
* Support additional user defined TLVs when sending a payment (both single-part and MPP)
* Allow encoding and decoding of even TLV types above the high range
* Add missing cases to PostRestart
When a channel is closed we want to remove its HTLCs from our
list of pending broken HTLCs (they are being resolved on-chain).
We should also ignore outgoing HTLCs that have already been
settled upstream (which can happen when downstream is closing).
* Watch for downstream HTLC resolved on-chain
When a downstream channel is closing, we can safely fail upstream the
HTLCs that were either timed out on-chain or not included in the
broadcast commit transaction.
Channels will not always raise events about those after a reboot, so we
need to inspect the channel state and detect such HTLCs.
* Add helper function to HTLC scripts
To extract the payment_hash or preimage from an HTLC script seen on-chain.
* Cleanup on-chain HTLC timeout handling for MPP
With MPP, it's possible that a channel contains multiple HTLCs for the
same payment hash, and potentially even for the same expiry and amount.
We add more fine-grained handling of HTLC timeouts that share the same
payment hash. This allows a cleaner handling after a restart, and makes
sure we correctly detect failure that should be propagated upstream.
Otherwise we wouldn't be losing any money, but some channels may be closed
that we can avoid.
* Handle out-of-order htlc-timeout txs
It may happen that a commit tx and some htlc-timeout txs end up in the
same block. In that case, there is no guarantee on the order we'll receive
the confirmation events.
If any tx in a local/remoteCommitPublished is confirmed, that implicitly
means that the commit tx is confirmed (because it spends from it).
So we can consider the closing type known and forward the failure upstream.
* removed the `Direction` class
* improved the non-reg test for htlcs
- check actual content instead of only success and roundtrip
- use randomized data for all fields instead of all-zero
- check the remaining data, not only the decoded value (codecs are
chained so a regression here will cause the next codec to fail)
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Sort commit transaction outputs using BIP69 + CLTV as tie-breaker for offered HTLCs
* Type DirectedHtlc:
We now use a small hierarchy of classes to represent HTLC directions.
There is also a type alias for a collection of commitment output links.
* front now handles ping/sync
Peer has been split in two and now handles only channel related stuff.
A new `PeerConnection` class is in charge of managing the BOLT 1 part
(init handshake, pings) and has the same lifetime as the underlying
connection.
Also, made `TransportHandler` be a child of `PeerConnection` by making
the `remoteNodeId` an attribute of the state of `PeerConnection` instead
of a constructor argument (since we cannot be sure of the remote nodeid
before the auth handshake is done). Now we don't need to worry about
cleaning up the underlying `TransportHandler` if the `PeerConnection`
dies.
* remove `Authenticator`
Instead of first authenticating a connection, then passing it to the
`PeerConnection` actor, we pass the connection directly to the
`PeerConnection` and let it handle the crypto handshake, before the LN
init. This removes a central point of management and makes things easier
to reason about. As a side effect, the `TransportHandler` actor is now a
child of `PeerConnection` which gives us a guarantee that it dies when
its parent dies.
* separated connection logic from `Peer`
The `ReconnectionTask` actor handles outgoing connections to a peer. The
goal is to free
the `Peer` actor from the reconnection logic and have it just react to
already established
connections, independently of whether those connections are incoming or
outgoing.
The base assumption is that the `Peer` will send its state transitions
to the `ReconnectionTask` actor.
This is more complicated than it seems and there are various corner
cases to consider:
- multiple available addresses
- concurrent outgoing connections and conflict between
automated/user-requested attempts
- concurrent incoming/outgoing connections and risk of reconnection
loops
- etc.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Refactor timed out HTLC helpers: directly take a DATA_CLOSING
and extract the relevant parts.
* ClosingStateSpec: test dust HTLCs
* Improve ClosingStateSpec
* Clean up usage of AddHtlcFailed
We were abusing AddHtlcFailed in some cases where an outgoing HTLC
was correctly added, but was later settled on-chain (fulfilled, timed
out or overridden by a different commit transaction).
These cases are now specifically handled with new Relayer.ForwardMessage
dedicated to on-chain settling.
* Refactor Relayer's ForwardMessages
ForwardFail and ForwardFulfill are now traits.
Handle both on-chain and remote fail/fulfills.
We were allowing users to set htlc-minimum-msat to 0, which directly
contradicts the fact that we must never send an HTLC for 0 msat.
We now explicitly disallow that behavior: the minimum is 1 msat.
In case the remote side of a channel had set its htlc-minimum-msat to 0,
we would forward HTLC with a value of 0 msat if a sender crafted such a
payment. The spec disallows that, so we now explicitly check for that
lower bound.
Also removed string interpolation for some of the more expensive debug
lines. It's a trade-off performance vs readability and is probably not
worth changing for info level logs, which will be enabled anyway.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
See https://github.com/lightningnetwork/lightning-rfc/issues/728
Add an additional reserve on the funder to prevent emptying and then
being stuck with an unusable channel.
As fundee we don't verify funders comply with that change.
We may enforce it in the future when we're confident the network as a
whole enforces that.
Previous implementation in #1317 wasn't working because in a bug in the
transition. Added a test and fixed it.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
Refactor and improve payment metrics:
* Unify in a Monitoring object.
* Add helper functions and objects.
* Add more error metrics.
* Add more latency metrics.
* Add metrics to post-restart HTLC cleanup
* Add metrics to router path-finding
As long as we receive messages from our peer, we consider it is
online and don't send ping requests. If we don't hear from the
peer, we send pings and expect timely answers, otherwise we'll
close the connection.
This is implemented by scheduling a ping request every 30 seconds,
and pushing it back every time we receive a message from the peer.
* Support wumbo channels:
- use bits 18, 19
- compute the min depth for the funding transaction according to the channel size
- update routing heuristics for a wumbo world:
- the lower bound is the 25th percentile of current channel capacity on the network
- the higher bound is the most common capacity for wumbo channels
- add 'max-funding-satoshis' configuration key to allow to set the maximum channel size that will be accepted
Make DLP data mandatory in ChannelReestablish.
We make them mandatory to allow extending the message with TLVs.
Make upfront_shutdown_script a TLV record that we always include in
open_channel / accept_channel.
See https://github.com/lightningnetwork/lightning-rfc/pull/714.
Add an abstraction of a PaymentPart.
Remove unnecessary intermediary case classes.
This allows extending how payment parts can be received.
It's not limited to HTLCs (could be swaps, pay-to-opens, etc).
Currently, those methods throw exceptions, and we rely on `Channel` to
call them within a `Try(...)`. It puts more burden on `Channel` and
isn't very functional.
Some methods were returning an `Either[]`, which seem to play the role
of a `Try` but isn't used. It seems the idea was to not fail the channel
upon receiving a `fulfill`/`fail` for an unknown htlc, but it is not
fully wired, and isn't compliant (BOLT 2):
> A receiving node:
> if the id does not correspond to an HTLC in its current commitment
transaction:
> MUST fail the channel.
For signature-related methods, I went with the minimal change of
encapsulating portions of the code inside a `Try {...}` to minimize risk
of regression. We could also make `CommitmentSpec` methods return
`Try[]` but I suspect that would be more complicated with little
benefit.
Note that if some part isn't covered by the `Try` and ends up throwing
an exception, that will be covered by the `handleException` handler of
`Channel`.
Fixes#1305.
SendToRoute previously didn't accept invoice routing hints.
This was a limitation when paying a wallet invoice.
Invoice hints are now correctly taken into account.
Writing the PaymentRelayed event to the DB is not atomic.
Each part is written independently of the others (one row per part).
This is fine as nothing relies on this event being written atomically.
However tests were expecting that and we would observe rare
test failures on travis.
When coming back online, re-send private channels' `channel_update`.
This makes sure it gets rebroadcast regularly in case it was missed.
Since it's a private channel, it won't spam the network.
We use the socks5 proxy that is defined in the configuration and is typically used to connect to LN nodes running as TOR hidden services.
This should allow users to connect to Electrum servers that are running behind TOR.
* Refactor payment errors
When sending payments, it makes it easier for a wallet to display
the correct localized error message to users.
* Faster Trampoline payments fulfill
We were previously waiting for the whole downstream payment
to be settled (all individual HTLCs).
We can do better and fulfill upstream as soon as we get the preimage
(which only needs one downstream fulfill).
We currently rely on `require`, which is convenient, but doesn't allow
fine-grained exception control.
Also, in case of errors, logging is done at the supervisor level, where
we lose the remote `node_id`.
Instead, we type some crypto-related errors and log them in the
`TransportHandler`, which already has the correct MDC.
The DB ordering is not deterministic.
For multi-part payments, the first timestamp is taken.
This can vary depending on which record is listed first.
Using the same timestamp avoids a failed assertion.
Instead of emitting this event when we send a signature, we emit it when
our `availableBalanceForSend` actually changes. This happens:
- when we send a new `update_*`;
- when we receive a `commit_sig`, which may acknowledge one or several
`update_*` sent by our peer.
We choose to only emit this event in `NORMAL` state, because its goal is
to give information about what payments we can make, which can only
happen in that state.
NB: other events `ChannelSignatureSent` and `ChannelSignatureReceived` give
a different type of information, and are sent in all states where
signatures are exchanges, not only in `NORMAL`.
The field `localBalance` has been removed because it was ambiguous, and so is
the balance tracking in the database, which wasn't very useful.
Co-Authored-By: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Electrum: allow watcher to watch for mempool transactions
Watcher now handles WatchConfirmed watches where min depth
is set to 0: the watch event will be sent when the tx enters the
mempool of the bitcoin node our Electrum server is connected to.
For 0-conf channel, use scids with a height of 0 and a tx index
generated from the first 16 bytes of the funding txid. This gives us
unique ids that can still be identified as 0-conf channel.
Let a sender manually split a payment and specify a trampoline route.
Fix two flaky tests where the order of payment parts could be
different, resulting in a failed equality test.
If we're relaying multiple HTLCs for the same payment_hash,
we need to list all of those.
The previous code only handled that when Trampoline was used.
Comparing with the router ActorRef simply didn't work.
The reason is probably because Peers receive the router's supervisor ref
which doesn't match what `self` is inside `Router`.
Checking that the origin was the router felt brittle anyway.
We're now correctly typing the gossip origin.
We don't implement the upfront_shutdown_script feature.
However we update our encoding to always specify it.
This allows extending OpenChannel/AcceptChannel with tlv streams.
There is one caveat: Phoenix shipped with a version that's incompatible.
So we use a workaround to identify unpatched Phoenix versions
and send them the old encoding.
With MPP and Trampoline (and particularly the combination of the two),
we need to keep track of multiple amounts, recipients and fees.
There's a trampoline fee and a fee to reach the first trampoline node.
The trampoline nodes must appear in the route, but not as payment recipients.
Adding new fields to payment events and DB structs lets us distinguish those.
We also relax the spec requirement about feature graph dependencies.
The requirement to include `var_onion_optin` in invoice feature bits
was added after the first Phoenix release.
Phoenix users will thus have non spec-compliant invoices in their
payment history.
We accept invoices that don't set this field; this is a harmless
spec violation (as long as we set it in new invoices).
There was a rounding issue with the availableForSend/Receive calculation.
Because CommitTx fee and Htlc fee were computed separately,
but each was individually rounded down to Satoshis, we could
end up with an off-by-one error.
This resulted in an incapacity to send/receive the maximum amount available.
We now allow computing fees in msat, which removes rounding issues.
c-lightning fails to decode empty arrays of scids or timestamps with an encoding type set to COMPRESSED_ZLIB.
The spec is not specific enough on whether this is valid or not, so we'll set the encoding type of empty arrays to UNCOMPRESSED.
When paying an invoice, we weren't properly checking our own features.
If the invoice supported MPP, we would use it all the time.
If MPP isn't enabled in our features, we now default to a legacy payment.
Add new errors that let senders know they need to raise the trampoline fee/ctlv.
When the error is downstream, select the best error to forward.
Implement retry with more fees for trampoline payments.
This process is currently quite manual: the sender decides upfront on
each attempt's fee/cltv.
If our initial random deconnnection delay is 0 (unlikely but possible) then all "exponential backoff" reconnection delays will be 0 too, so we set a minimum value of 200 milliseconds.
lnd expects ids ranges in reply_channel_range messages to strictly follow each other, without gaps.
For example, using block heights and not ids, [1,2,4,5] would be split into (first=1, num=2, [1,2]) :: (first=3, num=2, [4, 5])
This is arguably a limitation of lnd (c-lightning does not requires this and it's not needed to properly process replies) but is easy to implement.
This is needed to make sure we broadcast our own gossip.
Otherwise we will try to gossip at the beginning of the connection,
when the peer hasn't set any timestamp, so our gossip will be dropped.
See https://github.com/lightningnetwork/lightning-rfc/pull/684
Otherwise eclair-mobile can't pay using MPP.
This heuristic was only here to help Trampoline nodes with a lot of
channels relay using MPP, but we disabled that in #1271 anyway.
We will reactivate Trampoline-MPP once split is done inside the router.
* Add test to check that we split short channel ids correctly
reply_channel_range messages should not overlap i.e different replies should not contain
channel ids that have the same block height.
The test in this commit fails, because our 'split' function needs to be updated.
* Channel Queries: make sure that our replies match the request range (fixes#1269)
Even though it's not completely explicit in the specs, we should make sure that
the [firstBlock, numBlock] range that we cover in our replies is not computed
from the ids that we actually have but instead matches the [firstBlock, numBlock] range
that was requested.
* Make sure that serialised replies stay below the 65Kb limit
We prune short channel id chunks to make sure that serialised replies stay below the 65 Kb limit.
The pruning algo is very simple: for each chunk we randomly keep the first or last 3200 ids
Selection is random so peers that re-connect will eventually receive all channel info.
The limit of 3200 was chosen for the worst case where replies are not compressed and include timestamps and checksum.
It is a fairly conservative boundary, the highest number of public channels in a single block so far is <300, and
there 3200 is roughly the currently observed number of transactions in a "full" block.
* Set default ids chunk size to 1500
Have smaller chunks (smaller than 3200 / 2) reduces the probability of merging 2 chunks and having to prune the result because the encoded reply would be over 65K.
* Smarter algo for enforcing max chunk size policy
Instead of keeping either the first or last items, we use a random offset. This way peers will eventually receive info about all channels even if chunks are much larger than the max chunk size and are pruned.
There is currently a backwards-compatibility issue with eclair-mobile.
Eclair-mobile mistakes feature bit 15 (payment_secret) for the
gossip_queries_ex prototype (which is incompatible with the spec-ed version).
To temporarily avoid this issue (until eclair-mobile is patched and all users have updated),
we never advertize those ambiguous bits in Init.
They're only really needed in the invoice so it's ok.
Implement https://github.com/lightningnetwork/lightning-rfc/pull/666
Keep the global/local split in Commitments to avoid backwards incompatibility in the codec.
Remove allowMultiPart API field: we instead rely on the MPP feature being set in nodeParams.
That means MPP-enabled nodes need to update their reference.conf.
Rework features:
* Add types to allow cleaner dependency validation.
* Most of the time we don't care whether a feature is activated as optional or mandatory, which caused duplicate code. This is now handled more cleanly.
* It also paves the way to annotate features with the places they should be advertised (Init vs NodeAnn vs ChannelAnn vs invoice).
This is safer for now since the splitting algorithm isn't working
well on nodes with a large number of channels and we don't
expect too many payments from Phoenix to non-Phoenix to
actually need MPP in the short term.
Mockito sometimes throws an unnecessary stubbing exception, it's unclear whether the test is faulty or mockito has issues with our parallel setup.
Rewrite switchboard tests without mockito makes them more flexible.
In case they randomly fail we should get more useful data to help troubleshooting.