A few of our test helper functions started using too many default arguments,
which makes them a bit messy to use in tests, especially when we want to
add new arguments.
We change this to use method overloading instead, which makes it easier to
add new overloads without changes to existing tests.
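As a minimal sketch of the idea (the `makePayment` helper and its parameters are hypothetical, not the actual test helpers), overloads let us add a new variant without touching existing call sites:

```scala
object TestHelpers {
  // Hypothetical result type, for illustration only.
  final case class Payment(amountMsat: Long, expiryDelta: Int, payerNote: Option[String])

  // Simplest overload: existing tests keep calling this one unchanged.
  def makePayment(amountMsat: Long): Payment =
    makePayment(amountMsat, 144)

  // Overload for tests that care about the expiry delta.
  def makePayment(amountMsat: Long, expiryDelta: Int): Payment =
    Payment(amountMsat, expiryDelta, None)

  // Overload added later: no existing call site needs to change.
  def makePayment(amountMsat: Long, expiryDelta: Int, payerNote: String): Payment =
    Payment(amountMsat, expiryDelta, Some(payerNote))
}
```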
In channel tests, we set a 60-second timeout on the receiver end;
otherwise we sometimes run into timeouts on slow CI machines that run a
lot of tests in parallel.
It makes sense to do the same for onion messages, to help us figure out
whether there is a race condition or it's just the machines that are slow.
With anchor outputs, we need to keep utxos available for fee bumping.
Having enough on-chain funds to claim HTLCs in case channels force-close
is necessary to guarantee funds safety.
There is no perfect formula to estimate how much we should reserve: it
depends on the feerate when the channels force-close, the number of
impacted HTLCs and their amount.
We use a very rough estimation, to avoid needlessly spamming node operators
while still notifying them when the situation becomes risky.
We were previously subscribing to all payment events, which includes
`PaymentSettlingOnChain`, for which there is nothing to do at the
DbEventHandler level.
We now register to each concrete event instead of registering to a generic
trait.
We've seen users bump into these limits often because many nodes now run
behind Tor on poor hardware, so it makes sense to make our default values
more robust.
The switchboard is our singleton actor entry point to all of our peers and
peer connections, so it can be useful to have it deliver basic information
about our peers.
We also take this opportunity to fix a bug: we were not emitting the
`LastChannelClosed` event when it happened while the peer was
disconnected.
We introduced a task to regularly compute network statistics (mostly about
channel parameters such as expiry and fees).
The goal was to use this information in the MPP split algorithm to decide
whether to split a payment or not.
But we haven't used it, and I'm not sure anymore that it's useful at all.
If node operators are interested in network statistics, an ad-hoc
on-the-fly computation would make more sense.
When using the API to update the relay fees of a channel, the changes are saved in the DB and then the channel is updated. If the channel can't be updated, because it is not ready yet for instance, that's fine because it will use the values from the DB as soon as it is ready. So there is no need to return an error in that case.
Fixes #2085
Whenever a remote commitment is confirmed in which we received htlcs, we
generate a claim-htlc-success transaction even if we don't have the preimage
yet (so we can't broadcast it).
This created a confusing log line:
"tx generation success: desc=claim-htlc-success"
We want to create that log line only when we have the preimage and the node
operator can expect the HTLC to be redeemed on-chain.
For some reason, when sending to a pre-defined channel route, we only
looked at the public channel graph. This was incorrect: we should also
include private channels and routing hints.
Fixes #2081
If the remote commit confirms before our local commit, there is no reason
to try to publish our HTLC transactions: we will instead directly claim
the htlc outputs from the remote commit.
We previously checked timelocks before checking preconditions, which in
this case means we would be waiting for a confirmation on our local commit
forever. We now check preconditions before timelocks, and added a
precondition that verifies that the remote commit isn't confirmed before
publishing our HTLC txs.
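The ordering can be sketched as follows. This is a simplified illustration, not the actual publisher code: the `HtlcTxContext` fields and the `Decision` type are assumptions made for the example.

```scala
object PublishChecks {
  sealed trait Decision
  case object Publish extends Decision
  case object Skip extends Decision            // precondition failed: never publish
  case object WaitForTimelock extends Decision // parent commit not confirmed enough yet

  // Hypothetical view of what the publisher knows about an HTLC tx.
  final case class HtlcTxContext(remoteCommitConfirmed: Boolean,
                                 localCommitConfirmations: Int,
                                 requiredConfirmations: Int)

  def decide(ctx: HtlcTxContext): Decision =
    // Preconditions first: if the remote commit confirmed, our local commit
    // will never confirm, so waiting on its timelock would block forever.
    if (ctx.remoteCommitConfirmed) Skip
    // Only then look at timelocks.
    else if (ctx.localCommitConfirmations < ctx.requiredConfirmations) WaitForTimelock
    else Publish
}
```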
The output of `getsentinfo` didn't include the `nodeId` of the failing node.
This PR adds it, as it can be used by external apps when they build routes
themselves instead of relying on eclair's internals (e.g. channel rebalancing).
The scodec magic was quite hard to read, and the use of the prefix wasn't
very intuitive since Sphinx uses both a prefix and a suffix.
Also added more codec tests.
We rename the EncryptedRecipientData types.
The data it contains is namespaced to usages for route blinding, so we
make that explicit.
This way if future scenarios use another kind of encrypted tlv stream
we won't have name clashes (e.g. encrypted state backup).
We also update the route blinding test vectors to the final spec version.
Add basic support for onion messages (lightning/bolts#759)
Add functions and codecs to create, read and process onion messages. Does not use any of them yet.
We previously relied on bitcoind's dumpprivkey RPC in some of our tests.
That RPC isn't available with descriptor wallets, and descriptor wallets
are now the default wallet type.
We previously created restrictions in Sphinx.scala to only allow using it
for two types of onions: a 1300 bytes one for HTLCs and a 400 bytes one
for trampoline.
This doesn't make sense anymore. The latest version of trampoline allows
any onion size, and onion messages also allow any onion size. The Sphinx
protocol doesn't care either about the size of the payload.
Another reason to remove it is that it wasn't working that well with
pattern matching because of type erasure.
So now the caller must explicitly set the length of the payload, which is
more flexible. Verifying that the correct length is used is deferred to
higher level components.
Fixes #1995, which was due to a pattern matching error for the expected response type of `sendToX` helper methods in `EclairImpl`, and had nothing to do with json serialization. Added a few non-regression tests.
In the second commit I also set a basic "ok" json serializer for all default `RES_SUCCESS` messages, but didn't follow https://github.com/ACINQ/eclair/issues/1995#issuecomment-940821678, because we would either completely break backwards compatibility, or create inconsistency with non-default command responses like `RES_GETINFO`, and with other API calls not related to channels.
If a _local_ mutual close transaction is published from outside of the actor state machine, the channel will fail to recognize it, and will move to the `ERR_INFORMATION_LEAK` state. We could instead log a warning and handle it gracefully, since no harm has been done.
This is different from a local force close, because we do not keep the fully-signed local commit tx anymore, so an unexpected published tx would indeed be very fishy in that case. But we do store the best fully-signed, ready-to-publish mutual close tx in the channel data, so we must be ready to handle the case where the operator manually publishes it for whatever reason.
We used to store UNIX timestamps in the `waitingSince` field before
moving to block count. In order to ensure backward compatibility, we
converted from timestamps to blockheight based on the value. Anything
over 1 500 000 was considered a timestamp. But this value is much too
low: on testnet the blockheight is already greater than 2 000 000.
We can use 1 500 000 000 instead, which is somewhere in 2017.
Another way to deal with this would be to simply remove this
compatibility code.
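A rough sketch of the threshold check is shown below. The function name is hypothetical, and replacing a legacy timestamp with the current block height is an assumption made for the example, not necessarily what the compatibility code does.

```scala
object WaitingSinceCompat {
  // Values above this threshold are treated as legacy UNIX timestamps (in seconds).
  // 1 500 000 000 corresponds to July 2017, far above any realistic block height.
  val TimestampThreshold: Long = 1500000000L

  def toBlockHeight(waitingSince: Long, currentBlockHeight: Long): Long =
    if (waitingSince > TimestampThreshold) currentBlockHeight // legacy timestamp (assumed fallback)
    else waitingSince                                         // already a block height
}
```

With the previous threshold of 1 500 000, a testnet height such as 2 100 000 would have been misread as a timestamp.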
This PR adds an alternate strategy to handle unhandled exceptions. The goal is to avoid unnecessary mass force-closes, but it is reserved for advanced users who closely monitor the node.
Available strategies are:
- local force close of the channel (default)
- log an error message and stop the node
Default settings maintain the same behavior as before.
In an "outdated commitment" scenario where we are on the up-to-date side, we always react by force-closing the channel immediately, not giving our peer a chance to fix their data and restart. On top of that, we consider this a commitment sync error, instead of clearly logging that our counterparty is using outdated data.
Addressing this turned out to be rabbit-holey: our sync code is quite complicated and is a bit redundant because we separate between:
- checking whether we are late
- deciding what messages we need to retransmit
Also, discovered a missing corner case when syncing in SHUTDOWN state.
Add support for cookie authentication with bitcoind instead of
user/password. This is recommended when running eclair and
bitcoind on the same machine: it ensures only processes with
read permissions to the bitcoind cookie file are able to call the
RPC, which is safer than a user/password pair.
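For illustration, bitcoind writes a cookie file containing a single `user:password` line (the user being `__cookie__`), which can be turned into HTTP basic auth credentials. The sketch below uses hypothetical names and is not eclair's actual client code:

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object CookieAuth {
  final case class Credentials(user: String, password: String)

  // Parses the content of the cookie file, expected to be "user:password".
  def parseCookie(content: String): Option[Credentials] =
    content.trim.split(":", 2) match {
      case Array(user, password) if user.nonEmpty && password.nonEmpty =>
        Some(Credentials(user, password))
      case _ => None
    }

  // Builds the value of the HTTP Authorization header for the RPC call.
  def basicAuthHeader(c: Credentials): String = {
    val raw = s"${c.user}:${c.password}".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(raw)
  }
}
```

Since bitcoind regenerates the cookie on each start, a real client would need to re-read the file when authentication fails.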
The app must stop when connection to the backend fails. It will be gracefully restarted on Beanstalk instead of just hanging.
Fixes a regression introduced by #1912.
Default upper bound was `Long.MaxValue unixsec` which overflowed when converted to `TimestampMilli`. We now enforce `min` and `max` values on timestamp types.
API tests didn't catch it because eclair is mocked and the conversion happens later.
Fixes #2031.
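The overflow and the bounded types can be sketched as follows. The bound chosen here (`Long.MaxValue / 1000`) is an assumption for the example; the actual `min` and `max` values may differ.

```scala
object BoundedTimestamps {
  // Largest number of seconds that can be converted to milliseconds in a Long.
  val MaxSeconds: Long = Long.MaxValue / 1000

  final case class TimestampMilli(toLong: Long)

  final case class TimestampSecond(toLong: Long) {
    // Reject values that would overflow when converted to milliseconds.
    require(toLong >= 0 && toLong <= MaxSeconds, "invalid timestamp value")
    def toTimestampMilli: TimestampMilli = TimestampMilli(toLong * 1000)
  }

  object TimestampSecond {
    val min: TimestampSecond = TimestampSecond(0)
    val max: TimestampSecond = TimestampSecond(MaxSeconds)
  }

  // What the unbounded conversion did with the old default upper bound:
  // the multiplication silently wraps around to a negative value.
  val overflowed: Long = Long.MaxValue * 1000
}
```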
Add a new log file for important notifications that require an action from
the node operator.
Using a separate log file is easier than grepping for specific messages
from the standard logs, and lets us use a different style of messaging,
where we provide more information about what steps to take to resolve
the issue.
We rely on an event sent to the event stream so that plugins can also pick
it up and connect with notification systems (push, messages, mails, etc).
On slow CI machines, the "recv WatchFundingConfirmedTriggered" test was
flaky because there was a race between the publication of Alice's
TransactionPublished event before going to the WaitForFundingLocked state
and the tests registering event listeners (after going to the
WaitForFundingLocked state).
It doesn't make sense to throw away this information, and it's useful in
some scenarios such as onion messages.
The ephemeral keys aren't part of the route, they're usually derived hop
by hop instead. We only need to keep the first one that must be somehow
sent to the introduction node.
For incoming htlcs, the amount needs to be included in our balance if we know the preimage, even if the htlc hasn't yet been formally settled.
We were already taking into account preimages in the `pending_commands` database.
But, as soon as we have sent and signed an `update_fulfill_htlc`, we clean up the `pending_commands` database. So we also need to look at current sent changes.
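A simplified sketch of the resulting computation, with hypothetical types and parameter names (htlcs are identified by a plain id here):

```scala
object BalanceSketch {
  final case class IncomingHtlc(id: Long, amountMsat: Long)

  // pendingFulfills: htlc ids with a fulfill still stored in pending_commands.
  // sentFulfills: htlc ids for which update_fulfill_htlc was sent but is not settled yet.
  def balanceMsat(toLocalMsat: Long,
                  incoming: Seq[IncomingHtlc],
                  pendingFulfills: Set[Long],
                  sentFulfills: Set[Long]): Long = {
    val knownPreimages = pendingFulfills ++ sentFulfills
    // An incoming htlc counts towards our balance as soon as we know its preimage.
    toLocalMsat + incoming.filter(h => knownPreimages.contains(h.id)).map(_.amountMsat).sum
  }
}
```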
Add options to ignore specific channels or nodes for
findRoute* APIs, and an option to specify a flat maximum
fee.
With these new parameters, it's now possible to do circular
rebalancing of your channels.
Co-authored-by: Roman Taranchenko <romantaranchenko@Romans-MacBook-Pro.local>
Co-authored-by: t-bast <bastuc@hotmail.fr>
Add API to delete an invoice.
This only works if the invoice wasn't paid yet.
Co-authored-by: Roman Taranchenko <romantaranchenko@Romans-MacBook-Pro.local>
Co-authored-by: t-bast <bastuc@hotmail.fr>
Cryptographic functions to blind and unblind a route and its associated
encrypted payloads.
Decrypt and decode the contents of an `encrypted_recipient_data` tlv field.
We could share the tlv namespace with onion tlvs, but it's cleaner to
separate them. They have a few common fields, but already diverge on
others, and will diverge even more in the future.
* Check serialization consistency in all channel tests
We add a simple wrapper over the channels db used in all channel state unit tests, which will basically check
that deserialize(serialize(state)) == state.
* Add getChannel() method to ChannelsDb interface
This makes our serialization checks cleaner: we now test that read(write(channel)) == channel
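The wrapper can be sketched along these lines. The `Codec` trait and the in-memory store are stand-ins for the example, not the real `ChannelsDb` interface:

```scala
import java.nio.charset.StandardCharsets
import scala.collection.mutable

object RoundTripDb {
  // Hypothetical codec abstraction standing in for the channel state codecs.
  trait Codec[A] {
    def encode(a: A): Array[Byte]
    def decode(bytes: Array[Byte]): A
  }

  // Trivial codec used to exercise the wrapper.
  object Utf8Codec extends Codec[String] {
    def encode(a: String): Array[Byte] = a.getBytes(StandardCharsets.UTF_8)
    def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
  }

  // In-memory db that fails fast when read(write(channel)) != channel.
  class CheckedChannelsDb[A](codec: Codec[A]) {
    private val store = mutable.Map.empty[String, Array[Byte]]

    def addOrUpdateChannel(id: String, state: A): Unit = {
      val bytes = codec.encode(state)
      require(codec.decode(bytes) == state, s"serialization round-trip changed channel $id")
      store.update(id, bytes)
    }

    def getChannel(id: String): Option[A] = store.get(id).map(codec.decode)
  }
}
```

Because every state transition in the tests goes through the db, any codec that loses information makes the corresponding test fail immediately.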
We define `TimestampSecond` and `TimestampMilli` for second and millisecond precision UNIX-style timestamps.
Let me know what you think of the syntactic sugar, I went for `123456 unixsec` and `123456789 unixms`.
Json serialization is as follows for resp. second and millisecond precision. Note that in both cases we display the unix format in second precision, but the iso format is more precise:
```
{
"iso": "2021-10-04T14:32:41Z",
"unix": 1633357961
}
{
"iso": "2021-10-04T14:32:41.456Z",
"unix": 1633357961
}
```
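A minimal sketch of how the two types and the sugar could fit together. This is not the actual implementation: the json is rendered by hand here for brevity, and the implicit class name is made up.

```scala
import java.time.Instant

object TimestampSketch {
  final case class TimestampSecond(toLong: Long) {
    def toJson: String =
      s"""{"iso":"${Instant.ofEpochSecond(toLong)}","unix":$toLong}"""
  }

  final case class TimestampMilli(toLong: Long) {
    // The iso field keeps millisecond precision, the unix field is truncated to seconds.
    def toJson: String =
      s"""{"iso":"${Instant.ofEpochMilli(toLong)}","unix":${toLong / 1000}}"""
  }

  // Enables 123456L.unixsec and 123456789L.unixms.
  implicit class LongToTimestamp(private val n: Long) extends AnyVal {
    def unixsec: TimestampSecond = TimestampSecond(n)
    def unixms: TimestampMilli = TimestampMilli(n)
  }
}
```

The sketch uses the dotted form (`1633357961L.unixsec`); the postfix form shown above typically also needs `import scala.language.postfixOps`.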
* use a map for feature->channelType resolution
Instead of explicitly listing all the combinations of features, and risking
inconsistency, we may as well build the reverse map using the channel
type objects.
* better and less spammy logs
We can switch the "funding tx already spent" router log from _warn_ to
_debug_ because as soon as there are more than 10 of them, the peer's
announcements will be ignored and there will be a warning message about
that.
* timedOutHtlcs -> trimmedOrTimedOutHtlcs
Clarify that trimmed htlcs can be failed as soon as the
commitment tx has been confirmed.
* proper logging of outgoing messages
It is also logical to make `Outgoing` a command of `Peer`. It should
have been done this way from the start if `Peer` had been a typed actor.
* fixed mixed up comments
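For the feature->channelType resolution in the first item above, the reverse map can be sketched like this. The channel types and feature names are simplified placeholders (plain strings instead of the real feature objects):

```scala
object ChannelTypes {
  sealed trait ChannelType { def features: Set[String] }
  case object Standard extends ChannelType {
    val features: Set[String] = Set.empty
  }
  case object StaticRemoteKey extends ChannelType {
    val features: Set[String] = Set("option_static_remotekey")
  }
  case object AnchorOutputs extends ChannelType {
    val features: Set[String] = Set("option_static_remotekey", "option_anchor_outputs")
  }

  val all: Seq[ChannelType] = Seq(Standard, StaticRemoteKey, AnchorOutputs)

  // Reverse map derived from the objects themselves: each feature set is
  // declared in exactly one place, so the two directions cannot diverge.
  val byFeatures: Map[Set[String], ChannelType] = all.map(t => t.features -> t).toMap

  def fromFeatures(features: Set[String]): Option[ChannelType] = byFeatures.get(features)
}
```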
Discovered this while working on #1838.
In the following scenario, at reconnection:
- `localCommit.index = 7`
- `nextRemoteRevocationNumber = 6`
So when `localCommit.index == nextRemoteRevocationNumber + 1` we must retransmit the revocation.
```
              local              remote
                |                   |
                |  (no pending sig) |
   commit = 6   |                   | next rev = 6
                |<----- sig 7 ------|
   commit = 7   |                   |
                |-- rev 6 --> ?     |
                |                   |
                |  (disconnection)  |
                |                   |
```
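Expressed as code, the condition from the scenario above is a one-liner (the object and function names are hypothetical):

```scala
object RevocationSync {
  // We signed commitment N+1 but our peer is still waiting for revocation N:
  // our revocation was lost in the disconnection and must be sent again.
  def mustRetransmitRevocation(localCommitIndex: Long, nextRemoteRevocationNumber: Long): Boolean =
    localCommitIndex == nextRemoteRevocationNumber + 1
}
```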
* reproduce bug causing API hang at open
In case of an error when validating channel parameters, we do not
return a message to the origin actor. That translates to API hanging
until timeout.
Took the opportunity to test return values in other cases too.
* return an error to origin actor for invalid params
* WaitForFundingCreatedInternal -> WaitForFundingInternal
* add tests to WaitForFundingInternalStateSpec
* add tests to WaitForFundingConfirmedStateSpec
* API nits
We probably don't need to print the stack trace for API errors, and the
open timeout of 10s was a bit short (it has to be << 30s though).