* Update default path-finding weight ratios
* The two-month window for channel age was too small for today's network
* CLTV is less of an issue nowadays: there are fewer stuck payments, and
we're encouraging nodes to increase their CLTV because of the recent
mempool congestion
There are two cases depending on whether you use weight ratios or not.
They used to behave very differently:
* Without weight ratios, the weight would be the total amount to send (base amount + fees)
* With weight ratios, the ratios would apply to the total amount, not just the fees.
The effect is that only the number of hops and the weight factors mattered, as
the fee itself is negligible compared to the total amount.
The code is now shared and the weight is now the sum of the fees
(multiplied by a factor if using weight ratios).
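For illustration, here is a minimal sketch of a fee-based edge weight in the spirit of this change (names and factor shapes are illustrative, not eclair's actual API):
```scala
// Illustrative only: the weight added by an edge derives from its fee, not
// from amount + fee; a path's weight is the sum of its edge weights.
case class WeightRatios(baseFactor: Double, cltvFactor: Double, ageFactor: Double)

def edgeWeight(feeMsat: Long, normalizedCltv: Double, normalizedAge: Double, ratios: Option[WeightRatios]): Double =
  ratios match {
    case None    => feeMsat.toDouble // shared code path: the weight is just the fee
    case Some(r) => feeMsat * (r.baseFactor + r.cltvFactor * normalizedCltv + r.ageFactor * normalizedAge)
  }
```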
We use one actor per topic, but each actor previously registered to multiple
topics, so we received duplicate events and consumed twice the necessary
bandwidth.
We don't need `SERIALIZABLE` consistency guarantees when all we do is
update timestamp columns. These updates happen concurrently with channel
data updates and raised serialization errors in postgres.
Fixes #1786.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We need to group incoming HTLCs together by payment_hash and payment_secret,
otherwise we will reject valid payments that are split into multiple distinct
trampoline parts (same payment_hash but different payment_secret).
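As a rough sketch of the grouping key (real eclair types replaced with strings for brevity):
```scala
case class IncomingHtlc(paymentHash: String, paymentSecret: String, amountMsat: Long)

// Parts of the same payment share both identifiers; distinct trampoline parts
// share only the payment_hash and must land in separate groups.
def groupParts(htlcs: Seq[IncomingHtlc]): Map[(String, String), Seq[IncomingHtlc]] =
  htlcs.groupBy(h => (h.paymentHash, h.paymentSecret))
```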
Fixes #1723
Did some refactoring in tests and introduced a new `migrationCheck`
helper method.
Note that the change of data type in sqlite for the `commitment_number`
field (from `BLOB` to `INTEGER`) is not a migration. If the table was
created before this change, it keeps its old declared type. That doesn't
matter because sqlite uses dynamic typing (the declared column type
doesn't constrain what values are stored), and we make sure in tests
that there is no regression.
* rework the db version management
The way our `getVersion`/`setVersion` worked was very unintuitive and led
to comments like the following in tests:
```scala
dbs.getVersion(statement, "network", 1) // this will set version to 1
```
The reason is that we treated uninitialized databases and up-to-date
databases exactly the same way. That is why `getVersion` would set the
version to the most recent one if none was found. It's also why we used
`CREATE ... IF NOT EXISTS` statements for tables and indices.
With this change, the `getVersion` now only _gets_ the version, or
returns `None` if no version exists.
Since we now differentiate between uninitialized and up-to-date databases,
we don't need to make the create statements idempotent. This makes the
code stricter, and we will see errors if our versioning system has bugs.
Internal tables (used for versioning, locking, etc.) have been left
unchanged.
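A minimal sketch of the new flow (signatures and table names simplified, not eclair's exact API; the internal `versions` table keeps its idempotent create, per the note above):
```scala
import java.sql.Statement

val CurrentVersion = 4

// getVersion now only _gets_: None means the db has never been initialized.
// (Sketch only: real code should use prepared statements.)
def getVersion(s: Statement, db: String): Option[Int] = {
  s.executeUpdate("CREATE TABLE IF NOT EXISTS versions (db_name TEXT NOT NULL PRIMARY KEY, version INTEGER NOT NULL)")
  val rs = s.executeQuery(s"SELECT version FROM versions WHERE db_name = '$db'")
  if (rs.next()) Some(rs.getInt("version")) else None
}

def setVersion(s: Statement, db: String, version: Int): Unit =
  s.executeUpdate(s"INSERT INTO versions VALUES ('$db', $version) ON CONFLICT (db_name) DO UPDATE SET version = $version")

def initOrMigrate(s: Statement): Unit = getVersion(s, "network") match {
  case None =>                    // fresh db: plain CREATE TABLE, no IF NOT EXISTS
    s.executeUpdate("CREATE TABLE nodes (node_id TEXT NOT NULL PRIMARY KEY, data BLOB NOT NULL)")
    setVersion(s, "network", CurrentVersion)
  case Some(CurrentVersion) => () // up-to-date: nothing to do
  case Some(v) => sys.error(s"migration from version $v is out of scope for this sketch")
}
```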
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* use prepared statements for pruning
* optional read-only user
It is good practice to create a dedicated read-only user to browse the
database safely. But since the app itself creates its tables, the
postgres user is the owner, and a manual `GRANT` is required every time a
new table is added.
This PR makes it possible to specify an arbitrary username, which will be
granted read-only access to all tables in the `public` schema.
NB: The assumption here is that eclair is the only app using the
eclair database (in the `CREATE DATABASE eclair` sense), which I believe
is reasonable.
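Schematically, the grants boil down to something like this (the SQL is standard postgres; the surrounding helper is illustrative):
```scala
import java.sql.Connection

// Running this at every startup covers tables created by new releases,
// without any manual GRANT.
def grantReadOnlyAccess(pg: Connection, username: String): Unit = {
  val statement = pg.createStatement()
  statement.executeUpdate(s"GRANT USAGE ON SCHEMA public TO $username")
  statement.executeUpdate(s"GRANT SELECT ON ALL TABLES IN SCHEMA public TO $username")
  statement.close()
}
```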
* set timezone on lease table
This only affects newly created tables; there is no migration.
Users that are already using postgres will keep the previous column
type, which changes nothing for them.
* put back lock timeout on lease table
We use a timeout because, due to concurrency, we may not be able to
obtain the lock immediately.
The timeout has been set to its original value of 5s and made
configurable.
* obtain lock before initializing tables
* Add trampoline info to auditDB
Add a new table containing the recipient and the amount sent to them when
relaying trampoline payments.
When using trampoline, the recipient may not be the next node on the path.
The default is 360 minutes, which is very long.
Our builds usually take around 10 minutes, so 20 minutes is a good value
to prevent overbilling from builds that get stuck in infinite loops.
We don't want a database error to force-close channels.
Database errors are more likely to happen when using Postgres, but can
also happen with Sqlite, e.g. in case of a full disk.
Since we always write to disk before sending messages, we should be able
to recover gracefully after the db issue is fixed and eclair is
restarted.
The `ReconnectionTask` was only catching
`ConnectionResult.Failure.ConnectionFailed`, which is a subset of
possible failures. It should instead have caught
`ConnectionResult.Failure`.
As a result, authentication and initialization failures were not caught
and didn't trigger reconnections.
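The gist of the fix, with a simplified ADT mirroring the hierarchy described above:
```scala
sealed trait ConnectionResult
object ConnectionResult {
  case object Connected extends ConnectionResult
  sealed trait Failure extends ConnectionResult
  object Failure {
    case class ConnectionFailed(reason: String) extends Failure
    case object AuthenticationFailed extends Failure
    case object InitializationFailed extends Failure
  }
}

// Matching on the parent trait covers auth/init failures too.
def shouldReconnect(result: ConnectionResult): Boolean = result match {
  case _: ConnectionResult.Failure => true // previously: only Failure.ConnectionFailed
  case _ => false
}
```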
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
Introduce a `TxPublisher` actor to publish channel txs.
Move logic from watcher to this new actor.
Remove the `TxSigningKit` abstraction that was introduced a bit too early.
The `TxPublisher` will hold all the logic so we'll start by providing the full
commitments, and we'll extract more compact objects later.
We also now publish the commit-tx and its anchor-tx independently.
I'll add two of mine later, once the documentation is there and they're confirmed to work as intended.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* preserve pg lock exception cause
* specialize connections by backend type
* added concurrency test on channels table
This test unveils a concurrency issue in the upsert logic of the local
channels db, with the following error being thrown when we update many
channels concurrently:
```
Canceled on identification as a pivot, during conflict out checking
```
* use pg upsert construct
This is the recommended pattern according to the postgres docs
(https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE):
> It is recommended that applications use INSERT with ON CONFLICT DO
> UPDATE rather than actually using this pattern.
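The resulting statement looks roughly like this (column names are illustrative):
```scala
import java.sql.Connection

// A single atomic INSERT ... ON CONFLICT DO UPDATE replaces the racy
// select-then-insert-or-update logic.
def upsertChannel(pg: Connection, channelId: Array[Byte], data: Array[Byte]): Unit = {
  val sql = "INSERT INTO local_channels (channel_id, data) VALUES (?, ?) ON CONFLICT (channel_id) DO UPDATE SET data = EXCLUDED.data"
  val statement = pg.prepareStatement(sql)
  statement.setBytes(1, channelId)
  statement.setBytes(2, data)
  statement.executeUpdate()
  statement.close()
}
```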
* reproduce and fix same issue in peers db
* proper formatting
* prefix sqlite-specific tests
* fix double init tests
Due to how `TestDatabases` is instantiated, calling `dbs.dbName`
twice was a no-op.
* add jdbcurl files to gitignore
Electrum support was provided for mobile wallets; server nodes should always
run a bitcoind node, as this provides more control (especially for utxo
management for anchor outputs channels).
Since wallets will use https://github.com/acinq/eclair-kmp instead of eclair,
we can now remove Electrum support from eclair.
We also removed 3rd-party fee API providers that were only used on wallets.
We need to wait for the channel relayer to be registered to the event stream
before publishing an event, otherwise the channel relayer will never receive it.
We send it a first message and wait for its response to ensure it has had
time to correctly initialize.
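Conceptually (the probe message here is hypothetical, not the actual one used): because an actor processes messages in order, a reply to our probe proves that the event-stream subscription, which happens during initialization, has already been performed.
```scala
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

case object Ping // hypothetical probe, answered by the relayer once it is ready

def waitUntilRegistered(channelRelayer: ActorRef): Unit = {
  implicit val timeout: Timeout = Timeout(30.seconds)
  Await.result(channelRelayer ? Ping, timeout.duration) // blocks until the relayer replies
}
```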
More symmetry between postgres and sqlite init.
* define SqliteDatabases and PostgresDatabase
Those classes implement traits `Databases`, `FileBackup` and
`ExclusiveLock`.
The goal is to have access to backend-specific attributes, particularly
in tests. It arguably makes `Databases` cleaner and simpler, with a
nice symmetry between the `apply` methods.
* replace 5s lock timeout by NOLOCK
* use chaindir instead of datadir for jdbcurl file
It is more consistent with sqlite, and makes sense because we don't want
to mix up testnet and mainnet databases.
* add tests on locks and jdbc url check
Reverse the flow of Yen's k-shortest-paths algorithm: go backwards, like
we do in Dijkstra.
Better tracking of already explored spur paths improves performance
(especially tail latency).
An interesting side-effect of anchor outputs is that htlc txs can be merged
when they have the same lockTime (thanks to sighash flags).
We're not currently doing that, but our peers may do it, so we need to handle
it in the revoked commit tx case and correctly claim multiple outputs if
necessary.
Since anchor outputs, we deduct not only the commit tx fee from the funder's
main output, but the cost of the anchors as well.
We rename the function that does this for more clarity.
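In other words (a sketch with illustrative names; the anchor-outputs spec fixes each of the two anchors at 330 sat):
```scala
val AnchorAmountSat = 330L // per anchor output, as fixed by the anchor-outputs spec

// The funder's main output pays the commit tx fee *and* both anchors.
def commitTxTotalCost(commitTxFeeSat: Long, hasAnchorOutputs: Boolean): Long =
  if (hasAnchorOutputs) commitTxFeeSat + 2 * AnchorAmountSat else commitTxFeeSat
```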
It can be useful for newcomers to have a high-level view of the main
components in eclair. This will help them quickly find where they should
start digging into actual code to achieve what they want.
We're leaving a lot of details out to ensure this document stays up-to-date
and doesn't need to completely change every time we slightly rework internal
details of our architecture.
This commit clarifies some parts of the code on which we regularly have
questions during pull requests.
We also add a test for an edge case in shutdown that was correctly handled,
but not properly tested, to ensure non-regression.
Re-work the `CommitPublished` types to work better with anchor outputs.
We previously stored the txs spending utxos that we could claim: this
doesn't make sense anymore if these txs may be RBF-ed, because the final
tx will be different from the initial one.
We instead track what `OutPoint`s we can claim, and the information
necessary to claim them. This way we can in the future let a different
actor finalize the txs that spend these outpoints (set the fees and sign).
We also add information on mutual close txs to immediately identify our
output and its amount: this makes it very easy to audit from the API how
many sats we'll get back when we have many channels to watch.
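Heavily simplified, the new shape is along these lines (not the actual eclair definitions):
```scala
case class OutPoint(txid: String, index: Long)

// Everything needed to claim an outpoint (scripts, keys, preimages...) lives
// behind this trait, so fee-setting and signing can later move to a dedicated actor.
sealed trait ClaimInfo

case class CommitPublished(
  commitTxId: String,                      // the published commit tx
  claimable: Map[OutPoint, ClaimInfo],     // outpoints we can claim, and how
  irrevocablySpent: Map[OutPoint, String]) // outpoint -> txid of the confirmed spending tx
```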
This commit contains a DB migration of the channel data types, but in a
backwards-compatible way: we can still read from old data. The only
scenario impacted is channels that started force-closing before the migration.
They need special care to handle the fact that they had less data than
migrated channels, which is why we keep some legacy code around.
The goal is to ensure maximum safety when migrating data.
To achieve this, we strictly segregate each codec version (to make sure
that we don't accidentally mix codec versions), while still letting
tests access each unitary codec (which using inner `private` classes
would have prevented). Relevant tests only need to be moved to the
appropriate package.
The package structure is now:
```
wire
|-- internal
|   |-- channel
|   |   |-- version0
|   |   |   `-- ChannelCodecs0
|   |   |-- version1
|   |   |   `-- ChannelCodecs1
|   |   `-- ChannelCodecs
|   `-- CommandCodecs
`-- others
```
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We incorrectly applied error handlers at each sub-route instead of applying
them after grouping all sub-routes together. The result was that only
`getinfo` could actually be called.
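Schematically, with akka-http (handler and route names are illustrative):
```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{ExceptionHandler, RejectionHandler, Route}

val apiExceptionHandler = ExceptionHandler {
  case t: Throwable => complete(StatusCodes.InternalServerError, t.getMessage)
}

// Wrapping each sub-route seals it: an unmatched request gets a final response
// instead of falling through to the next sub-route. The handlers must instead
// wrap the concatenation of all sub-routes, once.
def finalRoutes(getInfo: Route, payInvoice: Route): Route =
  handleExceptions(apiExceptionHandler) {
    handleRejections(RejectionHandler.default) {
      getInfo ~ payInvoice
    }
  }
```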
* rename `Auditor` to `DbEventHandler`
Move it to the `db` package (it was in the `payments` package for
historical reasons but doesn't only deal with payments anymore).
Better typing for channel lifecycle events.
* add meta info to local_channels table
Here is the rationale for implementing channel metadata as additional
columns in the `local_channels` table of the `channels` db, as opposed
to a dedicated `channel_metadata` table in an `audit` db:
1) There is a migration to do (in the `local_channels` table, no less!),
but it's just a table migration, as opposed to the data migration we
would need if we had to populate a new table in a separate database.
2) We don't need to worry about creating a new metadata line when a new
channel is created (compared to doing add-or-update stuff). It's only
_updating_ optional columns in a best-effort manner.
3) We don't need to worry about inconsistencies between two tables
located in two separate databases (that's a big one).
4) We may want to use the metadata during operations, not just for
audit purposes. For example to close channels that have stayed unused
for a long time.
5) The audit db is an append-only log of events and shouldn't be used
for anything else. There is no `UPDATE` sql statement in
`*AuditDb.scala`; a `channel_metadata` table there would break that invariant.
When a node returns a TemporaryChannelFailure, we should ignore this channel
when retrying. In some cases (such as channels from routing hints) this was
not correctly handled.
Fixes #1725
When we receive a trampoline payment asking us to relay with an
expiry that is already below the current chain height, we know that
this payment will fail when it reaches the next trampoline node.
Instead of waiting for the next trampoline node to reject it, we reject
it immediately.
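The check itself is simple (simplified types):
```scala
// Compare the requested outgoing expiry with the current chain height before
// forwarding anything downstream.
def validateRelayExpiry(outgoingCltvExpiry: Long, currentBlockHeight: Long): Either[String, Unit] =
  if (outgoingCltvExpiry <= currentBlockHeight) Left("expiry already expired: fail immediately, don't forward")
  else Right(())
```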
Refactor the API handlers.
Split handlers and directives in several files to make them more composable.
Co-authored-by: Pierre-Marie Padiou <pm47@users.noreply.github.com>
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We previously relied on `context.child` to check whether we already had a
relay handler for a given payment_hash.
Unfortunately, this could return an actor that is currently stopping itself.
When that happens, our relay command can end up in the dead letters and the
payment will neither be relayed nor failed upstream.
We fix that by maintaining the list of current relay handlers in the
NodeRelayer and removing them from the list before stopping them.
This is similar to what's done in the MultiPartPaymentFSM.
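A minimal sketch of the pattern (simplified messages and types, not the actual NodeRelayer code):
```scala
import akka.actor.{Actor, ActorRef, Props}

case class Relay(paymentHash: String)
case class Done(paymentHash: String)

class RelayHandler extends Actor { def receive: Receive = { case _ => () } }

class NodeRelayer extends Actor {
  // Explicit bookkeeping instead of context.child, which may return a child
  // that is already stopping itself.
  var handlers = Map.empty[String, ActorRef]

  def receive: Receive = {
    case r @ Relay(paymentHash) =>
      val handler = handlers.getOrElse(paymentHash, {
        val h = context.actorOf(Props[RelayHandler]())
        handlers += (paymentHash -> h)
        h
      })
      handler forward r
    case Done(paymentHash) =>
      handlers.get(paymentHash).foreach { h =>
        handlers -= paymentHash // remove first, so no new message can reach it...
        context.stop(h)         // ...then stop: nothing ends up in dead letters
      }
  }
}
```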
It makes sense to allow node operators to configure the value they want
to use as a maximum threshold for the anchor outputs commitment tx feerate.
This lets them raise that value when mempools start getting full, in
anticipation of a potential rise of the min-relay-fee.
This value can also be overridden for specific nodes.