For path finding we always need `routeParams`, yet all of the messages asking to find a route have `routeParams` as an optional parameter. This makes it hard to enforce that a given payment uses a given `routeParams`, as there may be a `None` somewhere that reverts to the global defaults.
It seems that the reason all the `routeParams` are optional is that the messages asking to find a route are sometimes used when we already have a route. This is a hacky solution, so I've tried to replace these messages with traits that work both when we want to find a route and when we already have one.
I'm trying to enable A/B testing, which requires using different `routeParams` for different payments, and this is a blocker.
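A minimal sketch of the idea, with hypothetical names (not eclair's actual types):

```scala
// Hypothetical types, for illustration only.
case class RouteParams(maxFeeBaseMsat: Long, maxFeePct: Double)
case class Route(nodeIds: Seq[String])

// Instead of an Option[RouteParams] on every message, the requirement is explicit:
sealed trait RouteRequirement
// We still need to find a route, so routeParams is mandatory, not optional.
case class FindRoute(params: RouteParams) extends RouteRequirement
// We already have a route, so no routeParams is needed at all.
case class PredefinedRoute(route: Route) extends RouteRequirement
```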
* Disable ZMQ high watermark
This should prevent messages from being dropped.
We also configure the socket before subscribing to topics and connecting.
* Switch ZMQ raw block to block hash only
We were receiving raw blocks on the ZMQ socket, but we don't use their contents.
We only use this event as a trigger that a new block has been found, so we
can save bandwidth by switching to block hash subscriptions instead.
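A minimal sketch combining both ZMQ changes, assuming jeromq (the exact socket API and the endpoint are illustrative):

```scala
import org.zeromq.{SocketType, ZContext, ZMQ}

val ctx = new ZContext()
val socket = ctx.createSocket(SocketType.SUB)
// Disable the high watermark so messages are never silently dropped.
socket.setHWM(0)
// Configure and subscribe before connecting.
// Subscribe to block hashes only: we just need a "new block" trigger,
// so there is no point in receiving full raw blocks.
socket.subscribe("hashblock".getBytes(ZMQ.CHARSET))
socket.connect("tcp://127.0.0.1:29000")
```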
* Regularly check blocks
In case we haven't received block events on the ZMQ socket, we regularly
check whether we're behind.
There's a very annoying race condition in that test: on slow machines we can
end up emitting events twice. This isn't an issue in practice, so we stop
verifying that duplicate events are avoided.
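A minimal sketch of the polling fallback, with hypothetical callbacks standing in for the actual bitcoind RPC and watcher logic:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Poll the node's height every few minutes and catch up if we're behind,
// in case ZMQ block events were missed.
def pollBlocks(remoteHeight: () => Long, localHeight: () => Long, catchUp: Long => Unit): Unit = {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleWithFixedDelay(() => {
    val remote = remoteHeight()
    if (remote > localHeight()) catchUp(remote)
  }, 2, 2, TimeUnit.MINUTES)
}
```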
* Never retry anchor publish when commit confirmed
We used to check the feerate before checking whether the commit tx was
already confirmed: when the commit feerate was good enough, we would
respawn a publish actor every block, even though the commit tx was already
confirmed.
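A sketch of the reordered decision, with illustrative names (not the actual publish actor's code):

```scala
sealed trait Decision
case object Stop extends Decision            // never retry: nothing left to fee-bump
case object RetryNextBlock extends Decision  // feerate ok for now, re-evaluate later
case object PublishAnchor extends Decision   // CPFP the commit tx through its anchor

// Check confirmation first: once the commit tx is confirmed, we must never
// respawn a publish attempt, whatever the feerate says.
def onNewBlock(commitConfirmed: Boolean, commitFeerateOk: Boolean): Decision =
  if (commitConfirmed) Stop
  else if (commitFeerateOk) RetryNextBlock
  else PublishAnchor
```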
* Abandon evicted replaceable txs
This ensures the bitcoind wallet won't keep the evicted transaction around,
which would needlessly prevent its inputs from being used in other transactions.
Fixes #1898
Scale MPP partial amounts based on the total amount we want to
send and the number of parts we allow.
This avoids polluting the results with cheap routes that don't have enough
capacity to relay their share when the total amount is big.
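The idea, as a hedged sketch (names are illustrative):

```scala
// Channels that cannot carry a meaningful share of the total amount only
// pollute the search results, so they are filtered out of the graph search.
case class Edge(capacityMsat: Long)

def usableForMpp(edge: Edge, totalAmountMsat: Long, maxParts: Int): Boolean =
  // On average each part must carry at least totalAmount / maxParts.
  edge.capacityMsat >= totalAmountMsat / maxParts
```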
- Add a base weight ratio, which allows giving more weight to the fee itself. Setting the base to 1 is equivalent to not using weight ratios at all.
- Add checks that the configuration values make sense (all weights should be nonnegative and sum to 1). Checks are performed when loading the configuration.
- Add a virtual cost per hop which allows prioritizing short paths. The cost per hop has a base component and a proportional one, similar to channel fees.
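A sketch of how these pieces could fit together; the field names and checks mirror the description above but are not eclair's exact implementation:

```scala
// All weights must be non-negative and sum to 1; this is checked when the
// configuration is loaded. With base = 1 the cost is the fee alone, i.e.
// equivalent to not using weight ratios.
case class WeightRatios(base: Double, cltv: Double, age: Double, capacity: Double) {
  require(base >= 0 && cltv >= 0 && age >= 0 && capacity >= 0, "weights must be non-negative")
  require(math.abs(base + cltv + age + capacity - 1.0) < 1e-6, "weights must sum to 1")
}

// Virtual cost per hop, with a base and a proportional component, similar to
// how channel fees are computed. Shorter paths accumulate less of this cost.
case class HopCost(baseMsat: Long, proportionalMillionths: Long) {
  def costFor(amountMsat: Long): Long =
    baseMsat + amountMsat * proportionalMillionths / 1000000
}
```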
We didn't handle shutdown messages in the negotiating state, even though they
may be received upon reconnection. This created unnecessary warnings in the
logs.
We previously made a single payment attempt per trampoline fee.
Since our channel selection for the first attempt is deterministic, if we
encountered a local failure with that channel, retries with higher
trampoline fees would hit the exact same error, when we should instead
try a different channel.
* Set relay fees per node and save them to database
- Fees are set per node instead of per channel (setting different fees for different channels to the same node is most probably an error)
- Fees are saved to the database so that we keep a record of historical fees, and so that new channels with a known node use the fees we previously set rather than the default fees.
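A hedged sketch of the lookup (an in-memory map stands in for the actual database):

```scala
case class RelayFees(feeBaseMsat: Long, feeProportionalMillionths: Long)

// Fees are keyed by remote nodeId instead of channelId.
val feesDb = scala.collection.mutable.Map.empty[String, RelayFees]

def feesFor(nodeId: String, default: RelayFees): RelayFees =
  // A new channel with a known node reuses the fees we previously set for
  // that node instead of falling back to the global default.
  feesDb.getOrElse(nodeId, default)
```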
HTLCs are stored in a `Set`, and doing `htlcs.map(_.amount).sum` will
deduplicate identical amounts, resulting in an erroneous balance
calculation. The `Set` must be converted to a `List` first.
Identical htlc amounts are quite frequent due to splitting for AMP.
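The pitfall in a nutshell:

```scala
case class Htlc(id: Long, amountMsat: Long)

val htlcs = Set(Htlc(1, 10000L), Htlc(2, 10000L))

// Mapping over a Set deduplicates equal results: this sums to 10000, not 20000.
val wrong = htlcs.map(_.amountMsat).sum

// Converting to a List first preserves duplicate amounts: 20000, as expected.
val right = htlcs.toList.map(_.amountMsat).sum
```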
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
* Implement option-upfront-shutdown-script
* Do not activate option_upfront_shutdown_script by default
Users will need to explicitly activate it.
* Send back a warning when we receive an invalid shutdown script
This is a simpler approach than fully parallelizing the handling of
payments: we simply parallelize the fetch from the database.
This brings a ~30% performance improvement in `PerformanceIntegrationSpec`.
Our test suite is putting a lot of strain on our CI machines, and we are
starting to hit timeouts. There were places where we would put a 60-second
timeout on an `awaitCond`, but inside we'd still use the default 15-second
timeout.
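An illustrative akka-testkit sketch (the probe, actor, and message are hypothetical): the inner expectation must be given the outer budget explicitly, otherwise it fails after its own shorter default timeout long before the 60 seconds granted to `awaitCond` are exhausted.

```scala
import akka.actor.ActorRef
import akka.testkit.TestProbe
import scala.concurrent.duration._

def waitForChannels(probe: TestProbe, register: ActorRef): Unit =
  probe.awaitCond({
    probe.send(register, Symbol("channels"))
    // Pass the outer timeout down instead of relying on the default.
    probe.expectMsgType[Map[_, _]](max = 60.seconds).nonEmpty
  }, max = 60.seconds)
```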
Delegate the payment request generation, signature and db write to a short-lived child actor.
There is a small (~5%) performance gain in `PerformanceIntegrationSpec`, because what matters is the db write, and it is not parallelized in WAL mode.
Fail outgoing _unsigned_ htlcs on force close, just like we do when disconnected.
Fixes #1829.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
This test was added to lightning-kmp where this case wasn't correctly
handled: https://github.com/ACINQ/lightning-kmp/pull/278
It is correctly handled by eclair, but an additional test doesn't hurt.
[Write-Ahead Logging](https://sqlite.org/wal.html) is both much more performant in general and better suited to our particular access patterns.
In a simple throughput test, it improves performance by a factor of 5-20x depending on the sync flag.
version | throughput
--------|-----------
mode=journal sync=normal (*) | 11 htlc/s
mode=journal sync=full | 7 htlc/s
mode=wal sync=normal | 248 htlc/s
mode=wal sync=full (**) | 62 htlc/s
(*) previous setting
(**) new setting
I went with a conservative new setting of wal+full sync, which is both 5x more performant and safer than what we had before.
> In WAL mode when synchronous is NORMAL (1), the WAL file is synchronized before each checkpoint and the database file is synchronized after each completed checkpoint and the WAL file header is synchronized when a WAL file begins to be reused after a checkpoint, but no sync operations occur during most transactions. With synchronous=FULL in WAL mode, an additional sync operation of the WAL file happens after each transaction commit. The extra WAL sync following each transaction help ensure that transactions are durable across a power loss. Transactions are consistent with or without the extra syncs provided by synchronous=FULL. If durability is not a concern, then synchronous=NORMAL is normally all one needs in WAL mode.
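A minimal sketch of the new setting, assuming a plain sqlite JDBC connection:

```scala
import java.sql.DriverManager

val connection = DriverManager.getConnection("jdbc:sqlite:eclair.sqlite")
val statement = connection.createStatement()
// Write-Ahead Logging: much better throughput for our access patterns.
statement.execute("PRAGMA journal_mode=WAL")
// Keep synchronous=FULL so transactions stay durable across a power loss
// (see the SQLite documentation quoted above).
statement.execute("PRAGMA synchronous=FULL")
```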
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
Move balance check tests to their own file instead of adding bloat to the
NormalStateSpec tests. Remove unnecessary parts that belonged to the
NormalStateSpec tests.
There are three otherwise unrelated changes that we group together to only have one migration:
- remove local signatures for local commitments (this PR)
- separate internal channel config from channel features (#1848)
- upfront shutdown script (#1846)
We increase the database version number in sqlite and postgres to force a full data migration.
The goal of removing local signatures from the channel data is that even if the node database or
a backup is compromised, the attacker won't be able to force close channels from the outside.
I was able to reproduce #1856 by replaying the "concurrent channel
updates" test with hardcoded additional delays in the database code. It
was due to a conflict between `addOrUpdateChannel` and
`updateChannelMetaTimestampColumn`. The two calls run in parallel and
the latter completed before the former, causing it to fail. Reducing
the isolation level makes the problem disappear.
We reduce the transaction isolation level from `SERIALIZABLE` to
`READ_COMMITTED`. Note that [*]:
> Read Committed is the default isolation level in PostgreSQL.
I'm not sure why we were using a stricter isolation level than the
default one, since we only use very basic queries. The documentation does say:
> This behavior makes Read Committed mode unsuitable for commands that involve complex search conditions; however, it is just right for simpler cases
To make sure this didn't cause a regression with the locking
mechanism, I wrote an additional test specifically for the `withLock`
method.
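For reference, a minimal sketch of lowering the isolation level on a plain JDBC connection (the connection string is illustrative):

```scala
import java.sql.{Connection, DriverManager}

val pg: Connection = DriverManager.getConnection("jdbc:postgresql://localhost/eclair")
// READ_COMMITTED is also PostgreSQL's default isolation level.
pg.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
```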
Here is what the doc says on the `INSERT ON CONFLICT DO UPDATE`
statement, which we use for `addOrUpdateChannel`:
> INSERT with an ON CONFLICT DO UPDATE clause behaves similarly. In Read Committed mode, each row proposed for insertion will either insert or update. Unless there are unrelated errors, one of those two outcomes is guaranteed. If a conflict originates in another transaction whose effects are not yet visible to the INSERT, the UPDATE clause will affect that row, even though possibly no version of that row is conventionally visible to the command.
In the scenario described above, `addOrUpdateChannel` will update the row
whose timestamp was updated in parallel by
`updateChannelMetaTimestampColumn`, which is exactly what we want.
Fixes #1856.
[*] https://www.postgresql.org/docs/13/transaction-iso.html
Instead of having a flat organization under the default `public` schema, we classify tables in schemas. There is roughly one schema per database type.
The new hierarchy is:
- `local`
  - `channels`
  - `htlc_infos`
  - `pending_settlement_commands`
  - `peers`
- `network`
  - `nodes`
  - `public_channels`
  - `pruned_channels`
- `payments`
  - `received`
  - `sent`
- `audit`
  - (all the audit db tables)
- `public`
  - `lease`
  - `versions`
Note in particular, the change in naming for local channels vs external channels:
- `local_channels` -> `local.channels`
- `channels` -> `network.public_channels`
The two internal tables `lease` and `versions` stay in the `public`
schema, because we have no meta way of migrating them.
A json column has been added to the few tables that contain an
opaque serialized blob:
- `local_channels.data`
- `nodes.data`
- `channels.channel_announcement`, `channels.channel_update_x`
We can now access all the individual data fields from SQL.
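As an illustration, a query on the new jsonb column (the column and field names here are assumptions, not necessarily the exact ones):

```scala
import java.sql.DriverManager

val pg = DriverManager.getConnection("jdbc:postgresql://localhost/eclair")
// The ->> operator extracts a jsonb field as text.
val rs = pg.createStatement().executeQuery(
  "SELECT data_json->>'shortChannelId' FROM local.channels")
while (rs.next()) println(rs.getString(1))
```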
For the serialization, we use the same serializers as the ones
that were previously used by the API. They have been moved to the
`eclair-core` module and simplified a bit.
There are two json data types in Postgres: `JSON` and `JSONB`. We use
the latter, which is more recent and supports indexing.
An alternative to this PR would have been to use columns, but:
- there would have been a *lot* of columns for the channel data
- every modification of our types would have required a db migration
NB: to handle non-backwards compatible changes in the json serializers,
all the json columns can be recomputed on restart by setting
`eclair.db.reset-json-columns=true`.
Change in ChannelCodecsSpec:
The goal of this test is to make sure that, in addition to successfully
decoding data that was encoded with an older codec, we actually read the
correct data. Just because there is no error doesn't mean that we
interpreted the data properly. For example, we could swap a
`payment_hash` and a `payment_preimage`.
We can't compare object to object, because the current version of the
class has probably changed too. That's why we compare using the json
representation of the data, which we amend to ignore new or modified
fields.
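A hedged illustration of the technique with json4s (the field names are made up):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Compare the decoded data through its json representation, amended to
// ignore fields that did not exist when the test data was encoded.
val decoded = parse("""{"paymentHash":"aa","newField":1}""")
val reference = parse("""{"paymentHash":"aa"}""")
val amended = decoded.removeField { case (name, _) => name == "newField" }
assert(amended == reference)
```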
After doing a manual comparison, I updated the test to use the current
json serializers, and replaced the test data with the latest json
serialization. This allows us to remove all the tweaks that we added
over time to take into account new and updated fields.
It returns an overall balance, separating on-chain and off-chain amounts,
and removing duplicates (e.g. mutual closes that haven't reached min depth
still have an associated channel, but they already appear in the
on-chain balance). We also take into account known preimages, even if
the htlc hasn't been formally resolved.
Metrics have also been added.
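A hypothetical sketch of the de-duplication idea (not the actual balance computation):

```scala
// A channel whose mutual-close tx has been published but not yet deeply
// confirmed is already counted in the on-chain (unconfirmed) balance, so
// it must not be counted again on the off-chain side.
case class ChannelBalance(toLocalMsat: Long, mutualClosePublished: Boolean)

def offChainMsat(channels: Seq[ChannelBalance]): Long =
  channels.filterNot(_.mutualClosePublished).map(_.toLocalMsat).sum
```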
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We still use sqlite as the primary db, but all calls are replicated
asynchronously to postgres.
The goal is to prepare a smooth transition from sqlite to postgres
on a production server. This is a very specific use case and most users
shouldn't use it, which is why the new config `eclair.db.driver=dual` is
not documented.
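A hypothetical sketch of the pattern (eclair's actual db interfaces are richer):

```scala
import scala.concurrent.{ExecutionContext, Future}

trait ChannelsDb { def addOrUpdateChannel(data: Array[Byte]): Unit }

// sqlite stays the authoritative, synchronous primary; postgres is a
// best-effort asynchronous replica.
class DualChannelsDb(sqlite: ChannelsDb, postgres: ChannelsDb)(implicit ec: ExecutionContext) extends ChannelsDb {
  override def addOrUpdateChannel(data: Array[Byte]): Unit = {
    sqlite.addOrUpdateChannel(data)           // primary, synchronous
    Future(postgres.addOrUpdateChannel(data)) // replica, asynchronous
  }
}
```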
Trampoline payments used to ignore the fee and cltv set for the local channel and use a global default value instead. We now use the correct fee and cltv for the specific local channel that we take.
It doesn't make any sense to forward empty payments.
This is also checked when adding htlcs to the outgoing channel, but
we should fail early here.
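The early check boils down to something like this (names are hypothetical):

```scala
def validateRelay(amountToForwardMsat: Long): Either[String, Long] =
  if (amountToForwardMsat <= 0) Left("cannot relay an empty payment")
  else Right(amountToForwardMsat)
```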