* Implement option-upfront-shutdown-script
* Do not activate option_upfront_shutdown_Script by defaut
Users will need to explicitly activate it.
* Send back a warning when we receive an invalid shutdown script
This is a simpler approach to completely parallelizing the handling of
payments, where we simply parallelize the fetch from the database.
This brings a ~30% performance improvement in performance in `PerformanceIntegrationSpec`.
Our test suite is putting a lot of strain on our CI machines, and we start
hitting some timeouts. There were places where we would put a 60 seconds
timeout on an `awaitCond` but inside we'd still use the default 15 seconds
timeout.
Delegate the payment request generation, signature and db write to a short-lived child actor.
There is small (~5%) gain in performance in `PerformanceIntegrationSpec` because what matters is the db write, and it is not parallelized in WAL mode.
Fail outgoing _unsigned_ htlcs on force close, just like we do when disconnected.
Fixes#1829.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
This test was added to lightning-kmp where this case wasn't correctly
handled: https://github.com/ACINQ/lightning-kmp/pull/278
It is correctly handled by eclair, but an additional test doesn't hurt.
[Write-Ahead Logging](https://sqlite.org/wal.html) is both much more performant in general, and more suited to our particular access patterns.
With a simple throughput performance test, it improves performance by a factor of 5-20x depending on the sync flag.
version | throughput
-------------------------------|-------------
mode=journal sync=normal (*)| 11 htlc/s
mode=journal sync=full| 7 htlc/s
mode=wal sync=normal| 248 htlc/s
mode=wal sync=full (**)| 62 htlc/s
(*) previous setting
(**) new setting
I went with a conservative new setting of wal+full sync, which is both 5x more performant, and more secure than what we had before.
> In WAL mode when synchronous is NORMAL (1), the WAL file is synchronized before each checkpoint and the database file is synchronized after each completed checkpoint and the WAL file header is synchronized when a WAL file begins to be reused after a checkpoint, but no sync operations occur during most transactions. With synchronous=FULL in WAL mode, an additional sync operation of the WAL file happens after each transaction commit. The extra WAL sync following each transaction help ensure that transactions are durable across a power loss. Transactions are consistent with or without the extra syncs provided by synchronous=FULL. If durability is not a concern, then synchronous=NORMAL is normally all one needs in WAL mode.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
Move balance check test to their own file instead of
adding bloat to the NormalStateSpec tests.
Remove unnecessary parts that belonged to the
NormalStateSpec test.
There are three otherwise unrelated changes, that we group together to only have one migration:
- remove local signatures for local commitments (this PR)
- separate internal channel config from channel features (#1848)
- upfront shutdown script (#1846)
We increase database version number in sqlite and postgres to force a full data migration.
The goal of removing local signatures from the channel data is that even if the node database or
a backup is compromised, the attacker won't be able to force close channels from the outside.
I was able to reproduce #1856 by replaying the "concurrent channel
updates" test with hardcoded additional delays in the database code. It
was due to a conflict between `addOrUpdateChannel` and
`updateChannelMetaTimestampColumn`. The two calls run in parallel and
the latter completed before the former, causing it to fail. Reducing
the isolation level makes the problem disappear.
We reduce the transaction isolation level from `SERIALIZABLE` to
`READ_COMMITTED`. Note that [*]:
> Read Committed is the default isolation level in PostgreSQL.
I'm not sure why we were using a stricter isolation level than the
default one, since we only use very basic queries. Doc does say that:
> This behavior makes Read Committed mode unsuitable for commands that involve complex search conditions; however, it is just right for simpler cases
To make sure this didn't cause regression withe the locking
mechanism, I wrote an additional test specifically on the `withLock`
method.
Here is what the doc says on the `INSERT ON CONFLICT DO UPDATE`
statement, which we use for `addOrUpdateChannel`:
> INSERT with an ON CONFLICT DO UPDATE clause behaves similarly. In Read Committed mode, each row proposed for insertion will either insert or update. Unless there are unrelated errors, one of those two outcomes is guaranteed. If a conflict originates in another transaction whose effects are not yet visible to the INSERT, the UPDATE clause will affect that row, even though possibly no version of that row is conventionally visible to the command.
In the scenario described above, the `addOrUpdate` will update the row
which timestamp was updated in parallel by the
`updateChannelMetaTimestampColumn`, and it's exactly what we want.
Fixes#1856.
[*] https://www.postgresql.org/docs/13/transaction-iso.html
Instead of having a flat organization under the default `public` schema, we classify tables in schemas. There is roughly one schema per database type.
The new hierarchy is:
- `local`
- `channels`
- `htlc_infos`
- `pending_settlement_commands`
- `peers`
- `network`
- `nodes`
- `public_channels`
- `pruned_channels`
- `payments`
- `received`
- `sent`
- `audit`
- (all the audit db tables)
- `public`
- `lease`
- `versions`
Note in particular, the change in naming for local channels vs external channels:
- `local_channels` -> `local.channels`
- `channels` -> `network.public_channels`
The two internal tables `lease` and `versions` stay in the `public`
schema, because we have no meta way of migrating them.
A json column has been added to the few tables that contains an
opaque serialized blob:
- `local_channels.data`
- `nodes.data`
- `channels.channel_announcement`, `channels.channel_update_x`
We can now access all the individual data fields from SQL.
For the serialization, we use the same serializers than the one
that were previously used by the API. They have been moved to the
`eclair-core` module and simplified a bit.
There are two json data types in Postgres: `JSON` and `JSONB`. We use
the latter one, which is more recent, and allows indexing.
An alternative to this PR would have been to use columns, but:
- there would have been a *lot* of columns for the channel data
- every modification of our types would have required a db migration
NB: to handle non-backwards compatible changes in the json serializersi,
all the json columns can be recomputed on restart by setting
`eclair.db.reset-json-columns=true`.
Change in in ChannelCodecsSpec:
The goal of this test is to make sure that, in addition to successfully
decoding data that encoded with an older codec, we actually read the
correct data. Just because there is no error doesn't mean that we
interpreted the data properly. For example we could invert a
`payment_hash` and a `payment_preimage`.
We can't compare object to object, because the current version of the
class has probably changed too. That's why we compare using the json
representation of the data, that we amend to ignore new or modified
fields.
After doing a manual comparison, I updated the test to use the current
json serializers, and replaced the test data with the latest json
serialization. This allows us to remove all the tweaks that we added
over time to take into account new and updated fields.
It returns an overall balance, separating onchain, offchain, and
removing duplicates (e.g. mutual closes that haven't reached min depth
still have an associated channel, but they already appear in the
on-chain balance). We also take into account known preimages, even if
the htlc hasn't been formally resolved.
Metrics have also been added.
Co-authored-by: Bastien Teinturier <31281497+t-bast@users.noreply.github.com>
We still use sqlite as the primary db, but all calls are replicated
asynchronously on postgres.
The goal is to prepare a smooth transition from sqlite to postgres
on a production server. This is a very specific use case and most users
shouldn't use it, which is why the new config `eclair.db.driver=dual` is
not documented.
Trampoline payments used to ignore the fee and cltv set for the local channel and use a global default value instead. We now use the correct fee and cltv for the specific local channel that we take.
It doesn't make any sense to forward empty payments.
This is also checked when adding the htlcs in the outgoing channel, but
we should fail early here.
Splt the TxPublisher in many smaller actors with clear responsibilities.
Each tx publishing attempt is its own actor and watches the tx until it
either confirms or becomes evicted, and reports the result to its parent.
The TxPublisher (one per channel) orchestrates publishing attempts and
will in the future decide to RBF txs based on deadline information.
The `payment_secret` feature was made mandatory in #1810 and is the default
in other implementations as well. We can thus force it to be available when
decoding onion payloads, which simplifies downstream components (no need
to handle the case where a `payment_secret` may be missing anymore).
We also rename messages in `PaymentInitiator` to remove the confusion with
Bolt 11 payment requests.
There are no functional changes, but bitcoin-lib 0.19 is based on secp256k1-kmp (instead of our own fork of secp256k1's JNI wrapper) which is cleaner, easier to maintain and used in our mobile apps.
With the move to akka _typed_, we will be using more and more
scalatest's `eventually` as a replacement for akka's `awaitCond`
(which isn't available in `testkit.typed`).
But there is a catch:
- `awaitCond` expects a boolean
- `eventually` expects a non-failure
Which means that we must use `eventually(assert(cond))`, and not
`eventually(cond)`.
Make `Setup.datadir` visible to code that receives an instance of
`Setup`. This allows plugin to know where the eclair data directory
is and potentially enrich it.
The spec defines `max_accepted_htlcs` and `max_htlc_value_in_flight_msat`
to let nodes reduce their exposure to pending HTLCs. This only applies to
received HTLCs, and we use the remote peer's values for outgoing HTLCs.
But when we're more restrictive than our peer, it makes sense to apply our
limits to outgoing HTLCs as well.
This allows us to use the full power of scala collections, to iterate
over results, convert to options, etc. while staying purely functional
and immutable.
There is a catch though: the iterator is lazy, it must be materialized
before the result set is closed, by converting the end result in a
collection or an option. In other words, database methods must never
return an `Iterable` or `Iterator`.
* Reduce log level for explorer API errors
* Reduce log level for remote peer invalid open_channel
* Don't send duplicate commands in PostRestartHtlcCleaner: if there
is already a pending HTLC settlement command in the DB, the post
restart handler should let the channel replay it instead of sending
a conflicting command.
* Workaround for lnd bug in reestablish: sometimes lnd sends
announcement_signatures before sending their channel reestablish.
This is a minor spec violation, we can simply delay the message and
handle it later (hopefully once we've received their reestablish).
* Log shared secrets in Sphinx error: Breez sometimes returns errors
that we fail to parse. Unfortunately we didn't correctly log the shared
secrets because the variable was shadowed, so we can't investigate
further for now.
* Fix utxo metric checks: if we're unable to fetch the number of
unconfirmed parents for a utxo, this shouldn't cause the global utxo
check to fail. We log a warning and let operations continue to ensure
the metric is updated.
* Handle ChannelIdAssigned when disconnected: there may be a race
condition where a peer disconnect in the middle of a channel id assignment.
In that case, we still want to record the up-to-date mapping.
Naming was confusing because it led to believe messages were related to
htlcs that have not yet been relayed, whereas those are settlement
messages, meaning that htlcs have relayed and are pending resolution
upstream.
The database has been renamed to a more generic `PendingCommandsDb`
because we may store other types of commands for which we need reliable
delivery.
Plugins can extend the `RouteProvider` trait to enrich the API with
custom calls, removing the need to setup a separate endpoint on a
different port.
When routes clash between plugins, the second one is simply ignored.
Plugin developers should prepend their route with their plugin name
to avoid such silent clashes.
In case of catastrophic failures of the `SecureRandom` instance, we add
a secondary randomness source that we mix into the random stream.
This is a somewhat weak random source and should not be used on its own,
but it doesn't hurt to xor it with the output of `SecureRandom`.
We use an actor that listens to events in the system and inject them
in our weak pseudo-RNG.
One of ZMQ's drawbacks is that subscribers on an unreliable network may
silently disconnect from publishers in case of network failures.
In our case, we want to reconnect immediately when that happens, so we set
a tcp keep-alive to ensure this.
Fixes#1789