Fails liquid-regtest otherwise; liquid tends to hit the dust limit
earlier than non-liquid tx, and MPP exacerbates this by divvying up
payments into dusty bits then attempting to shove them through the same
channel, hitting the dust max. The MPP then fails as not all the parts
were able to arrive at their destination.
Let's make this a softer launch by just warning on the channel til the
feerates go back down.
You can also 'fix' this by upping your dust limit with
the `max-dust-htlc-exposure-msat` config.
for every new added htlc, check that adding it won't go over our 'dust
budget' (which assumes a slightly higher than current feerate, as this
prevents sudden feerate changes from overshooting our dust budget)
note that if the feerate changes surpass the limits we've set, we
immediately fail the channel.
If we're over the dust limit, we fail it immediatey *after* commiting
it, but we need a way to signal this throughout the lifecycle, so we add
it to htlc_in struct and persist it through to the database.
If it's supposed to be failed, we fail after the commit cycle is
completed.
To reduce the surface area of amount of a channel balance that can be
eaten up as htlc dust, we introduce a new config
'--max-dust-htlc-exposure-msat', which sets the max amount that any
channel's balance can be added as dust
Changelog-Added: config: new option --max-dust-htlc-exposure-msat, which limits the total amount of sats to be allowed as dust on a channel
Changelog-Changed: pay: The route selection will now use the log-propability-based channel selection to increase success rate and reduce time to completion
We bias by channel linearly by capacity, scaled by median fee.
This means that we effectively double the fee if we would use the
entire capacity, and only increase it by 50% if we would only use
1/2 the capacity.
This should drive us towards larger channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `pay` now biases towards larger channels, improving success probability.
Fixes: #4868
ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`. We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
This also inadvertently fixes a latent bug: before this patch, in the
`subd` function in `lightningd/subd.c`, we would close `execfail[1]`
*before* doing an `exec`.
We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the
fd is marked CLOEXEC), and otherwise use it to pump `errno` to the
parent.
The intent is that this fd should be kept open until `exec`, at which
point CLOEXEC triggers and close that fd and sends the EOF, *or* if
`exec` fails we can send the `errno` to the parent process vua that
pipe-end.
However, in the previous version, we end up closing that fd *before*
reaching `exec`, either in the loop which `dup2`s passed-in fds (by
overwriting `execfail[1]` with a `dup2`) or in the "close everything"
loop, which does not guard against `execfail[1]`, only
`dev_disconnect_fd`.
WebSocket is a bit weird:
1. It starts like an HTTP connection, but they send special headers.
2. We reply with special headers, one of which involves SHA1 of one of theirs.
3. We are then in WebSocket mode, where each frame starts with a 2-20 byte
header.
We relay data in a simplistic way: if either side sends something, we
read it and relay it synchronously. That avoids any gratuitous
buffering.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the port is set, we spawn it (lightning_websocketd) on any
connection to that port. That means websocketd is a per-peer daemon,
but it means every other daemon uses the connection normally (it's
just actually talking to websocketd instead of the client directly).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can have an update pending because it's too fast, but
refresh_local_channel is supposed to make sure we're up-to-date, so
force immediate application in that case.
Otherwise, we call update_local_channel at the bottom which frees the
pending update. This can mean that we miss a change in fees, for example.
Changelog-Fixed: errors: Errors returning a `channel_update` no longer return an outdated one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Closes: #4860
ChangeLog-Added: With `sqlite3` db backend we now use a 60-second busy timer, to allow backup processes like `litestream` to operate safely.
```
[gw1] [ 98%] PASSED tests/test_wallet.py::test_hsmtool_dump_descriptors
tests/test_wallet.py::test_fundchannel_listtransaction
[gw0] [ 98%] PASSED tests/test_plugin.py::test_channel_opened_notification
tests/test_wallet.py::test_hsmtool_generatehsm
[gw0] [ 98%] PASSED tests/test_wallet.py::test_hsmtool_generatehsm
tests/test_wallet.py::test_withdraw_nlocktime_fuzz
[gw1] [ 98%] ERROR tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_withdraw_nlocktime_fuzz
tests/test_wallet.py::test_fundchannel_listtransaction
[gw0] [ 99%] ERROR tests/test_wallet.py::test_withdraw_nlocktime_fuzz
tests/test_wallet.py::test_multiwithdraw_simple
[gw1] [ 99%] ERROR tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_withdraw_nlocktime
tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_withdraw_nlocktime
tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_withdraw_nlocktime
[gw0] [ 99%] ERROR tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_repro_4258
[gw1] [ 99%] ERROR tests/test_wallet.py::test_withdraw_nlocktime
...
2021-10-12 06:36:09.203 UTC [224552] STATEMENT: SELECT version FROM version LIMIT 1
2021-10-12 06:36:09.566 UTC [224523] PANIC: could not write to file "pg_wal/xlogtemp.224523": No space left on device
2021-10-12 06:36:09.566 UTC [224523] STATEMENT: VACUUM FULL;
Error vacuuming db: BEGIN command failed: PANIC: could not write to file "pg_wal/xlogtemp.224523": No space left on device
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
```
This makes init a two-stage, and causes some code hoisting.
And we can now send all the HTLCs in a single message, since we have
an 128MB limit and each HTLC is 37 bytes.
This breaks the onchaind stresstest, which uses canned internal messages.
It's time to finally delete that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is particularly useful after our recent field deletion:
before: 362,573,824 bytes
after: 124,190,720 bytes
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: db: removal of old HTLC information and vacuuming shrinks large lightningd.sqlite3 by a factor of 2-3.
And initialize max to current height max when htlcs are already dead.
Turns out (thanks CI!) that MAX() of multiple columns is GREATEST() in
Postgres. That's clearer (MAX is used elsewhere for single columns),
so translate on the sqlite3 side.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>