Fixes: #4868
ChangeLog-Fixed: We no longer self-limit the number of file descriptors (which limits the number of channels) on sufficiently modern systems, or where we can access `/proc` or `/dev/fd`. We still self-limit on old systems where we cannot list open files via `/proc` or `/dev/fd`, so if you need more than ~4000 channels, upgrade or mount `/proc`.
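As a rough illustration of listing open files via `/proc` or `/dev/fd`
(a sketch only, not the code in this patch): `list_open_fds` is a
hypothetical helper, and `/proc/self/fd` is assumed as the concrete
path under `/proc`.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fill `fds` with up to `max` open fd numbers by
 * listing /proc/self/fd, falling back to /dev/fd.  Returns the number
 * found, or -1 if neither directory can be opened (the caller would
 * then fall back to looping up to some self-imposed limit). */
static int list_open_fds(int *fds, int max)
{
	static const char *const dirs[] = { "/proc/self/fd", "/dev/fd" };
	DIR *d = NULL;
	struct dirent *e;
	int n = 0;

	for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]) && !d; i++)
		d = opendir(dirs[i]);
	if (!d)
		return -1;

	while (n < max && (e = readdir(d)) != NULL) {
		char *end;
		long fd = strtol(e->d_name, &end, 10);

		/* Skip "." and ".." (and anything else non-numeric). */
		if (end == e->d_name || *end != '\0')
			continue;
		fds[n++] = (int)fd;
	}
	closedir(d);
	return n;
}
```

The list also contains the fd that `opendir` used for the listing
itself, which is already closed on return, so a caller that closes
everything in the list should tolerate `EBADF`.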
This also inadvertently fixes a latent bug: before this patch, in the
`subd` function in `lightningd/subd.c`, we would close `execfail[1]`
*before* doing an `exec`.
We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the
fd is marked CLOEXEC), and otherwise use it to pump `errno` to the
parent.
The intent is that this fd should be kept open until `exec`, at which
point CLOEXEC triggers, closes that fd, and sends the EOF, *or* if
`exec` fails we can send the `errno` to the parent process via that
pipe-end.
However, in the previous version, we ended up closing that fd *before*
reaching `exec`, either in the loop which `dup2`s passed-in fds (by
overwriting `execfail[1]` with a `dup2`) or in the "close everything"
loop, which did not spare `execfail[1]`, only
`dev_disconnect_fd`.
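A minimal sketch of the intended pattern, not the actual `subd` code:
`spawn_checked` is a hypothetical name, and the real function also
handles the passed-in fds. The point is only that `execfail[1]` stays
open in the child until `exec`.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec `path`.  Returns 0 if exec
 * succeeded (the child is left running, *pid set), otherwise an errno
 * value describing the failure. */
static int spawn_checked(const char *path, char *const argv[], pid_t *pid)
{
	int execfail[2], err = 0;
	ssize_t r;

	if (pipe(execfail) != 0)
		return errno;
	/* The write end closes itself on a successful exec. */
	if (fcntl(execfail[1], F_SETFD, FD_CLOEXEC) != 0)
		goto fail;

	*pid = fork();
	if (*pid < 0)
		goto fail;

	if (*pid == 0) {
		/* Child: keep execfail[1] open right up to exec. */
		close(execfail[0]);
		execv(path, argv);
		/* Only reached if exec failed: pump errno to the parent. */
		err = errno;
		if (write(execfail[1], &err, sizeof(err)) != sizeof(err))
			_exit(126);
		_exit(127);
	}

	/* Parent: close our copy of the write end so EOF can arrive. */
	close(execfail[1]);
	do {
		r = read(execfail[0], &err, sizeof(err));
	} while (r < 0 && errno == EINTR);
	close(execfail[0]);

	if (r == 0)
		return 0;	/* EOF: CLOEXEC fired, exec succeeded. */
	if (r != (ssize_t)sizeof(err))
		err = EIO;	/* Short or failed read. */
	waitpid(*pid, NULL, 0);	/* Exec failed: reap the child. */
	return err;

fail:
	err = errno;
	close(execfail[0]);
	close(execfail[1]);
	return err;
}
```

If the child closes or overwrites `execfail[1]` before `execv`, the
parent sees EOF either way and a failed `exec` is reported as success,
which is the latent bug described above.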
WebSocket is a bit weird:
1. It starts like an HTTP connection, but they send special headers.
2. We reply with special headers, one of which involves SHA1 of one of theirs.
3. We are then in WebSocket mode, where each frame starts with a 2-14 byte
header.
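To illustrate why the frame header is variable-length, here is a
sketch that follows the RFC 6455 base framing; `ws_header_len` is a
hypothetical helper, not a function from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the first two bytes of a frame, return
 * the total header length in bytes. */
static size_t ws_header_len(const uint8_t first2[2])
{
	size_t len = 2;
	uint8_t len7 = first2[1] & 0x7F;

	if (len7 == 126)
		len += 2;	/* 16-bit extended payload length */
	else if (len7 == 127)
		len += 8;	/* 64-bit extended payload length */
	if (first2[1] & 0x80)
		len += 4;	/* masking key (set on client-to-server frames) */
	return len;
}
```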
We relay data in a simplistic way: if either side sends something, we
read it and relay it synchronously. That avoids any gratuitous
buffering.
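The read-then-relay idea can be sketched as below. This is an
assumption-laden illustration: `relay_once` is a hypothetical name, and
it copies raw bytes only, leaving out the WebSocket framing that the
real daemon has to add and strip.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read whatever is available on `from` and write
 * all of it to `to` before returning.  Returns the number of bytes
 * relayed, 0 on EOF, or -1 on error. */
static ssize_t relay_once(int from, int to)
{
	char buf[4096];
	ssize_t n, off = 0;

	do {
		n = read(from, buf, sizeof(buf));
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return n;

	/* Nothing is held back: loop until the whole chunk is written. */
	while (off < n) {
		ssize_t w = write(to, buf + off, (size_t)(n - off));

		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += w;
	}
	return n;
}
```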
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the port is set, we spawn `lightning_websocketd` on any
connection to that port. That means websocketd is a per-peer daemon,
but it also means every other daemon uses the connection normally (it's
just actually talking to websocketd instead of the client directly).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can have an update pending because it came too soon after the last
one, but refresh_local_channel is supposed to make sure we're
up-to-date, so force immediate application in that case.
Otherwise, we call update_local_channel at the bottom which frees the
pending update. This can mean that we miss a change in fees, for example.
Changelog-Fixed: errors: Errors returning a `channel_update` no longer return an outdated one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Closes: #4860
ChangeLog-Added: With `sqlite3` db backend we now use a 60-second busy timer, to allow backup processes like `litestream` to operate safely.
```
[gw1] [ 98%] PASSED tests/test_wallet.py::test_hsmtool_dump_descriptors
tests/test_wallet.py::test_fundchannel_listtransaction
[gw0] [ 98%] PASSED tests/test_plugin.py::test_channel_opened_notification
tests/test_wallet.py::test_hsmtool_generatehsm
[gw0] [ 98%] PASSED tests/test_wallet.py::test_hsmtool_generatehsm
tests/test_wallet.py::test_withdraw_nlocktime_fuzz
[gw1] [ 98%] ERROR tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_withdraw_nlocktime_fuzz
tests/test_wallet.py::test_fundchannel_listtransaction
[gw0] [ 99%] ERROR tests/test_wallet.py::test_withdraw_nlocktime_fuzz
tests/test_wallet.py::test_multiwithdraw_simple
[gw1] [ 99%] ERROR tests/test_wallet.py::test_fundchannel_listtransaction
tests/test_wallet.py::test_withdraw_nlocktime
tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_withdraw_nlocktime
tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_withdraw_nlocktime
[gw0] [ 99%] ERROR tests/test_wallet.py::test_multiwithdraw_simple
tests/test_wallet.py::test_repro_4258
[gw1] [ 99%] ERROR tests/test_wallet.py::test_withdraw_nlocktime
...
2021-10-12 06:36:09.203 UTC [224552] STATEMENT: SELECT version FROM version LIMIT 1
2021-10-12 06:36:09.566 UTC [224523] PANIC: could not write to file "pg_wal/xlogtemp.224523": No space left on device
2021-10-12 06:36:09.566 UTC [224523] STATEMENT: VACUUM FULL;
Error vacuuming db: BEGIN command failed: PANIC: could not write to file "pg_wal/xlogtemp.224523": No space left on device
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
```
This makes init two-stage, and causes some code hoisting.
And we can now send all the HTLCs in a single message, since we have
a 128MB limit and each HTLC is 37 bytes.
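For a sense of scale, a back-of-the-envelope sketch. It assumes "MB"
means 2^20 bytes and ignores any per-message framing overhead;
`max_entries` is a hypothetical helper.

```c
#include <stddef.h>

/* Figures from the text above; the 2^20 reading of "MB" is an
 * assumption. */
#define MSG_LIMIT_BYTES ((size_t)128 * 1024 * 1024)
#define HTLC_ENTRY_BYTES ((size_t)37)

/* Hypothetical helper: how many fixed-size entries fit under a cap. */
static size_t max_entries(size_t limit, size_t entry_size)
{
	return limit / entry_size;
}
```

Under those assumptions,
`max_entries(MSG_LIMIT_BYTES, HTLC_ENTRY_BYTES)` comes to roughly 3.6
million HTLCs in one message.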
This breaks the onchaind stresstest, which uses canned internal messages.
It's time to finally delete that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is particularly useful after our recent field deletion:
before: 362,573,824 bytes
after: 124,190,720 bytes
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: db: removal of old HTLC information and vacuuming shrinks large lightningd.sqlite3 by a factor of 2-3.
And initialize max to current height max when htlcs are already dead.
Turns out (thanks CI!) that MAX() of multiple columns is GREATEST() in
Postgres. That's clearer (MAX is used elsewhere for single columns),
so translate on the sqlite3 side.
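A sketch of what such a translation could look like, assuming queries
are written with `GREATEST()` and rewritten for sqlite3.
`greatest_to_max` is a hypothetical helper, not the function in this
patch, and it is a plain substring replace that does not understand
SQL string literals or longer identifiers ending in `GREATEST`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of `query` with every
 * "GREATEST(" replaced by "MAX(".  The caller frees the result. */
static char *greatest_to_max(const char *query)
{
	static const char from[] = "GREATEST(", to[] = "MAX(";
	const size_t fromlen = sizeof(from) - 1, tolen = sizeof(to) - 1;
	/* The replacement is shorter, so the result never outgrows
	 * the input. */
	char *out = malloc(strlen(query) + 1), *dst = out;

	if (!out)
		return NULL;
	while (*query) {
		if (strncmp(query, from, fromlen) == 0) {
			memcpy(dst, to, tolen);
			dst += tolen;
			query += fromlen;
		} else {
			*dst++ = *query++;
		}
	}
	*dst = '\0';
	return out;
}
```

For example, `SELECT GREATEST(a, b) FROM t` (made-up column and table
names) would become `SELECT MAX(a, b) FROM t`.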
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
sendonionmessage can fail when sending a reply, either because
the reply had a bad first peer, or because that peer went offline. The
latter happens in CI, which is how I found this.
Also fixed typo "onio" -> "onion".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #4482
Fixes: #4481
Changelog-Added: pay: Payment attempts are now grouped by the pay command that initiated them
Changelog-Fixed: pay: `listpays` returns payments ordered by their creation date
Changelog-Fixed: pay: `listpays` no longer lumps together the attempts from separate tries at paying the same invoice
This re-establishes the prior behavior where a `sendpay` or
`sendonion` that'd match a prior payment would cause the prior payment
to be deleted. While we no longer delete prior attempts we now avoid a
primary key collision by incrementing once. This saves us from having
to touch all existing tests, and likely avoids breaking other users
too.
One of the fundamental constraints of the payment groups idea is that
there may only ever be one group in flight at any point in time, so if
we find a group that is in flight, any new `sendpay` or `sendonion`
must match its `groupid`.
This was the main cause of the pay states flip-flopping: since we
reset the status on each attempt, any final status was not really
final. Let's keep them around, and provide a stable history.
So far we've always been deferring the deletion, retry and early abort
logic to `sendonion` and `sendpay` which do not have the context to
decide if a call is legitimate or not (they were mostly based on
heuristics). By calling `listsendpays` for the invoice's
`payment_hash` we can identify what our `groupid` should be, but more
importantly we can also abort if another payment is pending or a prior
attempt has already succeeded.
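The decision described above can be sketched roughly as follows. All
names here are hypothetical, the struct is a stand-in for whatever
`listsendpays` returns for the `payment_hash`, and picking "highest
existing `groupid` plus one" for a new group is an assumption of this
sketch, not something stated in this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one prior attempt. */
enum attempt_status { ATTEMPT_PENDING, ATTEMPT_FAILED, ATTEMPT_COMPLETE };

struct attempt {
	uint64_t groupid;
	enum attempt_status status;
};

enum pay_decision { PAY_PROCEED, PAY_ABORT_PENDING, PAY_ABORT_ALREADY_PAID };

/* Hypothetical helper: decide whether a new `pay` may start, and if so
 * which groupid it should use. */
static enum pay_decision decide_pay(const struct attempt *prior, size_t n,
				    uint64_t *groupid)
{
	uint64_t last = 0;
	bool pending = false;

	for (size_t i = 0; i < n; i++) {
		/* A prior success wins over everything else. */
		if (prior[i].status == ATTEMPT_COMPLETE)
			return PAY_ABORT_ALREADY_PAID;
		if (prior[i].status == ATTEMPT_PENDING)
			pending = true;
		if (prior[i].groupid > last)
			last = prior[i].groupid;
	}
	/* Only one group may be in flight at any point in time. */
	if (pending)
		return PAY_ABORT_PENDING;

	*groupid = last + 1;	/* Assumption: next unused group. */
	return PAY_PROCEED;
}
```

Because failed attempts are kept rather than reset, the same list also
serves as the stable history mentioned earlier.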