Be more graceful in shutting down: this should fix the issue where
bookkeeper gets upset that its commands are rejected during shutdown,
and generally make things more graceful.
1. Stop any new RPC connections.
2. Stop any per-peer daemons (channeld, etc).
3. Shut down plugins.
4. Stop all existing RPC connections.
5. Stop global daemons.
6. Free up peer, chanen HTLC datastructures.
7. Close database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: RPC operations are now still available during shutdown.
This alters the billboard, but that's a human-readable thing so not
noted in CHANGELOG.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `listpeers` `status` now refers to "channel ready" rather than "funding locked" (BOLT language change for zeroconf channels)
Changelog-Added: JSON-RPC: `channel_opened` notification `channel_ready` flag.
Changelog-Deprecated: JSON-RPC: `channel_opened` notification `funding_locked` flag (use `channel_ready`: BOLTs namechange).
This contains the zeroconf stuff, with funding_locked renamed to
channel_ready. I change that everywhere, and try to fix up the
comments.
Also the `alias` field is called `short_channel_id`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: `funding_locked` is now called `channel_ready` as per latest BOLTs.
We were seeing flakes due to channel closures on intermidiary nodes if
the blocks came too fast.
If we use a star topology it's actually what we want and it's much simpler to
figure out what's going on.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't actually care how it does it, just that it ends the HTLCs,
so simplify this logic which tries to match it exactly.
This also fixes the flake where we would sometimes close the upstream
channels, simply because we now wait at least one second per block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
First, how we record "our_funds" and then apply pushes vs lease_fees
(for liquidity ad buys/sales) was exactly opposite.
For pushes we were reporting the total funded into the channel, with the
push representing how much we'd later moved to the peer.
For lease_fees we were rerporting the total in the channel, with the
push representing how much was already moved to the peer.
We fix this (from a view perspective) by re-adding lease fees to what's
reported in the channel funding totals. Since this is now new behavior
(for leased channel values), we added new fields so we can take the old
field names thru a deprecation cycle.
We also make it possible to differentiate btw a push and a lease_fee
(before they were all the same), by adding to new fields to `listpeers`:
`fee_paid_msat` and `fee_rcvd_msat`.
This allows us to avoid math in the bookkeeper, instead we just pick
the numbers out directly and record them.
Fixes#5472
Changelog-Added: JSON-RPC: `listpeers` now has a few new fields for `funding` (`remote_funds_msat`, `local_funds_msat`, `fee_paid_msat`, `fee_rcvd_msat`).
Changelog-Deprecated: JSON-RPC: `listpeers`.`funded` fields `local_msat` and `remote_msat` are now deprecated.
If we expect further events for an onchain output (because we can steal
it away from the 'external'/rightful owner), we mark them.
This prevents us from marking a channel as 'onchain-resolved' before
all events that we're interested in have actually hit the chain.
Case that this matters:
Peer publishes a (cheating) unilateral close and a timeout htlc (which
we can steal).
We then steal the timeout htlc.
W/o the stealable flag, we'd have marked the channel as resolved when
the peer published the timeout htlc, which is incorrect as we're still
waiting for the resolution of that timeout htlc (b/c we *can* steal it).
We were failing to mark channels as resolved b/c we weren't using later
events to external (but originated from this account) events as signals
to run the channel resolution check.
This fixes that, and adds a test.
Adds schema definitions and manpages for bkpr- commands; also renames
the commands to all start with 'bkpr-', so they're easier to identify/
make runes about.
Anchor outputs are ignored by the clightning wallet, but we keep track
of them in the bookkeeper. This causes problems when we do the balance
checks on restart w/ the balance_snapshot -- it results in us printing
out a journal_entry to 'get rid of' the anchors that the clightning node
doesnt know about.
Instead, we mark some outputs as 'ignored' and exclude these from our
account balance sums when we're comparing to the clightning snapshot.
This is a rare case where we RBF the output of a penalty until it no
longer has an output value we can reclaim. We ignore the txid for these
events when closing a channel.
onchain fees are weird at channel close because:
- you may be missing an trimmed htlc (which went to fees)
- the balance from close may have been rounded (msats cant land on
chain)
- the close might have been a past state and you've actually
ended up with more money onchain than you had in the channel. wut
This commit accounts for all of this appropriately, with some tests.
channel_close.debit should equal onchain_fee.credit (for that txid)
plus sum(chain_event.credit [wallet/channel_acct]).
In the penalty case, channel_close.debit becomes channel_close.debit +
penalty_adj.debit, i.e.
channel-close.debit + (penalty_adj.debit) =
onchain_fee.credit
+ sum(chain_event.credit [wallet/channel_acct])
We close the channel because the payment fulfilment is not totally done, and
we generate 6 blocks, causing it to hit CLTV deadline.
```
# send some payments, mine a block or two
inv = l2.rpc.invoice(10**4, '1', 'no_1')
l1.rpc.pay(inv['bolt11'])
# l2 attempts to close a channel that it leased, should fail
with pytest.raises(RpcError, match=r'Peer leased this channel from us'):
l2.rpc.close(l1.get_channel_scid(l2))
bitcoind.generate_block(6)
sync_blockheight(bitcoind, [l1, l2])
# make sure we're at the right place for the csv lock
> l2.daemon.wait_for_log('Blockheight: SENT_ADD_ACK_COMMIT->RCVD_ADD_ACK_REVOCATION LOCAL now 115')
tests/test_closing.py:823:
...
lightningd-2 2022-07-17T13:15:34.242Z DEBUG lightningd: Adding block 115: 39d95061935e9fc42b04c86ae60d0cf157765aff4c040f3a8d0b7888db19e015
lightningd-2 2022-07-17T13:15:34.244Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#2: Peer permanent failure in CHANNELD_NORMAL: Fulfilled HTLC 0 SENT_REMOVE_COMMIT cltv 115 hit deadline
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only places which should call try_reconnect now are the "connect"
command, and the disconnect path when it decides there's still an
active channel.
This introduces one subtlety: if we disconnect when there's no active
channel, but then the subd makes one, we have to catch that case!
This temporarily reverts "slow" reconnections to fast ones: see next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Connectd already does this when we *receive* an error or warning, but
now do it on send. This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).
When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc. That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In various places, we assumed that when `connected` is false,
everything is finished. This is not true: we should wait for the
state we expect.
In addition, various places allows reconnections, which interfered
with the logic; suppress them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do smoothing, waiting for this to hit the target (50k+) was taking
longer than my TIMEOUT=15; here we increase the speed at which it hits
exit velocity, so to speak.
We cleanup our output tracking for timeout txs when the peer's
htlc_timeout self-expiry is hit; we'd also log its spend if happen to
see it get spent.
This is a bit of a race as they can't spend it until the locktime is
available. Hence the flakiness in tests that expected the `htlc_timeout`
to *not* be spent.
Instead, we only log an external's `htlc_timeout` spend in the case
where we also immediately register another output to track for it (only
happens when said htlc is stealable)
Fixes#5405
In-Collab-With: @ddustin
Looking at the CI logs, it seems like it took over 5 seconds, so
the unilateral close occurred instead of the expected rejection
of the WIRE_SHUTDOWN reply. Make it bulletproof.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure bitcoind gets tx before mining blocks!
```
# l1<->l2 mutual close should work
chan = l1.get_channel_scid(l2)
l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
l1.rpc.close(chan)
l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE')
bitcoind.generate_block(2)
sync_blockheight(bitcoind, [l1, l2])
> l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE')
tests/test_closing.py:851:
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Split harder!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
# This can timeout, so do it in four easy stages.
for i in range(4):
> bitcoind.generate_block(4032 // 4)
tests/test_closing.py:971:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
contrib/pyln-testing/pyln/testing/utils.py:475: in generate_block
return self.rpc.generatetoaddress(numblocks, to_addr)
contrib/pyln-testing/pyln/testing/utils.py:372: in f
res = proxy._call(name, *args)
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:233: in _call
response = self._get_response()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:263: in _get_response
http_response = self.__conn.getresponse()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:1373: in getresponse
response.begin()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:319: in begin
version, status, reason = self._read_status()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:280: in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <socket.SocketIO object at 0x7f1a4b3702d0>
b = <memory at 0x7f1a4ac51600>
def readinto(self, b):
"""Read up to len(b) bytes into the writable buffer *b* and return
the number of bytes read. If the socket is non-blocking and no bytes
are available, None is returned.
If *b* is non-empty, a 0 return value indicates that the connection
was shutdown at the other end.
"""
self._checkClosed()
self._checkReadable()
if self._timeout_occurred:
raise OSError("cannot read from timed out object")
while True:
try:
> return self._sock.recv_into(b)
E socket.timeout: timed out
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should be using amount_msat always. Many tests were not. Plus,
deprecating it simplifies the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: JSONRPC: `sendpay` `route` elements `msatoshi` (use `amount_msat`)
This is consistent with our output changes, and increases consistency.
It also keeps future sanity checks happy, that we only use JSON msat
helpers with '_msat' fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `invoice`, `sendonion`, `sendpay`, `pay`, `keysend`, `fetchinvoice`, `sendinvoice`: `msatoshi` argument is now called `amount_msat` to match other fields.
Changelog-Deprecated: JSON-RPC: `invoice`, `sendonion`, `sendpay`, `pay`, `keysend`, `fetchinvoice`, `sendinvoice` `msatoshi` (use `amount_msat`)
The new msat fields are turned into Millisatoshi, so handle that correctly
too in tests too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: Plugins: `coin_movement` notification: `balance`, `credit`, `debit` and `fees` (use `balance_msat`, `credit_msat`, `debit_msat` and `fees_msat`)
We do this (send warnings) in almost all cases anyway, so mainly this
is a textual update, but there are some changes:
1. Send ERROR not WARNING if they send a malformed commitment secret.
2. Send WARNING not ERROR if they get the shutdown_scriptpubkey wrong (vs upfront)
3. Send WARNING not ERROR if they send a bad shutdown_scriptpubkey (e.g. p2pkh in future)
4. Rename some vars 'err' to 'warn' to make it clear we send a warning.
This means test_option_upfront_shutdown_script can be made reliable, too,
and it now warns and doesn't automatically close channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't have an effect now (except in experimental mode), but it
will when we support anchors. So we deprecate the use of those in the
close command too.
For experimental mode we have to avoid using p2pkh; adapt that test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: JSON-RPC: `shutdown` no longer allows p2pkh or p2sh addresses.
1. Don't use dust HTLCs.
2. Make l3 unresponsive, like report.
3. Make l2-l3 fail because we time out on successive HTLC.
We use sendpay rather than pay, because pay can do multiple attempts.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only shows up on delayed to us outputs, but nice to have anyway.
It's missing for channel index destined deposits, maybe nice to add at
some point in the future; right now you can figure out which close a
wallet deposit comes from via the channel close txid
Changelog-Experimental: option `--lease-fee-base-msat` renamed to `--lease-fee-base-sat`
Changelog-Experimental: option `--lease-fee-base-msat` deprecated and will be removed next release
These tests have proven to be:
a) very expensive, as they spin up many nodes, and perform long setup
b) are not testing anything specific, they just fuzz functionality
that is already tested otherwise
c) have not helped pinpoint any issues in living memory
d) are very flaky, making for really bad signal-to-noise, so much
that devs usually just restart without even looking at the logs
e) even if we were to look at the logs, we'd be unable to reproduce
due to the inherent randomness involved in these tests
f) are really noisy neighbors, causing other tests to flake as well,
further muddying the water
All in all, these tests are a waste of time, and source of
frustration.
[ Cleaned up python unused imports --RR ]
Changelog-None
We really need our own lnprototest tests for packet-based stuff;
these message-based tests are inherently delicate and awkward.
In particular, connectd now does dev-disconnect, so the socket is not
immediately closed after a dev-disconnect command. In this case, the
WIRE_SHUTDOWN has often already been written from connectd to channeld.
But it sometimes works, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we call update_channel_from_inflight *twice* with the same inflight, we
will get bad results. Using tal_steal() here was a premature optimization:
```
Valgrind error file: valgrind-errors.496395
==496395== Invalid read of size 8
==496395== at 0x22A9D3: to_tal_hdr (tal.c:174)
==496395== by 0x22B4B5: tal_steal_ (tal.c:498)
==496395== by 0x16A13D: update_channel_from_inflight (peer_control.c:1225)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== by 0x1772D9: plugin_read_json (plugin.c:735)
==496395== Address 0x89fbb08 is 24 bytes inside a block of size 104 free'd
==496395== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22B193: del_tree (tal.c:421)
==496395== by 0x22B461: tal_free (tal.c:486)
==496395== by 0x16A123: update_channel_from_inflight (peer_control.c:1223)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== Block was alloc'd at
==496395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22AC1C: allocate (tal.c:250)
==496395== by 0x22B1DD: tal_alloc_ (tal.c:428)
==496395== by 0x22B3A6: tal_alloc_arr_ (tal.c:471)
==496395== by 0x22C094: tal_dup_ (tal.c:805)
==496395== by 0x12B274: new_inflight (channel.c:187)
==496395== by 0x136D4C: wallet_commit_channel (dual_open_control.c:1260)
==496395== by 0x13B084: handle_commit_received (dual_open_control.c:2839)
==496395== by 0x13B6AF: dual_opend_msg (dual_open_control.c:2976)
==496395== by 0x1809FF: sd_msg_read (subd.c:553)
==496395== by 0x218F5D: next_plan (io.c:59)
==496395== by 0x219B65: do_plan (io.c:407)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we fund a channel between two nodes, then mine all the blocks to
announce it, any other nodes may see the announcement before the
blocks, causing CI to complain about "bad gossip":
```
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Ignoring future channel_announcment for 113x1x1 (current block 112)
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/0
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/1
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_NODE_ANNOUNCEMENT before announcement 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e
```
Add a new helper for this case, and use it where there are more than 2 nodes.
Cleans up test_routing_gossip and a few other places which did this manually.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes lightningd's chronic weight underestimate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: closingd: more accurate weight estimation helps mutual closing near min/max feerates.
There's actually a bug in our closing tx size estimation; I'll do
a separate patch for this, though.
Seems this used to be flaky, now we always flush queues, so it's
more reliably caught.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>