Also has to fix up tests.
Changelog-Fixed: cli no longer requires confirming the password if the `hsm_secret` is already encrypted.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
This was missed in e8d2176e6b.
```
> raise ValueError(str(errors))
E ValueError:
E Node errors:
E - lightningd-2: had bad gossip messages
E - lightningd-3: had bad gossip messages
E Global errors:
contrib/pyln-testing/pyln/testing/fixtures.py:201: ValueError
...
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Ignoring future channel_announcment for 105x1x2 (current block 104)
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 105x1x2/0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure it sees disconnect before reconnect, otherwise the next command
fails since we're now disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We may not see a disconnect instantly:
```
> assert len(l2.rpc.listpeers()['peers']) == 0
E assert 1 == 0
E +1
E -0
```
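A minimal sketch of the usual fix for this kind of race: poll until the peer list is empty instead of asserting immediately. `wait_until` and `FakeNode` are hypothetical stand-ins written for this illustration, not the helpers the test suite actually uses.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.01):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval)


class FakeNode:
    """Hypothetical node: still reports one peer for the first few calls."""

    def __init__(self, calls_until_disconnect=3):
        self.calls_left = calls_until_disconnect

    def listpeers(self):
        if self.calls_left > 0:
            self.calls_left -= 1
            return {"peers": [{"id": "02aa"}]}
        return {"peers": []}


l2 = FakeNode()
# An immediate assert could fail here: the disconnect is not visible yet.
wait_until(lambda: len(l2.listpeers()["peers"]) == 0)
final_peers = l2.listpeers()["peers"]
```

The timeout keeps a genuinely stuck peer from hanging the test forever.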
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is generally verboten now, since there can be multiple. There are a
few exceptions:
1. We sometimes want to know if there are *any* active channels.
2. Some dev commands still take peer id when they mean channel_id.
3. We still allow peer id when it's fully determined.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `close` by peer id will fail if there is more than one live channel (use `channel_id` or `short_channel_id` as id arg).
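A rough sketch of the lookup rule described above, under the assumption that an id is tried as a channel_id or short_channel_id first and only then as a peer id. The function name and the dict keys are hypothetical and do not mirror lightningd's internals.

```python
def resolve_close_target(channels, ident):
    """Return the channel `ident` refers to, or raise if it is ambiguous.

    `channels` is a list of dicts with hypothetical keys:
    peer_id, channel_id, short_channel_id and live.
    """
    # An exact channel_id or short_channel_id always identifies one channel.
    for chan in channels:
        if ident in (chan["channel_id"], chan["short_channel_id"]):
            return chan

    # Otherwise treat it as a peer id: only OK if fully determined.
    live = [c for c in channels if c["peer_id"] == ident and c["live"]]
    if len(live) == 1:
        return live[0]
    if not live:
        raise LookupError("no live channel for " + ident)
    raise ValueError(
        "peer has %d live channels; use channel_id or short_channel_id"
        % len(live)
    )


channels = [
    {"peer_id": "02aa", "channel_id": "c1", "short_channel_id": "103x1x0", "live": True},
    {"peer_id": "02aa", "channel_id": "c2", "short_channel_id": "104x1x0", "live": True},
    {"peer_id": "03bb", "channel_id": "c3", "short_channel_id": "105x1x0", "live": True},
]
```

With this data, `resolve_close_target(channels, "03bb")` succeeds, while `"02aa"` raises because two live channels match.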
Generally this means converting a lazy "peer_active_channel(peer)" call
into an explicit iteration.
1. notify_feerate_change: call all channels (ignores non-active ones anyway).
2. peer_get_owning_subd: remove unused function.
3. peer_connected hook: don't save channel, do lookup and iterate channels.
4. In json_setchannelfee "all" remove useless call to peer_active_channel
since we check state anyway, and iterate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than intuiting whether this is a new channel / active channel,
use the channel_id. This simplifies things and makes them explicit,
and prepares for multiple live channels per peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd currently holds the connection to idle peers, but we're about
to change that: it will only look after peers which are actively
opening a connection. We can start this process by disconnecting
whenever we have a negotiation failure.
We could stay connected if we wanted to, but that would be up to
connectd to decide. Right now it's easier if we disconnect from any
idle peer once it's been active.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested by @m-schmook, I realized that if we append it later I'll
never get it right: I expect parameters min and max, not max and min!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: you can now alter the `htlc_minimum_msat` and `htlc_maximum_msat` your node advertizes.
We still use the channel hint here (as it's the only option), we just
warn about lack of capacity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to add some, since our internal representations of
htlc_maximum_msat round up, and we need to disable mpp which succeeds
in getting a payment through by splitting.
We also allow dev_routes to suppress invoice routehints altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per proposal in https://github.com/lightning/bolts/pull/962
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: protocol: support for legacy onion format removed, since everyone supports the new one.
I thought about fixing them up, but really these should be in
lnprototest anyway. Turns out they're from the spec, so we should
actually fix them up there.
I moved the vector files into contrib/pyln-proto, since that still
needs them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When compiled without DEVELOPER this will now filter out `remote_addr` that
come from localhost. The testcase checks for DEVELOPER to test for correct
function of `remote_addr`.
Also, I renamed "test_connect" to "test_connect_basic" so it can be started
without all the other tests in that file that start with "test_connect..."
For now hooks are treated identically to rpcmethods, with the
exception of not being returned in the `getmanifest` call. Later on we
can add typed handlers as well.
Having a list of very targeted suppressions allows us to still run the
majority of tests with valgrind checking, and not fail when Rust does
some trickery. This is for example the case with `std::sync::Once`
which uses `num_procs` calling out to the cgroups subsystem, sometimes
with a null path.
Suggested-by: Rusty Russell <@rustyrussell>
`valgrind` seems to flag some memory accesses that are ok in
the Rust standard library, which we can consider false positives for
our purposes:
```
Valgrind error file: valgrind-errors.69147
==69147== Syscall param statx(file_name) points to unaddressable byte(s)
==69147== at 0x4B049FE: statx (statx.c:29)
==69147== by 0x2E2DA0: std::sys::unix::fs::try_statx (weak.rs:139)
==69147== by 0x2D7BD5: <std::fs::File as std::io::Read>::read_to_string (fs.rs:784)
==69147== by 0x2632CE: num_cpus::linux::Cgroup::param (linux.rs:214)
==69147== by 0x263179: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==69147== by 0x263002: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==69147== by 0x262C01: num_cpus::linux::load_cgroups (linux.rs:149)
==69147== by 0x26289D: num_cpus::linux::init_cgroups (linux.rs:129)
==69147== by 0x26BD88: core::ops::function::FnOnce::call_once (function.rs:227)
==69147== by 0x26B749: std::sync::once::Once::call_once::{{closure}} (once.rs:262)
==69147== by 0x139717: std::sync::once::Once::call_inner (once.rs:419)
==69147== by 0x26B6D5: std::sync::once::Once::call_once (once.rs:262)
==69147== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==69147==
==69147== Syscall param statx(buf) points to unaddressable byte(s)
==69147== at 0x4B049FE: statx (statx.c:29)
==69147== by 0x2E2DA0: std::sys::unix::fs::try_statx (weak.rs:139)
==69147== by 0x2D7BD5: <std::fs::File as std::io::Read>::read_to_string (fs.rs:784)
==69147== by 0x2632CE: num_cpus::linux::Cgroup::param (linux.rs:214)
==69147== by 0x263179: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==69147== by 0x263002: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==69147== by 0x262C01: num_cpus::linux::load_cgroups (linux.rs:149)
==69147== by 0x26289D: num_cpus::linux::init_cgroups (linux.rs:129)
==69147== by 0x26BD88: core::ops::function::FnOnce::call_once (function.rs:227)
==69147== by 0x26B749: std::sync::once::Once::call_once::{{closure}} (once.rs:262)
==69147== by 0x139717: std::sync::once::Once::call_inner (once.rs:419)
==69147== by 0x26B6D5: std::sync::once::Once::call_once (once.rs:262)
==69147== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==69147==
```
Only shows up on delayed-to-us outputs, but nice to have anyway.
It's missing for channel index destined deposits, maybe nice to add at
some point in the future; right now you can figure out which close a
wallet deposit comes from via the channel close txid
Changelog-Experimental: option `--lease-fee-base-msat` renamed to `--lease-fee-base-sat`
Changelog-Experimental: option `--lease-fee-base-msat` deprecated and will be removed next release
These tests have proven to be:
a) very expensive, as they spin up many nodes, and perform long setup
b) are not testing anything specific, they just fuzz functionality
that is already tested otherwise
c) have not helped pinpoint any issues in living memory
d) are very flaky, making for really bad signal-to-noise, so much
that devs usually just restart without even looking at the logs
e) even if we were to look at the logs, we'd be unable to reproduce
due to the inherent randomness involved in these tests
f) are really noisy neighbors, causing other tests to flake as well,
further muddying the water
All in all, these tests are a waste of time, and source of
frustration.
[ Cleaned up python unused imports --RR ]
Changelog-None
This restores the behaviour prior to `lightningd: use our cached
channel_update for errors instead of asking gossipd.`, where gossipd
would refuse to give us channel_updates for unannounced channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
But this requires a watch-only wallet, and python-bitcoinlib doesn't support
multiple wallets, so we need to unload the original one, but then we need
to generate a block, so that can't generate a new address, so we need
an address arg to generate_block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We really need our own lnprototest tests for packet-based stuff;
these message-based tests are inherently delicate and awkward.
In particular, connectd now does dev-disconnect, so the socket is not
immediately closed after a dev-disconnect command. In this case, the
WIRE_SHUTDOWN has often already been written from connectd to channeld.
But it sometimes works, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the HTLCs are completely negotiated, we can get a channel break when
we mine a pile of blocks. This is mainly seen with Postgres, due to the db
speed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we call update_channel_from_inflight *twice* with the same inflight, we
will get bad results. Using tal_steal() here was a premature optimization:
```
Valgrind error file: valgrind-errors.496395
==496395== Invalid read of size 8
==496395== at 0x22A9D3: to_tal_hdr (tal.c:174)
==496395== by 0x22B4B5: tal_steal_ (tal.c:498)
==496395== by 0x16A13D: update_channel_from_inflight (peer_control.c:1225)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== by 0x1772D9: plugin_read_json (plugin.c:735)
==496395== Address 0x89fbb08 is 24 bytes inside a block of size 104 free'd
==496395== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22B193: del_tree (tal.c:421)
==496395== by 0x22B461: tal_free (tal.c:486)
==496395== by 0x16A123: update_channel_from_inflight (peer_control.c:1223)
==496395== by 0x16A4C7: funding_depth_cb (peer_control.c:1299)
==496395== by 0x182807: txw_fire (watch.c:232)
==496395== by 0x182AA9: watch_topology_changed (watch.c:300)
==496395== by 0x1290ED: updates_complete (chaintopology.c:624)
==496395== by 0x129BF4: get_new_block (chaintopology.c:835)
==496395== by 0x125EEF: getrawblockbyheight_callback (bitcoind.c:362)
==496395== by 0x176ECC: plugin_response_handle (plugin.c:584)
==496395== by 0x1770F5: plugin_read_json_one (plugin.c:690)
==496395== Block was alloc'd at
==496395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==496395== by 0x22AC1C: allocate (tal.c:250)
==496395== by 0x22B1DD: tal_alloc_ (tal.c:428)
==496395== by 0x22B3A6: tal_alloc_arr_ (tal.c:471)
==496395== by 0x22C094: tal_dup_ (tal.c:805)
==496395== by 0x12B274: new_inflight (channel.c:187)
==496395== by 0x136D4C: wallet_commit_channel (dual_open_control.c:1260)
==496395== by 0x13B084: handle_commit_received (dual_open_control.c:2839)
==496395== by 0x13B6AF: dual_opend_msg (dual_open_control.c:2976)
==496395== by 0x1809FF: sd_msg_read (subd.c:553)
==496395== by 0x218F5D: next_plan (io.c:59)
==496395== by 0x219B65: do_plan (io.c:407)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we fund a channel between two nodes, then mine all the blocks to
announce it, any other nodes may see the announcement before the
blocks, causing CI to complain about "bad gossip":
```
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Ignoring future channel_announcment for 113x1x1 (current block 112)
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/0
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/1
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_NODE_ANNOUNCEMENT before announcement 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e
```
Add a new helper for this case, and use it where there are more than 2 nodes.
Cleans up test_routing_gossip and a few other places which did this manually.
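The idea behind such a helper can be sketched as follows: mine all but the last block, let every node catch up, then mine the final block that triggers the announcement. This is an illustration only; `FakeChain`, `FakeNode` and `mine_last_block_after_sync` are hypothetical names, and the real helper may differ.

```python
class FakeChain:
    """Hypothetical stand-in for bitcoind: just tracks the block height."""

    def __init__(self, height=100):
        self.height = height

    def generate_block(self, count):
        self.height += count


class FakeNode:
    """Hypothetical node that only learns new blocks when sync() is called."""

    def __init__(self, chain):
        self.chain = chain
        self.seen_height = chain.height

    def sync(self):
        # A real test would poll until the node reports this height.
        self.seen_height = self.chain.height


def mine_last_block_after_sync(chain, nodes, num_blocks):
    """Mine num_blocks, but make every node catch up before the last one."""
    chain.generate_block(num_blocks - 1)
    for node in nodes:
        node.sync()
    # Only this block makes the channel announceable, and nobody is far
    # enough behind to treat the announcement as being for a future block.
    chain.generate_block(1)


chain = FakeChain(height=100)
nodes = [FakeNode(chain) for _ in range(4)]
mine_last_block_after_sync(chain, nodes, num_blocks=5)
```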
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were relying on the fee update to create an additional tx. That's
ugly; do an actual payment and make sure we definitely complete a new
tx by waiting for that *then* both revoke_and_ack.
(Without this, we could get a unilateral close instead of a penalty).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is neater than what we had before, and slightly more general.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
Next patch starts a timeout ping, which can interfere with results.
In theory, we should reply, but in practice (so far!) we seem to get enough
time that it doesn't hang up on us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also no longer strip the type off: everyone handles both forms, and
Eclair doesn't strip (and it's easier!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even if we're deferring putting them in the store and broadcasting them,
we tell lightningd so it will use it in any error messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes lightningd's chronic weight underestimate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: closingd: more accurate weight estimation helps mutual closing near min/max feerates.
The blockheight is zero though, since these aren't included in a block
yet.
We also don't issue an 'external' deposit event if we can tell that the
address you're sending to actually belongs to our wallet (we'll issue a
deposit event when it gets included in a block)
```
l1.rpc.disconnect(l2.info['id'], force=True)
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
> l1.daemon.wait_for_log('option_static_remotekey enabled at 2/2')
tests/test_connection.py:3653:
```
If l2's channeld gets killed (due to reconnect) before it tells
lightningd it got the revoke_and_ack it will need a retransmission
*again*.
This makes the test more robust, and does more checks too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
OK, now this test makes more sense! Now we don't ignore errors, we
*will* drop to chain if we reconnect after one side has dropped to
chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's actually a bug in our closing tx size estimation; I'll do
a separate patch for this, though.
Seems this used to be flaky, now we always flush queues, so it's
more reliably caught.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We seem to hit a race between manual reconnect (with address hint) and an automatic
reconnection attempt which fails:
```
> l4.rpc.connect(l3.info['id'], 'localhost', l3.port)
...
E pyln.client.lightning.RpcError: RPC call failed: method: connect, payload: {'id': '035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d', 'host': 'localhost', 'port': 41285}, error: {'code': 401, 'message': 'All addresses failed: 127.0.0.1:36678: Connection establishment: Connection refused. '}
```
See how it didn't even try the given address?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
l1 might slip in a commitment_signed before it notices the disconnect, and this test fails:
```
for i in range(0, len(disconnects)):
with pytest.raises(RpcError):
l1.rpc.sendpay(route, rhash, payment_secret=inv['payment_secret'])
> l1.rpc.waitsendpay(rhash)
E Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now let gossipd do it.
This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only affects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
channeld can't do it any more: it's using local sockets. Connectd
can do it, and simply does it by type.
Amazingly, on my machine the timing change *always* caused
test_channel_receivable() to fail, due to a latent race.
Includes feedback from @cdecker.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As connectd handles more packets itself, or diverts them to/from gossipd,
it's the only place we can implement the dev_disconnect logic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This test started mostly failing (in non-DEVELOPER mode) after the
next patch, due to timing issues.
Handle both cases for now, and we'll add more enhancements later to
things we should be handling more consistently.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I also got an error under CI; it seems the sleep() was insufficient.
So try adding a sleep inside the check_coin_moves, which should cover
everyone.
```
acct_moves = acct_moves[number_moves:]
else:
> if not move_matches(m, acct_moves[0]):
E IndexError: list index out of range
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once connectd is doing this, we can't close as soon as we send,
and in fact we can't do 'fail write' either.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These would have to be done by connectd, not the local daemon, once it's
intermediating. Otherwise the remote peer won't see any change.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fire off a snapshot of current account balances (node wallet + every
'active' channel) after we've caught up to the chain tip for the *first*
time (in other words, on start).
We need to stash/save the amount of the lease fees on a leased channel,
we do this by re-using the 'push' amount field on channel (which is
technically correct, since we're essentially pushing the fee amount to
the peer).
Also updates a bit of how the pushes are accounted for (pushed to now
has an event; their channel will open at zero but then they'll
immediately register a push event).
Leases fees are treated exactly the same as pushes, except labeled
differently.
Required adding a 'lease_fee' field to the inflights so we keep track of
the fee for the lease until the open happens.
If we initialized the payment, the fees are the entire fee-chain
(final hop amount - starting hop amount)
If it's a payment we routed, the fees are the diff between the
inbound htlc and the outbound (net gain by this routing)
Added to database so data persists nicely.
We record the amount of fees collected for a routed payment. For
simplicity's sake on the data agg side, we record the fee payment on
*BOTH* the incoming htlc and the outgoing htlc. Note that this results
in double counting if you add up the fees from both an in-routed and
out-routed payment.
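To make the double-counting caveat concrete, here is a small sketch. The event layout and the key names (`direction`, `amount_msat`, `fees_msat`) are assumptions made for this example and are not the actual notification schema.

```python
def record_routed_payment(events, inbound_msat, outbound_msat):
    """Append both halves of a routed payment, each carrying the fee."""
    # For a routed payment the fee is the inbound htlc minus the outbound.
    fee = inbound_msat - outbound_msat
    events.append(
        {"direction": "in", "amount_msat": inbound_msat, "fees_msat": fee}
    )
    events.append(
        {"direction": "out", "amount_msat": outbound_msat, "fees_msat": fee}
    )


def total_routing_fees_msat(events):
    """Sum fees from one side only, so each payment is counted once."""
    return sum(e["fees_msat"] for e in events if e["direction"] == "in")


events = []
record_routed_payment(events, inbound_msat=1_001_000, outbound_msat=1_000_000)

naive_total = sum(e["fees_msat"] for e in events)  # counts the fee twice
correct_total = total_routing_fees_msat(events)
```

Here `naive_total` is twice `correct_total`, which is the double counting the message warns about.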
Get rid of the 'movement_idx', since we replay events now.
Since we're removing a field from the 'coin_movement' event emission, we
bump the version type.
Changelog-Updated: `coin_movements` events have been revamped and are now on version 2.
test_onchain_dust_out restarts a node, which produces duplicate events.
This is expected, but we need to de-duplicate the event stream to get
accurate results.
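One simple way to de-duplicate a replayed stream while keeping the order is sketched below. It assumes events are JSON-serializable dicts and that two identical dicts are the same event; `dedupe_events` is a hypothetical helper, not the one in the test suite.

```python
import json


def dedupe_events(events):
    """Drop exact repeats, keeping the first occurrence of each event."""
    seen = set()
    unique = []
    for event in events:
        # Dicts are not hashable, so use a canonical JSON string as the key.
        key = json.dumps(event, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


stream = [
    {"type": "chain_mvt", "tag": "deposit", "credit": 1000},
    {"type": "chain_mvt", "tag": "withdrawal", "debit": 400},
    # Replayed after the restart:
    {"tag": "deposit", "type": "chain_mvt", "credit": 1000},
]
unique_events = dedupe_events(stream)
```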
The old model of coin movements attempted to compute fees etc and log
amounts, not utxos. This is not as robust, as multi-party opens and dual
funded channels make it hard to account for fees etc correctly.
Instead, we move towards a 'utxo' view of the onchain events. Every
event is either the creation or 'destruction' of a utxo. For cases where
the value of the utxo is not (fully) debited/credited to our account, we
also record the output_value. E.g. channel closings spend a utxo whose
entire value we may not own.
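A toy model of the 'utxo' view, to show why output_value is kept separate from the amount credited or debited to us. The class and field names are hypothetical, chosen for this sketch only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtxoEvent:
    outpoint: str                  # "txid:vout" of the utxo
    kind: str                      # "created" or "spent"
    credit_msat: int = 0           # value added to our account
    debit_msat: int = 0            # value removed from our account
    # Full value of the output, recorded when it differs from our share.
    output_value_msat: Optional[int] = None


def account_balance_msat(events):
    """Our balance is credits minus debits; output_value is informational."""
    return sum(e.credit_msat - e.debit_msat for e in events)


# A 1,000,000 sat funding output of which only 600,000 sat is ours.
opened = UtxoEvent("aa:0", "created", credit_msat=600_000_000,
                   output_value_msat=1_000_000_000)
closed = UtxoEvent("aa:0", "spent", debit_msat=600_000_000,
                   output_value_msat=1_000_000_000)

balance_while_open = account_balance_msat([opened])
balance_after_close = account_balance_msat([opened, closed])
```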
Since we're now tracking UTXOs onchain, we can now do more complex
assertions about the onchain footprint of them. The integration tests
have been updated to now use more 'chain aware' assertions about the
ending state.
Suggested-by: Rusty Russell
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Changelog-Changed: Support hsm-specific error code in lightning-cli
1. Adds the missing DNS error messages so they can be handled by
connect_control.
2. Prepends 'All addresses failed' to the code 401 message, so we
always have at least some error message for the user.
Changelog-None
The last line of the testcase was checking the wrong node, l3
instead of l1. l3 didn't have the plugins configured that would
produce the log entry we were checking to be absent.
Changelog-None
The idea is to have different default ports for different networks.
Current default port is `9735` for everything. Let's use it for
the mainnet and reuse the difference added to the default port
from `rpc_port` values in `bitcoin/chainstate.c`.
Testnet would be `19735` (adding rpc_port - 8332 = `10000`).
Signet would be `39735` (adding rpc_port - 8332 = `30000`).
Regtest would be `19846` (adding rpc_port - 8332 = `10111`).
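The arithmetic above, written out as a short sketch. The network labels and the function name are hypothetical; the RPC ports are bitcoind's defaults.

```python
MAINNET_RPC_PORT = 8332
MAINNET_LN_PORT = 9735

# bitcoind's default RPC ports per network.
RPC_PORTS = {
    "mainnet": 8332,
    "testnet": 18332,
    "signet": 38332,
    "regtest": 18443,
}


def default_ln_port(network):
    """Shift 9735 by the same offset bitcoind applies to its RPC port."""
    return MAINNET_LN_PORT + (RPC_PORTS[network] - MAINNET_RPC_PORT)


default_ports = {network: default_ln_port(network) for network in RPC_PORTS}
```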
With Vincenzo's kind pair-programming help over tmate.
Two other commits were squashed into this one so that bisecting
never ends up in half-baked state:
1. chainparams: Fix regtest default rpc_port
bitcoind -help says this:
-rpcport=<port>
Listen for JSON-RPC connections on <port> (default: 8332, testnet:
18332, signet: 38332, regtest: 18443)
2. test_gossip: Default port for regtest
hex: 2607 is now .... (could be 4d86 but Elements uses another port)
dec: 9735 is now any port (could be 19846 ^^ but now is for any port)
The lines which were binding to default port were removed as the
default port is different on each network.
NOTE: Remember not to modify gossip_store tests which loads everything raw
including the checksums.
Changelog-Changed: If the port is unspecified, the default port is chosen according to used network similarly to Bitcoin Core.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make sure the gossip msg was sent, but the other node might not
have digested it yet:
```
# Check other node can parse these
> addresses = l2.rpc.listnodes(l1.info['id'])['nodes'][0]['addresses']
E KeyError: 'addresses'
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Temporarily disable sendpay_blinding test which uses obsolete onionmsg;
there's still some debate on the PR about how blinded HTLCs will work.
Changelog-EXPERIMENTAL: onionmessage: removed support for v0.10.1 onion messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
shutdown_subdaemons frees the channel and calls destroy_close_command_on_channel_destroy, see gdb:
0 destroy_close_command_on_channel_destroy (_=0x55db6ca38e18, cc=0x55db6ca43338) at lightningd/closing_control.c:94
1 0x000055db6a8181b5 in notify (ctx=0x55db6ca38df0, type=TAL_NOTIFY_FREE, info=0x55db6ca38e18, saved_errno=0) at ccan/ccan/tal/tal.c:237
2 0x000055db6a8186bb in del_tree (t=0x55db6ca38df0, orig=0x55db6ca38e18, saved_errno=0) at ccan/ccan/tal/tal.c:402
3 0x000055db6a818a47 in tal_free (ctx=0x55db6ca38e18) at ccan/ccan/tal/tal.c:486
4 0x000055db6a73fffa in shutdown_subdaemons (ld=0x55db6c8b4ca8) at lightningd/lightningd.c:543
5 0x000055db6a741098 in main (argc=21, argv=0x7ffffa3e8048) at lightningd/lightningd.c:1192
Before this PR, there was no io_loop after shutdown_subdaemons and client side raised a
general `Connection to RPC server lost.`
Now we test the more specific `Channel forgotten before proper close.`, which is good!
BTW, this test was added recently in PR #4599.
It runs 6 nodes: under valgrind this ends up consuming 5.3 GB RSS. By
stopping nodes between, we peak about 1G RSS.
Measured using:
(while true; do echo $(for i in 4 5 6; do ps uh | tr -s ' ' | cut -d\ -f$i | tally; done); sleep 5; done)&
(Which measures my other processes as well, but that's only about 100M).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ChangeLog-Added: With the `sqlite3://` scheme for `--wallet` option, you can now specify a second file path for real-time database backup by separating it from the main file path with a `:` character.
valgrind locally complains about the allocations in autodata leaking:
```
==138200== 16 bytes in 1 blocks are still reachable in loss record 1 of 2
==138200== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==138200== by 0x10D41A: autodata_register_ (autodata.c:20)
==138200== by 0x10E7B8: register_autotype_type_to_string (type_to_string.h:79)
==138200== by 0x10F5CA: register_one_type_to_string0 (block.c:259)
==138200== by 0x19734C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/common/test/run-route-specific)
==138200== by 0x4A3D03F: (below main) (libc-start.c:264)
==138200==
==138200== 176 bytes in 1 blocks are still reachable in loss record 2 of 2
==138200== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==138200== by 0x10D472: autodata_register_ (autodata.c:26)
==138200== by 0x122D37: register_autotype_type_to_string (type_to_string.h:79)
==138200== by 0x122F1F: register_one_type_to_string0 (node_id.c:50)
==138200== by 0x19734C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/common/test/run-route-specific)
==138200== by 0x4A3D03F: (below main) (libc-start.c:264)
==138200==
make: *** [Makefile:638: unittest/common/test/run-route-specific] Error 7
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we forget a channel, we can get upset when we get an update about it:
```
2021-11-04T00:35:43.8242370Z lightningd-3: 2021-11-04T00:29:22.073Z DEBUG gossipd: Pruning channel 103x1x1 from network view (ages 61 and 22s)
...
2021-11-04T00:35:43.8263502Z lightningd-3: 2021-11-04T00:29:22.509Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 103x1x1/0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This loads up 20MB of plugins temporarily; we seem to be getting OOM
killed under CI and I wonder if this is contributing.
Doesn't significantly reduce runtime here, but I have lots of memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CI seems to be OOM killing us; 5 may be too many under valgrind.
VALGRIND=1 pytest tests/test_pay.py::test_fetchinvoice
Before:
1 passed in 199.33s (0:03:19)
After:
1 passed in 177.91s (0:02:57)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This surprised me, since the CHANGELOG for [0.8.2] said:
We now announce multiple addresses of the same type, if given. ([3609](https://github.com/ElementsProject/lightning/pull/3609))
But it lied!
Changelog-Fixed: We really do allow providing multiple addresses of the same type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
October was the date Torv2 is no longer supported by the Tor Project;
it will probably not work at all by next release, so we should remove
it now even though it's not quite the 6 months we prefer for
deprecation cycles.
I still see 110 nodes advertizing Torv2 (vs 10,292 Torv3); we still
parse and display it, we just don't advertize or connect to it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to require that fundchannel_complete() take a PSBT, where it
does sanity checks to avoid this error, making this a difficult mistake
to make.
Is it time to remove this functionality anyway? @cdecker?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fails liquid-regtest otherwise; liquid tends to hit the dust limit
earlier than non-liquid tx, and MPP exacerbates this by divvying up
payments into dusty bits then attempting to shove them through the same
channel, hitting the dust max. The MPP then fails as not all the parts
were able to arrive at their destination.
Let's make this a softer launch by just warning on the channel til the
feerates go back down.
You can also 'fix' this by upping your dust limit with
the `max-dust-htlc-exposure-msat` config.
Fixes #4482
Fixes #4481
Changelog-Added: pay: Payment attempts are now grouped by the pay command that initiated them
Changelog-Fixed: pay: `listpays` returns payments ordered by their creation date
Changelog-Fixed: pay: `listpays` no longer groups together attempts from separate `pay` commands for the same invoice
If it reconnects by itself, it will get a warning message:
```
lightningd-2: 2021-10-08T01:40:42.446Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: billboard: Sent reestablish, waiting for theirs
lightningd-2: 2021-10-08T01:40:42.446Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: peer_in WIRE_ERROR
lightningd-2: 2021-10-08T01:40:42.447Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: billboard perm: Received error channel 0a6220a3e904d17e72b5c3499928dc8a65720063c6395c999a129a0ff0b06afb: Forcibly closed by `close` command timeout
lightningd-2: 2021-10-08T01:40:42.448Z INFO 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#3: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: error channel 0a6220a3e904d17e72b5c3499928dc8a65720063c6395c999a129a0ff0b06afb: Forcibly closed by `close` command timeout
```
And this will make CI complain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's probably not worth fixing for the other daemons.
Changelog-Changed: JSON-RPC: `ping` now only works if we have a channel with the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Send a ping every 15-45 seconds. If we try to send another one and we
haven't got a reply, hang up.
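The rule can be sketched as a tiny state machine. This is an illustration, not the daemon's code: `PingState` is a hypothetical name, and drawing the delay uniformly from 15 to 45 seconds is an assumption about how the interval is picked.

```python
import random


class PingState:
    """Tracks whether the previous ping is still unanswered."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.awaiting_pong = False

    def next_delay(self):
        """Seconds until the next ping attempt, somewhere in 15 to 45."""
        return self.rng.uniform(15, 45)

    def on_timer(self):
        """Called when the delay expires; says what to do next."""
        if self.awaiting_pong:
            # The last ping never got a reply: treat the connection as dead.
            return "hangup"
        self.awaiting_pong = True
        return "ping"

    def on_pong(self):
        self.awaiting_pong = False


state = PingState(random.Random(1))
delay = state.next_delay()
first = state.on_timer()    # sends a ping
state.on_pong()             # peer replied in time
second = state.on_timer()   # sends another ping
third = state.on_timer()    # no reply since the second ping
```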
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: Send regular pings to detect dead connections (particularly for Tor).
To minimize the diffs, we #if 0 the code. We'll reenable it once
channeld is ready.
We also temporarily disable the ping tests.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Main changes are:
1. Uses point32 instead of pubkey32.
2. Uses issuer instead of vendor.
3. Uses byte instead of u8.
4. blinded_path num_hops is now a byte, not u16 (we don't use that yet!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: bolt12: `vendor` is deprecated: the field is now called `issuer`.
We always allocate a new `struct command` when we get a full JSON
object from stdin:
b2df01dc73/plugins/libplugin.c (L1229-L1233)
If it happens to be a notification, we pass the `struct command` to
the handler, and do not free it ourselves:
b2df01dc73/plugins/libplugin.c (L1270-L1275)
There are only nine points in `plugins/libplugin.c` where we `tal_free`
anything, and only one of them frees a `struct command`:
b2df01dc73/plugins/libplugin.c (L224-L234)
The above function `command_complete` is not appropriate for
notification handlers; the above function sends out a response
to our stdout, which a notification handler should not do.
However, as-is, it does mean that notification handling leaks
`struct command` objects, which can be problematic if we ever
have future built-in plugins which are significantly more
dependent on notifications.
This commit changes notification handlers to return
`struct command_result *`. Notification handlers may want to
perform `send_outreq` in the future, so we might as well use
our standard convention for callbacks. It also encourages
future developers to check how to properly terminate
notification handlers (and free up the `struct command`).
We also now provide a `notification_handled` function which a
notification handler must eventually call, as well as a
`notification_handler_pending` which is just a snowclone of
`command_still_pending`.
It was incredibly flaky due to the potential for l2 announcing the
channel before l1 could get to it, thus suppressing the outgoing
announcement which we were looking for. This now checks either
direction.
Before this fix the failure rate was 24% (out of 100 runs), afterwards
it's 0%.
Changelog-None
This was measured as a 95th percentile in our rough testing, thanks to
all the volunteers who monitored my channels.
Fixes: #4761
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `setchannelfee` gives a grace period (`enforcedelay`) before rejecting old-fee payments: default 10 minutes.
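
A rough sketch of what a grace period like this means for fee checking.
It is not the lightningd implementation: `FeePolicy`, `fee_acceptable`
and all parameter names are hypothetical, and only the 10 minute default
comes from the changelog line above.

```python
from dataclasses import dataclass


@dataclass
class FeePolicy:
    base_msat: int
    ppm: int

    def fee_for(self, amount_msat):
        # Base fee plus proportional fee in parts per million.
        return self.base_msat + amount_msat * self.ppm // 1_000_000


def fee_acceptable(offered_fee_msat, amount_msat, new, old,
                   changed_at, now, enforcedelay=600):
    """Accept a payment that meets the new fee, or the old fee
    while we are still inside the grace period (in seconds)."""
    if offered_fee_msat >= new.fee_for(amount_msat):
        return True
    in_grace = old is not None and now < changed_at + enforcedelay
    return in_grace and offered_fee_msat >= old.fee_for(amount_msat)
```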
We can miss it in both logs, so wait for it instead:
```
2021-09-22T07:25:59.1582950Z > l3.rpc.addgossip(ann.split()[3])
2021-09-22T07:25:59.1583911Z E AttributeError: 'NoneType' object has no attribute 'split'
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `listpays` now supports a `status` parameter to filter payments by status.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
sendonionmessage is going to be the new one, and do much *less*.
As this is an internal experimental-only API, no deprecation cycle
required.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never tested that we can correctly unwrap on the next step after
unblinding: it failed because we mangled the onion in place! Fix that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
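
The quoted percentages can be re-derived from the `real` timings. This
sketch assumes the value in parentheses is the mean of the ten runs
(the format is not explained here); `percent_faster` is a helper made
up for this example.

```python
def percent_faster(before_s, after_s):
    """Relative reduction in wall-clock time, as a percentage."""
    return (before_s - after_s) / before_s * 100.0


# Bracketed "real" values from the before/after blocks above.
cold = percent_faster(38.608, 35.468)
warm = percent_faster(8.39989, 8.30138)

print(f"no ccache:   {cold:.1f}% faster")   # 8.1% faster
print(f"warm ccache: {warm:.1f}% faster")   # 1.2% faster
```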
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gives cmdline users a better idea of what's going on.
Inspired-by: https://github.com/ElementsProject/lightning/issues/4777
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: `close` now notifies about the feeranges each side uses.
We make it a first-class citizen internally, even though we won't use
it over the wire (at least, non-experimental builds). This scheme
follows the latest draft, in which features are flagged compulsory.
We also add several helper functions.
Since it uses the *even* bits (as per latest spec), not the *odd* bits,
we have some other fixups.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That was quick!
We remove the 50% test, since the default is now to use quickclose.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now perform quick-close if the peer supports it.
This affects the range we offer even without quick-close, but it's
more critical for quick-close.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSONRPC: `close` now takes a `feerange` parameter to set min/max fee rates for mutual close.
This is now allowed for anchors (as per https://github.com/lightningnetwork/lightning-rfc/pull/847).
We need to play with feerates, since we don't put a discount on anchor
commitments yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Based on a commit by @niftynei, but:
- Separated quickclose logic from main loop.
- I made it independent of anchor_outputs; it uses an option instead.
- Disable if they've specified how to negotiate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This includes the new bolt11 test vectors, and also removes the
requirement that HTLCs be less than 2^32 msat. We keep that for now
because Electrum enforced it on receive: in two releases we will stop
that too.
So no longer warn about needing mpp in that case either.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: Protocol: No longer restrict HTLCs to
We weren't actually getting the last log out; this does that.
We have to fix test_bitcoin_failure which now notices the BROKEN
log message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: libplugin: Fatal error messages from plugin_exit() now logged in lightningd.
Changelog-Changed: Change the order of parameters in the `listforwards` command.
Changelog-Deprecated: Change the order of the `status` parameter in the `listforwards` RPC command.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
After some discussion with @shesek, and my own usage, we agreed that
a more comprehensive interface, which explicitly supports grouping,
is desirable.
Thus keys are now arrays, with the semantic that a key is either a
parent or has a value, never both.
For convenience in the JSON schema, we always return them as arrays,
though we accept simple strings as arguments.
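
A minimal sketch of the "parent or value, never both" rule, as a
standalone Python model rather than the real datastore code. The names
`Store`, `KeyConflict` and `normalize_key` are invented for this example.

```python
class KeyConflict(Exception):
    pass


def normalize_key(key):
    # Accept a plain string as shorthand for a one-element key.
    return (key,) if isinstance(key, str) else tuple(key)


class Store:
    def __init__(self):
        self._values = {}  # key tuple -> value

    def set(self, key, value):
        key = normalize_key(key)
        # No ancestor may already hold a value: it would become a parent.
        for i in range(1, len(key)):
            if key[:i] in self._values:
                raise KeyConflict(f"{list(key[:i])} already has a value")
        # The key itself may not already be a parent of other keys.
        for existing in self._values:
            if len(existing) > len(key) and existing[:len(key)] == key:
                raise KeyConflict(f"{list(key)} already has children")
        self._values[key] = value

    def list(self, prefix=()):
        # Keys are always returned as arrays, whatever form was passed in.
        prefix = normalize_key(prefix)
        return sorted(list(k) for k in self._values
                      if k[:len(prefix)] == prefix)
```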
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We add a generation counter, and allow update or del conditional
on a given generation.
Formalizes error codes, too, since we have more now.
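
The conditional update/delete is the usual optimistic-concurrency
pattern. A standalone sketch follows; the class and method names are
hypothetical, and starting the counter at 0 and bumping it by one per
update is an assumption, not something stated here.

```python
class GenerationMismatch(Exception):
    pass


class GenStore:
    def __init__(self):
        self._entries = {}  # key -> (value, generation)

    def create(self, key, value):
        if key in self._entries:
            raise KeyError(f"{key!r} already exists")
        self._entries[key] = (value, 0)
        return 0

    def get(self, key):
        return self._entries[key]

    def _check(self, key, generation):
        _, current = self._entries[key]
        # generation=None means "unconditional".
        if generation is not None and generation != current:
            raise GenerationMismatch(f"expected {generation}, have {current}")
        return current

    def update(self, key, value, generation=None):
        current = self._check(key, generation)
        self._entries[key] = (value, current + 1)
        return current + 1

    def delete(self, key, generation=None):
        self._check(key, generation)
        del self._entries[key]
```

A writer that read generation N and passes it back will fail, rather
than silently overwrite, if someone else updated the entry in between.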
Suggested-by: @shesek
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The function is already provided in
contrib/pyln-testing/pyln/testing/utils.py (which is also
imported in this module), so there is no need to define it
a second time.
We actually were waiting for l3 to disconnect, not l2.
But in general we should be looking for status rather than grovelling
in the logs where possible, so I changed all those.
```
2021-08-17T04:40:34.9015538Z # l2 leases a channel from l3
2021-08-17T04:40:34.9016520Z l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
2021-08-17T04:40:34.9017570Z rates = l2.rpc.dev_queryrates(l3.info['id'], amount, amount)
2021-08-17T04:40:34.9018724Z l3.daemon.wait_for_log('disconnect')
2021-08-17T04:40:34.9019851Z l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
2021-08-17T04:40:34.9021467Z l2.rpc.fundchannel(l3.info['id'], amount, request_amt=amount,
2021-08-17T04:40:34.9022865Z feerate='{}perkw'.format(feerate), minconf=0,
2021-08-17T04:40:34.9024000Z > compact_lease=rates['compact_lease'])
...
2021-08-17T04:40:34.9103422Z > raise RpcError(method, payload, resp['error'])
2021-08-17T04:40:34.9106829Z E pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d', 'amount': 500000, 'feerate': '2000perkw', 'announce': True, 'minconf': 0, 'request_amt': 500000, 'compact_lease': '029a00640064000000644c4b40'}, error: {'code': 400, 'message': 'Unable to connect, no address known for peer', 'data': {'id': '035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d', 'method': 'connect'}}
```
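
Checking status instead of logs amounts to polling a predicate. The
sketch below is self-contained: its `wait_for` is a simplified stand-in
written for this example (the test utilities provide their own), and
`FakeNode` is a hypothetical object that only imitates the shape of a
`listpeers` reply.

```python
import time


def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it is true, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not reached in time")
        time.sleep(interval)


class FakeNode:
    """Reports the peer as connected until the Nth listpeers() call."""

    def __init__(self, disconnect_after):
        self._disconnect_after = disconnect_after
        self.calls = 0

    def listpeers(self):
        self.calls += 1
        connected = self.calls < self._disconnect_after
        return {'peers': [{'id': 'peer', 'connected': connected}]}


node = FakeNode(disconnect_after=3)
# Wait on the reported state rather than on a 'disconnect' log line.
wait_for(lambda: not node.listpeers()['peers'][0]['connected'])
```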
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We fail waiting for 'Resolved OUR_UNILATERAL/DELAYED_OUTPUT_TO_US by our proposal OUR_DELAYED_RETURN_TO_WALLET'
because we close *two* channels, but only wait for one transaction before mining a block.
This means we might only have one, and we immediately mine the next wait_for_mempool=1,
so the OUR_DELAYED_RETURN_TO_WALLET isn't mined.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If l3 is too slow, it can get channel_announcement after channel
is closed, so it gets upset at the node_announcement which follows:
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Updated pending announce with update 103x1x1/1
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: channel_announcement: no unspent txout 103x1x1
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Bad gossip order: WIRE_NODE_ANNOUNCEMENT before announcement 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This actually caused the flake in test_funding_reorg_private, where
l1 and l2 might not mark the original channel disabled. In fact, they
should *remove* it as it gets reorged out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>