Commit Graph

11544 Commits

Author SHA1 Message Date
Christian Decker
c98f011479 channeld: Send a depth=0 notification when channeld starts up
This is used in order to ensure zeroconf doesn't just wait for the
first confirmation despite mindepth being set to 0.
2022-07-04 22:14:06 +02:00
Christian Decker
b9817d395f zeroconf: Wire the aliases through channeld 2022-07-04 22:14:06 +02:00
Christian Decker
de1c0b51f0 zeroconf: Add alias_remote and alias_local to channel and DB
`alias_local` is generated locally and sent to the peer so it knows
what we're calling the channel, while `alias_remote` is received by
the peer so we know what to include in routehints when generating
invoices.
2022-07-04 22:14:06 +02:00
Christian Decker
9d3cb95489 wire: Add funding_locked tlv patch from PR lightning/bolts#910
Minimal set of changes to update the peer_wire.csv to include the TLV
field in the `funding_locked` message, and add type 1=alias from that
PR too.
2022-07-04 22:14:06 +02:00
Christian Decker
3fbaac3fdb jsonrpc: Add option_zeroconf handling to listpeers 2022-07-04 22:14:06 +02:00
Christian Decker
8609f9e00d pytest: Test the mindepth customizations of fundchannel and hook
Just test that we can customize, and we'll add mindepth=0 support in
the next couple of commits.
2022-07-04 22:14:06 +02:00
Christian Decker
1477873190 plugin: Allow plugins to customize the mindepth in accept_channel
This is the counterpart of the `mindepth` parameter in `fundchannel`
and friends. Allows dynamic lookups of `node_id` and selectively
opting into `option_zeroconf` being used.

Changelog-Added: plugin: The `openchannel` hook may return a `mindepth` indicating how many confirmations are required.
2022-07-04 22:14:06 +02:00
Christian Decker
adbb977053 openingd: If we have negotiated zeroconf we use our mindepth
With `option_zeroconf` we may now send `channel_ready` at any time we
want, rendering the `mindepth` parameter a mere heads up. We ignore it
in favor of our own value, since we plan to trigger releasing the
`channel_ready` once we reach our own depth.
2022-07-04 22:14:06 +02:00
Christian Decker
e4511452ac bolt: Reflect the zeroconf featurebits in code 2022-07-04 22:14:06 +02:00
Christian Decker
185cd81be4 jsonrpc: Add mindepth argument to fundchannel and multifundchannel
This will eventually enable us to specify 0 for zeroconf channels.

Changelog-Added: JSON-RPC: Added `mindepth` argument to specify the number of confirmations we require for `fundchannel` and `multifundchannel`
2022-07-04 22:14:06 +02:00
Christian Decker
6d07f4ed85 json: Add parser for u32 params 2022-07-04 22:14:06 +02:00
Vincenzo Palazzo
bad943da55 valgrind: ingore plugin build with rust
Ok this should be fixed the following stack trace


```
2022-06-29T14:19:41.183Z DEBUG   lightningd: Command returned result after jcon close
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.55581
==55581== Syscall param statx(file_name) points to unaddressable byte(s)
==55581==    at 0x4B0188E: statx (statx.c:29)
==55581==    by 0x1133481: std::sys::unix::fs::try_statx (weak.rs:178)
==55581==    by 0x11265E0: std::fs::buffer_capacity_required (fs.rs:851)
==55581==    by 0x112675B: <std::fs::File as std::io::Read>::read_to_string (fs.rs:644)
==55581==    by 0x10DACA8: num_cpus::linux::Cgroup::param (linux.rs:214)
==55581==    by 0x10DAB39: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==55581==    by 0x10DA9C2: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==55581==    by 0x10DA5A1: num_cpus::linux::load_cgroups (linux.rs:149)
==55581==    by 0x10DA23D: num_cpus::linux::init_cgroups (linux.rs:129)
==55581==    by 0x10DCDC8: core::ops::function::FnOnce::call_once (function.rs:227)
==55581==    by 0x10DC749: std::sync::once::Once::call_once::{{closure}} (once.rs:276)
==55581==    by 0x21EE89: std::sync::once::Once::call_inner (once.rs:434)
==55581==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==55581== 
==55581== Syscall param statx(buf) points to unaddressable byte(s)
==55581==    at 0x4B0188E: statx (statx.c:29)
==55581==    by 0x1133481: std::sys::unix::fs::try_statx (weak.rs:178)
==55581==    by 0x11265E0: std::fs::buffer_capacity_required (fs.rs:851)
==55581==    by 0x112675B: <std::fs::File as std::io::Read>::read_to_string (fs.rs:644)
==55581==    by 0x10DACA8: num_cpus::linux::Cgroup::param (linux.rs:214)
==55581==    by 0x10DAB39: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==55581==    by 0x10DA9C2: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==55581==    by 0x10DA5A1: num_cpus::linux::load_cgroups (linux.rs:149)
==55581==    by 0x10DA23D: num_cpus::linux::init_cgroups (linux.rs:129)
==55581==    by 0x10DCDC8: core::ops::function::FnOnce::call_once (function.rs:227)
==55581==    by 0x10DC749: std::sync::once::Once::call_once::{{closure}} (once.rs:276)
==55581==    by 0x21EE89: std::sync::once::Once::call_inner (once.rs:434)
==55581==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==55581==
--------------------------------------------------------------------------------
Leaving base_dir /tmp/ltests-hzt9ppqp intact, it still has test sub-directories with failure details: ['test_peers_1']
```

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-07-03 20:36:20 +09:30
Rusty Russell
1771b8ec22 CI: re-enable checks, by changing errant tab back to spaces.
And the Python contrib/ stuff seems to fail under VALGRIND, so attach
it to a normal make line.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-03 20:36:20 +09:30
Igor Bubelov
ee3f059e80 Update README.md 2022-07-03 12:41:07 +02:00
Igor Bubelov
50107754a7 Add README.md 2022-07-03 12:41:07 +02:00
Greg Sanders
9f953b5efb No funding_wscript arg in initial_commit_tx 2022-07-01 13:30:19 -05:00
Rusty Russell
2fe17a5837 CI: make sure *someone* runs check-units under valgrind!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-29 21:07:42 +09:30
Rusty Russell
9ab7c8aed3 connected/test: fix memleak in test.
```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all connectd/test/run-netaddress > /dev/null
==2483395== 16 bytes in 1 blocks are still reachable in loss record 1 of 15
==2483395==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2483395==    by 0x10D59A: autodata_register_ (autodata.c:20)
==2483395==    by 0x10EB26: register_autotype_type_to_string (type_to_string.h:77)
==2483395==    by 0x10EB6B: register_one_type_to_string0 (type_to_string.c:8)
==2483395==    by 0x188C0C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/connectd/test/run-netaddress)
==2483395==    by 0x4A3A00F: (below main) (libc-start.c:264)
==2483395==
==2483395== 40 bytes in 1 blocks are still reachable in loss record 2 of 15
==2483395==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
...
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-29 21:07:42 +09:30
Rusty Russell
fd90e5746b connectd: don't keep around more than one old connection.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".

The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:47:27 +09:30
Rusty Russell
afbddcf7f3 lightningd: fix crash on rapid reconnect.
Happens occasionally when running
`tests/test_connection.py::test_mutual_reconnect_race` (which is too
flaky to add, without more fixes):


```
lightningd: lightningd/peer_control.c:1252: peer_active: Assertion `!channel->owner' failed.
lightningd: FATAL SIGNAL 6 (version v0.11.0.1-38-g4f167da)
0x5594a41f8f45 send_backtrace
	common/daemon.c:33
0x5594a41f8fef crashdump
	common/daemon.c:46
0x7f7cb585c08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x7f7cb585c00b __GI_raise
	../sysdeps/unix/sysv/linux/raise.c:51
0x7f7cb583b858 __GI_abort
	/build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79
0x7f7cb583b728 __assert_fail_base
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92
0x7f7cb584cfd5 __GI___assert_fail
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101
0x5594a41b45ca peer_active
	lightningd/peer_control.c:1252
0x5594a418794c connectd_msg
	lightningd/connect_control.c:457
0x5594a41cd457 sd_msg_read
	lightningd/subd.c:556
0x5594a41ccbe5 read_fds
	lightningd/subd.c:357
0x5594a4269fc2 next_plan
	ccan/ccan/io/io.c:59
0x5594a426abca do_plan
	ccan/ccan/io/io.c:407
0x5594a426ac0c io_ready
	ccan/ccan/io/io.c:417
0x5594a426ceff io_loop
	ccan/ccan/io/poll.c:453
0x5594a41930d9 io_loop_with_timers
	lightningd/io_loop_with_timers.c:22
0x5594a4199293 main
	lightningd/lightningd.c:1181
0x7f7cb583d082 __libc_start_main
	../csu/libc-start.c:308
0x5594a416e15d ???
	???:0
0xffffffffffffffff ???
	???:0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:47:27 +09:30
Matt Whitlock
83c825945c connectd: avoid use-after-free upon multiple reconnections by a peer
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
2022-06-28 13:47:27 +09:30
Rusty Russell
4ee55acc71 connectd: don't start connecting in parallel in peer_conn_closed.
The crash below from @zerofeerouting left me confused.  The invalid
value in fmt_wireaddr_internal is a telltale sign of use-after-free.

This backtrace shows us destroying the conn *twice*: what's happening?

Well, tal carefully protects against destroying twice: it's not that
unusual to free something in a destructor which has already been freed.
So this indicates that there are *two* io_conn hanging off one
struct connecting, which isn't supposed to happen!  We deliberately
call try_connect_one_addr() initially, then inside the io_conn destructor.

But due to races in connectd vs lightningd connection state, we added
a fix which allows a connect command to sit around while the peer is
cleaning up (6cc9f37cab) and get fired
off when it's done.

But what if, in the chaos, we are already connecting again?  Now we'll
end up with *two* connections.

Fortunately, we have a `conn` pointer inside struct connecting, which
(with a bit of additional care) we can ensure is only non-NULL while
we're actually trying to connect.  This lets us check that before
firing off a new connection attempt in peer_conn_closed.

```
lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939)
0x5614a4915ae8 send_backtrace
	common/daemon.c:33
0x5614a4915b72 crashdump
	common/daemon.c:46
0x7ffa14fcd72f ???
	???:0
0x7ffa14dc87bb ???
	???:0
0x7ffa14db3534 ???
	???:0
0x5614a491fc71 fmt_wireaddr_internal
	common/wireaddr.c:255
0x5614a491fc7a fmt_wireaddr_internal_
	common/wireaddr.c:257
0x5614a491ea6b type_to_string_
	common/type_to_string.c:32
0x5614a490beaa destroy_io_conn
	connectd/connectd.c:754
0x5614a494a2f1 destroy_conn
	ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
	ccan/ccan/io/poll.c:252
0x5614a4953804 notify
	ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
	ccan/ccan/tal/tal.c:402
0x5614a4953928 del_tree
	ccan/ccan/tal/tal.c:412
0x5614a4953e07 tal_free
	ccan/ccan/tal/tal.c:486
0x5614a4908b7a try_connect_one_addr
	connectd/connectd.c:870
0x5614a490bef1 destroy_io_conn
	connectd/connectd.c:759
0x5614a494a2f1 destroy_conn
	ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
	ccan/ccan/io/poll.c:252
0x5614a4953804 notify
	ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
	ccan/ccan/tal/tal.c:402
0x5614a4953e07 tal_free
	ccan/ccan/tal/tal.c:486
0x5614a4948f08 io_close
	ccan/ccan/io/io.c:450
0x5614a4948f59 do_plan
	ccan/ccan/io/io.c:401
0x5614a4948fe1 io_ready
	ccan/ccan/io/io.c:417
0x5614a494a8e6 io_loop
	ccan/ccan/io/poll.c:453
0x5614a490c12f main
	connectd/connectd.c:2164
0x7ffa14db509a ???
	???:0
0x5614a4904e99 ???
	???:0
0xffffffffffffffff ???
	???:0
```

Fixes: #5339
Changelog-Fixed: connectd: occasional crash when we reconnect to a peer quickly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:46:59 +09:30
Rusty Russell
8e2dcc1167 doc: document the [] IPv6 address hack.
Fixes: #5317
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:46:41 +09:30
Vincenzo Palazzo
2754e30269 devtools: revert changes and make sure chainparams always set.
[Also fixes crash with -h, since chainparams was not set! --RR]
2022-06-28 06:09:01 +09:30
Vincenzo Palazzo
2c9d7484a2 test: introduce the default port function and remove hard coded port 2022-06-28 06:09:01 +09:30
Vincenzo Palazzo
7ff62b4a00 lightnind: removeDEFAULT_PORT global definition
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Vincenzo Palazzo
c07d44b4d4 docs: update the docs in according with the new code
Suggested-by: @rustyrussell 

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>

# Title: 

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Vincenzo Palazzo
4b11f968ad lightningd: change the regtest default port according with the tests
This doesn't make sense, but for now we just keep it

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Vincenzo Palazzo
cc7a405ca4 lightningd: use the standard port derivation in connect command
Complete implementation of BOLT1 port derivation proposal https://github.com/lightning/bolts/pull/968

Changelog-Added: rpc: use the standard port derivation in connect command when the port is not specified.

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Rusty Russell
9152b8c424 pytest: fix test_multifunding_feerates
There's a 1 in 256 chance that our signature on the transaction is 70,
not 71 bytes long.  This changes the freerate.  So fix up the weight in
this case, to be the expected weight.

```
@unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different")
@pytest.mark.developer("uses dev-fail")
@pytest.mark.openchannel('v1')  # v2 the weight calculation is off by 3
deftest_multifunding_feerates(node_factory, bitcoind):
'''
    Test feerate parameters for multifundchannel
    '''
        funding_tx_feerate = '10000perkw'
        commitment_tx_feerate_int = 2000
        commitment_tx_feerate = str(commitment_tx_feerate_int) + 'perkw'
        l1, l2, l3 = node_factory.get_nodes(3, opts={'log-level': 'debug'})
        l1.fundwallet(1 << 26)
def_connect_str(node):
return'{}@localhost:{}'.format(node.info['id'], node.port)
        destinations = [{"id": _connect_str(l2), 'amount': 50000}]
        res = l1.rpc.multifundchannel(destinations, feerate=funding_tx_feerate,
                                      commitment_feerate=commitment_tx_feerate)
        entry = bitcoind.rpc.getmempoolentry(res['txid'])
        weight = entry['weight']
        expected_fee = int(funding_tx_feerate[:-5]) * weight // 1000
>       assert expected_fee == entry['fees']['base'] * 10 ** 8
E       AssertionError: assert 7000 == (Decimal('0.00007010') * (10 ** 8))
tests/test_connection.py:1982: AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
25699994e5 pytest: fix flake in test_node_reannounce
Again, our new behaviour of sending our own gossip even before
they ask can confuse our gossip query tests.

In this case, simply eliminate duplicates.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
a677ea404b pytest: fix flake in test_query_short_channel_id
Again, our new behaviour of sending our own gossip even before
they ask can confuse our gossip query tests.

Create a new node, attach it, and perform queries on it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
4970704d24 pytest: fix gossipwith flake harder.
Even though we generally wait until a node has seen the gossip,
that doesn't mean that connectd has processed it!  This means when
we connect it may still send us "old" gossip.

So we set the OPT_GOSSIP_QUERIES bit, which means don't send until we
ask.  But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs
to filter that out.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
e120b4afd6 lightningd: add more information should subd send wrong message.
I saw this once, but could not track it down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
3f98cf3fce lightningd: track weird CI crash in test_important_plugin
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't.  This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.

Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.

```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509==    at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509==    by 0x22D531: tal_bytelen (tal.c:637)
==895509==    by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509==    by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509==    by 0x1497EC: main (lightningd.c:1090)
==895509== 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
9137ea262b pytest: don't assume that join_nodes needs only check ends.
I saw another "only_one()" fail on alias checking: it's not
entirely clear to me with the more aggressive sending of
own gossip, that we necessarily process in order, so we might
not have actually seen all channels just because we saw
the farthest one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
3958d59e39 pytest: fix test_channel_lease_post_expiry flake.
Make sure bitcoind gets tx before mining blocks!

```
# l1<->l2 mutual close should work
        chan = l1.get_channel_scid(l2)
        l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
        l1.rpc.close(chan)
        l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE')
        bitcoind.generate_block(2)
        sync_blockheight(bitcoind, [l1, l2])
>       l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE')
tests/test_closing.py:851:
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
70b091d9f6 lightningd: fix transient leak report when openingd shutting down.
```
>           raiseValueError(str(errors))
E           ValueError: 
E           Node errors:
E           Global errors:
E            - Node /tmp/ltests-x5sfpiwp/test_openchannel_hook_chaining_1/lightning-2/ has memory leaks: [
E               {
E                   "backtrace": [
E                       "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
E                       "ccan/ccan/io/io.c:91 (io_new_conn_)",
E                       "lightningd/subd.c:773 (new_subd)",
E                       "lightningd/subd.c:827 (new_channel_subd_)",
E                       "lightningd/opening_control.c:870 (peer_start_openingd)",
E                       "lightningd/peer_control.c:1307 (peer_active)",
E                       "lightningd/connect_control.c:457 (connectd_msg)",
E                       "lightningd/subd.c:556 (sd_msg_read)",
E                       "lightningd/subd.c:357 (read_fds)",
E                       "ccan/ccan/io/io.c:59 (next_plan)",
E                       "ccan/ccan/io/io.c:407 (do_plan)",
E                       "ccan/ccan/io/io.c:417 (io_ready)",
E                       "ccan/ccan/io/poll.c:453 (io_loop)",
E                       "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)",
E                       "lightningd/lightningd.c:1181 (main)",
E                       "../csu/libc-start.c:308 (__libc_start_main)"
E                   ],
E                   "label": "ccan/ccan/io/io.c:91:struct io_conn",
E                   "parents": [
E                       "lightningd/lightningd.c:107:struct lightningd"
E                   ],
E                   "value": "0x2b5a898"
E               }
E           ]
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
fcff21fae5 pytest: allow more time for test_waitblockheight !DEVELOPER.
It actually timed out with the default 60 seconds, just before it
saw the block:

```
2022-06-07T02:16:05.2557049Z         bitcoind.generate_block(1)
2022-06-07T02:16:05.2557300Z         sync_blockheight(bitcoind, [node])
2022-06-07T02:16:05.2557594Z         fut1.result(5)
2022-06-07T02:16:05.2557912Z         assert not fut2.done()
2022-06-07T02:16:05.2558121Z     
2022-06-07T02:16:05.2558370Z         # Trigger two blocks.
2022-06-07T02:16:05.2558689Z         bitcoind.generate_block(1)
2022-06-07T02:16:05.2558941Z         sync_blockheight(bitcoind, [node])
2022-06-07T02:16:05.2559219Z >       fut2.result(5)
2022-06-07T02:16:05.2559350Z 
2022-06-07T02:16:05.2559508Z tests/test_misc.py:2138: 
...
2022-06-07T02:16:05.2586947Z         elif "error" in resp:
2022-06-07T02:16:05.2587398Z >           raise RpcError(method, payload, resp['error'])
2022-06-07T02:16:05.2588026Z E           pyln.client.lightning.RpcError: RPC call failed: method: waitblockheight, payload: {'blockheight': 103}, error: {'code': 2000, 'message': 'Timed out.'}
2022-06-07T02:16:05.2588325Z 
2022-06-07T02:16:05.2588563Z contrib/pyln-client/pyln/client/lightning.py:387: RpcError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
b4820d6706 lightningd: don't run off end of buffer if db_hook returns nonsense.
It shouldn't return nonsense, but it did, and we segfaulted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
6221ac621d pytest: fix timeout in test_option_types
We were waiting for the start to timeout waiting for the "public key"
message.  Instead, start manually.

Before, this took (with TIMEOUT=30) 97 seconds, after 8 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
61d8eb5fa8 pyln-testing: increase default daemon.wait() timeout.
Valgrind in CI is slow:

```
    def test_bitcoin_failure(node_factory, bitcoind):
...
        # Ignore BROKEN log message about blocksonly mode.
        l2 = node_factory.get_node(start=False, expect_fail=True,
                                   allow_broken_log=True)
        l2.daemon.start(wait_for_initialized=False)
        # Will exit with failure code.
>       assert l2.daemon.wait() == 1

tests/test_misc.py:114: 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
d2952576cd pyln-testing: restore proper streaming behaviour for lightningd.
Get it to log direct to stdout, so we see what's happening *as it
happens* rather than as we read it.  We could restore the thread we
were using before, but that added more problems than it solved.

This means that we need the hsm password prompts in the log though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
a0e0dbf229 pyln-testing: use files for stdout and stderr, not threads.
Some flakes are caused by weird races in this code.  Plus, if we
get things to write straight to files, we might see things in
there on post-mortem which happen after the python runner exits.

It's a bit less efficient, but much simpler.  Let's see if it helps!

Some tests need a rework now, since we don't get a failure (except
eventual timeout), but they're simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
bcd050a610 pytest: fix test_ping_timeout flake.
We can hang up due to ping while waiting for channel to become active.
But we don't even need an active channel for this test, so simplify.

```
______________________________ test_ping_timeout _______________________________
[gw1] linux -- Python 3.7.13 /opt/hostedtoolcache/Python/3.7.13/x64/bin/python3

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f132cca8a90>

    @pytest.mark.developer("dev-disconnect required")
    def test_ping_timeout(node_factory):
        # Disconnects after this, but doesn't know it.
        l1_disconnects = ['xWIRE_PING']
    
        l1, l2 = node_factory.line_graph(2, opts=[{'dev-no-reconnect': None,
                                                   'disconnect': l1_disconnects},
>                                                 {}])

tests/test_connection.py:3826: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1493: in line_graph
    self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
contrib/pyln-testing/pyln/testing/utils.py:1470: in join_nodes
    nodes[i].wait_channel_active(scids[i])
contrib/pyln-testing/pyln/testing/utils.py:1003: in wait_channel_active
    wait_for(lambda: self.is_channel_active(chanid))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

success = <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>
timeout = 900

    def wait_for(success, timeout=TIMEOUT):
        start_time = time.time()
        interval = 0.25
        while not success():
            time_left = start_time + timeout - time.time()
            if time_left <= 0:
>               raise ValueError("Timeout while waiting for {}", success)
E               ValueError: ('Timeout while waiting for {}', <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>)
```
2022-06-27 17:21:35 +09:30
Rusty Russell
517828adb2 lightningd: don't print nasty message when onchaind fails partially-failed HTLC
1. We set an outgoing htlc's `failonion` when we get a commitment_signed.
2. We don't transfer it to the corresponding incoming HTLC until we send
   commitment_signed and receive revoke_and_ack (meaning, outgoing htlc is
   completely dead).
3. If between these steps we go onchain, onchaind (after 3 blocks) tells us
   to fail the HTLC.
4. hout->failonion is set, but hout->hin has not been failed yet.  We
   do a sanity check and print a nasty message, and fail it with
   WIRE_PERMANENT_CHANNEL_FAILURE instead of relaying the error.

So handle this case explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
353361a05c pytest: test to demonstrate BROKEN message on onchaind htlc close.
We assumed that if the outgoing htlc has failed, we should have always
failed the incoming one.  This is not true, as this testcase demonstrates:

```
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-onchaind-chan#2: Sending 1 missing htlc messages
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: onchain_failed_our_htlc
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 failonion = 0x55cb8bc045d8, failmsg = (nil), preimage = (nil)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 already complete, but ->in not resolved! failonion = 206816a391d3a7666ddd5213914cbb68f5da1fc4a0937e729de5a1990c94d26312caa5f2778e8da0c6bdefc68dd1a3bc28b5b5650fc0bdb3c2247ecca94ed0bbb224c8448c2c71eb1656a8740cadb301bd1ee1c1e774a8fef817352f502e4217b11e93aa6877b88b37afab0e4d4e49ed0385be9ab9a1ab1ac0e3460e41cfafb30ed896cea96e346041919a6c524ce56c3e5f27c7cd78a36b6df221e90a1c6e048c72b4146a5a51885fb70649037fe7ace77a016ae3ec8aee97960d0e5f0582713f671df79d8dee11b82708b6d882ee5adbb328db1938e824110f57ead1b27410bc6f775c7bb4ae40c1768d77a166c9bfda8f634ba0ac4f8a9fe199894dd3754c5ce41c9694544c805ffc177517661f11221dd8dffac60ce1c8c5bf54cda8e5ea44d8ec6b, failmsg = (null), preimage = (null)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: MISSING incoming fail for 0: failing incoming now
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: MISSED incoming fail for 0: failing now
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
11de721ba9 gossipd: fix gossmap race.
When upgrading a channel from private to public, we would delete the
private channel then add the public one.  However, this is visible in
the gossmap!  In particular, the node may be removed from the gossmap
and then re-added, so it may temporarily vanish!

I was seeing an occasional assertion inside node_factory.line_graph:

```
@pytest.mark.developer
    def test_reconnect_remote_sends_no_sigs(node_factory):
        """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect.
        """
>       l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True})

tests/test_connection.py:870: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1467: in line_graph
    self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
contrib/pyln-testing/pyln/testing/utils.py:1460: in join_nodes
    wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for
    while not success():
contrib/pyln-testing/pyln/testing/utils.py:1460: in <lambda>
    wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = []

    def only_one(arr):
        """Many JSON RPC calls return an array; often we only expect a single entry
        """
>       assert len(arr) == 1
E       AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
2b7915359c gossmap: handle case where private channel turns into public.
Usually we won't see this, since private is deleted.  But we could
have already read the private channel before that.  Handle it properly.

(Tested by removing the gossip_store deletion code and making sure
this worked).

We have to fix up the test, which announces a channel twice!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
c1a111b68c pytest: wait for stderr, rather than asserting.
Two almost identical assertion fails under CI.  Try waiting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30