Commit Graph

11363 Commits

Author SHA1 Message Date
Rusty Russell
a677ea404b pytest: fix flake in test_query_short_channel_id
Again, our new behaviour of sending our own gossip even before
they ask can confuse our gossip query tests.

Create a new node, attach it, and perform queries on it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
4970704d24 pytest: fix gossipwith flake harder.
Even though we generally wait until a node has seen the gossip,
that doesn't mean that connectd has processed it!  This means when
we connect it may still send us "old" gossip.

So we set the OPT_GOSSIP_QUERIES bit, which means don't send until we
ask.  But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs
to filter that out.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
e120b4afd6 lightningd: add more information should subd send wrong message.
I saw this once, but could not track it down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
3f98cf3fce lightningd: track weird CI crash in test_important_plugin
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't.  This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.

Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.

```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509==    at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509==    by 0x22D531: tal_bytelen (tal.c:637)
==895509==    by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509==    by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509==    by 0x1497EC: main (lightningd.c:1090)
==895509== 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
9137ea262b pytest: don't assume that join_nodes needs only check ends.
I saw another "only_one()" fail on alias checking: it's not
entirely clear to me with the more aggressive sending of
own gossip, that we necessarily process in order, so we might
not have actually seen all channels just because we saw
the farthest one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
3958d59e39 pytest: fix test_channel_lease_post_expiry flake.
Make sure bitcoind gets tx before mining blocks!

```
# l1<->l2 mutual close should work
        chan = l1.get_channel_scid(l2)
        l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
        l1.rpc.close(chan)
        l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE')
        bitcoind.generate_block(2)
        sync_blockheight(bitcoind, [l1, l2])
>       l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE')
tests/test_closing.py:851:
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
70b091d9f6 lightningd: fix transient leak report when openingd shutting down.
```
>           raiseValueError(str(errors))
E           ValueError: 
E           Node errors:
E           Global errors:
E            - Node /tmp/ltests-x5sfpiwp/test_openchannel_hook_chaining_1/lightning-2/ has memory leaks: [
E               {
E                   "backtrace": [
E                       "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
E                       "ccan/ccan/io/io.c:91 (io_new_conn_)",
E                       "lightningd/subd.c:773 (new_subd)",
E                       "lightningd/subd.c:827 (new_channel_subd_)",
E                       "lightningd/opening_control.c:870 (peer_start_openingd)",
E                       "lightningd/peer_control.c:1307 (peer_active)",
E                       "lightningd/connect_control.c:457 (connectd_msg)",
E                       "lightningd/subd.c:556 (sd_msg_read)",
E                       "lightningd/subd.c:357 (read_fds)",
E                       "ccan/ccan/io/io.c:59 (next_plan)",
E                       "ccan/ccan/io/io.c:407 (do_plan)",
E                       "ccan/ccan/io/io.c:417 (io_ready)",
E                       "ccan/ccan/io/poll.c:453 (io_loop)",
E                       "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)",
E                       "lightningd/lightningd.c:1181 (main)",
E                       "../csu/libc-start.c:308 (__libc_start_main)"
E                   ],
E                   "label": "ccan/ccan/io/io.c:91:struct io_conn",
E                   "parents": [
E                       "lightningd/lightningd.c:107:struct lightningd"
E                   ],
E                   "value": "0x2b5a898"
E               }
E           ]
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
fcff21fae5 pytest: allow more time for test_waitblockheight !DEVELOPER.
It actually timed out with the default 60 seconds, just before it
saw the block:

```
2022-06-07T02:16:05.2557049Z         bitcoind.generate_block(1)
2022-06-07T02:16:05.2557300Z         sync_blockheight(bitcoind, [node])
2022-06-07T02:16:05.2557594Z         fut1.result(5)
2022-06-07T02:16:05.2557912Z         assert not fut2.done()
2022-06-07T02:16:05.2558121Z     
2022-06-07T02:16:05.2558370Z         # Trigger two blocks.
2022-06-07T02:16:05.2558689Z         bitcoind.generate_block(1)
2022-06-07T02:16:05.2558941Z         sync_blockheight(bitcoind, [node])
2022-06-07T02:16:05.2559219Z >       fut2.result(5)
2022-06-07T02:16:05.2559350Z 
2022-06-07T02:16:05.2559508Z tests/test_misc.py:2138: 
...
2022-06-07T02:16:05.2586947Z         elif "error" in resp:
2022-06-07T02:16:05.2587398Z >           raise RpcError(method, payload, resp['error'])
2022-06-07T02:16:05.2588026Z E           pyln.client.lightning.RpcError: RPC call failed: method: waitblockheight, payload: {'blockheight': 103}, error: {'code': 2000, 'message': 'Timed out.'}
2022-06-07T02:16:05.2588325Z 
2022-06-07T02:16:05.2588563Z contrib/pyln-client/pyln/client/lightning.py:387: RpcError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
b4820d6706 lightningd: don't run off end of buffer if db_hook returns nonsense.
It shouldn't return nonsense, but it did, and we segfaulted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
6221ac621d pytest: fix timeout in test_option_types
We were waiting for the start to timeout waiting for the "public key"
message.  Instead, start manually.

Before, this took (with TIMEOUT=30) 97 seconds, after 8 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
61d8eb5fa8 pyln-testing: increase default daemon.wait() timeout.
Valgrind in CI is slow:

```
    def test_bitcoin_failure(node_factory, bitcoind):
...
        # Ignore BROKEN log message about blocksonly mode.
        l2 = node_factory.get_node(start=False, expect_fail=True,
                                   allow_broken_log=True)
        l2.daemon.start(wait_for_initialized=False)
        # Will exit with failure code.
>       assert l2.daemon.wait() == 1

tests/test_misc.py:114: 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
d2952576cd pyln-testing: restore proper streaming behaviour for lightningd.
Get it to log direct to stdout, so we see what's happening *as it
happens* rather than as we read it.  We could restore the thread we
were using before, but that added more problems than it solved.

This means that we need the hsm password prompts in the log though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
a0e0dbf229 pyln-testing: use files for stdout and stderr, not threads.
Some flakes are caused by weird races in this code.  Plus, if we
get things to write straight to files, we might see things in
there on post-mortem which happen after the python runner exits.

It's a bit less efficient, but much simpler.  Let's see if it helps!

Some tests need a rework now, since we don't get a failure (except
eventual timeout), but they're simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
bcd050a610 pytest: fix test_ping_timeout flake.
We can hang up due to ping while waiting for channel to become active.
But we don't even need an active channel for this test, so simplify.

```
______________________________ test_ping_timeout _______________________________
[gw1] linux -- Python 3.7.13 /opt/hostedtoolcache/Python/3.7.13/x64/bin/python3

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f132cca8a90>

    @pytest.mark.developer("dev-disconnect required")
    def test_ping_timeout(node_factory):
        # Disconnects after this, but doesn't know it.
        l1_disconnects = ['xWIRE_PING']
    
        l1, l2 = node_factory.line_graph(2, opts=[{'dev-no-reconnect': None,
                                                   'disconnect': l1_disconnects},
>                                                 {}])

tests/test_connection.py:3826: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1493: in line_graph
    self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
contrib/pyln-testing/pyln/testing/utils.py:1470: in join_nodes
    nodes[i].wait_channel_active(scids[i])
contrib/pyln-testing/pyln/testing/utils.py:1003: in wait_channel_active
    wait_for(lambda: self.is_channel_active(chanid))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

success = <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>
timeout = 900

    def wait_for(success, timeout=TIMEOUT):
        start_time = time.time()
        interval = 0.25
        while not success():
            time_left = start_time + timeout - time.time()
            if time_left <= 0:
>               raise ValueError("Timeout while waiting for {}", success)
E               ValueError: ('Timeout while waiting for {}', <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>)
```
2022-06-27 17:21:35 +09:30
Rusty Russell
517828adb2 lightningd: don't print nasty message when onchaind fails partially-failed HTLC
1. We set an outgoing htlc's `failonion` when we get a commitment_signed.
2. We don't transfer it to the corresponding incoming HTLC until we send
   commitment_signed and receive revoke_and_ack (meaning, outgoing htlc is
   completely dead).
3. If between these steps we go onchain, onchaind (after 3 blocks) tells us
   to fail the HTLC.
4. hout->failonion is set, but hout->hin has not been failed yet.  We
   do a sanity check and print a nasty message, and fail it with
   WIRE_PERMANENT_CHANNEL_FAILURE instead of relaying the error.

So handle this case explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
353361a05c pytest: test to demonstrate BROKEN message on onchaind htlc close.
We assumed that if the outgoing htlc has failed, we should have always
failed the incoming one.  This is not true, as this testcase demonstrates:

```
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-onchaind-chan#2: Sending 1 missing htlc messages
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: onchain_failed_our_htlc
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 failonion = 0x55cb8bc045d8, failmsg = (nil), preimage = (nil)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 already complete, but ->in not resolved! failonion = 206816a391d3a7666ddd5213914cbb68f5da1fc4a0937e729de5a1990c94d26312caa5f2778e8da0c6bdefc68dd1a3bc28b5b5650fc0bdb3c2247ecca94ed0bbb224c8448c2c71eb1656a8740cadb301bd1ee1c1e774a8fef817352f502e4217b11e93aa6877b88b37afab0e4d4e49ed0385be9ab9a1ab1ac0e3460e41cfafb30ed896cea96e346041919a6c524ce56c3e5f27c7cd78a36b6df221e90a1c6e048c72b4146a5a51885fb70649037fe7ace77a016ae3ec8aee97960d0e5f0582713f671df79d8dee11b82708b6d882ee5adbb328db1938e824110f57ead1b27410bc6f775c7bb4ae40c1768d77a166c9bfda8f634ba0ac4f8a9fe199894dd3754c5ce41c9694544c805ffc177517661f11221dd8dffac60ce1c8c5bf54cda8e5ea44d8ec6b, failmsg = (null), preimage = (null)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: MISSING incoming fail for 0: failing incoming now
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: MISSED incoming fail for 0: failing now
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
11de721ba9 gossipd: fix gossmap race.
When upgrading a channel from private to public, we would delete the
private channel then add the public one.  However, this is visible in
the gossmap!  In particular, the node may be removed from the gossmap
and then re-added, so it may temporarily vanish!

I was seeing an occasional assertion inside node_factory.line_graph:

```
@pytest.mark.developer
    def test_reconnect_remote_sends_no_sigs(node_factory):
        """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect.
        """
>       l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True})

tests/test_connection.py:870: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1467: in line_graph
    self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
contrib/pyln-testing/pyln/testing/utils.py:1460: in join_nodes
    wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for
    while not success():
contrib/pyln-testing/pyln/testing/utils.py:1460: in <lambda>
    wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = []

    def only_one(arr):
        """Many JSON RPC calls return an array; often we only expect a single entry
        """
>       assert len(arr) == 1
E       AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
2b7915359c gossmap: handle case where private channel turns into public.
Usually we won't see this, since private is deleted.  But we could
have already read the private channel before that.  Handle it properly.

(Tested by removing the gossip_store deletion code and making sure
this worked).

We have to fix up the test, which announces a channel twice!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
c1a111b68c pytest: wait for stderr, rather than asserting.
Two almost identical assertion fails under CI.  Try waiting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
2f27aad8d0 pytest: fix flake in test_wallet.py::test_hsm_secret_encryption
Under valgrind, at least, this test fails regularly, and this sleep
fixes it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
b1f393f355 pytest: fix race in test_multichan
We can fail to use larger channel if it's not ready yet:

```
2022-05-23T01:20:05.5325600Z         # Check it used the larger channel!
2022-05-23T01:20:05.5326376Z >       assert before[chan23a_idx]['to_us_msat'] == after[chan23a_idx]['to_us_msat']
2022-05-23T01:20:05.5326961Z E       assert 1000000000msat == 900000000msat
2022-05-23T01:20:05.5327240Z 
2022-05-23T01:20:05.5327621Z tests/test_connection.py:3896: AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
6247bee11e pytest: try to fix bitcoind socket timeout in test_channel_lease_unilat_closes
Split harder!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

```
 # This can timeout, so do it in four easy stages.
        for i in range(4):
>           bitcoind.generate_block(4032 // 4)

tests/test_closing.py:971: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:475: in generate_block
    return self.rpc.generatetoaddress(numblocks, to_addr)
contrib/pyln-testing/pyln/testing/utils.py:372: in f
    res = proxy._call(name, *args)
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:233: in _call
    response = self._get_response()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:263: in _get_response
    http_response = self.__conn.getresponse()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:1373: in getresponse
    response.begin()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:319: in begin
    version, status, reason = self._read_status()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:280: in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <socket.SocketIO object at 0x7f1a4b3702d0>
b = <memory at 0x7f1a4ac51600>

    def readinto(self, b):
        """Read up to len(b) bytes into the writable buffer *b* and return
        the number of bytes read.  If the socket is non-blocking and no bytes
        are available, None is returned.
    
        If *b* is non-empty, a 0 return value indicates that the connection
        was shutdown at the other end.
        """
        self._checkClosed()
        self._checkReadable()
        if self._timeout_occurred:
            raise OSError("cannot read from timed out object")
        while True:
            try:
>               return self._sock.recv_into(b)
E               socket.timeout: timed out
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
eb25e08039 pytest: fix port allocation race when nodes restart.
We use ephemeral_port_reserve to grab ports, but this can fail when we
restart a node, since the port can be reallocated at that point.

Attempt to overcome this using a global reserved list (is there a
neater way?).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
a1b8b40d13 connectd: fix debug message on bind fail.
It doesn't get the right errno, and it says "create" not "bind".

```
2022-05-20T03:04:46.498Z DEBUG   connectd: Failed to create 2 socket: Success
2022-05-20T03:04:46.500Z DEBUG   connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds
2022-05-20T03:04:46.501Z DEBUG   connectd: connectd_init_done
2022-05-20T03:04:46.503Z **BROKEN** connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
71b1eaf2fe pyln-testing: try harder if cleaning directory fails.
This happens under CI, but it's not informative.  Sleep and retry.

Also, "except (OSError, Exception)" does not seem to do what you'd think:
this clause never gets run.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
d274de2bf4 pytest: fix flake in test_htlc_too_dusty_outgoing
We can sneak the HTLC add in before we do the update_fee which fails:

```
        # the channel should start warning -- too much dust
        inv = l2.rpc.invoice(htlc_val_msat, str(num_dusty_htlcs + 1), str(num_dusty_htlcs + 1))
        with pytest.raises(RpcError, match=r'WIRE_TEMPORARY_CHANNEL_FAILURE'):
>           l1.rpc.sendpay(route, inv['payment_hash'], payment_secret=inv['payment_secret'])
E           Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>

tests/test_pay.py:2654: Failed
```

From the logs:

```
lightningd-1: 2022-05-20T02:16:40.008Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 14 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2022-05-20T02:16:40.010Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 14 amount=10000000msat cltv=113 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2022-05-20T02:16:40.012Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2022-05-20T02:16:40.015Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2022-05-20T02:16:40.026Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_WARNING
lightningd-1: 2022-05-20T02:16:40.029Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: billboard perm: Too much dust to update fee (Desired feerate update 20000)
lightningd-1: 2022-05-20T02:16:40.035Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: Too much dust to update fee (Desired feerate update 20000)
lightningd-1: 2022-05-20T02:16:40.039Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: Will try reconnect in 60 seconds
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
930860c781 pytest: fix flakes in test_gossip_query_channel_range
Nodes now spray their own gossip at the start; this can race with
our query, and they can give us gossip we didn't ask for.

The fix is to always query an uninvolved node: in this case, we
only connected l3 and l4 to test zlib, which is gone, so now we can
remove that and simply always query l4.

```
2022-05-23T01:32:53.4695809Z         # reply_channel_range == 264
2022-05-23T01:32:53.4696416Z >       assert msgs == ['0108'
2022-05-23T01:32:53.4696835Z                         # blockhash
2022-05-23T01:32:53.4697328Z                         + genesis_blockhash
2022-05-23T01:32:53.4697841Z                         # first_blocknum, number_of_blocks, complete
2022-05-23T01:32:53.4698685Z                         + format(0, '08x') + format(65535, '08x') + '01'
2022-05-23T01:32:53.4699245Z                         # encoded_short_ids
2022-05-23T01:32:53.4699859Z                         + format(len(encoded) // 2, '04x')
2022-05-23T01:32:53.4700255Z                         + encoded]
2022-05-23T01:32:53.4700799Z E       AssertionError: assert ['0100194b419...700000010000'] == ['010806226e4...700000010000']
2022-05-23T01:32:53.4710678Z E         At index 0 diff: '0100194b4190af98beedf231901aeda2dcee0e757dd5327219aa007ea6027e769c98665f49e0f99e03b91e80896ed01cd9c4b157c77e908852097d2ba753c6ca01ae41a90a81c9f492c770d9dd35010c702c7c564045cf85a4bf6efaf25cceb9b81b56d1526429d3c60bf1c7ed6a835720730fd7ad1851a937d18aecf8bc3400dd4e0a30d5c4d65999ac39ae6eb3b98d27e3157973598256d2dd4718cd39010a3612580e7f3bba9eda0fb595976f955e345b6affa62fd556f7257a30d54dd87743dc38585e7c31e1ca80b42cf803bd4c1aaa3127351c95de8a4198c46eb0bd7b4ce175a4b51fcaf9676b01806d237758494fcf3d1558279060a0c3b84806a3330c41000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f0000700000010000035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de19903c47b1b2afcfd68c7b86c0976adb9a4f0835bc78242fced745d78433497a867d6021c29746e4136ddff456483df3980c2d0d5f31c93ef5ded564f7294a10d7414aa' != '010806226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000000000000ffff01001900000068000001000100006900000100000000700000010000'
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
aefcea44a3 pytest: fix flake in test_gossip_timestamp_filter
We didn't get the second channel_announcement in the filter:
```
        # Now choose range which will only give second update.
        msgs = l4.query_gossip('gossip_timestamp_filter',
                               genesis_blockhash,
                               after_12 - backdate,
                               after_23 - after_12 + 1,
                               filters=['0109', '0012'])
    
        # 0x0100 = channel_announcement
        # 0x0102 = channel_update
        # (Node announcement may have any timestamp)
        types = Counter([m[0:4] for m in msgs])
>       assert types['0100'] == 1
E       assert 0 == 1
```

Looking at the gossip store:

```
1089: t=1653008725 channel_announcement: 0100420c13f233d9b461b035257b77704e0003a0280efdbb8f8964bf4c5bd22b8f8b2e205a3c57d7d5f4dcd812cd41be3d8f84c60046c1fd31abefd0985c2263f8f5074bb7dff548ac106dc10ef4485f5921d9b252d094f9518b44a842c1a2c7f98759fa2f8803febeb59e50b65a8f258877e18f6224262e218de38f21438b2112ef0425d64174092bcd8d757f18982c3fb83c8ae2fb5ee88363248e51bdea247cef055a01a17580b60735b82b88fea87dccb724a743beaf4bf3cf578b0da445afcc786fe3b061429f3288e6970819236e97134898ce900b7b61159148d6785e26932e21f1c337cb6f86a16e967e27396b168a19b0d1478c4be2579a9b96aa0125aa000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e0000010000022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d02d595ae92b3544c3250fb772f214ad8d4c51425033740a5bcc357190add6d7e7a02d6063d022691b2490ab454dee73a57c6ff5d308352b461ece69f3c284f2c2412
1533: channel_amount: 100000sat
1555: t=1653008725 channel_update: 01022b98e7d680ffa8096d0a2bda62c0b8b3fc975a5f91c0cff0a42a76b37285276c0156581f85c0be2ee0ed7e7003cb67f5d476a7e3714960836b882bb3ad78097406226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e955010000060000000000000000000000010000000a0000000005e69ec0
1705: t=1653008729 node_announcement: 010175f8ed9f48db511d43f1da96aa307ccc53a521dcc2f8fb732c6938650e6cc00f672e772134a29e16b723a1e9cf626cf942323bd3fa294befd25430a9395c1fa40007800000080269a26286e959035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d035d2b484f5050494e47464952452d34646636366661000000000000000000000000000000
1866: t=1653008720 channel_update: 010266385083f1d6e554e1701f2456c24212c5aa937f7b090d32d243678b2fe78f7f52ed3626d6cdd52703bf78013bb9d1a621ae5b7d68fc965d29bb94574e340b5606226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e950010100060000000000000000000000010000000a00000
```

It's not clear what we times we were querying, but let's make it
clearer by putting a full 10 seconds between them, and making our
queries more exact.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
575b94c1ef pytest: Remove all trace of python's "flaky" module.
Over time, it has cost us more developer cycles than it has gained.
It has hidden intermittant bugs, and allowed cruft to accumulate:
when we eventually tried to figure out what was going wrong, the
actual change which caused it was now stale and forgotten.

This was a particular bane during the connectd rewrite, and I
worked through some issues which had occurred before, but were not
more likely.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell
56dde2cb77 lightningd: multiple log-file options allow more than one log output.
I've wanted this for a while: the ability to log to multiple places
at once.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: lightningd: `log-file` option specified multiple times opens multiple log files.
2022-06-27 17:21:35 +09:30
Christian Decker
92f10f2c34 pyln: Fix relative path dependencies when publishing to PyPI
So this was quite a journey:

 - We want relative depdendencies (using the `path` argument) whenever
   developing locally. Otherwise we would have to install each
   dependency every time we change a single character, which
   undoubtedly would cause us to waste time trying to debug an issue
   just because we forgot to install.
 - When publishing however we want to rely on the version number,
   since the repo context gets lost upon publishing, and path
   dependencies cause failures.

The solution then it seems is to use `dev-dependencies` (not that
surprising once you find it) with relative paths, so that `poetry
install` uses these over the normal dependencies (no idea how they
dedup them) and use `dependencies` when publishing. The paths are
still in there when publishing, but `pip install` ignores them.

I checked that `poetry install` from an unrelated project doesn't
accidentally use the path dependencies, even when adding them as
dev-dependencies. This should hopefully also allow installing them
as a repo link, though I can't test that right now.
2022-06-26 13:54:01 +09:30
Rusty Russell
a196c77fe6 common: downgrade LND 'internal error' properly.
Thanks to @zerofeerouting for another report.

"desc" here is the sanitized message, eg:
"ERROR error channel 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef: internal error"

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-26 13:52:14 +09:30
Rusty Russell
e48c0dda85 common: downgrade "internal error" errors from lnd.
Prior to 0.11.0 we had cases where we would treat errors
as warnings: regretfully, this is still needed.  This message
in particular has been widely reported, and it now causes
channel force closes.

Downgrade and log.  I did insert some snarky log message earlier,
but hey, I'm sure CLN has done worse things to our peers!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: treat LND "internal error" as warnings, not force close events (as we did in v0.10).
2022-06-26 13:52:14 +09:30
Christian Decker
5444a843b6 git: Ignore some more generated files and the cln-grpc plugin 2022-06-24 16:27:13 +02:00
Vincenzo Palazzo
d4bc4f6460 signmessage: improve the UX of the rpc command when zbase is not a valid one
Changelog-Fixed: signmessage: improve the UX of the rpc command when zbase is not a valid one

Stacktrace generated with a bad `zbase`

```
lightningd: lightningd/signmessage.c:59: from_zbase32: Assertion `len == tal_bytelen(u8arr)' failed
lightningd: FATAL SIGNAL 6 (version v0.11.1)
0x55b9b1b4e617 send_backtrace
[...]
```

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-24 14:02:04 +09:30
Dustin Dettmer
49c6459148 Update hsm error formatting.
Changelog-None
2022-06-24 13:57:17 +09:30
Rusty Russell
0b7897302e common/gossip_store: optimize case where entries are filtered out.
@whitslack complained of large CPU usage by connectd at startup;
I ran perf record on connectd on my machine (which sees a little spike, only)
and I see the cost of reading and discarding the entries:

```
-   95.52%     5.24%  lightning_conne  lightning_connectd  [.] gossip_store_next
   - 90.28% gossip_store_next
      + 40.27% tal_alloc_arr_
      + 22.78% tal_free
      + 11.74% crc32c
      + 9.32% fromwire_peektype
      + 4.10% __libc_pread64 (inlined)
        1.70% be32_to_cpu
```

Much of this is caused by the search for our own gossip: keeping this separately
would be even better, but this fix is minimal.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: reduce initial CPU load when connecting to peers.
2022-06-24 10:33:04 +09:30
Rusty Russell
37403e471c pytest: add (failing) test for db with old Tor v2 address.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-24 10:24:39 +09:30
Rusty Russell
a0c6447611 lightningd: remove overzealous assertion.
This is being hit: it's possible if connectd and lightningd get desynchronized,
and we'll handle this later when peer is activated.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-24 10:16:18 +09:30
Justin Moon
acc78397eb Remove native-tls dependency from cln-rpc 2022-06-23 15:45:06 +02:00
Jan Sarenik
bb53a17855 doc: Tor needs CookieAuthFile set
The message was

    [warn] CookieAuthFileGroupReadable is set, but will have no effect: you must specify an explicit CookieAuthFile to have it group-readable.

Changelog-None
2022-06-23 09:40:47 +02:00
Christian Decker
e6442d798e cln-plugin: Make the proxy-related configuration Option<>
These are only populated if a proxy was specified, see
lightningd/plugin.c:1855, so we were getting upset when we expected
them and they weren't set.
2022-06-22 12:07:55 +02:00
Justin Moon
5ec424bc86 cln-plugin: Configuration struct
Represents the "configuration" part of the "init" message during
plugin initialization.

Changelog-Added: cln_plugin: persist cln configuration from init msg
2022-06-22 12:07:55 +02:00
Justin Moon
318b6e803e cln-plugin: Add unittest for parsing "init" message 2022-06-22 12:07:55 +02:00
Justin Moon
14483901cd cln-plugin: Save "configuration" from "init" method 2022-06-22 12:07:55 +02:00
Christian Decker
9a880a0932 clnrs: Implement backwards compatible mode for Amount as bare u64
Since we want to transition to raw `u64` values we need to accept both
for some time.
2022-06-21 06:52:35 +09:30
Rusty Russell
55f94322e5 doc/PLUGINS: update to remove deprecated fields, formats.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-21 06:52:35 +09:30
Rusty Russell
60bd70be85 JSON: deprecate printing msat fields as strings.
This changes many fields: in non-deprecated mode, they're now raw integers.
This was always the intention, but the transition was never completed.

Suggested-By: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: "_msat" fields can be raw numbers, not "123msat" strings (please handle both!)
Changelog-Deprecated: JSON-RPC: "_msat" fields as "123msat" strings (will be only numbers)
2022-06-21 06:52:35 +09:30
Rusty Russell
5531de99de lease_rates: prepare for msats fields as raw numbers.
We would otherwise multiply them by 1000.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-21 06:52:35 +09:30
Rusty Russell
6461815dd9 plugins/funder: fix parameter parsing.
This code was buggy: handing "1000" as a parameter to
min_their_funding_msat, don't turn that into "1000sat"!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-21 06:52:35 +09:30