`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.
Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.
Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.
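A minimal sketch of the new lifetime rules (a sketch only: apart from the identifiers quoted above, the fields and signatures here are assumptions, not the actual connectd diff):
```c
/* Sketch only: fields and signatures beyond the names quoted above are
 * assumptions, not the real connectd code. */
static void peer_reconnected(struct io_conn *conn, struct daemon *daemon,
			     const struct node_id *id)
{
	/* Parent to the conn: if the connection closes before the callback
	 * ever runs, tal frees this for us, so no `reconnected` hashtable
	 * is needed and peer_reconnected() itself never frees it. */
	struct peer_reconnected *pr = tal(conn, struct peer_reconnected);

	pr->daemon = daemon;
	pr->id = *id;
	/* ...arrange for retry_peer_connected(pr) to run once the old
	 * peer has finished cleaning up... */
}

static void retry_peer_connected(struct peer_reconnected *pr)
{
	/* Reparent to the temporary context: tmpctx is not cleared until
	 * after this callback returns, so the instance outlives our use
	 * of it and is freed exactly once. */
	tal_steal(tmpctx, pr);
	/* ...retry the connection for pr->id... */
}
```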
See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
The crash below from @zerofeerouting left me confused. The invalid
value in fmt_wireaddr_internal is a telltale sign of use-after-free.
This backtrace shows us destroying the conn *twice*: what's happening?
Well, tal carefully protects against destroying twice: it's not that
unusual to free something in a destructor which has already been freed.
So this indicates that there are *two* io_conn hanging off one
struct connecting, which isn't supposed to happen! We deliberately
call try_connect_one_addr() initially, then inside the io_conn destructor.
But due to races in connectd vs lightningd connection state, we added
a fix which allows a connect command to sit around while the peer is
cleaning up (6cc9f37cab) and get fired
off when it's done.
But what if, in the chaos, we are already connecting again? Now we'll
end up with *two* connections.
Fortunately, we have a `conn` pointer inside struct connecting, which
(with a bit of additional care) we can ensure is only non-NULL while
we're actually trying to connect. This lets us check that before
firing off a new connection attempt in peer_conn_closed.
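A sketch of the resulting guard in `peer_conn_closed` (assumed shape; the exact code may differ):
```c
/* Sketch only: `conn` is kept NULL except while an attempt is actually
 * in flight, so a queued connect command cannot spawn a second io_conn
 * on the same struct connecting. */
struct connecting *connect = find_connecting(daemon, id);

if (connect && !connect->conn)
	try_connect_one_addr(connect);
```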
```
lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939)
0x5614a4915ae8 send_backtrace
common/daemon.c:33
0x5614a4915b72 crashdump
common/daemon.c:46
0x7ffa14fcd72f ???
???:0
0x7ffa14dc87bb ???
???:0
0x7ffa14db3534 ???
???:0
0x5614a491fc71 fmt_wireaddr_internal
common/wireaddr.c:255
0x5614a491fc7a fmt_wireaddr_internal_
common/wireaddr.c:257
0x5614a491ea6b type_to_string_
common/type_to_string.c:32
0x5614a490beaa destroy_io_conn
connectd/connectd.c:754
0x5614a494a2f1 destroy_conn
ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
ccan/ccan/io/poll.c:252
0x5614a4953804 notify
ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
ccan/ccan/tal/tal.c:402
0x5614a4953928 del_tree
ccan/ccan/tal/tal.c:412
0x5614a4953e07 tal_free
ccan/ccan/tal/tal.c:486
0x5614a4908b7a try_connect_one_addr
connectd/connectd.c:870
0x5614a490bef1 destroy_io_conn
connectd/connectd.c:759
0x5614a494a2f1 destroy_conn
ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
ccan/ccan/io/poll.c:252
0x5614a4953804 notify
ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
ccan/ccan/tal/tal.c:402
0x5614a4953e07 tal_free
ccan/ccan/tal/tal.c:486
0x5614a4948f08 io_close
ccan/ccan/io/io.c:450
0x5614a4948f59 do_plan
ccan/ccan/io/io.c:401
0x5614a4948fe1 io_ready
ccan/ccan/io/io.c:417
0x5614a494a8e6 io_loop
ccan/ccan/io/poll.c:453
0x5614a490c12f main
connectd/connectd.c:2164
0x7ffa14db509a ???
???:0
0x5614a4904e99 ???
???:0
0xffffffffffffffff ???
???:0
```
Fixes: #5339
Changelog-Fixed: connectd: occasional crash when we reconnect to a peer quickly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Complete implementation of the BOLT1 port derivation proposal: https://github.com/lightning/bolts/pull/968
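In effect, an omitted port now falls back to the network's standard Lightning port instead of always assuming 9735. A sketch of the fallback, under the assumption that the per-network default is carried in chainparams (the `ln_port` field name is my guess, not necessarily the real one):
```c
/* Sketch only: `ln_port` is an assumed field holding the network's
 * BOLT1-derived default port (9735 on mainnet). */
static uint16_t connect_port(const struct chainparams *chainparams,
			     uint16_t given_port)
{
	return given_port ? given_port : chainparams->ln_port;
}
```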
Changelog-Added: rpc: use the standard port derivation in connect command when the port is not specified.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
There's a 1 in 256 chance that our signature on the transaction is 70,
not 71, bytes long. This changes the feerate. So fix up the weight in
this case to be the expected weight.
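Concretely (a standalone worked example, not project code; the 700/701 weights are my inference from the assertion in the failing test below): at 10000perkw each weight unit costs 10 sat, and a 70-byte signature is one witness byte, hence one weight unit, lighter than the 71-byte signature assumed when the fee was chosen.
```c
#include <assert.h>
#include <stdint.h>

/* Fee for a given weight at a feerate in sat per 1000 weight units,
 * rounded down -- the same arithmetic as the test's expected_fee. */
static uint64_t fee_for_weight(uint64_t feerate_perkw, uint64_t weight)
{
	return feerate_perkw * weight / 1000;
}

int main(void)
{
	/* Fee chosen assuming a 71-byte signature (701 wu inferred). */
	assert(fee_for_weight(10000, 701) == 7010);
	/* The 1-in-256 case: a 70-byte signature makes the mined tx one
	 * weight unit lighter, so recomputing the fee from the mempool
	 * weight gives 7000 sat and the 7000 == 7010 assertion fails. */
	assert(fee_for_weight(10000, 700) == 7000);
	return 0;
}
```
The failing test: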
```
@unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different")
@pytest.mark.developer("uses dev-fail")
@pytest.mark.openchannel('v1')  # v2 the weight calculation is off by 3
def test_multifunding_feerates(node_factory, bitcoind):
    '''
    Test feerate parameters for multifundchannel
    '''
    funding_tx_feerate = '10000perkw'
    commitment_tx_feerate_int = 2000
    commitment_tx_feerate = str(commitment_tx_feerate_int) + 'perkw'
    l1, l2, l3 = node_factory.get_nodes(3, opts={'log-level': 'debug'})
    l1.fundwallet(1 << 26)

    def _connect_str(node):
        return '{}@localhost:{}'.format(node.info['id'], node.port)

    destinations = [{"id": _connect_str(l2), 'amount': 50000}]
    res = l1.rpc.multifundchannel(destinations, feerate=funding_tx_feerate,
                                  commitment_feerate=commitment_tx_feerate)
    entry = bitcoind.rpc.getmempoolentry(res['txid'])
    weight = entry['weight']
    expected_fee = int(funding_tx_feerate[:-5]) * weight // 1000
>   assert expected_fee == entry['fees']['base'] * 10 ** 8
E   AssertionError: assert 7000 == (Decimal('0.00007010') * (10 ** 8))

tests/test_connection.py:1982: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Again, our new behaviour of sending our own gossip even before
they ask can confuse our gossip query tests.
In this case, simply eliminate duplicates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Again, our new behaviour of sending our own gossip even before
they ask can confuse our gossip query tests.
Create a new node, attach it, and perform queries on it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even though we generally wait until a node has seen the gossip,
that doesn't mean that connectd has processed it! This means when
we connect it may still send us "old" gossip.
So we set the OPT_GOSSIP_QUERIES bit, which means it won't send until we
ask. But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs
to filter that out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't. This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.
Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.
```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509== at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509== by 0x22D531: tal_bytelen (tal.c:637)
==895509== by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509== by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509== by 0x1497EC: main (lightningd.c:1090)
==895509==
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I saw another "only_one()" fail on alias checking: it's not
entirely clear to me that, with the more aggressive sending of
our own gossip, we necessarily process it in order, so we might
not have actually seen all channels just because we saw
the farthest one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure bitcoind gets tx before mining blocks!
```
    # l1<->l2 mutual close should work
    chan = l1.get_channel_scid(l2)
    l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
    l1.rpc.close(chan)
    l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE')
    bitcoind.generate_block(2)
    sync_blockheight(bitcoind, [l1, l2])
>   l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE')
tests/test_closing.py:851:
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
>       raise ValueError(str(errors))
E ValueError:
E Node errors:
E Global errors:
E - Node /tmp/ltests-x5sfpiwp/test_openchannel_hook_chaining_1/lightning-2/ has memory leaks: [
E {
E "backtrace": [
E "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
E "ccan/ccan/io/io.c:91 (io_new_conn_)",
E "lightningd/subd.c:773 (new_subd)",
E "lightningd/subd.c:827 (new_channel_subd_)",
E "lightningd/opening_control.c:870 (peer_start_openingd)",
E "lightningd/peer_control.c:1307 (peer_active)",
E "lightningd/connect_control.c:457 (connectd_msg)",
E "lightningd/subd.c:556 (sd_msg_read)",
E "lightningd/subd.c:357 (read_fds)",
E "ccan/ccan/io/io.c:59 (next_plan)",
E "ccan/ccan/io/io.c:407 (do_plan)",
E "ccan/ccan/io/io.c:417 (io_ready)",
E "ccan/ccan/io/poll.c:453 (io_loop)",
E "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)",
E "lightningd/lightningd.c:1181 (main)",
E "../csu/libc-start.c:308 (__libc_start_main)"
E ],
E "label": "ccan/ccan/io/io.c:91:struct io_conn",
E "parents": [
E "lightningd/lightningd.c:107:struct lightningd"
E ],
E "value": "0x2b5a898"
E }
E ]
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were waiting for the start to time out while waiting for the "public key"
message. Instead, start manually.
Before, this took (with TIMEOUT=30) 97 seconds; after, it takes 8 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Get it to log directly to stdout, so we see what's happening *as it
happens* rather than as we read it. We could restore the thread we
were using before, but that added more problems than it solved.
This means that we need the hsm password prompts in the log though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some flakes are caused by weird races in this code. Plus, if we
get things to write straight to files, we might see things in
there on post-mortem which happen after the python runner exits.
It's a bit less efficient, but much simpler. Let's see if it helps!
Some tests need a rework now, since we don't get a failure (except
eventual timeout), but they're simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We set an outgoing htlc's `failonion` when we get a commitment_signed.
2. We don't transfer it to the corresponding incoming HTLC until we send
commitment_signed and receive revoke_and_ack (meaning, outgoing htlc is
completely dead).
3. If between these steps we go onchain, onchaind (after 3 blocks) tells us
to fail the HTLC.
4. hout->failonion is set, but hout->hin has not been failed yet. We
do a sanity check and print a nasty message, and fail it with
WIRE_PERMANENT_CHANNEL_FAILURE instead of relaying the error.
So handle this case explicitly.
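A sketch of what "handle this case explicitly" might look like (the field and helper names follow the description above and are assumptions, not the actual lightningd functions):
```c
/* Sketch only: when onchaind asks us to fail an outgoing HTLC whose
 * failonion we already learned from the peer, but whose incoming side
 * has not been failed yet, relay that onion instead of inventing
 * WIRE_PERMANENT_CHANNEL_FAILURE. */
if (hout->failonion && hout->hin && !htlc_in_has_failed(hout->hin)) {
	fail_in_htlc(hout->hin, hout->failonion);
	return;
}
```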
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assumed that if the outgoing htlc has failed, we should have always
failed the incoming one. This is not true, as this testcase demonstrates:
```
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-onchaind-chan#2: Sending 1 missing htlc messages
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: onchain_failed_our_htlc
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 failonion = 0x55cb8bc045d8, failmsg = (nil), preimage = (nil)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 already complete, but ->in not resolved! failonion = 206816a391d3a7666ddd5213914cbb68f5da1fc4a0937e729de5a1990c94d26312caa5f2778e8da0c6bdefc68dd1a3bc28b5b5650fc0bdb3c2247ecca94ed0bbb224c8448c2c71eb1656a8740cadb301bd1ee1c1e774a8fef817352f502e4217b11e93aa6877b88b37afab0e4d4e49ed0385be9ab9a1ab1ac0e3460e41cfafb30ed896cea96e346041919a6c524ce56c3e5f27c7cd78a36b6df221e90a1c6e048c72b4146a5a51885fb70649037fe7ace77a016ae3ec8aee97960d0e5f0582713f671df79d8dee11b82708b6d882ee5adbb328db1938e824110f57ead1b27410bc6f775c7bb4ae40c1768d77a166c9bfda8f634ba0ac4f8a9fe199894dd3754c5ce41c9694544c805ffc177517661f11221dd8dffac60ce1c8c5bf54cda8e5ea44d8ec6b, failmsg = (null), preimage = (null)
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: MISSING incoming fail for 0: failing incoming now
lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: MISSED incoming fail for 0: failing now
lightningd-2: 2022-05-25T04:38:31.449Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Usually we won't see this, since the private one is deleted. But we could
have already read the private channel before that. Handle it properly.
(Tested by removing the gossip_store deletion code and making sure
this worked).
We have to fix up the test, which announces a channel twice!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can fail to use the larger channel if it's not ready yet:
```
2022-05-23T01:20:05.5325600Z # Check it used the larger channel!
2022-05-23T01:20:05.5326376Z > assert before[chan23a_idx]['to_us_msat'] == after[chan23a_idx]['to_us_msat']
2022-05-23T01:20:05.5326961Z E assert 1000000000msat == 900000000msat
2022-05-23T01:20:05.5327240Z
2022-05-23T01:20:05.5327621Z tests/test_connection.py:3896: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Split harder!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
    # This can timeout, so do it in four easy stages.
    for i in range(4):
>       bitcoind.generate_block(4032 // 4)
tests/test_closing.py:971:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
contrib/pyln-testing/pyln/testing/utils.py:475: in generate_block
    return self.rpc.generatetoaddress(numblocks, to_addr)
contrib/pyln-testing/pyln/testing/utils.py:372: in f
    res = proxy._call(name, *args)
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:233: in _call
    response = self._get_response()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:263: in _get_response
    http_response = self.__conn.getresponse()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:1373: in getresponse
    response.begin()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:319: in begin
    version, status, reason = self._read_status()
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:280: in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <socket.SocketIO object at 0x7f1a4b3702d0>
b = <memory at 0x7f1a4ac51600>
    def readinto(self, b):
        """Read up to len(b) bytes into the writable buffer *b* and return
        the number of bytes read. If the socket is non-blocking and no bytes
        are available, None is returned.
        If *b* is non-empty, a 0 return value indicates that the connection
        was shutdown at the other end.
        """
        self._checkClosed()
        self._checkReadable()
        if self._timeout_occurred:
            raise OSError("cannot read from timed out object")
        while True:
            try:
>               return self._sock.recv_into(b)
E               socket.timeout: timed out
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use ephemeral_port_reserve to grab ports, but this can fail when we
restart a node, since the port can be reallocated at that point.
Attempt to overcome this using a global reserved list (is there a
neater way?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It doesn't get the right errno, and it says "create" not "bind".
```
2022-05-20T03:04:46.498Z DEBUG connectd: Failed to create 2 socket: Success
2022-05-20T03:04:46.500Z DEBUG connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds
2022-05-20T03:04:46.501Z DEBUG connectd: connectd_init_done
2022-05-20T03:04:46.503Z **BROKEN** connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happens under CI, but it's not informative. Sleep and retry.
Also, "except (OSError, Exception)" does not seem to do what you'd think:
this clause never gets run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can sneak the HTLC add in before we do the update_fee which fails:
```
    # the channel should start warning -- too much dust
    inv = l2.rpc.invoice(htlc_val_msat, str(num_dusty_htlcs + 1), str(num_dusty_htlcs + 1))
    with pytest.raises(RpcError, match=r'WIRE_TEMPORARY_CHANNEL_FAILURE'):
>       l1.rpc.sendpay(route, inv['payment_hash'], payment_secret=inv['payment_secret'])
E       Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
tests/test_pay.py:2654: Failed
```
From the logs:
```
lightningd-1: 2022-05-20T02:16:40.008Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 14 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2022-05-20T02:16:40.010Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 14 amount=10000000msat cltv=113 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2022-05-20T02:16:40.012Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2022-05-20T02:16:40.015Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2022-05-20T02:16:40.026Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_WARNING
lightningd-1: 2022-05-20T02:16:40.029Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: billboard perm: Too much dust to update fee (Desired feerate update 20000)
lightningd-1: 2022-05-20T02:16:40.035Z INFO 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: Too much dust to update fee (Desired feerate update 20000)
lightningd-1: 2022-05-20T02:16:40.039Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: Will try reconnect in 60 seconds
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Nodes now spray their own gossip at the start; this can race with
our query, and they can give us gossip we didn't ask for.
The fix is to always query an uninvolved node: in this case, we
only connected l3 and l4 to test zlib, which is gone, so now we can
remove that and simply always query l4.
```
2022-05-23T01:32:53.4695809Z # reply_channel_range == 264
2022-05-23T01:32:53.4696416Z > assert msgs == ['0108'
2022-05-23T01:32:53.4696835Z # blockhash
2022-05-23T01:32:53.4697328Z + genesis_blockhash
2022-05-23T01:32:53.4697841Z # first_blocknum, number_of_blocks, complete
2022-05-23T01:32:53.4698685Z + format(0, '08x') + format(65535, '08x') + '01'
2022-05-23T01:32:53.4699245Z # encoded_short_ids
2022-05-23T01:32:53.4699859Z + format(len(encoded) // 2, '04x')
2022-05-23T01:32:53.4700255Z + encoded]
2022-05-23T01:32:53.4700799Z E AssertionError: assert ['0100194b419...700000010000'] == ['010806226e4...700000010000']
2022-05-23T01:32:53.4710678Z E At index 0 diff: '0100194b4190af98beedf231901aeda2dcee0e757dd5327219aa007ea6027e769c98665f49e0f99e03b91e80896ed01cd9c4b157c77e908852097d2ba753c6ca01ae41a90a81c9f492c770d9dd35010c702c7c564045cf85a4bf6efaf25cceb9b81b56d1526429d3c60bf1c7ed6a835720730fd7ad1851a937d18aecf8bc3400dd4e0a30d5c4d65999ac39ae6eb3b98d27e3157973598256d2dd4718cd39010a3612580e7f3bba9eda0fb595976f955e345b6affa62fd556f7257a30d54dd87743dc38585e7c31e1ca80b42cf803bd4c1aaa3127351c95de8a4198c46eb0bd7b4ce175a4b51fcaf9676b01806d237758494fcf3d1558279060a0c3b84806a3330c41000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f0000700000010000035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de19903c47b1b2afcfd68c7b86c0976adb9a4f0835bc78242fced745d78433497a867d6021c29746e4136ddff456483df3980c2d0d5f31c93ef5ded564f7294a10d7414aa' != '010806226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000000000000ffff01001900000068000001000100006900000100000000700000010000'
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We didn't get the second channel_announcement in the filter:
```
    # Now choose range which will only give second update.
    msgs = l4.query_gossip('gossip_timestamp_filter',
                           genesis_blockhash,
                           after_12 - backdate,
                           after_23 - after_12 + 1,
                           filters=['0109', '0012'])
    # 0x0100 = channel_announcement
    # 0x0102 = channel_update
    # (Node announcement may have any timestamp)
    types = Counter([m[0:4] for m in msgs])
>   assert types['0100'] == 1
E   assert 0 == 1
```
Looking at the gossip store:
```
1089: t=1653008725 channel_announcement: 0100420c13f233d9b461b035257b77704e0003a0280efdbb8f8964bf4c5bd22b8f8b2e205a3c57d7d5f4dcd812cd41be3d8f84c60046c1fd31abefd0985c2263f8f5074bb7dff548ac106dc10ef4485f5921d9b252d094f9518b44a842c1a2c7f98759fa2f8803febeb59e50b65a8f258877e18f6224262e218de38f21438b2112ef0425d64174092bcd8d757f18982c3fb83c8ae2fb5ee88363248e51bdea247cef055a01a17580b60735b82b88fea87dccb724a743beaf4bf3cf578b0da445afcc786fe3b061429f3288e6970819236e97134898ce900b7b61159148d6785e26932e21f1c337cb6f86a16e967e27396b168a19b0d1478c4be2579a9b96aa0125aa000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e0000010000022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d02d595ae92b3544c3250fb772f214ad8d4c51425033740a5bcc357190add6d7e7a02d6063d022691b2490ab454dee73a57c6ff5d308352b461ece69f3c284f2c2412
1533: channel_amount: 100000sat
1555: t=1653008725 channel_update: 01022b98e7d680ffa8096d0a2bda62c0b8b3fc975a5f91c0cff0a42a76b37285276c0156581f85c0be2ee0ed7e7003cb67f5d476a7e3714960836b882bb3ad78097406226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e955010000060000000000000000000000010000000a0000000005e69ec0
1705: t=1653008729 node_announcement: 010175f8ed9f48db511d43f1da96aa307ccc53a521dcc2f8fb732c6938650e6cc00f672e772134a29e16b723a1e9cf626cf942323bd3fa294befd25430a9395c1fa40007800000080269a26286e959035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d035d2b484f5050494e47464952452d34646636366661000000000000000000000000000000
1866: t=1653008720 channel_update: 010266385083f1d6e554e1701f2456c24212c5aa937f7b090d32d243678b2fe78f7f52ed3626d6cdd52703bf78013bb9d1a621ae5b7d68fc965d29bb94574e340b5606226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e950010100060000000000000000000000010000000a00000
```
It's not clear what times we were querying, but let's make it
clearer by putting a full 10 seconds between them, and making our
queries more exact.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Over time, it has cost us more developer cycles than it has gained.
It has hidden intermittent bugs, and allowed cruft to accumulate:
when we eventually tried to figure out what was going wrong, the
actual change which caused it was now stale and forgotten.
This was a particular bane during the connectd rewrite, and I
worked through some issues which had occurred before, but were not
more likely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I've wanted this for a while: the ability to log to multiple places
at once.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: lightningd: `log-file` option specified multiple times opens multiple log files.
So this was quite a journey:
- We want relative dependencies (using the `path` argument) whenever
developing locally. Otherwise we would have to install each
dependency every time we change a single character, which
undoubtedly would cause us to waste time trying to debug an issue
just because we forgot to install.
- When publishing however we want to rely on the version number,
since the repo context gets lost upon publishing, and path
dependencies cause failures.
The solution, it seems, is to use `dev-dependencies` (not that
surprising once you find it) with relative paths, so that `poetry
install` uses these over the normal dependencies (no idea how they
dedup them) and to use `dependencies` when publishing. The paths are
still in there when publishing, but `pip install` ignores them.
I checked that `poetry install` from an unrelated project doesn't
accidentally use the path dependencies, even when adding them as
dev-dependencies. This should hopefully also allow installing them
as a repo link, though I can't test that right now.
Thanks to @zerofeerouting for another report.
"desc" here is the sanitized message, eg:
"ERROR error channel 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef: internal error"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to 0.11.0 we had cases where we would treat errors
as warnings: regretfully, this is still needed. This message
in particular has been widely reported, and it now causes
channel force closes.
Downgrade and log. I did insert some snarky log message earlier,
but hey, I'm sure CLN has done worse things to our peers!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: treat LND "internal error" as warnings, not force close events (as we did in v0.10).
Changelog-Fixed: signmessage: improve the UX of the rpc command when the provided zbase is not valid.
Stacktrace generated with a bad `zbase`
```
lightningd: lightningd/signmessage.c:59: from_zbase32: Assertion `len == tal_bytelen(u8arr)' failed
lightningd: FATAL SIGNAL 6 (version v0.11.1)
0x55b9b1b4e617 send_backtrace
[...]
```
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
@whitslack complained of large CPU usage by connectd at startup;
I ran perf record on connectd on my machine (which sees only a little spike)
and I see the cost of reading and discarding the entries:
```
- 95.52% 5.24% lightning_conne lightning_connectd [.] gossip_store_next
- 90.28% gossip_store_next
+ 40.27% tal_alloc_arr_
+ 22.78% tal_free
+ 11.74% crc32c
+ 9.32% fromwire_peektype
+ 4.10% __libc_pread64 (inlined)
1.70% be32_to_cpu
```
Much of this is caused by the search for our own gossip: keeping this separately
would be even better, but this fix is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: reduce initial CPU load when connecting to peers.
This is being hit: it's possible if connectd and lightningd get desynchronized,
and we'll handle this later when the peer is activated.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>