A bit tricky, since we get more than one message at a time. However,
this just means we go over quota for a bit, and the overrun gets
caught up when those messages are sent (we already do this for a
single message, so it's not much worse).
Note: this not only limits sending, it also limits the actual query
processing, which is nice.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
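To illustrate the quota idea (a standalone sketch with made-up names, not the actual connectd structures): the balance is allowed to go negative when a query produces several replies, and we simply don't start the next query until it recovers.
```
#include <stdbool.h>
#include <stddef.h>

/* Illustrative per-peer state: not the real connectd struct. */
struct peer_quota {
	long long tokens;	/* May go negative after a burst of replies. */
};

/* Charge the peer for bytes sent; we never refuse mid-burst, we just
 * let the balance go negative and stop processing new queries until
 * the refill brings it back above zero. */
static void charge_tokens(struct peer_quota *pq, size_t bytes)
{
	pq->tokens -= (long long)bytes;
}

/* Only start handling the next query once we're back in credit. */
static bool can_process_query(const struct peer_quota *pq)
{
	return pq->tokens > 0;
}
```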
This basically means moving the code that handles these queries from
gossipd to connectd.
This gives connectd finer control over ratelimiting them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more efficient in a few ways:
1. It's trivial to get to the end of the gossip_store: we don't have
to iterate.
2. It tends to be mmapped, so we don't have to call pread().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
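For illustration only, a plain-POSIX sketch of why the mmapped store is cheaper (not the real gossip_store code): the end of the store is just the file length, and reading a record is a memcpy rather than a pread() syscall.
```
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole store read-only; returns NULL on failure. */
static const char *map_store(const char *path, size_t *len)
{
	struct stat st;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) != 0) {
		if (fd >= 0)
			close(fd);
		return NULL;
	}
	*len = st.st_size;	/* The end of the store is just the file length. */
	map = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	return map == MAP_FAILED ? NULL : map;
}

/* Copy out a record: a memcpy from the mapping, no pread() syscall. */
static void read_record(const char *map, size_t off, void *dst, size_t n)
{
	memcpy(dst, map + off, n);
}
```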
We currently stream gossip as fast as we can, even if the peer starts
at timestamp 0. Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
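A minimal token-bucket sketch of the throttle described above; the 1MB/second and 500 byte/second figures come from this commit, but the struct and helper names are made up for illustration.
```
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define GOSSIP_RATE_NORMAL (1024 * 1024)	/* 1MB per second per peer */
#define GOSSIP_RATE_DEV    500			/* 500 bytes per second for testing */

/* Illustrative per-peer bucket, not the actual connectd structure. */
struct gossip_bucket {
	double tokens;		/* Bytes we may still send. */
	double rate;		/* Refill rate in bytes per second. */
	struct timespec last;	/* Last refill time. */
};

/* Refill based on elapsed time, capped at one second's worth. */
static void bucket_refill(struct gossip_bucket *b, struct timespec now)
{
	double elapsed = (now.tv_sec - b->last.tv_sec)
		+ (now.tv_nsec - b->last.tv_nsec) / 1e9;

	b->tokens += elapsed * b->rate;
	if (b->tokens > b->rate)
		b->tokens = b->rate;
	b->last = now;
}

/* Can we stream another gossip message of this size right now? */
static bool bucket_allow(struct gossip_bucket *b, size_t msglen,
			 struct timespec now)
{
	bucket_refill(b, now);
	if (b->tokens < (double)msglen)
		return false;
	b->tokens -= (double)msglen;
	return true;
}
```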
We were getting the following message in test_feerate_stress:
```
2024-07-08T02:15:45.5663941Z lightningd-2 2024-07-08T02:13:45.696Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Peer did not close, forcing close
```
I can reproduce it locally if I run the test enough, and finally found
the issue by printing the status of the fd when we time it out (using
routines from connectd.c).
The peer fd alternates between reading and writing. When we go to
discard it, we wake the write queue, so write_to_peer() gets called.
It won't shut down the socket if there are still subds attached, and
will wait again for a read.
The last subd exit has to also wake the write queue if we're draining,
so it can do the io_sock_shutdown. Otherwise, we hit the timeout,
causing the message above.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
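A sketch of the fix with made-up names (the real code lives in connectd's multiplex logic): when the last subd goes away while we're draining, kick the peer's write path so the shutdown can finally happen.
```
#include <stdbool.h>
#include <stddef.h>

/* Illustrative peer state, not the real connectd struct. */
struct peer_state {
	size_t num_subds;	/* Subdaemons still attached. */
	bool draining;		/* We're discarding this peer. */
};

/* Placeholder: in the real daemon this would wake the peer
 * connection's write plan so write_to_peer() runs again. */
static void wake_peer_writer(struct peer_state *peer)
{
	(void)peer;
}

static void subd_gone(struct peer_state *peer)
{
	peer->num_subds--;
	/* Previously nothing triggered the shutdown once the last subd
	 * left while draining, so we hit the "Peer did not close,
	 * forcing close" timeout. Now we kick the write path, which
	 * can finally shut the socket down. */
	if (peer->num_subds == 0 && peer->draining)
		wake_peer_writer(peer);
}
```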
Everyone understands gossip_queries now, but peers leave it unset to indicate
they have nothing useful to say.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a crude heuristic: if we were trying to contact them, it's a
"deliberate" connection, and should be preserved.
Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
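To illustrate the heuristic (hypothetical fields, not the actual connectd peer struct): when we run low on file descriptors, evict an incidental, channel-less inbound connection first.
```
#include <stdbool.h>
#include <stddef.h>

/* Illustrative peer record, not the actual connectd structure. */
struct peer_info {
	bool we_initiated;	/* We were trying to contact them: "deliberate". */
	bool has_channel;	/* Peers with channels are prioritized. */
};

/* Pick a victim to close when we run low on file descriptors:
 * prefer dropping incidental, channel-less inbound connections. */
static const struct peer_info *pick_victim(const struct peer_info *peers,
					   size_t n)
{
	const struct peer_info *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		if (peers[i].has_channel || peers[i].we_initiated)
			continue;
		victim = &peers[i];
		break;
	}
	return victim;	/* NULL means everyone looks worth keeping. */
}
```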
We weakened this progressively over time, and gossip v1.5 will make
spam impossible at the protocol level, so we can wait until then.
Removing this code simplifies things a great deal!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Protocol: we no longer ratelimit gossip messages by channel, making our code far simpler.
Make sure the plugin has gotten its message to connectd before sending!
```
    def test_even_sendcustommsg(node_factory):
        l1, l2 = node_factory.get_nodes(2, opts={'log-level': 'io',
                                                 'allow_warning': True})
        l1.connect(l2)
        # Even-numbered message
        msg = hex(43690)[2:] + ('ff' * 30) + 'bb'
        # l2 will hang up when it gets this.
        l1.rpc.sendcustommsg(l2.info['id'], msg)
        l2.daemon.wait_for_log(r'\[IN\] {}'.format(msg))
        l1.daemon.wait_for_log('Invalid unknown even msg')
        wait_for(lambda: l1.rpc.listpeers(l2.info['id'])['peers'] == [])
        # Now with a plugin which allows it
        l1.connect(l2)
        l2.rpc.plugin_start(os.path.join(os.getcwd(), "tests/plugins/allow_even_msgs.py"))
        l1.rpc.sendcustommsg(l2.info['id'], msg)
        l2.daemon.wait_for_log(r'\[IN\] {}'.format(msg))
>       l2.daemon.wait_for_log(r'allow_even_msgs.*Got message 43690')
tests/test_misc.py:3623:
...
>       raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E       TimeoutError: Unable to find "[re.compile('allow_even_msgs.*Got message 43690')]" in logs.
contrib/pyln-testing/pyln/testing/utils.py:327: TimeoutError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we get a WIRE_TX_ABORT and then another message, we send that other message to the same
subd (even though the tx abort causes it to shut down). This means we effectively
lose the next message and time out (see below from CI, reproduced locally).
So, have connectd ignore the subd after it forwards the WIRE_TX_ABORT. The next
message will, correctly, cause a fresh subdaemon to be spawned.
```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    @pytest.mark.openchannel('v2')
    def test_v2_rbf_multi(node_factory, bitcoind, chainparams):
        l1, l2 = node_factory.get_nodes(2,
                                        opts={'may_reconnect': True,
                                              'dev-no-reconnect': None,
                                              'allow_warning': True})
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        amount = 2**24
        chan_amount = 100000
        bitcoind.rpc.sendtoaddress(l1.rpc.newaddr()['bech32'], amount / 10**8 + 0.01)
        bitcoind.generate_block(1)
        # Wait for it to arrive.
        wait_for(lambda: len(l1.rpc.listfunds()['outputs']) > 0)
        res = l1.rpc.fundchannel(l2.info['id'], chan_amount)
        chan_id = res['channel_id']
        vins = bitcoind.rpc.decoderawtransaction(res['tx'])['vin']
        assert(only_one(vins))
        prev_utxos = ["{}:{}".format(vins[0]['txid'], vins[0]['vout'])]
        # Check that we're waiting for lockin
        l1.daemon.wait_for_log(' to DUALOPEND_AWAITING_LOCKIN')
        # Attempt to do abort, should fail since we've
        # already gotten an inflight
        with pytest.raises(RpcError):
            l1.rpc.openchannel_abort(chan_id)
        rate = int(find_next_feerate(l1, l2)[:-5])
        # We 4x the feerate to beat the min-relay fee
        next_feerate = '{}perkw'.format(rate * 4)
        # Initiate an RBF
        startweight = 42 + 172  # base weight, funding output
        initpsbt = l1.rpc.utxopsbt(chan_amount, next_feerate, startweight,
                                   prev_utxos, reservedok=True,
                                   min_witness_weight=110,
                                   excess_as_change=True)
        # Do the bump
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount,
                                       initpsbt['psbt'],
                                       funding_feerate=next_feerate)
        # Abort this open attempt! We will re-try
        aborted = l1.rpc.openchannel_abort(chan_id)
        assert not aborted['channel_canceled']
        # We no longer disconnect on aborts, because magic!
        assert only_one(l1.rpc.listpeers()['peers'])['connected']
        # Do the bump, again, same feerate
>       bump = l1.rpc.openchannel_bump(chan_id, chan_amount,
                                       initpsbt['psbt'],
                                       funding_feerate=next_feerate)
tests/test_opening.py:668:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
contrib/pyln-client/pyln/client/lightning.py:1206: in openchannel_bump
    return self.call("openchannel_bump", payload)
contrib/pyln-testing/pyln/testing/utils.py:718: in call
    res = LightningRpc.call(self, method, payload, cmdprefix, filter)
contrib/pyln-client/pyln/client/lightning.py:398: in call
    resp, buf = self._readobj(sock, buf)
contrib/pyln-client/pyln/client/lightning.py:315: in _readobj
    b = sock.recv(max(1024, len(buff)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <pyln.client.lightning.UnixSocket object at 0x7f34675aae80>
length = 1024
    def recv(self, length: int) -> bytes:
        if self.sock is None:
            raise socket.error("not connected")
>       return self.sock.recv(length)
E       Failed: Timeout >1200.0s
```
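A sketch of the fix with made-up names: after forwarding WIRE_TX_ABORT we treat the subd as gone for routing purposes, so the next peer message spawns a fresh subdaemon instead of being swallowed.
```
#include <stdbool.h>

/* Illustrative message-routing state, not the real multiplex.c types. */
struct subd_slot {
	bool present;		/* A subdaemon is attached for this channel. */
	bool shutting_down;	/* We forwarded WIRE_TX_ABORT; it will exit. */
};

/* Hypothetical routing decision for an incoming peer message. */
enum route { ROUTE_TO_SUBD, ROUTE_SPAWN_FRESH_SUBD };

static enum route route_peer_msg(struct subd_slot *slot, bool msg_is_tx_abort)
{
	if (slot->present && !slot->shutting_down) {
		if (msg_is_tx_abort)
			slot->shutting_down = true;	/* Forward it, then ignore the subd. */
		return ROUTE_TO_SUBD;
	}
	/* Previously the next message still went to the dying subd and was
	 * effectively lost; now it triggers a fresh subdaemon instead. */
	return ROUTE_SPAWN_FRESH_SUBD;
}
```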
Previously, we would forward the message to a subd, but now we have
the case where the subd is gone but we're still connected. If the
peer sends anything but a reestablish in that state, we drop the
connection. Instead, an error should always make us fail the channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still refuse to run dev commands if lightningd sends them to us
despite us not being in developer mode, but that's mainly paranoia.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Update the lightningd <-> channeld interface with lots of new commands needed to facilitate splicing.
Implement the channeld splicing protocol, leveraging the interactivetx protocol.
Implement lightningd's channel_control to support channeld in its splicing efforts.
Changelog-Added: Added the features to enable splicing & resizing of active channels.
This implements the proposal to simply treat the timestamp as "all",
"none" or "stream". There's also a rough spec draft, which I will post soon.
This *also* removes the last place where we would sometimes sweep the
entire gossip_store looking for their given timestamps.
We could also get rid of the actual timestamp filtering logic in
gossip_store_next if we want to, as it's now basically unused.
Changelog-Changed: Protocol: Simplify gossip_timestamp_filter handling to "all", "none" or "recent" instead of exact timestamp.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
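Roughly, the classification looks like this (a sketch only: the sentinel values are my reading of the proposal, and the enum/function names are made up):
```
#include <stdint.h>

/* Illustrative: the three ways we now treat a gossip_timestamp_filter. */
enum gossip_stream_mode {
	STREAM_ALL,	/* Send the whole store. */
	STREAM_NONE,	/* Send nothing. */
	STREAM_RECENT,	/* Send only recent gossip. */
};

/* Hypothetical classification: instead of honouring exact timestamps
 * (which meant sweeping the whole gossip_store), bucket the request. */
static enum gossip_stream_mode classify_filter(uint32_t first_timestamp)
{
	if (first_timestamp == 0)
		return STREAM_ALL;	/* From the beginning: give them everything. */
	if (first_timestamp == 0xFFFFFFFF)
		return STREAM_NONE;	/* Far future: they want nothing. */
	return STREAM_RECENT;		/* Anything else: just stream new gossip. */
}
```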
This removes the logic that sweeps the entire gossip_store as soon as
a peer connects. This should save connectd a significant number of
CPU cycles and make @whitslack finally stop hitting me.
Changelog-Changed: `connectd` no longer sweeps gossip_store file when peer connects, saving CPU for large nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of this is piping the flag through so we know it's a websocket!
Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This contains the zeroconf stuff, with funding_locked renamed to
channel_ready. I change that everywhere, and try to fix up the
comments.
Also the `alias` field is called `short_channel_id`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: `funding_locked` is now called `channel_ready` as per latest BOLTs.
Connectd already does this when we *receive* an error or warning, but
now do it on send. This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).
When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc. That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to avoid lost messages in the common cases.
This generalizes our drain code: we give each subd 5 seconds to close
itself, while continuing to let it send us traffic (if the peer is
still connected) and continuing to send it traffic.
We continue to send traffic *out* to the peer (if it's still
connected) until all subds are gone. We still have a 5 second timer
to close the connection to the peer.
We don't do this "drain period" on reconnects: we kill the old
connection immediately.
We fix up one test which was looking for the "disconnect" message
explicitly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
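A sketch of the drain bookkeeping with made-up names: a 5 second deadline, traffic forwarded both ways while subds remain and the peer is connected, then close.
```
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define DRAIN_SECONDS 5

/* Illustrative drain state, not the real connectd structures. */
struct drain_state {
	size_t num_subds;	/* Subds still closing themselves. */
	bool peer_connected;	/* Keep forwarding traffic while true. */
	time_t deadline;	/* When we give up and just close. */
};

static void start_drain(struct drain_state *d)
{
	d->deadline = time(NULL) + DRAIN_SECONDS;
}

/* May we still shuttle bytes between peer and subds? */
static bool drain_keep_forwarding(const struct drain_state *d)
{
	return d->peer_connected && d->num_subds > 0
		&& time(NULL) < d->deadline;
}

/* Once every subd is gone (or the deadline passes) we close the peer. */
static bool drain_should_close_peer(const struct drain_state *d)
{
	return d->num_subds == 0 || time(NULL) >= d->deadline;
}
```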
A subtle case I hadn't come across before: if a child tal_resizes()
its parent while the parent is being deleted, tal gets confused.
The subd destructor does this using tal_arr_remove() on peer->subds,
which is currently being freed:
```
==61056== Invalid read of size 8
==61056== at 0x185632: del_tree (tal.c:417)
==61056== by 0x18560D: del_tree (tal.c:412)
==61056== by 0x185957: tal_free (tal.c:486)
==61056== by 0x1183BC: peer_discard (connectd.c:1861)
==61056== by 0x11869E: recv_req (connectd.c:1942)
==61056== by 0x12774B: handle_read (daemon_conn.c:35)
==61056== by 0x173453: next_plan (io.c:59)
==61056== by 0x17405B: do_plan (io.c:407)
==61056== by 0x17409D: io_ready (io.c:417)
==61056== by 0x176390: io_loop (poll.c:453)
==61056== by 0x118A68: main (connectd.c:2082)
==61056== Address 0x4bd8850 is 16 bytes inside a block of size 48 free'd
==61056== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==61056== by 0x1860E6: tal_resize_ (tal.c:699)
==61056== by 0x1373DD: tal_arr_remove_ (utils.c:184)
==61056== by 0x11D508: destroy_subd (multiplex.c:930)
==61056== by 0x1850A4: notify (tal.c:240)
==61056== by 0x1855BB: del_tree (tal.c:402)
==61056== by 0x18560D: del_tree (tal.c:412)
==61056== by 0x18560D: del_tree (tal.c:412)
==61056== by 0x185957: tal_free (tal.c:486)
==61056== by 0x1183BC: peer_discard (connectd.c:1861)
==61056== by 0x11869E: recv_req (connectd.c:1942)
==61056== by 0x12774B: handle_read (daemon_conn.c:35)
```
So simply make the subds children of `peer`, not of the `peer->subds`
array. The only effect is that drain_peer() can't simply free the
subds array, but must free the subds one at a time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
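A sketch of what drain_peer() now has to do (assuming ccan/tal is available; the struct shapes are illustrative): free the subds one at a time, so each destructor can safely tal_arr_remove() itself from the array.
```
#include <ccan/tal/tal.h>

/* Illustrative shapes: the real ones live in connectd. */
struct subd;
struct peer {
	/* Each subd is now tal-parented on `peer` itself, not on this
	 * array, so freeing a subd no longer resizes an array that is
	 * itself in the middle of being deleted. */
	struct subd **subds;
};

static void drain_peer_subds(struct peer *peer)
{
	/* Free subds one at a time: each destructor may
	 * tal_arr_remove() itself from peer->subds, which is safe now
	 * that the array is not the thing being freed. */
	while (tal_count(peer->subds) != 0)
		tal_free(peer->subds[0]);
}
```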
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:
1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because the subd is already connected.
This way connectd can tell if lightningd is talking about the previous
connection, and ignore it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
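A sketch of the mechanism with made-up names: bump a counter per connection, have lightningd echo it back, and drop stale requests instead of asserting.
```
#include <stdbool.h>
#include <stdint.h>

/* Illustrative: bump this every time the peer (re)connects. */
struct peer_conn {
	uint64_t connectd_counter;
};

/* lightningd includes the counter it last saw in its request; if it
 * is stale, the request was about a connection that no longer exists,
 * so ignore it rather than hitting the "subd already connected"
 * assert. */
static bool request_is_current(const struct peer_conn *peer,
			       uint64_t counter_in_request)
{
	return counter_in_request == peer->connectd_counter;
}
```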
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active)
Now:
1. connectd says it's connected (peer_connected)
2. we start all the appropriate subds and tell connectd which channels/fds to use (peer_connect_subd).
3. if connectd says it sent something about a channel we didn't tell it about, we either tell
it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).
This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions. Many cleanups are
possible, since the new flow is so simple.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
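For illustration, the socketpair trick is just plain POSIX: one end goes to the freshly-started subd, the other is handed to connectd via the existing fd-passing functions.
```
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected pair: one end goes to the freshly-started subd,
 * the other is handed to connectd for this channel. Returns 0 or -1. */
static int make_subd_fds(int *subd_fd, int *connectd_fd)
{
	int fds[2];

	if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0)
		return -1;
	*subd_fd = fds[0];
	*connectd_fd = fds[1];
	return 0;
}
```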
Sending any pending messages to the peer before hanging up is a courtesy:
give it 5 seconds before simply closing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have separate peer-draining logic, we can simply use it
when connectd tells us to release the peer, without waiting. (We
could simply free the peer, but that's a bit rude, as messages could
get lost.)
This removes various complex flags and logic we had before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.