Commit Graph

11623 Commits

Author SHA1 Message Date
Rusty Russell
a08728497b lightningd: reintroduce "slow connect" logic.
Just keep a flag on the peer, and delay connection longer if that is set.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
02e169fd27 lightningd: drive all reconnections out of disconnections.
The only places which should call try_reconnect now are the "connect"
command, and the disconnect path when it decides there's still an
active channel.

This introduces one subtlety: if we disconnect when there's no active
channel, but then the subd makes one, we have to catch that case!

This temporarily reverts "slow" reconnections to fast ones: see next
patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
a3c4908f4a lightningd: don't explicitly tell connectd to disconnect, have it do it on sending error/warning.
Connectd already does this when we *receive* an error or warning, but
now do it on send.  This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).

When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc.  That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
2962b93199 pytest: don't assume disconnect finished atomically, and suppress interfering redirects.
In various places, we assumed that when `connected` is false,
everything is finished.  This is not true: we should wait for the
state we expect.

In addition, various places allows reconnections, which interfered
with the logic; suppress them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
aec307f7ba multifundchannel: fix race where we restart fundchannel.
Disconnecting a peer after openingd fails is not instantaneous:
we abort the open, so openingd sends out a WIRE_ERROR which makes
connectd close the connection.

As a result this test fails often.  The simplest fix is to wait for a
second in multifundchannel before retrying, which is also robust
against behaviour changes if we decide *not* to disconnect in future.

Also make sure that addrhint ownership is correct, since this can
lead to a use-after-free if we filter dests.

```
tests/test_connection.py::test_multifunding_best_effort FAILED                                                    [100%]

======================================================= FAILURES ========================================================
_____________________________________________ test_multifunding_best_effort _____________________________________________

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f6c0c95c1c0>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f6c0c92a880>

    @pytest.mark.openchannel('v1')
    @pytest.mark.developer("disconnect=... needs DEVELOPER=1")
    def test_multifunding_best_effort(node_factory, bitcoind):
        '''
        Check that best_effort flag works.
        '''
        disconnects = ["-WIRE_INIT",
                       "-WIRE_ACCEPT_CHANNEL",
                       "-WIRE_FUNDING_SIGNED"]
        l1 = node_factory.get_node()
        l2 = node_factory.get_node()
        l3 = node_factory.get_node(disconnect=disconnects)
        l4 = node_factory.get_node()
    
        l1.fundwallet(2000000)
    
        destinations = [{"id": '{}@localhost:{}'.format(l2.info['id'], l2.port),
                         "amount": 50000},
                        {"id": '{}@localhost:{}'.format(l3.info['id'], l3.port),
                         "amount": 50000},
                        {"id": '{}@localhost:{}'.format(l4.info['id'], l4.port),
                         "amount": 50000}]
    
        for i, d in enumerate(disconnects):
            # Should succeed due to best-effort flag.
>           l1.rpc.multifundchannel(destinations, minchannels=2)

tests/test_connection.py:2070: 
...
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: multifundchannel, payload: {'destinations': [{'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59@localhost:41023', 'amount': 50000}, {'id': '035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d@localhost:41977', 'amount': 50000}, {'id': '0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199@localhost:34943', 'amount': 50000}], 'minchannels': 2}, error: {'code': 305, 'message': 'Peer not connected at start', 'data': {'id': '0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199', 'method': 'fundchannel_start'}}
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
e59e12dcb6 lightningd: don't forget peer if it's still connected.
In particular, when onchaind finishes with a channel, and we delete it, we
would forget about the peer, even if it's still connected.  That leads to a
surprise if we are activated because of something it sends:

```
2022-07-16T09:07:51.8668176Z lightningd-1 2022-07-16T08:54:32.497Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in AWAITING_UNILATERAL: Disconnected
2022-07-16T09:07:51.8668717Z lightningd-1 2022-07-16T08:54:32.497Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: Will try reconnect in 1 seconds
2022-07-16T09:07:51.8669323Z lightningd-1 2022-07-16T08:54:32.497Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer has reconnected, state AWAITING_UNILATERAL: connecting subd
2022-07-16T09:07:51.8671225Z lightningd-1 2022-07-16T08:54:32.497Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Telling connectd to send error 001185a48d443eae6fbcc679accd4d497c4183b711f2cd204c0b50acd3cd76fda08d00936368616e6e656c643a207265636569766564204552524f52206572726f72206368616e6e656c20383561343864343433656165366662636336373961636364346434393763343138336237313166326364323034633062353061636433636437366664613038643a20466f726369626c7920636c6f7365642062792060636c6f73656020636f6d6d616e642074696d656f7574
2022-07-16T09:07:51.8671786Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: peer_in WIRE_GOSSIP_TIMESTAMP_FILTER
2022-07-16T09:07:51.8672270Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-hsmd: Got WIRE_HSMD_ECDH_REQ
2022-07-16T09:07:51.8673027Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-onchaind-chan#1: billboard: All outputs resolved: waiting 0 more blocks before forgetting channel
2022-07-16T09:07:51.8673419Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   gossipd: REPLY WIRE_GOSSIPD_GET_ADDRS_REPLY with 0 fds
2022-07-16T09:07:51.8673954Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: peer_out WIRE_GOSSIP_TIMESTAMP_FILTER
2022-07-16T09:07:51.8674298Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   hsmd: Client: Received message 1 from client
2022-07-16T09:07:51.8674811Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: seeker: chosen as startup peer
2022-07-16T09:07:51.8675330Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Activating for message WIRE_ERROR
2022-07-16T09:07:51.8675825Z lightningd-2 2022-07-16T08:54:32.497Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-hsmd: Got WIRE_HSMD_ECDH_REQ
...
2022-07-16T09:07:51.9503144Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: FATAL SIGNAL 11 (version 6e6b41d)
2022-07-16T09:07:51.9503563Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: common/daemon.c:38 (send_backtrace) 0x5620882dbffb
2022-07-16T09:07:51.9503970Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: common/daemon.c:46 (crashdump) 0x5620882dc04d
2022-07-16T09:07:51.9504534Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 ((null)) 0x7ffb3abdf08f
2022-07-16T09:07:51.9504973Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: lightningd/peer_control.c:1378 (peer_spoke) 0x5620882aedd7
2022-07-16T09:07:51.9505418Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: lightningd/connect_control.c:492 (connectd_msg) 0x56208828e9db
2022-07-16T09:07:51.9505835Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: lightningd/subd.c:557 (sd_msg_read) 0x5620882bd89a
2022-07-16T09:07:51.9506236Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5620883318d4
2022-07-16T09:07:51.9506618Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x562088331da1
2022-07-16T09:07:51.9507021Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x562088331e3e
2022-07-16T09:07:51.9507428Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x5620883337d3
2022-07-16T09:07:51.9507945Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x5620882969f5
2022-07-16T09:07:51.9508368Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1190 (main) 0x56208829a7bb
2022-07-16T09:07:51.9508804Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7ffb3abc0082
2022-07-16T09:07:51.9509172Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x56208827d32d
2022-07-16T09:07:51.9509552Z lightningd-2 2022-07-16T08:54:32.737Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
e15e55190b lightningd: provide peer address for reconnect if connect fails.
It usually works out due to other reconnections, but I noticed this
diagnosing another test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
c57a5a0a06 gossipd: downgrade broken message that can actually happen.
I saw this in test_bech32_funding: the peer disconnected and told gossipd before lightningd
relayed a local_channel_announcement from the subd:

```
lightningd-1 2022-07-14T08:46:32.352Z DEBUG   020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d-channeld-chan#1: billboard: Funding transaction locked. Channel announced.
lightningd-1 2022-07-14T08:46:33.353Z DEBUG   connectd: drain_peer
lightningd-1 2022-07-14T08:46:33.353Z DEBUG   connectd: drain_peer draining subd!
lightningd-1 2022-07-14T08:46:33.353Z DEBUG   020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d-lightningd: peer_disconnect_done
lightningd-1 2022-07-14T08:46:33.354Z INFO    020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d-channeld-chan#1: Peer connection lost
lightningd-1 2022-07-14T08:46:33.354Z INFO    020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
lightningd-1 2022-07-14T08:46:33.354Z DEBUG   connectd: maybe_free_peer freeing peer!
lightningd-1 2022-07-14T08:46:33.354Z DEBUG   0228af54fd951097caa2ceea5546a37bcc7d7f746e1cb7cb549e3edcd1797a1d80-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
lightningd-1 2022-07-14T08:46:33.354Z DEBUG   plugin-funder: Cleaning up inflights for peer id 020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d
lightningd-1 2022-07-14T08:46:33.354Z **BROKEN** gossipd: Unknown peer 020ba48216be53051ba8c661c641b5d9c3547c44bfcc43bf4d8362f0dfce0e950d for local_channel_announcement
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
c415c80d48 connectd: spelling and typo fixes.
From @niftynei.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
719d1384d1 connectd: give connections a chance to drain when lightningd says to disconnect, or peer disconnects.
We want to avoid lost messages in the common cases.

This generalizes our drain code, by giving the subds each 5 seconds to
close themselves, but continue to allow them to send us traffic (if
peer is still connected) and continue to send them traffic.

We continue to send traffic *out* to the peer (if it's still
connected), until all subds are gone.  We still have a 5 second timer
to close the connection to peer.

On reconnects, we don't do this "drain period" on reconnects: we kill
immediately.

We fix up one test which was looking for the "disconnect" message
explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
9cff125590 common/gossip_store: fix leak on partial read.
Very unusual, but it can happen, and we don't free:

```
lightningd-1 2022-07-12T04:21:22.591Z DEBUG   gossipd: REPLY WIRE_GOSSIPD_DEV_MEMLEAK_REPLY with 0 fds
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd: MEMLEAK: 0x55e73123d008
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:   label=common/gossip_store.c:92:u8[]
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:   backtrace:
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:     ccan/ccan/tal/tal.c:442 (tal_alloc_)
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:     ccan/ccan/tal/tal.c:471 (tal_alloc_arr_)
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:     common/gossip_store.c:92 (gossip_store_next)
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:     connectd/multiplex.c:433 (maybe_from_gossip_store)
lightningd-1 2022-07-12T04:21:22.645Z **BROKEN** connectd:     connectd/multiplex.c:856 (write_to_peer)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ccan/ccan/io/io.c:59 (next_plan)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ccan/ccan/io/io.c:407 (do_plan)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ccan/ccan/io/io.c:423 (io_ready)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ccan/ccan/io/poll.c:453 (io_loop)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     connectd/connectd.c:2083 (main)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:     ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-1 2022-07-12T04:21:22.646Z **BROKEN** connectd:   parents:
```
2022-07-18 20:50:04 -05:00
Rusty Russell
671e66490e lightningd: don't kill subds immediately on disconnect.
Give them time to process any final messages!  If there's a reconnect,
then we need to clean them up immediately of course.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
6a2817101d connectd: don't move parent while we're being freed.
A subtle case I hadn't come across before: if a child tal_resizes()
its parent while the parent is being deleted, tal gets confused.

The subd destructor does this using tal_arr_remove() on peer->subds,
which is currently being freed:

```
==61056== Invalid read of size 8
==61056==    at 0x185632: del_tree (tal.c:417)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x185957: tal_free (tal.c:486)
==61056==    by 0x1183BC: peer_discard (connectd.c:1861)
==61056==    by 0x11869E: recv_req (connectd.c:1942)
==61056==    by 0x12774B: handle_read (daemon_conn.c:35)
==61056==    by 0x173453: next_plan (io.c:59)
==61056==    by 0x17405B: do_plan (io.c:407)
==61056==    by 0x17409D: io_ready (io.c:417)
==61056==    by 0x176390: io_loop (poll.c:453)
==61056==    by 0x118A68: main (connectd.c:2082)
==61056==  Address 0x4bd8850 is 16 bytes inside a block of size 48 free'd
==61056==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==61056==    by 0x1860E6: tal_resize_ (tal.c:699)
==61056==    by 0x1373DD: tal_arr_remove_ (utils.c:184)
==61056==    by 0x11D508: destroy_subd (multiplex.c:930)
==61056==    by 0x1850A4: notify (tal.c:240)
==61056==    by 0x1855BB: del_tree (tal.c:402)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x185957: tal_free (tal.c:486)
==61056==    by 0x1183BC: peer_discard (connectd.c:1861)
==61056==    by 0x11869E: recv_req (connectd.c:1942)
==61056==    by 0x12774B: handle_read (daemon_conn.c:35)
```

So simply make the subds children of `peer` not the `peer->subds`
array.  The only effect is that drain_peer() can't simply free the
subds array but must free the subds one at a time.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
2daf461762 pytest: enable race test.
Now this passes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is alreacy connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignoere it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
41b379ed89 lightningd: hand fds to connectd, not receive them from connectd.
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active)

Now:
1. connectd says it's connected (peer_connected)
2. we start all appropriate subds and tell connectd to what channels/fds (peer_connect_subd).
3. if connectd says it sent something about a channel we didn't tell it about, we either tell
   it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).

This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions.  Many cleanups are
possible, since the new flow is so simple.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
430d6521a0 common/daemon_conn: add function to read an fd.
We never needed this before.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
eff53495db lightningd: make "is peer connected" a tristate.
First, connectd tells us the peer has connected, and we call the connected hook,
and if it says it's fine, we are actually connected and we fire off notifications.

Of course, we could be disconnected while in the connected hook, and that would
mean we tell people about a connection which is no longer current.

Make this clear with a tristate: if we're not marked disconnected by
the time the hooks finish, we're good.  It also gives us a cleaner
"connect" command return when we connected but disconnected before
processing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
571f0fad1b lightningd: remove delay on succeeding connect.
We used to not return from "connect" until we had connected all the subds,
which introduced more races if something went wrong.

Remove this workaround, since we're going to rework this logic entirely.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
a12e2209ff dualopend: fix memleak report.
Not an important one, but memleak detection got upset:

```
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2: MEMLEAK: 0x55dd9797bb68
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:   label=openingd/dualopend_wiregen.c:767:struct lease_rates
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:   backtrace:
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     ccan/ccan/tal/tal.c:442 (tal_alloc_)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     openingd/dualopend_wiregen.c:767 (fromwire_dualopend_opener_init)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     openingd/dualopend.c:2671 (opener_start)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     openingd/dualopend.c:3649 (handle_master_in)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     openingd/dualopend.c:3973 (main)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     ../csu/libc-start.c:308 (__libc_start_main)
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:   parents:
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-dualopend-chan#2:     openingd/dualopend.c:3796:struct state
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
ab0e5d30ee connectd: don't io_halfclose()
We don't io_halfclose() the other side, we io_sock_shutdown(), which can leave both
sides unset:

```
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: FATAL SIGNAL 6 (version 57e1af2)
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x563b9b603af7
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: common/daemon.c:46 (crashdump) 0x563b9b603b4b
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 ((null)) 0x7fe6e8d4f08f
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: ../sysdeps/unix/sysv/linux/raise.c:51 (__GI_raise) 0x7fe6e8d4f00b
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79 (__GI_abort) 0x7fe6e8d2e858
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92 (__assert_fail_base) 0x7fe6e8d2e728
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101 (__GI___assert_fail) 0x7fe6e8d3ffd5
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:65 (next_plan) 0x563b9b64fd7e
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x563b9b6508f0
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:423 (io_ready) 0x563b9b650984
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x563b9b652c25
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: connectd/connectd.c:2037 (main) 0x563b9b5f5793
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7fe6e8d30082
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0x563b9b5ebf6d
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
6a9a091234 pytest: add another connection stress test, using multiple channels (bug #5254)
This one actually triggers an assert() on my machine, so though it
wasn't what I was looking for, let's include it:

```
lightning_connectd: connectd/connectd.c:1905: peer_conn_closed: Assertion `tal_count(peer->subds) == 0' failed.
lightning_connectd: FATAL SIGNAL 6 (version v0.11.0.1-15-gc812595)
0x55b3e1e21302 send_backtrace
	common/daemon.c:33
0x55b3e1e213ac crashdump
	common/daemon.c:46
0x7f44292ff08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x7f44292ff00b __GI_raise
	../sysdeps/unix/sysv/linux/raise.c:51
0x7f44292de858 __GI_abort
	/build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79
0x7f44292de728 __assert_fail_base
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92
0x7f44292effd5 __GI___assert_fail
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101
0x55b3e1e125db peer_conn_closed
	connectd/connectd.c:1905
0x55b3e1e17b4f destroy_subd
	connectd/multiplex.c:1112
0x55b3e1e7fdf4 notify
	ccan/ccan/tal/tal.c:240
0x55b3e1e8030b del_tree
	ccan/ccan/tal/tal.c:402
0x55b3e1e8035d del_tree
	ccan/ccan/tal/tal.c:412
0x55b3e1e806a7 tal_free
	ccan/ccan/tal/tal.c:486
0x55b3e1e6ef59 io_close
	ccan/ccan/io/io.c:450
0x55b3e1e17429 write_to_subd
	connectd/multiplex.c:957
0x55b3e1e6e1a3 next_plan
	ccan/ccan/io/io.c:59
0x55b3e1e6eebc io_do_always
	ccan/ccan/io/io.c:435
0x55b3e1e70baa handle_always
	ccan/ccan/io/poll.c:304
0x55b3e1e70ea1 io_loop
	ccan/ccan/io/poll.c:385
0x55b3e1e12dd5 main
	connectd/connectd.c:2159
0x7f44292e0082 __libc_start_main
	../csu/libc-start.c:308
0x55b3e1e0885d ???
	???:0
0xffffffffffffffff ???
	???:0
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
8e1d5c19d6 pytest: test to reproduce "channeld: sent ERROR bad reestablish revocation_number: 0 vs 3"
It's caused by a reconnection race: we hold the new incoming connection while we
ask lightningd to kill the old connection.  But under some circumstances we leave
the new incoming hanging (with, in this case, old reestablish messages unread!)
and another connection comes in.

Then, later we service the long-gone "incoming" connection, channeld
reads the ancient reestablish message and gets upset.

This test used to hang, but now we've fixed reconnection races it is fine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
40145e619b connectd: remove the redundant "already connected" logic.
It should now be reliable, so we don't need this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
9b6c97437e connectd: remove reconnection logic.
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer, we can now simply free the old
and add the new.

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
7b0c11efb4 connectd: don't let peer close take forever.
Sending any pending messages to peer before hanging up is a courtesy:
give it 5 seconds before simply closing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
8678c5efb3 connectd: release peer soon as lightingd tells us.
Now we have separate peer draining logic, we can simply use it when
connectd tells us to release the peer, without waiting.  (We could
simply free the peer, but that's a bit rude, as messages can get
lost).

This removes various complex flags and logic we had before.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
2022-07-18 20:50:04 -05:00
Rusty Russell
e856accb7d connectd: send cleanup messages however peer is freed.
This lets us tal_free() it wherever we want, rather than always
freeing via peer_discard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
d58e6fa20b lightningd: don't tell connectd to disconnect peer if it told us.
We allow connectd to tell us a peer has gone away, but now we need
to make sure we don't double-spiderman and tell it to disconnect peer.

This is particularly harmful on reconnect: it (will soon) tell us the
old connection is gone, ready to tell us the new peer has connected.
We would tell it to disconnect the peer, which throws away the new
connection!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
c64ce4bbf3 lightningd: clean up channels when connectd says peer is gone.
This is redundant now, since connectd only sends us this once we tell
it it's OK, but that's changing, so clean up now.  This means that
connectd will be able to make *unsolicited* closes, if it needs to.

We share logic with peer_please_disconnect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
9dc3880360 connectd: put peer into "draining" mode when we want to close it.
This removes it from the hashtable, and forces it to do nothing but
send out any remaining packets, then close.

It is, in effect, reduced to a stub, with no further interactions
with the rest of the system (all subds are freed already).

Also removes the need for an explicit "final_msg" too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
37ff013c2c connectd: fix subd tal parents.
This came out in a later patch: freeing the peer->subds doesn't actually
free the subds, because they're reparented onto subd->conn, which is
a child of peer itself.


This breaks because when the peer is finally freed, destroy_subd is
called, and expects to find itself in peer->subds (but we made that
NULL when we manually freed it!).

Fix this, and make it obvious that we tal_steal it.

```
ightning_connectd: FATAL SIGNAL 11 (version v0.11.0.1-25-gbf025aa-modded)
0x55de2a1b8b94 send_backtrace
	common/daemon.c:33
0x55de2a1b8c3e crashdump
	common/daemon.c:46
0x7fe2be2fc08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x55de2a1af41e destroy_subd
	connectd/multiplex.c:1119
0x55de2a217686 notify
	ccan/ccan/tal/tal.c:240
0x55de2a217b9d del_tree
	ccan/ccan/tal/tal.c:402
0x55de2a217bef del_tree
	ccan/ccan/tal/tal.c:412
0x55de2a217bef del_tree
	ccan/ccan/tal/tal.c:412
0x55de2a217f39 tal_free
	ccan/ccan/tal/tal.c:486
0x55de2a1aa116 peer_discard
	connectd/connectd.c:1834
0x55de2a1aa38d recv_req
	connectd/connectd.c:1903
0x55de2a1b9121 handle_read
	common/daemon_conn.c:31
0x55de2a205a35 next_plan
	ccan/ccan/io/io.c:59
0x55de2a20663d do_plan
	ccan/ccan/io/io.c:407
0x55de2a20667f io_ready
	ccan/ccan/io/io.c:417
0x55de2a208972 io_loop
	ccan/ccan/io/poll.c:453
0x55de2a1aa736 main
	connectd/connectd.c:2042
0x7fe2be2dd082 __libc_start_main
	../csu/libc-start.c:308
0x55de2a1a085d ???
	???:0
0xffffffffffffffff ???
	???:0
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
912ac25270 lightningd: remove 'connected' flag from channel structure.
It's directly a product of "does it have a current owner subdaemon"
and "does that subdaemon talk to peers", so create a helper function
which just evaluates that instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
b8ed107743 lightningd: fix dev-memleak crash on unown unconfirmed channels.
```
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: FATAL SIGNAL 11 (version e0507aa)
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: common/daemon.c:38 (send_backtrace) 0x56319736e437
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: common/daemon.c:46 (crashdump) 0x56319736e48a
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x7fc37721151f
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/subd.c:839 (subd_send_msg) 0x5631973425f3
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/subd.c:859 (subd_req_) 0x5631973426e4
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/peer_control.c:2967 (peer_dev_memleak) 0x56319732ca93
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/memdump.c:296 (json_memleak) 0x56319730fc97
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:630 (command_exec) 0x563197306f9d
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:765 (rpc_command_hook_final) 0x5631973075d5
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/plugin_hook.c:278 (plugin_hook_call_) 0x56319733db47
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:853 (plugin_hook_call_rpc_command) 0x5631973079d4
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:957 (parse_request) 0x563197307efb
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:1054 (read_json) 0x563197308361
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5631973de46c
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5631973df0a1
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5631973df0e3
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x5631973e147f
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x563197305041
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1184 (main) 0x56319730b58d
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main) 0x7fc3771f8d8f
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: ../csu/libc-start.c:392 (__libc_start_main_impl) 0x7fc3771f8e3f
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x5631972dec54
lightningd-1 2022-07-14T08:18:38.972Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
lightningd-3 2022-07-14T08:18:38.976Z DEBUG   connectd: drain_peer
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
08e3e979c8 lightningd: set cid correctly in peer->uncommitted_channel.
Setting it to 0xfffff... is just confusing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
William Casarin
98185dfc2b startup_regtest: add connect helper
`connect 1 2` to connect from l1 to l2, etc
2022-07-18 20:37:32 -05:00
William Casarin
a3f5d31b09 startup_regtest: add experimental-offers 2022-07-18 20:37:32 -05:00
Rusty Russell
73762de18c lightningd: reduce log level for remote address reporting.
It's available in listpeers() if you want to see it, otherwise it's not
really something users want to see in the normal course of operation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 05:32:24 +02:00
Michael Schmoock
65433de05f options: set DNS port to network default if not specified
- set port for a DNS announcement without port to network default
- remove x-fail

Changelog-Fixed: Port of a DNS announcement can be 0 if unspecified
2022-07-17 23:26:07 +02:00
Michael Schmoock
6d4285d7c4 pytest: add xfail test to show DNS w/o port issue
This adds an X-Fail testcase that demonstrates that currently
the port of a DNS announcement is not set to the corresponding
network port (in this case regtest), but it will be set to 0.

Changelog-None
2022-07-17 23:26:07 +02:00
Swapnil
6204d70a37 docs: fix contrib/ docs 2022-07-17 21:40:01 +09:30
grubles
af2b863b4a Add instructions for checking out a release tag 2022-07-17 11:45:34 +09:30
zero fee routing
58ae885f48 fix typo in commando documentation 2022-07-17 08:51:02 +09:30
Rusty Russell
8c48eda8c7 decode: support decoding runes.
This is a bit weird since it lives in the offers plugin, but it works
well.  This should make runes much more approachable for people!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
468dff1723 commando: add rate for maximum successful rune use per minute.
I'm assuming that nobody wants a rate slower than 1 per minute; we can
introduce 'drate' if we want a per-day kind of limit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
4ab09f7cfb commando: add support for parameters by array, parameter count.
Awkward to filter, but they're really practical for many commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
cf28cff398 doc: document commando and commando-rune.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
8688daf937 commando: require runes for operation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
ae4856df70 commando: don't look at messages *at all* unless they've created a rune.
This means we can leave commando on by default, without an explicit config flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30
Rusty Russell
419cb60b1b commando: add commando-rune command.
Can both mint new runes, and add one or more restrictions to existing ones.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-17 08:51:02 +09:30