Commit Graph

226 Commits

Author SHA1 Message Date
Rusty Russell
5d42600076 connectd: ratelimit onion messages
However fast we can handle them, it's antisocial to allow others to
make us spam the rest of the network.

Changelog-Protocol: onion messages: we limit incoming to 4 per second, allowing a little burst.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-10 13:34:00 +02:00
Rusty Russell
f122c0beb4 connectd: include map of scid->peer node id.
This will let us fwd onion messages via scid, even if they're aliases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-10 13:34:00 +02:00
Rusty Russell
b5f921ce0a lightningd: add routine to directly inject an onion message.
Unlike "sendonionmessage" which instructs us to send to a peer, this
process it locally (presumably, it contains the next hop).  This is
useful because it allows us to process an onion message which starts
with us (a legal case for a blinded path supplied by someone else!).
It also opens the door to bolt12 self-pay.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-10 13:34:00 +02:00
Rusty Russell
4a78d17748 connectd: do response to gossip queries, don't hand them to gossipd.
This basically means moving the code from gossipd to connectd to handle
these queries.

This will get connectd have finer control over ratelimiting them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-10 12:21:19 +09:30
Rusty Russell
d60977f37f connectd: use gossmap streaming interface.
This is more efficient in a few ways:
1. It's trivial to get to the end of the gossip_store, we don't have
   to iterate.
2. It tends to be mmaped so we don't have to call pread().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-10 12:21:19 +09:30
Rusty Russell
401533667d connectd: throttle streaming gossip for peers.
We currently stream gossip as fast as we can, even if they start at
timestamp 0.  Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
2024-07-10 12:21:19 +09:30
Rusty Russell
5a5fee92b3 connectd: don't report socket fds twice.
The initial commit had this code twice, for some reason!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-09 18:03:44 +09:30
Rusty Russell
155311b053 connectd: --dev-handshake-no-reply so we can test pending connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-14 18:16:26 -05:00
Rusty Russell
a9b7402910 pytest: test dropping transient connections.
Requires a hack to exhaust connectd fds and make us close a transient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-14 18:16:26 -05:00
Rusty Russell
8268df9a4b connectd: implement "transient" connections.
Currently, anything which doesn't have a live channel is considered transient.
We free this first under stress, and also if they're still connecting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-14 18:16:26 -05:00
Rusty Russell
541cc9dd1f connectd: fix exhaustion code where we pick random peer.
If we don't find one searching from our random spot in the peer table,
we're supposed to wrap, not crash!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-14 18:16:26 -05:00
Rusty Russell
d3dbcf03fa channeld: close an unimportant connection when fds get low.
We use a crude heuristic: if we were trying to contact them, it's a
"deliberate" connection, and should be preserved.

Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-09 01:23:46 -05:00
Rusty Russell
6a648fd2bc connectd: use hash table, not linked list, for connecting structs.
I thought I was going to want to have a convenient way of counting
these, but it turns out unnecessary.  Still, this is slightly more
efficient and simple, so I am including it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-09 01:23:46 -05:00
Rusty Russell
c8c87e2bf6 connectd: log if we fail an accept() call.
This can happen if we're totally out of fds, but previously we gave
no log message indicating this!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-09 01:23:46 -05:00
Rusty Russell
ba922f9160 lightningd/connectd: remove --experimental-websocket-port
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Config `experimental-websocket-port` (deprecated 23.08, EOL 24.02)
2024-03-25 15:02:35 +10:30
Rusty Russell
e0e879c003 common: remove type_to_string files altogther.
This means including <common/utils.h> where it was indirectly included.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-03-20 13:51:48 +10:30
Rusty Russell
37d22f9141 global: change all type_to_string to fmt_X.
This has the benefit of being shorter, as well as more reliable (you
will get a link error if we can't print it, not a runtime one!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-03-20 13:51:48 +10:30
Rusty Russell
c755dfdfc9 connectd: fix bad assert.
This code was trying to check that the address type is not one of the ADDR_TYPE_TOR*
types, but the is_toraddr() function checks a domain name!  The cast should have been
a clue that this was wrong!

Anyway, wireaddr_to_addrinfo() aborts on these cases already, so the asserts here are
superfluous.

Found in unrelated CI run:

```
Valgrind error file: valgrind-errors.20610
==20610== Conditional jump or move depends on uninitialised value(s)
==20610==    at 0x484ED28: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==20610==    by 0x138FA3: is_toraddr (wireaddr.c:344)
==20610==    by 0x11499B: conn_init (connectd.c:729)
==20610==    by 0x28FD73: next_plan (io.c:59)
==20610==    by 0x28FF94: io_new_conn_ (io.c:116)
==20610==    by 0x11531B: try_connect_one_addr (connectd.c:927)
==20610==    by 0x1182A8: try_connect_peer (connectd.c:1781)
==20610==    by 0x11834E: connect_to_peer (connectd.c:1797)
==20610==    by 0x119241: recv_req (connectd.c:2074)
==20610==    by 0x12836F: handle_read (daemon_conn.c:35)
==20610==    by 0x28FD73: next_plan (io.c:59)
==20610==    by 0x2909A8: do_plan (io.c:407)
==20610==
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-15 12:07:47 +01:00
Rusty Russell
db6f0da3b3 connectd: separate routine to inject message without closing connection.
We will want this to send private channel_updates direct to peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-01-31 14:47:33 +10:30
Rusty Russell
25110ff2cc connectd: fix fd leak for --offline.
```
**BROKEN** connectd: dev_report_fds: 5 open but unowned?
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-26 12:59:55 +10:30
Rusty Russell
ad7dcf381e lightningd: tell connectd about the custom messages.
We re-send whenever a plugin which allows them starts/finishes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-24 11:50:57 +10:30
Rusty Russell
798cf27cb4 connectd: give subds a chance to drain when lightningd says to disconnect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-23 15:48:50 +10:30
Dusty Daemon
e1ac2410b0 connectd: Ignore sodium fd for Mac tests
On Mac most tests report BROKEN because sodium creating an untracked fd pointing to /dev/random. dev_report_fd’s finds it at tear down and reports a BROKEN message.

We allow a single “char special” fd without reporting it as broken improving QOL for Mac developers.

While we’re here we added the fd mode to the log to help with future rogue fd issues.

ChangeLog-None
2023-10-19 14:31:25 +10:30
Rusty Russell
e11b35cb3a common/memleak: implement callback arg for dump_memleak.
This makes it easier to use outside simple subds, and now lightningd can
simply dump to log rather than returning JSON.

JSON formatting was a lot of work, and we only did it for lightningd, not for
subdaemons.  Easier to use the logs in all cases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-03 10:05:55 +02:00
Rusty Russell
0ff91e65dc connectd: remove #if DEVELOPER
We still refuse to run dev commands if lightningd sends it to us
despite us not being in developer mode, but that's mainly paranoia.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-09-21 20:08:24 +09:30
Rusty Russell
a9f26b7d07 common/daemon.c: remove #ifdef DEVELOPER in favor of runtime flag.
Also requires us to expose memleak when !DEVELOPER, however we only
ever used the memleak tracking when the LIGHTNINGD_DEV_MEMLEAK
environment variable was set, so keep that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-09-21 20:08:24 +09:30
Rusty Russell
9bc1a020d0 connectd: don't try to keep going if gossipd dies.
We will access the freed connection to gossipd.  This is weird to track
down when the *actual* issue is that gossipd died!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-09 16:49:48 +09:30
Rusty Russell
a6772e9dec common: add new internal type for websockets.
Now it's not a public type, we need a way to refer to it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell
3f35d48fe4 common: remove websocket type from wireaddr.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell
e66cf46a71 connectd: don't advertise websocket addresses.
I never really liked this hack: websockets are useful, advertizing
them not so much.

Note that we never actually documented that we would advertize these!

Changelog-EXPERIMENTAL: Protocol: Removed support for advertizing websocket addresses in gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell
ddb79162ab connectd: clean up add_gossip loops.
This contained cut & paste code, and it wasn't clear to me that
the first loop included DNS entries with IPv6 entries.

Instead, allow the iterator to take multiple types, and use
a switch statement so compile will break as new types are added.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell
cf80f0520a connectd: dev-report-fds to do file descriptor audit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell
3e49cb01bd connectd: don't leak fds if we have both IPv4 and IPv6.
We accept that we will fail to listen if we bind both IPv6 and IPv4 to
the same socket on a dual-stack machine (e.g. normal Linux), but we weren't
closing the fd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell
ed58c24bc7 connectd: log broken if TCP_CORK fails.
But not if we're a developer using dev_disconnect, which substitutes the fd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell
295557ac50 connectd: don't try to set TCP_CORK on websocket pipe.
Most of this is piping the flag through so we know it's a websocket!

Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell
b5c614069b connectd: fix crash on freed context for new connections.
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` which we reallocate, so it can use a stale pointer.

```
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
```

Fixes: #6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-03-06 17:14:22 -06:00
Rusty Russell
2209d0149f connectd: add new start_shutdown message.
We stop listening, and also refuse to send "connectd_peer_spoke" to create
new subdaemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
Rusty Russell
05ac74fc44 connectd: keep array of our listening sockets.
This allows us to free them if we want to stop listening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
niftynei
0b8ea2299a connectd: patch valgrind error w/ buffers for error msgs
The `tmpctx` is free'd before the error is read out/sent over the wire;
there's a call that will copy the array before sending it, let's use
that instead and take() the object?

------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2181501
==2181501== Syscall param write(buf) points to unaddressable byte(s)
==2181501==    at 0x49E4077: write (write.c:26)
==2181501==    by 0x1C79A3: do_write (io.c:189)
==2181501==    by 0x1C80AB: do_plan (io.c:394)
==2181501==    by 0x1C81BA: io_ready (io.c:423)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Address 0x4afb158 is 40 bytes inside a block of size 140 free'd
==2181501==    at 0x483F0C3: free (vg_replace_malloc.c:872)
==2181501==    by 0x1D103C: del_tree (tal.c:421)
==2181501==    by 0x1D130A: tal_free (tal.c:486)
==2181501==    by 0x1364B8: clean_tmpctx (utils.c:172)
==2181501==    by 0x1266DD: daemon_poll (daemon.c:87)
==2181501==    by 0x1CA334: io_loop (poll.c:420)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Block was alloc'd at
==2181501==    at 0x483C855: malloc (vg_replace_malloc.c:381)
==2181501==    by 0x1D0AC5: allocate (tal.c:250)
==2181501==    by 0x1D1086: tal_alloc_ (tal.c:428)
==2181501==    by 0x1D124F: tal_alloc_arr_ (tal.c:471)
==2181501==    by 0x126204: cryptomsg_encrypt_msg (cryptomsg.c:161)
==2181501==    by 0x11335F: peer_connected (connectd.c:318)
==2181501==    by 0x118A8A: peer_init_received (peer_exchange_initmsg.c:135)
==2181501==    by 0x1C751E: next_plan (io.c:59)
==2181501==    by 0x1C8126: do_plan (io.c:407)
==2181501==    by 0x1C8168: io_ready (io.c:417)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   write(buf)
   fun:write
   fun:do_write
   fun:do_plan
   fun:io_ready
   fun:io_loop
   fun:main
}
--------------------------------------------------------------------------------
2023-02-04 15:31:16 +10:30
Rusty Russell
81e57dce52 connectd: ensure htables are always tal objects.
We want to change the htable allocator to use tal, which will need
this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-12 11:44:10 +10:30
Rusty Russell
22eac96750 connectd: don't ask DNS seeds for addresses on every reconnect.
We were stressing the servers if node cannot be found.  Only do lookup
on manual connect commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: lightningd: Only use DNS server address lookup on manual `connect` commands, not normal reconnection attempts.
2023-01-03 15:00:27 +10:30
Rusty Russell
701dd3dcef memleak: remove exclusions from memleak_start()
Add memleak_ignore_children() so callers can do exclusions themselves.

Having two exclusions was always such a hack!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell
3380f559f9 memleak: simplify API.
Mainly renaming.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell
2da5244e83 jsonrpc: make error codes an enum.
This allows GDB to print values, but also allows us to use them in
'case' statements.  This wasn't allowed before because they're not
constant terms.

This also made it clear there's a clash between two error codes,
so move one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: Error code from bcli plugin changed from 400 to 500.
2022-09-19 10:18:55 +09:30
Michael Schmoock
e0d6f3ceb1 connectd: DNS Bolt7 #911 no longer EXPERIMENTAL
Changelog-Changed: Bolt7 #911 DNS annoucenent support is no longer EXPERIMENTAL
2022-09-13 06:42:20 +09:30
Rusty Russell
22ff007d64 connectd: control connect backoff from lightningd.
We used to tell connectd to remember our connect delay, and hand it
back (increased if necessary).

Instead, simply record when we last tried to connect.  If it was less
than 10 minutes ago, double delay (up to 5 minutes max), otherwise
reset delay to 1 second.

This covers all scenarios: whether we reconnect then immediately
disconnect, or never successfully connect, it doesn't matter.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5453
2022-07-28 15:08:44 +09:30
Rusty Russell
9498e14530 connectd: two logging cleanups.
Don't log_io final messages twice (multiplex_final_message already does
this, so it's confusing to see us send e.g. WIRE_ERROR twice!).

And report that the peer has failed to connect out *before* telling
lightningd, otherwise we get a very confusing ordering, e.g.:

```
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Reconnecting in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Will try reconnect in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-connectd: Failed connected out:
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-25 15:16:58 -07:00
Rusty Russell
a3c4908f4a lightningd: don't explicitly tell connectd to disconnect, have it do it on sending error/warning.
Connectd already does this when we *receive* an error or warning, but
now do it on send.  This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).

When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc.  That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
719d1384d1 connectd: give connections a chance to drain when lightningd says to disconnect, or peer disconnects.
We want to avoid lost messages in the common cases.

This generalizes our drain code, by giving the subds each 5 seconds to
close themselves, but continue to allow them to send us traffic (if
peer is still connected) and continue to send them traffic.

We continue to send traffic *out* to the peer (if it's still
connected), until all subds are gone.  We still have a 5 second timer
to close the connection to peer.

On reconnects, we don't do this "drain period" on reconnects: we kill
immediately.

We fix up one test which was looking for the "disconnect" message
explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell
d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is alreacy connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignoere it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00