Default goes to stderr for LOG_UNUSUAL and higher.
We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
Program received signal SIGSEGV, Segmentation fault.
0x000000001014e9d8 in io_set_finish_ (conn=0x0, finish=0x0, arg=0x0) at ccan/ccan/io/io.c:137
137 conn->finish = finish;
(gdb) bt
incoming=true) at connectd/connectd.c:394
```
Fixes: #7871
Reported-by: grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: broken in this release
We wait until a connection fails, or a subd is connected to the peer,
before letting another one through. This should prevent us from
overwhelming lightningd on large nodes, but unlike the previous back-off,
it's based on how fast lightningd is, not an arbitrary time.
We also let one through each second, in case we're connecting to many,
but not doing anything but gossip (e.g. 100 explicit connect
commands).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Reconnecting to peers at startup should be significantly faster (dependent on machine speed).
Rather than have lightningd call us repeatedly to try to connect, have
it tell us what peers are transient and aren't, and connectd will
automatically try to maintain that connection.
There's a new "downgrade_peer" message to tell it a peer is now
transient: to make it non-transient we simply tell connectd to
connect as a non-transient.
The first time, I missed that dual_open_control does its own state
transitions :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `connectd` now handles maintaining/reconnecting to important peers, and we remember the last successful address we connected to.
Let lightningd feed us hints to try first, but we can extract the
addresses from node_announcement messages ourselves.
(Lightningd used to ask gossipd on our behalf: this is far simpler!)
One side effect of this is that we don't hand back address hints given to us
by lightningd: it would use these again for reconnecting. This is breaks
test_sendpay_grouping, so we disable it temporarily.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
However fast we can handle them, it's antisocial to allow others to
make us spam the rest of the network.
Changelog-Protocol: onion messages: we limit incoming to 4 per second, allowing a little burst.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unlike "sendonionmessage" which instructs us to send to a peer, this
process it locally (presumably, it contains the next hop). This is
useful because it allows us to process an onion message which starts
with us (a legal case for a blinded path supplied by someone else!).
It also opens the door to bolt12 self-pay.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This basically means moving the code from gossipd to connectd to handle
these queries.
This will get connectd have finer control over ratelimiting them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more efficient in a few ways:
1. It's trivial to get to the end of the gossip_store, we don't have
to iterate.
2. It tends to be mmaped so we don't have to call pread().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently stream gossip as fast as we can, even if they start at
timestamp 0. Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
Currently, anything which doesn't have a live channel is considered transient.
We free this first under stress, and also if they're still connecting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we don't find one searching from our random spot in the peer table,
we're supposed to wrap, not crash!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a crude heuristic: if we were trying to contact them, it's a
"deliberate" connection, and should be preserved.
Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I thought I was going to want to have a convenient way of counting
these, but it turns out unnecessary. Still, this is slightly more
efficient and simple, so I am including it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen if we're totally out of fds, but previously we gave
no log message indicating this!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This has the benefit of being shorter, as well as more reliable (you
will get a link error if we can't print it, not a runtime one!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This code was trying to check that the address type is not one of the ADDR_TYPE_TOR*
types, but the is_toraddr() function checks a domain name! The cast should have been
a clue that this was wrong!
Anyway, wireaddr_to_addrinfo() aborts on these cases already, so the asserts here are
superfluous.
Found in unrelated CI run:
```
Valgrind error file: valgrind-errors.20610
==20610== Conditional jump or move depends on uninitialised value(s)
==20610== at 0x484ED28: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==20610== by 0x138FA3: is_toraddr (wireaddr.c:344)
==20610== by 0x11499B: conn_init (connectd.c:729)
==20610== by 0x28FD73: next_plan (io.c:59)
==20610== by 0x28FF94: io_new_conn_ (io.c:116)
==20610== by 0x11531B: try_connect_one_addr (connectd.c:927)
==20610== by 0x1182A8: try_connect_peer (connectd.c:1781)
==20610== by 0x11834E: connect_to_peer (connectd.c:1797)
==20610== by 0x119241: recv_req (connectd.c:2074)
==20610== by 0x12836F: handle_read (daemon_conn.c:35)
==20610== by 0x28FD73: next_plan (io.c:59)
==20610== by 0x2909A8: do_plan (io.c:407)
==20610==
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On Mac most tests report BROKEN because sodium creating an untracked fd pointing to /dev/random. dev_report_fd’s finds it at tear down and reports a BROKEN message.
We allow a single “char special” fd without reporting it as broken improving QOL for Mac developers.
While we’re here we added the fd mode to the log to help with future rogue fd issues.
ChangeLog-None
This makes it easier to use outside simple subds, and now lightningd can
simply dump to log rather than returning JSON.
JSON formatting was a lot of work, and we only did it for lightningd, not for
subdaemons. Easier to use the logs in all cases.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still refuse to run dev commands if lightningd sends it to us
despite us not being in developer mode, but that's mainly paranoia.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also requires us to expose memleak when !DEVELOPER, however we only
ever used the memleak tracking when the LIGHTNINGD_DEV_MEMLEAK
environment variable was set, so keep that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We will access the freed connection to gossipd. This is weird to track
down when the *actual* issue is that gossipd died!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I never really liked this hack: websockets are useful, advertizing
them not so much.
Note that we never actually documented that we would advertize these!
Changelog-EXPERIMENTAL: Protocol: Removed support for advertizing websocket addresses in gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This contained cut & paste code, and it wasn't clear to me that
the first loop included DNS entries with IPv6 entries.
Instead, allow the iterator to take multiple types, and use
a switch statement so compile will break as new types are added.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We accept that we will fail to listen if we bind both IPv6 and IPv4 to
the same socket on a dual-stack machine (e.g. normal Linux), but we weren't
closing the fd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of this is piping the flag through so we know it's a websocket!
Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` which we reallocate, so it can use a stale pointer.
```
0x3e1700 call_error
ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
ccan/ccan/io/poll.c:444
0x3419fa main
connectd/connectd.c:2081
```
Fixes: #6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The `tmpctx` is free'd before the error is read out/sent over the wire;
there's a call that will copy the array before sending it, let's use
that instead and take() the object?
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2181501
==2181501== Syscall param write(buf) points to unaddressable byte(s)
==2181501== at 0x49E4077: write (write.c:26)
==2181501== by 0x1C79A3: do_write (io.c:189)
==2181501== by 0x1C80AB: do_plan (io.c:394)
==2181501== by 0x1C81BA: io_ready (io.c:423)
==2181501== by 0x1CA45B: io_loop (poll.c:453)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501== Address 0x4afb158 is 40 bytes inside a block of size 140 free'd
==2181501== at 0x483F0C3: free (vg_replace_malloc.c:872)
==2181501== by 0x1D103C: del_tree (tal.c:421)
==2181501== by 0x1D130A: tal_free (tal.c:486)
==2181501== by 0x1364B8: clean_tmpctx (utils.c:172)
==2181501== by 0x1266DD: daemon_poll (daemon.c:87)
==2181501== by 0x1CA334: io_loop (poll.c:420)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501== Block was alloc'd at
==2181501== at 0x483C855: malloc (vg_replace_malloc.c:381)
==2181501== by 0x1D0AC5: allocate (tal.c:250)
==2181501== by 0x1D1086: tal_alloc_ (tal.c:428)
==2181501== by 0x1D124F: tal_alloc_arr_ (tal.c:471)
==2181501== by 0x126204: cryptomsg_encrypt_msg (cryptomsg.c:161)
==2181501== by 0x11335F: peer_connected (connectd.c:318)
==2181501== by 0x118A8A: peer_init_received (peer_exchange_initmsg.c:135)
==2181501== by 0x1C751E: next_plan (io.c:59)
==2181501== by 0x1C8126: do_plan (io.c:407)
==2181501== by 0x1C8168: io_ready (io.c:417)
==2181501== by 0x1CA45B: io_loop (poll.c:453)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501==
{
<insert_a_suppression_name_here>
Memcheck:Param
write(buf)
fun:write
fun:do_write
fun:do_plan
fun:io_ready
fun:io_loop
fun:main
}
--------------------------------------------------------------------------------