Commit graph

13 commits

Author SHA1 Message Date
Rusty Russell
183da392ca connectd: increase queue length to 250,000.
The original complaint which caused my investigation was the 100% CPU
consumption of connectd, which we traced to the queue to gossipd.

However, the issue is not really connectd's overproduction, but
gossipd's underconsumption, probably caused by its own queueing issues
with the trace messages to lightningd, which the prior patch fixed.

Nonetheless, gossipd *can* get busy, and if we were to ask multiple
nodes for full gossip, we could see a few hundred thousand messages
come it at once.  Hence I'm increasing the warning limit to 250,000
messages.

This commit is also where we attach the Changelog message, even
though it's really "common/msg_queue: use membuf for greater efficiency."
and "gossipd: fix excessive msg_queue length from status_trace()" which
solved the problem.

Here's the backtrace from a previous debug patch:

```
lightning_connectd: msg_queue length excessive (version v24.08.1-17-ga780ad4-modded)
0x5580534051f0 send_backtrace
        common/daemon.c:33
0x55805340bd5b do_enqueue
        common/msg_queue.c:66
0x55805340bde5 msg_enqueue
        common/msg_queue.c:82
0x5580534057ce daemon_conn_send
        common/daemon_conn.c:161
0x5580533fe3ff handle_gossip_in
        connectd/multiplex.c:624
0x5580533ff23b handle_message_locally
        connectd/multiplex.c:763
0x5580533ff2d6 read_body_from_peer_done
        connectd/multiplex.c:1112
```

Reported-by: https://github.com/JssDWt
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd` and `gossipd` message queues are much more efficient.
2024-11-01 16:54:49 +10:30
Rusty Russell
1e4adbff17 common/msg_queue: send backtrace on oversize queues.
Scary looking, but great for debugging!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-01 16:54:49 +10:30
Rusty Russell
1051e97d69 common/msg_queue: use membuf for greater efficiency.
Based on CPU consumption in memmove with the current naive approach.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-01 16:54:49 +10:30
Rusty Russell
6f8754a72b common/msg_queue: make sure to close any pending fds on destruction.
We only hand fds to connectd for now, so this doesn't happen (we don't
destroy that queue except on shutdown).  But it's still a potential issue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-09 18:03:44 +09:30
Rusty Russell
d51fb5207a msg_queue: don't allow magic MSG_PASS_FD message for peers.
msg_queue was originally designed for inter-daemon comms, and so it has
a special mechanism to mark that we're trying to send an fd.  Unfortunately,
a peer could also send such a message, confusing us!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell
06a54606a3 check-includes: allow redundant "config.h"
We should actually be including this (as it may define _GNU_SOURCE
etc) before any system headers.  But where we include <assert.h> we
often didn't, because check-includes would complain that the headers
included it too.

Weaken that check, and include config.h in C files before assert.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-02-04 12:02:36 +10:30
Rusty Russell
2aad3ffcf8 common: tal_dup_talarr() helper.
This is a common thing to do, so create a macro.

Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-02-27 14:16:16 +10:30
Rusty Russell
ec658c1f89 status: suppress status_trace/status_debug messages if queue too long.
We can be spammy, which is good for tests, but bad for our simple message queue.
This avoids breaking our tests but also avoid the worst case (1M entries and counting!)
for gossip status messages in the case where we're syncing a large peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-03 11:42:48 +02:00
Rusty Russell
26dda57cc0 utils: make tal_arr_expand safer.
Christian and I both unwittingly used it in form:

	*tal_arr_expand(&x) = tal(x, ...)

Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().

The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 12:01:38 +01:00
Rusty Russell
3e2dea221b common/msg_queue: make it a tal object.
This way there's no need for a context pointer, and freeing a msg_queue
frees its contents, as expected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-29 04:06:16 +00:00
Rusty Russell
96f05549b2 common/utils.h: add tal_arr_expand helper.
We do this a lot, and had boutique helpers in various places.  So add
a more generic one; for convenience it returns a pointer to the new
end element.

I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-27 22:57:19 +02:00
Rusty Russell
5cf34d6618 Remove tal_len, use tal_count() or tal_bytelen().
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.

We shim tal_bytelen() for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-30 11:31:17 +02:00
Rusty Russell
a37c165cb9 common: move some files out of lightningd/
Basically all files shared by different daemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-08-29 17:54:14 +02:00
Renamed from lightningd/msg_queue.c (Browse further)