Commit graph

13 commits

Author SHA1 Message Date
Rusty Russell
5bd3d51131 subds: fix case where we keep retrying on EOF.
Our low-level ccan/io IO routines return three values:
-1: error.
0: call me again, I'm not finished.
1: I'm done, go onto the next thing.

In the last release, we tweaked the sematics of "-1": we now opportunistically
call a routine which returns 0 once more, in case there's more data.  We use errno to
distinguish between "EAGAIN" which means there wasn't any data, and real errors.

However, if the underlying read() returns 0 (which it does when the peer has closed
the other end) the value of errno is UNDEFINED.  If it happens to be EAGAIN, we will
call it again, rather than closing.  This causes us to spin: in particular people reported
hsmd consuming 100% of CPU.

The ccan/io read code handled this by setting errno to 0 in this case, but our own
wire low-level routines *did not*.

Fixes: https://github.com/ElementsProject/lightning/issues/7655
Changelog-Fixed: Fixed intermittant bug where hsmd (particularly, but also lightningd) could use 100% CPU.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-09-13 11:13:53 -07:00
Rusty Russell
eb1c04ef14 wire: optimize sending packets now we can return EAGAIN.
Interestingly, this patch and the last take my total test time on my
build machine down by about 10%.  From 403.93s (0:06:43) to 366.64s (0:06:06)
though there were some unrelated errors.

Changelog-Changed: connectd: I/O optimizations to significantly speed up larger nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-07-02 15:16:55 -07:00
Rusty Russell
4ffda340d3 check: make sure all files outside contrib/ include "config.h" first.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).

config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-06 10:05:39 +10:30
Rusty Russell
7401b26824 cleanup: remove unneeded includes in C files.
Before:
 Ten builds, laptop -j5, no ccache:

```
real	0m36.686000-38.956000(38.608+/-0.65)s
user	2m32.864000-42.253000(40.7545+/-2.7)s
sys	0m16.618000-18.316000(17.8531+/-0.48)s
```

 Ten builds, laptop -j5, ccache (warm):

```
real	0m8.212000-8.577000(8.39989+/-0.13)s
user	0m12.731000-13.212000(12.9751+/-0.17)s
sys	0m3.697000-3.902000(3.83722+/-0.064)s
```

After:
 Ten builds, laptop -j5, no ccache: 8% faster

```
real	0m33.802000-35.773000(35.468+/-0.54)s
user	2m19.073000-27.754000(26.2542+/-2.3)s
sys	0m15.784000-17.173000(16.7165+/-0.37)s
```

 Ten builds, laptop -j5, ccache (warm): 1% faster

```
real	0m8.200000-8.485000(8.30138+/-0.097)s
user	0m12.485000-13.100000(12.7344+/-0.19)s
sys	0m3.702000-3.889000(3.78787+/-0.056)s
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-09-17 09:43:22 +09:30
Rusty Russell
2aad3ffcf8 common: tal_dup_talarr() helper.
This is a common thing to do, so create a macro.

Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-02-27 14:16:16 +10:30
Rusty Russell
5cf34d6618 Remove tal_len, use tal_count() or tal_bytelen().
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.

We shim tal_bytelen() for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-30 11:31:17 +02:00
practicalswift
98f49c0837 Remove include in file foo.c that is already included in foo.h 2018-03-25 23:54:21 +00:00
Rusty Russell
29b83aed2a wire: restore BE endian to wire headers for internal messages.
We don't anticipate daemons across machines, but you never know.

Suggested-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-29 14:40:34 +02:00
Rusty Russell
3d316518fd wire: use 26-bit lengths for inter-daemon messaging.
Fixes: #289
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-29 14:40:34 +02:00
Christian Decker
6154020f67 fix: Header include order 2017-05-19 14:06:44 +02:00
Rusty Russell
55510ea27a io_write_wire: always make a copy (or steal if take).
I caught the gossip daemon freeing a message, while it was queued to be
written.  Using tal_dup_arr() is the Right Thing, as it handles taken()
properly automatically.

------------------------------- Valgrind errors --------------------------------
Valgrind error file: /tmp/lightning-rvc7d5oi/test_forward/lightning-3/valgrind-errors
==11057== Invalid read of size 8
==11057==    at 0x1328F2: to_tal_hdr (tal.c:174)
==11057==    by 0x133894: tal_len (tal.c:659)
==11057==    by 0x11BBE7: do_write_wire (wire_io.c:103)
==11057==    by 0x127B95: do_plan (io.c:369)
==11057==    by 0x127C31: io_ready (io.c:390)
==11057==    by 0x129461: io_loop (poll.c:295)
==11057==    by 0x10CBB4: main (gossip.c:722)
==11057==  Address 0x55a99d8 is 24 bytes inside a block of size 200 free'd
==11057==    at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11057==    by 0x133000: del_tree (tal.c:416)
==11057==    by 0x132F77: del_tree (tal.c:405)
==11057==    by 0x13333E: tal_free (tal.c:504)
==11057==    by 0x1123F1: queue_broadcast (broadcast.c:38)
==11057==    by 0x111EB0: handle_node_announcement (routing.c:918)
==11057==    by 0x10B166: handle_gossip_msg (gossip.c:170)
==11057==    by 0x10B76B: owner_msg_in (gossip.c:335)
==11057==    by 0x12712E: next_plan (io.c:59)
==11057==    by 0x127BD0: do_plan (io.c:376)
==11057==    by 0x127C09: io_ready (io.c:386)
==11057==    by 0x129461: io_loop (poll.c:295)
==11057==  Block was alloc'd at
==11057==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11057==    by 0x132AE7: allocate (tal.c:245)
==11057==    by 0x1330A3: tal_alloc_ (tal.c:443)
==11057==    by 0x1332A6: tal_alloc_arr_ (tal.c:491)
==11057==    by 0x133FEC: tal_dup_ (tal.c:846)
==11057==    by 0x112347: new_queued_message (broadcast.c:20)
==11057==    by 0x11240B: queue_broadcast (broadcast.c:43)
==11057==    by 0x111EB0: handle_node_announcement (routing.c:918)
==11057==    by 0x10B166: handle_gossip_msg (gossip.c:170)
==11057==    by 0x10B76B: owner_msg_in (gossip.c:335)
==11057==    by 0x12712E: next_plan (io.c:59)
==11057==    by 0x127BD0: do_plan (io.c:376)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

wire_io: make a copy in io_write_wire (unless taken()).

I hit a corner case where gossipd freed a duplicate while it was being
sent out; this kind of thing doesn't happen if io_write_wire() makes
a copy by default.

We also do a memcheck() here; this gives us a caller in the backtrace
if there are uninitialized bytes, rather than waiting until the write
which happens later.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-05-19 13:30:32 +02:00
Rusty Russell
df7b68eea9 wire/wire_io: support take() arg to io_write_wire().
This simplifies memory management, but means we have to keep the
original pointer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-03-10 21:45:55 +10:30
Rusty Russell
a10ea23e0d wire: add ccan/io helpers to send (unencrypted) messages between damones.
Format is "le16 len; u8 message[len]" same as wire format specified in
BOLTs, even though the endian conversion is overkill for local messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-01-10 15:22:20 +10:30