subds: fix case where we keep retrying on EOF.

Our low-level ccan/io IO routines return three values:
-1: error.
0: call me again, I'm not finished.
1: I'm done, go onto the next thing.

In the last release, we tweaked the semantics of "-1": we now opportunistically
call a routine which returns 0 once more, in case there's more data.  We use errno
to distinguish between "EAGAIN", which means there wasn't any data, and real errors.
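To make these rules concrete, here is a minimal sketch of a single dispatch step. It is not ccan/io's actual implementation. `dispatch_step()`, `enum step_result`, `io_fn`, `struct script` and `scripted_fn()` are hypothetical names. The sketch assumes the extra call happens only after a routine returns 0, and that errno is consulted only when that extra call returns -1.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one dispatch step (not ccan/io's real API). */
enum step_result { STEP_DONE, STEP_WAIT, STEP_CLOSE };

/* A low-level routine: -1 = error, 0 = call me again, 1 = done. */
typedef int (*io_fn)(int fd, void *arg);

/* Run a routine; if it is not finished, opportunistically call it once more
 * in case more data has already arrived.  On a -1 from that extra call,
 * errno == EAGAIN means "no data yet", anything else is a real error. */
static enum step_result dispatch_step(io_fn fn, int fd, void *arg)
{
	int ret = fn(fd, arg);

	if (ret == 1)
		return STEP_DONE;
	if (ret == -1)
		return STEP_CLOSE;

	ret = fn(fd, arg);		/* opportunistic extra call */
	if (ret == 1)
		return STEP_DONE;
	if (ret == 0)
		return STEP_WAIT;	/* still partial: wait for poll() */
	assert(ret == -1);
	return errno == EAGAIN ? STEP_WAIT : STEP_CLOSE;
}

/* Hypothetical scripted routine for exercising dispatch_step() without real
 * I/O: each call returns the next value in rets[], and sets errno from
 * errs[] whenever that value is -1. */
struct script {
	const int *rets;
	const int *errs;
	int pos;
};

static int scripted_fn(int fd, void *arg)
{
	struct script *s = arg;
	int ret = s->rets[s->pos];

	(void)fd;
	if (ret == -1)
		errno = s->errs[s->pos];
	s->pos++;
	return ret;
}
```

Note the last line of `dispatch_step()`. Suppose a routine hits EOF on the extra call but leaves a stale EAGAIN in errno. The step then returns `STEP_WAIT` instead of `STEP_CLOSE`. Because poll() typically keeps returning immediately for a descriptor at EOF, the loop never makes progress.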

However, if the underlying read() returns 0 (which it does when the peer has closed
the other end), the value of errno is UNDEFINED.  If it happens to be EAGAIN, we will
call the routine again rather than closing.  This causes us to spin: in particular,
people reported hsmd consuming 100% of CPU.

The ccan/io read code handled this by setting errno to 0 in this case, but our own
low-level wire routines *did not*.
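The fix can be tried in isolation. The sketch below is a hypothetical read routine that follows the same -1/0/1 convention and applies the same `errno = 0` step the patch adds on EOF. `read_some()` and `struct read_state` are made-up names for illustration and do not come from the lightning source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical state for reading a fixed-size buffer in pieces. */
struct read_state {
	char *buf;
	size_t remaining;
};

/* Returns -1 on error or EOF, 0 if more bytes are still needed, 1 when done.
 * read() returning 0 is a successful call, so errno carries no meaning
 * afterwards; clear it so a stale EAGAIN can't be mistaken for "no data yet". */
static int read_some(int fd, struct read_state *st)
{
	ssize_t ret = read(fd, st->buf, st->remaining);

	if (ret <= 0) {
		if (ret == 0)
			errno = 0;	/* EOF: distinct from EAGAIN */
		return -1;
	}
	st->buf += ret;
	st->remaining -= (size_t)ret;
	assert(st->remaining == 0 || ret > 0);
	return st->remaining == 0 ? 1 : 0;
}
```

For example, write a few bytes into a pipe, close the write end, and call `read_some()` twice. The second call hits EOF. Because of the explicit `errno = 0`, it returns -1 with errno equal to 0 even if errno held EAGAIN beforehand. Without that line, a caller that treats EAGAIN as "no data yet" could keep retrying a descriptor that will never deliver more data.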

Fixes: https://github.com/ElementsProject/lightning/issues/7655
Changelog-Fixed: Fixed intermittent bug where hsmd (particularly, but also lightningd) could use 100% CPU.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell 2024-09-13 09:53:28 +09:30 committed by ShahanaFarooqui
parent c5fc1b55d8
commit 5bd3d51131


@@ -27,8 +27,12 @@ static int do_read_wire_header(int fd, struct io_plan_arg *arg)
 	u8 *p = *(u8 **)arg->u1.vp;
 	ret = read(fd, p + len, HEADER_LEN - len);
-	if (ret <= 0)
+	if (ret <= 0) {
+		/* Errno isn't set if we hit EOF, so set it to distinct value */
+		if (ret == 0)
+			errno = 0;
 		return -1;
+	}
 	arg->u2.s += ret;
 	/* Length bytes read? Set up for normal read of data. */
@@ -61,8 +65,12 @@ static int do_read_wire(int fd, struct io_plan_arg *arg)
 	/* Normal read */
 	ret = read(fd, arg->u1.cp, arg->u2.s);
-	if (ret <= 0)
+	if (ret <= 0) {
+		/* Errno isn't set if we hit EOF, so set it to distinct value */
+		if (ret == 0)
+			errno = 0;
 		return -1;
+	}
 	arg->u1.cp += ret;
 	arg->u2.s -= ret;