subds: fix case where we keep retrying on EOF.

Our low-level ccan/io IO routines return three values: -1: error. 0: call me again, I'm not finished. 1: I'm done, go onto the next thing. In the last release, we tweaked the sematics of "-1": we now opportunistically call a routine which returns 0 once more, in case there's more data. We use errno to distinguish between "EAGAIN" which means there wasn't any data, and real errors. However, if the underlying read() returns 0 (which it does when the peer has closed the other end) the value of errno is UNDEFINED. If it happens to be EAGAIN, we will call it again, rather than closing. This causes us to spin: in particular people reported hsmd consuming 100% of CPU. The ccan/io read code handled this by setting errno to 0 in this case, but our own wire low-level routines *did not*. Fixes: https://github.com/ElementsProject/lightning/issues/7655 Changelog-Fixed: Fixed intermittant bug where hsmd (particularly, but also lightningd) could use 100% CPU. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-12 02:08:15 +01:00 · 2024-09-13 09:53:28 +09:30 · 2024-09-13 09:53:28 +09:30 · 5bd3d51131
commit 5bd3d51131
parent c5fc1b55d8
1 changed files with 10 additions and 2 deletions
--- a/wire/wire_io.c
+++ b/wire/wire_io.c
@ -27,8 +27,12 @@ static int do_read_wire_header(int fd, struct io_plan_arg *arg)
 	u8 *p = *(u8 **)arg->u1.vp;

 	ret = read(fd, p + len, HEADER_LEN - len);
-	if (ret <= 0)
+	if (ret <= 0) {
+		/* Errno isn't set if we hit EOF, so set it to distinct value */
+		if (ret == 0)
+			errno = 0;
 		return -1;
+	}
 	arg->u2.s += ret;

 	/* Length bytes read?  Set up for normal read of data. */
@ -61,8 +65,12 @@ static int do_read_wire(int fd, struct io_plan_arg *arg)

 	/* Normal read */
 	ret = read(fd, arg->u1.cp, arg->u2.s);
-	if (ret <= 0)
+	if (ret <= 0) {
+		/* Errno isn't set if we hit EOF, so set it to distinct value */
+		if (ret == 0)
+			errno = 0;
 		return -1;
+	}

 	arg->u1.cp += ret;
 	arg->u2.s -= ret;