Because our monotonic time interface doesn't play well with value set to 0,
always initialize to now() the scheduler_last_run at init() of the KIST
scheduler.
Fixes#23696
Signed-off-by: David Goulet <dgoulet@torproject.org>
When a channel is scheduled and flush cells returns 0 that is no cells to
flush, we flag it back in waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 where a closed channel end up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a more deeper problem
in tor where the channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel thus putting the channel in the
"Case 4" of the scheduler which is to go back in pending state thus
re-considered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirelly diagnosed problem in tor.
Fixes#23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
When we added single_conn_free_bytes(), we cleared the outbuf on a
connection without setting outbuf_flushlen() to 0. This could cause
an assertion failure later on in flush_buf().
Fixes bug 23690; bugfix on 0.2.6.1-alpha.
This caused a BUG log when we noticed that the circuit had no
channel. The likeliest culprit for exposing that behavior is
d769cab3e5, where we made circuit_mark_for_close() NULL out
the n_chan and p_chan fields of the circuit.
Fixes bug 8185; bugfix on 0.2.5.4-alpha, I think.
My current theory is that this is just a marked circuit that hasn't
closed yet, but let's gather more information in case that theory is
wrong.
Diagnostic for 8185.
If 6 SOCKS requests are opened at once, it would have triggered 6 fetches
which ultimately poke all 6 HSDir. We don't want that, if we have multiple
SOCKS requests for the same service, do one fetch only.
Signed-off-by: David Goulet <dgoulet@torproject.org>
When purging last HSDir requests, we used time(NULL) for computing the
service blinded key but in all other places in our codebase we actually
use the consensus times. That can cause wrong behavior if the consensus
is in a different time period than time(NULL).
This commit is required for proper purging of HSDir requests.
The confparse field has type UINT, which corresponds to an int
type. We had uint32_t.
This shouldn't cause trouble in practice, since int happens to
4-bytes wide on every platform where an authority is running. It's
still wrong, though.
These should have been int, but we had listed them as unsigned.
That's an easy mistake to make, since "int" corresponds with either
INT or UINT in the configuration file.
This bug cannot have actually caused a problem in practice, since we
check those fields' values on load, and ensure that they are in
range 0..INT32_MAX.
New approach, suggested by Taylor: During testing builds, we
initialize a union member of an appropriate pointer type with the
address of the member field we're trying to test, so we can make
sure that the compiler doesn't warn.
My earlier approach invoked undefined behavior.
Also demote a log message that can occur under natural causes
(if the circuit subsystem is missing descriptors/consensus etc.).
The HS subsystem will naturally retry to connect to intro points,
so no need to make that log user-facing.
So we can track them more easily in the logs and match any open/close/free
with those identifiers.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
This removes the "nickname" of the cannibalized circuit last hop as it is
useless. It now logs the n_circ_id and global identifier so we can match it
with other logging statement.
Signed-off-by: David Goulet <dgoulet@torproject.org>
Prior to the log statement, the circuit n_circ_id value is zeroed so keep a
copy so we can log it at the end.
Part of #23645
Signed-off-by: David Goulet <dgoulet@torproject.org>
Make the "Exit" flag assignment only depend on whether the exit
policy allows connections to ports 80 and 443. Previously relays
would get the Exit flag if they allowed connections to one of
these ports and also port 6667.
Resolves ticket 23637.
Back in 0.2.4.3-alpha (e106812a77), when we switched from using
double to using uint64 for selecting by bandwidth, I got the math
wrong: I should have used llround(x), or (uint64_t)(x+0.5), but
instead I wrote llround(x+0.5). That means we would always round
up, rather than rounding to the closest integer
Fixes bug 23318; bugfix on 0.2.4.3-alpha.
The is_first_hop field should have been called used_create_fast,
but everywhere that we wanted to check it, we should have been
checking channel_is_client() instead.
The diff is confusing, but were two static scheduler functions that
needed moving to static comment block.
No code change. Thanks dgoulet for original commit