While the `k=8` value worked for the current main network tests with the
amounts in those tests, it wasn't robust across a wider range of values
(as demonstrated when other test changes broke tests!).
Time to do this properly: calculate the ratio at the time we combine them,
using median values.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even after the previous fix, we still occasionally increase fees when my increases.
This is due to the difference between MCF's linear fees, and actual fees, and
is unavoidable, but add a check if it somehow happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I noticed this in the logs:
plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%...
plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%...
We would expect increasing mu to *reduce* the fee!
Turns out that our linear fee is a bad terrible approximation, because I
was using base_fee_penalty of 10.0.
|
| / __ <- real fee, with base: fee = base + propfee * amount.
| / __/
| _//
| __/
| __/_/
|/ _/
| _/ <- linearized fee: fee = linear * amount
|/
+-----------------------------------
These cross over where linear = propfee + base / amount. Assume we split the
payment into 10 parts, this implies that the base_fee_penalty should be 10 / amount
(this gives a slight penalty to the normal case, but that's ok).
This gives better results, too: we get down to 650099 sats in fees, vs 801613
before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
During "test_real_data", then only successes with reduced fees were 92 on "mu=10", and only
1 on "mu=30": the rest went to mu=100 and failed.
I tried numerous approaches, and in the end, opted for the simplest:
The typical range of probability costs looks likes:
min = 0, max = 924196240, mean = 10509.4, stddev = 1.9e+06
The typical range of linear fee costs looks like:
min = 0, max = 101000000, mean = 81894.6, stddev = 2.6e+06
This implies a k factor of 8 makes the two comparable.
This makes the two numbers comparable, and thus makes "mu" much more
effective. Here are the number of different mu values we succeeded at:
87 mu=0
90 mu=10
42 mu=20
24 mu=30
17 mu=40
19 mu=50
19 mu=60
11 mu=70
95 mu=80
19 mu=90
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current prob_cost_factor setting does not seem to make mu very
effective, in fact, it gives strange results:
plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%...
plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%...
We would expect increasing mu to *reduce* the fee!
As a first step, simplify (it can't be infinite, and the -1 are weird).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ask it again, but reduce fees by 1msat from the previous answer.
This is really nasty, as it frequently exercises the case where we
only go over fee when we do the refinement step.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tested with a really large gossmap (hacked to be 4GB), and when we
keep retrying to minimize cost (calling minflow 11 times), and we
don't free tmpctx.
Due to an issue with how gossmap estimates the index sizes, we ended
up running out of memory. This fixes it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Flow cycles can occur if we have arc zero arc costs.
The previous path construction from the flow in the network assumed the
absence of such cycles and would enter an infinite loop if it hit one.
With his patch wee add cycle detection and removal during the path
construction phase.
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Changelog-EXPERIMENTAL: `askrene` infinite loop fixed
This happens in the coming "real network" test!
We add fees and hit htlc_max, but don't have another flow to add to.
Rather than MCF again, we split the flow into two.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The fp16_t values are approximations (overestimate for htlc_max,
underestimate for htlc_min), so in the refinement step we should use
the exact values.
This also fixes a logic bug: flow_remaining_capacity returned the
total capacity, not the additional capacity!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `askrene` now honors exact htlc_maximum_msat limits.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `getroutes` now applies `auto.sourcefree` layer in the order specified, so doesn't alter channels changed in later layers.
Rather than adding to the gossmap modifications directly, populate
the layer and have the normal layer application logic do it.
This is consistent when we query layers in the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And unify logging for better debugging.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `askrene` now has better logging, gives notifications of progress.
1. describe_disabled should point out if node itself is disabled.
2. Hoist constraint check for neater if branching.
3. Use amount_msat_max/min for greater clarity.
4. Simply disable channels, don't zero htlc_min/max when node disabled.
I also fixed the diagnostic of htlc_max correctly, which removes a FIXME.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The code is a bit too complex for gcc to track it:
```
In file included from ccan/ccan/tal/str/str.h:7,
from plugins/askrene/askrene.c:11:
plugins/askrene/askrene.c: In function ‘do_getroutes’:
ccan/ccan/tal/tal.h:324:23: error: ‘routes’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
324 | #define tal_count(p) (tal_bytelen(p) / sizeof(*p))
| ^~~~~~~~~~~
plugins/askrene/askrene.c:476:24: note: ‘routes’ was declared here
476 | struct route **routes;
| ^~~~~~
plugins/askrene/askrene.c:475:29: error: ‘amounts’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
475 | struct amount_msat *amounts;
| ^~~~~~~
plugins/askrene/askrene.c:488:69: error: ‘probability’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
488 | json_add_u64(response, "probability_ppm", (u64)(probability * 1000000));
| ~~~~~~~~~~~~~^~~~~~~~~~
cc plugins/askrene/dijkstra.c
cc1: all warnings being treated as errors
```
On my local machine, it also warns in param_dev_channel, so I fixed that too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This turns out to be critical for users: also stops them from
bothering us when their node is offline or has insufficient capacity!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lagrang3 points out it's less useful (when we time them out), and probably
a premature optimization anyway.
Suggested-by: Lagrang3 <lagrang3@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows for explicit partial updates to channels (e.g. just change
fees, or just disable) without haveing to set the other fields.
This generalizes askrene-disable-channel, which is removed.
We also take the chance to use the proper BOLT 7 terms in the API:
- htlc_minimum_msat
- htlc_maximum_msat
- cltv_expiry_delta
- fee_base_msat
- fee_proportional_millionths
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It was weird not to have a capacity associated with localmods channels, and
fixing it has some very nice side effects.
Now the gossmap_chan_get_capacity() call never fails (we prevented reading
of channels from gossmap in the partially-written case already), so we
make it return the capacity. We do this in msat, because that's what
all the callers want.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what we want in several places: to only override one or
two fields in a channel_update.
We add a gossmap_local_setchan() with a similar API to the old
gossmap_local_updatechan(), for the case where we want to set every
field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lagrang3 points out that if we hit a maximum, we should take into account
the reserve. This is true, but it's hard for the caller to do, so change
the API to be slightly higher level.
Tell "inform" what happened, and it adjust the constraints appropriately.
This makes the least assumptions possible (a reserve does *not* mean that
the capacity was actually used at that time).
We also add a mode to say "this succeeded": for now this does nothing,
but it could reduce both min/max capacities, and add capacity in the
other direction. This is useful for future payments, but not as useful
for the current one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I got confused, as we had a struct containing two arrays. Simply expose the
reserve_hop struct and use arrays directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's generally better to be explicit with these things: currently typos
would be ignored. But it's also much easier to clean up entire layers
as we use them for temporary (per-payment) effects.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I like the clarity, but this is a hot path. Fortunately these arrays
have very well defined lengths.
Before: 5.81 seconds
After: 1.06 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only ever visit each node once, so we can just use an array. This
avoids calling tal() all the time, which is *especially* slow when we're
memory tracking.
I had an old canned gossmap which I benchmarked for these (and in
particular one node was unreachable, and that was slow):
Before: 17.27 seconds
After: 5.80 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
askrene.c was getting quite long, and this is self-contained.
The only code change is a convenience accessor for the per-htlc-cost
hash table.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
"spendable" is for a single HTLC: if we own the channel, this amount
decreases with every HTLC, as we have to pay fees. We have access to this since
we call listpeerchannels anyway, so we can calculate the additional costs and
use it in the refining phase.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't actually hit the htlc_max cases, since the flow code already
constrains us to that.
And handling htlc_min is better done in the caller, where diagnostics
are better (basically, we should eliminate them, and if that means no
route, give a clear error message).
And the refinement step can handle any extra millisats from rounding.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the root cause of the problem worked around in 50949b7b9c
"askrene: hack in some padding so we don't overflow capacities."
When adding fees to flows, we didn't recheck the boundary conditions: in
renepay this is done by routebuilder.
Fortunately, we can use our "reservations" infrastructure to temporarily
use capacity as we process flows, so we handle the cases where they are
not independent correclty.
My assumption is that the resulting errors are small, so we divide
them between the remaining flows based on highest-to-least
probability.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply calculate it when we need it, which means we don't have to keep it
up-to-date as we tweak the flow.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a workaround for channels added by "auto.local", but instead we
should make it work properly.
I didn't do this before because we can't manipulate the localmods while
they're applied, but it's simple to do it in two stages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I used `amount_msat_eq(x, AMOUNT_MSAT(0))` because I forgot this
function existed. I probably missed it because the name is surprising,
so add "is" in there to make it clear it's a boolean function.
You'll note almost all the places which did use it are Eduardo's and
Lisa's code, so maybe it's just me.
Fix up a few places which I could use it, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Of course, we still will, since spendable is for a single HTLC, but
this also shows why we should treat *minimum* as the incorrect answer
if they cross, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/7563