mirrors/core-lightning

mirror of https://github.com/ElementsProject/lightning.git synced 2025-02-24 23:18:25 +01:00

Author	SHA1	Message	Date
Lagrang3	05514b46e3	Askrene: change median factor to 1. The ratio of the median of the fees and probability cost is overall not a bad factor to combine these two features. This is what the test_real_data shows. Changelog-None Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	2b3fd67dfb	askrene: don't skip fee_fallback test The fee_fallback test would fail after fixing the computation of the median. Now we can restore it by making the probability cost factor 1000x higher than the ratio of the median. This shows how hard it is to combine fee and probability costs and why the current approach is so fragile. Changelog-None Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	44c9609f3a	askrene: add arbitrary precision flow unit Changelog-none: askrene: add arbitrary precision flow unit Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	4dc1a44cd9	askrene: fix the median The calculation of the median values of probability and fee cost in the linear approximation had a bug: it counted non-existent arcs. Changelog-none: askrene: fix the median Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	ee623616d2	askrene: fix CI Check the return value of scanf in askrene unit tests. Changelog-none: askrene: fix CI Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Lagrang3	937cf7a554	askrene: use the new MCF solver Changelog-none: askrene: use the new MCF solver Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-11-21 16:17:52 +10:30
Rusty Russell	b2dcf7248d	askrene: add askrene-bias-channel. This lets you place annotated biases on channels, to influence routing. Uses include avoiding TOR nodes, slow channels or other local preferences. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-None: askrene is new anyway.	2024-11-08 21:48:55 +10:30
Rusty Russell	2a0f09fc2d	askrene: calculate `k` value dynamically, using medians. While the `k=8` value worked for the current main network tests with the amounts in those tests, it wasn't robust across a wider range of values (as demonstrated when other test changes broke tests!). Time to do this properly: calculate the ratio at the time we combine them, using median values. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	32aa79a1e2	askrene: debug and check we actually reduce fees when mu increases. Even after the previous fix, we still occasionally increase fees when mu increases. This is due to the difference between MCF's linear fees, and actual fees, and is unavoidable, but add a check if it somehow happens. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	08df93cb25	askrene: fix base fee. I noticed this in the logs: plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%... plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%... We would expect increasing mu to reduce the fee! Turns out that our linear fee is a terrible approximation, because I was using base_fee_penalty of 10.0. [Diagram: real fee, with base: fee = base + propfee * amount; linearized fee: fee = linear * amount.] These cross over where linear = propfee + base / amount. Assuming we split the payment into 10 parts, this implies that the base_fee_penalty should be 10 / amount (this gives a slight penalty to the normal case, but that's ok). This gives better results, too: we get down to 650099 sats in fees, vs 801613 before. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	6273adbe47	askrene: calculate prob_cost_factor using ratio of typical mainnet channel. During "test_real_data", the only successes with reduced fees were 92 on "mu=10", and only 1 on "mu=30": the rest went to mu=100 and failed. I tried numerous approaches, and in the end, opted for the simplest: The typical range of probability costs looks like: min = 0, max = 924196240, mean = 10509.4, stddev = 1.9e+06 The typical range of linear fee costs looks like: min = 0, max = 101000000, mean = 81894.6, stddev = 2.6e+06 This implies a k factor of 8 makes the two numbers comparable, and thus makes "mu" much more effective. Here are the number of different mu values we succeeded at: 87 at mu=0, 90 at mu=10, 42 at mu=20, 24 at mu=30, 17 at mu=40, 19 at mu=50, 19 at mu=60, 11 at mu=70, 95 at mu=80, 19 at mu=90. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	4897286c25	mcf: simplify mu -> cost translation. The current prob_cost_factor setting does not seem to make mu very effective, in fact, it gives strange results: plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%... plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%... We would expect increasing mu to reduce the fee! As a first step, simplify (it can't be infinite, and the -1 are weird). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	f17c5f5a6b	askrene: don't use tmpctx in minflow() I tested with a really large gossmap (hacked to be 4GB), and when we keep retrying to minimize cost (calling minflow 11 times), we don't free tmpctx. Due to an issue with how gossmap estimates the index sizes, we ended up running out of memory. This fixes it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Lagrang3	bd8cc1fb1f	askrene: detect and cancel flow cycles Flow cycles can occur if we have zero arc costs. The previous path construction from the flow in the network assumed the absence of such cycles and would enter an infinite loop if it hit one. With this patch we add cycle detection and removal during the path construction phase. Reported-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Lagrang3 <lagrang3@protonmail.com> Changelog-EXPERIMENTAL: `askrene` infinite loop fixed	2024-10-15 09:58:04 +10:30
Rusty Russell	bb3663c4a0	askrene: ignore disabled channels for min-cost-flow. We also set htlc_max to 0 when disabling, so the tests worked, but this is correct. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	0a23c63d37	askrene: optimize, by calling tal_count less. I like the clarity, but this is a hot path. Fortunately these arrays have very well defined lengths. Before: 5.81 seconds After: 1.06 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 08:47:53 +09:30
Rusty Russell	9f0c0e1cca	askrene: use a simple array as our queue. We only ever visit each node once, so we can just use an array. This avoids calling tal() all the time, which is especially slow when we're memory tracking. I had an old canned gossmap which I benchmarked for these (and in particular one node was unreachable, and that was slow): Before: 17.27 seconds After: 5.80 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 08:47:53 +09:30
Lagrang3	0aa52b7fdd	askrene: remove unused function Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-09-19 12:16:53 +09:30
Rusty Russell	db29a2d6b5	askrene: don't have get_flow_paths() handle htlc_max, htlc_min and extra millisats. We don't actually hit the htlc_max cases, since the flow code already constrains us to that. And handling htlc_min is better done in the caller, where diagnostics are better (basically, we should eliminate them, and if that means no route, give a clear error message). And the refinement step can handle any extra millisats from rounding. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-09-19 12:16:53 +09:30
Rusty Russell	f0331cd82e	askrene: add a "refining" step to add fees and handle corner cases. This is the root cause of the problem worked around in `50949b7b9c` "askrene: hack in some padding so we don't overflow capacities." When adding fees to flows, we didn't recheck the boundary conditions: in renepay this is done by routebuilder. Fortunately, we can use our "reservations" infrastructure to temporarily use capacity as we process flows, so we handle the cases where they are not independent correctly. My assumption is that the resulting errors are small, so we divide them between the remaining flows based on highest-to-least probability. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-09-19 12:16:53 +09:30
Rusty Russell	5883aa85ca	askrene: rename `struct flow` `amount` to `delivers`. This is clearer: it's the final amount, not the amount we send! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-09-19 12:16:53 +09:30
Rusty Russell	829954ac71	askrene: remove `struct flow` `probability` member. Simply calculate it when we need it, which means we don't have to keep it up-to-date as we tweak the flow. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-09-19 12:16:53 +09:30
Rusty Russell	50949b7b9c	askrene: hack in some padding so we don't overflow capacities. Of course, we still will, since spendable is for a single HTLC, but this also shows why we should treat minimum as the incorrect answer if they cross, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Fixes: https://github.com/ElementsProject/lightning/issues/7563	2024-08-23 18:52:15 +09:30
Rusty Russell	fafda82d82	askrene: fix up our handling of htlc_max. It seems we didn't handle it correctly: we need to cap the first segment as well as the others, as far as I can tell. Also, it can be less than the maximum capacity. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-23 18:52:15 +09:30
Rusty Russell	79ceb59d7a	plugins/askrene: remove local contexts. In general, we should be using tmpctx unless there's a specific reason not to. It's clear, and simplifies the code somewhat. If tmpctx is not cleaned often enough, we can look at a per-MCF context, but this seems like premature optimization. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	b1817b6c52	askrene: include the mcf and flow routines. This makes the code use askrene's "struct route_query". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	1db5cf6dea	askrene: simply fail if a flow amount exceeds 64 bits. Rather than handling failure, simply report and exit the plugin. Simplifies error handling. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	7bf399cac5	askrene: remove code which tries to handle tal failures. tal does not fail: the default handler (which we use) aborts. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	e4b84f1ffb	askrene: copy flow and dijkstra from renepay. Still don't actually try compiling them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	d109fcb568	askrene: simplify minflow() We let the caller choose mu, and iterate if necessary: it can also check its limits for fees, etc. Rationalize it to 0-100 inclusive for human consumption. This means we don't loop internally, and in fact there's only one failure mode: we cannot find enough capacity. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
Rusty Russell	5999467dce	askrene: copy mcf.[ch] from renepay with minimal modifications. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-08-07 20:35:30 +09:30
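
Commits 2a0f09fc2d, 4dc1a44cd9 and 05514b46e3 describe scaling the probability cost by a factor `k` derived from the ratio of the median fee cost to the median probability cost, computed over existing arcs only. The sketch below illustrates that idea in Python; it is not the askrene code. The function names, the use of `None` to mark a missing arc, and the weighted sum in `combined_cost` are all assumptions made for illustration.

```python
from statistics import median


def median_ratio(fee_costs, prob_costs):
    """Return k = median(fee cost) / median(probability cost).

    Each list holds one cost per arc; None marks an arc that does not
    exist (an assumption of this sketch). Missing arcs are skipped so
    they cannot distort the medians.
    """
    fees = [f for f in fee_costs if f is not None]
    probs = [p for p in prob_costs if p is not None]
    if not fees or not probs:
        return 1.0  # nothing to compare: fall back to a neutral factor
    med_prob = median(probs)
    if med_prob == 0:
        return 1.0  # avoid dividing by zero
    return median(fees) / med_prob


def combined_cost(fee_cost, prob_cost, mu, k):
    """Hypothetical combination: mu in 0..100 weights the fee cost,
    the remainder weights the k-scaled probability cost."""
    if not 0 <= mu <= 100:
        raise ValueError("mu must be in the range 0..100")
    return mu * fee_cost + (100 - mu) * k * prob_cost
```

With `k` chosen this way the two terms have the same order of magnitude at `mu=50`, which is the property the commit messages say makes `mu` effective across different payment amounts.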

31 commits
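
The base fee fix in 08df93cb25 rests on one relation: the real fee `base + propfee * amount` and the linearized fee `linear * amount` cross where `linear = propfee + base / amount`. The following Python sketch works through that arithmetic, assuming the payment is split into 10 equal parts as the commit message does. The function names and the sample numbers are illustrative assumptions, not taken from the askrene source.

```python
from fractions import Fraction


def real_fee(base, propfee, amount):
    """Actual channel fee: a fixed base plus a proportional part."""
    return base + propfee * amount


def linear_coefficient(base, propfee, total_amount, parts=10):
    """Slope of the linearized fee, chosen so that it equals the real
    fee when one part (total_amount / parts) is sent through the arc."""
    per_part = Fraction(total_amount, parts)
    return propfee + base / per_part


def linearized_fee(linear, amount):
    """Fee as seen by the min-cost-flow solver: no base term."""
    return linear * amount
```

For example, with `base = 1000`, `propfee = Fraction(1, 1000)` and a total of 10,000,000 split into 10 parts, the coefficient is 2/1000 and both fee functions give 2000 at an amount of 1,000,000. Below that amount the linearized fee underestimates the real fee, and above it the linearized fee overestimates.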