mirrors/core-lightning

mirror of https://github.com/ElementsProject/lightning.git synced 2025-02-21 14:24:09 +01:00

Author	SHA1	Message	Date
Rusty Russell	d85dcc0ce4	askrene: persistent layer support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-08 21:48:55 +10:30
Rusty Russell	b2dcf7248d	askrene: add askrene-bias-channel. This lets you place annotated biases on channels, to influence routing. Uses include avoiding TOR nodes, slow channels or other local preferences. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-None: askrene is new anyway.	2024-11-08 21:48:55 +10:30
Rusty Russell	3f09e503ec	askrene: fix false positive memleak since we didn't scan local_updates. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-08 21:48:55 +10:30
Rusty Russell	c797b6fb20	libplugin: add method string to jsonrpc callbacks, implement generic helpers. Without knowing what method was called, we can't have useful general logging methods, so go through the pain of adding "const char *method" everywhere, and add: 1. ignore_and_complete - we're done when jsonrpc returned 2. log_broken_and_complete - we're done, but emit BROKEN log. 3. plugin_broken_cb - if this happens, fail the plugin. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-07 17:04:35 +10:30
Rusty Russell	4f4ec9aefd	askrene: make sure we depend on libplugin.h Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-07 17:04:35 +10:30
Rusty Russell	c5099b1647	libplugin: clean up API. When we used to allow cmd to be NULL, we had to hand the plugin everywhere. We no longer do. 1. Various jsonrpc_ functions no longer need the plugin arg. 2. send_outreq no longer needs a plugin arg. 3. The init function takes a command, not a plugin. 4. Remove command_deprecated_in_nocmd_ok. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-07 17:04:35 +10:30
Rusty Russell	318e49e9c7	askrene: more logging in explain_failure. Lagrang3 doesn't like the logging in here at all, but he suggested we at least be consistent! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	95c5fda79f	askrene: remove flowset_probability() now refine step calculates it. Now we've checked it gives the same answers, we can remove a lot of work in flow.c. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	5501e4b13d	askrene: use refine step to calculate flowset probability. Since we know the total reservations on each hop, we can more easily determine probabilities than using flowset_probability() which has to replicate this collision detection. We leave both in place for now, to check. The results are not identical, due to slightly different calculation methods. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	4b6a38fe0a	askrene: fix bug with reservations used during refinement. We were trying to get the max capacity of a flow to see if we could add some more sats, and hit an assertion: tests/test_askrene.py:707: ``` DEBUG plugin-cln-askrene: notify msg info: Flow reduced to deliver 88070161msat not 90008000msat, because 107x1x0/1 has remaining capacity 88071042msat DEBUG plugin-cln-askrene: notify msg info: Flow reduced to deliver 284138158msat not 284787000msat, because 108x1x0/1 has remaining capacity 284141000msat BROKEN plugin-cln-askrene: Flow delivers 129565000msat but max only 56506138msat INFO plugin-cln-askrene: Killing plugin: exited during normal operation ``` We need to unreserve our flow before asking for max capacity. We were also missing a few less important cases where we altered flows without altering the reservation, so fix those too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	bcc8bd59c8	askrene: don't completely ignore fees to start. I noticed that increasing mu a little bit sometimes made a big difference, because by completely ignoring fees we were choosing the worst of two channels in some cases. Start at 1% fees; this saves a lot on initial fees in this test! Here's the new stats on mu levels: 96 mu=1 90 mu=10 41 mu=20 30 mu=30 24 mu=40 19 mu=50 22 mu=60 8 mu=70 95 mu=80 19 mu=90 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: `askrene` is now better at finding low-fee paths.	2024-10-15 09:58:04 +10:30
Rusty Russell	2a0f09fc2d	askrene: calculate `k` value dynamically, using medians. While the `k=8` value worked for the current main network tests with the amounts in those tests, it wasn't robust across a wider range of values (as demonstrated when other test changes broke tests!). Time to do this properly: calculate the ratio at the time we combine them, using median values. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	32aa79a1e2	askrene: debug and check we actually reduce fees when mu increases. Even after the previous fix, we still occasionally increase fees when mu increases. This is due to the difference between MCF's linear fees, and actual fees, and is unavoidable, but add a check if it somehow happens. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	08df93cb25	askrene: fix base fee. I noticed this in the logs: plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%... plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%... We would expect increasing mu to reduce the fee! Turns out that our linear fee is a terrible approximation, because I was using base_fee_penalty of 10.0. [ASCII plot of fee against amount, showing two lines: the real fee, with base: fee = base + propfee * amount; and the linearized fee: fee = linear * amount.] These cross over where linear = propfee + base / amount. Assume we split the payment into 10 parts, this implies that the base_fee_penalty should be 10 / amount (this gives a slight penalty to the normal case, but that's ok). This gives better results, too: we get down to 650099 sats in fees, vs 801613 before. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	6273adbe47	askrene: calculate prob_cost_factor using ratio of typical mainnet channel. During "test_real_data", the only successes with reduced fees were 92 on "mu=10", and only 1 on "mu=30": the rest went to mu=100 and failed. I tried numerous approaches, and in the end, opted for the simplest: The typical range of probability costs looks like: min = 0, max = 924196240, mean = 10509.4, stddev = 1.9e+06 The typical range of linear fee costs looks like: min = 0, max = 101000000, mean = 81894.6, stddev = 2.6e+06 This implies a k factor of 8 makes the two numbers comparable, and thus makes "mu" much more effective. Here are the number of different mu values we succeeded at: 87 mu=0 90 mu=10 42 mu=20 24 mu=30 17 mu=40 19 mu=50 19 mu=60 11 mu=70 95 mu=80 19 mu=90 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	4897286c25	mcf: simplify mu -> cost translation. The current prob_cost_factor setting does not seem to make mu very effective, in fact, it gives strange results: plugin-cln-askrene: notify msg unusual: The flows had a fee of 151950msat, greater than max of 53697msat, retrying with mu of 10%... plugin-cln-askrene: notify msg unusual: The flows had a fee of 220126msat, greater than max of 53697msat, retrying with mu of 20%... We would expect increasing mu to reduce the fee! As a first step, simplify (it can't be infinite, and the -1 are weird). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	83eee64fda	pytest: test askrene with worse maxfee argument. We ask it again, but reduce fees by 1msat from the previous answer. This is really nasty, as it frequently exercises the case where we only go over fee when we do the refinement step. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	f17c5f5a6b	askrene: don't use tmpctx in minflow(). I tested with a really large gossmap (hacked to be 4GB), and when we keep retrying to minimize cost (calling minflow 11 times), we don't free tmpctx. Due to an issue with how gossmap estimates the index sizes, we ended up running out of memory. This fixes it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Lagrang3	bd8cc1fb1f	askrene: detect and cancel flow cycles. Flow cycles can occur if we have zero arc costs. The previous path construction from the flow in the network assumed the absence of such cycles and would enter an infinite loop if it hit one. With this patch we add cycle detection and removal during the path construction phase. Reported-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Lagrang3 <lagrang3@protonmail.com> Changelog-EXPERIMENTAL: `askrene` infinite loop fixed	2024-10-15 09:58:04 +10:30
Rusty Russell	d08a3bb9e6	askrene: don't give up if we hit htlc_max and have no other flows. This happens in the coming "real network" test! We add fees and hit htlc_max, but don't have another flow to add to. Rather than MCF again, we split the flow into two. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	1b82a3ad5b	askrene: constrain to exact htlc_min/htlc_max values. The fp16_t values are approximations (overestimate for htlc_max, underestimate for htlc_min), so in the refinement step we should use the exact values. This also fixes a logic bug: flow_remaining_capacity returned the total capacity, not the additional capacity! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: `askrene` now honors exact htlc_maximum_msat limits.	2024-10-15 09:58:04 +10:30
Rusty Russell	22e7a57557	askrene: make `auto.sourcefree` a real layer, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: `getroutes` now applies `auto.sourcefree` layer in the order specified, so doesn't alter channels changed in later layers.	2024-10-15 09:58:04 +10:30
Rusty Russell	3321ad5883	askrene: populate auto.localchans layer properly. Rather than adding to the gossmap modifications directly, populate the layer and have the normal layer application logic do it. This is consistent when we query layers in the next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-15 09:58:04 +10:30
Rusty Russell	1230f1b832	askrene: give notifications back to caller as we go. And unify logging for better debugging. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: `askrene` now has better logging, gives notifications of progress.	2024-10-15 09:58:04 +10:30
Rusty Russell	b8acd3b37c	askrene: more code tweaks on feedback from Lagrang3. 1. describe_disabled should point out if node itself is disabled. 2. Hoist constraint check for neater if branching. 3. Use amount_msat_max/min for greater clarity. 4. Simply disable channels, don't zero htlc_min/max when node disabled. I also fixed the diagnostic of htlc_max correctly, which removes a FIXME. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	e3c9bc6d3a	askrene: trivial changes to avoid -O3 compiler warnings. The code is a bit too complex for gcc to track it: ``` In file included from ccan/ccan/tal/str/str.h:7, from plugins/askrene/askrene.c:11: plugins/askrene/askrene.c: In function ‘do_getroutes’: ccan/ccan/tal/tal.h:324:23: error: ‘routes’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 324 | #define tal_count(p) (tal_bytelen(p) / sizeof(p)) | ^~~~~~~~~~~ plugins/askrene/askrene.c:476:24: note: ‘routes’ was declared here 476 | struct route routes; | ^~~~~~ plugins/askrene/askrene.c:475:29: error: ‘amounts’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 475 | struct amount_msat amounts; | ^~~~~~~ plugins/askrene/askrene.c:488:69: error: ‘probability’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 488 | json_add_u64(response, "probability_ppm", (u64)(probability * 1000000)); | ~~~~~~~~~~~~~^~~~~~~~~~ cc plugins/askrene/dijkstra.c cc1: all warnings being treated as errors ``` On my local machine, it also warns in param_dev_channel, so I fixed that too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	630ec6a566	askrene: give better feedback when we can't find a suitable route. This turns out to be critical for users: also stops them from bothering us when their node is offline or has insufficient capacity! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	e54c0f8ded	askrene: don't replace constraints, simply accumulate. Lagrang3 points out it's less useful (when we time them out), and probably a premature optimization anyway. Suggested-by: Lagrang3 <lagrang3@protonmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	c307b77d2f	askrene: split askrene-create-channel into create-channel and update-channel. This allows for explicit partial updates to channels (e.g. just change fees, or just disable) without having to set the other fields. This generalizes askrene-disable-channel, which is removed. We also take the chance to use the proper BOLT 7 terms in the API: - htlc_minimum_msat - htlc_maximum_msat - cltv_expiry_delta - fee_base_msat - fee_proportional_millionths Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	5052f0763f	gossmap: keep capacity for locally-generated channels as well. It was weird not to have a capacity associated with localmods channels, and fixing it has some very nice side effects. Now the gossmap_chan_get_capacity() call never fails (we prevented reading of channels from gossmap in the partially-written case already), so we make it return the capacity. We do this in msat, because that's what all the callers want. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	a60063e763	common/gossmods_listpeerchannels: include channel capacity in callback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	a65e325b13	gossmap: implement partial updates. This is actually what we want in several places: to only override one or two fields in a channel_update. We add a gossmap_local_setchan() with a similar API to the old gossmap_local_updatechan(), for the case where we want to set every field. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	bb3663c4a0	askrene: ignore disabled channels for min-cost-flow. We also set htlc_max to 0 when disabling, so the tests worked, but this is correct. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	3253623785	gossmods_from_listpeerchannels: use correct type for cltv_delta. Doesn't matter now, but will with the next change where we want to pass a pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	fcc0d2bad8	askrene: remove unused parameter in layer_add_localmods. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	321ec0875f	askrene: rework constraints to exist in pairs. This is a bit more efficient, but moreover the JSONRPC API is more logical this way. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	3c5c22b17a	askrene: change inform interface, take into account reserve. Lagrang3 points out that if we hit a maximum, we should take into account the reserve. This is true, but it's hard for the caller to do, so change the API to be slightly higher level. Tell "inform" what happened, and it adjusts the constraints appropriately. This makes the least assumptions possible (a reserve does not mean that the capacity was actually used at that time). We also add a mode to say "this succeeded": for now this does nothing, but it could reduce both min/max capacities, and add capacity in the other direction. This is useful for future payments, but not as useful for the current one. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	d50838b60f	askrene: implement listreservations And actually write tests! Suggested-by: Lagrang3 <lagrang3@protonmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	0398d2ff73	askrene: remember individual reservations, for better debugging. Suggested-by: Lagrang3 <lagrang3@protonmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	10dc40a895	askrene: clean up reserve array handling. I got confused, as we had a struct containing two arrays. Simply expose the reserve_hop struct and use arrays directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	2df53c32c2	askrene: make `route_query` contain pointer to the command. This is important for errors and feedback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	b88f4cb854	askrene: askrene-create-layer and askrene-remove-layer. It's generally better to be explicit with these things: currently typos would be ignored. But it's also much easier to clean up entire layers as we use them for temporary (per-payment) effects. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	29cc227a53	askrene: use short_channel_id_dir in API. It's generally much more convenient, and it's already present in other APIs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Lagrang3	33404b03a0	add askrene-disable-channel Changelog-EXPERIMENTAL: askrene: add askrene-disable-channel RPC Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-10-04 11:27:53 +09:30
Rusty Russell	f46219b505	common: round out the short_channel_id_dir JSON routines. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	d8ca9baa3b	askrene: add support for disabled channels in layers. Based-on-the-patch-by: Lagrang3 <lagrang3@protonmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 11:27:53 +09:30
Rusty Russell	0a23c63d37	askrene: optimize, by calling tal_count less. I like the clarity, but this is a hot path. Fortunately these arrays have very well defined lengths. Before: 5.81 seconds After: 1.06 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 08:47:53 +09:30
Rusty Russell	9f0c0e1cca	askrene: use a simple array as our queue. We only ever visit each node once, so we can just use an array. This avoids calling tal() all the time, which is especially slow when we're memory tracking. I had an old canned gossmap which I benchmarked for these (and in particular one node was unreachable, and that was slow): Before: 17.27 seconds After: 5.80 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-10-04 08:47:53 +09:30
Lagrang3	0aa52b7fdd	askrene: remove unused function Signed-off-by: Lagrang3 <lagrang3@protonmail.com>	2024-09-19 12:16:53 +09:30
Rusty Russell	9b60f6cc6d	askrene: re-check min_htlc violations after correcting for MCF rounding. Thanks to @Lagrang3 for spotting this! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-09-19 12:16:53 +09:30

87 commits