In fact, only 951 of 17419 (5%) node announcements are missing an address
(and gossipd doesn't know if we can connect to Tor addresses anyway) so
just check it *has* a node_announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Let lightningd feed us hints to try first, but we can extract the
addresses from node_announcement messages ourselves.
(Lightningd used to ask gossipd on our behalf: this is far simpler!)
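For reference, a rough sketch of walking the BOLT 7 address descriptors
(1 = IPv4, 2 = IPv6, 3 = Tor v2, 4 = Tor v3); the helper and callback
here are illustrative, not the actual connectd code:
```
#include <stddef.h>

/* Payload size (excluding the type byte) of each BOLT 7 address
 * descriptor type: IPv4 = 4 addr + 2 port, IPv6 = 16 + 2,
 * Tor v2 = 10 + 2 (deprecated), Tor v3 = 35 + 2.  Type 5 (DNS) is
 * variable-length and elided here. */
static int addr_payload_len(unsigned char type)
{
	switch (type) {
	case 1: return 6;
	case 2: return 18;
	case 3: return 12;
	case 4: return 37;
	default: return -1; /* unknown type: stop parsing */
	}
}

/* Sketch (assumed callback shape): walk the addresses blob of a
 * node_announcement, handing each descriptor to `cb`. */
static void for_each_address(const unsigned char *addrs, size_t len,
			     void (*cb)(unsigned char type,
					const unsigned char *payload,
					size_t payload_len))
{
	size_t i = 0;
	while (i < len) {
		int plen = addr_payload_len(addrs[i]);
		if (plen < 0 || i + 1 + (size_t)plen > len)
			break;
		cb(addrs[i], addrs + i + 1, plen);
		i += 1 + plen;
	}
}
```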
One side effect of this is that we don't hand back address hints given to us
by lightningd: it would use these again for reconnecting. This breaks
test_sendpay_grouping, so we disable it temporarily.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It turns out that under some circumstances we end up clearing the
pointee of `current` but not the pointer. Thus when we select the next
slot we can end up reusing the same slot, making it its own parent.
We forcefully break these cycles by enforcing that `current` is
never returned and never set as its own parent.
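A minimal sketch of the guard, with hypothetical names (the real logic
lives in the tracing slot selection):
```
#include <stdbool.h>
#include <stddef.h>

struct span {
	bool in_use;
	struct span *parent;
};

static struct span *current;

/* Never hand out the slot `current` still points to, so a span cannot
 * become its own parent even if its contents were already cleared. */
static struct span *next_free_span(struct span *spans, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (spans[i].in_use)
			continue;
		if (&spans[i] == current)
			continue; /* this is the forced cycle-break */
		return &spans[i];
	}
	return NULL;
}
```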
Changelog-None
Trace spans form a tree, but we don't actually check that the
structure remains intact. Breakage can occur, for example, if we
accidentally reuse the same key, making a new span its own ancestor.
We have the space in memory set aside anyway, so let's just copy the
`trace_id` into the span itself, rather than resolving the `root` at
time of emission.
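A sketch of the idea, with assumed field and helper names:
```
#include <stdlib.h>
#include <string.h>

#define TRACE_ID_LEN 16

struct span {
	unsigned char trace_id[TRACE_ID_LEN]; /* copied at creation */
	struct span *parent;
};

/* Hypothetical helper: fresh id for a new root span. */
static void new_trace_id(unsigned char id[TRACE_ID_LEN])
{
	for (size_t i = 0; i < TRACE_ID_LEN; i++)
		id[i] = rand() & 0xff;
}

/* Inherit the trace_id from the parent when the span starts, so
 * emission never has to walk up to the root. */
static void span_init(struct span *s, struct span *parent)
{
	s->parent = parent;
	if (parent)
		memcpy(s->trace_id, parent->trace_id, TRACE_ID_LEN);
	else
		new_trace_id(s->trace_id);
}
```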
This was a bit harder to identify: during an `io_loop` run we suspend
the current span before handing over to `io_loop`, and later when a callback
is called we resume the span again. Depending on how we return from
the `io_loop` instance that is used to drive the startup, we either
have resumed the last span or we haven't. Since we start a span before
`io_loop` and want it to be emitted afterwards, we need to take care
of the case where we returned from a callback that did not resume, and
therefore the current context is empty.
Making `trace_span_resume` idempotent means we can just resume it
manually.
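A sketch of what the idempotence buys us (shape assumed, not the
actual tracing code):
```
struct span;
static struct span *current;

/* Resuming a span that is already current is a no-op, so the code
 * around io_loop can call this unconditionally, whether or not a
 * callback already resumed it. */
static void trace_span_resume(struct span *s)
{
	if (current == s)
		return;
	current = s;
	/* ...record resume timestamp, emit debug statement, etc... */
}
```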
Ideally we'd push the suspend / resume logic down into `io_loop`
itself, and then we'd have just one place. Maybe suspend and resume
callbacks that can be configured in `io_loop`?
After adding the DB query instrumentation we ran into a couple of
issues with spans not being resumed correctly, and it was rather hard
to identify the problem. This adds debug statements so we can trace
the tracing (traception if you will).
Changelog-None
lightningd/test/run-find_my_abspath.c includes ../lightningd.c, which includes
header_versions_gen.h, a generated header file.
lightningd/Makefile correctly declares that lightningd/lightningd.o depends on
header_versions_gen.h, but lightningd/test/Makefile lacks any such declaration
regarding lightningd/test/run-find_my_abspath.c, which leads to build failure:
```
In file included from lightningd/test/run-find_my_abspath.c:5:
lightningd/test/../lightningd.c:64:10: fatal error: header_versions_gen.h: No such file or directory
   64 | #include <header_versions_gen.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~
```
Declare the missing dependency in lightningd/test/Makefile so that Make will
ensure that header_versions_gen.h is generated before it attempts to build
lightningd/test/run-find_my_abspath.o.
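For instance, a rule along these lines (a sketch; the exact target
spelling depends on how lightningd/test/Makefile structures its rules):
```
# Ensure the generated header exists before the unit test is compiled.
lightningd/test/run-find_my_abspath.o: header_versions_gen.h
```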
Changelog-None
Due to the Darwin-arm64 conditional setting of `CPPFLAGS`, the
subsequent `CPPFLAGS +=` is resolved earlier on ARM macOS, which
results in empty paths being used.
Changelog-None
If the first one doesn't use the entire timeout, the second might need longer
(I normally run with TIMEOUT=10):
```
FAILED tests/test_gossip.py::test_gossip_pruning - TimeoutError: Unable to find "[re.compile('Pruning channel 103x1x0 from network view')]" in logs.
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
pay does this, xpay does not, which means that if a block comes in (or
you're behind), you get gratuitous failures:
```
    def test_xpay_simple(node_factory):
        l1, l2, l3, l4 = node_factory.get_nodes(4, opts={'may_reconnect': True})
        node_factory.join_nodes([l1, l2, l3], wait_for_announce=True)
        node_factory.join_nodes([l3, l4], announce_channels=False)

        # BOLT 11, direct peer
        b11 = l2.rpc.invoice('10000msat', 'test_xpay_simple', 'test_xpay_simple bolt11')['bolt11']
>       ret = l1.rpc.xpay(b11)

tests/test_xpay.py:148:
...
        if not isinstance(resp, dict):
            raise TypeError("Malformed response, response is not a dictionary %s." % resp)
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E       pyln.client.lightning.RpcError: RPC call failed: method: xpay, payload: ('lnbcrt100n1pn5qu7csp53rp0mfwtfsyyy8gzsggepnxgslyalwvz3jkg9ptmqq452ln2nmgqpp58ak9nmfz9l93r0fpm266ewyjrhurhatrs05nda0r03p82cykp0vsdp9w3jhxazl0pcxz72lwd5k6urvv5sxymmvwscnzxqyjw5qcqp99qxpqysgqa798258yppu2tlfj8herr3zuz0zgux79zvtx6z57cmfzs2wdesmr4nvnkcmyssyu6k64ud54eg0v45c3mcw342jj6uy7tu202p6klrcp6ljc9w',), error: {'code': 203, 'message': "Destination said it doesn't know invoice: incorrect_or_unknown_payment_details"}
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: xpay is new this release.
Releases 24.05 and before require a "description" field. We should not
have removed it here until those releases were EOL!
Changelog-Fixed: pyln-client: plugins now compatible with CLN <= 24.05 (broken in 24.08)
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't want to refresh the gossmap internally: this could invalidate the
gossmap held by the current callers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This does not validate a node announcement or address, but it
does select a node at random from the gossmap and ask lightningd
to attempt a connection to it.
Gossipd uses this to ask lightningd -> connectd to initiate
a connection to a new gossip peer. This can be used when
there are insufficient peers already connected to gossip with.
Changelog-Changed: Gossipd can now request connections to additional nodes for improved gossip sync
We can now get more gossip_filter messages, and we can also go over
max-messages, so increase that too.
```
        del tally['query_short_channel_ids']
        del tally['query_channel_range']
        del tally['ping']
>       assert tally == {'channel_announce': 1,
                         'channel_update': 3,
                         'node_announce': 1,
                         'gossip_filter': 1}
E       AssertionError: assert {'channel_ann..._announce': 1} == {'channel_ann..._announce': 1}
E         Omitting 2 identical items, use -vv to show
E         Differing items:
E         {'gossip_filter': 2} != {'gossip_filter': 1}
E         {'channel_update': 2} != {'channel_update': 3}
E         Full diff:
E           {
E            'channel_announce': 1,...
E
E         ...Full output truncated (10 lines hidden), use '-vv' to show

tests/test_gossip.py:2326: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually mine *300* blocks, not 200, and if the timing is right, l1
can have mined the txid before mine_txid_or_rbf() checks the mempool:
```
    def test_onchaind_replay(node_factory, bitcoind):
        disconnects = ['+WIRE_REVOKE_AND_ACK', 'permfail']
        # Feerates identical so we don't get gratuitous commit to update them
        l1, l2 = node_factory.line_graph(2, opts=[{'watchtime-blocks': 201, 'cltv-delta': 101,
                                                   'disconnect': disconnects,
                                                   'feerates': (7500, 7500, 7500, 7500)},
                                                  {'watchtime-blocks': 201, 'cltv-delta': 101}],
                                         wait_for_announce=True)

        inv = l2.rpc.invoice(10**8, 'onchaind_replay', 'desc')
        rhash = inv['payment_hash']
        routestep = {
            'amount_msat': 10**8 - 1,
            'id': l2.info['id'],
            'delay': 101,
            'channel': first_scid(l1, l2)
        }

        l1.rpc.sendpay([routestep], rhash, payment_secret=inv['payment_secret'])
        l1.daemon.wait_for_log('sendrawtx exit 0')
        bitcoind.generate_block(1, wait_for_mempool=1)

        # Wait for nodes to notice the failure, this seach needle is after the
        # DB commit so we're sure the tx entries in onchaindtxs have been added
        l1.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent")
        l2.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent")

        # We should at least have the init tx now
        assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0
        assert len(l2.db_query("SELECT * FROM channeltxs;")) > 0

        # Generate some blocks so we restart the onchaind from DB (we rescan
        # last_height - 100)
        bitcoind.generate_block(100)
        sync_blockheight(bitcoind, [l1, l2])

        # l1 should still have a running onchaind
        assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0

        l2.rpc.stop()
        l1.restart()

        # Can't wait for it, it's after the "Server started" wait in restart()
        assert l1.daemon.is_in_log(r'Restarting onchaind \(ONCHAIN\): closed in block 109')

        # l1 should still notice that the funding was spent and that we should react to it
        _, txid, blocks = l1.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET',
                                                  'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US')
        assert blocks == 200
        bitcoind.generate_block(200)

        # Could be RBF!
>       l1.mine_txid_or_rbf(txid)

tests/test_closing.py:1864:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
contrib/pyln-testing/pyln/testing/utils.py:1375: in mine_txid_or_rbf
    wait_for(lambda: rbf_or_txid_broadcast(txids))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

success = <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550>
timeout = 180

    def wait_for(success, timeout=TIMEOUT):
        start_time = time.time()
        interval = 0.25
        while not success():
            time_left = start_time + timeout - time.time()
            if time_left <= 0:
>               raise ValueError("Timeout while waiting for {}".format(success))
E               ValueError: Timeout while waiting for <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550>
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The test program has a leak, so address sanitizer complains and makes it
"fail" the zlib detection test!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can fix the median calc by removing the (unused) reverse edges.
Also analyze the failure case in test_real_data: it's a real edge case, so
hardcode that one as "ok".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The ratio of the medians of the fee and probability costs is overall
not a bad factor for combining these two features. This is what
test_real_data shows.
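A sketch of the kind of combination meant here (names and exact form
are assumed, not the actual askrene code):
```
/* Scale the probability cost by the ratio of the medians so the two
 * terms are comparable, then sum them per arc. */
static double combined_cost(double fee_cost, double prob_cost,
			    double median_fee, double median_prob)
{
	double k = median_fee / median_prob;
	return fee_cost + k * prob_cost;
}
```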
Changelog-None
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
The fee_fallback test would fail after fixing the computation of the
median. We can now restore it by making the probability cost factor
1000x higher than the ratio of the medians. This shows how hard it is
to combine fee and probability costs, and why the current approach is
so fragile.
Changelog-None
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Rusty: "We don't generally use NDEBUG in our code"
Instead use a compile time flag ASKRENE_UNITTEST to make checks on unit
tests that we don't normally need on release code.
Changelog-none
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
- use graph_max_num_arcs/nodes instead of tal_count in bound checks,
- don't use ccan/lqueue; use a minimalistic array-based queue
  implementation instead (see the sketch below),
- add missing const qualifiers to temporary tal allocators,
- check preconditions with assert,
- remove the inline specifier from static functions.
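A minimal sketch of such an array-backed queue, assuming a known bound
on the number of pushes (e.g. a BFS that enqueues each node at most
once); not the actual implementation:
```
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <ccan/tal/tal.h>

struct queue {
	unsigned *items;
	size_t cap, head, tail; /* pop at head, push at tail */
};

static void queue_init(struct queue *q, const tal_t *ctx, size_t cap)
{
	q->items = tal_arr(ctx, unsigned, cap);
	q->cap = cap;
	q->head = q->tail = 0;
}

static bool queue_empty(const struct queue *q)
{
	return q->head == q->tail;
}

static void queue_push(struct queue *q, unsigned v)
{
	assert(q->tail < q->cap);
	q->items[q->tail++] = v;
}

static unsigned queue_pop(struct queue *q)
{
	assert(!queue_empty(q));
	return q->items[q->head++];
}
```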
Changelog-None
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
The calculation of the median values of probability and fee cost in
the linear approximation had a bug: it counted non-existing arcs.
Changelog-None: askrene: fix the median
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
We use an arc "array" in the graph structure, but not all arc indexes
correspond to real topological arcs. We must be careful when iterating
through all arcs, and check that they are enabled before operating on
them.
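An illustrative sketch (declarations approximated; arc_is_enabled is
an assumed name):
```
#include <stdbool.h>

struct graph;
unsigned graph_max_num_arcs(const struct graph *g);
bool arc_is_enabled(const struct graph *g, unsigned arc); /* assumed */

/* Visit only arcs that actually exist: the arc index range is sparse,
 * so a bare index does not imply a real arc. */
static void for_each_real_arc(const struct graph *graph,
			      void (*op)(const struct graph *, unsigned))
{
	for (unsigned i = 0; i < graph_max_num_arcs(graph); i++) {
		if (!arc_is_enabled(graph, i))
			continue;
		op(graph, i);
	}
}
```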
Changelog-None: askrene: fix bug, not all arcs exist
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Add a new function to compute an MCF using a more general description
of the problem. I call it mcf_refinement because it can start from a
feasible flow (though this is not necessary) and adapt it to achieve
optimality.
Changelog-None: askrene: add an MCF refinement
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>