Commit graph

15705 commits

Author SHA1 Message Date
Matt Whitlock
7a2006842f lightningd/test/Makefile: add missing dependency on header_versions_gen.h
lightningd/test/run-find_my_abspath.c includes ../lightningd.c, which includes
header_versions_gen.h, a generated header file.

lightningd/Makefile correctly declares that lightningd/lightningd.o depends on
header_versions_gen.h, but lightningd/test/Makefile lacks any such declaration
regarding lightningd/test/run-find_my_abspath.c, which leads to build failure:

In file included from lightningd/test/run-find_my_abspath.c:5:
lightningd/test/../lightningd.c:64:10: fatal error: header_versions_gen.h: No such file or directory
   64 | #include <header_versions_gen.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~

Declare the missing dependency in lightningd/test/Makefile so that Make will
ensure that header_versions_gen.h is generated before it attempts to build
lightningd/test/run-find_my_abspath.o.

Changelog-None
2024-11-23 13:03:00 +01:00
daywalker90
54e7ac6872 startup_regtest: remove experimental-offers flag
Changelog-None
2024-11-23 10:48:32 +10:30
Michael Cho
94c5695d6f Makefile: fix defines on ARM macOS
Due to the Darwin-arm64 conditional setting of `CPPFLAGS`, the subsequent
`CPPFLAGS +=` is resolved earlier on ARM macOS, which results in empty
paths being used.

Changelog-None
2024-11-23 10:47:32 +10:30
Rusty Russell
dba9746d21 pytest: fix flake in test_gossip_pruning.
If the first one doesn't use the entire timeout, the second might need longer
(I normally run with TIMEOUT=10):

```
FAILED tests/test_gossip.py::test_gossip_pruning - TimeoutError: Unable to find "[re.compile('Pruning channel 103x1x0 from network view')]" in logs.
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-23 10:20:30 +10:30
Rusty Russell
90ab9325a1 xpay: give an additional block "slack" for CLTV values.
pay does this, xpay does not.  This means that if a block comes in (or you're behind),
you get gratuitous failures:

```
    def test_xpay_simple(node_factory):
        l1, l2, l3, l4 = node_factory.get_nodes(4, opts={'may_reconnect': True})
        node_factory.join_nodes([l1, l2, l3], wait_for_announce=True)
        node_factory.join_nodes([l3, l4], announce_channels=False)

        # BOLT 11, direct peer
        b11 = l2.rpc.invoice('10000msat', 'test_xpay_simple', 'test_xpay_simple bolt11')['bolt11']
>       ret = l1.rpc.xpay(b11)

tests/test_xpay.py:148:
...
        if not isinstance(resp, dict):
            raise TypeError("Malformed response, response is not a dictionary %s." % resp)
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: xpay, payload: ('lnbcrt100n1pn5qu7csp53rp0mfwtfsyyy8gzsggepnxgslyalwvz3jkg9ptmqq452ln2nmgqpp58ak9nmfz9l93r0fpm266ewyjrhurhatrs05nda0r03p82cykp0vsdp9w3jhxazl0pcxz72lwd5k6urvv5sxymmvwscnzxqyjw5qcqp99qxpqysgqa798258yppu2tlfj8herr3zuz0zgux79zvtx6z57cmfzs2wdesmr4nvnkcmyssyu6k64ud54eg0v45c3mcw342jj6uy7tu202p6klrcp6ljc9w',), error: {'code': 203, 'message': "Destination said it doesn't know invoice: incorrect_or_unknown_payment_details"}
```
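
As a hedged illustration of the missing "slack" (hypothetical helper, not the
actual xpay code):

```
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: absolute CLTV for the final
 * hop, with one block of slack on top of the invoice's minimum. */
static uint32_t final_cltv(uint32_t blockheight, uint32_t min_final_cltv_expiry)
{
	const uint32_t slack = 1;
	return blockheight + min_final_cltv_expiry + slack;
}

int main(void)
{
	/* Computed at height 100 with a minimum of 18: 100 + 18 + 1 = 119.
	 * If a block arrives mid-payment (height 101), the destination still
	 * sees at least 101 + 18 = 119 blocks, so it won't fail the HTLC. */
	printf("%u\n", (unsigned)final_cltv(100, 18));
	return 0;
}
```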

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: xpay is new this release.
2024-11-23 10:20:30 +10:30
Rusty Russell
4186591a70 pyln-client: restore backwards compatibility with CLN prior to 24.08
24.05 and before require a "description" field.  We should not have removed it here
until those releases were EOL!

Changelog-Fixed: pyln-client: plugins now compatible with CLN <= 24.05 (broken in 24.08)
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 09:12:27 +01:00
Rusty Russell
d5c0d21db8 gossipd: hand gossmap to gossmap_manage_get_node_addresses, not gossmap_manage.
We don't want to refresh the gossmap internally: this could invalidate the
gossmap held by the current callers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Rusty Russell
69c252e06f gossmap: implement gossmap_random_node(), use it in gossipd.
It's easy for gossmap, since it has access to the htable.
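
A hedged sketch of the general technique (illustrative structures, not the
actual gossmap/htable code): count the occupied slots, draw a random ordinal,
and walk to that entry:

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Illustration only: pick a uniformly random entry from a hash-table-like
 * structure whose slots may be empty. */
#define NSLOTS 16

struct slot {
	int occupied;
	int value;
};

static const struct slot *random_entry(const struct slot *slots, size_t nslots)
{
	size_t count = 0;
	for (size_t i = 0; i < nslots; i++)
		count += slots[i].occupied;
	if (count == 0)
		return NULL;

	size_t target = (size_t)rand() % count;
	for (size_t i = 0; i < nslots; i++) {
		if (!slots[i].occupied)
			continue;
		if (target-- == 0)
			return &slots[i];
	}
	return NULL;	/* unreachable */
}

int main(void)
{
	struct slot slots[NSLOTS] = {{0, 0}};
	slots[3] = (struct slot){1, 30};
	slots[7] = (struct slot){1, 70};
	slots[12] = (struct slot){1, 120};

	srand((unsigned)time(NULL));
	printf("picked %d\n", random_entry(slots, NSLOTS)->value);
	return 0;
}
```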

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Alex Myers
363b721cd3 gossipd: use autoconnect-seeker-peers setting 2024-11-22 15:21:45 +10:30
Alex Myers
dc878dc937 lightningd: add option for minimum seeker autoconnect peers
Changelog-added: Added option --autoconnect-seeker-peers, allowing seeker to reach out to new nodes for additional gossip.
2024-11-22 15:21:45 +10:30
Alex Myers
f2243e6013 pytest: Add seeker autoconnect test 2024-11-22 15:21:45 +10:30
Alex Myers
dff5c893e7 gossipd: seeker: select random peer and tell lightningd
This does not validate a node announcement and address, but it
does select a node at random from the gossmap and asks lightningd
to attempt a connection to it.
2024-11-22 15:21:45 +10:30
Alex Myers
7fc214a67f gossipd: add request to connect to new gossip peer
Gossipd uses this to ask lightningd -> connectd to initiate
a connection to a new gossip peer.  This can be used when
there are insufficient peers already connected to gossip with.

Changelog-Changed: Gossipd can now request connections to additional nodes for improved gossip sync
2024-11-22 15:21:45 +10:30
Alex Myers
9cba417ed0 gossipd: seeker: allocate gossiper array at init
This will let us change the default gossipers at runtime
2024-11-22 15:21:45 +10:30
Rusty Russell
9295b4f77e common/test: fix -O3 compile error with gcc-12 (Ubuntu 12.3.0-17ubuntu1) 12.3.0
```
common/test/run-splice_script.c: In function ‘main’:
common/test/run-splice_script.c:349:17: error: ‘%.*s’ directive argument is null [-Werror=format-overflow=]
  349 |         printf("%.*s\n", (int)len, str);
      |                 ^~~~
cc1: all warnings being treated as errors
make: *** [Makefile:297: common/test/run-splice_script.o] Error 1
make: *** Waiting for unfinished jobs....
```
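
gcc's -Wformat-overflow fires here because it can prove `str` may be NULL on
this path.  A hedged sketch of the kind of guard that avoids the warning (not
necessarily the exact fix applied):

```
#include <stdio.h>

static void print_span(const char *str, size_t len)
{
	/* Passing a NULL pointer to "%.*s" is undefined behaviour, and
	 * gcc at -O3 warns when it can prove the argument may be NULL. */
	if (str == NULL)
		str = "";
	printf("%.*s\n", (int)len, str);
}

int main(void)
{
	print_span("hello world", 5);
	print_span(NULL, 0);
	return 0;
}
```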

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Rusty Russell
8566370087 pytest: fix flake in test_gossip_force_broadcast_channel_msgs
We can get more gossip_filter messages now.  And we can also go over max-messages,
so increase that too.

```
        del tally['query_short_channel_ids']
        del tally['query_channel_range']
        del tally['ping']
>       assert tally == {'channel_announce': 1,
                         'channel_update': 3,
                         'node_announce': 1,
                         'gossip_filter': 1}
E       AssertionError: assert {'channel_ann..._announce': 1} == {'channel_ann..._announce': 1}
E         Omitting 2 identical items, use -vv to show
E         Differing items:
E         {'gossip_filter': 2} != {'gossip_filter': 1}
E         {'channel_update': 2} != {'channel_update': 3}
E         Full diff:
E           {
E            'channel_announce': 1,...
E         
E         ...Full output truncated (10 lines hidden), use '-vv' to show

tests/test_gossip.py:2326: AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 14:01:44 +10:30
Rusty Russell
a295099ace pytest: fix flake in test_onchaind_replay.
We actually mine *300* blocks, not 200, and if timing is right l1
can have mined the txid before mine_txid_or_rbf() checks the mempool:

```
    def test_onchaind_replay(node_factory, bitcoind):
        disconnects = ['+WIRE_REVOKE_AND_ACK', 'permfail']
        # Feerates identical so we don't get gratuitous commit to update them
        l1, l2 = node_factory.line_graph(2, opts=[{'watchtime-blocks': 201, 'cltv-delta': 101,
                                                   'disconnect': disconnects,
                                                   'feerates': (7500, 7500, 7500, 7500)},
                                                  {'watchtime-blocks': 201, 'cltv-delta': 101}],
                                         wait_for_announce=True)
    
        inv = l2.rpc.invoice(10**8, 'onchaind_replay', 'desc')
        rhash = inv['payment_hash']
        routestep = {
            'amount_msat': 10**8 - 1,
            'id': l2.info['id'],
            'delay': 101,
            'channel': first_scid(l1, l2)
        }
        l1.rpc.sendpay([routestep], rhash, payment_secret=inv['payment_secret'])
        l1.daemon.wait_for_log('sendrawtx exit 0')
        bitcoind.generate_block(1, wait_for_mempool=1)
    
        # Wait for nodes to notice the failure, this seach needle is after the
        # DB commit so we're sure the tx entries in onchaindtxs have been added
        l1.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent")
        l2.daemon.wait_for_log("Deleting channel .* due to the funding outpoint being spent")
    
        # We should at least have the init tx now
        assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0
        assert len(l2.db_query("SELECT * FROM channeltxs;")) > 0
    
        # Generate some blocks so we restart the onchaind from DB (we rescan
        # last_height - 100)
        bitcoind.generate_block(100)
        sync_blockheight(bitcoind, [l1, l2])
    
        # l1 should still have a running onchaind
        assert len(l1.db_query("SELECT * FROM channeltxs;")) > 0
    
        l2.rpc.stop()
        l1.restart()
    
        # Can't wait for it, it's after the "Server started" wait in restart()
        assert l1.daemon.is_in_log(r'Restarting onchaind \(ONCHAIN\): closed in block 109')
    
        # l1 should still notice that the funding was spent and that we should react to it
        _, txid, blocks = l1.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET',
                                                  'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US')
        assert blocks == 200
        bitcoind.generate_block(200)
        # Could be RBF!
>       l1.mine_txid_or_rbf(txid)

tests/test_closing.py:1864: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1375: in mine_txid_or_rbf
    wait_for(lambda: rbf_or_txid_broadcast(txids))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

success = <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550>
timeout = 180

    def wait_for(success, timeout=TIMEOUT):
        start_time = time.time()
        interval = 0.25
        while not success():
            time_left = start_time + timeout - time.time()
            if time_left <= 0:
>               raise ValueError("Timeout while waiting for {}".format(success))
E               ValueError: Timeout while waiting for <function LightningNode.mine_txid_or_rbf.<locals>.<lambda> at 0x7f9b129c4550>
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 14:01:44 +10:30
Rusty Russell
8132d19ab5 configure: make configuration with address sanitizer find zlib.
The test program has a leak, so address sanitizer complains and makes it
"fail" the zlib detection test!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 14:01:44 +10:30
Jesse de Wit
a90d9c9f4f tests: add pay test over unannounced channels
This test fails with cln v24.08.2. Add this test, so it doesn't happen
again.

Changelog-None
2024-11-21 11:22:26 +01:00
Rusty Russell
2c9023ee25 pytest: reenable askrene bias test.
We can fix the median calc by removing the (unused) reverse edges.

Also analyze the failure case in test_real_data: it's a real edge case, so
hardcode that one as "ok".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-21 16:17:52 +10:30
Lagrang3
05514b46e3 Askrene: change median factor to 1.
The ratio of the medians of the fee and probability costs is, overall, not
a bad factor for combining these two features.  This is what
test_real_data shows.
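
One hedged reading of how such a factor can combine the two costs (illustrative
formula, not the actual askrene code):

```
#include <stdio.h>

/* Illustration only: combine fee and probability costs into one linear cost.
 * With factor = 1, the probability term is scaled purely by the ratio of the
 * two medians. */
static double combined_cost(double fee_cost, double prob_cost,
			    double median_fee, double median_prob,
			    double factor)
{
	return fee_cost + factor * (median_fee / median_prob) * prob_cost;
}

int main(void)
{
	/* Example numbers, purely illustrative. */
	printf("%f\n", combined_cost(10.0, 0.5, 8.0, 0.4, 1.0));
	return 0;
}
```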

Changelog-None

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
2b3fd67dfb askrene: don't skip fee_fallback test
The fee_fallback test would fail after fixing the computation of the
median.  We can now restore it by making the probability cost factor
1000x higher than the ratio of the medians.  This shows how hard it is
to combine fee and probability costs, and why the current approach is so
fragile.

Changelog-None

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
9fdcc26d1d askrene: bugfix queue overflow
Changelog-none

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
9966969d4c askrene: remove allocation checks
Rusty: "allocations don't fail"

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
460a28bb32 askrene: add compiler flag ASKRENE_UNITTEST
Rusty: "We don't generally use NDEBUG in our code"

Instead, use a compile-time flag ASKRENE_UNITTEST to enable checks in unit
tests that we don't normally need in release code.

Changelog-none

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
b1cd26373b askrene: small fixes suggested by Rusty Russell
- use graph_max_num_arcs/nodes instead of tal_count in bound checks,
- don't use ccan/lqueue; use a minimalistic queue implementation backed
  by an array instead (see the sketch after this list),
- add missing const qualifiers to temporary tal allocators,
- check preconditions with assert,
- remove inline specifier for static functions,
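
A minimal sketch of such an array-backed queue (illustrative, not the actual
askrene implementation):

```
#include <assert.h>
#include <stdio.h>

/* Illustration only: a minimalistic FIFO queue backed by a fixed array,
 * used as a ring buffer. */
#define QUEUE_CAP 64

struct queue {
	unsigned items[QUEUE_CAP];
	size_t head, len;
};

static void queue_push(struct queue *q, unsigned item)
{
	assert(q->len < QUEUE_CAP);
	q->items[(q->head + q->len) % QUEUE_CAP] = item;
	q->len++;
}

static unsigned queue_pop(struct queue *q)
{
	assert(q->len > 0);
	unsigned item = q->items[q->head];
	q->head = (q->head + 1) % QUEUE_CAP;
	q->len--;
	return item;
}

int main(void)
{
	struct queue q = {{0}, 0, 0};
	queue_push(&q, 1);
	queue_push(&q, 2);
	unsigned a = queue_pop(&q);
	unsigned b = queue_pop(&q);
	printf("%u %u\n", a, b);	/* 1 2 */
	return 0;
}
```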

Changelog-None

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
44c9609f3a askrene: add arbitrary precision flow unit
Changelog-none: askrene: add arbitrary precision flow unit

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
225939a5e3 add ratio ceil and floor operators on amount_msat
Changelog-none: add ratio ceil and floor operators on amount_msat
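
A hedged sketch of what such floor/ceil ratio operations can look like
(hypothetical helper names, not necessarily the actual amount_msat API; real
code must also guard against overflow of the multiplication):

```
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only: scale an msat value by num/den, rounding down or up. */
static uint64_t msat_ratio_floor(uint64_t msat, uint64_t num, uint64_t den)
{
	assert(den != 0);
	return msat * num / den;
}

static uint64_t msat_ratio_ceil(uint64_t msat, uint64_t num, uint64_t den)
{
	assert(den != 0);
	uint64_t prod = msat * num;
	return prod / den + (prod % den != 0);
}

int main(void)
{
	printf("%llu %llu\n",
	       (unsigned long long)msat_ratio_floor(10, 1, 3),
	       (unsigned long long)msat_ratio_ceil(10, 1, 3));	/* 3 4 */
	return 0;
}
```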

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
4dc1a44cd9 askrene: fix the median
The calculation of the median values of the probability and fee costs in the
linear approximation had a bug: it counted non-existing arcs.

Changelog-none: askrene: fix the median

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
ee623616d2 askrene: fix CI
Check the return value of scanf in the askrene unit tests.

Changelog-none: askrene: fix CI

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
937cf7a554 askrene: use the new MCF solver
Changelog-none: askrene: use the new MCF solver

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
84a9476311 askrene: add mcf_refinement to the public API
Changelog-none: askrene: add mcf_refinement to the public API

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
2142094e43 askrene: fix bug, not all arcs exists
We use an arc "array" in the graph structure, but not all arc indexes
correspond to real topological arcs. We must be careful when iterating
through all arcs, and check that they are enabled before operating
on them.
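
A hedged sketch of the iteration pattern (hypothetical struct and helper names,
not the actual askrene code):

```
#include <stdbool.h>
#include <stdio.h>

/* Illustration only: the arc container is sized for the maximum number of
 * arcs, so not every index holds a real topological arc. */
struct graph {
	size_t max_num_arcs;
	const bool *arc_enabled;	/* one flag per arc slot */
};

static void visit_arcs(const struct graph *g)
{
	for (size_t i = 0; i < g->max_num_arcs; i++) {
		if (!g->arc_enabled[i])
			continue;	/* this slot holds no real arc */
		printf("arc %zu is real\n", i);
	}
}

int main(void)
{
	bool enabled[8] = {false};
	enabled[2] = enabled[5] = true;
	struct graph g = {8, enabled};
	visit_arcs(&g);
	return 0;
}
```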

Changelog-None: askrene: fix bug, not all arcs exists

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
e655fe7bbd askrene: add bigger test for MCF
Using zlib to read big test case file.

Changelog-None: askrene: add bigger test for MCF

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
2ea8e49683 askrene: add a MCF refinement
Add a new function to compute an MCF using a more general description of
the problem. I call it mcf_refinement because it can start with a
feasible flow (though this is not necessary) and adapt it to achieve
optimality.

Changelog-None: askrene: add a MCF refinement

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
1dfa562cd9 askrene algorithm add helper for flow conservation
Changelog-None: askrene algorithm add helper for flow conservation

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
42d075cc97 askrene: add a simple MCF solver
Changelog-EXPERIMENTAL: askrene: add a simple MCF solver

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
8558299dec askrene: add algorithm to compute feasible flow
Changelog-EXPERIMENTAL: askrene: add algorithm to compute feasible flow

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
f4f2985bdf askrene: add dijkstra algorithm
Changelog-EXPERIMENTAL: askrene: add dijkstra algorithm

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
507153a1cd askrene: add graph algorithms module
Changelog-EXPERIMENTAL: askrene: add graph algorithms module

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
59ce410699 askrene: add priorityqueue
It is just a copy-paste of "dijkstra", but the name reflects what it
actually is: not an implementation of the minimum-cost-path Dijkstra
algorithm, but a helper data structure.
I keep the old "dijkstra.h/c" files for the moment to avoid breaking the
current code.

Changelog-EXPERIMENTAL: askrene: add priorityqueue

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Lagrang3
32548cf02b askrene: add a new graph abstraction
Changelog-EXPERIMENTAL: askrene new graph abstraction

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2024-11-21 16:17:52 +10:30
Rusty Russell
799acc90e6 lightningd: tell gossipd channel is closed if it tells us about our channel and is wrong.
While we have (I hope!) fixed the underlying sync problem, this can still happen with
older gossip.  So now we tell it the channel is dead, so it won't happen more than once.

Fixes: #7703
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-21 15:42:06 +10:30
Alex Myers
ead5dbf6a2 pytest: allow additional gossip filters
in test_gossip_force_broadcast_channel_msgs now that the seeker
is asking for periodic full gossip syncs.
2024-11-21 14:23:57 +10:30
Alex Myers
84b0dace31 gossipd: seeker: rotate worst gossiper every 30 minutes
This will allow all gossipers to be rotated in ~5 hours (10 gossipers ×
30 minutes), similar to how it operated with half as many gossip-streaming peers.
2024-11-21 14:23:57 +10:30
Alex Myers
323f23cf8f gossipd: seeker: rotate out the worst performing gossiper
Previously they were chosen at random.  We should instead
prefer to stop streaming from the peer who has provided
the least novel gossip.
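
A hedged sketch of the selection (hypothetical struct and field names): rotate
out the streaming peer with the lowest count of novel gossip:

```
#include <stdio.h>

/* Illustration only: each gossip-streaming peer keeps a count of how much
 * novel gossip it has delivered; the worst performer has the lowest count. */
struct gossiper {
	const char *name;
	unsigned novel_gossip_count;
};

static size_t worst_gossiper(const struct gossiper *peers, size_t n)
{
	size_t worst = 0;
	for (size_t i = 1; i < n; i++)
		if (peers[i].novel_gossip_count < peers[worst].novel_gossip_count)
			worst = i;
	return worst;
}

int main(void)
{
	struct gossiper peers[] = {{"a", 12}, {"b", 3}, {"c", 40}};
	printf("rotate out: %s\n", peers[worst_gossiper(peers, 3)].name);
	return 0;
}
```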
2024-11-21 14:23:57 +10:30
Alex Myers
04180c1cad gossipd: add separate counter for unsolicited gossip
This will be used to drop underperforming gossipers instead
of choosing at random.
2024-11-21 14:23:57 +10:30
Alex Myers
995648911e gossipd: seeker: choose a new node when resyncing 2024-11-21 14:23:57 +10:30
Alex Myers
be694cba3a gossipd: seeker: add hourly full gossip resync from a random peer
This can help us backfill any missing gossip if our current
peers haven't been the most reliable.

Changelog-Changed: Gossipd requests a full sync from a random peer every hour.
2024-11-21 14:23:57 +10:30
Alex Myers
80bde554a4 gossipd: Increase gossiping peers to 10
Based on gossip sync data from random network peers, listening to only 5
peers will not reliably catch all gossip traffic.  For now, add extra
redundancy.
2024-11-21 14:23:57 +10:30