As per BOLT02 #message-retransmission :
if `next_commitment_number` is 1 in both the `channel_reestablish` it sent and received:
- MUST retransmit `funding_locked`
It seems we spend a lot of time waiting for `bitcoind` and `lightningd` to
talk to disks. This adds the `TEST_DIR` environment variable, allowing for
example to use `/dev/shm`, or a faster disk than the disk `/tmp` is on, as the
root directory for all test-related files.
Testing this on one of our builder machines cut the time to run the entire
suite under valgrind roughly in half (180-200 seconds vs 440-490 seconds).
My machine would accumulate a number of zombie lightningd and bitcoind
processes over time while testing. Investigating this showed that if a fixture
raised an exception during fixture teardown then other fixtures that have not
been torn down would linger around. The issue is that pytest treats exceptions
in fixtures as non-recoverable and therefore will not catch them and call the
remaining ones.
This commit adds a new fixture, that is there just to collect eventual errors
from other fixtures and ensure that anything that needs to clean up something,
e.g., processes started by the fixture, are cleaned up before we raise an
eventual exception. This is achieved by making any fixture that needs cleaning
up dependent on the teardown_checks fixture, which also serves as central
point to collect errors and printer of eventual errors.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This has a slight side-effect of removing the actual begin and commit
statements from the `db_write` hooks, but they are mostly redundant anyway (no
harm in grouping pre-init statements into one transaction, and we know that
each post-init call is supposed to be wrapped anyway).
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These are used to do one-time initializations and wait for pending statements
before closing.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
log files were being deleted on memleak errors, since
we weren't marking the node has having an error.
this helper function is designed to exactly handle this, so
we use the helper function and modify it to print any additional
error messages that are handed back from killall.
Throwing an exception while killing all nodes meant that
we aren't cleaning up all the nodes properly. Instead,
collect the errors, and return them back to the upper level,
where we report them and terminate as expected.
Memleaks appear in the logs as 'broken', so the broken log
check captures them as well. This moves broken to after memleak
so we get more informative error messages.
We were checking the test request against the searched for string. This fixes
it by actually looking at the outcome instead and should clean up correctly
if tests do not fail.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is an issue that was raised in #2665: some of the dependencies where
causing warnings to be added to the logs about deprecated dependencies. Since
I did not get these warnings I just blanket updated all the dependencies in
the hopes of getting the warnings to resolve.
Signed-off-by: Christian Decker <@cdecker>
We seem to be getting intermittant failures, but it's hard
to disgnose. Simplify it by moving all the test logic into
the test itself, and making the plugin dumber. This means we'll
see exactly what the differences are if it fails again.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
VALGRIND=1, SLOW_MACHINE=0:
Before: 197.74 seconds
After: 135.43 seconds
Note that we now spend about 13 seconds in teardown, could probably
be optimized.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
During sync it is highly likely that we can coalesce multiple calls and share
results among them. We also report back failures for non-existing blocks early
on, so we don't run into issues with blocks that our bitcoind doesn't have
yet.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was caused by us not checking against the max_blockheight, but rather the
min_blockheight which can be negative with a newly created node. This is still
safe since we check for duplicates anyway in `wallet_filteredblock_add`.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I don't remember ever seeing a bug which only showed up in VALGRIND=1 with developer
mode disabled, so don't test that, and spread out the other test more evenly.
In addition, disable the worst-performing tests in DEVELOPER=0 mode.
Here timings from my build machine: the worst 6 (- DEVELOPER=0 VALGRIND=0)
with the same tests (+ DEVELOPER=1 VALGRIND=1)
-452.42s call tests/test_pay.py::test_channel_spendable
+87.69s call tests/test_pay.py::test_channel_spendable
-335.66s call tests/test_gossip.py::test_gossip_store_compact_on_load
+47.41s call tests/test_gossip.py::test_gossip_store_compact_on_load
-332.07s call tests/test_connection.py::test_opening_tiny_channel
+89.71s call tests/test_connection.py::test_opening_tiny_channel
-331.97s call tests/test_pay.py::test_channel_spendable_large
+56.23s call tests/test_pay.py::test_channel_spendable_large
-305.28s call tests/test_invoices.py::test_invoice_routeboost
+37.57s call tests/test_invoices.py::test_invoice_routeboost
-284.28s call tests/test_plugin.py::test_htlc_accepted_hook_forward_restart
+49.12s call tests/test_plugin.py::test_htlc_accepted_hook_forward_restart
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is probably worth preventing.
1. Our depth estimate would be inaccurate possibly leading to us
timing out too early.
2. If we're not up-to-date our onchain funds are unknown.
3. We wouldn't be able to send or receive HTLCs until we're synced anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to still allow incoming connections, and reestablishment of
channels, but if one tries to give us an HTLC, stall until we're
synced.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we don't know block height, we shouldn't be sending HTLCs. This
stops us forwarding HTLCs as well as new payments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You're supposed to be able to hand mock_rpc either a function to call,
or a dict canned response. We never did the latter, and the logic
was broken.
It was testing the key, not the value for whether it was a dict. And
it could never have given a valid response anyway, since it wouldn't
know the id to use. So assume it's a successful result.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If l2 doesn't think we're onchain yet, it treats the new connection from l1
as a reconnection, triggering 'ValueError: 1 nodes had unexpected reconnections'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>