This seems to trigger now, especially on PostgresQL (maybe it's faster
to process blocks?).
e.g. test_closing_simple() hangs in close(), because the close is unilateral
because the HTLC timed out, so it's waiting for a block (other lines removed):
```
lightningd-1: 2022-01-12T00:33:46.258Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_COMMITMENT_SIGNED
lightningd-1: 2022-01-12T00:33:46.278Z DEBUG lightningd: close_command: timeout = 172800
2022-01-12T01:03:36.9757201Z lightningd-2: 2022-01-12T00:33:46.384Z DEBUG lightningd: Adding block 104: 73ffa19d27d048613b2731e1682b4efff0dc226807d8cc99d724523c2ea58204
2022-01-12T01:03:36.9759053Z lightningd-2: 2022-01-12T00:33:46.396Z DEBUG lightningd: Adding block 105: 44fd06ed053a0d0594abcfefcfa69089351fc89080826799fb4b278a68fe5c20
2022-01-12T01:03:36.9760865Z lightningd-2: 2022-01-12T00:33:46.406Z DEBUG lightningd: Adding block 106: 0fee2dcbd1376249832642079131275e195bba4fb49cc9968df3a899010bba0f
2022-01-12T01:03:36.9762632Z lightningd-2: 2022-01-12T00:33:46.418Z DEBUG lightningd: Adding block 107: 7f24f2d7d3e83fe3256298bd661e57cdf92b058440738fd4d7e1c8ef4a4ca073
2022-01-12T01:03:36.9773411Z lightningd-2: 2022-01-12T00:33:46.429Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_REVOKE_AND_ACK
2022-01-12T01:03:36.9794707Z lightningd-2: 2022-01-12T00:33:46.437Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: Commits outstanding after recv revoke_and_ack
2022-01-12T01:03:36.9788197Z lightningd-2: 2022-01-12T00:33:46.433Z DEBUG lightningd: Adding block 108: 283b371fb5d1ef42980ea10ab9f5965a179af8e91ddf31c8176e79820e1ec54d
2022-01-12T01:03:36.9799347Z lightningd-2: 2022-01-12T00:33:46.439Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: HTLC 0[REMOTE] => RCVD_REMOVE_REVOCATION
2022-01-12T01:03:36.9808057Z lightningd-2: 2022-01-12T00:33:46.447Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer permanent failure in CHANNELD_NORMAL: Fulfilled HTLC 0 RCVD_REMOVE_REVOCATION cltv 109 hit deadline
```
This is because `pay` returns from l1 when it has the preimage, not
when the HTLC is fully resolved. Add a helper for this, and call it
at the end of the pay test helper. We might need this elsewhere
though!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This loads up 20MB of plugins temporarily; we seem to be getting OOM
killed under CI and I wonder if this is contributing.
Doesn't significantly reduce runtime here, but I have lots of memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On my machine it should have produced:
/tmp/ltests-1p0v5z1j/.locks/lock-1631090663.1066182
but instead it produced:
/private/tmp/ltests-1p0v5z1j/.locks/lock-1631090663.105848
The mismatch resulted in a hang inside flock.
Path .resolve() seems to be causing problems for others as well. It appears the library was not meant to handle complex path situations and isn't maintained as such (see link reference).
Since we have already built a full path here anyway, the call to .resolve() appears redundant.
Tested on python 3.9.6 on my Mac OS X 11.4 (20F71), Xcode 12.5.1 (12E507)
Relevant discussion: https://discuss.python.org/t/pathlib-absolute-vs-resolve/2573
As suggested in this issue https://github.com/python/mypy/issues/7484#issuecomment-529363083 we skip following import becuase with the recent version of mypy the __init__.py file make confusione inside the analysis (in the python issue it is unclear the main motivation of this issue. At list unclear to me).
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
We would fail even if a process exited cleanly after having the line we
were looking for, because we checked if it died before reading the logs.
Co-Authored-by: Daniela Brozzoni <daniela@revault.dev>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
This test takes 695 seconds, because fundwallet waits for the wallet to
notice the tx, which takes 60 seconds if not DEVELOPER. Do all the waiting
at once, and this speeds the test up to 153 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds our first (basic) schema, and sews support into pyln-testing
so it will load schemas for any method for doc/schemas/{method}.schema.json.
All JSON responses in a test run are checked against the schema (if any).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tests that will only run when !EXPERIMENTAL_DUAL_FUND:
@pytest.marker.openchannel('v1')
def test_...()
Tests that will only run when EXPERIMENTAL_DUAL_FUND:
@pytest.marker.openchannel('v2')
def test_...()
Users are more upset recently with the cost of unilateral closes
than they are the risk of being cheated. While we complete our
anchor implementation so we can use low fees there, let's
get less aggressive (we already have 34 or 18 blocks to close
in the worst case).
The changes are:
- Commit transactions were "2 CONSERVATIVE" now "6 ECONOMICAL".
- HTLC resolution txs were "3 CONSERVATIVE" now "6 ECONOMICAL".
- Penalty txs were "3 CONSERVATIVE" now "12 ECONOMICAL".
- Normal txs were "4 ECONOMICAL" now "12 ECONOMICAL".
There can be no perfect levels, but we have had understandable
complaints recently about how high our default fee levels are.
Changelog-Changed: Protocol: channel feerates reduced to bitcoind's "6 block ECONOMICAL" rate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You still shouldn't do this (you could get some transient failures),
but at least you have a decent chance if you reinstall over a running
daemon, instead of getting confusing internal errors if message
formats have changed.
Changelog-Added: lightningd: we now try to restart if subdaemons are upgraded underneath us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #4346
e.g. in test_closing_id we can get a spend from the first (closed) channel
in the same block as the open of the second. Half the time, we'll choose
the wrong one as scid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we turned many errors into warnings, we want our tests to fail
when they happen unexpectedly. We make WARNING clear in the strings
we print, too, to help out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This 'fixes' the `wait_for` helper by removing a pointless final
`time.sleep()`, thus potentially making the method return quicker.
The old code could have had three final states:
- success() := True
- Timeout and success() := True
- Timeout and success() := False
The new code has just two final state:
- success() := True
- Timeout and success() := False
It ensures the final `time.sleep()` is just the right amount before timeout.
And more importantly making it more readable :-)
Changelog-None
We are printing `repr(obj)` which is not pretty-printed, hard to read,
and can't even be copied and inspected to JSON tools. We now print the
JSONified and indented calls and responses for easier debugging based
on solely the logs (useful for CI!).
Changelog-Added: pyln-testing: The RPC client will now pretty-print requests and responses to facilitate log-based debugging.
We were getting a couple of starvations, so we need a fair filelock. I
also wasn't too happy with the lock as is, so I hand-coded it quickly.
Should be correct, but the overall timeout will tell us how well we
are doing on CI.
We were getting a number of incompatibility warning due to the
dependencies being expressed too rigidly. This losens the requirement
definitions to being compatible with a known good version, and while
we're at it we also bump all outdated requirements.
we got rid of push_msats for dual funded channels. this assumes that hte
peer will match an equal amount of sats as ours (the df_accepter.py
plugin will do this)
3.0.12 here:
```
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was causing the following error
```
Exception in thread Thread-553:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/tmp/cirrus-ci-build/contrib/pyln-testing/pyln/testing/utils.py", line 232, in tail
self.err_logs.append(line.rstrip().decode('UTF-8', 'replace')).rstrip()
AttributeError: 'NoneType' object has no attribute 'rstrip'
[gw5] [ 33%] FAILED tests/test_misc.py::test_bitcoin_failure
```
Notice the second call to `.rstrip()` on the return value of `.append()`
Both my machine and apparently the CI tester machines regularly run
into issues with load on the system, causing timeouts (and
unresponsiveness). The throttler throttles the speed with which new
instances of c-lightning get started to avoid overloading. Since the
plugin used for parallelism when testing spawns multiple processes we
need to lock on the fs. Since we have that file open already, we'll
also write a couple of performance metics to it.
We sometimes have very specific sequences of tx broadcasts and blocks
being generated to confirm them. If the confirmation is missed the
test can completely get out of sync. Make debugging this easier by
logging what we confirmed.
Logs may contain non-ASCII and non-UTF8 symbols, which crashes the
tailer. It's better to replace them with a glyph representing
undecodeable symbols instead, and handle the issue further up the
call-chain.
We rely on it when logging, but didn't set it in the class, rather we
did in the sub-classes. This proved to be rather annoying to search
since it'd just fail silently and not log any output.
This will change the current hardwired PostgreSQL binary lookup mechanism to
utilizing the `pg_config --bindir` utility that should work better for most
distributions. The old method caused issues at least on Archlinux.
Changelog-None
Since we start a new instance of postgres for each test we may end up swamped
and the startup can take a bit longer. So let's loop until we get a success.
It was really flaky, especially under `test_mpp_interference_2`, most likely
due to multiple calls to `fund_channel`. This commit looks for the specific
txids in the `listfunds` output and the `getrawmempool` output, avoiding
strange artifacts from multiple calls.
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
We had a couple of issues with workers dying and attempting to re-initialize
the database while it was already initialized. This will look for a free
directory and just start the DB in there, allowing workers to be better
isolated.
The next patch perturbed things enough that we suddenly started
getting (with --track-origins=yes):
Valgrind error file: valgrind-errors.120470
==120470== Use of uninitialised value of size 8
==120470== at 0x14EBD5: htable_val (htable.c:150)
==120470== by 0x14EC3C: htable_firstval_ (htable.c:165)
==120470== by 0x14F583: htable_del_ (htable.c:349)
==120470== by 0x11825D: pointer_referenced (memleak.c:65)
==120470== by 0x118485: scan_for_pointers (memleak.c:121)
==120470== by 0x118500: memleak_remove_region (memleak.c:130)
==120470== by 0x118A30: call_memleak_helpers (memleak.c:257)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118B25: memleak_find_allocations (memleak.c:278)
==120470== by 0x10EB12: closing_dev_memleak (closingd.c:584)
==120470== by 0x10F3E2: main (closingd.c:783)
==120470== Uninitialised value was created by a heap allocation
==120470== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==120470== by 0x1604E8: allocate (tal.c:250)
==120470== by 0x160AA9: tal_alloc_ (tal.c:428)
==120470== by 0x119BE0: new_per_peer_state (per_peer_state.c:24)
==120470== by 0x11A101: fromwire_per_peer_state (per_peer_state.c:95)
==120470== by 0x10FB7C: fromwire_closingd_init (closingd_wiregen.c:103)
==120470== by 0x10ED15: main (closingd.c:626)
==120470==
This is because there is uninitialized padding at the end of struct
peer_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And when it's set, and we're SLOW_MACHINE, simply disable valgrind.
Since Travis (SLOW_MACHINE=1) only does VALGRIND=1 DEVELOPER=1 tests,
and VALGRIND=0 DEVELOPER=0 tests, it was missing tests which needed
DEVELOPER and !VALGRIND.
Instead, this demotes them to non-valgrind tests for SLOW_MACHINEs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can query all the txids at once, rather than one at a time.
Doesn't make any measurable difference to full runtime testing here
though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We might have funds prior to calling join_nodes(), so testing that
we've all seen the block is better.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what fund_channel() does, which is more thorough than what
we were doing. But since the order of the logs is undefined, we need
to be a little careful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reduces VALGRIND=1 node_factory.line_graph(5) time on my laptop from 42s to 36s.
This is simply because forking all the subdaemons just to check the
version is very expensive under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Telling `lightningd` to pass a `-datadir` to `bitcoin-cli` so it doesn't go
snooping where it doesn't belong (i.e., the user's home directory and config).
Changelog-None
Suggested-by: Simon Vrouwe <@SimonVrouwe>
Signed-off-by: Christian Decker <@cdecker>
This was breaking a couple of tests if the pyln version was not synced up with
`lightningd`, so now we just warn (these are then collected when running in
pytest and highlighted).
We only support a very limited number of argument combinations, and apparently
sometimes we trigger a case we aren't handling. This adds a more useful error
message, including the params we didn't match.
So far we've always cleared the node directory when provisioning the node, but
while testing some plugins we noticed that pre-generating some files in that
directory is useful. This just adds yet another flag to `get_node` that
disables deleting any existing node directory.
A CONSERVATIVE/3 target for them.
Some noisy changes to the tests as we had to update the estimatesmartfee
mock.
Changelog-Changed: We now use a higher feerate for resolving onchain HTLCs and for penalty transactions
We kept track of an URGENT, a NORMAL, and a SLOW feerate. They were used
for opening (NORMAL), mutual (NORMAL), UNILATERAL (URGENT) transactions
as well as minimum and maximum estimations, and onchain resolution.
We now keep track of more fine-grained feerates:
- `opening` used for funding and also misc transactions
- `mutual_close` used for the mutual close transaction
- `unilateral_close` used for unilateral close (commitment transactions)
- `delayed_to_us` used for resolving our output from our unilateral close
- `htlc_resolution` used for resolving onchain HTLCs
- `penalty` used for resolving revoked transactions
We don't modify our requests to our Bitcoin backend, as the next commit
will batch them !
Changelog-deprecated: The "urgent", "slow", and "normal" field of the `feerates` command are now deprecated.
Changelog-added: The fields "opening", "mutual_close", "unilateral_close", "delayed_to_us", "htlc_resolution" and "penalty" have been added to the `feerates` command.
pytest captures the output by monkey patching out `sys.stdout`. This may
conflict with our use of `sys.stdout` when configuring logging, resulting in
the "Write to closed file" issue that is spamming the logs. By making the
logging configuration a fixture hopefully we always use the correct
stdout (after pytest has monkey-patched it).
For some reason we fail to remove the test directory in some cases. My
hypothesis is that it is a daemon that is not completely shut down yet, and
still writes to the directory. This commit intercepts the error, prints any
files in the directory and re-raises the error. This should allow us to debug
the reappears.
Some tests may not spawn a node at all, so make sure that our assumption that
the directory exists in the fixture cleanup is correct by creating the
directory.
if the node fails to start (and we're expecting it to) return to us the
node object anyway
we also signal to collect all of its stderr logs by setting stderr
on the tailableproc that backs the node
The `generate` has been deprecated since 0.16 and has been removed in 0.18.0
so we better use `generatetoaddress` instead, which is already what we do with
`bitcoind`. So we remove the override here.