Since we turned many errors into warnings, we want our tests to fail
when they happen unexpectedly. We make WARNING clear in the strings
we print, too, to help out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This 'fixes' the `wait_for` helper by removing a pointless final
`time.sleep()`, thus potentially making the method return quicker.
The old code could have had three final states:
- success() := True
- Timeout and success() := True
- Timeout and success() := False
The new code has just two final state:
- success() := True
- Timeout and success() := False
It ensures the final `time.sleep()` is just the right amount before timeout.
And more importantly making it more readable :-)
Changelog-None
We are printing `repr(obj)` which is not pretty-printed, hard to read,
and can't even be copied and inspected to JSON tools. We now print the
JSONified and indented calls and responses for easier debugging based
on solely the logs (useful for CI!).
Changelog-Added: pyln-testing: The RPC client will now pretty-print requests and responses to facilitate log-based debugging.
We were getting a couple of starvations, so we need a fair filelock. I
also wasn't too happy with the lock as is, so I hand-coded it quickly.
Should be correct, but the overall timeout will tell us how well we
are doing on CI.
we got rid of push_msats for dual funded channels. this assumes that hte
peer will match an equal amount of sats as ours (the df_accepter.py
plugin will do this)
3.0.12 here:
```
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was causing the following error
```
Exception in thread Thread-553:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/tmp/cirrus-ci-build/contrib/pyln-testing/pyln/testing/utils.py", line 232, in tail
self.err_logs.append(line.rstrip().decode('UTF-8', 'replace')).rstrip()
AttributeError: 'NoneType' object has no attribute 'rstrip'
[gw5] [ 33%] FAILED tests/test_misc.py::test_bitcoin_failure
```
Notice the second call to `.rstrip()` on the return value of `.append()`
Both my machine and apparently the CI tester machines regularly run
into issues with load on the system, causing timeouts (and
unresponsiveness). The throttler throttles the speed with which new
instances of c-lightning get started to avoid overloading. Since the
plugin used for parallelism when testing spawns multiple processes we
need to lock on the fs. Since we have that file open already, we'll
also write a couple of performance metics to it.
We sometimes have very specific sequences of tx broadcasts and blocks
being generated to confirm them. If the confirmation is missed the
test can completely get out of sync. Make debugging this easier by
logging what we confirmed.
Logs may contain non-ASCII and non-UTF8 symbols, which crashes the
tailer. It's better to replace them with a glyph representing
undecodeable symbols instead, and handle the issue further up the
call-chain.
We rely on it when logging, but didn't set it in the class, rather we
did in the sub-classes. This proved to be rather annoying to search
since it'd just fail silently and not log any output.
This will change the current hardwired PostgreSQL binary lookup mechanism to
utilizing the `pg_config --bindir` utility that should work better for most
distributions. The old method caused issues at least on Archlinux.
Changelog-None
Since we start a new instance of postgres for each test we may end up swamped
and the startup can take a bit longer. So let's loop until we get a success.
It was really flaky, especially under `test_mpp_interference_2`, most likely
due to multiple calls to `fund_channel`. This commit looks for the specific
txids in the `listfunds` output and the `getrawmempool` output, avoiding
strange artifacts from multiple calls.
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
We had a couple of issues with workers dying and attempting to re-initialize
the database while it was already initialized. This will look for a free
directory and just start the DB in there, allowing workers to be better
isolated.
The next patch perturbed things enough that we suddenly started
getting (with --track-origins=yes):
Valgrind error file: valgrind-errors.120470
==120470== Use of uninitialised value of size 8
==120470== at 0x14EBD5: htable_val (htable.c:150)
==120470== by 0x14EC3C: htable_firstval_ (htable.c:165)
==120470== by 0x14F583: htable_del_ (htable.c:349)
==120470== by 0x11825D: pointer_referenced (memleak.c:65)
==120470== by 0x118485: scan_for_pointers (memleak.c:121)
==120470== by 0x118500: memleak_remove_region (memleak.c:130)
==120470== by 0x118A30: call_memleak_helpers (memleak.c:257)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118B25: memleak_find_allocations (memleak.c:278)
==120470== by 0x10EB12: closing_dev_memleak (closingd.c:584)
==120470== by 0x10F3E2: main (closingd.c:783)
==120470== Uninitialised value was created by a heap allocation
==120470== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==120470== by 0x1604E8: allocate (tal.c:250)
==120470== by 0x160AA9: tal_alloc_ (tal.c:428)
==120470== by 0x119BE0: new_per_peer_state (per_peer_state.c:24)
==120470== by 0x11A101: fromwire_per_peer_state (per_peer_state.c:95)
==120470== by 0x10FB7C: fromwire_closingd_init (closingd_wiregen.c:103)
==120470== by 0x10ED15: main (closingd.c:626)
==120470==
This is because there is uninitialized padding at the end of struct
peer_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And when it's set, and we're SLOW_MACHINE, simply disable valgrind.
Since Travis (SLOW_MACHINE=1) only does VALGRIND=1 DEVELOPER=1 tests,
and VALGRIND=0 DEVELOPER=0 tests, it was missing tests which needed
DEVELOPER and !VALGRIND.
Instead, this demotes them to non-valgrind tests for SLOW_MACHINEs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can query all the txids at once, rather than one at a time.
Doesn't make any measurable difference to full runtime testing here
though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We might have funds prior to calling join_nodes(), so testing that
we've all seen the block is better.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what fund_channel() does, which is more thorough than what
we were doing. But since the order of the logs is undefined, we need
to be a little careful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reduces VALGRIND=1 node_factory.line_graph(5) time on my laptop from 42s to 36s.
This is simply because forking all the subdaemons just to check the
version is very expensive under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Telling `lightningd` to pass a `-datadir` to `bitcoin-cli` so it doesn't go
snooping where it doesn't belong (i.e., the user's home directory and config).
Changelog-None
Suggested-by: Simon Vrouwe <@SimonVrouwe>
Signed-off-by: Christian Decker <@cdecker>
This was breaking a couple of tests if the pyln version was not synced up with
`lightningd`, so now we just warn (these are then collected when running in
pytest and highlighted).
We only support a very limited number of argument combinations, and apparently
sometimes we trigger a case we aren't handling. This adds a more useful error
message, including the params we didn't match.
So far we've always cleared the node directory when provisioning the node, but
while testing some plugins we noticed that pre-generating some files in that
directory is useful. This just adds yet another flag to `get_node` that
disables deleting any existing node directory.
A CONSERVATIVE/3 target for them.
Some noisy changes to the tests as we had to update the estimatesmartfee
mock.
Changelog-Changed: We now use a higher feerate for resolving onchain HTLCs and for penalty transactions
We kept track of an URGENT, a NORMAL, and a SLOW feerate. They were used
for opening (NORMAL), mutual (NORMAL), UNILATERAL (URGENT) transactions
as well as minimum and maximum estimations, and onchain resolution.
We now keep track of more fine-grained feerates:
- `opening` used for funding and also misc transactions
- `mutual_close` used for the mutual close transaction
- `unilateral_close` used for unilateral close (commitment transactions)
- `delayed_to_us` used for resolving our output from our unilateral close
- `htlc_resolution` used for resolving onchain HTLCs
- `penalty` used for resolving revoked transactions
We don't modify our requests to our Bitcoin backend, as the next commit
will batch them !
Changelog-deprecated: The "urgent", "slow", and "normal" field of the `feerates` command are now deprecated.
Changelog-added: The fields "opening", "mutual_close", "unilateral_close", "delayed_to_us", "htlc_resolution" and "penalty" have been added to the `feerates` command.
pytest captures the output by monkey patching out `sys.stdout`. This may
conflict with our use of `sys.stdout` when configuring logging, resulting in
the "Write to closed file" issue that is spamming the logs. By making the
logging configuration a fixture hopefully we always use the correct
stdout (after pytest has monkey-patched it).
For some reason we fail to remove the test directory in some cases. My
hypothesis is that it is a daemon that is not completely shut down yet, and
still writes to the directory. This commit intercepts the error, prints any
files in the directory and re-raises the error. This should allow us to debug
the reappears.
Some tests may not spawn a node at all, so make sure that our assumption that
the directory exists in the fixture cleanup is correct by creating the
directory.
if the node fails to start (and we're expecting it to) return to us the
node object anyway
we also signal to collect all of its stderr logs by setting stderr
on the tailableproc that backs the node
The `generate` has been deprecated since 0.16 and has been removed in 0.18.0
so we better use `generatetoaddress` instead, which is already what we do with
`bitcoind`. So we remove the override here.
We were hardcoding the chainparams->chain_hash which caused the query to
return an empty result. By parametrizing the test we can make it work on
elements.
In the c-lightning tests we have `tests/conftest.py` which annotates test
function with the outcome. If we use pyln-testing outside of the c-lightning
tree we cannot rely on that annotation being there, so we assume it passed.
Changelog-changed: .lightningd plugins and files moved into <network>/ subdir
Changelog-changed: WARNING: If you don't have a config file, you now may need to specify the network to lightning-cli
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Spaces just make life a little harder for everyone.
(Plus, fix documentation: it's 'jsonrpc' not 'json' subsystem).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies our tests, too, since we don't need a magic option to
enable io logging in subdaemons.
Note that test_bad_onion still takes too long, due to a separate minor
bug, so that's marked and left dev-only for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Printed form is always "[<nodeid>-]<prefix>: <string>"
2. "jcon fd %i" becomes "jsonrpc #%i".
3. "jsonrpc" log is only used once, and is removed.
4. "database" log prefix is use for db accesses.
5. "lightningd(%i)" becomes simply "lightningd" without the pid.
6. The "lightningd_" prefix is stripped from subd log prefixes, and pid removed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-changed: Logging: formatting made uniform: [NODEID-]SUBSYSTEM: MESSAGE
Changelog-removed: `lightning_` prefixes removed from subdaemon names, including in listpeers `owner` field.
A log can have a default node_id, which can be overridden on a per-entry
basis. This changes the format of logging, so some tests need rework.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Quite a few of the things in the LightningNode class are tailored to their use
in the c-lightning tests, so I decided to split those customizations out into
a sub-class, and adding one more fixture that just serves the class. This
allows us to override the LightningNode implementation in our own tests, while
still having sane defaults for other users.
We were relying heavily on NodeFactory to do some magic before instantiating
the Node with rpc and daemon initialized, that meant that we'd have to replace
all 3 classes when customizing the node to our needs. Moving that
initialization into the node itself means that the LightningNode class now can
be swapped out and customized, without having to wire everything else through.
`DEVELOPER=1` assumes that the binary has been compiled with developer set to
true, which might not be the case for plugin developers. Setting this to 0 by
default has no effect in c-lightning since we always at least set it in
`config.vars` but may prevent some issues outside.
We'll rewrite the tests to use this infrastructure in the next commit.
Changelog-Added: The new `pyln-testing` package now contains the testing infrastructure so it can be reused to test against c-lighting in external projects