It was waiting for a remote channel, but not for all the interesting
channels we want to check. It can sometimes happen that further away
channels are added before closer ones are added, depending on
propagation path, flush timers and bitcoind poll timers. This now just
checks for all channels, which also reduces the ambiguity of whether
we selected a path solely because we were lacking alternatives.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were restarting the proxy with the nodes before, which was causing some
port contention. This is more natural since `bitcoind` will take care
of terminating all proxies it returned.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We used to have a bug where decoderawtransaction would fail, fixed in
fedcfd661 (pytest: hand 'True' to decoderawtransaction so it doesn't
get confused.).
So we can remove the fallback decode, and might as well extract the
ugliness into a helper function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently only used by gossipd for channel elimination.
Also print them in canonical form (/[01]), so tests need to be
changed.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we are planning to release a bug fix release, and the plugin
subsystem is not yet complete, it is better to make plugin support
opt-in while we continue testing.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It wasn't JSON formatted either so there was no nice pretty-printing
way. This jsonifies and pretty prints it.
Signed-off-by: Christian Decker <@cdecker>
Because gossip in this case takes up to a minute, this test took 10
minutes. The workaround is to do the waiting-for-gossip all at once.
Now it takes 362 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In one case we can reduce, in the others we eliminated if VALGRIND.
Here are the ten slowest tests on my laptop:
469.75s call tests/test_closing.py::test_closing_torture
243.61s call tests/test_closing.py::test_onchain_multihtlc_our_unilateral
222.73s call tests/test_closing.py::test_onchain_multihtlc_their_unilateral
217.80s call tests/test_closing.py::test_closing_different_fees
146.14s call tests/test_connection.py::test_dataloss_protection
138.93s call tests/test_connection.py::test_restart_many_payments
129.66s call tests/test_gossip.py::test_gossip_persistence
128.73s call tests/test_connection.py::test_no_fee_estimate
122.46s call tests/test_misc.py::test_htlc_send_timeout
118.79s call tests/test_closing.py::test_onchain_dust_out
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
generate was deprecated some time ago, so we added the generate_block()
helper. But many calls crept back in, and git master refuses it.
(test_blockchaintrack relied on the return value, so make generate_block
return the list of blocks).
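
A minimal sketch of such a helper, assuming an RPC object that exposes
`getnewaddress` and `generatetoaddress` (both are bitcoind RPCs). `FakeRpc`
and the exact signature are hypothetical stand-ins for illustration, not
the code in this tree:

```python
class FakeRpc:
    """Hypothetical stand-in for a bitcoind RPC proxy, for illustration only."""

    def __init__(self):
        self.height = 0

    def getnewaddress(self):
        return "bcrt1-example-address"

    def generatetoaddress(self, numblocks, address):
        # bitcoind returns the hashes of the newly mined blocks.
        hashes = []
        for _ in range(numblocks):
            self.height += 1
            hashes.append("blockhash-{}".format(self.height))
        return hashes


def generate_block(rpc, numblocks=1):
    """Mine numblocks blocks and return the list of their hashes."""
    return rpc.generatetoaddress(numblocks, rpc.getnewaddress())
```

Returning the list lets a caller keep using the block hashes, as
test_blockchaintrack does.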
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
After Ubuntu 18.10 upgrade, lots of new flake8 warnings.
$ flake8 --version:
3.5.0 (mccabe: 0.6.1, pycodestyle: 2.4.0, pyflakes: 1.6.0) CPython 3.6.7rc1 on Linux
Note it seems that W503 warned about line breaks before binary
operators, and W504 complains about them after. I prefer W504, so
disable W503.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Occasional failure in test_fulfill_incoming_first where the channel
closed before the final message from dev_disconnect was read. Cause
was the peer writing a gossip msg and failing due to ECONNRESET, before
it read the final message.
(Managed to reproduce under strace -f, FTW).
This is really a symptom of the fact that line_graph's announce=True
didn't wait for node announcements. Let's do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Adapts the `test_forward_stats` test to include checks for the
`forwarded_payments` table. Will add checks for the `listforwardings`
RPC call next.
Signed-off-by: Christian Decker <@cdecker>
We extract the tx from the logs, and then we wait until that hits
the mempool. This is more reliable than 'sendrawtx' in the logs,
which might catch a previous sendrawtx; it's also more explicit
that we expect that tx exactly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
BOLT 7's been updated to split the flags field in `channel_update`
into two: `channel_flags` and `message_flags`. This changeset does the
minimal necessary to get to building with the new flags.
Got a spurious failure in test_no_fee_estimate; we fired too soon from the logs (presumably
we raced in on the first response, but estimatesmartfee gets called 3 times).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a simple reverse proxy that `bitcoin-cli` can talk to when invoked by
`lightningd`. It allows us to trace `bitcoin-cli` calls, and intercept calls to
mock the replies, better than the current bash-script based method.
And no more filtering out messages, as we should no longer spam the
logs with them (the 'Connected json input' one was removed some time
ago).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tests were failing when in the same thread after a test which set
log_all_io=True, because SIGUSR1 seemed to be turning logging *off*.
This is due to Python using references not copies for assignment.
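
The aliasing behaviour can be reproduced in isolation. This is only an
illustration: `DEFAULT_OPTS` and the other names are hypothetical, and the
commit does not say which object was actually shared:

```python
import copy

DEFAULT_OPTS = {"log_all_io": False}

# Plain assignment binds a second name to the same dict object.
alias = DEFAULT_OPTS
alias["log_all_io"] = True
shared_value = DEFAULT_OPTS["log_all_io"]  # the "default" changed too

# Reset, then take a real copy before mutating.
DEFAULT_OPTS["log_all_io"] = False
private = copy.deepcopy(DEFAULT_OPTS)
private["log_all_io"] = True
copied_value = DEFAULT_OPTS["log_all_io"]  # the default is untouched
```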
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is required for the next test, which has to log messages from channeld
as soon as it starts (so might be too late if it sends SIGUSR1).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful if we want to intercept bitcoin-cli first.
We move the getinfo() caching into start(), as that's when we can actually
use RPC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to use it to override specific commands. It's non-valgrinded
already since we use '--trace-children-skip=*bitcoin-cli*' so the overhead
should be minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.
(Previously the connection stayed in connectd, so there was no such
window).
This is an occasional issue in a few places.
Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, I found lightning_openingd processes after running
tests. When we use the dev_disconnect blackhole '0' option, they
stick around until the dev_disconnect file is truncated (there is only
so much you can do with only a file descriptor), so let's do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The following changes revealed this race, where expecting listchannels()
to contain two channels immediately after fund_channel() was racy.
We also derive the short_channel_id first, so we can search logs for the
exact messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patches get better at reconnecting, so if we use dev-allow-localhost,
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I could not figure out why test_announce_address suddenly stopped working:
I had previously been using DEVELOPER=1 on the cmdline for historical
reasons when testing locally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Python dict can't have duplicate entries, but some options can be specified
multiple times. The easiest way is to put a list in the dict.
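
A rough sketch of how a list value could be expanded into repeated
arguments. The function name `opts_to_cmdline` and the convention that
`None` means a bare flag are assumptions made for this example:

```python
def opts_to_cmdline(opts):
    """Expand an options dict into command-line arguments.

    A list value means the option is repeated once per element;
    None means a bare flag without a value.
    """
    args = []
    for name, value in opts.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is None:
                args.append("--{}".format(name))
            else:
                args.append("--{}={}".format(name, v))
    return args
```

Scalar values keep working unchanged, so existing callers would not need
to be touched.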
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
E ConnectionRefusedError: [Errno 111] Connection refused
And in debug.log:
2018-05-17T04:06:35Z Warning: Config setting for -rpcport only applied on regtest network when in [regtest] section.
Unfortunately, current versions including 0.16.1 *ignore* the contents of
a '[regtest]' section, so we need it in *both* places.
Also remove the misleading 'rpcport' initialization which we always
override.
Note that we don't fix this message though:
2018-05-17T04:06:35Z Config options rpcuser and rpcpassword will soon be deprecated. Locally-run instances may remove rpcuser to use cookie-based auth, or may be replaced with rpcauth. Please see share/rpcuser for rpcauth auth generation.
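
For the '[regtest]' workaround above, writing the same settings in both
places could be sketched like this. `render_conf` is a hypothetical helper
that simply duplicates whatever options it is given; it is not the code in
this patch:

```python
def render_conf(options, section="regtest"):
    """Render options at top level and again under [section]."""
    body = ["{}={}".format(k, v) for k, v in options.items()]
    return "\n".join(body + ["[{}]".format(section)] + body) + "\n"
```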
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For some reason, we created a second bitcoin.conf in the regtest/ directory,
which AFAICT nothing uses.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's not optional for our test setup, and this makes it easier to invoke
bitcoin-cli manually, for example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When opening a number of channels from a single node we could end up not waiting
for the funding tx to make it into the mempool, instead triggering on a previous
`sendrawtransaction` or `CHANNEL_NORMAL` in the logs. This now checks that the
actual funding transaction makes it into the mempool and that we wait for the
depth change for that specific channel.
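
The mempool check can be sketched as a polling loop. This assumes a
callable that returns the list of txids in the mempool, as bitcoind's
`getrawmempool` does; `wait_for` and `wait_for_mempool_tx` are
illustrative names and the timeouts are arbitrary:

```python
import time


def wait_for(success, timeout=30, interval=0.25):
    """Poll success() until it returns True or timeout seconds elapse."""
    deadline = time.time() + timeout
    while not success():
        if time.time() > deadline:
            raise TimeoutError("condition not met within {}s".format(timeout))
        time.sleep(interval)


def wait_for_mempool_tx(getrawmempool, txid, timeout=30):
    """Block until the specific txid shows up in the mempool."""
    wait_for(lambda: txid in getrawmempool(), timeout=timeout)
```

Matching on the txid, rather than on any `sendrawtransaction` line, is
what prevents an earlier transaction from satisfying the wait.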
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Mostly `lightningd` complaining about not being able to estimate fees. Saves us
a lot of log space when some tests time out, and saves us a few context switches
between log appender and log watchers.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Make --override-fee-rates a dev option. We use default-fee-rate in
its place, which (since bitcoind won't give fee estimates in regtest
mode for short chains) gives an effective feerate of 15000/7500/3750.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Someone could try to announce an internal address, and we might probe
it.
This breaks tests, so we add '--dev-allow-localhost' for our tests, so
we don't eliminate that one. Of course, now we need to skip some more
tests in non-developer mode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we're given a wildcard address, we can't announce it like that: we need
to try to turn it into a real address (using guess_address). Then we
use that address. As a side-effect of this cleanup, we only announce
*any* '--addr' if it's routable.
This fix means that our tests have to force '--announce-addr' because
otherwise localhost isn't routable.
This means that gossipd really controls the addresses now, and breaks
them into two arrays: what we bind to, and what we announce. That is
now what we return to the master for json_getinfo(), which prints them
as 'bindings' and 'addresses' respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's become clear that our network options are insufficient, with the coming
addition of Tor and unix domain support.
Currently:
1. We always bind to local IPv4 and IPv6 sockets, unless --port=0, --offline,
or any address is specified explicitly. If they're routable, we announce.
2. --addr is used to announce, but not to control binding.
After this change:
1. --port is deprecated.
2. --addr controls what we bind to and announce.
3. --bind-addr/--announce-addr can be used to control one and not the other.
4. Unless --autolisten=0, we add local IPv4 & IPv6 port 9735 (and announce if they are routable).
5. --offline still overrides listening (though announcing is still the same).
This means we can bind to as many ports/interfaces as we want, and for
special effects we can announce different things (eg. we're sitting
behind a port forward or a proxy).
What remains to implement is semi-automatic binding: we should be able
to say '--addr=0.0.0.0:9999' and have the address resolve at bind
time, or even '--addr=0.0.0.0:0' and have the port autoresolve too
(you could determine what it was from 'lightning-cli getinfo').
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to change the JSONRPC, so let's put an explicit 'port' into
our node class.
We initialize it at startup time: in future I hope to use ephemeral ports
to make our tests more easily parallelizable.
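
One possible way to get an ephemeral port with only the standard library
is sketched below. `reserve_port` is a hypothetical name, and this is not
what the patch itself does:

```python
import socket


def reserve_port(host="127.0.0.1"):
    """Ask the kernel for a currently free TCP port and return its number.

    The socket is closed before returning, so another process could in
    principle grab the port before the caller binds it again.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]
```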
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This used to be the port, but since we no longer have fixed ports, and we start
them in random order we can't easily distinguish them by the port anymore. Just
use a numeric ID that matches their lightning-dirs.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We never really look at the output, and it's rather noisy, so we just stop
writing to the log.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can create the hsm file from python directly; that works even if we
don't have DEVELOPER set, and is simpler.
We add a test that the aliases are correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This shaves off about 15% of our integration testing suite on my machine. It
assumes we never reorg below the first block the node starts with, which is true
for all tests, so it's safe.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I had a weird failure which was caused by an unexpected disconnect and
reconnect. Since we are persistent and recover from these, they can
slip through our tests; most tests don't involve reconnection, so we
need to catch this explicitly.
For the connect() helper, we always suppress reconnection; tests which
want it all want other options so don't use this helper anyway. (Actually,
after I said that, test_closing_while_disconnected was added when I
rebased, which did require it, so I had to open-code that one).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to have some default options that can then be overridden easily
on a per-test basis.
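
A minimal sketch of the idea, with made-up default values;
`DEFAULT_OPTIONS` and `node_options` are illustrative names only:

```python
DEFAULT_OPTIONS = {
    "log-level": "debug",
    "cltv-delta": 6,
}


def node_options(overrides=None):
    """Return the defaults with per-test overrides applied on top."""
    opts = dict(DEFAULT_OPTIONS)  # copy so the defaults are never mutated
    opts.update(overrides or {})
    return opts
```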
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Individual tests can always re-enable them, though.
[ More test fallout fixes by Christian Decker ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Seems to avoid the nasty python resource warnings, as well as the
fatal 'ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This, of course, should never be used. But it helps maintain connections
for the moment while we dig deeper into feerates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CI always runs with TEST_DEBUG=1 which prints logs anyway, and testing
locally should also be done this way, combined with pytest which
captures the logs. No need to duplicate the functionality of pytest.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
With python-bitcoinlib==0.9.0 it appears that the URL based auth
information is no longer used, so we fall back to reading the config
file for the bitcoind daemon instead.
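
Reading the credentials back out of the config file could look roughly
like this. `read_rpc_settings` is a hypothetical helper; for simplicity it
keeps the last occurrence of a key, which may differ from bitcoind's own
precedence rules:

```python
def read_rpc_settings(conf_text):
    """Parse key=value lines from bitcoin.conf-style text.

    Section headers such as [regtest] and # comments are skipped; if a
    key appears more than once, the last occurrence wins.
    """
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```

The resulting `rpcuser`, `rpcpassword` and `rpcport` values can then be
handed to whatever RPC client is in use.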
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If you run locally, it fails occasionally; presumably because it
sees previous funds. Use a random HSM key for that test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I noticed some breakage with git master:
1. getinfo no longer supported (for us, use getblockchaininfo)
2. generate no longer supported (use generatetoaddress)
Both these options are supported at least in 0.15, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These need to be different for testing the example in BOLT 11.
We also use the cltv_final instead of deadline_blocks in the final hop:
various tests assumed 5 was OK, so we tweak utils.py.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes it easier to compare before/after failures. Ideally, we should
run under Travis both with this option and with the seed based on the
entire tmp path (which is still reproducible with determination, but
not fixed every run like this is).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have HTLC persistence we'd also like to test it. This
kills the second node in the middle of an HTLC, it'll recover and
finish the flow.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This broke somewhere in the recent changes, because we override
TailableProc stop(). Break out log extractor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Moved the flagging for allowed failures into the factory getter, and
renamed into `may_fail`. Also stopped the teardown of a node from
throwing an exception if we are allowed to exit non-cleanly.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We used to simply kill the daemon, which in some cases could result in
half-written crashlogs and similar artifacts such as half-completed
RPC calls. Now we ask lightningd to stop nicely, give it some time and
only then kill it. We also return the returncode of the daemon.
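
The stop sequence can be sketched as follows, assuming `proc` is a
`subprocess.Popen` object and `request_stop` is any callable that asks the
daemon to shut down (for example an RPC call). The function name and the
timeout are illustrative:

```python
import subprocess


def stop_gracefully(proc, request_stop, timeout=10):
    """Ask a process to stop, wait, and kill it only if it does not exit.

    Returns the return code of the process.
    """
    try:
        request_stop()
    except Exception:
        # The daemon may already be unreachable; fall through to waiting.
        pass
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode
```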
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Note that it should really be a flag to daemon on construction, too,
but that may interfere with another concurrent branch so I've deferred.
Suggested-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the next test, we wait for multiple 'sendrawtx exit 0' which
doesn't work because we use a set not a list, and the current code
would match multiple against the same thing. The result was we didn't
wait for the final sendrawtransaction, and occasionally had test
failures as a result.
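
The difference a list makes can be shown with a small matcher in which
each log line satisfies at most one pending pattern. `match_all` is a
hypothetical, non-blocking sketch; the real helper also has to poll the
log:

```python
import re


def match_all(lines, patterns):
    """Return True once every pattern, duplicates included, has matched.

    patterns is kept as a list so the same regex may be required
    several times; each log line can satisfy at most one pending entry.
    """
    pending = list(patterns)
    for line in lines:
        for pat in pending:
            if re.search(pat, line):
                pending.remove(pat)
                break  # this line is used up; move on to the next line
        if not pending:
            return True
    return not pending
```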
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we see an offered HTLC onchain, we need to use the preimage if we
know it. So we dump all the known HTLC preimages at startup, and send
new ones as we discover them.
This doesn't cover preimages we know because we're the final
recipient; that can happen if an HTLC hasn't been irrevocably
committed yet. We'll do that in a followup patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We simply kill lightningd; we should stop it properly and have a timeout
to kill it if that fails. However, that's beyond my python skills :(
So we just look for crash.log. Unfortunately, we usually kill
lightningd before it's finished writing it. So we look for it and
don't kill lightningd, just wait in this case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To reproduce the next bug, I had to ensure that one node keeps thinking it's
disconnected, then the other node reconnects, then the first node realizes
it's disconnected.
This code does that, adding a '0' dev-disconnect modifier. That means
we fork off a process which (due to pipebuf) will accept a little
data, but when the dev_disconnect file is truncated (a hacky, but
effective, signalling mechanism) will exit, as if the socket finally
realized it's not connected any more.
The python tests hang waiting for the daemon to terminate if you leave
the blackhole around; to give a clue as to what's happening in this
case I moved the log dump to before killing the daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was hoping to defer HTLC updates until we actually store HTLCs, but
we need to flush to DB whenever balances update as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I was hoping to trigger on more things from the bitcoind process, but
stuff like mempool is hard to trigger on. Reducing to info so we can
work a bit easier with pdb and the log becomes less noisy.
I have a test which waits for multiple occurrences of the same string,
but doesn't want them to overlap. Make wait_for_log() do the right thing,
so that it only looks for log entries since the last wait_for_log.
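
The "search from where we left off" behaviour can be sketched with a
stored offset. `LogWatcher` and `find_next` are illustrative names, and
this version returns immediately instead of waiting for new lines:

```python
import re


class LogWatcher:
    """Searches log lines, resuming after the previous match."""

    def __init__(self, lines):
        self.lines = lines      # list that keeps growing as the daemon logs
        self.search_start = 0   # index of the first line not yet consumed

    def find_next(self, pattern):
        """Return the next matching line at or after search_start, or None."""
        for idx in range(self.search_start, len(self.lines)):
            if re.search(pattern, self.lines[idx]):
                self.search_start = idx + 1
                return self.lines[idx]
        return None
```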
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On my laptop under load, 5 seconds was no longer enough for legacy.
But this breaks async (they all see mempool increase, and fire
prematurely), so stop doing that.
I can't get this test to work at all, in fact, without this patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We explicitly assert what state we're coming from, to make transitions
clearer.
2. Every transition has a state, even between owners while waiting for HSM.
3. Explicitly step through getting the HSM signature on the funding tx
before starting channeld, rather than doing it in parallel: makes
states clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I couldn't actually figure out how to just dump them on error, so I
dump all the time. When running 3 lightningd + bitcoind, this separates
the logs nicely.
TODO: We should delete the directories on success!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This moves all the non-legacy blackbox testing into python.
Before:
real 10m18.385s
After:
real 9m54.877s
Note that this doesn't valgrind the subdaemons: that patch seems to cause
some issues in the python framework which I am still chasing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Running long integration tests could result in `bitcoind` dropping the
connection in between calls, and since python-bitcoinlib does not
reconnect and/or retry, all subsequent tests would fail as well. This
patch switches to throwaway connections, each serving just one
request. That is easier than reaching into bitcoinlib to reconnect
and reauth; it comes with some overhead, but I think that's acceptable
for the few bitcoin calls we actually perform.
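
The throwaway-connection pattern can be sketched independently of
python-bitcoinlib. Here `connect` is a hypothetical factory returning an
object with a `call(method, *args)` method; the class name is made up for
this example:

```python
class ThrowawayProxy:
    """Forwards each RPC call over a freshly created connection."""

    def __init__(self, connect):
        # connect is a hypothetical zero-argument factory that returns a
        # new, authenticated connection object exposing call(method, *args).
        self._connect = connect

    def __getattr__(self, name):
        # Only invoked for attributes not found normally, i.e. RPC names.
        if name.startswith("_"):
            raise AttributeError(name)

        def rpc_call(*args):
            conn = self._connect()  # new connection for this one request
            return conn.call(name, *args)

        return rpc_call
```

Since no connection outlives a request, a connection dropped by `bitcoind`
cannot affect later calls.
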
By looking for 'Done loading' in the log output we should actually be
called after `SetRPCWarmupFinished` in bitcoind. Only then is it safe
to make RPC calls. Calling any earlier made the test suite a bit flaky.