It's an array: we were only saving the single element; if there was more than
one changed HTLC we'd get a bad signature!
The report in #1907 is probably caused by the other side re-requesting
something we considered already finalized; to avoid this particular error,
we should set the field to NULL if there's no last_sent_commit.
I'm increasingly of the opinion we want to just save all the update
packets to the db and blast them out, instead of doing this
second-guessing dance.
Fixes: #1907
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These happen after we compact the store; every log I've seen of a
restart on a real node has a message about truncating the store,
because node_announcements predate channel_announcements.
I extracted one such case from testnet, and reduced it to test here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Wait for a 'sendrawtransaction' *after* the dev-fail message; don't be
fooled by a previous one.
2. Turning on estimate fee sets fees exactly; just wait for it to be processed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's probably unnecessary to have this weird way of injecting results
now we have explicit feerate args.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And, reluctantly, default to bitcoind style.
"It's wrong to be right too soon."
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We could refine this later (based on existing wallet, for example), but
this gives some estimate.
[ Rename onchain_estimates -> onchain_fee_estimates Suggested-by: @SimonVrouwe ]
[ Factor of 1000 fix Reported-by: @SimonVrouwe ]
Suggested-by: @molxyz
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't know what our peer is doing, but if we see those values, maybe
they did too, and for longer. And add the min/max acceptable values
into our JSON API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is useful mainly in the case where bitcoind is not giving estimates,
but can also be used to bias results if you want.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And no more filtering out messages, as we should no longer spam the
logs with them (the 'Connected json input' one was removed some time
ago).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The comment was wrong: the channel being locked in was triggering
the fee update and hence the disconnect. But that can actually
happen before fund_channel returns, as that waits for the gossipd
to see the channel active.
Best to do the fee update manually, so it's exactly what we want.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If feerates change, L2 sends L3 a commit for that, which causes us to
fail the assert (which says we won't send a commitment_signed).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use feerate in several places, and each one really should react
differently when it's not available (such as when bitcoind is still
catching up):
1. For general fee-enforcement, we use the broadest possible limits.
2. For closingd, we use it as our opening negotiation point: just use half
the last tx feerate.
3. For onchaind, we can use the last tx feerate as a guide for our own txs;
it might be too high, but at least we know it was sufficient to be mined.
4. For withdraw and fund_channel, we can simply refuse.
Fixes: #1836
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Manipulate fees via fake-bitcoin-cli. It's not quite the same, as
these are pre-smoothing, so we need a restart to override that where
we really need an exact change. Or we can wait until it reaches a
certain value in cases we don't care about exact amounts.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't respond to fee changes until we're locked in: make sure we catch
up at that point.
Note that we use NORMAL fees during opening, but IMMEDIATE after, so
this often sends a fee update. The tests which break, we set those
feerates to be equal.
This (sometimes) changes the behavior of test_permfail, as we now
get an immediate commit, so that is fixed too so we always wait for
that to complete.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a noop if we're opening a new channel (channel_fees_can_change(channel)
is false until funding locked in), but important if we're restarting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't update a channel's feerate on reestablishment: we insert a restart
in test_onchain_different_fees() (which we'll need soon anyway) to show it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.
As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.
Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.
This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer. To provoke a temporary failure in test_pay_disconnect we
take the node offline.
Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't try to unilaterally close after a restart, *and*
we can tell onchaind to try to use the point to recover funds when the
peer unilaterally closes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).
Otherwise, we check the results they sent were valid. The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tests were failing when in the same thread after a test which set
log_all_io=True, because SIGUSR1 seemed to be turning logging *off*.
This is due to Python using references not copies for assignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is required for the next test, which has to log messages from channeld
as soon as it starts (so might be too late if it sends SIGUSR1).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ignore incoming for now, but this means we advertize the option and
we send the required fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
peer features are only kept for connected peers (as they can change),
but we didn't update them on reconnect. The main effect was that
after a restart we displayed the features as empty, even after
reconnect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. l1 update_fee -> l2
2. l1 commitment_signed -> l2 (using new feerate)
3. l1 <- revoke_and_ack l2
4. l1 <- commitment_signed l2 (using new feerate)
5. l1 -> revoke_and_ack l2
When we break the connection after #3, the reconnection causes #4 to
be retransmitted, but it turns out l1 wasn't telling the master to set
the local feerate until it received the commitment_signed, so on
reconnect it uses the old feerate, with predictable results (bad
signature).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful it we want to intercept bitcoin-cli first.
We move the getinfo() caching into start(), as that's when we can actually
use RPC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to use it to override specific commands. It's non-valgrinded
already since we use '--trace-children-skip=*bitcoin-cli*' so the overhead
should be minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We got an index error, because status had only one field (onchaind not
started yet).
> wait_for(lambda: only_one(p.rpc.listpeers(l1.info['id'])['peers'][0]['channels'])['status'][1] == 'ONCHAIN:Tracking mutual close transaction')
E IndexError: list index out of range1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normal wallet txs get reconfirmed as blocks come in, but ones which need
closeinfo are more fragile, so we do it manually using txwatch for them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In one case, the channel_update which we expected to activate the channel
from l2 was suppressed as redundant. This is certainly valid, so just
check the results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Waiting for three node_announcments isn't always enough, since l2 can
publish two of them (an independent bug). Do the more Right Thing and
just wait for 30 seconds of no input...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. connect convenience variable for improved readabilty.
2. a comment explaining that timer is on channel, not HTLC.
3. use modern python style in test_htlc_send_timeout
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now sending a ping makes sense: it should force the other end to send
a reply, unblocking the commitment process.
Note that rather than waiting for a reply, we're actually spinning on
a 100ms loop in this case. But it's simple and it works.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently, if we don't realize a TCP connection is down, we almost
certainly don't find out until *after* we're sent the
commitment_signed message, in which case we cannot fail the incoming
HTLC.
This test demonstrates that. Note the 30 second sleep: we should really
run Travis tests in parallel!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we have an address hint, we start with that, but we'll use
node_announcement information if required.
Note: we (ab)use the address hint when restoring from the database
or reconnecting, even if the connection was *incoming*. That meant
that the recipient of a connection would *never* manage to connect out.
We still don't take multiple addresses from the DNS seeds: I assume we
should, since there could be IPv4 and IPv6.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd tells master about every disconnection, and master knows
whether it's important to reconnect. Just get the master to invoke a new
connect command if it considers the peer important!
The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect. To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, all opening_read_peer_msg() callers need to know there
was an error (presumably, negotiating) so they can stop, but we should
not exit.
This lets us reenable the final disabled test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.
lightningd no longer queries us for getpeers: it knows more than we do
already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.
(Previously the connection stayed in connectd, so there was no such
window).
This is an occasional issue in a few places.
Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.
Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.
This is a deep structural change, so we do the minimum here and cleanup
in the following patches.
Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
find the race between us asking connectd to release the peer and it
reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
channel (until it has succeeded): it's very transitory anyway so not
worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
or a fundee, or an error in response to our request to fund a channel.
so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.
Openingd:
1. We always have information we need to accept them funding a channel (in
the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.
Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
exist; 'state' is now per-channel. It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
the peer hand over the connection to openingd. Our tests sometimes
checked getpeers() on the peer, and didn't see anything, so line_graph
needed updating.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, I found lightning_openingd processes after running
tests. When we use the dev_disconnect blackhole '0' option, they
stick around until the dev_disconnect file is truncated (there is only
so much you can do with only a file descriptor), so let's do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The following changes revealed this race, where expecting listchannels()
to contain two channels immediately after fund_channel() was racy.
We also derive the short_channel_id first, so we can search logs for the
exact messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patches get better at reconecting, so if we use dev-allow-localhost
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Saw this in Travis: technically we return from the dev_set_max_scids...
cmd after sending it to gossipd, but we should wait for it to log.
Adding an internal reply message for a dev command seems overkill.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If the IPv6 address was public, that changed the wireaddr and thus the ipv4 bind
would not be to a wildcard and would fail.
2. Binding two fds to the same port on both wildcard IPv4 and IPv6 succeeds; we only
fail when we try to listen, so allow error at this point.
For some reason this triggered on my digital ocean machine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd calculates our reserve based on the channel amount (even if
we're funding, to keep the calculation in one place), but it wasn't
reporting it back to the master daemon. We initialized it to 0 so that
valgrind wouldn't get upset, as it's part of a structure we send over
the wire.
Have openingd report back, and also initialize it to an impossible value
as extra assurance. And remove a stray (harmless but weird) semicolon.
Reported-by: Gálli Zoltán
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds one line with the onion and the channel_update we extract from
it. This in turn allows us to check that the channel_update in the onion is not
type prefixed, and that we patch it correctly before passing it to gossipd.
The easiest way to do this is to play with the 'wallet_tx' semantics
and have 'amount' have meaning even when 'all_funds' is set.
Note that we change the string 'Cannot afford funding transaction' to
'Cannot afford transaction' as this code is also used for withdrawls.
Inspired-by: molz on #c-lightning
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logs in various Travis failures show that it takes 20 seconds just for
closingd to read the init message. As a result, the close times out (default
is 30 seconds).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were *supposed* to be waiting for the next commitment tx so we
made sure the one we broadcast was old, *but* the 'revoke_and_ack'
we were waiting for could be matched by the completion of the previous
'revoke_and_ack'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This needs to be done separately from the rest of the daemon since we can
otherwise not make sure that it happens before the DB is freed and we might
still need the DN, and be running in a DB transaction, for some destructors to
run.
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch guts gossipd of all peer-related functionality, and hands
all the peer-related requests to channeld instead.
gossipd now gets the final announcable addresses in its init msg, since
it doesn't handle socket binding any more.
lightningd now actually starts connectd, and activates it. The init
messages for both gossipd and connectd still contain redundant fields
which need cleaning up.
There are shims to handle the fact that connectd's wire messages are
still (mostly) gossipd messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd combines the information if it knows it, but that's really the
job of 'listnodes'. More importantly, channeld won't have access to
this information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I saw an error in test_gossip_weirdalias in Travis, where listnodes(nodeid)
returned *BOTH* nodes; it happened to fail because [0] was the wrong one, but
it would have passed if the order had been different.
This helper asserts that we really do only have one element, and should
catch such bugs faster.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I could not figure out why test_announce_address suddenly stopped working:
I had previously been using DEVELOPER=1 on the cmdline for historical
reasons when testing locally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't waiting for gossipd to actually process the
dev_set_max_scids_encode_size message, so under Travis it sometimes
split the reply before processing that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this test we tell l3 to disconnect on sending WIRE_CHANNEL_ANNOUNCEMENT.
This is hit by gossipd (to disconnect from l2) but *also* channeld to
disconnect from l4. That's OK, because normally by this point l4 has
sent its real channel_update.
However, the next patch introduces a delay in sending channel_updates,
meaning l4 hasn't sent it yet. If l3 doesn't reconnect to l4, we
never get the channel_update and the test which expects l1 to eventually
see both sides of the channel fails.
So we manually reconnect then. Note that we remove the redundant
'dev-no-reconnect' option from l2: it's added automatically as it
doesn't set 'may_reconnect'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd will ignore the second one, but doing it in the front end
gives an explicit error message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Python dict can't have duplicate entries, but some options can be specified
multiple times. The easiest way is to put a list in the dict.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
E ConnectionRefusedError: [Errno 111] Connection refused
And in debug.log:
2018-05-17T04:06:35Z Warning: Config setting for -rpcport only applied on regtest network when in [regtest] section.
Unfortunately, current versions including 0.16.1 *ignore* the contents of
a '[regtest]' section, so we need it in *both* places.
Also remove the misleading 'rpcport' initialization which we always
override.
Note that we don't fix this message though:
2018-05-17T04:06:35Z Config options rpcuser and rpcpassword will soon be deprecated. Locally-run instances may remove rpcuser to use cookie-based auth, or may be replaced with rpcauth. Please see share/rpcuser for rpcauth auth generation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For some reason, we created a second bitcoin.conf in the regtest/ directory,
which AFAICT nothing uses.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's not optional for our test setup, and this makes it easier to invoke
bitcoin-cli manually, for example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As of bitcoind 0.16.1, you can't send a single-input OP_RETURN output,
as you get 'tx-too-small'.
sendrawtx exit 26, gave error code: -26?error message:?tx-size-small (code 64)?'
So instead we use the minimum fee we can, but otherwise ignore it and
don't wait for it to be mined.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
New codes: FUND_MAX_EXCEEDED, FUND_CANNOT_AFFORD, FUND_DUST_LIMIT_UNMET.
The error message "Cannot afford fee" was not exactly correct because
it would also occur if the amount requested could not be afforded. So
I changed it to the more generic "Cannot afford transaction".
Other things:
* Fixed off-by-one satoshi in fundchannel manpage.
* Changed 'arror' to 'error' because we are not pirates.
I still believe that 2 weeks is way too much, but we were promised that these
defaults would be slowly reduced to saner values as the stability increases.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Waiting for 'Disabling channel' is not enough, since it's async to the
actual tx broadcast: I caught a case where the unilateral close wasn't
in the block, and so failed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
72d103d6bb deprecated DEVELOPER env var
in favor of config.vars, but didn't update test_closing.py or test_gossip.py.
5d0a54b7f0 then removed the explicit
DEVELOPER= setting from Travis.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The modern, pytest based, tests now clean up after themselves by removing
directories of successful tests and the base directory if there was no failure.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Previous replies weren't large enough; add another channel and then
use IO tracing to make sure the reply is zlib encoded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we currently only (ab)use it to send everything, we need a way to
generate boutique queries for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The gossip-query spec enhancements say not to forward an channel_announcement until
you have receive a channel_update. This test fails for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When opening a number channels from a single node we could end up not waiting
for the funding tx to make it into the mempool, instead triggering on a previous
`sendrawtransaction` or `CHANNEL_NORMAL` in the logs. This now checks that the
actual funding transaction makes it into the mempool and that we wait for the
depth change for that specific channel.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We don't have any connection yet, so how could they be active? Disable both
sides to avoid trying to route through them or telling others to use them as
`contact_points` in invoices.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Mostly `lightningd` complaining about not being able to estimate fees. Safes us
a lot of log space when some tests time out, and safes us a few context switches
between log appender and log watchers.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were sort of assuming that one side telling it's ok would automagically make
the other side succeed as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
While 1 second is very generous, it resolving the HTLCs may take longer, so we
just loop over this instead of making it one-shot
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The RPC calls aren't really free, so let's wait for absolute times, computed
from the `start_time` instead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can restore them once we get parallel testing on Travis, but meanwhile
we time out because of the 30 seconds bitcoind poll.
Running on my laptop with --duration=5:
=========================== slowest 5 test durations ===========================
184.07s call tests/test_lightningd.py::LightningDTests::test_multiple_channels
156.66s call tests/test_lightningd.py::LightningDTests::test_forward
155.77s call tests/test_lightningd.py::LightningDTests::test_closing
126.83s call tests/test_lightningd.py::LightningDTests::test_waitinvoice
126.11s call tests/test_lightningd.py::LightningDTests::test_waitanyinvoice
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because we have too many which are never used and I don't want to document
them.
1. Remove unused anchor_onchain_wait. When implemented, it should be
hardcoded to 100 or more.
2. Remove anchor_confirms_max. 10 always reasonable, and we can readd
an override option should someone need it.
3. max_htlc_expiry should be the same as locktime_max (which increases
from 3 to 5 days by default): they're both a limit on how long
funds can be locked up.
4. channel_update_interval should always be a dev option.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make --override-fee-rates a dev option. We use default-fee-rate in
its place, which (since bitcoind won't give fee estimates in regtest
mode for short chains) gives an effective feerate of 15000/7500/3750.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker points out that in test_forward, where we manually create a route,
we get an error back which contains an update for an unknown channel.
We should still note this, but it's not an error for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is something which generally shouldn't happen, but we didn't
notice it previously.
We ignore this warning in the case where a channel was deleted: this
happens because one side can send an update while the other notices
that the channel is closed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It gets confused if re-run due to flaky, since:
1. Using the same HSM, it generates the same spend and sees a previous one.
2. The block height numbers are off.
Fixes: #1479
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a 'bitcoind' global: getting it from inside one of the daemons
was a mistake I've copied widely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It simply wasn't waiting for us to register the peer before attempting to open a
connection. Moved into a separate test to be able rerun multiple times.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Someone could try to announce an internal address, and we might probe
it.
This breaks tests, so we add '--dev-allow-localhost' for our tests, so
we don't eliminate that one. Of course, now we need to skip some more
tests in non-developer mode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we're given a wildcard address, we can't announce it like that: we need
to try to turn it into a real address (using guess_address). Then we
use that address. As a side-effect of this cleanup, we only announce
*any* '--addr' if it's routable.
This fix means that our tests have to force '--announce-addr' because
otherwise localhost isn't routable.
This means that gossipd really controls the addresses now, and breaks
them into two arrays: what we bind to, and what we announce. That is
now what we return to the master for json_getinfo(), which prints them
as 'bindings' and 'addresses' respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Add special option where an empty host means 'wildcard for IPv4 and/or IPv6'
which means ':1234' can be used to set only the portnum.
2. Only add this protocol wildcard if --autolisten=1 (default)
and no other addresses specified.
3. Pass it down to gossipd, so it can handle errors correctly: in most cases,
it's fatal not to be able to bind to a port, but for this case, it's OK
if we can only bind to one of IPv4/v6 (fatal iff neither).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's become clear that our network options are insufficient, with the coming
addition of Tor and unix domain support.
Currently:
1. We always bind to local IPv4 and IPv6 sockets, unless --port=0, --offline,
or any address is specified explicitly. If they're routable, we announce.
2. --addr is used to announce, but not to control binding.
After this change:
1. --port is deprecated.
2. --addr controls what we bind to and announce.
3. --bind-addr/--announce-addr can be used to control one and not the other.
4. Unless --autolisten=0, we add local IPv4 & IPv6 port 9735 (and announce if they are routable).
5. --offline still overrides listening (though announcing is still the same).
This means we can bind to as many ports/interfaces as we want, and for
special effects we can announce different things (eg. we're sitting
behind a port forward or a proxy).
What remains to implement is semi-automatic binding: we should be able
to say '--addr=0.0.0.0:9999' and have the address resolve at bind
time, or even '--addr=0.0.0.0:0' and have the port autoresolve too
(you could determine what it was from 'lightning-cli getinfo'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to change the JSONRPC, so let's put an explicit 'port' into
our node class.
We initialize it at startup time: in future I hope to use ephemeral ports
to make our tests more easily parallelizable.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This used to be the port, but since we no longer have fixed ports, and we start
them in random order we can't easily distinguish them by the port anymore. Just
use a numeric ID that matches their lightning-dirs.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We add an attempt number to the test directory to improve the test-isolation and
allow for multiple reruns of the same test, without re-using any of the
lightning-dirs or bitcoin-datadirs.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the first example of the py.test style fixtures which should allow us to
write much cleaner and nicer tests.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We never really look at the output, and it's rather noisy, so we just stop
writing to the log.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Especially with valgrind this allows us to safe quite some time on multi-core
machines since startup is about the most expensive operation in the tests. In a
simple test spinning up 3 nodes this gave me about 25% - 30% test time
reduction. The effects will be smaller for single core machines, hoping Travis
handles these gracefully.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can create the hsm file from python directly; that works even if we
don't have DEVELOPER set, and is simpler.
We add a test that the aliases are correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're getting spurious closures, even on mainnet. Using --ignore-fee-limits
is dangerous; it's slightly less so to lower the minimum (which is the
usual cause of problems).
So let's halve it, but beware the floor.
This is a workaround, until we get independent feerates in the spec.
Fixes: #613
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At least say whether we failed to connect at all, or failed cryptographic
handshake, or failed reading/writing init messages.
The errno can be "Operation now in progress" if the other end closes the
socket on us: this happens when we handshake with the wrong key and it
hangs up on us. Fixing this would require work on ccan/io though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Lifetime of 'struct reaching' now only while we're actively doing connect.
2. Always free after a single attempt: if it's an important peer, retry
on a timer.
3. Have a single response message to master, rather than relying on
peer_connected on success and other msgs on failure.
4. If we are actively connecting and we get another command for the same
id, just increment the counter
The result is much simpler in the master daemon, and much nicer for
reconnection: if they say to connect they get an immediate response,
rather than waiting for 10 retries. Even if it's an important peer,
it fires off another reconnect attempt, unless it's actively
connecting now.
This removes exponential backoff: that's restored in next patch. It
also doesn't handle multiple addresses for a single peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Got some intermittant failures, mainly caused by the tests being slow
enough that the peer reconnected. We should always suppress
reconnection if we can, and not stress too much in the !DEVELOPER case
where we can't.
We should turn off dev-no-reconnect *always* unless told we will
reconnect, and since we can't if !DEVELOPER, don't do the connection
check there.
Instead of adding an option to line_graph, we remove it in favor
of connect (since we only use it with n=2 anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This shaves off about 15% of our integration testing suite on my machine. It
assumes we never reorg below the first block the node starts with, which is true
for all tests, so it's safe.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Simplification of the offset calculation to use the rescan parameter, and rename
of `wallet_first_blocknum`. We now use either relative rescan from our last
known location, or absolute if a negative rescan was given. It's all handled in
a single location (except the case in which the blockcount is below our
precomputed offset), so this should reduce surprises.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Careful log examination revealed that we were generating a block before one
of the mutual close txs had entered the mempool. This is rare because it
means that both peers have to be too slow.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a weird failure which was caused by an unexpected disconnect and
reconnecct. Since we are prersistend and recover from these, they can
slip through our tests; most tests don't involve reconnection, so we
need to catch this explicitly.
For the connect() helper, we always suppress reconnection; tests which
want it all want other options so don't use this helper anyway. (Actually,
after I said that, test_closing_while_disconnected was added when I
rebased, which did require it, so I had to open-code that one).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also report tx and txid, and whether we closed unilaterally or
bilaterally, if we could close the channel.
Also make a manpage.
Fixes: #1207Fixes: #714Fixes: #622
Mixing the test into the existing test allows us to reduce the execution, but
adds a bit of complexity, so I added two small helpers.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can have more than one; eg we might offer both bech32 and a p2sh
address, and in future we might offer v1 segwit, etc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to have some default options that can then be overridden easily
on a per-test basis.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
In the next commit we remove channels whose outpoint was spent from our network
view, so checking for it will not work anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This fixes the root cause of https://github.com/ElementsProject/lightning/issues/1212
where we deleted the payment because we wanted to retry, then retry failed
so we had an (old) HTLC without a matching payment. We then fed that
HTLC to onchaind, which tells us it's missing, and we try to fail the
payment and deref a NULL pointer.
Fixes: #1212
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We say "in N blocks" but we actually mean "N blocks after this tx" which is
actually N-1 or less. Change wording and tighten tests which misunderstood
this.
Also, the 'assert not l1.daemon.is_in_log('onchaind complete, forgetting peer')'
are unlikely to work until the daemon has actually seen the block, so add
sync_blockheight before all of those.
These changes reveal some sloppy testing, which we fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the following patch applied, we could clearly see onchaind try to
broadcast the timeout tx one block too early:
sendrawtx exit 26, gave error code: -26?error message:?non-final (code 64)?
This is because of an out-by-one error in calculating the relative
depth required, since the out->tx_blockheight is already 1 before the
current block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was revealed in #1114; onchaind isn't actually completely idempotent
due to fee changes (and the now-fixed change in keys used).
This triggers the bug by restarting with different fees, resulting in
onchaind not recognizing its own proposal:
2018-03-05T09:38:15.550Z lightningd(23076): lightning_onchaind-022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59 chan #1: STATUS_FAIL_INTERNAL_ERROR: THEIR_UNILATERAL/OUR_HTLC spent with weird witness 3
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is twice the 'update_channel_interval' we get handed.
We delete the non-existent channel_add_connection and delete_connection
declarations from the header too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently give them a free pass. The simplest fix is to give them
an old timestamp on initialization.
We still skip unannounced channels, on the assumption that they're
ours. And we set the last_update_timestamp to -1 when we convert to
gossip_getchannels_entry to indicate no update.
This breaks the DEVELOPER=1 pruning test, since we hardcode the 1
week timeout. That's fixed in the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I initially disabled this until 0.16 because the withdraw command
was modified to require 'brct1' addresses for regtest.
But commit bd07a9 allows a regular testnet address to work
just as well. So renable this check.
FWIW, the tests without valgrind take 662 seconds before we reduced
the number of blocks, and only 648 seconds now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
0.16.0 is required since we rely on it for some tests and the block
reduction allows us to waste less time during setup. 121 blocks were
chosen so that we have at least one mature output to spend.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If bitcoind is still fetching blocks, we might accidentally inject
the failure between getblockhash and getblock. That's OK, but
it's not the failure we test for.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Looks like rebasing the flake8 branch caused breakage, as new violations
had occurred since that check was written
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This test generates a pre-bip173 testnet P2PKH address with bitcoin 0.15.1
which fails under the new verification checks for the bip173 network name.
This should be renabled with bitcoin 0.16 which can generate a bip173 address
for regtest with the bcrt prefix.
* Modifies invoice command to have the following format
invoice <msatoshi> <label> <desc> <?expiry> <?fallbackaddr>
* Adds support for Segwit bcrt1 addresses for withdraw
* Add test case for fallback address in invoice creation
* Create a common json_tok_address_scriptpubkey to be used
by invoice and withdraw commands.
We can do similar tricks to test other things, even to run without a
real bitcoind for faster testing, but for now we simply exit if a magic
file says so.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The billboard is now far more useful to tell what's going on, and this
gets us closer to a state == owner mapping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also fold opening_got_hsm_funding_sig() into the caller; it was
previously a callback before we decided to always use the HSM
synchronously.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each peer can have one 'uncommitted' channel, which is in the process
of opening. This is used for openingd, and then on return we convert
it into a full-fledged struct channel and commit it into the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Fix dev_setfees to set slow and normal fees correctly.
Due to a bug def_setfees(100, slow=100) would instead set immediate and
normal fees to 100. This behavior has been updated to set fees to
correct values, make the values truly optional as per documentation and
unit test this behavior.
* Fix pay() to set msatoshi, description and risk factor properly
Due to a bug pay(invoice, description='1000') resulted in payment of
1000 msatoshi instead. This was fixed and covered with tests.
* Fix named args in listpayments, listpeers and connect
* Do not pass None to methods where it is default value
* Make description on invoice and pay match.
Suggested-by: @ZmnSCPxj
* Fix dev_setfees to set slow and normal fees correctly.
Due to a bug def_setfees(100, slow=100) would instead set immediate and
normal fees to 100. This behavior has been updated to set fees to
correct values, make the values truly optional as per documentation and
unit test this behavior.
* Fix pay() to set msatoshi, description and risk factor properly
Due to a bug pay(invoice, description='1000') resulted in payment of
1000 msatoshi instead. This was fixed and covered with tests.
* Fix named args in listpayments, listpeers and connect
* Do not pass None to methods where it is default value
* Make description on invoice and pay match.
Suggested-by: @ZmnSCPxj
We now keep a list of commands for the jcon instead of a simple
'current' pointer: the assertions become a bit more complex, but
the rest is fairly mechanical.
Fixes: #1007
Reported-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to set expiry, otherwise waitinvoice would take 1 hr, and we
can't read once for every cmd, since each read may consume more than
a single result, and we block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we read a command, we are supposed to io_wait until it finishes.
However, we are actually woken in two places: when it's complete
(which is correct), and when it's written out (which is wrong).
We don't care when it's written out, only when it's finished:
refactor to make json_done() free and NULL the old ->current,
rather than have the callers do it. Now it's clear that it's
ready for both new output and new input.
Fixes: #934
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And return the correct error message for the channel they give, if
they try to re-establish on an error channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Much like the database; peer contains id, address, channel contains
per-channel information. Where we create a channel, we always create
the peer too.
For the moment, peer->log and channel->log coexist side-by-side, to
reduce some of the churn.
Note that this changes the API to dev-forget-channel: if we have more
than one channel, we insist they specify the short-channel-id.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We get intermittant failure: WIRE_UNKNOWN_NEXT_PEER (First peer not ready)
because CHANNELD_NORMAL and actually telling gossipd that the channel
is available are distinct things: we need both.
(For test_closing_different_fees, we were testing CHANNELD_NORMAL on
the peer, not on l1, too).
But we may also directly send the announcement sigs if the height is
sufficient, so the simplest is to unify the messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
> assert [c['active'] for c in l2.rpc.listchannels()['channels']] == [True, True]
E AssertionError: assert [True, False] == [True, True]
E At index 1 diff: False != True
E Full diff:
E - [True, False]
E + [True, True]
```
We don't actually wait that l2's gossipd has also processed the message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Sometimes the super-low-fee commitment tx succeeds, and we see
that 'sendrawtx exit 0' instead of the one we're expecting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
fundrawtransaction returns before the actual sendrawtx, so we can
end up mining blocks before it's sent, thus not having enough confirms.
We handle this correctly in fund_channel, but this test open-codes it
for speed with multiple peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It was taking over 10 minutes under valgrind, causing Travis to time it out.
This shrinks it to its essential tests, and also batches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the new 'human-readable' mode of lightning-cli, this actually produces
a valid config file. It's a bit hacky though...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, decode error messages correctly and do the right thing with
messages about other channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We are now too quick in disabling the channel for us to attempt a
payment. We need to separate into getroute and sendpay to trigger this
now.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was flaky because we didn't wait for the fee update to complete
and were using the old, way too small, fees, which upset bitcoind.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
delinvoice was orginally documented to only allow deletion of unpaid
invoices, but there might be reasons to delete paid ones or unexpired ones.
But we have to avoid the race where someone pays as it's deleted: the
easiest way is to have the caller tell us the status, and fail if
it's wrong.
Fixes: #477
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to have to support multiple channels per peer, even if only
when some are onchain. This would break the current listpeers, so
change it to an array (single element for now).
Other cleanups:
1. Only set connected true if daemon is not onchaind.
2. Only show netaddr if connected; don't make it an array, call it `address`
in comparison with `addresses` in listnodes.
3. Rename `channel` to `short_channel_id`
4. Add `funding_txid` field for voyeurism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Individual tests can always re-enable them, though.
[ More test fallout fixes by Christian Decker ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Seems to avoid the nasty python resource warnings, as well as the
fatal 'ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This, of course, should never be used. But it helps maintain connections
for the moment while we dig deeper into feerates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For now this just tests that we are sending out keepalive
channel_updates for all local channels.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since most callers use positional arguments, we should allow a 'null'
literal where we require no value at all.
Also adds some more value tests.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Paid invoices need to know how much was actually paid: both for the case
where no 'msatoshi' amount was specified, and for the normal case, where
clients are permitted to overpay in order to help them disguise their
payments.
While we migrate the db, we leave this field as 0 for old paid
invoices. This is unhelpful for accounting, but at least clearly
indicates what happened if we find this in the wild.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'rhash' is the old terminology, but 'payment_preimage' and
'payment_hash' were decided on for the BOLTs, so we should fix that here.
We still use rhash internally, but that's much easier to fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> assert [c['active'] for c in l2.rpc.getchannels()['channels']] == [True, True]
E AssertionError: assert [False, True] == [True, True]
E At index 0 diff: False != True
E Full diff:
E - [False, True]
E + [True, True]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to make sure all the updates are known to gossip. Since
one is the local update, we change that message to look the same.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Receiving them in channeld is not enough to avoid the race:
route = l1.rpc.getroute(l3.info['id'], 4999999, 1)["route"]
...
ValueError: RPC call failed: Could not find a route
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wait the the receipt of the CHANNEL_UPDATE message by channeld,
but that doesn't mean it reached gossipd yet, causing spurious test
failure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CI always runs with TEST_DEBUG=1 which prints logs anyway, and testing
locally should also be done this way, combined with pytest which
captures the logs. No need to duplicate the functionality of pytest.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we seem to have some isolation concerns when re-generating the
same HSM secret and re-parsing the blockchain some blocks in the past.
This also alleviates the problem of printing to a logging stream that
has been closed. Previously bitcoind would keep running despite a test
had failed and continue logging to the, now closed, StringIO that
py.test uses when capturing stdout.
The performance impact seems to be 1-3 second per test, not too bad
IMHO for increased test isolation and cleaner logs:
|--------------------+---------------+----------|
| | No_valgrind | Valgrind |
|--------------------+---------------+----------|
| bitcoind per suite | 10 min 24 sec | 46:15.31 |
| bitcoind per test | 11 min 38 sec | 49:21.64 |
|--------------------+---------------+----------|
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I was examining a test_onchain_timeout failure, and realized that we
were forgetting a peer even though we'd just spent the HTLC_TIMEOUT_TX!
This reveals that we weren't resolving an output when we stole the preimage
from it, like we should.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we panic when we see our root reorg out, even if we're not doing
anything yet, restoring the 100 block margin is the simplest fix.
Unfortunately this means adding a 100-block spacer in the tests, so things
don't get confused.
Fixes: #511
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We stopped too early; we should continue and make sure it all goes well.
This means we have to fix them to be deterministic: by generating 2
blocks at once in test_htlc_in_timeout, we raced between fulfill and
timeout on the HTLC. Now it's always fulfilled.
Also, fixed confusing comments: l1 doesn't drop to chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
OUR_HTLC_TIMEOUT_TO_US = normal tx, used to timeout htlc in their commit tx.
OUR_HTLC_TIMEOUT_TX = dual-sig tx with delay, used to timeout htlc in our commit tx.
Only one test looks at that string, so fix that too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The test_reconnect_normal test is failing rather consistently on 32bit
architectures, disabling to reduce noise. Issue #468 tracks progress.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
With python-bitcoinlib==0.9.0 it appears that the URL based auth
information is no longer used, so we fall back to reading the config
file for the bitcoind daemon instead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Relatively simple: until we reach funding-depth the channels should be
known locally, so we can already route through them, but they should
not be announced to peers to which the connection is non-local.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If send_htlc_out() fails, it doesn't initialize pc->out; that can
make us think it's still in progress.
Reported-by: Jonas Nick
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should not disconnect from a peer just because it fails opening; we
should return it to gossipd, and give a meaningful error.
Closes: #401
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't use it yet, but now we'll decode correctly.
See: https://github.com/lightningnetwork/lightning-rfc/pull/317
lightning-rfc commit: ef053c09431442697ab46e83f9d3f86e3510a18e
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If you run locally, it fails occasionally; presumably because it
sees previous funds. Use a random HSM key for that teste.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I noticed some breakage with git master:
1. getinfo no longer supported (for us, use getblockchaininfo)
2. generate no longer supported (use generatetoaddress)
Both these options are supported at least in 0.15, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to happen when we manage to check between the
channel_announcement and the channel_update being processed.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Add two simple tests: one for a single direct payment and one with
hundreds of parallel payments, reusing the same route.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are announcing that we are willing to accept incoming payments with
current_height + min_final_cltv_expiry + slack, assuming that the
sender adds some slack. In particular we'd reject the payment if
slack=0 which is allowed by the spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Raising the exception about non-zero exit values into the
teardown. This was previously masking the valgrind errors. Now
valgrind errors > crash errors > non-zero return value.
Still hoping to catch that elusive [7, 0] return value on travis.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These need to be different for testing the example in BOLT 11.
We also use the cltv_final instead of deadline_blocks in the final hop:
various tests assumed 5 was OK, so we tweak utils.py.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I run with ulimit -c unlimited, and valgrind leaves core files like
valgrind-errors.22114.core.22114 which test_lightning.py tries to
parse as log files.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It crashes under valgrind, causing a valgrind error: valgrind gives us a
backtrace anyway, so we don't need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes it easier to compare before/after failures. Ideally, we should
run under Travis both with this option and with the seed based on the
entire tmp path (which is still reproducible with determination, but
not fixed every run like this is).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones. We can handle many callbacks internally now.
We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently assume the daemon gives up; gossipd won't, and we want to
test it there too.
This reveals a bug (returning io_close() is bad if the call is to
duplex()), and breaks a test which now continues after dropping a
packet..
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful if we want to drop & suppress, for example. We change '=' to mean
do nothing to the packet.
We use this to clean up the test_reconnect_sender_add test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make the ip/port optional, so they should go at the end.
In addition, using ip:port is nicer, for gethostbyaddr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer. It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.
The free order is now different, and the test_reconnect_openingd was
overprecise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have HTLC persistence we'd also like to test it. This
kills the second node in the middle of an HTLC, it'll recover and
finish the flow.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This broke somewhere in the recent changes, because we override
TailalbleProc stop(). Break out log extractor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Moved the flagging for allowed failures into the factory getter, and
renamed into `may_fail`. Also stopped the teardown of a node from
throwing an exception if we are allowed to exit non-cleanly.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
A failed returncode check could result in the cleanup for other
lightningds to be skipped. Now make sure to cleanup all and then
rethrow an exception that contains all returncodes.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We used to simply kill the daemon, which in some cases could result in
half-written crashlogs and similar artifacts such as half-completed
RPC calls. Now we ask lightningd to stop nicely, give it some time and
only then kill it. We also return the returncode of the daemon.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
peer_fail_permanent() frees peer->owner, but for bad_peer() we're
being called by the sd->badpeercb(), which then goes on to
io_close(conn) which is a child of sd.
We need to detach the two for this case, so neither tries to free the
other.
This leads to a corner case when the subd exits after the peer is gone:
subd->peer is NULL, so we have to handle that too.
Fixes: #282
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that it should really be a flag to daemon on construction, too,
but that may interfere with another concurrent branch so I've deferred.
Suggested-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We re-use the value for reasonable_depth given by the master, and we
tell it when our timeout transactions reach that depth.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the next test, we wait for multiple 'sendrawtx exit 0' which
doesn't work because we use a set not a list, and the current code
would match multiple against the same thing. The result was we didn't
wait for the final sendrawtransaction, and occasionally had test
failures as a result.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we sent out an HTLC-Timeout or HTLC-Success tx, we need to spend
it after the timeout so it's safely in our wallet.
We generalize the tx_type OUR_UNILATERAL_TO_US_RETURN_TO_WALLET to
OUR_DELAYED_RETURN_TO_WALLET, since we use it for HTLC transactions too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we see an offered HTLC onchain, we need to use the preimage if we
know it. So we dump all the known HTLC preimages at startup, and send
new ones as we discover them.
This doesn't cover preimages we know because we're the final
recipient; that can happen if an HTLC hasn't been irrevocably
committed yet. We'll do that in a followup patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightningd can crash on shutdown if it's in the middle of getchaintips;
we free the conn, the finished callback is called (process_chaintips),
and it reports that it received an empty result.
The simplest fix is to set a flag in the struct bitcoind destructor,
and avoid the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We simply kill lightningd; we should stop it properly and have a timeout
to kill it if that fails. However, that's beyond my python skills :(
So we just look for crash.log. Unfortunately, we usually kill
lightningd before it's finished writing it. So we look for it and
don't kill lightningd, just wait in this case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the step where we broadcast the transaction to the network and
a nice place to extract the change from the transaction.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
For the permfail tests the sendpay call is supposed to fail, so this
was printing stacktraces upon success. Running in futures captures any
thrown exceptions and rethrows them when calling `result()`, in our
case we just ignore them.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far we were always using the deadline in the announcements, that's
obviously not good, so this introduces the parameter as per spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't killing it. Eventually it would die, and peer_owner_finished()
would access subd->peer->owner, but that peer was freed already.
Closes: #261
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To reproduce the next bug, I had to ensure that one node keeps thinking it's
disconnected, then the other node reconnects, then the first node realizes
it's disconnected.
This code does that, adding a '0' dev-disconnect modifier. That means
we fork off a process which (due to pipebuf) will accept a little
data, but when the dev_disconnect file is truncated (a hacky, but
effective, signalling mechanism) will exit, as if the socket finally
realized it's not connected any more.
The python tests hang waiting for the daemon to terminate if you leave
the blackhole around; to give a clue as to what's happening in this
case I moved the log dump to before killing the daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jl777 reported a crash when we try to pay past reserve. Fix that (and
a whole class of related bugs) and add tests.
In test_lightning.py I had to make non-async path for sendpay() non-threaded
to get the exception passed through for testing.
Closes: #236
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was hoping to defer HTLC updates until we actually store HTLCs, but
we need to flush to DB whenever balances update as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the big one, and it's completely anticlimactic: it loads all
channels that have reached opening and are not marked as
closingd_complete into memory, that's it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I was hoping to trigger on more things from the bitcoind process, but
stuff like mempool is hard to trigger on. Reducing to info so we can
work a bit easier with pdb and the log becomes less noisy.
We'll need this for testing nodes going down during payment.
However, there's no good way to silence the threads that I can tell,
so we get a nasty backtrace from it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tracked down a bug, and couldn't figure out why valgrind wasn't
finding it.
From man valgrind:
--log-file=<filename>
Specifies that Valgrind should send all of its messages to the
specified file. If the file name is empty, it causes an abort.
There are three special format specifiers that can be used in the
file name.
%p is replaced with the current process ID. This is very useful for
program that invoke multiple processes. WARNING: If you use
--trace-children=yes and your program invokes multiple processes OR
your program forks without calling exec afterwards, and you don't
use this specifier (or the %q specifier below), the Valgrind output
from all those processes will go into one file, possibly jumbled
up, and possibly incomplete.
"possibly incomplete" indeed!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't registering reconnecting peers for broadcasts. Just
starting a timer is enough. Also added an integration test to check
that the gossip sync is being resumed.
This is a transitional state, while we're waiting to see the
closing tx onchain (which is To Be Implemented).
The simplest way to do re-transmission is to re-use closingd, and just
disallow any updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually don't need to transition if we're reconnecting, and logic
to go to CHANNELD_NORMAL was wrong: we checked that we'd seen funding tx
locked, but not that we'd received a msg from the remote peer.
We need to fix the tests now we no longer double-transition, too.
Fixes: #188
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we drop an HTLC_ADD packet, sometimes the commit timer fires
before we try to read from the fd. In this case, the payment is
considered committed and we don't fail.
We need manual commit to work around this, and also we'd need to do
the pay command asynchronously from python because it will block.
That's a bit out of scope for now, so just handle either way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep the scriptpubkey to send until after a commitment_signed (or,
in the corner case, if there's no pending commitment). When we
receive a shutdown from the peer, we pass it up to the master.
It's up to the master not to add any more HTLCs, which works because
we move from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's fairly ad-hoc, but we need to tell it to channeld when
it restarts, so we define it as the non-HTLC balance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When adding their HTLCs, it needs all the information. When failing,
it needs the id as key and the failure reason. When fulfilling, it
needs the id and payment preimage.
It also needs to know when we have received an revoke_and_ack or a
commitment_signed, to place in the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I have a test which waits for multiple occurrences of the same string,
but doesn't want them to overlap. Make wait_for_log() do the right thing,
so that it only looks for log entries since the last wait_for_log.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On my laptop under load, 5 seconds was no longer enough for legacy.
But this breaks async (they all see mempool increase, and fire
prematurely), so stop doing that.
I can't get this test to work at all, in fact, without this patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I actually hit this very hard to reproduce race: if we haven't process
the channeld message when block #6 comes in, we won't send the gossip
message. We wait for logs, but don't generate new blocks, and timeout
on l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES').
The solution, which also tests that we don't send announcement signatures
immediately, is to generate a single block, wait for CHANNELD_NORMAL,
then (in gossip tests), generate 5 more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We explicitly assert what state we're coming from, to make transitions
clearer.
2. Every transition has a state, even between owners while waiting for HSM.
3. Explictly step though getting the HSM signature on the funding tx
before starting channeld, rather than doing it in parallel: makes
states clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to do this on every connection, whether reconnecting or not,
so it makes sense for the handshake daemon to handle it and return
the feature fields.
Longer term I'm considering having the handshake daemon handle the
listening and connecting, and simply hand the fds back once the peers
are ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently create a peer struct, then complete handshake to find out
who it is. This means we have a half-formed peer, and worse: if it's
a reconnect we get two peers the same.
Add an explicit 'struct connection' for the handshake phase, and
construct a 'struct peer' once that's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
eg:
test_routing_gossip (__main__.LightningDTests) ... ERROR
======================================================================
ERROR: test_routing_gossip (__main__.LightningDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_lightningd.py", line 150, in tearDown
err_count += self.printValgrindErrors(node)
File "tests/test_lightningd.py", line 137, in printValgrindErrors
errors, fname = self.getValgrindErrors(node)
File "tests/test_lightningd.py", line 132, in getValgrindErrors
with open(error_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lightning-l106st0a/test_routing_gossip/lightning-1/valgrind-errors'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, the tests can mine blocks too soon, and the funding never
locks. This gives more of a chance, at least.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I couldn't actually figure out how to just dump them on error, so I
dump all the time. When running 3 lightningd + bitcoind, this separates
the logs nicely.
TODO: We should delete the directories on success!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
But it breaks:
test_forward (__main__.LightningDTests) ... lightningd_channel: Computed MAC does not match expected MAC, the message was modified.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mainly switching from the old include to the new include and adjusting
the actual size of the onion packet. It also moves `channel.c` to use
`struct hop_data`.
It introduces a dummy next hop in `channel.c` that will be replaced in
the next commit.
This moves all the non-legacy blackbox testing into python.
Before:
real 10m18.385s
After:
real 9m54.877s
Note that this doesn't valgrind the subdaemons: that patch seems to cause
some issues in the python framework which I am still chasing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Running long integration tests could result in `bitcoind` dropping the
connection inbetween calls, and since python-bitcoinlib does not
reconnect and/or retry, all subsequent tests would fail as well. This
patch switches to throwaway connections, each serving just one
request. It's easier than to reach into the bitcoinlib to reconnect
and reauth, but comes with some overhead, but I think it's acceptable
for the few bitcoin calls we actually perform.
By looking for 'Done loading' in the log output we should actually be
called after `SetRPCWarmupFinished` in bitcoind. Only then is it safe
to make RPC calls. This resulted in the test suite being a bit flaky.