This is a noop if we're opening a new channel (channel_fees_can_change(channel)
is false until funding locked in), but important if we're restarting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't update a channel's feerate on reestablishment: we insert a restart
in test_onchain_different_fees() (which we'll need soon anyway) to show it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.
As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.
Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.
This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer. To provoke a temporary failure in test_pay_disconnect we
take the node offline.
Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't try to unilaterally close after a restart, *and*
we can tell onchaind to try to use the point to recover funds when the
peer unilaterally closes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).
Otherwise, we check the results they sent were valid. The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tests were failing when in the same thread after a test which set
log_all_io=True, because SIGUSR1 seemed to be turning logging *off*.
This is due to Python using references not copies for assignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is required for the next test, which has to log messages from channeld
as soon as it starts (so might be too late if it sends SIGUSR1).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ignore incoming for now, but this means we advertize the option and
we send the required fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
peer features are only kept for connected peers (as they can change),
but we didn't update them on reconnect. The main effect was that
after a restart we displayed the features as empty, even after
reconnect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. l1 update_fee -> l2
2. l1 commitment_signed -> l2 (using new feerate)
3. l1 <- revoke_and_ack l2
4. l1 <- commitment_signed l2 (using new feerate)
5. l1 -> revoke_and_ack l2
When we break the connection after #3, the reconnection causes #4 to
be retransmitted, but it turns out l1 wasn't telling the master to set
the local feerate until it received the commitment_signed, so on
reconnect it uses the old feerate, with predictable results (bad
signature).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful it we want to intercept bitcoin-cli first.
We move the getinfo() caching into start(), as that's when we can actually
use RPC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to use it to override specific commands. It's non-valgrinded
already since we use '--trace-children-skip=*bitcoin-cli*' so the overhead
should be minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We got an index error, because status had only one field (onchaind not
started yet).
> wait_for(lambda: only_one(p.rpc.listpeers(l1.info['id'])['peers'][0]['channels'])['status'][1] == 'ONCHAIN:Tracking mutual close transaction')
E IndexError: list index out of range1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normal wallet txs get reconfirmed as blocks come in, but ones which need
closeinfo are more fragile, so we do it manually using txwatch for them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In one case, the channel_update which we expected to activate the channel
from l2 was suppressed as redundant. This is certainly valid, so just
check the results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Waiting for three node_announcments isn't always enough, since l2 can
publish two of them (an independent bug). Do the more Right Thing and
just wait for 30 seconds of no input...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. connect convenience variable for improved readabilty.
2. a comment explaining that timer is on channel, not HTLC.
3. use modern python style in test_htlc_send_timeout
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now sending a ping makes sense: it should force the other end to send
a reply, unblocking the commitment process.
Note that rather than waiting for a reply, we're actually spinning on
a 100ms loop in this case. But it's simple and it works.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently, if we don't realize a TCP connection is down, we almost
certainly don't find out until *after* we're sent the
commitment_signed message, in which case we cannot fail the incoming
HTLC.
This test demonstrates that. Note the 30 second sleep: we should really
run Travis tests in parallel!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we have an address hint, we start with that, but we'll use
node_announcement information if required.
Note: we (ab)use the address hint when restoring from the database
or reconnecting, even if the connection was *incoming*. That meant
that the recipient of a connection would *never* manage to connect out.
We still don't take multiple addresses from the DNS seeds: I assume we
should, since there could be IPv4 and IPv6.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd tells master about every disconnection, and master knows
whether it's important to reconnect. Just get the master to invoke a new
connect command if it considers the peer important!
The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect. To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, all opening_read_peer_msg() callers need to know there
was an error (presumably, negotiating) so they can stop, but we should
not exit.
This lets us reenable the final disabled test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.
lightningd no longer queries us for getpeers: it knows more than we do
already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.
(Previously the connection stayed in connectd, so there was no such
window).
This is an occasional issue in a few places.
Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.
Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.
This is a deep structural change, so we do the minimum here and cleanup
in the following patches.
Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
find the race between us asking connectd to release the peer and it
reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
channel (until it has succeeded): it's very transitory anyway so not
worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
or a fundee, or an error in response to our request to fund a channel.
so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.
Openingd:
1. We always have information we need to accept them funding a channel (in
the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.
Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
exist; 'state' is now per-channel. It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
the peer hand over the connection to openingd. Our tests sometimes
checked getpeers() on the peer, and didn't see anything, so line_graph
needed updating.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, I found lightning_openingd processes after running
tests. When we use the dev_disconnect blackhole '0' option, they
stick around until the dev_disconnect file is truncated (there is only
so much you can do with only a file descriptor), so let's do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The following changes revealed this race, where expecting listchannels()
to contain two channels immediately after fund_channel() was racy.
We also derive the short_channel_id first, so we can search logs for the
exact messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patches get better at reconecting, so if we use dev-allow-localhost
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Saw this in Travis: technically we return from the dev_set_max_scids...
cmd after sending it to gossipd, but we should wait for it to log.
Adding an internal reply message for a dev command seems overkill.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If the IPv6 address was public, that changed the wireaddr and thus the ipv4 bind
would not be to a wildcard and would fail.
2. Binding two fds to the same port on both wildcard IPv4 and IPv6 succeeds; we only
fail when we try to listen, so allow error at this point.
For some reason this triggered on my digital ocean machine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd calculates our reserve based on the channel amount (even if
we're funding, to keep the calculation in one place), but it wasn't
reporting it back to the master daemon. We initialized it to 0 so that
valgrind wouldn't get upset, as it's part of a structure we send over
the wire.
Have openingd report back, and also initialize it to an impossible value
as extra assurance. And remove a stray (harmless but weird) semicolon.
Reported-by: Gálli Zoltán
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds one line with the onion and the channel_update we extract from
it. This in turn allows us to check that the channel_update in the onion is not
type prefixed, and that we patch it correctly before passing it to gossipd.
The easiest way to do this is to play with the 'wallet_tx' semantics
and have 'amount' have meaning even when 'all_funds' is set.
Note that we change the string 'Cannot afford funding transaction' to
'Cannot afford transaction' as this code is also used for withdrawls.
Inspired-by: molz on #c-lightning
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logs in various Travis failures show that it takes 20 seconds just for
closingd to read the init message. As a result, the close times out (default
is 30 seconds).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were *supposed* to be waiting for the next commitment tx so we
made sure the one we broadcast was old, *but* the 'revoke_and_ack'
we were waiting for could be matched by the completion of the previous
'revoke_and_ack'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This needs to be done separately from the rest of the daemon since we can
otherwise not make sure that it happens before the DB is freed and we might
still need the DN, and be running in a DB transaction, for some destructors to
run.
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch guts gossipd of all peer-related functionality, and hands
all the peer-related requests to channeld instead.
gossipd now gets the final announcable addresses in its init msg, since
it doesn't handle socket binding any more.
lightningd now actually starts connectd, and activates it. The init
messages for both gossipd and connectd still contain redundant fields
which need cleaning up.
There are shims to handle the fact that connectd's wire messages are
still (mostly) gossipd messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd combines the information if it knows it, but that's really the
job of 'listnodes'. More importantly, channeld won't have access to
this information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I saw an error in test_gossip_weirdalias in Travis, where listnodes(nodeid)
returned *BOTH* nodes; it happened to fail because [0] was the wrong one, but
it would have passed if the order had been different.
This helper asserts that we really do only have one element, and should
catch such bugs faster.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I could not figure out why test_announce_address suddenly stopped working:
I had previously been using DEVELOPER=1 on the cmdline for historical
reasons when testing locally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't waiting for gossipd to actually process the
dev_set_max_scids_encode_size message, so under Travis it sometimes
split the reply before processing that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this test we tell l3 to disconnect on sending WIRE_CHANNEL_ANNOUNCEMENT.
This is hit by gossipd (to disconnect from l2) but *also* channeld to
disconnect from l4. That's OK, because normally by this point l4 has
sent its real channel_update.
However, the next patch introduces a delay in sending channel_updates,
meaning l4 hasn't sent it yet. If l3 doesn't reconnect to l4, we
never get the channel_update and the test which expects l1 to eventually
see both sides of the channel fails.
So we manually reconnect then. Note that we remove the redundant
'dev-no-reconnect' option from l2: it's added automatically as it
doesn't set 'may_reconnect'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd will ignore the second one, but doing it in the front end
gives an explicit error message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Python dict can't have duplicate entries, but some options can be specified
multiple times. The easiest way is to put a list in the dict.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
E ConnectionRefusedError: [Errno 111] Connection refused
And in debug.log:
2018-05-17T04:06:35Z Warning: Config setting for -rpcport only applied on regtest network when in [regtest] section.
Unfortunately, current versions including 0.16.1 *ignore* the contents of
a '[regtest]' section, so we need it in *both* places.
Also remove the misleading 'rpcport' initialization which we always
override.
Note that we don't fix this message though:
2018-05-17T04:06:35Z Config options rpcuser and rpcpassword will soon be deprecated. Locally-run instances may remove rpcuser to use cookie-based auth, or may be replaced with rpcauth. Please see share/rpcuser for rpcauth auth generation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For some reason, we created a second bitcoin.conf in the regtest/ directory,
which AFAICT nothing uses.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's not optional for our test setup, and this makes it easier to invoke
bitcoin-cli manually, for example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As of bitcoind 0.16.1, you can't send a single-input OP_RETURN output,
as you get 'tx-too-small'.
sendrawtx exit 26, gave error code: -26?error message:?tx-size-small (code 64)?'
So instead we use the minimum fee we can, but otherwise ignore it and
don't wait for it to be mined.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
New codes: FUND_MAX_EXCEEDED, FUND_CANNOT_AFFORD, FUND_DUST_LIMIT_UNMET.
The error message "Cannot afford fee" was not exactly correct because
it would also occur if the amount requested could not be afforded. So
I changed it to the more generic "Cannot afford transaction".
Other things:
* Fixed off-by-one satoshi in fundchannel manpage.
* Changed 'arror' to 'error' because we are not pirates.
I still believe that 2 weeks is way too much, but we were promised that these
defaults would be slowly reduced to saner values as the stability increases.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Waiting for 'Disabling channel' is not enough, since it's async to the
actual tx broadcast: I caught a case where the unilateral close wasn't
in the block, and so failed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
72d103d6bb deprecated DEVELOPER env var
in favor of config.vars, but didn't update test_closing.py or test_gossip.py.
5d0a54b7f0 then removed the explicit
DEVELOPER= setting from Travis.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The modern, pytest based, tests now clean up after themselves by removing
directories of successful tests and the base directory if there was no failure.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Previous replies weren't large enough; add another channel and then
use IO tracing to make sure the reply is zlib encoded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we currently only (ab)use it to send everything, we need a way to
generate boutique queries for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The gossip-query spec enhancements say not to forward an channel_announcement until
you have receive a channel_update. This test fails for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When opening a number channels from a single node we could end up not waiting
for the funding tx to make it into the mempool, instead triggering on a previous
`sendrawtransaction` or `CHANNEL_NORMAL` in the logs. This now checks that the
actual funding transaction makes it into the mempool and that we wait for the
depth change for that specific channel.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We don't have any connection yet, so how could they be active? Disable both
sides to avoid trying to route through them or telling others to use them as
`contact_points` in invoices.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Mostly `lightningd` complaining about not being able to estimate fees. Safes us
a lot of log space when some tests time out, and safes us a few context switches
between log appender and log watchers.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were sort of assuming that one side telling it's ok would automagically make
the other side succeed as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
While 1 second is very generous, it resolving the HTLCs may take longer, so we
just loop over this instead of making it one-shot
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The RPC calls aren't really free, so let's wait for absolute times, computed
from the `start_time` instead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can restore them once we get parallel testing on Travis, but meanwhile
we time out because of the 30 seconds bitcoind poll.
Running on my laptop with --duration=5:
=========================== slowest 5 test durations ===========================
184.07s call tests/test_lightningd.py::LightningDTests::test_multiple_channels
156.66s call tests/test_lightningd.py::LightningDTests::test_forward
155.77s call tests/test_lightningd.py::LightningDTests::test_closing
126.83s call tests/test_lightningd.py::LightningDTests::test_waitinvoice
126.11s call tests/test_lightningd.py::LightningDTests::test_waitanyinvoice
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because we have too many which are never used and I don't want to document
them.
1. Remove unused anchor_onchain_wait. When implemented, it should be
hardcoded to 100 or more.
2. Remove anchor_confirms_max. 10 always reasonable, and we can readd
an override option should someone need it.
3. max_htlc_expiry should be the same as locktime_max (which increases
from 3 to 5 days by default): they're both a limit on how long
funds can be locked up.
4. channel_update_interval should always be a dev option.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make --override-fee-rates a dev option. We use default-fee-rate in
its place, which (since bitcoind won't give fee estimates in regtest
mode for short chains) gives an effective feerate of 15000/7500/3750.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker points out that in test_forward, where we manually create a route,
we get an error back which contains an update for an unknown channel.
We should still note this, but it's not an error for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is something which generally shouldn't happen, but we didn't
notice it previously.
We ignore this warning in the case where a channel was deleted: this
happens because one side can send an update while the other notices
that the channel is closed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It gets confused if re-run due to flaky, since:
1. Using the same HSM, it generates the same spend and sees a previous one.
2. The block height numbers are off.
Fixes: #1479
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a 'bitcoind' global: getting it from inside one of the daemons
was a mistake I've copied widely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>