Commit Graph

145 Commits

Author SHA1 Message Date
Rusty Russell
d85251ac6c db: fix up HTLCs which are missing failure information.
We don't save them to the database, so fix things up as we load them.

Next patch will actually save them into the db, and this will become
COMPAT code.

Also: call htlc_in_check() with NULL on db load, as otherwise it aborts
internally.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
77be009354 pytest: add restart-during-n-way payment test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
65f6813706 lightningd: handle the case where the db contains a resolved HTLC without a preimage.
We need to handle this case (old db) before the next commit, which actually
fixes it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
c94ab7370c pytest: extend the test_fulfill_incoming_first case to cover reconnect.
Which we don't handle, due to a separate bug, so it's xfail.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
4040c53258 lightningd: handle case where incoming HTLC vanished before fulfilled outgoing.
We now need an explicit 'local' flag, rather than relying on the existence
of the 'in' pointer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
3643e1bd90 pytest: add test for when an incoming fulfilled HTLC expires before outgoing.
Usually, we only close an incoming HTLC once the outgoing HTLC is completely
resolved.  However, we short-cut this in the FULFILL case: we have the
preimage, so might as well use it immediately (in fact, we wait for it to
be committed, but we don't need to in theory).

As a side-effect of this, our assumption that every outgoing HTLC has
a corresponding incoming HTLC is incorrect, and this test (xfail) tickles
that corner case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-09 23:17:54 +00:00
Rusty Russell
9455331575 json: use bolt naming for features arrays in listnodes, listpeers.
Deprecate the old names.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 04:14:28 +00:00
Rusty Russell
3358437062 connectd: don't log every time a peer disconnects.
Great for a few of our tests, but generally spammy.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 04:14:28 +00:00
Rusty Russell
252bbe1d2d pytest: don't wait for sendrawtx, wait for expected tx.
In particular, test_no_fee_estimate was flaky due to seeing the funding
tx being sent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-19 13:04:01 +02:00
Christian Decker
9e5d7dacb0 pytest: Use the mock bitcoind everywhere 2018-09-16 00:05:34 +02:00
Christian Decker
16869e3fe6 pytest: Use the bitcoind proxy to mock feerates 2018-09-16 00:05:34 +02:00
Rusty Russell
cefb6925b2 db: save and restore last_sent_commit correctly.
It's an array: we were only saving the single element; if there was more than
one changed HTLC we'd get a bad signature!

The report in #1907 is probably caused by the other side re-requesting
something we considered already finalized; to avoid this particular error,
we should set the field to NULL if there's no last_sent_commit.

I'm increasingly of the opinion we want to just save all the update
packets to the db and blast them out, instead of doing this
second-guessing dance.

Fixes: #1907
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-04 14:43:27 +02:00
Rusty Russell
db12a1452f pytest: reproduce problem with restarting and retransmitting multiple outgoing htlcs
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-04 14:43:27 +02:00
Rusty Russell
7e1fd9c38a pytest: make test_no_fee_estimate more reliable.
1. Wait for a 'sendrawtransaction' *after* the dev-fail message; don't be
   fooled by a previous one.
2. Turning on estimate fee sets fees exactly; just wait for it to be processed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-30 16:33:35 +02:00
Rusty Russell
db3c387264 feerate: allow names 'urgent' 'normal' and 'slow'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-30 16:33:35 +02:00
Rusty Russell
e0952ceff2 feerate: use suffix, not separate argument.
And, reluctantly, default to bitcoind style.
"It's wrong to be right too soon."

Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-30 16:33:35 +02:00
Rusty Russell
14dc1c37ab fundchannel / withdraw: allow explicit feerate setting.
These are the two cases where we'll refuse without a fee estimate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-30 16:33:35 +02:00
Rusty Russell
9d37b78088 cleanup: lowercase name of feerates, immediate -> urgent.
This is only used for logging now, but it gets more important as it
enters the RPC API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-25 00:33:12 +00:00
Rusty Russell
b5ae4c12c7 pytest: fix flaky wait_for_log() in test_funder_feerate_reconnect.
The comment was wrong: the channel being locked in was triggering
the fee update and hence the disconnect.  But that can actually
happen before fund_channel returns, as that waits for the gossipd
to see the channel active.

Best to do the fee update manually, so it's exactly what we want.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell
175db926c2 chaintopology: expose when we don't actually know feerate.
We use feerate in several places, and each one really should react
differently when it's not available (such as when bitcoind is still
catching up):

1. For general fee-enforcement, we use the broadest possible limits.
2. For closingd, we use it as our opening negotiation point: just use half
   the last tx feerate.
3. For onchaind, we can use the last tx feerate as a guide for our own txs;
   it might be too high, but at least we know it was sufficient to be mined.
4. For withdraw and fund_channel, we can simply refuse.

Fixes: #1836
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell
d93be58bd0 pytest: remove use dev-override-feerates.
Manipulate fees via fake-bitcoin-cli.  It's not quite the same, as
these are pre-smoothing, so we need a restart to override that where
we really need an exact change.  Or we can wait until it reaches a
certain value in cases we don't care about exact amounts.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell
607d4bf9d2 channel: update fees after lockin.
We don't respond to fee changes until we're locked in: make sure we catch
up at that point.

Note that we use NORMAL fees during opening, but IMMEDIATE after, so
this often sends a fee update.  The tests which break, we set those
feerates to be equal.

This (sometimes) changes the behavior of test_permfail, as we now
get an immediate commit, so that is fixed too so we always wait for
that to complete.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell
6338ae8a44 channeld: update fees if we're restarting.
This is a noop if we're opening a new channel (channel_fees_can_change(channel)
is false until funding locked in), but important if we're restarting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell
36b1cac6e6 lightningd: new state AWAITING_UNILATERAL.
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.

As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.

Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.

This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer.  To provoke a temporary failure in test_pay_disconnect we
take the node offline.

Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
a5ecc95c42 db: store claimed per_commitment_point from option_data_loss_protect.
This means we don't try to unilaterally close after a restart, *and*
we can tell onchaind to try to use the point to recover funds when the
peer unilaterally closes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
1a4084442b onchaind: use a point-of-last-resort if we see an unknown transaction.
This may have been supplied by the peer if it's nice and supports
option_data_loss_protect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
6aed936799 channeld: check option_data_loss_protect fields.
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).

Otherwise, we check the results they sent were valid.  The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
da8d620907 pytest: check that we advertise and send option_data_loss_protect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
ebaf5eaf2e channeld: send option_data_loss_protect fields.
We ignore incoming for now, but this means we advertize the option and
we send the required fields.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
162adfdf12 listpeers: correctly display features on reconnect.
peer features are only kept for connected peers (as they can change),
but we didn't update them on reconnect.  The main effect was that
after a restart we displayed the features as empty, even after
reconnect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell
9f175deecd lightningd: update feerate upon receiving revoke_and_ack from fundee.
1. l1     update_fee ->    l2
2. l1 commitment_signed -> l2 (using new feerate)
3. l1  <- revoke_and_ack   l2
4. l1 <- commitment_signed l2 (using new feerate)
5. l1  -> revoke_and_ack   l2

When we break the connection after #3, the reconnection causes #4 to
be retransmitted, but it turns out l1 wasn't telling the master to set
the local feerate until it received the commitment_signed, so on
reconnect it uses the old feerate, with predictable results (bad
signature).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-22 18:54:53 +02:00
Rusty Russell
c106fa1b4f pytest: add test for reconnect immediately after feerate change.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-22 18:54:53 +02:00
Rusty Russell
7a77b81dff pytest: always use the fake-bitcoin-cli, name it more appropriately.
We're going to use it to override specific commands.  It's non-valgrinded
already since we use '--trace-children-skip=*bitcoin-cli*' so the overhead
should be minimal.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-22 12:13:23 +02:00
Rusty Russell
5e20eedb41 pytest: allow more elaborate bitcoin-cli override.
In particular, this lets us intercept individual commands, such as
estimatesmartfee.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-22 12:13:23 +02:00
Rusty Russell
b379bec4e4 pytest: fix flakiness in test_channel_reenable.
In one case, the channel_update which we expected to activate the channel
from l2 was suppressed as redundant.  This is certainly valid, so just
check the results.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-16 00:14:08 +00:00
Christian Decker
a97955845f pytest: dev-ping was renamed to ping 2018-08-10 16:15:12 +02:00
Rusty Russell
35d7449259 connectd: initialize peer->conn.
It's only used in one place, but that's enough.

Fixes: #1434
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell
f8aed1b4b0 pytest: add reconnection stress test.
It sometimes triggers a crash like #1434 (though never under valgrind).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell
fefb7faba7 pytest: try a simple reconnection test.
This passes, but that's OK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell
8939a5001b connectd: rely on the master to tell us to reconnect.
connectd tells master about every disconnection, and master knows
whether it's important to reconnect.  Just get the master to invoke a new
connect command if it considers the peer important!

The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect.  To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
035362e151 openingd: don't exit when we receive an error.
In particular, all opening_read_peer_msg() callers need to know there
was an error (presumably, negotiating) so they can stop, but we should
not exit.

This lets us reenable the final disabled test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
02966a4857 connectd: remove unused handback APIs and code.
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.

lightningd no longer queries us for getpeers: it knows more than we do
already.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
e59cbb3e2c pytest: make sure receiving peer's openingd is ready.
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.

(Previously the connection stayed in connectd, so there was no such
window).

This is an occasional issue in a few places.

Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
50f5eb34b4 openingd: take peer before we're opening, wait for explicit funding msg.
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.

Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.

This is a deep structural change, so we do the minimum here and cleanup
in the following patches.

Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
   find the race between us asking connectd to release the peer and it
   reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
   channel (until it has succeeded): it's very transitory anyway so not
   worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
   for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
   or a fundee, or an error in response to our request to fund a channel.
   so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
   hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.

Openingd:
1. We always have information we need to accept them funding a channel (in
   the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
   the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.

Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
   openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
   exist; 'state' is now per-channel.  It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
   the peer hand over the connection to openingd.  Our tests sometimes
   checked getpeers() on the peer, and didn't see anything, so line_graph
   needed updating.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Christian Decker
58709cf190 pytest: Migrate connection tests to new fixture model 2018-08-07 00:54:19 +00:00