Usually if we get a packet while closing (onchain event), we're going
through pkt_in which discards it. However, if we're reconnecting, we
simply process the init packet and get upset because they've forgotten
us.
Hard to reproduce, but here's the log (in this case, test-routing --reconnect
and we have just done mutual close):
We reconnect in STATE_MUTUAL_CLOSING, send INIT pkt:
+19.397025114 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Init with ack 1 opens + 9 sigs + 8 revokes + 1 shutdown + 1 closing
While waiting for response, we see the mutual close...
+19.398732602 lightningd(4637):DEBUG: reaped 6370: bitcoin-cli -regtest=1 -datadir=/tmp/bitcoin-lightning2 getblock 2a63b209e17aedc5b1bcc6c2f9e044f97c9c3ca136fc64a719f704d2f632df5f false
+19.401834422 lightningd(4637):DEBUG: Adding block 5fdf32f6d204f719a764fc36a13c9c7cf944e0f9c2c6bcb1c5ed7ae109b2632a
+19.405167334 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Got UTXO spend for 8bb48a:0: 7f5e422f...
+19.412543610 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: anchor_spent: STATE_MUTUAL_CLOSING => STATE_CLOSE_ONCHAIN_MUTUAL
And we also see it buried "forever" (10 blocks in test mode), so we forget peer:
+19.423045014 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Anchor at depth 13
+19.426775063 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: check_for_resolution: STATE_CLOSE_ONCHAIN_MUTUAL => STATE_CLOSED
+19.427613109 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_forget_peer(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.428130685 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_start_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.501027511 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_commit_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
Now, we get their reply, but they've forgotten us:
+19.520208608 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Decrypted header len 5
+19.520872035 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Received packet LEN=5, type=PKT__PKT_INIT
+19.520999082 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Our order counter is 19, their ack 0
+19.521078913 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: They acked 0, remote=16 local=15
+19.521447174 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN (order=19)
+19.522563794 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN_COMMIT_SIG (order=19)
+19.523517319 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:BROKEN: Can't rexmit 2 when local commit 15 and remote 16
+19.524613177 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:UNUSUAL: Sending PKT_ERROR: invalid ack
+19.526638447 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_ERROR (order=19)
+19.527508022 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: peer_comms_err: STATE_CLOSED => STATE_ERR_BREAKDOWN
We should never transition from STATE_CLOSED to STATE_ERR_BREAKDOWn,
and that's what this check prevents.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems rather easy to fix, the only case we do not want to set
`STATE_SHUTDOWN` us when we have updates which we have not committed
yet, which is handled separately in the other IF-branch.
The `dstate` reference was only an indirection to the `timers`
sub-structure anyway, so removing this indirection allows us to reuse
the timers in the subdaemon arch.
Moved the broadcast functionality to broadcast.[ch]. So far this
includes only the enqueuing side of broadcasts, the dequeuing and
actual push to the peer is daemon dependent. This also adds the
broadcast_state to the routing_state and the last broadcast index to
the peer for the legacy daemon.
This was the only time we actually reference non-routing structs in
routing, so moving this out should allow us to get it working in the
new subdaemons.
This used to be part of `lightningd_state` which is being split up for
the various subdaemons. The main change is the addition of the `struct
routing_state` in `routing.h` and the addition of `rstate` in `struct
lightningd_state` for backwards compatibility.
We had a hack for 'struct rval' in protobuf_convert.h; make an
explicit header and put it in bitcoin/preimage.h. It's not really
bitcoin-specific, but it's better than having bitcoin/script depend on
an external header.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So far this was simply set to a zero-length end-to-end payload. We
don't have any plans of re-adding it for the moment, so let's get rid
of the unused code.
This is a bit more awkward for large structures, but avoids
indirection for the simpler ones (I copied the structures for the test
code, however). We also remove explicit padding.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Other than being neater (no more global list to edit!), this lets the
new daemon and old daemon have their own separate routines.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, we got a segv because we were measuring the wrong
wscript, then we miswired the inputs. It only worked because our
current steal tests don't have a to_us_idx output.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
db_forget_peer() was harmless, but we haven't been entered into the
database yet anyway, and it asserted that we should have been STATE_CLOSED.
Closes: #67
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need it for testing at the moment, and if we do it'll have
to change to relative anyway now we're going to use time_mono().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's possible that we won't have sent the anchor, but state is
committed in db. And our current philosophy is that we retransmit all
the txs dumbly, all the time.
Our --restart --timeout-anchor test trigger this case, too, so
re-enable that now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of using wall-clock time, we use blocks. This is simpler and
better for database restores. And both sides will time out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Re-enabling the next test revealed bugs: if we need to retransmit the
initial open_commit_sig packet, we currently tried to send it as an
UPDATE_COMMIT, which isn't allowed. Fixing that revealed that if
we have to retransmit the initial open, we didn't do that either.
Thus the initial open should count towards the ack count, and we should
special case transmissions of 0 (pkt_open) and 1
(pkt_open_commit_sig).
We also save those early state changes to the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The simplest way is to always use peer_received_unexpected_pkt() which
sends the error packet, and ensure it doesn't do so in response to
pkt_err.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It doesn't actually help here; we only did it because we differentiate
the states later, and with refactoring we do that via the explicit
offer_anchor flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>