Found in tests/test_connection.py::test_restart_many_payments:
`lightningd: outstanding taken(): lightningd/peer_htlcs.c:532:towire_temporary_channel_failure(((void *)0), ((void *)0))`
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds a `state_change` 'cause' to a channel.
A 'cause' is some initial 'reason' a channel was created or closed by:
/* Anything other than the reasons below. Should not happen. */
REASON_UNKNOWN,
/* Unconscious internal reasons, e.g. dev fail of a channel. */
REASON_LOCAL,
/* The operator or a plugin opened or closed a channel by intention. */
REASON_USER,
/* The remote closed or funded a channel with us by intention. */
REASON_REMOTE,
/* E.g. We need to close a channel because of bad signatures and such. */
REASON_PROTOCOL,
/* A channel was closed onchain, while we were offline. */
/* Note: This is very likely a conscious remote decision. */
REASON_ONCHAIN
If a 'cause' is known and a subsequent state change is made with
`REASON_UNKNOWN` the preceding cause will be used as reason, since a lot
(all `REASON_UNKNOWN`) state changes are a subsequent consequences of a prior
cause: local, user, remote, protocol or onchain.
Changelog-Added: Plugins: Channel closure resaon/cause to channel_state_changed notification
Great report from whitslack on this crash at startup:
```
2020-10-07T13:03:21.419Z **BROKEN** lightningd: FATAL SIGNAL 6 (version 0.9.1)
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: common/daemon.c:51 (crashdump) 0x559fb67bcc76
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: /var/tmp/portage/sys-libs/glibc-2.32-r2/work/glibc-2.32/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 ((null)) 0x7f61cdca8baf
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: ../sysdeps/unix/sysv/linux/raise.c:50 (__GI_raise) 0x7f61cdca8b31
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: /var/tmp/portage/sys-libs/glibc-2.32-r2/work/glibc-2.32/stdlib/abort.c:79 (__GI_abort) 0x7f61cdc92535
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: /var/tmp/portage/sys-libs/glibc-2.32-r2/work/glibc-2.32/assert/assert.c:92 (__assert_fail_base) 0x7f61cdc9241e
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: /var/tmp/portage/sys-libs/glibc-2.32-r2/work/glibc-2.32/assert/assert.c:101 (__GI___assert_fail) 0x7f61cdca1241
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/subd.c:750 (subd_send_msg) 0x559fb67a1c31
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/subd.c:745 (subd_send_msg) 0x559fb67a1c31
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:252 (local_fail_in_htlc) 0x559fb6798f77
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:1441 (onchain_failed_our_htlc) 0x559fb6798f77
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/onchain_control.c:339 (handle_missing_htlc_output) 0x559fb6786b9d
2020-10-07T13:03:21.419Z **BROKEN** lightningd: backtrace: lightningd/onchain_control.c:455 (onchain_msg) 0x559fb6786b9d
```
The problem is a channel with an onchaind can be in state FUNDING_STATE_SEEN,
because onchaind has started but not responded to init yet (which it does once it
has analyzed the commitment tx).
Channel B is onchain, and its onchaind fails the HTLC, and we try to send a msg
to channel A's onchaind as if it were channeld.
Explicitly check if it's channeld, rather than trying to see if it's onchaind.
Fixes: #4114
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: crash: assertion fail at restart when source and destination channels of an HTLC are both onchain.
This avoids overwriting the ones in git, and generally makes things neater.
We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that other directories were explicitly depending on the generated
file, instead of relying on their (already existing) dependency on
$(LIGHTNINGD_HSM_CLIENT_OBJS), so we remove that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means some files get renamed, and I took the opportunity to clarify
our naming (the *d* is important!)
1. channeld/channel_wire.csv -> channeld/channeld_wire.csv
2. channeld/gen_channel_wire.h -> channeld/channeld_wiregen.h
3. enum channel_wire_type -> enum channeld_wire
4. WIRE_CHANNEL_FUNDING_DEPTH -> WIRE_CHANNELD_FUNDING_DEPTH.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per https://github.com/lightningnetwork/lightning-rfc/pull/785
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: config: the default CLTV expiry is now 34 blocks, and final expiry 18 blocks as per new BOLT recommendations.
This is best done by passing `struct bitcoin_signature` around instead
of raw signatures. We still save raw sigs to the db, and of course the
wire protocol uses them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On node start we replay onchaind's transactions from the database/from
our loaded htlc table. To keep things tidy, we shouldn't notify the
ledger about these, so we wrap pretty much everything in a flag that
tells us whether or not this is a replay.
There's a very small corner case where dust transactions will get missed
if the node crashes after the htlc has been added to the database but
before we've successfully notified onchaind about it.
Notably, most of the obtrusive updates to onchaind wrappings are due to
the fact that we record dust (ignored outputs) before we receive
confirmation of its confirmation.
HTLCs trigger a coin movement only when their final form (state) is
reached. This prevents us from needing to concern ourselves with
retries, as well as being the absolutely most correct in terms of
answering the question 'when has the money irrevocably changed hands'.
All coin movements should pass this bar, for ultimate accounting
correctness
The current plan for coin movements involves tagging
origination/destination htlc's with a separate tag from 'routed' htlcs
(which pass through our node). In order to do this, we need a persistent flag on
incoming htlcs as to whether or not we are the final destination.
I noticed the following in logs for tests/test_connection.py::test_feerate_stress:
```
DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Failing HTLC 18446744073709551615 due to peer death
DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: local_routing_failure: 8194 (WIRE_TEMPORARY_NODE_FAILURE)
```
This is because it reports the (transient) node_failure error, because
our channel_failure message is incomplete. Fix this wart up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we've used the term 'funder' to refer to the peer
paying the fees for a transaction; v2 of openchannel will make
this no longer true. Instead we rename this to 'opener', or the
peer sending the 'open_channel' message, since this will be universally
true in a dual-funding world.
The plugin can basically return whatever it thinks the preimage is, but we
weren't handling the case in which it doesn't actually match the hash. If it
doesn't match now we just return an error claiming we don't have any matching
invoice.
One is called on every plugin return, and tells us whether to continue;
the other is only called if every plugin says ok.
This works for things like payload replacement, where we need to process
the results from each plugin, not just the final one!
We should probably turn everything into a chained callback next
release.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They callback must take ownership of the payload (almost all do, but
now it's explicit).
And since the payload and cb_arg arguments to plugin_hook_call_() are
always identical, make them a single parameter.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that it's channeld which calculates the shared secret, too. This
minimizes the work that lightningd has to do, at cost of passing this
through.
We also don't yet save the blinding field(s) to the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires us to call ecdh() in the corner case where the blinding seed
is in the TLV itself (which is the case for the start of a blinded route).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happened on my testnet node because I've been failing to reconnect to
a node which created a channel and never exchanged announcement sigs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently abuse the added_htlc and failed_htlc messages to tell channeld
about existing htlcs when it restarts. It's clearer to have an explicit
'existing_htlc' type which contains all the information for this case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For messages, we use the onion but payload lengths 0 and 1 aren't special.
Create a flag to disable that logic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even without optimization, it's faster to walk all the channels than
ping another daemon and wait for the response.
Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of saving a stripped_update, we use the new
local_fail_in_htlc_needs_update.
One minor change: we return the more correct
towire_temporary_channel_failure when the node is still syncing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The idea is that gossipd can give us the cupdate we need for an error, and
we wire things up so that we ask for it (async) just before we send the
error to the subdaemon.
I tried many other things, but they were all too high-risk.
1. We need to ask gossipd every time, since it produces these lazily
(in particular, it doesn't actually generate an offline update unless
the channel is used).
2. We can't do async calls in random places, since we'll end up with
an HTLC in limbo. What if another path tries to fail it at the same time?
3. This allows us to use a temporary_node_failure error, and upgrade it
when gossipd replies. This doesn't change any existing assumptions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common thing to do, so create a macro.
Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This completes the conversion; any in-flight HTLC failures get turned into temporary_node_failures.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This cleans up the "local failure" callers for incoming HTLCs to hand
an onionreply instead of making us generate it from the code inside
make_failmsg.
(The db path still needs make_failmsg, so that's next).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-deprecated: Plugins: htlc_accepted_hook "failure_code" only handles simple cases now, use "failure_message".
Unfortunately the invoice_payment_hook can give us a failcode, so I simply
restrict it to the two sensible ones.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-deprecated: plugins: invoice_payment_hook "failure_code" only handles simple cases now, use "failure_message".
We tell channeld that an htlc is bad by sending it a 'struct
failed_htlc'. This usually contains an onionreply to forward, but for
the case where the onion itself was bad, it contains a failure code
instead.
This makes the "send a failed_htlc for a bad onion" a completely
separate code path, then we can work on removing failcodes from the
other path.
In several places 'failcode' is now changed to 'badonion' to reflect
that it can only be a BADONION failcode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At the moment, we store e.g. WIRE_TEMPORARY_CHANNEL_FAILURE, and then
lightningd has a large demux function which turns that into the correct
error message.
Such an enum demuxer is an anti-pattern.
Instead, store the message directly for output HTLCs; channeld now
sends us an error message rather than an error code.
For input HTLCs we will still need the failure code if the onion was
bad (since we need to prompt channeld to send a completely different
message than normal), though we can (and will!) eliminate its use in
non-BADONION failure cases.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to change our internal structure next, so this is preparation.
We populate existing errors with temporary node failures, for simplicity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of making it ourselves, lightningd does it. Now we only have
two cases of failed htlcs: completely malformed (BADONION), and with
an already-wrapped onion reply to send.
This makes channeld's job much simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I hadn't realized that lightningd asks gossipd every time we forward
a payment. But I'm going to abuse it here to get the latest channel_update,
otherwise (as lightningd takes over error message generation) lightningd
needs to do an async request at various painful points.
So have gossipd tell us the lastest update (stripped so compatible with
the strange in-onion-error format).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turn it into temporary node failure: this only happens if we restart
with a failed htlc in, but it's clearer and more robust to handle it
generically.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For incoming htlcs, we need failure details in case we need to
re-xmit them. But for outgoing htlcs, lightningd is telling us it
already knows they've failed, so we just need to flag them failed
and don't need the details.
Internally, we set the ->fail to a dummy non-NULL value; this is
cleaned up next.
This matters for the next patch, which moves onion handling into
lightningd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. forward_htlc sets hout to NULL.
2. forward_htlc passes &hout to send_htlc_out.
3. forward_htlc checks the failcode and frees(NULL) and sets hout to NULL
(again). This in fact covers every failcode which send_htlc_out returns.
We should ensure send_htlc_out sets *houtp to NULL on failure; in fact,
both callers pass houtp, so we can make it unconditional.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the peer is not connected, or other error which means we don't
actually create an outgoing HTLC, we don't record the
short_channel_id. This is unhelpful!
Pass the scid down to the wallet code, and explicitly hand the
scid and amount down to the notification code rather than handing it
the htlc_out (which it doesn't need).
Changelog-Changed: JSON API: `listforwards` now shows `out_channel` even if we couldn't forward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the `htlc_accepted` hook the first chained hook in our repertoire. The
plugins are called one after the other in order until we have no more plugins
or the HTLC was handled by one of the plugins. If no plugins handles the HTLC
we continue to handle it internally like always.
Handling in this case means the plugin returns either `{"result": "resolve",
...}` or `{"result": "fail", ...}`.
Changelog-Changed: plugin: Multiple plugins can now register for the htlc_accepted hook.
The newly introduced type is used to determine what the call semantics of the
hook are. We have `single` corresponding to the old behavior, as well as
`chain` which allows multiple plugins to register for the hook, and they are
then called sequentially (if all plugins return `{"result": "continue"}`) or
exit the chain if the hook event was handled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Relative plugin paths are not relative to startup (deprecated v0.7.2.1)
Changelog-Removed: Dummy fields in listforwards (deprecated v0.7.2.1)
This shouldn't happen if channeld is working properly, but I'm going to
change that, and this current code means we stop responding at that point
(not every failpath in peer_accepted_htlc() called channel_internal_error).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Generally I prefer structures over u8, since the size is enforced at
runtime; and in several places we were doing conversions as the code
using Sphinx does treat struct secret as type of the secret.
Note that passing an array is the same as passing the address, so
changing from 'u8 secret[32]' to 'struct secret secret' means various
'secret' parameters change to '&secret'. Technically, '&secret' also
would have worked before, since '&' is a noop on array, but that's
always seemed a bit weird.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes it clear we're dealing with a message which is a wrapped error
reply (needing unwrap_onionreply), not an already-wrapped one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`wallet_payment_store` would free the `wallet_payment` instance which would
then cause us to reload it from the DB. Instead of doing the store->free->load
dance we now tell `wallet_payment_store` whether it should take ownership and
leave it alone if not.
Passing the payment around instead of referencing it through payment_hash and
partid is a nice side-effect.
This is the final step: we pass the complete fee_states to and from
channeld.
Changelog-Fixed: "Bad commitment signature" closing channels when we sent back-to-back update_fee messages across multiple reconnects.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The invoice_try_pay code now takes a set, rather than a single htlc, but
it's basically the same thing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a transient field, so rework things so we don't leave it in
struct htlc_out. Instead, load htlc_in first and connect htlc_out to
them as we go.
This also changes one place where we use it instead of the am_origin
flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is in preparation for partial payments. For existing payments,
partid is 0 (to match the corresponding payment).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is in preparation for partial payments. For existing payments,
partid is 0 (arbitrarity) and total_msat is msatoshi.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because my node runs under valgrind, it can take quite a while to
sync; nodes tend to disconnect and reconnect if you block too long.
This is particularly problematic since we often update fees: when the
other side sends its commitment_signed we block.
In particular, this triggers the corner case we have where we
update_fee twice, disconnecting each time, and our state machine gets
confused (which is why we never saw this exact corner case before this
change in 0.7.3!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now "raw_payload" is always the complete string (including realm or length
bytes at the front).
This has several effects:
1. We can receive an decrypt an onion which is grossly malformed.
2. We can still hand this to the htlc_accepted hook.
3. We then fail it unless the htlc_accepted accepts it manually.
4. The createonion API now takes the raw payload, and does not know
anything about "style".
The only caveat is that the sphinx code needs to know the payload
length: we have a call for that, which simply tells it to copy the
entire onion (and treat us as the final node) if it's invalid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't set the secret to compulsory (yet!) but put code in for the
future. Meanwhile, if there is a secret, check it is correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also pulls in a new onion error (mpp_timeout). We change our
route_step_decode_end() to always return the total_msat and optional
secret.
We check total_amount (to prohibit mpp), but we do nothing with
secret for now other than hand it to the htlc_accepted hook.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This function ensures we have all the infos we need to continue if the
htlc_accepted hook tells us to. It also enforces well-formedness of the TLV
payload if we have a TLV payload.
Suggested-by: List Neigut <@niftynei>
Signed-off-by: Christian Decker <@cdecker>
We wire in the code-generated function, which removes the upfront validation
and add the validation back after the `htlc_accepted` hook returns. If a
plugin wanted to handle the onion in a special way it'll not have told us to
just continue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-changed: JSON API: `htlc_accepted` hook has `type` (currently `legacy` or `tlv`) and other fields directly inside `onion`.
Changelog-deprecated: JSON API: `htlc_accepted` hook `per_hop_v0` object deprecated, as is `short_channel_id` for the final hop.
Feerate changes are asymmetric, as they can only be sent by the funder.
For FUNDER, the remote feerate is set when upon send of
commitment_signed, and the local feerate is set on receipt of
revoke_and_ack.
For non-funder, the local feerate is set on receipt of
commitment_signed, and the remote feerate set on send of
revoke_and_ack. In our code, these two happen together.
channeld gets this right, but lightningd ignored the funder/fundee
distinction, and as a result, receipt of a commitment_signed by the
funder altered fees in the database. If there was a reconnection
event or restart, then these (incorrect) values would be used, causing
us to complain about a 'Bad commit_sig signature' and close the
channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>