It's generally clearer to have simple hardcoded numbers with an
#if DEVELOPER around it, than apparent variables which aren't, really.
Interestingly, our pruning test was always kinda broken: we have to pass
two cycles, since l2 will refresh the channel once to avoid pruning.
Do the more obvious thing, and cut the network in half and check that
l1 and l3 time out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
531c8d7d9b
In this one, we always send my_current_per_commitment_point, though it's
ignored. And we have our official feature numbers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The largest change is inside hsmd: it hands a null per-commitment key
to the wallet to tell it to spend the to_remote output.
It can also now resolve unknown commitments, even if it doesn't have a
possible_remote_per_commitment_point from the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per BOLT02 #message-retransmission :
if `next_commitment_number` is 1 in both the `channel_reestablish` it sent and received:
- MUST retransmit `funding_locked`
Rather than reaching into data structures, let them register their own
callbacks. This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.
Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This removes the WIRE_FINAL_EXPIRY_TOO_SOON which leaked too much info,
and adds the blockheight to WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the normal convention for this type; it makes using converters
a little easier. See next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently send channel_announcement as soon as we and our
peer agree it's 6 blocks deep. In theory, our other peers might
not have seen that block yet though, so delay a little.
This is mitigated by two factors:
1. lnd will stash any "not ready yet" channel_announcements anyway.
2. c-lightning doesn't enforce the 6 depth minimum at all.
We should not rely on other nodes' generosity or laxity, however!
Next release, we can start enforcing the depth limit, and maybe stashing
ones which don't quite make it (or simply enforce depth 5, not 6).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The way we build transactions, serialize them, and compute fees depends on the
chain we are working on, so let's add some context to the transactions.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
And clean up some dev ones which actually happen (mainly by calling
channel_fail_permanent which logs UNUSUAL, rather than
channel_internal_error which logs BROKEN).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we ever do this, we'd end up with an unspendable commitment tx anyway.
It might be able to happen if we have htlcs added from the non-fee-paying
party while the fees are increased, though. But better to close the
channel and get a report about it if that happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We track whether each change is affordable as we go;
test_channel_drainage got us so close that the difference mattered; we
hit an assert when we tried to commit the tx and realized we couldn't
afford it.
We should not be trying to add an HTLC if it will result in the funder
being unable to afford it on either the local *or remote* commitments.
Note the test still "fails" because it refuses to send the final
payment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Subtracting both arbitrarily reduces our capacity, even for ourselves
since the routing logic uses this maximum.
I also changed 'advertise' to 'advertize', since we use american
spelling.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@pm47 gave a great bug report showing c-lightning sending the same
UPDATE_FEE over and over, with the final surprise result being that we
blamed the peer for sending us multiple empty commits!
The spam is caused by us checking "are we at the desired feerate?" but
then if we can't afford the desired feerate, setting the feerate we
can afford, even though it's a duplicate. Doing the feerate cap before
we test if it's what we have already eliminates this.
But the empty commits was harder to find: it's caused by a heuristic in
channel_rcvd_revoke_and_ack:
```
/* For funder, ack also means time to apply new feerate locally. */
if (channel->funder == LOCAL &&
(channel->view[LOCAL].feerate_per_kw
!= channel->view[REMOTE].feerate_per_kw)) {
status_trace("Applying feerate %u to LOCAL (was %u)",
channel->view[REMOTE].feerate_per_kw,
channel->view[LOCAL].feerate_per_kw);
channel->view[LOCAL].feerate_per_kw
= channel->view[REMOTE].feerate_per_kw;
channel->changes_pending[LOCAL] = true;
}
```
We assume we never send duplicates, so we detect an otherwise-empty
change using the difference in feerates. If we don't set this flag,
we will get upset if we receive a commitment_signed since we consider
there to be no changes to commit.
This is actually hard to test: the previous commit adds a test which
spams update_fee and doesn't trigger this bug, because both sides
use the same "there's nothing outstanding" logic.
Fixes: #2701
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ask gossipd for the channel_update for the outgoing channel; any other
messages it sends us get queued for later processing.
But this is overzealous: we can shunt those msgs to the peer while
we're waiting. This fixes a nasty case where we have to handle
WIRE_GOSSIPD_NEW_STORE_FD messages by queuing the fd for later.
This then means that WIRE_GOSSIPD_NEW_STORE_FD can be handled
internally inside handle_gossip_msg(), since it's always dealt with
the same, simplifying all callers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Add remote_ann_node_sigs and remote_bitcoin_sigs fields in channel_init message;
2. Master add announcement signatures into channel_init message, and send this message to Channeld.
Channeld will initial the channel with this signatures when it reenables the channel.
Channeld sends announcement signatures to Master by this message.
When Channeld receive a new channel announcement msg, (After channel locking)it will sends announcement signatures to Master by this message.
Instead of reading the store ourselves, we can just send them an
offset. This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).
MCP bench results:
store_load_msec:28509-31001(29206.6+/-9.4e+02)
vsz_kb:580004-580016(580006+/-4.8)
store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
listnodes_sec:1.790000-1.880000(1.83+/-0.032)
listchannels_sec:21.180000-21.950000(21.476+/-0.27)
routing_sec:2.210000-11.160000(7.126+/-3.1)
peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)
Signficant savings in streaming gossip:
-peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
+peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* remove libbase58, use base58 from libwally
This removes libbase58 and uses libwally instead.
It allocates and then frees some memory, we may want to
add a function in wally that doesn't or override
wally_operations to use tal.
Signed-off-by: Lawrence Nahum lawrence@greenaddress.it
The spec says not to send a commitment_signed without any changes, but LND
does this. To understand why, you have to understand how LND works. I
haven't read the code, but I'm pretty sure it works like this:
1. lnd slows down to do garbage collection, because it's in Go.
2. When an alert timer goes off, noticing it's not making process, it
sends a twitter message to @roasbeef.
3. @roasbeef sshs into the user's machine and binary patches lnd to send
a commitment_signed message.
4. Unfortunately he works so fast that various laws of causality are broken,
meaning sometimes the commitment_signed is sent before any of thes
other things happen.
I'm fairly sure that this will stop as @roasbeef ages, or lnd introduces
some kind of causality enforcement fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes block parsing on testnet; specifically, non-standard tx versions.
We hit a type bug in libwally (wallt_get_secp_context()) which I had to
work around for the moment, and the updated libsecp adds an optional hash
function arg to the ECDH function.
Fixes: #2563
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't turn them to/from pubkeys implicitly. This means nodeids in the store
don't get converted, but bitcoin keys still do.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:33934-35251(34531.4+/-5e+02)
vsz_kb:2637488
store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
listnodes_sec:1.020000-1.290000(1.146+/-0.086)
listchannels_sec:51.110000-58.240000(54.826+/-2.5)
routing_sec:30.000000-33.320000(30.726+/-1.3)
peer_write_all_sec:50.370000-52.970000(51.646+/-1.1)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:46184-47474(46673.4+/-4.5e+02)
+store_load_msec:33934-35251(34531.4+/-5e+02)
-vsz_kb:2638880
+vsz_kb:2637488
-store_rewrite_sec:46.750000-48.280000(47.512+/-0.51)
+store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what all of this has been working towards: ripping out the handwoven
transaction handling. By removing the custom parsing we can finally switch
over to using `wally_tx` as sole representation of transactions in
memory. The commit is a bit larger but it's mostly removing setters and old
references to the input and output fields.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The `wally_tx_input`s do not keep track of their input value, which means we
need to track them ourselves if we try to sign these transactions at a later
point in time.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
1. Rename channel_funding_locked to channel_funding_depth in
channeld/channel_wire.csv.
2. Add minimum_depth in struct channel in common/initial_channel.h and
change corresponding init function: new_initial_channel().
3. Add confirmation_needed in struct peer in channeld/channeld.c.
4. Rename channel_tell_funding_locked to channel_tell_depth.
5. Call channel_tell_depth even if depth < minimum, and still call
lockin_complete in channel_tell_depth, iff depth > minimum_depth.
6. channeld ignore the channel_funding_depth unless its >
minimum_depth(except to update billboard, and set
peer->confirmation_needed = minimum_depth - depth).
We need to do it in various places, but we shouldn't do it lightly:
the primitives are there to help us get overflow handling correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically we tell it that every field ending in '_msat' is a struct
amount_msat, and 'satoshis' is an amount_sat. The exceptions are
channel_update's fee_base_msat which is a u32, and
final_incorrect_htlc_amount's incoming_htlc_amt which is also a
'struct amount_msat'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to just throw htlcs into the channel with a flag to tell it to
ignore overflow. Instead, we can insert them in order (which is the same as
id order) which always must be valid.
This helps when we turn the balance into a struct amount_msat which will get
upset with overflows.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They're generally used pass-by-copy (unusual for C structs, but
convenient they're basically u64) and all possibly problematic
operations return WARN_UNUSED_RESULT bool to make you handle the
over/underflow cases.
The new #include in json.h means we bolt11.c sees the amount.h definition
of MSAT_PER_BTC, so delete its local version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LND seems to do this occasionally, though fixed in new versions. Workaround
in the meantime.
I tested this by hacking our code to send it prematurely, and this worked.
Fixes: #2219
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This solves (or at least reduces probability of) a deadlock in channeld
when there is lot of gossip traffic, see issue #2286. That issue is
almost identical to #1943 (deadlock in openingd) and so is the fix.
Spurious errors were occuring around checking the provided
current commitment point from the peer on reconnect when
option_data_loss_protect is enabled. The problem was that
we were using an inaccurate measure to screen for which
commitment point to compare the peer's provided one to.
This fixes the problem with screening, plus makes our
data_loss test a teensy bit more robust.
Christian and I both unwittingly used it in form:
*tal_arr_expand(&x) = tal(x, ...)
Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().
The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is based on Christian's change, but removes all trace of the old codes.
I've proposed another spec change which removes this code altogether:
https://github.com/lightningnetwork/lightning-rfc/pull/544
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
This is mainly just copying over the copy-editing from the
lightning-rfc repository.
[ Split to just perform changes prior to the UNKNOWN_PAYMENT_HASH change --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
Fortunately, we can calculate the sha256 ourselves, so the
outgoing channeld doesn't need to tell us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When the next node tells us the onion is malformed, we now actually
report the failcode to lightningd (rather than generating an invalid
error as we do now).
We could generate the onion at this point, but except we don't know
the shared secret; we'd have to plumb that through from the incoming
channeld's HTLC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This covers all the cases where an onion can be malformed; this means
we know in advance that it's bad. That allows us to distinguish two
cases: where lightningd rejects the onion as bad, and where the next
peer rejects the next onion as bad. Both of those (will) set failcode
to one of the BADONION values.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's more natural than using a zero-secret when something goes wrong.
Also note that the HSM will actually kill the connection if the ECDH
fails, which is fortunately statistically unlikely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Funder can't spend the fee it needs to pay for the commitment transaction:
we were not converting to millisatoshis, however!
This breaks our routeboost test, which no longer has sufficient funds
to make payment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have an incompatibility with lnd it seems: I've lost channels on
reconnect with 'sync error'. Since I never got this code to be reliable,
disable it for next release since I suspect it's our fault :(
And reenable the check which didn't work, for others to untangle.
I couldn't get option_data_loss_protect to be reliable, and I disabled
the check. This was a mistake, I should have either spent even more
time trying to get to the bottom of this (especially, writing test
vectors for the spec and testing against other implementations).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is prep work for when we sign htlc txs with
SIGHASH_SINGLE|SIGHASH_ANYONECANPAY.
We still deal with raw signatures for the htlc txs at the moment, since
we send them like that across the wire, and changing that was simply too
painful (for the moment?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use them for re-transmitting the last commitment tx,
and the HSM signs them sync so it's straight-line code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies lifetime assumptions. Currently all callers keep the
original around, but everything broke when I changed that in the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They were not universally used, and most are trivial accessors anyway.
The exception is getting the channel reserve: we have to multiply by 1000
as well as flip direction, so keep that one.
The BOLT quotes move to `struct channel_config`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We probably also want to call secp_randomise/wally_secp_randomize here
too, and since these calls all call setup_tmpctx, it probably makes
sense to have a helper function to do all that. Until thats done, I
modified the tests so grepping will show the places where the sequence
of calls is repeated.
Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>
This avoids some very ugly switch() statements which mixed the two,
but we also take the chance to rename 'towire_gossip_' to
'towire_gossipd_' for those inter-daemon messages; they're messages to
gossipd, not gossip messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This way there's no need for a context pointer, and freeing a msg_queue
frees its contents, as expected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was suggested by Pierre-Marie as the solution to the 'same HTLC,
different CLTV' signature mismatch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Have c-lightning nodes send out the largest value for
`htlc_maximum_msat` that makes sense, ie the lesser of
the peer's max_inflight_htlc value or the total channel
capacity minus the total channel reserve.
LND does this, and we get upset with it. I had assumed we would only
do this after funding_locked (since we don't consider the channel
shortid stable until that point), but TBH 6 confirms is probably
enough.
Fixes: #1985
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, it fails (test_restart_many_payments, the next test).
I suspect a deep misunderstanding in the comparison code, will chase
separately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That matches the other CSV names (HSM was the first, so it was written
before the pattern emerged).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@renepickhardt: why is it actually lightningd.c with a d but hsm.c without d ?
And delete unused gossipd/gossip.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.
As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.
Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.
This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer. To provoke a temporary failure in test_pay_disconnect we
take the node offline.
Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).
Otherwise, we check the results they sent were valid. The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For option_data_loss_protect, the peer can prove to us that it's ahead;
it gives us the (hopefully honest!) per_commitment_point it will use,
and we make sure we don't broadcast the commitment transaction we have.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ignore incoming for now, but this means we advertize the option and
we send the required fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We quote BOLT 2 on *local* above the *remote* checks (we quote it
again below when we do the local checks).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. l1 update_fee -> l2
2. l1 commitment_signed -> l2 (using new feerate)
3. l1 <- revoke_and_ack l2
4. l1 <- commitment_signed l2 (using new feerate)
5. l1 -> revoke_and_ack l2
When we break the connection after #3, the reconnection causes #4 to
be retransmitted, but it turns out l1 wasn't telling the master to set
the local feerate until it received the commitment_signed, so on
reconnect it uses the old feerate, with predictable results (bad
signature).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now sending a ping makes sense: it should force the other end to send
a reply, unblocking the commitment process.
Note that rather than waiting for a reply, we're actually spinning on
a 100ms loop in this case. But it's simple and it works.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't do much (though we might get an error before we send the
commitment_signed), but it's infrastructure for the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were adding channels without their capacity, and eventually annotated them
when we exchanged `channel_update`s. This worked as long as we weren't
considering the channel capacity, but would result in local-only channels to be
unusable once we start checking.
Also means we simplify the handle_gossip_msg() since everyone wants it to
use sync_crypto_write().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is clearer and neater, and even slightly more efficient, since
read_peer_msg() was calling poll() again on gossipfd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.
We shim tal_bytelen() for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use these for receiving arrays at init time, we should also use them
for fulfull/fail of HTLCs in normal operation. That we we benefit from all
those assertions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The master tells us the short_channel_id of the outgoing channel, and
channeld is supposed to get the corresponding channel_update from gossipd.
Instead, it got the channel_update for the *local* channel and ignored
that one.
The master tells us the short_channel_id of the outgoing channel when
failing an HTLC, but channeld didn't store it anywhere. It also
didn't tell channeld the short_channel_id in the case where we're
reconnecting and it's feeding us an array of failed htlcs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will avoid us having to round-trip to the HSM each time we want it.
For now we still derive it, too, and assert it's correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Well, it's generated by shachain, so technically it is a sha256, but
that's an internal detail. It's a secret.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'm not completely convinced that it's only ever set to a failcode
with the BADONION bit set, especially after the previous patches in
this series. Now that channeld can handle arbitrary failcodes passed
this way, simply rename it.
We add marshalling assertions that only one of failcode and failreason
is set, and we unmarshal an empty 'fail' to NULL (just the the
generated unmarshalling code does).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
None of these sanity checks should fail, but let's be thorough: we
were testing for htlc->fail but not failcode when fulfilling an HTLC.
The failing-htlc case had this correct already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'struct htlc' in channeld has a 'malformed' field, which is really only
used in the "retransmit updates on reconnect" case. That's quite confusing,
and I'm not entirely convinced that it can only be set to a failcode
with the BADONION bit set.
So generalize it, using the same logic we use in the master daemon:
failcode: a locally generated error, for channeld to turn into the appropriate
error message.
fail: a remotely generated onion error, for forwarding.
Either of these being non-zero/non-NULL means we've failed, and only one
should be set at any time.
We unify the "send htlc fail/fulfill update due to retransmit" and the
normal send update paths, by always calling send_fail_or_fulfill.
This unification revealed that we accidentally skipped the
onion-wrapping stage when we retransmit failed htlcs!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This resolves the problem where both channeld and gossipd can generate
updates, and they can have the same timestamp. gossipd is always able
to generate them, so can ensure timestamp moves forward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of considering it a temporary step, consider it a necessary preamble
to sending updates.
This means (in the next patch) when we tell gossipd to generate the updates,
it's always done after we've told it to create the channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we hit depth 6, we would start exchanging announcement signatures.
However, we should still send a temporary update while waiting for the
reply; make the logic clear in this case that we should always send
one or the other.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The condition in send_channel_update is wrong: it needs to match the
conditions under which we send announcements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store if they have current channels,
but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always call:
send_temporary_announcement(peer);
send_announcement_signatures(peer);
We should handle these in one place, since the conditional at the top
of them actually makes sure only one is effective. We also make the
caller set the peer->have_sigs[LOCAL] flag, instead of doing it
inside send_announcement_signatures().
We were sending announcements at the wrong time (on restart) somtimes.
We also move announce_channel() into the same logic, so it's always
together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just have a "new depth" callback, and let channeld do the right thing.
This makes the channeld paths a bit more straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some paths (eg reconnect) were unconditionally sending a channel_update.
valgrind wasn't catching it because we unmarshal short_channel_ids[LOCAL]
as all-zeroes, so it's technically "initialized".
Create a wrapper to do this, and change the 'bool disabled' flag to be
the explicit disable flag value for clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's always for messages to peer: make that status_peer_io and
add a new status_io for other IO.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a rebased and combined patch for Tor support. It is extensively
reworked in the following patches, but the basis remains Saibato's work,
so it seemed fairest to begin with this.
Minor changes:
1. Use --announce-addr instead of --tor-external.
2. I also reverted some whitespace and unrelated changes from the patch.
3. Removed unnecessary ';' after } in functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if just channel dies before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remote the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means that openingd and closingd now forward our gossip. But the real
reason we want to do this is that it gives an easy way for gossipd to kill
any active daemon, by closing its fd: previously closingd and openingd didn't
read the fd, so tended not to notice.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep waching if we cant't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I didn't convert all tests: they can still use a standalone context.
It's just marginally more efficient to share the libwally one for all
our daemons which link against it anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did a brief audit of tmpctx uses, and we do leak them in various
corner cases. Fortunely, all our daemons are based on some kind of
I/O loop, so it's fairly easy to clean a global tmpctx at that point.
This makes things a bit neater, and slightly more efficient, but also
clearer: I avoided creating a tmpctx in a few places because I didn't
want to add another allocation. With that penalty removed, I can use
it more freely and hopefully write clearer code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the moment, this just tracks the lockin, announce and shutdown
statuses.
We currently have trouble telling when we're stuck in
CHANNELD_AWAITING_LOCKIN who has sent the transaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd. We don't need that any more.
This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers. The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.
We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction.
We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make it a macro, since everyone uses PEER_FD and GOSSIP_FD constants
(they're actually always the same, but this is slightly safer), and
add a gossip_index arg: this is groundwork for when we want to hand
the peer back to master for gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids clashing with the new_channel we're about to add to lightningd,
and also matches its counterpart new_initial_channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we have wirestring, this is much more natural. And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to override two methods: the io error (tell gossipd to
disable), and send reply (enqueue, don't write direclty).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, decode error messages correctly and do the right thing with
messages about other channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Right now it allows ping but not pong.
If A sends a ping expecting a pong to B during CHANNELD_AWAITING_LOCKIN,
It would result in
`STATUS_FAIL_PEER_BAD: WIRE_PONG (19) before funding locked`
resulting in a unilateral channel close by A.
Disabling the channel and enqueing the update for broadcast so we
don't get forwarding requests from remote peers, and we don't try to
ourselves.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Sends a disable channel_update before issuing the shutdown message,
gossipd will also take care to update others and not use for future
routes.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Travis gave an error:
```
DEBUG:root:lightningd(16333): lightning_closingd(8004): STATUS_FAIL_PEER_BAD: Expected closing_signed:
0085b679bd79b836b05c649cad9af31156cb1d50de448a59c6359ab7c85f4b63913d2e3bc8ad4a80ab698558e5b4949b78dc36acc90dde4f5ac006fd6ca1d109feea03aef9c718e9ce09bbb52dc8308ba8f46b43808ea1a551d41aee72af7af77628d1
```
Which is caused by us not waiting for the revoke-and-ack from a feechange
when we're shutting down.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We could do this lazily, if HTLC errors out, but we do it as HTLCs
come in in the normal case, so this is slightly simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This, of course, should never be used. But it helps maintain connections
for the moment while we dig deeper into feerates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. htlc->fail has been changed to a u8 *.
2. wallet_get_newindex saves to the db.
3. peer->next_htlc_id is saved to the db in peer_save_commitsig_sent() below.
4. We do store commit in peer_save_commitsig_received(peer, commitnum),
and the fixme below talks about HTLC sigs.
5. We do commit shachain and next_per_commit_point in wallet_shachain_add_hash
and update_per_commit_point respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>