Rather than reaching into data structures, let them register their own
callbacks. This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.
Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This removes the WIRE_FINAL_EXPIRY_TOO_SOON which leaked too much info,
and adds the blockheight to WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the normal convention for this type; it makes using converters
a little easier. See next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently send channel_announcement as soon as we and our
peer agree it's 6 blocks deep. In theory, our other peers might
not have seen that block yet though, so delay a little.
This is mitigated by two factors:
1. lnd will stash any "not ready yet" channel_announcements anyway.
2. c-lightning doesn't enforce the 6 depth minimum at all.
We should not rely on other nodes' generosity or laxity, however!
Next release, we can start enforcing the depth limit, and maybe stashing
ones which don't quite make it (or simply enforce depth 5, not 6).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The way we build transactions, serialize them, and compute fees depends on the
chain we are working on, so let's add some context to the transactions.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
And clean up some dev ones which actually happen (mainly by calling
channel_fail_permanent which logs UNUSUAL, rather than
channel_internal_error which logs BROKEN).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we ever do this, we'd end up with an unspendable commitment tx anyway.
It might be able to happen if we have htlcs added from the non-fee-paying
party while the fees are increased, though. But better to close the
channel and get a report about it if that happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We track whether each change is affordable as we go;
test_channel_drainage got us so close that the difference mattered; we
hit an assert when we tried to commit the tx and realized we couldn't
afford it.
We should not be trying to add an HTLC if it will result in the funder
being unable to afford it on either the local *or remote* commitments.
Note the test still "fails" because it refuses to send the final
payment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Subtracting both arbitrarily reduces our capacity, even for ourselves
since the routing logic uses this maximum.
I also changed 'advertise' to 'advertize', since we use american
spelling.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@pm47 gave a great bug report showing c-lightning sending the same
UPDATE_FEE over and over, with the final surprise result being that we
blamed the peer for sending us multiple empty commits!
The spam is caused by us checking "are we at the desired feerate?" but
then if we can't afford the desired feerate, setting the feerate we
can afford, even though it's a duplicate. Doing the feerate cap before
we test if it's what we have already eliminates this.
But the empty commits was harder to find: it's caused by a heuristic in
channel_rcvd_revoke_and_ack:
```
/* For funder, ack also means time to apply new feerate locally. */
if (channel->funder == LOCAL &&
(channel->view[LOCAL].feerate_per_kw
!= channel->view[REMOTE].feerate_per_kw)) {
status_trace("Applying feerate %u to LOCAL (was %u)",
channel->view[REMOTE].feerate_per_kw,
channel->view[LOCAL].feerate_per_kw);
channel->view[LOCAL].feerate_per_kw
= channel->view[REMOTE].feerate_per_kw;
channel->changes_pending[LOCAL] = true;
}
```
We assume we never send duplicates, so we detect an otherwise-empty
change using the difference in feerates. If we don't set this flag,
we will get upset if we receive a commitment_signed since we consider
there to be no changes to commit.
This is actually hard to test: the previous commit adds a test which
spams update_fee and doesn't trigger this bug, because both sides
use the same "there's nothing outstanding" logic.
Fixes: #2701
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ask gossipd for the channel_update for the outgoing channel; any other
messages it sends us get queued for later processing.
But this is overzealous: we can shunt those msgs to the peer while
we're waiting. This fixes a nasty case where we have to handle
WIRE_GOSSIPD_NEW_STORE_FD messages by queuing the fd for later.
This then means that WIRE_GOSSIPD_NEW_STORE_FD can be handled
internally inside handle_gossip_msg(), since it's always dealt with
the same, simplifying all callers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Add remote_ann_node_sigs and remote_bitcoin_sigs fields in channel_init message;
2. Master add announcement signatures into channel_init message, and send this message to Channeld.
Channeld will initial the channel with this signatures when it reenables the channel.
Channeld sends announcement signatures to Master by this message.
When Channeld receive a new channel announcement msg, (After channel locking)it will sends announcement signatures to Master by this message.
Instead of reading the store ourselves, we can just send them an
offset. This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).
MCP bench results:
store_load_msec:28509-31001(29206.6+/-9.4e+02)
vsz_kb:580004-580016(580006+/-4.8)
store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
listnodes_sec:1.790000-1.880000(1.83+/-0.032)
listchannels_sec:21.180000-21.950000(21.476+/-0.27)
routing_sec:2.210000-11.160000(7.126+/-3.1)
peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)
Signficant savings in streaming gossip:
-peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
+peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* remove libbase58, use base58 from libwally
This removes libbase58 and uses libwally instead.
It allocates and then frees some memory, we may want to
add a function in wally that doesn't or override
wally_operations to use tal.
Signed-off-by: Lawrence Nahum lawrence@greenaddress.it
The spec says not to send a commitment_signed without any changes, but LND
does this. To understand why, you have to understand how LND works. I
haven't read the code, but I'm pretty sure it works like this:
1. lnd slows down to do garbage collection, because it's in Go.
2. When an alert timer goes off, noticing it's not making process, it
sends a twitter message to @roasbeef.
3. @roasbeef sshs into the user's machine and binary patches lnd to send
a commitment_signed message.
4. Unfortunately he works so fast that various laws of causality are broken,
meaning sometimes the commitment_signed is sent before any of thes
other things happen.
I'm fairly sure that this will stop as @roasbeef ages, or lnd introduces
some kind of causality enforcement fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes block parsing on testnet; specifically, non-standard tx versions.
We hit a type bug in libwally (wallt_get_secp_context()) which I had to
work around for the moment, and the updated libsecp adds an optional hash
function arg to the ECDH function.
Fixes: #2563
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't turn them to/from pubkeys implicitly. This means nodeids in the store
don't get converted, but bitcoin keys still do.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:33934-35251(34531.4+/-5e+02)
vsz_kb:2637488
store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
listnodes_sec:1.020000-1.290000(1.146+/-0.086)
listchannels_sec:51.110000-58.240000(54.826+/-2.5)
routing_sec:30.000000-33.320000(30.726+/-1.3)
peer_write_all_sec:50.370000-52.970000(51.646+/-1.1)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:46184-47474(46673.4+/-4.5e+02)
+store_load_msec:33934-35251(34531.4+/-5e+02)
-vsz_kb:2638880
+vsz_kb:2637488
-store_rewrite_sec:46.750000-48.280000(47.512+/-0.51)
+store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what all of this has been working towards: ripping out the handwoven
transaction handling. By removing the custom parsing we can finally switch
over to using `wally_tx` as sole representation of transactions in
memory. The commit is a bit larger but it's mostly removing setters and old
references to the input and output fields.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The `wally_tx_input`s do not keep track of their input value, which means we
need to track them ourselves if we try to sign these transactions at a later
point in time.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
1. Rename channel_funding_locked to channel_funding_depth in
channeld/channel_wire.csv.
2. Add minimum_depth in struct channel in common/initial_channel.h and
change corresponding init function: new_initial_channel().
3. Add confirmation_needed in struct peer in channeld/channeld.c.
4. Rename channel_tell_funding_locked to channel_tell_depth.
5. Call channel_tell_depth even if depth < minimum, and still call
lockin_complete in channel_tell_depth, iff depth > minimum_depth.
6. channeld ignore the channel_funding_depth unless its >
minimum_depth(except to update billboard, and set
peer->confirmation_needed = minimum_depth - depth).
We need to do it in various places, but we shouldn't do it lightly:
the primitives are there to help us get overflow handling correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically we tell it that every field ending in '_msat' is a struct
amount_msat, and 'satoshis' is an amount_sat. The exceptions are
channel_update's fee_base_msat which is a u32, and
final_incorrect_htlc_amount's incoming_htlc_amt which is also a
'struct amount_msat'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to just throw htlcs into the channel with a flag to tell it to
ignore overflow. Instead, we can insert them in order (which is the same as
id order) which always must be valid.
This helps when we turn the balance into a struct amount_msat which will get
upset with overflows.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They're generally used pass-by-copy (unusual for C structs, but
convenient they're basically u64) and all possibly problematic
operations return WARN_UNUSED_RESULT bool to make you handle the
over/underflow cases.
The new #include in json.h means we bolt11.c sees the amount.h definition
of MSAT_PER_BTC, so delete its local version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LND seems to do this occasionally, though fixed in new versions. Workaround
in the meantime.
I tested this by hacking our code to send it prematurely, and this worked.
Fixes: #2219
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This solves (or at least reduces probability of) a deadlock in channeld
when there is lot of gossip traffic, see issue #2286. That issue is
almost identical to #1943 (deadlock in openingd) and so is the fix.
Spurious errors were occuring around checking the provided
current commitment point from the peer on reconnect when
option_data_loss_protect is enabled. The problem was that
we were using an inaccurate measure to screen for which
commitment point to compare the peer's provided one to.
This fixes the problem with screening, plus makes our
data_loss test a teensy bit more robust.
Christian and I both unwittingly used it in form:
*tal_arr_expand(&x) = tal(x, ...)
Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().
The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is based on Christian's change, but removes all trace of the old codes.
I've proposed another spec change which removes this code altogether:
https://github.com/lightningnetwork/lightning-rfc/pull/544
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
This is mainly just copying over the copy-editing from the
lightning-rfc repository.
[ Split to just perform changes prior to the UNKNOWN_PAYMENT_HASH change --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>
Fortunately, we can calculate the sha256 ourselves, so the
outgoing channeld doesn't need to tell us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When the next node tells us the onion is malformed, we now actually
report the failcode to lightningd (rather than generating an invalid
error as we do now).
We could generate the onion at this point, but except we don't know
the shared secret; we'd have to plumb that through from the incoming
channeld's HTLC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This covers all the cases where an onion can be malformed; this means
we know in advance that it's bad. That allows us to distinguish two
cases: where lightningd rejects the onion as bad, and where the next
peer rejects the next onion as bad. Both of those (will) set failcode
to one of the BADONION values.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's more natural than using a zero-secret when something goes wrong.
Also note that the HSM will actually kill the connection if the ECDH
fails, which is fortunately statistically unlikely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Funder can't spend the fee it needs to pay for the commitment transaction:
we were not converting to millisatoshis, however!
This breaks our routeboost test, which no longer has sufficient funds
to make payment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have an incompatibility with lnd it seems: I've lost channels on
reconnect with 'sync error'. Since I never got this code to be reliable,
disable it for next release since I suspect it's our fault :(
And reenable the check which didn't work, for others to untangle.
I couldn't get option_data_loss_protect to be reliable, and I disabled
the check. This was a mistake, I should have either spent even more
time trying to get to the bottom of this (especially, writing test
vectors for the spec and testing against other implementations).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is prep work for when we sign htlc txs with
SIGHASH_SINGLE|SIGHASH_ANYONECANPAY.
We still deal with raw signatures for the htlc txs at the moment, since
we send them like that across the wire, and changing that was simply too
painful (for the moment?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use them for re-transmitting the last commitment tx,
and the HSM signs them sync so it's straight-line code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies lifetime assumptions. Currently all callers keep the
original around, but everything broke when I changed that in the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They were not universally used, and most are trivial accessors anyway.
The exception is getting the channel reserve: we have to multiply by 1000
as well as flip direction, so keep that one.
The BOLT quotes move to `struct channel_config`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We probably also want to call secp_randomise/wally_secp_randomize here
too, and since these calls all call setup_tmpctx, it probably makes
sense to have a helper function to do all that. Until thats done, I
modified the tests so grepping will show the places where the sequence
of calls is repeated.
Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>
This avoids some very ugly switch() statements which mixed the two,
but we also take the chance to rename 'towire_gossip_' to
'towire_gossipd_' for those inter-daemon messages; they're messages to
gossipd, not gossip messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This way there's no need for a context pointer, and freeing a msg_queue
frees its contents, as expected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was suggested by Pierre-Marie as the solution to the 'same HTLC,
different CLTV' signature mismatch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Have c-lightning nodes send out the largest value for
`htlc_maximum_msat` that makes sense, ie the lesser of
the peer's max_inflight_htlc value or the total channel
capacity minus the total channel reserve.
LND does this, and we get upset with it. I had assumed we would only
do this after funding_locked (since we don't consider the channel
shortid stable until that point), but TBH 6 confirms is probably
enough.
Fixes: #1985
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, it fails (test_restart_many_payments, the next test).
I suspect a deep misunderstanding in the comparison code, will chase
separately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That matches the other CSV names (HSM was the first, so it was written
before the pattern emerged).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@renepickhardt: why is it actually lightningd.c with a d but hsm.c without d ?
And delete unused gossipd/gossip.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.
As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.
Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.
This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer. To provoke a temporary failure in test_pay_disconnect we
take the node offline.
Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).
Otherwise, we check the results they sent were valid. The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For option_data_loss_protect, the peer can prove to us that it's ahead;
it gives us the (hopefully honest!) per_commitment_point it will use,
and we make sure we don't broadcast the commitment transaction we have.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ignore incoming for now, but this means we advertize the option and
we send the required fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We quote BOLT 2 on *local* above the *remote* checks (we quote it
again below when we do the local checks).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. l1 update_fee -> l2
2. l1 commitment_signed -> l2 (using new feerate)
3. l1 <- revoke_and_ack l2
4. l1 <- commitment_signed l2 (using new feerate)
5. l1 -> revoke_and_ack l2
When we break the connection after #3, the reconnection causes #4 to
be retransmitted, but it turns out l1 wasn't telling the master to set
the local feerate until it received the commitment_signed, so on
reconnect it uses the old feerate, with predictable results (bad
signature).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now sending a ping makes sense: it should force the other end to send
a reply, unblocking the commitment process.
Note that rather than waiting for a reply, we're actually spinning on
a 100ms loop in this case. But it's simple and it works.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't do much (though we might get an error before we send the
commitment_signed), but it's infrastructure for the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were adding channels without their capacity, and eventually annotated them
when we exchanged `channel_update`s. This worked as long as we weren't
considering the channel capacity, but would result in local-only channels to be
unusable once we start checking.
Also means we simplify the handle_gossip_msg() since everyone wants it to
use sync_crypto_write().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is clearer and neater, and even slightly more efficient, since
read_peer_msg() was calling poll() again on gossipfd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.
We shim tal_bytelen() for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use these for receiving arrays at init time, we should also use them
for fulfull/fail of HTLCs in normal operation. That we we benefit from all
those assertions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The master tells us the short_channel_id of the outgoing channel, and
channeld is supposed to get the corresponding channel_update from gossipd.
Instead, it got the channel_update for the *local* channel and ignored
that one.
The master tells us the short_channel_id of the outgoing channel when
failing an HTLC, but channeld didn't store it anywhere. It also
didn't tell channeld the short_channel_id in the case where we're
reconnecting and it's feeding us an array of failed htlcs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will avoid us having to round-trip to the HSM each time we want it.
For now we still derive it, too, and assert it's correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Well, it's generated by shachain, so technically it is a sha256, but
that's an internal detail. It's a secret.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>