That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will avoid us having to round-trip to the HSM each time we want it.
For now we still derive it, too, and assert it's correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Well, it's generated by shachain, so technically it is a sha256, but
that's an internal detail. It's a secret.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'm not completely convinced that it's only ever set to a failcode
with the BADONION bit set, especially after the previous patches in
this series. Now that channeld can handle arbitrary failcodes passed
this way, simply rename it.
We add marshalling assertions that only one of failcode and failreason
is set, and we unmarshal an empty 'fail' to NULL (just the the
generated unmarshalling code does).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
None of these sanity checks should fail, but let's be thorough: we
were testing for htlc->fail but not failcode when fulfilling an HTLC.
The failing-htlc case had this correct already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'struct htlc' in channeld has a 'malformed' field, which is really only
used in the "retransmit updates on reconnect" case. That's quite confusing,
and I'm not entirely convinced that it can only be set to a failcode
with the BADONION bit set.
So generalize it, using the same logic we use in the master daemon:
failcode: a locally generated error, for channeld to turn into the appropriate
error message.
fail: a remotely generated onion error, for forwarding.
Either of these being non-zero/non-NULL means we've failed, and only one
should be set at any time.
We unify the "send htlc fail/fulfill update due to retransmit" and the
normal send update paths, by always calling send_fail_or_fulfill.
This unification revealed that we accidentally skipped the
onion-wrapping stage when we retransmit failed htlcs!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This resolves the problem where both channeld and gossipd can generate
updates, and they can have the same timestamp. gossipd is always able
to generate them, so can ensure timestamp moves forward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of considering it a temporary step, consider it a necessary preamble
to sending updates.
This means (in the next patch) when we tell gossipd to generate the updates,
it's always done after we've told it to create the channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we hit depth 6, we would start exchanging announcement signatures.
However, we should still send a temporary update while waiting for the
reply; make the logic clear in this case that we should always send
one or the other.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The condition in send_channel_update is wrong: it needs to match the
conditions under which we send announcements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store if they have current channels,
but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always call:
send_temporary_announcement(peer);
send_announcement_signatures(peer);
We should handle these in one place, since the conditional at the top
of them actually makes sure only one is effective. We also make the
caller set the peer->have_sigs[LOCAL] flag, instead of doing it
inside send_announcement_signatures().
We were sending announcements at the wrong time (on restart) somtimes.
We also move announce_channel() into the same logic, so it's always
together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just have a "new depth" callback, and let channeld do the right thing.
This makes the channeld paths a bit more straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some paths (eg reconnect) were unconditionally sending a channel_update.
valgrind wasn't catching it because we unmarshal short_channel_ids[LOCAL]
as all-zeroes, so it's technically "initialized".
Create a wrapper to do this, and change the 'bool disabled' flag to be
the explicit disable flag value for clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's always for messages to peer: make that status_peer_io and
add a new status_io for other IO.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a rebased and combined patch for Tor support. It is extensively
reworked in the following patches, but the basis remains Saibato's work,
so it seemed fairest to begin with this.
Minor changes:
1. Use --announce-addr instead of --tor-external.
2. I also reverted some whitespace and unrelated changes from the patch.
3. Removed unnecessary ';' after } in functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if just channel dies before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remote the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means that openingd and closingd now forward our gossip. But the real
reason we want to do this is that it gives an easy way for gossipd to kill
any active daemon, by closing its fd: previously closingd and openingd didn't
read the fd, so tended not to notice.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep waching if we cant't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I didn't convert all tests: they can still use a standalone context.
It's just marginally more efficient to share the libwally one for all
our daemons which link against it anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>