We use these for receiving arrays at init time, we should also use them
for fulfull/fail of HTLCs in normal operation. That we we benefit from all
those assertions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The master tells us the short_channel_id of the outgoing channel, and
channeld is supposed to get the corresponding channel_update from gossipd.
Instead, it got the channel_update for the *local* channel and ignored
that one.
The master tells us the short_channel_id of the outgoing channel when
failing an HTLC, but channeld didn't store it anywhere. It also
didn't tell channeld the short_channel_id in the case where we're
reconnecting and it's feeding us an array of failed htlcs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive message last time.
Now gossipd doesn't use fd to kill us (connectd tells master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will avoid us having to round-trip to the HSM each time we want it.
For now we still derive it, too, and assert it's correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Well, it's generated by shachain, so technically it is a sha256, but
that's an internal detail. It's a secret.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'm not completely convinced that it's only ever set to a failcode
with the BADONION bit set, especially after the previous patches in
this series. Now that channeld can handle arbitrary failcodes passed
this way, simply rename it.
We add marshalling assertions that only one of failcode and failreason
is set, and we unmarshal an empty 'fail' to NULL (just the the
generated unmarshalling code does).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
None of these sanity checks should fail, but let's be thorough: we
were testing for htlc->fail but not failcode when fulfilling an HTLC.
The failing-htlc case had this correct already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'struct htlc' in channeld has a 'malformed' field, which is really only
used in the "retransmit updates on reconnect" case. That's quite confusing,
and I'm not entirely convinced that it can only be set to a failcode
with the BADONION bit set.
So generalize it, using the same logic we use in the master daemon:
failcode: a locally generated error, for channeld to turn into the appropriate
error message.
fail: a remotely generated onion error, for forwarding.
Either of these being non-zero/non-NULL means we've failed, and only one
should be set at any time.
We unify the "send htlc fail/fulfill update due to retransmit" and the
normal send update paths, by always calling send_fail_or_fulfill.
This unification revealed that we accidentally skipped the
onion-wrapping stage when we retransmit failed htlcs!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This resolves the problem where both channeld and gossipd can generate
updates, and they can have the same timestamp. gossipd is always able
to generate them, so can ensure timestamp moves forward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of considering it a temporary step, consider it a necessary preamble
to sending updates.
This means (in the next patch) when we tell gossipd to generate the updates,
it's always done after we've told it to create the channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we hit depth 6, we would start exchanging announcement signatures.
However, we should still send a temporary update while waiting for the
reply; make the logic clear in this case that we should always send
one or the other.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The condition in send_channel_update is wrong: it needs to match the
conditions under which we send announcements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store if they have current channels,
but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always call:
send_temporary_announcement(peer);
send_announcement_signatures(peer);
We should handle these in one place, since the conditional at the top
of them actually makes sure only one is effective. We also make the
caller set the peer->have_sigs[LOCAL] flag, instead of doing it
inside send_announcement_signatures().
We were sending announcements at the wrong time (on restart) somtimes.
We also move announce_channel() into the same logic, so it's always
together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just have a "new depth" callback, and let channeld do the right thing.
This makes the channeld paths a bit more straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some paths (eg reconnect) were unconditionally sending a channel_update.
valgrind wasn't catching it because we unmarshal short_channel_ids[LOCAL]
as all-zeroes, so it's technically "initialized".
Create a wrapper to do this, and change the 'bool disabled' flag to be
the explicit disable flag value for clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's always for messages to peer: make that status_peer_io and
add a new status_io for other IO.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a rebased and combined patch for Tor support. It is extensively
reworked in the following patches, but the basis remains Saibato's work,
so it seemed fairest to begin with this.
Minor changes:
1. Use --announce-addr instead of --tor-external.
2. I also reverted some whitespace and unrelated changes from the patch.
3. Removed unnecessary ';' after } in functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if just channel dies before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remote the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means that openingd and closingd now forward our gossip. But the real
reason we want to do this is that it gives an easy way for gossipd to kill
any active daemon, by closing its fd: previously closingd and openingd didn't
read the fd, so tended not to notice.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep waching if we cant't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I didn't convert all tests: they can still use a standalone context.
It's just marginally more efficient to share the libwally one for all
our daemons which link against it anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did a brief audit of tmpctx uses, and we do leak them in various
corner cases. Fortunely, all our daemons are based on some kind of
I/O loop, so it's fairly easy to clean a global tmpctx at that point.
This makes things a bit neater, and slightly more efficient, but also
clearer: I avoided creating a tmpctx in a few places because I didn't
want to add another allocation. With that penalty removed, I can use
it more freely and hopefully write clearer code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the moment, this just tracks the lockin, announce and shutdown
statuses.
We currently have trouble telling when we're stuck in
CHANNELD_AWAITING_LOCKIN who has sent the transaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd. We don't need that any more.
This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers. The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.
We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction.
We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make it a macro, since everyone uses PEER_FD and GOSSIP_FD constants
(they're actually always the same, but this is slightly safer), and
add a gossip_index arg: this is groundwork for when we want to hand
the peer back to master for gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids clashing with the new_channel we're about to add to lightningd,
and also matches its counterpart new_initial_channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we have wirestring, this is much more natural. And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to override two methods: the io error (tell gossipd to
disable), and send reply (enqueue, don't write direclty).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, decode error messages correctly and do the right thing with
messages about other channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Right now it allows ping but not pong.
If A sends a ping expecting a pong to B during CHANNELD_AWAITING_LOCKIN,
It would result in
`STATUS_FAIL_PEER_BAD: WIRE_PONG (19) before funding locked`
resulting in a unilateral channel close by A.
Disabling the channel and enqueing the update for broadcast so we
don't get forwarding requests from remote peers, and we don't try to
ourselves.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Sends a disable channel_update before issuing the shutdown message,
gossipd will also take care to update others and not use for future
routes.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Travis gave an error:
```
DEBUG:root:lightningd(16333): lightning_closingd(8004): STATUS_FAIL_PEER_BAD: Expected closing_signed:
0085b679bd79b836b05c649cad9af31156cb1d50de448a59c6359ab7c85f4b63913d2e3bc8ad4a80ab698558e5b4949b78dc36acc90dde4f5ac006fd6ca1d109feea03aef9c718e9ce09bbb52dc8308ba8f46b43808ea1a551d41aee72af7af77628d1
```
Which is caused by us not waiting for the revoke-and-ack from a feechange
when we're shutting down.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We could do this lazily, if HTLC errors out, but we do it as HTLCs
come in in the normal case, so this is slightly simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This, of course, should never be used. But it helps maintain connections
for the moment while we dig deeper into feerates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. htlc->fail has been changed to a u8 *.
2. wallet_get_newindex saves to the db.
3. peer->next_htlc_id is saved to the db in peer_save_commitsig_sent() below.
4. We do store commit in peer_save_commitsig_received(peer, commitnum),
and the fixme below talks about HTLC sigs.
5. We do commit shachain and next_per_commit_point in wallet_shachain_add_hash
and update_per_commit_point respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our handling of SIGPIPE was incoherent and inconsistent, and we had much
cut & paste between the daemons. They should *ALL* ignore SIGPIPE, and
much of the rest of the boilerplate can be shared, so should be.
Reported-by: @ZmnSCPxj
Fixes: #528
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's just a sha256_double, but importantly when we convert it to a
string (in type_to_string, which is used in logging) we use
bitcoin_blkid_to_hex() so it's reversed as people expect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's just a sha256_double, but importantly when we convert it to a
string (in type_to_string, which is used in logging) we use
bitcoin_txid_to_hex() so it's reversed as people expect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I noted a spurious failure on test_reconnect_sender_add1: we
actually sent an update_commit, which should have been suppressed.
This was because we call dev_disconnect() when we *dequeue* the packet,
which might be too late to suppress the timer. So instead, call it
when the packet in enqueued, and flush synchronously to make sure
we get the right packet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We create a temporary tx which is a child of the real tx, for simplicity of
marshalling. That's OK.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When gossipd sends a message, have a gossip_index. When it gets back a
peer, the current gossip_index is included, so it can know exactly where
it's up to.
Most of this is mechanical plumbing through openingd, channeld and closingd,
even though openingd and closingd don't (currently) read gossip, so their
gossip_index will be unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The #389 introduced some changes that conflicted with
9de3827199 so this ports those changes
into #389 and fixes the `master` branch again.
Lesson learned: always rebase a PR before merging.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The same as master request/response: we queue up incoming replies we
don't want for later processing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We revert to a simple select() loop. This makes things simpler, and fixes
the problem where we want to exit but we've partially read a peer packet.
We still queue up outgoing peer packets for non-blocking send: if we
went full sync there, we'd risk deadlock if both sides wrote a huge
number of packets and neither was reading.
This also greatly simplifies the next patches, where we want to make
our first get/response from gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The master now hands channeld either an error code, and channeld
generates the error message, or an error message relayed from another
node to pass through.
This doesn't fill in the channel_update yet: we need to wire up gossipd
to give us that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently lightningd does this, but channeld is perfectly capable of doing it.
channeld is also in a far better position to add channel_updates to it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The bulk of this patch is actually hoisting the get_shared_secret()
function (unchanged) so we can call it earlier.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can tell this more generically because the count of revocations
received != count of commitments sent. This is the correct condition
which allows us to restore the test we had to eliminate in
c3cb7f1c85.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only send them when we're not awaiting revoke_and_ack: our
simplified handling can't deal with multiple in flights.
Closes: #244
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can have it happen on reconnect due to fee changes; we should really
detect this case, but it's harmless to let it happen as a noop.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Handling feerates for the fundee (who only receives fee_update) is
simple: it's practically atomic since we accept commitment and send
revocation, thus they're applied to both sides at once.
Handling feerates for the funder is more complex: in theory we could
have multiple in flight. However, if we avoid this using the same
logic as we use to suppress multiple commitments in flight, it's
simple again.
We fix the test code to use real feerate manipulation, thus have to
remove an assert about feerate being non-zero. And now we have
feechanges, we need to rely on the changes_pending flags, as we can
have changes without an HTLCs changing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The wire protocol uses this, in the assumption that we'll never see feerates
in excess of 4294967 satoshi per kiloweight.
So let's use that consistently internally as well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Test objects must be added to $(ALL_OBJS) so they correctly depend on
CCAN headers etc.
Also, each test in a subdir must depend on headers and src in the parent
directory, as it will often #include them directly.
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently scan through HTLCs: this isn't enough if we've only got a
feechange in the commitment, so use a flag (but keep both for now for
debugging).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All the callers need to pass it in: currently channeld and openingd just
fake it by copying the payment point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending the announcement_signatures as soon as we locally
locked and got the announcement_depth, this doesn't make the channel
usable any sooner and forces the other side to stash the
signature. This defers the announcement_signature until the channel
really is usable.
This is done by adding an additional check for the remote locked
message and adding a trigger on remote lock.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These need to be different for testing the example in BOLT 11.
We also use the cltv_final instead of deadline_blocks in the final hop:
various tests assumed 5 was OK, so we tweak utils.py.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, it was a gossip message half-sent, when we asked the peer
to be released. Fix the problem in general by making send_peer_with_fds()
wait until after the next packet.
test_routing_gossip/lightning-4/log:
b'lightning_openingd(8738): TRACE: First per_commit_point = 02e2ff759ed70c71f154695eade1983664a72546ebc552861f844bff5ea5b933bf'
b'lightning_openingd(8738): TRACE: Failed hdr decrypt with rn=11'
b'lightning_openingd(8738): STATUS_FAIL_PEER_IO: Reading accept_channel: Success'
test_routing_gossip/lightning-5/log:
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightningd(8308): Failed to get netaddr for outgoing: Transport endpoint is not connected'
The problem occurs here on release, but could be on any place where we hand
a peer over when using ccan/io. Note the other case (channel.c).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There is a race we see sometimes under valgrind on Travis which shows
gossipd receiving the node_announce from master before it reads the
channel_announce from channeld, and thus fails. The simplest solution
is to send the channel_announce and channel_update to master as well,
so it can ensure it sends them to gossipd in order
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon needs to pass it about (marshal/unmarshal)
but it won't need to actually use it after the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending a channeld message to onchaind, which was v. confusing
due to overlap. We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We hit:
assert(!peer->handle_master_reply);
#4 0x000055bba3b030a0 in master_sync_reply (peer=0x55bba41c0030,
msg=0x55bba41c6a80 "", replytype=WIRE_CHANNEL_GOT_COMMITSIG_REPLY,
handle=0x55bba3b041cf <handle_reply_wake_peer>) at channeld/channel.c:518
#5 0x000055bba3b049bc in handle_peer_commit_sig (conn=0x55bba41c10d0,
peer=0x55bba41c0030, msg=0x55bba41c6a80 "") at channeld/channel.c:959
#6 0x000055bba3b05c69 in peer_in (conn=0x55bba41c10d0, peer=0x55bba41c0030,
msg=0x55bba41c67c0 "") at channeld/channel.c:1339
#7 0x000055bba3b123eb in peer_decrypt_body (conn=0x55bba41c10d0,
pcs=0x55bba41c0030) at common/cryptomsg.c:155
#8 0x000055bba3b2c63b in next_plan (conn=0x55bba41c10d0, plan=0x55bba41c1100)
at ccan/ccan/io/io.c:59
We got a commit_sig from the peer while waiting for the master to
reply to acknowledge the commitsig we want to send
(handle_sending_commitsig_reply).
The fix is to go always talk to the master synchronous, and not try to
process anything but messages from the master daemon. This avoids the
whole class of problems.
There's a fairly simple way to do this, as ccan/io lets you override
its poll call: we process any outstanding master requests there, or
add the master fd to the pollfds array.
Fixes: #266
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logic of dispatching the announcement_signatures message was
distributed over several places and daemons. This aims to simplify it
by moving it all into `channeld`, making peer_control only report
announcement depth to `channeld`, which then takes care of the
rest. We also do not reuse the funding_locked tx watcher since it is
easier to just fire off a new watcher with the specific purpose of
waiting for the announcement_depth.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
jl777 reported a crash when we try to pay past reserve. Fix that (and
a whole class of related bugs) and add tests.
In test_lightning.py I had to make non-async path for sendpay() non-threaded
to get the exception passed through for testing.
Closes: #236
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>