As tracked down by Christian; by setting up the master conn first,
we make the master fd async. This means that the synchronous read
(in init_channel) can fail with -EAGAIN, and indeed, Christian
saw this when not running under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually don't need to transition if we're reconnecting, and logic
to go to CHANNELD_NORMAL was wrong: we checked that we'd seen funding tx
locked, but not that we'd received a msg from the remote peer.
We need to fix the tests now we no longer double-transition, too.
Fixes: #188
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Important: a non-standard one can make the closing tx not propagate.
Drive-by cut&paste message fix, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what it actually is, and makes it clearer when we refer to the
spec. It's the commitment we're currently updating, which is the next
commitment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently have the problem that the master can send new HTLCs before
we've processed the incoming reestablish message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patch includes wire/peer_wire.h and causes a compile error
as lightningd/gossip_control.c defined its own gossip_msg function.
New names are clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen even without a protocol violation, if the incoming
update_add_htlc crosses over our outgoing shutdown.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep the scriptpubkey to send until after a commitment_signed (or,
in the corner case, if there's no pending commitment). When we
receive a shutdown from the peer, we pass it up to the master.
It's up to the master not to add any more HTLCs, which works because
we move from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need to keep this around any more: by handing it to
subdaemons we ensure we'll close it if the peer disconnects, and we
also add code to get a new one on reconnection.
Because getting a gossip_fd is async, we re-check the peer state after
it gets back. This is kind of annoying: perhaps if we were to hand
the reconnected peer through gossipd (with a flag to immediately
return it) we could get the gossip fd that way and unify the paths?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At the moment, master simply keeps the gossip fd open when peer
disconnects. That's inefficient, and wrong anyway (it may want a
complete new sync, or may not, but we'll currently send all the
messages including stale ones).
This interface will be required for restart anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we're always sync, just use an fd. Put the hsm_sync_read() helper
here, too, and do HSM init sync which makes things much simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With no async calls left, we can just use a stack variable for the fd.
And we're now *always* in the hands of some daemon, unless we're
disconnected, so owner is only NULL in that case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a terrible hack in gossip when a peer didn't exist. Formalize
a pattern when code+200 is a failure (with no fds passed), and use it
here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's no GETTING_HSMFD state at all any more. We
temporarily play games with the hsm fd; those will go away once we're
done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's no GETTING_SIG_FROM_HSM state at all any more. We
temporarily play games with the hsm fd; those will go away once we're
done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Wallet should really be the container for anything bip32 related, so
I'd like to slowly wean off of `ld->bip32_base` in favor of
`ld->wallet->bip32_base`
We'll re-use them a few times so having them at a central location is
nice. We also fix a bug that was unreserving UTXO entries upon free,
instead of promoting them to being spent.
So far we always needed to know the public key, which was not the case
for addresses that we don't own. Moving the hashing outside of the
script construction allows us to send to arbitrary addresses. I also
added the hash computation to the pubkey primitives.
When we get a fail/fulfill on an outgoing HTLC, we tell the correspoding
incoming HTLC about it. But if that peer is disconnected, we don't.
The better solution is to copy the preimage/malformed/failmessage and mark
the incoming HTLC as resolved. This is done most simply by marking it
SENT_REMOVE_HTLC, which will work in the database case as well.
channeld now re-transmits appropriately when it gets started with an HTLC
in that state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matches what the master does: increments commit index when we send
commit_sig. Thus if we restart at that point, we match.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matters in one case: channeld receiving a bad message is a
permenant failure, whereas losing a connection is transient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need the old remote per_commitment_point so we can validate the
per_commitment_secret when we get it.
We unify this housekeeping in the master daemon using
update_per_commit_point().
This patch also saves whether remote funding is locked, and disallows
doing that twice (channeld should ignore it).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a bit tricky since we want to hand more verbose errors to the local
case, but the locally-created and forwarded paths had diverged (the local
one missing some things).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are two ways we can do retransmission on reconnect: re-derive
what we would have sent, or remember it and simply re-send. The
rederivation is difficult: unwinding state depends on whether we sent
a revoke_and_ack before or after the commitment_signed, and unwinding
a revoke_and_ack would require us to remember HTLCs we would have
normally forgotten at this point.
So we simply tell the master to remember the old signatures for us,
and hand them back in case we need to re-send.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the case where we can't decrypt the onion, we can't fail it in the
normal way (which is encrypted using the onion shared secret), we need
to respond with a update_fail_malformed_htlc message.
Moreover, we need to remember this for persistence. This means that
we really have three conclusions for an HTLC: fulfilled, failed,
malformed. Fix up the logic everywhere which assumed failed or
fulfilled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's easiest to have the master keep the last commit we sent, for
re-transmission. We could recalculate it, but it's made more difficult
by the before/after revoke case.
And because revoke_and_ack changes the channel state, we need to
remember which order we sent them in for re-transmission.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need this for reestablishing a channel.
(Note: this patch changes quite a bit in this series, but reshuffling was
tedious).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's fairly ad-hoc, but we need to tell it to channeld when
it restarts, so we define it as the non-HTLC balance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It needs to save them to the db in case of restart; this means we tell
it about funding_locked, as well as the next_per_commit_point given
in revoke_and_ack.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The channel daemon gets the shared secrets from the HSM to save
the master daemon some work. It used to hand these over at
revoke_and_ack receive, which is when the master daemon needs them.
However, it's a bit simpler to hand them over when we first tell
the master about the incoming HTLC (the first commitsig).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They share some fields, but they're basically different, and it's clearest
to treat them differently in most places.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When adding their HTLCs, it needs all the information. When failing,
it needs the id as key and the failure reason. When fulfilling, it
needs the id and payment preimage.
It also needs to know when we have received an revoke_and_ack or a
commitment_signed, to place in the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to change to a batch interface, where we tell the master
before we send certain packets (eg. commit, revoke). We need to wait
for it to respond before doing anything else, but it might cross-over
and be sending us commands at the same time.
This queues those requests until we're ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This prepares us for handlers turning off peer I/O, rather than assuming
we always want to handle the next incoming message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still get the shared secret, since that requires a round trip to the HSM
(why waste the master daemon's time?) but it does the processing, which
simplifies the message passing and things like realm handling which
have nothing to do with this particular channeld.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some paths were still sending unencrypted failure messages; unify them
all. We need to keep the fail_msg around for resubmission if the
channeld dies; similarly, we need to keep the htlc_end structure
itself after failure, in case the failed HTLC is committed: we can
move it to a minimal archive once it's flushed from both sides,
however.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Means caller has to do some more work, but this is closer to what we want:
we're going to want to send them to the master daemon for atomic commit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used level -1 to mean "append to log", but that doesn't actually
work, and results in an assert if we try to prune the logs:
lightningd: daemon/pseudorand.c:36: pseudorand: Assertion `max' failed.
Expose logv_add and use that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use --log-level to control this, but we could add another switch.
It makes the test infrastructure simpler, since we can just look in the
main logs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, it reassured me that the ammag obfuscation step occurs
even for the initial failmsg creator.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I actually hit this very hard to reproduce race: if we haven't process
the channeld message when block #6 comes in, we won't send the gossip
message. We wait for logs, but don't generate new blocks, and timeout
on l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES').
The solution, which also tests that we don't send announcement signatures
immediately, is to generate a single block, wait for CHANNELD_NORMAL,
then (in gossip tests), generate 5 more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we have a simple way to query the database for UTXOs we can
simplify some of the coin selection logic. That gets rid of the
in-memory list of UTXOs.
Not the nicest code, but it allows us to store the bip32_max_index so
that we don't forget our addresses upon restart. We could have done
the same by retrieving the max index from our index, but then we'd
forget addresses that don't have an associated output. Conversion
to/from string is so that we can store arbitrary one off values in the
DB in the future, independent of type.
The format we use to generate marshal/unmarshal code is from
the spec's tools/extract-formats.py which includes the offset:
we don't use it at all, so rather than having manually-calculated
(and thus probably wrong) values, or 0, emit it altogther.
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use this to make it send the funding_signed message, rather than having
the master daemon do it (which was even more hacky). It also means it
can handle the crypto, so no need for the packet to be handed up encrypted,
and also make --dev-disconnect "just work" for this packet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a file descriptor, so when we consume an entry, we move past it
(and everyone shares a file offset, so this works).
The file contains packet names prefixed by - (treat fd as closed when
we try to write this packet), + (write the packet then ensure the file
descriptor fails), or @ ("lose" the packet then ensure the file
descriptor fails).
The sync and async peer-write functions hook this in automatically.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'test-run-cryptomsg__fix_compilation.patch':
test/run-cryptomsg: fix compilation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Valgrind error file: /tmp/lightning-8k06jbb3/test_disconnect/lightning-7/valgrind-errors
==32307== Uninitialised byte(s) found during client check request
==32307== at 0x11EBAD: memcheck_ (mem.h:247)
==32307== by 0x11EC18: towire (towire.c:14)
==32307== by 0x11EF19: towire_short_channel_id (towire.c:92)
==32307== by 0x12203E: towire_channel_update (gen_peer_wire.c:918)
==32307== by 0x1148D4: send_channel_update (channel.c:185)
==32307== by 0x1175C5: peer_conn_broken (channel.c:1010)
==32307== by 0x13186F: destroy_conn (poll.c:173)
==32307== by 0x13188F: destroy_conn_close_fd (poll.c:179)
==32307== by 0x13B279: notify (tal.c:235)
==32307== by 0x13B721: del_tree (tal.c:395)
==32307== by 0x13BB3A: tal_free (tal.c:504)
==32307== by 0x130522: io_close (io.c:415)
==32307== Address 0xffefff87d is on thread 1's stack
==32307== in frame #2, created by towire_short_channel_id (towire.c:88)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is simpler than passing back and forth, for the moment at least. That
means we don't need to ask for a new one on reconnect.
This partially reverts the gossip handling in openingd, since it no longer
passes the gossip fd back. We also close it when peer is freed, so it
needs initializing to -1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can go to release a gossip peer, and it can fail at the same time.
We work around the problem that the reply must be a gossipctl_release_peer_reply
with two fds, but it's not pretty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We kill the existing connection if possible; this may mean simply
forgetting the prior peer altogether if it's in an early state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead, send it the funding_signed message; it can watch, save to
database, and send it.
Now the openingd fundee path is a simple request and response, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simplifies state machine. Master still has to calculate the tx to get
the signature and broadcast, but now the opening daemon funding path
is a simple request/response.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to use it in peer_control to generate the transaction, but we
really only need the funding_pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Like the fd, it's only useful when the peer is not in a daemon, so we
free & NULL it when that happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We steal it when we're closing connection, but we normally want to forget
it if connection just dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We explicitly assert what state we're coming from, to make transitions
clearer.
2. Every transition has a state, even between owners while waiting for HSM.
3. Explictly step though getting the HSM signature on the funding tx
before starting channeld, rather than doing it in parallel: makes
states clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to do this on every connection, whether reconnecting or not,
so it makes sense for the handshake daemon to handle it and return
the feature fields.
Longer term I'm considering having the handshake daemon handle the
listening and connecting, and simply hand the fds back once the peers
are ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently create a peer struct, then complete handshake to find out
who it is. This means we have a half-formed peer, and worse: if it's
a reconnect we get two peers the same.
Add an explicit 'struct connection' for the handshake phase, and
construct a 'struct peer' once that's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now in sync with 8ee57b97738b1e9467a1342ca8373d40f0c4aca5.
Our tool doesn't need to convert them any more, but we actually had a
mis-typed field in the HSM which needed fixing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The single string-based hostname and port has been retired in favor of
having multiple `struct ipaddr`s from the `node_announcement`. This
breaks the hostnames and ports from IRC, but I didn't bother to
backport ipaddr for it since it is only used in the legacy daemon.
Rather a big commit, but I couldn't figure out how to split it
nicely. It introduces a new message from the channel to the master
signaling that the channel has been announced, so that the master can
take care of announcing the node itself. A provisorial announcement is
created and passed to the HSM, which signs it and passes it back to
the master. Finally the master injects it into gossipd which will take
care of broadcasting it.
We alternated between using a sha256 and using a privkey, but there are
numerous places where we have a random 32 bytes which are neither.
This fixes many of them (plus, struct privkey is now defined in terms of
struct secret).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, the tests can mine blocks too soon, and the funding never
locks. This gives more of a chance, at least.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were getting an assert "!secp256k1_fe_is_zero(&ge->x)", because
an all-zero pubkey is invalid. We allow marshal/unmarshal of NULL for
now, and clean up the error handling.
1. Use status_failed if master sends a bad message.
2. Similarly, kill the gossip daemon if it gives a bad reply.
3. Use an array for returned pubkeys: 0 or 2.
4. Use type_to_string(trc, struct short_channel_id, &scid) for tracing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I implemented this because a bug causes us to consider the HTLC malformed,
so I can trivially test it for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we now use the short_channel_id to identify the next hop we need
to resolve the channel_id to the pubkey of the next hop. This is done
by calling out to `gossipd` and stuffing the necessary information
into `htlc_end` and recovering it from there once we receive a reply.
This was overly complex since it was off-by-one and we were storing
some information elsewhere. Now this just loads the route as is into
structs, extracts some information for our outgoing HTLC, and then
shifts by the array of structs by one, and finally fills in the last
instruction, which is the terminal.
The new onion uses the `channel_id` instead of the `node_id` of the
next hop to identify where to forward the payment. So we return the
exact channel chosen by the routing algo, to avoid having to look it
up again later.
Mainly switching from the old include to the new include and adjusting
the actual size of the onion packet. It also moves `channel.c` to use
`struct hop_data`.
It introduces a dummy next hop in `channel.c` that will be replaced in
the next commit.
Adds a new command line flag `--dev-broadcast-interval=<ms>` that
allows us to specify how often the staggered broadcast should
trigger. The value is passed down to `gossipd` via an init message.
This is mainly useful for integration tests, since we do not want to
wait forever for gossip to propagate.
We were using an uninitialized `broadcast_index` on the peer which
would occasionally result in no forwardings at all, segmenting the
network. And during the `msg_queue` refactor, some wait targets were
not updated, resulting in the waits never to be woken up.
This moves all the non-legacy blackbox testing into python.
Before:
real 10m18.385s
After:
real 9m54.877s
Note that this doesn't valgrind the subdaemons: that patch seems to cause
some issues in the python framework which I am still chasing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than dumping all gossip messages then handling local ones again.
This should help us give timely ping replies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fails on the old dev-restart tests, so we need to only enable it
for the new tests:
rusty@rusty-XPS-13-9360:~/devel/cvs/lightning (guilt/ping-pong)$ daemon/test/test-basic --restart --verbose
...
{ }
RESTARTING
dev-restart failed!
valgrind: mmap(0x38000000, 2265088) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only the side *accepting* the connection gives a `minumum_depth`, but both
sides are supposed to wait that long:
BOLT #2:
### The `funding_locked` message
...
#### Requirements
The sender MUST wait until the funding transaction has reached
`minimum-depth` before sending this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>