Under stress, the tests can mine blocks too soon, and the funding never
locks. This gives more of a chance, at least.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were getting an assert "!secp256k1_fe_is_zero(&ge->x)", because
an all-zero pubkey is invalid. We allow marshal/unmarshal of NULL for
now, and clean up the error handling.
1. Use status_failed if master sends a bad message.
2. Similarly, kill the gossip daemon if it gives a bad reply.
3. Use an array for returned pubkeys: 0 or 2.
4. Use type_to_string(trc, struct short_channel_id, &scid) for tracing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I implemented this because a bug causes us to consider the HTLC malformed,
so I can trivially test it for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we now use the short_channel_id to identify the next hop we need
to resolve the channel_id to the pubkey of the next hop. This is done
by calling out to `gossipd` and stuffing the necessary information
into `htlc_end` and recovering it from there once we receive a reply.
This was overly complex since it was off-by-one and we were storing
some information elsewhere. Now this just loads the route as is into
structs, extracts some information for our outgoing HTLC, and then
shifts the array of structs by one, and finally fills in the last
instruction, which is the terminal.
The new onion uses the `channel_id` instead of the `node_id` of the
next hop to identify where to forward the payment. So we return the
exact channel chosen by the routing algo, to avoid having to look it
up again later.
Mainly switching from the old include to the new include and adjusting
the actual size of the onion packet. It also moves `channel.c` to use
`struct hop_data`.
It introduces a dummy next hop in `channel.c` that will be replaced in
the next commit.
Adds a new command line flag `--dev-broadcast-interval=<ms>` that
allows us to specify how often the staggered broadcast should
trigger. The value is passed down to `gossipd` via an init message.
This is mainly useful for integration tests, since we do not want to
wait forever for gossip to propagate.
We were using an uninitialized `broadcast_index` on the peer which
would occasionally result in no forwardings at all, segmenting the
network. And during the `msg_queue` refactor, some wait targets were
not updated, so the waiters were never woken up.
This moves all the non-legacy blackbox testing into python.
Before:
real 10m18.385s
After:
real 9m54.877s
Note that this doesn't valgrind the subdaemons: that patch seems to cause
some issues in the python framework which I am still chasing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than dumping all gossip messages then handling local ones again.
This should help us give timely ping replies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fails on the old dev-restart tests, so we need to only enable it
for the new tests:
rusty@rusty-XPS-13-9360:~/devel/cvs/lightning (guilt/ping-pong)$ daemon/test/test-basic --restart --verbose
...
{ }
RESTARTING
dev-restart failed!
valgrind: mmap(0x38000000, 2265088) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only the side *accepting* the connection gives a `minimum_depth`, but both
sides are supposed to wait that long:
BOLT #2:
### The `funding_locked` message
...
#### Requirements
The sender MUST wait until the funding transaction has reached
`minimum-depth` before sending this message.
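
A minimal sketch of the rule, in Python for brevity. The function and
parameter names are hypothetical and not taken from the daemon; depth is
assumed to count the block containing the funding transaction as 1.

```python
def may_send_funding_locked(funding_depth: int, minimum_depth: int) -> bool:
    """Return True once the funding tx is buried deeply enough.

    Both peers apply the same check, using the minimum_depth value
    chosen by the side that accepted the channel.
    """
    return funding_depth >= minimum_depth
```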
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have two partially overlapping state-machines: the channel
state and the announcement state. We need to request signatures from
the HSM to exchange them with the peer, and we need to have both sets
of signatures before we can proceed and send the actual announcements.
Instead of reusing HSMFD_ECDH, we have an explicit channeld hsm fd,
which can do ECDH and will soon do channel announce signatures as well.
Based-on: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We *should* split the struct into key and data, rather than only comparing
the key parts in the htlc_end_eq function. But meanwhile, this fixes
the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us link HTLCs from one peer to another; but for the moment it
simply means we can adjust balance when an HTLC is fulfilled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is an approximate result (it's only our confirmed balance, not showing
outstanding HTLCs), but it gives an easy way to check HTLCs have been
resolved.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a peer dies, and then we get a reply, that can cause access after free.
The usual way to handle this is to make the request a child of the peer,
but in fact we still want to catch (and discard) it, so it's a little
more complex internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We call channel_sent_commit *before* sending (so we know if we need
to), so the name is wrong. Similarly channel_sent_revoke_and_ack.
We can usefully have them tell us if there is outstanding work to do,
too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Passing through 'struct peer *' was a layering violation.
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The three cases we care about only happen on specific transitions:
1. They can no longer spend our failed HTLC: we can fail the source now.
2. They are fully committed to their new HTLC: we can forward now.
3. They can no longer timeout their fulfilled HTLC: the funds are ours.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The direction bit was computed in several spots and was inconsistent
in some cases. Now we compute it just in routing, and once when
starting up `channeld`; this avoids recomputing it all over the place.
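
A minimal sketch of computing the bit in one place, in Python for
brevity. It assumes the BOLT #7 convention that the node with the
lexicographically lesser public key is node 1 and announces with
direction 0; the function name is hypothetical.

```python
def direction_bit(our_id: bytes, their_id: bytes) -> int:
    """Return 0 if our key sorts first (we are node 1), else 1."""
    if our_id == their_id:
        raise ValueError("a channel needs two distinct node ids")
    # Python compares bytes lexicographically, matching a memcmp-style
    # comparison of the serialized public keys.
    return 0 if our_id < their_id else 1
```

Both endpoints of a channel always get opposite bits, so computing it
once and passing it along is enough.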
Now we correctly use the remote revocation basepoint, we need to set
it in run-channel (instead of the local revocation basepoint).
We also update all the comments, as per (pending) spec commit:
https://github.com/lightningnetwork/lightning-rfc/pull/137
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before exiting, `channeld` constructs and sends a `channel_update`
marking the channel as disabled. This is the pro-active signalling
that the channel may no longer be used.
Copied the JSON-request parsing from `pay.c`, passing through to
`gossipd`, filling the reply with the `route_hop` serialization, and
serializing as JSON-RPC response.
The `route_hop` struct introduced in the previous refactoring is
reused when returning the reply to a `getroute` request. Since these
are nested messages I added the serialization and deserialization
methods.
This came up while debugging the gossip daemon breaking upon calling
`getroute`. It turns out that log was still writing to stdout, but
stdout had been reused for an inter-daemon socket, which would
break...
The STDOUT fd being reused as communication sockets with other daemons
was causing some unexpected crashes if the sub-daemon wrote something,
e.g., using `log_*`. Not closing it should avoid that conflict.
Some of the struct array helpers need to allocate data when
deserializing their fields. The `getnodes` reply is one such example
that allocates the hostname. Since the change to calling array helpers
the getnodes call was broken because it was attempting to allocate off
of the entry, which did not have a tal header, thus failing.
Use msg_enqueue's wake and msg_queue_wait, and don't clone packets since
msg_enqueue() respects take.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We remove the unused status_send_fd, and rename status_send_sync (it
should only be used for that case now).
We add a status_setup_async(), and wire things internally to use that
if it's set up: status_setup() is renamed status_setup_sync().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a little more awkward, as we used to do some work
synchronously (the init message), but it's still pretty clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we use async IO, we can't use status_send. We keep a pointer to the
master daemon_conn, and use that to send.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The gossip subdaemon previously passed the fd after init: this is
unnecessary for peers which simply want to gossip (and not establish
channels).
Now we hand the gossip fd back with the peer fd. This adds another
error message for when we fail to create the gossip fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of indicating where to place the fd, you say how many: the
fd array gets passed into the callback.
This is also clearer for the users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means that gossip (which also wants to wake it) needs to wake
the queue, not the daemon_conn.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the fourth value (size) to determine the type, unless the fifth
value is supplied. That's silly: allow the fourth value to be a typename,
since that's the only reason we care about the size at all!
Unfortunately there are places in the spec where we use a raw fieldname
without '*1' for a length, so we have to distinguish this from the
typename case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Except for the trivial case of u8 arrays, have the generator create
the loop code for the array iteration.
This removes some trivial helpers, and avoids us having to write more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
subd_req() needs to get the type before it calls subd_send_msg, because
if it's take() then msg_enqueue() may reallocate.
Which also made me realize that subd_send_message() should not try to dup,
since msg_enqueue() handles that itself.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have some duplication in handling queues, so this is an attempt at
deduplicating some of that work. `daemon_conn` now uses the
`msg_queue` and `channeld` was also migrated to `msg_queue`. At the
same time I made `msg_queue` create a copy of the messages or take
over messages marked with `take()`. This should make cleaning up
messages easier.
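
A rough sketch of the copy-or-take ownership rule, in Python for
brevity. The class, its methods and the `take` flag are illustrative
stand-ins, not the actual `msg_queue` API; the wakeup counter stands in
for waking whoever waits on the queue.

```python
from collections import deque


class MsgQueue:
    """Toy queue: copies messages unless the caller hands them over."""

    def __init__(self):
        self._q = deque()
        self.wakeups = 0

    def enqueue(self, msg: bytearray, take: bool = False) -> None:
        # With take=True the queue keeps the caller's buffer; otherwise
        # it stores a private copy so the caller may reuse its buffer.
        self._q.append(msg if take else bytearray(msg))
        self.wakeups += 1  # stand-in for waking a waiting writer

    def dequeue(self):
        return self._q.popleft() if self._q else None
```

Either way the queue owns what it stores, so the caller never has to
track when a queued message may be freed.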
This allows us to break out of the normal queue-based write loop and
handle things ourselves for a while. Currently this is used to trigger
regular gossip dumps that do not proceed until the buffers have been
cleared in order to avoid memory-explosions.
We were firing off the wakeup timers all over the place, out of fear
that we would be triggering two concurrent broadcasts. This is not
really the case since the wakeup calls are idempotent. This also
allows us not to differentiate between triggering a broadcast on a
local peer or on a proxied peer.
This includes some code duplication, but since the two write targets
are fundamentally different we might need to refactor a bit more to
unify them again.
We will eventually wean ourselves off the logging, or replace it with status
messages that log in `lightningd`, but for now we still have the
routing module that does some logging.
This uses a single fd for both status and control.
To make this work, we enforce the convention that replies are the same
as requests + 100, and that their name ends in "_REPLY".
This also means that various daemons can simply exit when done; there's
no race between reading request and closing status fds.
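
The convention can be sketched as follows, in Python for brevity. The
message names and numbers are made up for illustration; only the
"+ 100" and "_REPLY" rules come from this change.

```python
from enum import IntEnum


class ControlMsg(IntEnum):
    # Hypothetical message numbers, chosen only for this example.
    INIT = 1
    GETPEERS = 2
    INIT_REPLY = 101
    GETPEERS_REPLY = 102


def reply_for(request: ControlMsg) -> ControlMsg:
    """Map a request to its reply: same number + 100, name + '_REPLY'."""
    reply = ControlMsg(request + 100)  # ValueError if no such message
    if reply.name != request.name + "_REPLY":
        raise ValueError(f"{reply.name} does not follow the convention")
    return reply


def is_reply(msg: ControlMsg) -> bool:
    """Anything not ending in _REPLY on the shared fd is a status message."""
    return msg.name.endswith("_REPLY")
```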
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was included in the lightningnetwork/lightning-rfc#105 update
to the test vectors, and it's a good idea. Takes a bit of work to
calculate (particularly, being aware of rounding issues).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
aka "BOLT 3: Use revocation key hash rather than revocation key",
which builds on top of lightningnetwork/lightning-rfc#105 "BOLT 2,3,5:
Make htlc outputs of the commitment tx spendable with revocation key".
This affects callers, since they now need to hand us the revocation
pubkey, but commit_tx has that already anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All the daemons will use a common seed for point derivation, so drag
it out of lightningd/opening.
This also provides a nice struct wrapper to reduce argument count.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should start watching for the transaction before we send the
signature; we might miss it otherwise. In practice, we only see
transactions as they enter a block, so it won't happen, but be
thorough.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit tricky: for our signing code, we don't want scriptsigs,
but to calculate the txid, we need them. For most transactions in lightning,
they're pure segwit so it doesn't matter, but funding transactions can
have P2SH-wrapped P2WPKH inputs.
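
To illustrate why the scriptsig matters, here is a small Python sketch
of the standard txid calculation: double SHA256 over the serialization
without witness data, displayed byte-reversed. The helper names and the
example transaction are invented for illustration.

```python
import hashlib
import struct


def varint(n: int) -> bytes:
    """Bitcoin compact-size encoding."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def txid(version, inputs, outputs, locktime) -> str:
    """inputs: (prev_txid, index, script_sig, sequence) tuples.
    outputs: (amount_sat, script_pubkey) tuples."""
    raw = struct.pack("<i", version) + varint(len(inputs))
    for prev, index, script_sig, sequence in inputs:
        raw += prev + struct.pack("<I", index)
        raw += varint(len(script_sig)) + script_sig
        raw += struct.pack("<I", sequence)
    raw += varint(len(outputs))
    for amount, script_pubkey in outputs:
        raw += struct.pack("<q", amount)
        raw += varint(len(script_pubkey)) + script_pubkey
    raw += struct.pack("<I", locktime)
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    return digest[::-1].hex()


# P2SH-wrapped P2WPKH: the scriptsig pushes the 22-byte witness program.
redeem_push = bytes([0x16, 0x00, 0x14]) + b"\x11" * 20
outs = [(50_000, b"\x00\x20" + b"\x22" * 32)]
prev = b"\x33" * 32
txid_for_signing_view = txid(2, [(prev, 0, b"", 0xFFFFFFFF)], outs, 0)
txid_real = txid(2, [(prev, 0, redeem_push, 0xFFFFFFFF)], outs, 0)
```

The two values differ, so a txid computed from the scriptsig-free form
used for signing would be wrong for such a funding transaction.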
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
build_utxos needs to calculate fees for figuring out how many utxos to
use, so fix that logic and rely on it.
We make build_utxos return a pointer array, so funding_tx can simply hand
that to permute_inputs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec 4af8e1841151f0c6e8151979d6c89d11839b2f65 uses a 32-byte 'channel-id'
field, not to be confused with the 8-byte short ID used by gossip. Rename
appropriately, and update to the new handshake protocol.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a different 'struct peer' in the new daemons, so make sure
the structure isn't assumed in any shared files.
This is a temporary shim.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only minor changes, but I add some more spec text to
lightningd/test/run-commit_tx.c to be sure to catch if it changes
again.
One reference isn't upstream yet, so had to be commented out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we've tested it:
1. open_channel needs to write response to REQ_FD not STATUS_FD.
2. recv_channel needs to send our next_per_commit, not echo theirs!
3. print the problematic signature if it's wrong, not our own.
Cleanups:
1. Return the message from open_channel/recv_channel for simplicity.
2. Trace signing information.
3. More tracing messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The signing helper was really just for testing, so remove it. But
turn the funding_tx() function into a useful one by making it take the
utxo array.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I made it privkey to prove we owned one key, but without the HSM checking
we have a valid sig for the first commitment transaction, and that
we haven't revealed the revocation secret key, why bother?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Or for blackbox tests --gdb1=<subdaemon> / --gdb2=<subdaemon>.
This makes the subdaemon wait as soon as it's execed, so we can attach
the debugger.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should check that the peer it says it's returning is under its control;
we also need to take back the peer fd, and use the correct conversion routine
for the packet it sends us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the moment this is simply handed through to lightningd for
generating the per-peer secrets; eventually the HSM should keep it and
all peer secret key operations would be done via HSM-ops.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Raw crypto_state is what we send across the wire: the peer one is for
use in async crypto io routines (peer_read_message/peer_write_message).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The requirements for accepting the remote config are more complex than
a simple min/max value, as various parameters are related. It turns
out that with a few assumptions, we can boil this down to:
1. The valid feerate range.
2. The minimum effective HTLC throughput we want.
3. The maximum delay we'll accept for us to redeem.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unless the transaction is confirmed, the UTXOs should be released if
something happens to the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
wire_sync_write() adds length, but we already have it, so use write_all.
sync_crypto_read() handed an on-stack buffer to cryptomsg_decrypt_header,
which expected a tal() pointer, so use the known length instead.
sync_crypto_read() also failed to read the tag; add that in (no
overflow possible as 16 is an int, len is a u16).
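
A small sketch of the read sizes involved, in Python for brevity. It
assumes the BOLT #8 framing (an encrypted 2-byte length followed by a
16-byte tag, then the encrypted body followed by its own 16-byte tag);
the constant and function names are illustrative.

```python
TAG_LEN = 16    # Poly1305 authentication tag appended to each ciphertext
LEN_FIELD = 2   # the plaintext length is a u16


def header_size() -> int:
    """Bytes to read before the message length can be decrypted."""
    return LEN_FIELD + TAG_LEN


def body_size(msg_len: int) -> int:
    """Bytes to read for the body: the ciphertext plus its tag."""
    if not 0 <= msg_len <= 0xFFFF:
        raise ValueError("length must fit in a u16")
    return msg_len + TAG_LEN
```

The largest possible body read is 65535 + 16 = 65551 bytes, which fits
comfortably in an int.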
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The peer is woken up every 30 seconds to deliver the backlog of
messages. Additionally I added the normal message queue to be able to
send non-gossip messages to the peer.
Turns out we want to permute transactions for the wallet too, so we
use void ** rather than assume we're shuffling htlc ** (and do inputs,
too!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This object is basically the embodiment of BOLT #2. Each HTLC already
knows its own state; this moves them between states and keeps them
consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's currently written to produce "local" commit-txs, but of course we
need to produce remote ones too, for signing.
Thus instead of using "remote" and "local" we use "other" and "self",
and indicate with a single "side" flag which we're generating (because
that changes how HTLCs are interpreted).
This also adds to the tests: generate the remote view of the commit_tx
and make sure it matches!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were using the remote per_commitment_point instead of the local
per_commitment_point to generate the remotekey for the local transaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's awkward to handle them differently. But this change means we
need to expose them to the generated code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to have a permutation map; this reintroduces a variant which
uses the htlc pointers directly.
We need this because we have to send the htlc-tx signatures in output
order as part of the protocol: without two-stage HTLCs we only needed
to wire them up in the unilateral spend case so we simply brute-forced
the ordering.
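
The idea can be sketched as below, in Python for brevity: sort the
outputs while carrying a parallel array of HTLC references along, so
the signatures can later be sent in output order. The function name and
data shapes are hypothetical, and the sort key (amount, then script) is
an assumption modelled on BIP 69 output ordering.

```python
def permute_outputs(outputs, htlcs):
    """Sort outputs and apply the same permutation to the htlc map.

    outputs: list of (amount_sat, script_bytes) tuples.
    htlcs:   parallel list; None marks a non-HTLC output.
    Returns new (outputs, htlcs) lists in final output order.
    """
    order = sorted(range(len(outputs)), key=lambda i: outputs[i])
    return [outputs[i] for i in order], [htlcs[i] for i in order]
```

After the call, `htlcs[n]` is the HTLC (if any) that produced output
`n`, so no brute-force matching is needed.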
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Moved the broadcast functionality to broadcast.[ch]. So far this
includes only the enqueuing side of broadcasts, the dequeuing and
actual push to the peer is daemon dependent. This also adds the
broadcast_state to the routing_state and the last broadcast index to
the peer for the legacy daemon.
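
A simplified model of the split, in Python for brevity: the shared
state only enqueues, and each peer keeps the index of the last message
it has been sent. All class, field and function names here are
illustrative and do not mirror the C structures exactly.

```python
class BroadcastState:
    """Enqueuing side: an append-only, indexed log of gossip messages."""

    def __init__(self):
        self.next_index = 0
        self.messages = {}  # index -> payload

    def queue(self, payload: bytes) -> None:
        self.messages[self.next_index] = payload
        self.next_index += 1


class Peer:
    def __init__(self):
        # Must be explicitly initialised: a garbage value here could
        # make the peer skip everything that is queued.
        self.broadcast_index = 0


def pending_for(state: BroadcastState, peer: Peer) -> list:
    """Dequeuing side: messages the peer has not seen yet, in order."""
    msgs = [state.messages[i] for i in sorted(state.messages)
            if i >= peer.broadcast_index]
    peer.broadcast_index = state.next_index
    return msgs
```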
This used to be part of `lightningd_state` which is being split up for
the various subdaemons. The main change is the addition of the `struct
routing_state` in `routing.h` and the addition of `rstate` in `struct
lightningd_state` for backwards compatibility.
We can't run them in parallel, but we can at least have 'make check'
run them all.
Developers should be running "make check-source && make check".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The problem with wire headers not being generated in time before stuff
depended on them turns out to be related to the inclusion order of
sub-makefiles. The inclusions must precede the use of
LIGHTNINGD_HEADERS since they append to that variable.
Now we hand peers off to the gossip daemon, to do the INIT handshake and
re-transmit/receive gossip. They may stay there forever if neither we nor
they want to open a channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a bit messy, since some status messages are accompanied by an FD:
in this case, the handler returns STATUS_NEED_FD and we read that then
re-call the handler.
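
The control flow can be sketched as follows, in Python for brevity.
Only the STATUS_NEED_FD name comes from this change; the dispatcher,
the handler signature and the example handler are hypothetical.

```python
STATUS_NEED_FD = object()  # sentinel: "call me again with an fd"


def dispatch(handler, msg, read_fd):
    """Call the handler; if it needs an fd, read one and call it again."""
    result = handler(msg, None)
    if result is STATUS_NEED_FD:
        result = handler(msg, read_fd())
    return result


def example_handler(msg, fd):
    # Hypothetical handler: only "peer_ready" is accompanied by an fd.
    if msg == "peer_ready" and fd is None:
        return STATUS_NEED_FD
    return (msg, fd)
```

Messages that carry no fd go through the handler once and never touch
`read_fd`.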
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These use the same infrastructure as the daemon/test blackbox tests,
so they're not currently wired into make check; use
"make lightningd-blackbox-tests".