This means we print out the correct path with --debugger, which
can be vital if there are multiple binaries (e.g. compiled vs. installed).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As demonstrated in the test at the end of this series, openingd dying
spontaneously causes the conn to be freed which causes the subd to be
destroyed, which fails the peer, which hits the db.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones. We can handle many callbacks internally now.
We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer. It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.
The free order is now different, and the test_reconnect_openingd was
overprecise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending a channeld message to onchaind, which was very confusing
because the message numbers overlap. We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.
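
A minimal sketch of the idea, not the actual wire definitions: the enum
names, the numeric ranges and the `ex_` helpers below are all hypothetical.
Each daemon gets its own range, so a validity check can reject a message
meant for another daemon.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message numbers: each daemon has its own range,
 * so no value is shared between daemons. */
enum ex_channel_msg_type {
	EX_CHANNEL_INIT = 1000,
	EX_CHANNEL_FUNDING_LOCKED = 1001,
};

enum ex_onchain_msg_type {
	EX_ONCHAIN_INIT = 5000,
	EX_ONCHAIN_SPENT = 5001,
};

/* Returns true only for message numbers onchaind understands. */
static bool ex_onchain_type_valid(int type)
{
	switch (type) {
	case EX_ONCHAIN_INIT:
	case EX_ONCHAIN_SPENT:
		return true;
	}
	return false;
}

/* Sending path: a message for the wrong daemon trips the assert
 * straight away instead of confusing the receiver later. */
static int ex_send_to_onchaind(int type)
{
	assert(ex_onchain_type_valid(type));
	return type;
}
```

With overlapping numbers, `ex_onchain_type_valid(EX_CHANNEL_INIT)` could not
tell the two daemons apart; with distinct ranges it simply returns false.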
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some fields were redundant, some are simply moved into 'struct lightningd'.
All routines are updated to take 'struct lightningd *ld' now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also, we split out the more sophisticated json_add helpers to avoid pulling
everything into lightning-cli, and unify the routines to print struct
short_channel_id (using ':' rather than '/' as the separator).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a terrible hack in gossip when a peer didn't exist. Formalize
a pattern where code+200 is a failure (with no fds passed), and use it
here.
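
Roughly, the pattern looks like the sketch below. The `ex_` names and the
`fds_on_success` parameter are hypothetical, chosen only to illustrate the
convention; only the +200 offset and "no fds on failure" come from this
change.

```c
#include <assert.h>
#include <stdbool.h>

/* A failure is signalled by the request code plus this offset. */
#define EX_FAIL_OFFSET 200

/* True if 'received' is the failure variant of 'request'. */
static bool ex_is_failure_of(int request, int received)
{
	return received == request + EX_FAIL_OFFSET;
}

/* How many fds the caller should wait for after this message:
 * a failure never carries any, whatever success would have sent. */
static int ex_fds_expected(int request, int received, int fds_on_success)
{
	return ex_is_failure_of(request, received) ? 0 : fds_on_success;
}
```

The caller can then check for failure first and skip the fd-receiving step
entirely, rather than special-casing a peer that does not exist.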
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matters in one case: channeld receiving a bad message is a
permanent failure, whereas losing a connection is transient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a file descriptor, so when we consume an entry, we move past it
(and everyone shares a file offset, so this works).
The file contains packet names prefixed by - (treat fd as closed when
we try to write this packet), + (write the packet then ensure the file
descriptor fails), or @ ("lose" the packet then ensure the file
descriptor fails).
The sync and async peer-write functions hook this in automatically.
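
As an illustration only, the three prefixes could be decoded like this. The
enum, the `ex_` helpers and the packet name `WIRE_EXAMPLE` are hypothetical
and not the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum ex_disconnect_action {
	EX_NO_ACTION,         /* unrecognised prefix */
	EX_FAIL_BEFORE_WRITE, /* '-': treat fd as closed, write nothing */
	EX_FAIL_AFTER_WRITE,  /* '+': write the packet, then fail the fd */
	EX_DROP_THEN_FAIL,    /* '@': "lose" the packet, then fail the fd */
};

/* Map the prefix character of an entry to an action. */
static enum ex_disconnect_action ex_action_for_prefix(char prefix)
{
	switch (prefix) {
	case '-': return EX_FAIL_BEFORE_WRITE;
	case '+': return EX_FAIL_AFTER_WRITE;
	case '@': return EX_DROP_THEN_FAIL;
	}
	return EX_NO_ACTION;
}

/* Packet name is whatever follows a valid prefix; NULL otherwise. */
static const char *ex_entry_name(const char *entry)
{
	if (ex_action_for_prefix(entry[0]) == EX_NO_ACTION)
		return NULL;
	return entry + 1;
}

/* Only '+' lets the packet's bytes actually reach the peer. */
static bool ex_packet_is_written(enum ex_disconnect_action action)
{
	return action == EX_FAIL_AFTER_WRITE;
}
```

A write hook would compare `ex_entry_name()` against the packet it is about
to send and, on a match, apply the action and advance past the entry.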
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'test-run-cryptomsg__fix_compilation.patch':
test/run-cryptomsg: fix compilation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a peer dies, and then we get a reply, that can cause access after free.
The usual way to handle this is to make the request a child of the peer,
but in fact we still want to catch (and discard) it, so it's a little
more complex internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The STDOUT fd being reused as communication sockets with other daemons
was causing some unexpected crashes if the sub-daemon wrote something,
e.g., using `log_*`. Not closing it should avoid that conflict.
Instead of indicating where to place the fd, you say how many: the
fd array gets passed into the callback.
This is also clearer for the users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
subd_req() needs to get the type before it calls subd_send_msg, because
if it's take() then msg_enqueue() may reallocate.
This also made me realize that subd_send_msg() should not try to dup,
since msg_enqueue() handles that itself.
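
The ordering problem can be shown with a self-contained sketch. Nothing here
is the real subd or msg_queue code: `struct ex_queue`, the `ex_` functions,
and the assumption that the type lives in the first two bytes (big-endian)
are all hypothetical stand-ins, with realloc() playing the part of a queue
that takes ownership and may move the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queue that takes ownership of a message. */
struct ex_queue {
	uint8_t *stored;
	size_t len;
};

/* Takes over 'msg'. realloc() may move it, so the caller's
 * pointer must be treated as dead once this returns. */
static void ex_enqueue_take(struct ex_queue *q, uint8_t *msg, size_t len)
{
	q->stored = realloc(msg, len + 16);
	if (!q->stored)
		abort();
	q->len = len;
}

/* Assumed layout: first two bytes are the big-endian type. */
static int ex_msg_type(const uint8_t *msg)
{
	return (msg[0] << 8) | msg[1];
}

static int ex_req(struct ex_queue *q, uint8_t *msg, size_t len)
{
	/* Read the type BEFORE handing the message over. */
	int type = ex_msg_type(msg);

	ex_enqueue_take(q, msg, len);
	/* 'msg' may be stale here; only 'type' is safe to use. */
	return type;
}

/* Builds a 4-byte message of type 0x0102 and sends it. */
static int ex_demo(void)
{
	struct ex_queue q = { NULL, 0 };
	uint8_t *msg = malloc(4);
	int type;

	if (!msg)
		abort();
	msg[0] = 0x01; msg[1] = 0x02; msg[2] = 0; msg[3] = 0;
	type = ex_req(&q, msg, 4);
	free(q.stored);
	return type;
}
```

Swapping the two statements in `ex_req()` would read through a pointer the
queue may already have moved, which is the bug being fixed.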
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have some duplication in handling queues, so this is an attempt at
deduplicating some of that work. `daemon_conn` now uses the
`msg_queue` and `channeld` was also migrated to `msg_queue`. At the
same time I made `msg_queue` create a copy of the messages, or take
over messages marked with `take()`. This should make cleaning up
messages easier.
This uses a single fd for both status and control.
To make this work, we enforce the convention that replies are the same
as requests + 100, and that their name ends in "_REPLY".
This also means that various daemons can simply exit when done; there's
no race between reading request and closing status fds.
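
A small sketch of the convention, using hypothetical `ex_` helpers and
made-up message names: with one fd carrying everything, the reader can tell
a reply to its outstanding request from a plain status message by number
alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Replies are numbered as the request plus this offset. */
#define EX_REPLY_OFFSET 100

/* True if 'received' is the reply to the outstanding 'request';
 * anything else read from the shared fd is a status message. */
static bool ex_is_reply_to(int request, int received)
{
	return received == request + EX_REPLY_OFFSET;
}

/* Checks the naming half of the convention: ends in "_REPLY". */
static bool ex_name_is_reply(const char *name)
{
	const char *suffix = "_REPLY";
	size_t nlen = strlen(name), slen = strlen(suffix);

	return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}
```

A generator or a unit test could use both checks together to make sure
every message named `..._REPLY` really sits 100 above its request.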
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>