Commit Graph

76 Commits

Author SHA1 Message Date
Rusty Russell
3414b992a1 lightningd: don't dump core on subdaemon failure.
That tends to dump core over the top of the subdaemon; just exit non-zero.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-05 02:03:58 +00:00
Rusty Russell
28c3706f87 hsmd: fix missing status messages.
I crashed the HSMD, and it gave no output at all.  That's because we
were only reading the status fd when we were waiting for a reply.

Fix this by using a separate request fd and status fd, which also means
that hsm_sync_read() is no longer required.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-17 12:32:00 +02:00
Rusty Russell
8739b4cbe8 lighningd: Remove --debug-subdaemon-io.
We can use SIGUSR1, even in non-developer builds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-05-20 02:32:42 +00:00
Rusty Russell
1e282ecb7a subd: record which ones connect to a peer.
This comes in useful for the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-04-26 05:47:57 +00:00
Rusty Russell
ab9d9ef3b8 gossipd: drain fd instead of passing around gossip index.
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).

In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).

This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.

This turns out to be quite simple, and is probably how I should have
done it originally :(

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-04-26 05:47:57 +00:00
ZmnSCPxj
74f3662a3b lightningd/subd.h: Add missing wire/wire.h.
If not included, a source file containing only
`#include<lightningd/subd.h>` will file compilation.
2018-03-26 01:09:59 +00:00
practicalswift
e56eee50c8 Make sure we never pass a negative value to dup2(...) 2018-03-19 09:25:39 +00:00
Rusty Russell
0a6e3d1e13 utils: remove tal_tmpctx altogether, use global.
In particular, we now only free tmpctx at the end of main().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-03-16 00:16:10 +00:00
Corné Plooy
b857b2e843 Add assertions in various places to ensure tal_fmt doesn't receive NULL as argument for strings. 2018-03-06 19:26:21 +01:00
Rusty Russell
cca0a5412e subd: clear transient billboard on start and shutdown.
Use NULL on the callback to mean "clear the slot", and call it.

We have do this in two places: the old daemon might die, or the new
daemon might start first.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-23 18:02:00 +01:00
Rusty Russell
26b004e5af subd: handle status_peer_billboard messages from subdaemons.
We use a callback which updates the appropriate slot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-23 18:02:00 +01:00
practicalswift
91a9c2923f Mark intentionally unused parameters as such (with "UNUSED") 2018-02-22 01:09:12 +00:00
Rusty Russell
e92b710406 tools/generate-wire.py: remove length argument from fromwire_ routines.
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-20 22:36:21 +01:00
Rusty Russell
eca55cee3c subd: handle stdin being closed (eg. --daemon).
We need to do a more complex dance if stdin was important.

Fixes: #1016
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-20 21:42:53 +01:00
Rusty Russell
02d469b3d4 peer_failed: hand fds back to master when we fail.
master now hands it back to gossipd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell
f76ff90485 status: split off error messages into a new 'peer_status' type.
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers.  The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.

We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction. 

We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell
d2f691b288 subd: make functions more generic, don't assume 'struct channel'.
This means the caller needs to supply an explicit log to base the
subd log on, and also a callback for error handling.

The callback is kind of ugly, but it gets reworked towards the end
of this series.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell
55d962046b Rename (almost) all destructors to destroy_<type>.
We usually did this, but sometimes they were named after what they did,
rather than what they cleaned up.

There are still a few exceptions:
1. I didn't bother creating destroy_xxx wrappers for htable routines
   which already existed.
2. Sometimes destructors really are used for side-effects (eg. to simply
   mark that something was freed): these are clearer with boutique names.
3. Generally destructors are static, but they don't need to be: in some
   cases we attach a destructor then remove it later, or only attach
   to *some* cases.  These are best with qualifiers in the destroy_<type>
   name.

Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-14 11:31:58 +01:00
Rusty Russell
409fef582d subd: keep pointer to channel, not peer.
This rolls through many other functions, making them take channel not peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-14 11:31:58 +01:00
Rusty Russell
b7680412e3 lightningd: rename peer_fail functions to channel_fail.
And move them into channel.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-14 11:31:58 +01:00
Rusty Russell
32411de90e lightningd: split struct peer into struct peer and struct channel.
Much like the database; peer contains id, address, channel contains
per-channel information.  Where we create a channel, we always create
the peer too.

For the moment, peer->log and channel->log coexist side-by-side, to
reduce some of the churn.

Note that this changes the API to dev-forget-channel: if we have more
than one channel, we insist they specify the short-channel-id.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-14 11:31:58 +01:00
Rusty Russell
cc9ca82821 status: separate types for peer failure vs "impossible" failures.
Ideally we'd rename status_failed() to status_fatal(), but that's
too much churn for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-08 19:07:12 +01:00
Rusty Russell
fd498be7ca status: generate messages rather than marshal/unmarshal manually.
Now we have wirestring, this is much more natural.  And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-08 19:07:12 +01:00
Rusty Russell
c01f3267d5 common: only log io if they set --debug-subdaemon-io=<daemon> or with SIGUSR1.
Otherwise we just log the type of msg.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-07 00:46:49 +00:00
Rusty Russell
84bf60f934 status: add multiple levels of logging.
status_trace maps to status_debug.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-07 00:46:49 +00:00
Rusty Russell
57b423625b subd: use peer log for messages (if any).
This makes much more sense when you ask for a specific peer's log.
Also, we put the peerid rather than pid ().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-01-30 19:47:27 +00:00
Carl Dong
8da65854f0 build: Add needed UNIX standard includes. 2018-01-23 16:10:19 +01:00
practicalswift
e91a8dff12 Change log level for some common debug messages from "info" to "debug" 2018-01-16 03:20:27 +00:00
Rusty Russell
c66df31674 subd: pass absolute path as argv[0].
This means we print out the correct path with --debugger, which
can be vital if there are multiple binaries (eg. compiled vs installed).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-01-15 19:26:08 +00:00
practicalswift
a900551815 Use tal_hex(...) instead of tal_hexstr(...) 2018-01-12 00:55:46 +00:00
practicalswift
4bdd2452f2 Make sure fsync, connect and close are never accidentally passed negative arguments 2018-01-09 14:50:50 +01:00
practicalswift
dcb4039a96 Check lseek(...) return value 2018-01-09 13:52:12 +01:00
Rusty Russell
ba22484901 lightningd: simplify permanent failure.
Turns out everyone wanted a formatted string anyway.

Inspired-by: practicalswift
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-01-03 19:56:28 +00:00
Rusty Russell
b83ac58a98 subd: if a required daemon exits, wait instead of killing it.
Otherwise we always say it died because we killed it, so we don't get
the exit status.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-12-17 02:44:20 +00:00
Rusty Russell
6b232de7b1 openingd: return to master for more gossip when negotiation fails.
We can open other channels, if we want.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-12-17 02:44:20 +00:00
Rusty Russell
899bf3fde9 subd: add transaction to subd exit corner case.
As demonstrated in the test at the end of this series, openingd dying
spontaneously causes the conn to be freed which causes the subd to be
destroyed, which fails the peer, which hits the db.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-12-17 02:44:20 +00:00
Rusty Russell
3a596d6dda subd: wrap all message callbacks in a transaction.
Including destructors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-11-06 10:24:34 +01:00
Rusty Russell
3c6eec87e3 Add DEVELOPER flag, set by default.
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out.  This means we put #ifdefs in callers
in some places, but at least it's explicit.

We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.

See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-26 12:53:09 +02:00
Rusty Russell
0b953b86fe subd: automatically detect if callback frees subd.
This involves a tricky callback internally, but far less error-prone.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
5a256c724a subd: simplify and cleanup lifetime handling.
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones.  We can handle many callbacks internally now.

We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
a117d595a4 subd: allow callbacks to free sd.
We'll need this for the next patch; we'll be freeing the old subd whenever
peer->owner changes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
f83ee6d5ea dev_disconnect: don't permfail more than once.
The coming tests trigger this latent bug under travis.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
871d0b1d74 lightningd: simplify peer destruction.
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer.  It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.

The free order is now different, and the test_reconnect_openingd was
overprecise.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
61786b9c90 subd: don't leak fds if we fail to create subdaemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-10-20 18:31:32 +02:00
Rusty Russell
b7bb0be944 subd: remove context arg, as we're always owned by lightningd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-28 15:37:43 +02:00
Rusty Russell
f082c7b80e lightningd: add FIXMEs for future work.
Suggested-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-28 13:07:05 +09:30
Rusty Russell
72b215f6fe Make all internal message numbers unique.
We were sending a channeld message to onchaind, which was v. confusing
due to overlap.  We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-28 13:07:05 +09:30
Rusty Russell
ec63c0d10b lightningd: give option to crash if a subdaemon fails.
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-12 23:00:53 +02:00
Rusty Russell
ef28b6112c status: use common status codes for all the failures.
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.

It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-12 23:00:53 +02:00
Rusty Russell
153c622157 lightningd: remove lightningd_state.
Some fields were redundant, some are simply moved into 'struct lightningd'.
All routines updated to hand 'struct lightningd *ld' now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-08-29 17:54:14 +02:00