Makes it easier to compare before/after failures. Ideally, we should
run under Travis both with this option and with the seed based on the
entire tmp path (which is still reproducible with determination, but
not fixed every run like this is).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones. We can handle many callbacks internally now.
We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently rely on a zero exit status. That's the only difference between
onchain finished handling and other per-peer daemons, so instead we should
have an explicit "done" message. This is both clearer, and allows us to
unify.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make the ip/port optional, so they should go at the end.
In addition, using ip:port is nicer, for gethostbyaddr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer. It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.
The free order is now different, and the test_reconnect_openingd was
overprecise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes the only case where the master currently has to write directly
to the peer: re-sending an error. We make gossipd do it, by adding
a new gossipctl_fail_peer message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon needs to pass it about (marshal/unmarshal)
but it won't need to actually use it after the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We pull them from the database on-demand, where we're storing them
anyway. No need to keep them in memory as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
No idea why we were iterating over the list of stubs and then passing
in the index instead of a pointer to the stub directly.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This wires in the loading of `struct htlc_stub`s on-demand when
starting `onchaind` so that we don't need to keep them in memory.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far we were tracking the status by including it either in the paid
or the unpaid list. This refactor makes the state explicit, which
matches the planned DB schema much better.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
While loading HTLCs from the database we might not yet have all the
incoming HTLCs loaded when loading a dependent htlc_out. So we defer
the wiring of the HTLCs until we are sure we have them loaded.
This is also the first step towards keeping that association only in
the database, since otherwise we cannot selectively load channels from
DB.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Especially when testing we might want to disable the automatic
reconnection logic in order not to masquerade bugs that disappear when
reconnecting.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Seems to go out to lunch on reorgs:
+136792.168286138 lightningd(9465):BROKEN: bitcoin-cli getchaintips exited 28: 'error code: -28
error message:
Rewinding blocks...
Closes: #286
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't hit this in testing, since we wait for startup already. Hacking
tests to avoid that, I tested this code by hand.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using pc after free in the pay_command_destroyed destructor, so
we just steal cmd onto pc so free order is the one we want.
[ Edit: expanded comment, split commit ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far only happens during normal shutdown, but it may happen in other
cases as well. We simply define a new destructor that unregisters the
`cmd` from the `jcon`.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These were fun to hunt down. The jcon and the conn are allocated off
of ld, so the free order is unspecified and if conn is freed before
conn then the finish_jcon destructor uses conn after free.
[ Edit: split commit, modified to use a destructor directly on jcon,
which is more robust than relying on it only being freed via conn --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
peer_fail_permanent() frees peer->owner, but for bad_peer() we're
being called by the sd->badpeercb(), which then goes on to
io_close(conn) which is a child of sd.
We need to detach the two for this case, so neither tries to free the
other.
This leads to a corner case when the subd exits after the peer is gone:
subd->peer is NULL, so we have to handle that too.
Fixes: #282
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a race where we start onchaind, but state is unchanged, so checks
like peer_control.c's:
peer_ready = (peer->owner && peer->state == CHANNELD_AWAITING_LOCKIN);
if (!peer_ready) {
log_unusual(peer->log,
"Funding tx confirmed, but peer state %s %s",
peer_state_name(peer->state),
peer->owner ? peer->owner->name : "unowned");
} else {
subd_send_msg(peer->owner,
take(towire_channel_funding_locked(peer,
peer->scid)));
}
Can send to the wrong daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending a channeld message to onchaind, which was v. confusing
due to overlap. We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We re-use the value for reasonable_depth given by the master, and we
tell it when our timeout transactions reach that depth.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we see an offered HTLC onchain, we need to use the preimage if we
know it. So we dump all the known HTLC preimages at startup, and send
new ones as we discover them.
This doesn't cover preimages we know because we're the final
recipient; that can happen if an HTLC hasn't been irrevocably
committed yet. We'll do that in a followup patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the HSM is slow it might happen that the timestamp has changed the
second time we come around, so we generate the timestamp externally
and pass it in so we're sure it won't change between calls.
Reported-by: Rusty Russell
Signed-off-by: Christian Decker <decker.christian@gmail.com>
lightningd can crash on shutdown if it's in the middle of getchaintips;
we free the conn, the finished callback is called (process_chaintips),
and it reports that it received an empty result.
The simplest fix is to set a flag in the struct bitcoind destructor,
and avoid the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Remove reference to old $(LIGHTNINGD_OLD_LIB_OBJS) var (in handshaked too).
2. Make check depend directly on unit tests, insteadof weird lightningd/tests
variable.
3. check-source-bolt and check-whitespace are automatic for $(ALL_TEST_PROGRAMS)
so we don't need them here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the step where we broadcast the transaction to the network and
a nice place to extract the change from the transaction.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
It's no longer used and we definitely do not want to run with an
outdated or future db, so we'll terminate if we can't upgrade or
the version is newer than what we understand.
Signed-off-by: Christian Decker <decker.christian@gmai.com>
So far we were always using the deadline in the announcements, that's
obviously not good, so this introduces the parameter as per spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't killing it. Eventually it would die, and peer_owner_finished()
would access subd->peer->owner, but that peer was freed already.
Closes: #261
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To reproduce the next bug, I had to ensure that one node keeps thinking it's
disconnected, then the other node reconnects, then the first node realizes
it's disconnected.
This code does that, adding a '0' dev-disconnect modifier. That means
we fork off a process which (due to pipebuf) will accept a little
data, but when the dev_disconnect file is truncated (a hacky, but
effective, signalling mechanism) will exit, as if the socket finally
realized it's not connected any more.
The python tests hang waiting for the daemon to terminate if you leave
the blackhole around; to give a clue as to what's happening in this
case I moved the log dump to before killing the daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, we unset the old subd->peer, then freed subd.
peer_owner_finished dereferenced subd->peer->owner, and boom:
test_disconnect_funder (__main__.LightningDTests) ... Fatal signal 11. Log dumped in crash.log
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2882
==2882== Invalid read of size 8
==2882== at 0x413F74: peer_owner_finished (peer_control.c:679)
==2882== by 0x41EA2C: destroy_subd (subd.c:381)
==2882== by 0x459700: notify (tal.c:240)
==2882== by 0x459BB1: del_tree (tal.c:400)
==2882== by 0x459FC0: tal_free (tal.c:509)
==2882== by 0x413796: peer_reconnected (peer_control.c:493)
==2882== by 0x413A6A: add_peer (peer_control.c:592)
==2882== by 0x40ED1F: handshake_succeeded (new_connection.c:186)
==2882== by 0x41E3DD: sd_msg_reply (subd.c:262)
==2882== by 0x41E6BB: sd_msg_read (subd.c:318)
==2882== by 0x41E4E6: read_fds (subd.c:283)
==2882== by 0x44DEB4: next_plan (io.c:59)
==2882== Address 0x838 is not stack'd, malloc'd or (recently) free'd
==2882==
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logic of dispatching the announcement_signatures message was
distributed over several places and daemons. This aims to simplify it
by moving it all into `channeld`, making peer_control only report
announcement depth to `channeld`, which then takes care of the
rest. We also do not reuse the funding_locked tx watcher since it is
easier to just fire off a new watcher with the specific purpose of
waiting for the announcement_depth.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
1. The code to skip over padding didn't take into account max.
2. It also didn't use symbolic names.
3. We are not supposed to fail on unknown addresses, just stop parsing.
4. We don't use the read_ip/write_ip code, so get rid of it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a negative timestamp as the flag for this, making the test simple.
This allows valgrind to detect that we're accessing them prematurely,
including across the wire on gossip_getchannels_entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jl777 reported a crash when we try to pay past reserve. Fix that (and
a whole class of related bugs) and add tests.
In test_lightning.py I had to make non-async path for sendpay() non-threaded
to get the exception passed through for testing.
Closes: #236
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You will want to 'make distclean' after this.
I also removed libsecp; we use the one in in libwally anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some fields were redundant, some are simply moved into 'struct lightningd'.
All routines updated to hand 'struct lightningd *ld' now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also, we split the more sophisticated json_add helpers to avoid pulling in
everything into lightning-cli, and unify the routines to print struct
short_channel_id (it's ':', not '/' too).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To avoid everything pulling in HTLCs stuff to the opening daemon, we
split the channel and commit_tx routines into initial_channel and
initial_commit_tx (no HTLC support) and move full HTLC supporting versions
into channeld.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Other places require the flags and states, but the structure is
only needed in channeld, and even then we can remove several fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was hoping to defer HTLC updates until we actually store HTLCs, but
we need to flush to DB whenever balances update as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We're very simple about it: if there's a reorganization, we restart. Otherwise
we tell it about everything.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's in the shachain, so storing it is completely redundant. We leave
it in for the moment so we can assert() that nothing has changed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When loading from DB the list of htlcs was not being initialized which
caused a segfault when the first commit came around, this fixes it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The peer->seed needs to be unique for each channel, since bitcoin
pubkeys and the shachain are generated from it. However we also need
to guarantee that the same seed is generated for a given channel every
time, e.g., upon a restart. The DB channel ID is guaranteed to be
unique, and will not change throughout the lifetime of a channel, so
we simply mix it in, instead of a separate increasing counter.
We also needed to make sure to store in the DB before deriving the
seed, in order to get an ID assigned by the DB.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the big one, and it's completely anticlimactic: it loads all
channels that have reached opening and are not marked as
closingd_complete into memory, that's it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was supposed to be a temporary solution anyway, and I had a
rather annoying mixup between peer_id and unique_id, the latter of
which is actually a connection identifier.
Add the channel to the peer on the two open paths (fundee and funder)
and store it into the database. Currently fails when opening a channel
to a known peer after loading from DB because we attempt to insert a
new peer with the same node_id. Will fix later.
As per lightning-rfc change 956e8809d9d1ee87e31b855923579b96943d5e63
"BOLT 7: add chain_hashes values to channel_update and channel_announcment"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This brings us up to 955e874acc535ab2c74c1cf0eab61896ea4224ff in
https://github.com/lightningnetwork/lightning-rfc
This doesn't actually change anything; the only actual change is held back
for the next commit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is required for onchaind: we want to watch all descendents by default,
as to do otherwise would be racy, which means we need to traverse the outputs
when a tx appears.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And store in peer->last_tx/peer->last_sig like all other places,
that way we broadcast it if we need to.
Note: the removal of tmpctx in funder_channel() is needed because we
use txs[0], which was allocated off tmpctx.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to check if we exit after sending a revoke_and_ack, otherwise
channeld ends up getting the closing_signed packet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_strdup() doesn't set tal_count(), so we end up sending an ERROR
packet with an empty message. Wrap this and get it right.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There was a race condition that would cause an assertion to segfault
if a call to release_peer was interleaved with a fail_peer. The
release_peer was making the peer non-local, which was then causing the
assertion in fail_peer to fail. Now we just have 3 cases: not found,
local, and non-local.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
After quite some back and forth we seem to finally agree on the bit
3 (mask 0x08) to signal optional initial_routing_sync.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was causing failures on testnet where confirmations are not
immediate.
Reported-by: Fabrice Drouin @sstone
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We support a number of features already, so failing connections
whenever we see an even bit set is not a good idea. This turned out to
kill our connections to eclair.
Also, the spec says that the LSB / bit 0 is to be counted as index 0, and
therefore even. So we need to check the lower of each 2-bit-tuple not
the higher one.
We were using the bitcoin genesis blockhash for all networks, which is
not correct, and would result in the open being aborted when talking
to other implementations.
Reported-by: @sstone and @pm47
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't registering reconnecting peers for broadcasts. Just
starting a timer is enough. Also added an integration test to check
that the gossip sync is being resumed.
test_closing_negotiation_reconnect (__main__.LightningDTests) ... peer state CLOSINGD_COMPLETE should be CLOSINGD_SIGEXCHANGE
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a transitional state, while we're waiting to see the
closing tx onchain (which is To Be Implemented).
The simplest way to do re-transmission is to re-use closingd, and just
disallow any updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I made the mistake of thinking it was a [NUM_SIDES] array, but
it's actually our balance, and it's in millisatoshi. Rename
for clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As tracked down by Christian; by setting up the master conn first,
we make the master fd async. This means that the synchronous read
(in init_channel) can fail with -EAGAIN, and indeed, Christian
saw this when not running under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually don't need to transition if we're reconnecting, and logic
to go to CHANNELD_NORMAL was wrong: we checked that we'd seen funding tx
locked, but not that we'd received a msg from the remote peer.
We need to fix the tests now we no longer double-transition, too.
Fixes: #188
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Important: a non-standard one can make the closing tx not propagate.
Drive-by cut&paste message fix, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what it actually is, and makes it clearer when we refer to the
spec. It's the commitment we're currently updating, which is the next
commitment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently have the problem that the master can send new HTLCs before
we've processed the incoming reestablish message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patch includes wire/peer_wire.h and causes a compile error
as lightningd/gossip_control.c defined its own gossip_msg function.
New names are clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen even without a protocol violation, if the incoming
update_add_htlc crosses over our outgoing shutdown.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep the scriptpubkey to send until after a commitment_signed (or,
in the corner case, if there's no pending commitment). When we
receive a shutdown from the peer, we pass it up to the master.
It's up to the master not to add any more HTLCs, which works because
we move from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need to keep this around any more: by handing it to
subdaemons we ensure we'll close it if the peer disconnects, and we
also add code to get a new one on reconnection.
Because getting a gossip_fd is async, we re-check the peer state after
it gets back. This is kind of annoying: perhaps if we were to hand
the reconnected peer through gossipd (with a flag to immediately
return it) we could get the gossip fd that way and unify the paths?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At the moment, master simply keeps the gossip fd open when peer
disconnects. That's inefficient, and wrong anyway (it may want a
complete new sync, or may not, but we'll currently send all the
messages including stale ones).
This interface will be required for restart anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>