All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should also go through and use consistent nomenclature on functions which
are used with a local peer ("lpeer_xxx"?) and those with a remote peer
("rpeer_xxx"?) but this is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will later be used to determine whether or not we should announce
ourselves as a node.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Test objects must be added to $(ALL_OBJS) so they correctly depend on
CCAN headers etc.
Also, each test in a subdir must depend on headers and src in the parent
directory, as it will often #include them directly.
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And nail "make check-source" to that specific version (which is a commit id,
not a branch name, so needs a different syntax for git).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In future it will have TOR support, so the name will be awkward.
We collect the to/fromwire functions in common/wireaddr.c, and the
parsing functions in lightningd/netaddress.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to derive this from the fd when they connect in, but we already
know it if we're connecting out.
We want this so we can tell (in next few patches) master the peer's address.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
954a3990fa had two errors:
1) We created the handoff message *before* we sent the final packet, meaning
that the cryptostate was out-of-sync.
2) We called io_wait() on the output side of a duplex connection: it has
to be io_wait_out().
This time, stress testing for 2 hours revealed no more problems.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, it was a gossip message half-sent, when we asked the peer
to be released. Fix the problem in general by making send_peer_with_fds()
wait until after the next packet.
test_routing_gossip/lightning-4/log:
b'lightning_openingd(8738): TRACE: First per_commit_point = 02e2ff759ed70c71f154695eade1983664a72546ebc552861f844bff5ea5b933bf'
b'lightning_openingd(8738): TRACE: Failed hdr decrypt with rn=11'
b'lightning_openingd(8738): STATUS_FAIL_PEER_IO: Reading accept_channel: Success'
test_routing_gossip/lightning-5/log:
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightningd(8308): Failed to get netaddr for outgoing: Transport endpoint is not connected'
The problem occurs here on release, but could be on any place where we hand
a peer over when using ccan/io. Note the other case (channel.c).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It makes it impossible to embed an ipaddr in another structure, since we
always try to skip over any zeroes, which may swallow a following field.
Do the skip specially for the case where we're parsing routing messages:
we never use padding for our own internal messages anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes the only case where the master currently has to write directly
to the peer: re-sending an error. We make gossipd do it, by adding
a new gossipctl_fail_peer message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon needs to pass it about (marshal/unmarshal)
but it won't need to actually use it after the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending a channeld message to onchaind, which was v. confusing
due to overlap. We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. The code to skip over padding didn't take into account max.
2. It also didn't use symbolic names.
3. We are not supposed to fail on unknown addresses, just stop parsing.
4. We don't use the read_ip/write_ip code, so get rid of it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I missed these when I removed the legacy daemon. We also remove the
min_blocks field which was always 0.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a negative timestamp as the flag for this, making the test simple.
This allows valgrind to detect that we're accessing them prematurely,
including across the wire on gossip_getchannels_entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The test is could actually go each way, since for 1000000 the fee is
the same either way.
Increase to 300000, and add an extra test when the alternate path
is disabled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a routing problem, and wrote a simple unit test which passed. So
I wrote one which copied the failure case (and importantly, had a non-1
fee factor), which triggerd it.
In that real example, we underflowed which resulted in us not finding
a route. Simply don't consider routes which are infinite.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we initialize last_timestamp to 0, we ignore any initial update
with this timestamp. Don't compare it if we don't already have an
update, and don't initialize it, so valgrind can tell us if we use
it accidentally.
b'lightning_gossipd(3368): TRACE: Received channel_update for channel 6892:2:1(0)'
b'lightning_gossipd(3368): TRACE: Ignoring outdated update.'
b'lightning_gossipd(3368): TRACE: Received channel_update for channel 6893:2:1(1)'
b'lightning_gossipd(3368): TRACE: Channel 6893:2:1(1) was updated.'
The same logic applies to node_updates, so we do the same there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>