Rather than using a flag in reaching/peer; we make it self-contained
as the next patch puts it straight into a timer callback.
Also remove unused 'succeeded' field from struct peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And on channel_fail_permanent and closing (the two places we drop to
chain), we tell gossipd it's no longer important.
Fixes: #1316
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These don't have a maximum number of reconnect attempts, and ensure
that we try to reconnect when the peer dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to remove automatic retrying of connect, and that uncovered
that we actually print out our "Server started" message before we create
the listening socket.
Move the init higher (outside the db transaction) and make it a
request/response, the loop until it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we only remember the actions that added channels then we'd restore them when
re-reading the gossip_store, so put a tombstone in there to remember to delete
it. These will be cleared upon re-writing the store since the announcements wont
be written anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This now works because we no longer call out to masterd or bitcoind to verify
the channels. It's also rather quick and silent so we can just process all
stored messages until we're done.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Messages from peers and messages from the gossip_store now have completely
different entrypoints, so we don't need to trace their origin around the message
handling code any longer.
This stores and reads the channel_announcements in the wrapping message which
allows us to store associated data with the raw channel_announcements.
The gossip_store applies channel_announcements directly but it also returns it,
and it gets discarded as a duplicate. In the next commit we'll have gossip_store
apply all changes, bypassing verification, so the duplication is only temporary.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we may want to extend the on-disk format by adding custom information we
may as well just go the extra mile and reuse the serialization primitives we
already have.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
When we read from the gossip_store we set store=false so that we don't duplicate
messages in the store.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Ee will be replaying gossip messages from the gossip_store soon. This means that
not all messages originate from a peer, so we move the queuing of error messages
up into the peer message handler.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now it just returns true if it queued something. This allows it
to queue multiple packets, and lets it share code paths with other code
in future patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently keep two copies; one in the broadcast structure to send
in order, and one in the routing information. Since we already keep
the broadcast index in the routing information, use that.
Conveniently, a zero index is the same as the old NULL test.
Rename struct node's announcement_idx to node_announce_msgidx to
make it match the other users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. make queue_peer_msg() use both if branches, as both equally likely.
2. Remove redundant *scid = NULL in handle_channel_announcement.
3. Log failing pending channel_updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per BOLT #7.
We don't do this for channel_update which are queued because the
channel_announcement is pending though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We already have 'struct node', so rename 'struct routing_channel' to
'struct chan', and 'struct node_connection' to 'struct half_chan'.
Other minor changes:
1. rstate->channels -> rstate->chanmap.
2. 'connections' -> 'half'.
3. connection_to -> half_chan_to
4. connection_from -> half_chan_from
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Failure and pruning were the two places where a node_connection could
be freed; now they both deal with entire channels, we can remove the
NULL checks, and the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is twice the 'update_channel_interval' we get handed.
We delete the non-existent channel_add_connection and delete_connection
declarations from the header too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently give them a free pass. The simplest fix is to give them
an old timestamp on initialization.
We still skip unannounced channels, on the assumption that they're
ours. And we set the last_update_timestamp to -1 when we convert to
gossip_getchannels_entry to indicate no update.
This breaks the DEVELOPER=1 pruning test, since we hardcode the 1
week timeout. That's fixed in the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make new_routing_channel() populate both connections
(active=false), so local_add_channel becomes simpler. We also
suppress listchannels output of active=false unannounced channels, to
avoid breaking tests (also, these are unusable, so it makes sense to
omit them)
It also seems the logic in add_channel_direction is legacy: a
channel_announce cannot replace the scid (that would be a different
channel), we don't allow duplicate announcements, and the announcement
is never NULL.
And since we disallow repeated channel_announce already, I believe
'forward' is always true, greatly simplifying the logic in
handle_pending_cannouncement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes 'routing_channel' the primary object in the system; it can have
one or two 'node_connection's attached, and points to two nodes.
The nodes are freed when no more routing_channel refer to them. The
routing_channel are freed when they contain no more 'node_connection'.
This fixes#1072 which I surmise was caused by a dangling
routing_channel after pruning.
Each node contains a single array of 'routing_channel's, not one for
each direction. The 'routing_channel' itself orders nodes in key
order (conveniently the index is equal to the direction flag we use),
and 'node_connection' with source in the same order.
There are helpers to assist with common questions like "which
'node_connection' leads out of this node?".
There are now two ways to find a channel:
1. Direct scid lookup via rstate->channels map.
2. Node key lookup, followed by channel traversal.
Several FIXMEs are inserted for where we can now do things more optimally.
Fixes: #1072
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make it a first-class citizen, and pending routing_channel
are not real ones (in particular, we don't want to create pending nodes).
We had a linked list called rstate->pending_cannouncement which we didn't
actually use, so put that back for now and add a FIXME to use a faster
data structure.
We need to check that list now in handle_channel_update, but we never
have a real routing_channel and a pending, unless the routing_channel
isn't public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This hook is called when the queue is empty; we should only send gossip
according to the gossip timer. We're currently dribbling it out after
every message, in violation of the spec.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we have them, let's use them. I missed one case deliberately, since
that causes merge conflicts when I replace it in a following patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we make destroy_node() remove itself from the map, then we simply
need to free it.
We can batch the frees (as we need) simply by reparenting all the pruned
nodes onto a single temporary parent, then freeing it, relying on tal's
internal datastructures.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
DEBUG:root:lightningd(16333): 2018-02-08T02:12:21.158Z lightningd(8262): lightning_openingd(0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199): Failed hdr decrypt with rn=2
We only hand off the peer if we've not started writing, but that was
insufficient: we increment the sn twice on encrypting packet, so there's
a window before we've actually started writing where this is now
wrong.
The simplest fix is only to hand off from master when we've just written,
and have the read-packet path simply wake the write-packet path.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We get intermittant failure: WIRE_UNKNOWN_NEXT_PEER (First peer not ready)
because CHANNELD_NORMAL and actually telling gossipd that the channel
is available are distinct things: we need both.
(For test_closing_different_fees, we were testing CHANNELD_NORMAL on
the peer, not on l1, too).
But we may also directly send the announcement sigs if the height is
sufficient, so the simplest is to unify the messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to make sure all the updates are known to gossip. Since
one is the local update, we change that message to look the same.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is done it two parts, since we have to ask the main daemon to do
the lookup for us.
If this becomes a bottleneck, we can have a separate daemon, or even
an RPC pipe to bitcoind ourselves.
Fixes: #403
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our handling of SIGPIPE was incoherent and inconsistent, and we had much
cut & paste between the daemons. They should *ALL* ignore SIGPIPE, and
much of the rest of the boilerplate can be shared, so should be.
Reported-by: @ZmnSCPxj
Fixes: #528
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's just a sha256_double, but importantly when we convert it to a
string (in type_to_string, which is used in logging) we use
bitcoin_blkid_to_hex() so it's reversed as people expect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds the channel from us to the remote node and activates it with
our local parameters.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Couldn't find a good place to put these messages, we probably want to
do the same capability based request routing that we did for the HSM,
but for now this just defines the message in the master messages file.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
When gossipd sends a message, have a gossip_index. When it gets back a
peer, the current gossip_index is included, so it can know exactly where
it's up to.
Most of this is mechanical plumbing through openingd, channeld and closingd,
even though openingd and closingd don't (currently) read gossip, so their
gossip_index will be unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should also go through and use consistent nomenclature on functions which
are used with a local peer ("lpeer_xxx"?) and those with a remote peer
("rpeer_xxx"?) but this is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will later be used to determine whether or not we should announce
ourselves as a node.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In future it will have TOR support, so the name will be awkward.
We collect the to/fromwire functions in common/wireaddr.c, and the
parsing functions in lightningd/netaddress.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to derive this from the fd when they connect in, but we already
know it if we're connecting out.
We want this so we can tell (in next few patches) master the peer's address.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
954a3990fa had two errors:
1) We created the handoff message *before* we sent the final packet, meaning
that the cryptostate was out-of-sync.
2) We called io_wait() on the output side of a duplex connection: it has
to be io_wait_out().
This time, stress testing for 2 hours revealed no more problems.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, it was a gossip message half-sent, when we asked the peer
to be released. Fix the problem in general by making send_peer_with_fds()
wait until after the next packet.
test_routing_gossip/lightning-4/log:
b'lightning_openingd(8738): TRACE: First per_commit_point = 02e2ff759ed70c71f154695eade1983664a72546ebc552861f844bff5ea5b933bf'
b'lightning_openingd(8738): TRACE: Failed hdr decrypt with rn=11'
b'lightning_openingd(8738): STATUS_FAIL_PEER_IO: Reading accept_channel: Success'
test_routing_gossip/lightning-5/log:
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightningd(8308): Failed to get netaddr for outgoing: Transport endpoint is not connected'
The problem occurs here on release, but could be on any place where we hand
a peer over when using ccan/io. Note the other case (channel.c).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes the only case where the master currently has to write directly
to the peer: re-sending an error. We make gossipd do it, by adding
a new gossipctl_fail_peer message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>