Store the two we care about as booleans. Once connectd is complete we won't
even have the feature bitmaps for peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't waiting for gossipd to actually process the
dev_set_max_scids_encode_size message, so under Travis it sometimes
split the reply before processing that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that we mark both directions of the channel disabled immediately,
it's just the broadcast of the update which is delayed, just like the
ones generated when channeld tells us to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We disable the channel every time the peer disconnects; if it reconnects
we get two updates.
The simplest solution: delay all updates by 15 seconds. Replace any
pending delayed update. If update is redundant after 15 seconds,
discard.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't do anything for us now, since we actually tend to produce
DISABLE/ENABLE update pairs. But the infrastructure is useful for the
next patch.
We also add more details to the trace message in the core update code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
json_listpeers returns an array of peers, and an array of nodes: the latter
is a subset of the former, and is used for printing alias/color information.
This changes it so there is a 1:1 correspondance between the peer information
and nodes, meaning no more O(n^2) search.
If there is no node_announce for a peer, we use a negative timestamp
(already used to indicate that the rest of the gossip_getnodes_entry
is not valid).
Other fixes:
1. Use get_node instead of iterating through the node map.
2. A node without addresses is perfectly valid: we have to use the timestamp
to see if the alias/color are set. Previously we wouldn't print that
if it didn't also advertize an address.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wrap it in 'struct pubkey' for typesafety and consistency, and the
next patch takes advantage of that when we move to pubkey_eq.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only --addr implies announce-if-public: --bind-addr does not.
It's also possible to have --bind-addr to an automatic Tor address:
you'd have to dig the onion address out of the logs or getinfo to use
it, but it's possible.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a best effort attempt to skip connection attempts if we detect a broken
ISP resolver. A broken ISP resolver is a resolver that will replace NXDOMAIN
replies with a dummy response. This is best effort in that it'll only detect a
single fixed dummy reply, it'll check only on startup, and will not detect if we
switched networks. It should be good enough for most cases, and in the worst
case it will result in a connection attempt that does not complete.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Glenn Willen <@gwillen>
Cut & paste means we sometimes sent NULL:
```
2018-06-15T00:13:51.908Z lightningd(23653): lightning_closingd-03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f chan #436: Gossipd gave us bad send_gossip message 0bc80000
```
Fixes: #1581
Reported-by: @Xian001
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, local and remote are *both* NULL; so if someone tries to
send a packet with take(), we need to free it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I think this is what is causing #1536: getting disconnected causes gossipd to
attempt to reach the peer again, unconditionally setting the flag to tell the
master. At the same time the master also issues a reaching command (which is
allowed since it is its first), but then it clashes on the already set
flag. Setting this flag only when the master actually needs to be told should
fix this.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
A failed compaction shouldn't be deadly, but we should also not attempt to do
one on every gossip message after the first one fails.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
`gossip_store_add` is the entry point for messages from the network, so it
should do the bookkeeping and disable on failures. `gossip_store_append` is the
shared function that wraps messages and writes it to the given file. This is
shared between the from network path and the compaction path, so we don't
directly use the `gossip_store` instance, but `fd`s.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We write both when coming from outside, as well as when compacting, so we
extract the write functionality to use it in both cases.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This makes the exposed interface much smaller, cleaner and will allow us to just
replay gossip messages from the broadcast.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Two cases:
1. Node no longer has any public channels: remove node_announcement.
2. Node's node_announcement now preceeds all the channel_announcements:
move node_announcement to the end.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets detect if a node announce preceeds a channel announce once we
delete the node announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We *accept* a node_announce if we have a channel_announce, but we
can't queue it until we queue the channel_announce, which we only do
once we have recieved a channel_update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we currently only (ab)use it to send everything, we need a way to
generate boutique queries for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a function called 'wake_pkt_out' which is really 'start
gossiping', so rename it to 'wake_gossip_out'.
In addition, it's fired both on a timer, and in response to our first
gossip_timestamp_filter, which leads to very confusing (though,
technically, not incorrect) behavior.
Keep a single timer at all times, which now doubles as the flag to
indicating we're syncing right now. Set it once we're done syncing
gossip.
Technically this means we got from once-every-60-seconds to
quiet-for-60-seconds-between-gossip, but that's OK.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And initialize filter (to "never") when we negotiated LOCAL_GOSSIP_QUERIES,
and send initial filter message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is kind of orthogonal to the other changes, but makes sense: if we
would instantly or never prune the message, don't accept it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the same system as for gossip: we trickle out replies when we're
otherwise idle.
As we trickle out replies to query_short_channel_ids, we remember the
pubkeys of nodes we mention. At the end, we sort and uniquify, and
then send any node_announcements we have for those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the same system as for gossip: we trickle out replies when we're
otherwise idle.
This is minimal infrastructure: we don't actually process the
query_short_channel_ids message yet, nor do we append node
announcements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In general, we need to only publish node announcements after
publishing channel announcements, though we can accept node
announcements as soon as we see channel announcements. So we keep a
flag for those node_announcement which haven't been broadcast yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
handle_pending_cannouncement might not actually add the announcment,
as it could be waiting for a channel_update. We need to wait for
the actual announcement before considering announcing our node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We generate new ones anyway; removing this code changes fixes coming
up which now only need to change one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't have any connection yet, so how could they be active? Disable both
sides to avoid trying to route through them or telling others to use them as
`contact_points` in invoices.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We're telling gossipd about disconnections anyway, so let's just use that signal
to disable both sides of the channel.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was failing some of our integration tests, i.e., the ones closing a channel
and not waiting for sigexchange. The remote node would often not be quick enough
to send us its disabling channel_update, and hence we'd still remember the
incoming direction. That could then be sent out as part of an invoice, and fail
subsequently. So just set both directions to be disabled and let the onchain
spend clean up once it happens.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This resolves the problem where both channeld and gossipd can generate
updates, and they can have the same timestamp. gossipd is always able
to generate them, so can ensure timestamp moves forward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We erroneously create updates with the same timestamps when tests run
quickly, and the second one is ignored.
We've already noted that this should be fixed: gossipd should generate
all the updates, as it already has to do the case where channeld
crashed, for example. But that's a bigger change.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker points out that in test_forward, where we manually create a route,
we get an error back which contains an update for an unknown channel.
We should still note this, but it's not an error for testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is something which generally shouldn't happen, but we didn't
notice it previously.
We ignore this warning in the case where a channel was deleted: this
happens because one side can send an update while the other notices
that the channel is closed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note: this will break the gossip_store if they have current channels,
but it will fail to parse and be discarded.
Have local_add_channel do just that: the update is logically separate
and can be sent separately.
This removes the ugly 'bool add_to_store' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tor wasn't actually working for me to connect to anything, but it worked
for 'ssh -D' testing.
Note that the resulting 'netaddr' is a bit weird, but I guess it's honest.
$ ./cli/lightning-cli connect 021f2cbffc4045ca2d70678ecf8ed75e488290874c9da38074f6d378248337062b
{
"id": "021f2cbffc4045ca2d70678ecf8ed75e488290874c9da38074f6d378248337062b"
}
$ ./cli/lightning-cli listpeers
{
"peers": [
{
"state": "GOSSIPING",
"id": "021f2cbffc4045ca2d70678ecf8ed75e488290874c9da38074f6d378248337062b",
"netaddr": [
"ln1qg0je0lugpzu5ttsv78vlrkhteyg9yy8fjw68qr57mfhsfyrxurzkq522ah.lseed.bitcoinstats.com:9735"
],
"connected": true,
"owner": "lightning_gossipd"
}
]
}
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Good for debugging (you have to send SIGUSR1 to lightning_gossipd to turn
it on though, and --log-level=io on the lightningd cmdline to have it
output IO messages by default).
I also noticed that io_tor_connect_after_req_host() does a useless
test on reach->buffer[0] after it's *written*: remove it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a wireaddr_internal directly (which is what we want).
Also, don't hardcode 9735, use DEFAULT_PORT internally in
seed_resolve_addr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously it converted the wireaddr to a string internally: to support
unresolved names we need that done externally.
We actually tell the SOCKS5 proxy to do a domain lookup already, even
though we give use IP/IPv6 address, so this change is sufficient to
support connect-by-name.
Note replacement of assert() with an explicit case statement, which
has the benefit that the compiler complains when we add new
ADDR_INTERNAL types.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is useful for the next patch, where we want to hand the unresolved
name through to the proxy.
This also addresses @Saibato's worry that we still called getaddrinfo()
(with the AI_NUMERICHOST option) even if we didn't want a lookup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not all of them, but it's really about using the SOCKS proxy rather than
really using Tor at this level.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assert() that it's set by one of the branches (it should be!) but
if we don't hit one it's uninitialized, not NULL.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If we have a channel_announcement, the channel is public, otherwise
it's not. Not all channels are public, as they can be local: those
have a NULL channel_announcement.
2. If we don't have a channel_update, we know nothing about that half
of the channel, and no other fields are valid.
3. We can tell if a half channel is disabled by the flags field directly.
Note that we never send halfchannels without an update over
gossip_getchannels_reply so that marshalling/unmarshalling can be
vastly simplified.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the update/announce messages own the element in the broadcast map
not the other way around.
Then we keep a pointer to the message, and when we free it
(eg. channel closed, update replaces it), it gets freed from the
broadcast map automatically.
The result is much nicer!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, if we don't have an announcement for the channel, stash it,
and once we get an announcement, replay if necessary.
Fixes: #1485
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means it will effect connect commands too (though it's too
late to stop DNS lookups caused by commandline options).
We also warn that this is one case where we allow forcing through Tor
without a proxy set: it just means all connections will fail.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This takes the Tor service address in the same option, rather than using
a separate one. Gossipd now digests this like any other type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the moment, this is a straight handing of current parameters through
from master to the gossip daemon. Next we'll change that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Risks leakage. We could do lookup via the proxy, but that's a TODO.
There's only one occurance of getaddrinfo (and no gethostbyname), so
we add a flag to the callers.
Note: the use of --always-use-proxy suppresses *all* DNS lookups, even
those from connect commands and the command line.
FIXME: An implicit setting of use_proxy_always is done in gossipd if it
determines that we are announcing nothing but Tor addresses, but that
does *not* suppress 'connect'.
This is fixed in a later patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Only force proxy use if we don't announce any non-TOR address.
There's no option to turn it off, so this makes more sense.
2. Don't assume we want an IPv4 socket to reach proxy, use the family
from the struct addrinfo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of storing a wireaddr and converting to an addrinfo every
time, just convert once (which also avoids the memory leak in the
current code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rename tor_proxyaddrs and tor_serviceaddrs to tor_proxyaddr and tor_serviceaddr:
the 's' at the end suggests that there can be more than one.
Make them NULL or non-NULL, rather than using all-zero if unset.
Hand them the same way to gossipd; it's a bit of a hack since we don't
have optional fields, so we use a counter which is always 0 or 1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All gossipd needs from common/tor is do_we_use_tor_addr(), so move
that and the rest of the tor-specific handshake code into gossip/tor.c
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a rebased and combined patch for Tor support. It is extensively
reworked in the following patches, but the basis remains Saibato's work,
so it seemed fairest to begin with this.
Minor changes:
1. Use --announce-addr instead of --tor-external.
2. I also reverted some whitespace and unrelated changes from the patch.
3. Removed unnecessary ';' after } in functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a leftover from before splitting the `gossip_store` injection path from
the handling of gossip messages.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Someone could try to announce an internal address, and we might probe
it.
This breaks tests, so we add '--dev-allow-localhost' for our tests, so
we don't eliminate that one. Of course, now we need to skip some more
tests in non-developer mode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we're given a wildcard address, we can't announce it like that: we need
to try to turn it into a real address (using guess_address). Then we
use that address. As a side-effect of this cleanup, we only announce
*any* '--addr' if it's routable.
This fix means that our tests have to force '--announce-addr' because
otherwise localhost isn't routable.
This means that gossipd really controls the addresses now, and breaks
them into two arrays: what we bind to, and what we announce. That is
now what we return to the master for json_getinfo(), which prints them
as 'bindings' and 'addresses' respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Add special option where an empty host means 'wildcard for IPv4 and/or IPv6'
which means ':1234' can be used to set only the portnum.
2. Only add this protocol wildcard if --autolisten=1 (default)
and no other addresses specified.
3. Pass it down to gossipd, so it can handle errors correctly: in most cases,
it's fatal not to be able to bind to a port, but for this case, it's OK
if we can only bind to one of IPv4/v6 (fatal iff neither).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This replacement is a little menial, but it explicitly catches all
the places where we allow a local socket. The actual implementation of
opening a AF_UNIX socket is almost hidden in the patch.
The detection of "valid address" is now more complex:
p->addr.itype != ADDR_INTERNAL_WIREADDR || p->addr.u.wireaddr.type != ADDR_TYPE_PADDING
But most places we do this, we should audit: I'm pretty sure we can't
get an invalid address any more from gossipd (they may be in db, but
we should fix that too).
Closes: #1323
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It does all the other address handling, do this too. It also proves useful
as we clean up wildcard address handling.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's become clear that our network options are insufficient, with the coming
addition of Tor and unix domain support.
Currently:
1. We always bind to local IPv4 and IPv6 sockets, unless --port=0, --offline,
or any address is specified explicitly. If they're routable, we announce.
2. --addr is used to announce, but not to control binding.
After this change:
1. --port is deprecated.
2. --addr controls what we bind to and announce.
3. --bind-addr/--announce-addr can be used to control one and not the other.
4. Unless --autolisten=0, we add local IPv4 & IPv6 port 9735 (and announce if they are routable).
5. --offline still overrides listening (though announcing is still the same).
This means we can bind to as many ports/interfaces as we want, and for
special effects we can announce different things (eg. we're sitting
behind a port forward or a proxy).
What remains to implement is semi-automatic binding: we should be able
to say '--addr=0.0.0.0:9999' and have the address resolve at bind
time, or even '--addr=0.0.0.0:0' and have the port autoresolve too
(you could determine what it was from 'lightning-cli getinfo'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We set no_reconnect with --offline, but that doesn't work if !DEVELOPER.
Make the flag positive, and non-DEVELOPER mode for gossipd.
We also don't override portnum with --offline, but have an explicit
'listen' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means gossipd is live and we can tell it things, but it won't
receive incoming connections. The split also means that the main daemon
continues (eg. loading peers from db) while gossipd is loading from the store,
potentially speeding startup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if just channel dies before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remote the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When a connect fails, if it's an important peer, we set a timer. If
we have a manual connect command, this means we do this again, leading
to another timer.
For a manual command, free any existing timer; the normal fail logic
will start another if necessary.
Reported-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At least say whether we failed to connect at all, or failed cryptographic
handshake, or failed reading/writing init messages.
The errno can be "Operation now in progress" if the other end closes the
socket on us: this happens when we handshake with the wrong key and it
hangs up on us. Fixing this would require work on ccan/io though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we intuit it from the fd being closed, but that may happen out
of order with when the master thinks it's dead.
So now if the gossip fd closes we just ignore it, and we'll get a
notification from the master when the peer is disconnected.
The notification is slightly ugly in that we have to disable it for
a channel when we manually hand the channel back to gossipd.
Note: as stands, this is racy with reconnects. See the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Lifetime of 'struct reaching' now only while we're actively doing connect.
2. Always free after a single attempt: if it's an important peer, retry
on a timer.
3. Have a single response message to master, rather than relying on
peer_connected on success and other msgs on failure.
4. If we are actively connecting and we get another command for the same
id, just increment the counter
The result is much simpler in the master daemon, and much nicer for
reconnection: if they say to connect they get an immediate response,
rather than waiting for 10 retries. Even if it's an important peer,
it fires off another reconnect attempt, unless it's actively
connecting now.
This removes exponential backoff: that's restored in next patch. It
also doesn't handle multiple addresses for a single peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than using a flag in reaching/peer; we make it self-contained
as the next patch puts it straight into a timer callback.
Also remove unused 'succeeded' field from struct peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And on channel_fail_permanent and closing (the two places we drop to
chain), we tell gossipd it's no longer important.
Fixes: #1316
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These don't have a maximum number of reconnect attempts, and ensure
that we try to reconnect when the peer dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These were so far only used for bolt11 construction, but we'll need them for the
DNS seed as well, so here we just pull them out into their own unit and prefix
them.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We're about to remove automatic retrying of connect, and that uncovered
that we actually print out our "Server started" message before we create
the listening socket.
Move the init higher (outside the db transaction) and make it a
request/response, the loop until it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian points out that we don't get spend notifications for old
channels if we truncate the store. We'd need more work to do this,
either validating the channels are still unspent, or replaying old
blocks from the truncation point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we open with O_APPEND, any write() will append as we want it to.
But we want to distinguish a new store creation from a truncation due
to bad version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If something goes (fatally) wrong, we won't add it to the store.
This reveals a latent bug in routing_add_channel_announcement() and
friend which did a take() on msg, which it doesn't own. TAKES means
that it will take ownership IF the caller requests, not an unconditional
ownership transfer (which is an antipattern).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We enter nodes in the map when we create channels, but those channels
could be local and unannounced. This triggered a failure in
test_gossip_persistence since the store truncated when it saw the
first thing was a node_announce.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we only remember the actions that added channels then we'd restore them when
re-reading the gossip_store, so put a tombstone in there to remember to delete
it. These will be cleared upon re-writing the store since the announcements wont
be written anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was a tricky one to find, it turns out that some nodes are sending
node_announcements even if they don't have a channel announced yet. If they are
a peer and the channel is currently verifying then we'll have a local channel in
the network view, hence accept the node_announcement, but when replaying, the
node_announcement will be replayed and we won't have a channel yet. This just
skips node_announcements, which is always safe.
Reported-by: @laszlohanyecz
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This now works because we no longer call out to masterd or bitcoind to verify
the channels. It's also rather quick and silent so we can just process all
stored messages until we're done.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Messages from peers and messages from the gossip_store now have completely
different entrypoints, so we don't need to trace their origin around the message
handling code any longer.
This stores and reads the channel_announcements in the wrapping message which
allows us to store associated data with the raw channel_announcements.
The gossip_store applies channel_announcements directly but it also returns it,
and it gets discarded as a duplicate. In the next commit we'll have gossip_store
apply all changes, bypassing verification, so the duplication is only temporary.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we now store additional data along with the original messages they exceed
the length of the peer wire protocol messages.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If we have a non-empty file and the version doesn't match, then we truncate and
write our own version. If the file is empty we write our version and the
truncate becomes a no-op
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we may want to extend the on-disk format by adding custom information we
may as well just go the extra mile and reuse the serialization primitives we
already have.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Moves any modifications based on an incoming gossip message into its own
function separate from the message verification. This allows us to skip
verification when reading messages from a trusted source, e.g., the
gossip_store, speeding up the gossip replay.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
When we read from the gossip_store we set store=false so that we don't duplicate
messages in the store.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
As proposed by @rustyrussell this makes it a bit easier to truncate and sync on
read errors.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Ee will be replaying gossip messages from the gossip_store soon. This means that
not all messages originate from a peer, so we move the queuing of error messages
up into the peer message handler.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now it just returns true if it queued something. This allows it
to queue multiple packets, and lets it share code paths with other code
in future patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As we add more features, the current code is insufficient.
1. Keep an array of single feature bits, for easy switching on and off.
2. Create feature_offered() which checks for both compulsory and optional
variants.
3. Invert requires_unsupported_features() and unsupported_features()
which tend to be double-negative, all_supported_features() and
features_supported().
4. Move single feature definition from wire/peer_wire.h to common/features.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently keep two copies; one in the broadcast structure to send
in order, and one in the routing information. Since we already keep
the broadcast index in the routing information, use that.
Conveniently, a zero index is the same as the old NULL test.
Rename struct node's announcement_idx to node_announce_msgidx to
make it match the other users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We tal_dup_arr() it, which does take. Make it const in the structure;
the tal_dup_arr() removes the const, so it compiles without it, but it's
misleading.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only access via index. We do, however, want to clean up when we
delete nodes and channels, so we tie lifetimes to that. This leads
us to put the index into 'struct queued_message'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. make queue_peer_msg() use both if branches, as both equally likely.
2. Remove redundant *scid = NULL in handle_channel_announcement.
3. Log failing pending channel_updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per BOLT #7.
We don't do this for channel_update which are queued because the
channel_announcement is pending though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the channel is pending, we queue the node_announcment and if the channel
is OK we re-call process_node_announcement. Make sure that second call
won't fail if the first succeeded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We already have 'struct node', so rename 'struct routing_channel' to
'struct chan', and 'struct node_connection' to 'struct half_chan'.
Other minor changes:
1. rstate->channels -> rstate->chanmap.
2. 'connections' -> 'half'.
3. connection_to -> half_chan_to
4. connection_from -> half_chan_from
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The containing `struct routing_channel` contains src and dst, so
remove them. However, the channel_update msgidx does belong int
`struct node_connection` along with the channel_update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Returning the separate first routing_channel was a weird API: just
return the entire array. Sure, we have to treat the first node a bit
differently (because we don't charge ourselves fees), but it's still
simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To remove the redundant fields in `struct node_connection` (ie. 'src'
and 'dst' pointers) we need to deal with `struct routing_channel`.
This means we get a series of channels, from which the direction is
implied, so it's a bit more complex to decode. We add a helper
`other_node` to help with this, and since we're the only user of
`connection_to` we change that function to return the index.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Failure and pruning were the two places where a node_connection could
be freed; now they both deal with entire channels, we can remove the
NULL checks, and the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We discarded it; we should populate it. The comment is wrong, since
local_add_channel() doesn't add public channels, and we test that above.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is twice the 'update_channel_interval' we get handed.
We delete the non-existent channel_add_connection and delete_connection
declarations from the header too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently give them a free pass. The simplest fix is to give them
an old timestamp on initialization.
We still skip unannounced channels, on the assumption that they're
ours. And we set the last_update_timestamp to -1 when we convert to
gossip_getchannels_entry to indicate no update.
This breaks the DEVELOPER=1 pruning test, since we hardcode the 1
week timeout. That's fixed in the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make new_routing_channel() populate both connections
(active=false), so local_add_channel becomes simpler. We also
suppress listchannels output of active=false unannounced channels, to
avoid breaking tests (also, these are unusable, so it makes sense to
omit them)
It also seems the logic in add_channel_direction is legacy: a
channel_announce cannot replace the scid (that would be a different
channel), we don't allow duplicate announcements, and the announcement
is never NULL.
And since we disallow repeated channel_announce already, I believe
'forward' is always true, greatly simplifying the logic in
handle_pending_cannouncement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes 'routing_channel' the primary object in the system; it can have
one or two 'node_connection's attached, and points to two nodes.
The nodes are freed when no more routing_channel refer to them. The
routing_channel are freed when they contain no more 'node_connection'.
This fixes#1072 which I surmise was caused by a dangling
routing_channel after pruning.
Each node contains a single array of 'routing_channel's, not one for
each direction. The 'routing_channel' itself orders nodes in key
order (conveniently the index is equal to the direction flag we use),
and 'node_connection' with source in the same order.
There are helpers to assist with common questions like "which
'node_connection' leads out of this node?".
There are now two ways to find a channel:
1. Direct scid lookup via rstate->channels map.
2. Node key lookup, followed by channel traversal.
Several FIXMEs are inserted for where we can now do things more optimally.
Fixes: #1072
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make it a first-class citizen, and pending routing_channel
are not real ones (in particular, we don't want to create pending nodes).
We had a linked list called rstate->pending_cannouncement which we didn't
actually use, so put that back for now and add a FIXME to use a faster
data structure.
We need to check that list now in handle_channel_update, but we never
have a real routing_channel and a pending, unless the routing_channel
isn't public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This hook is called when the queue is empty; we should only send gossip
according to the gossip timer. We're currently dribbling it out after
every message, in violation of the spec.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we have them, let's use them. I missed one case deliberately, since
that causes merge conflicts when I replace it in a following patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'm not completely conviced that we can't end up removing pending things,
so change asserts to simple returns.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we make destroy_node() remove itself from the map, then we simply
need to free it.
We can batch the frees (as we need) simply by reparenting all the pruned
nodes onto a single temporary parent, then freeing it, relying on tal's
internal datastructures.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
get_connection_by_scid() and update_to_pending() both do the same
lookup which we did in handle_channel_update().
Do the lookup once, and simplify the others.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We usually did this, but sometimes they were named after what they did,
rather than what they cleaned up.
There are still a few exceptions:
1. I didn't bother creating destroy_xxx wrappers for htable routines
which already existed.
2. Sometimes destructors really are used for side-effects (eg. to simply
mark that something was freed): these are clearer with boutique names.
3. Generally destructors are static, but they don't need to be: in some
cases we attach a destructor then remove it later, or only attach
to *some* cases. These are best with qualifiers in the destroy_<type>
name.
Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
DEBUG:root:lightningd(16333): 2018-02-08T02:12:21.158Z lightningd(8262): lightning_openingd(0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199): Failed hdr decrypt with rn=2
We only hand off the peer if we've not started writing, but that was
insufficient: we increment the sn twice on encrypting packet, so there's
a window before we've actually started writing where this is now
wrong.
The simplest fix is only to hand off from master when we've just written,
and have the read-packet path simply wake the write-packet path.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We get intermittant failure: WIRE_UNKNOWN_NEXT_PEER (First peer not ready)
because CHANNELD_NORMAL and actually telling gossipd that the channel
is available are distinct things: we need both.
(For test_closing_different_fees, we were testing CHANNELD_NORMAL on
the peer, not on l1, too).
But we may also directly send the announcement sigs if the height is
sufficient, so the simplest is to unify the messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we have wirestring, this is much more natural. And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit a57a2dcb86 introduced a time_t
in routing.h. So also move the time.h include to the header. This
fixes the build on FreeBSD.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
We were dropping these on the floor while checking for txout. So now
we add a map that holds announcements while we are checking.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are wasting way too much time looking for announcements and updates
in the broadcast. We can just hint where to find the message to be
evicted and safe the traversal.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Adding channels that we are currently verifying to the map, and
skipping if we already have a channel at that position.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We use this technique for the other tags, so use it here too.
This was drawn to my attention when I made more than 10 channels in a
block, and the string changed length:
Valgrind error file: valgrind-errors.31415
==31415== Conditional jump or move depends on uninitialised value(s)
==31415== at 0x4C35E20: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31415== by 0x11A624: queue_broadcast (broadcast.c:40)
==31415== by 0x118D93: handle_pending_cannouncement (routing.c:704)
==31415== by 0x1109E3: handle_txout_reply (gossip.c:1796)
==31415== by 0x111177: recv_req (gossip.c:1955)
==31415== by 0x136723: next_plan (io.c:59)
==31415== by 0x137220: do_plan (io.c:387)
==31415== by 0x13725E: io_ready (io.c:397)
==31415== by 0x138B97: io_loop (poll.c:305)
==31415== by 0x111352: main (gossip.c:2022)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We drop all but the first announcement, so any work that is done for a
channel that we already know is wasted. Pulling this up duplicates
some of the work but allows us to skip the costly txout check.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
`tal_fmt` overallocates the returned string under some circumstances,
meaning that the trailer of the formatted string is unset, but still
considered in `tal_len`. The solution then is to truncate the
formatted string to the real string length. Only necessary here, since
we mix strings and `tal_len`.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We need to make sure all the updates are known to gossip. Since
one is the local update, we change that message to look the same.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise, we otherwise end up with out-of-order updates
(ie. preceeding announcements).
I assume that is because of the locally-inserted connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is done it two parts, since we have to ask the main daemon to do
the lookup for us.
If this becomes a bottleneck, we can have a separate daemon, or even
an RPC pipe to bitcoind ourselves.
Fixes: #403
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our handling of SIGPIPE was incoherent and inconsistent, and we had much
cut & paste between the daemons. They should *ALL* ignore SIGPIPE, and
much of the rest of the boilerplate can be shared, so should be.
Reported-by: @ZmnSCPxj
Fixes: #528
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>