We currently hand the error back to the master, who then stores it for
future connections and hands it back to another openingd to send and exit.
Just send directly; it's more reliable and simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Include it as an optional field in the connect_to_peer message (it was
added before we had optional fields).
The only issue is that reconnects want it too, so again connectd hands
it back to master in connectctl_connect_failed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd tells master about every disconnection, and master knows
whether it's important to reconnect. Just get the master to invoke a new
connect command if it considers the peer important!
The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect. To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.
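A minimal sketch of the flow (the message and field names here are
illustrative, not the actual generated wire structs):

    struct connectctl_connect_failed {      /* connectd -> master */
            struct pubkey id;
            u32 delaytime;  /* suggested seconds before the next attempt */
    };

    struct connectctl_connect_to_peer {     /* master -> connectd */
            struct pubkey id;
            u32 delaytime;  /* 0 for a fresh request; echoed back on retry */
    };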
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to separate implicit connection requests (ie. timed retries
for important peers) and explicit ones, and send a
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT for the latter.
In the success case, that's now redundant, since we hand the connected
peer to the master using WIRE_CONNECT_PEER_CONNECTED; we just need a
message for the failure case. And we might as well tell the master
every failure, so we don't have to distinguish internally.
This also solves a race we had before: connectd would send
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT which completes the incoming
JSON connect command, then send WIRE_CONNECT_PEER_CONNECTED. So
there's a window where the JSON command can return, but the peer isn't
known to lightningd yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.
lightningd no longer queries us for getpeers: it knows more than we do
already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return them to lightningd if they sent a non-gossip msg,
or if lightningd asked it to release a peer.
Now connectd hands the peer to lightningd once we've done the init
handshake, and lightningd hands it off to openingd.
This is a deep structural change, so we do the minimum here and cleanup
in the following patches.
Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove the list of outstanding fundchannel commands; it was only needed to
find the race between us asking connectd to release the peer and it
reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
channel (until it has succeeded): it's very transitory anyway so not
worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
for it, though it may never be used if neither end funds a channel.
5. We act on messages from openingd: we can get a funder_reply,
a fundee message, or an error in response to our request to fund a channel,
so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.
Openingd:
1. We always have the information we need to accept them funding a channel
(it's in the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.
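A hedged sketch of the opencoded wait from Openingd point 2 (the fd and
handler names are assumptions, not the real code):

    #include <poll.h>

    /* Placeholder handlers: the real daemon dispatches to its own
     * message-processing routines here. */
    static void handle_peer_in(void);
    static void handle_gossip_in(void);
    static void handle_master_in(void);

    static void wait_and_dispatch(int peer_fd, int gossip_fd, int master_fd)
    {
            struct pollfd fds[3] = {
                    { .fd = peer_fd,   .events = POLLIN },
                    { .fd = gossip_fd, .events = POLLIN },
                    { .fd = master_fd, .events = POLLIN },
            };

            /* Block until one of the three fds has something to read. */
            if (poll(fds, 3, -1) < 0)
                    return;
            if (fds[0].revents & POLLIN)
                    handle_peer_in();
            if (fds[1].revents & POLLIN)
                    handle_gossip_in();
            if (fds[2].revents & POLLIN)
                    handle_master_in();
    }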
Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer logged;
openingd says 'Handed peer, entering loop' once it's managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise the field didn't
exist); 'state' is now per-channel, so the per-peer field is gone entirely.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
the peer hand over the connection to openingd. Our tests sometimes
checked getpeers() on the peer, and didn't see anything, so line_graph
needed updating.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Checking in the master doesn't help anything, and it's weird.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.
We shim tal_bytelen() for now.
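One plausible form of the shim (illustrative; the real definition may
differ slightly):

    /* Until tal_bytelen() lands upstream, alias it to tal_len(), which
     * already returns the size in bytes of a tal-allocated object. */
    #define tal_bytelen(p) tal_len(p)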
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_getnodes_entry was used by gossipd for reporting nodes, and for
reporting peers. But the local_features field is only available for peers,
and most other fields are only available from node_announcement.
Note that the connectd change actually means we get less information
about peers: gossipd used to do the node lookup for peers and include the
node_announcement information if it had it.
Since generate_wire.py can't create arrays-of-arrays, we add a 'struct
peer_features' to encapsulate the two feature arrays for each peer, and
for convenience we add it to lightningd/gossip_msg.
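A hedged sketch of the wrapper (member names are assumptions):

    /* Bundles the two per-peer feature bitmaps so generate_wire.py only
     * ever sees a single struct per peer, never an array of arrays. */
    struct peer_features {
            u8 *localfeatures;
            u8 *globalfeatures;
    };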
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch guts gossipd of all peer-related functionality, and hands
all the peer-related requests to channeld instead.
gossipd now gets the final announcable addresses in its init msg, since
it doesn't handle socket binding any more.
lightningd now actually starts connectd, and activates it. The init
messages for both gossipd and connectd still contain redundant fields
which need cleaning up.
There are shims to handle the fact that connectd's wire messages are
still (mostly) gossipd messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Removed `json_get_params`.
Also added json_tok_percent and json_tok_newaddr. Probably should
have been a separate PR but it was so easy.
[ Squashed comment update for gcc workaround --RR ]
Signed-off-by: Mark Beckwith <wythe@intrig.com>
json_listpeers returns an array of peers, and an array of nodes: the latter
is a subset of the former, and is used for printing alias/color information.
This changes it so there is a 1:1 correspondence between the peer information
and nodes, meaning no more O(n^2) search.
If there is no node_announce for a peer, we use a negative timestamp
(already used to indicate that the rest of the gossip_getnodes_entry
is not valid).
Other fixes:
1. Use get_node instead of iterating through the node map.
2. A node without addresses is perfectly valid: we have to use the timestamp
to see if the alias/color are set. Previously we wouldn't print that
if it didn't also advertise an address.
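On the consumer side the negative-timestamp convention above looks roughly
like this (field and helper names are illustrative, not lifted from the diff):

    /* A negative last_timestamp means we never saw a node_announcement,
     * so the alias/color in this entry are not meaningful. */
    if (node->last_timestamp >= 0)
            json_add_string(response, "alias", (const char *)node->alias);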
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
structeq() is too dangerous: if a structure has padding, it can fail
silently.
The new ccan/structeq instead provides a macro to define foo_eq(),
which does the right thing in case of padding (which none of our
structures currently have anyway).
Upgrade ccan, and use it everywhere. Except run-peer-wire.c, which
is only testing code and can use raw memcmp(): valgrind will tell us
if padding exists.
Interestingly, we still declared short_channel_id_eq, even though
we didn't define it any more!
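A hedged sketch of the new pattern (struct foo is hypothetical, and I'm
assuming STRUCTEQ_DEF takes the struct name, the expected number of padding
bytes, and the members to compare):

    struct foo {
            u64 a;
            u64 b;
    };

    /* Generates foo_eq(): compares only the listed members, so padding
     * could never silently affect the result (here there is none, hence
     * the 0). */
    STRUCTEQ_DEF(foo, 0, a, b);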
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Until now, `command_fail()` reported an error code of -1 for all uses.
This PR adds an `int code` parameter to `command_fail()`, requiring the
caller to explicitly include the error code.
This is part of #1464.
The majority of the calls are used during parameter validation and
their error code is now JSONRPC2_INVALID_PARAMS.
The rest of the calls report an error code of LIGHTNINGD, which I defined to
-1 in `jsonrpc_errors.h`. The intention here is that as we improve our error
reporting, all occurrences of LIGHTNINGD will go away and we can eventually
remove it.
I also converted calls to `command_fail_detailed()` that took a `NULL` `data`
parameter to use the new `command_fail()`.
The only difference from an end user perspective is that bad input errors that
used to be -1 will now be -32602 (JSONRPC2_INVALID_PARAMS).
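Illustrative call sites (the messages and surrounding checks are made up,
not lifted from the diff):

    /* Parameter validation now carries an explicit JSON-RPC code... */
    if (!amount_is_valid)
            command_fail(cmd, JSONRPC2_INVALID_PARAMS,
                         "'amount' must be a positive integer");

    /* ...while anything not yet categorized uses the catch-all code. */
    command_fail(cmd, LIGHTNINGD, "Could not find active channel");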
Just have a "new depth" callback, and let channeld do the right thing.
This makes the channeld paths a bit more straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This replacement is a little menial, but it explicitly catches all
the places where we allow a local socket. The actual implementation of
opening an AF_UNIX socket is almost hidden in the patch.
The detection of "valid address" is now more complex:
p->addr.itype != ADDR_INTERNAL_WIREADDR || p->addr.u.wireaddr.type != ADDR_TYPE_PADDING
But most places we do this, we should audit: I'm pretty sure we can't
get an invalid address any more from gossipd (they may be in db, but
we should fix that too).
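One way to keep call sites readable is to wrap that check in a helper (the
helper name is an invention; the condition is exactly the one above):

    static bool peer_addr_is_valid(const struct peer *p)
    {
            /* ADDR_TYPE_PADDING is how an unset wire address shows up. */
            return p->addr.itype != ADDR_INTERNAL_WIREADDR
                    || p->addr.u.wireaddr.type != ADDR_TYPE_PADDING;
    }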
Closes: #1323
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It does all the other address handling, so it should do this too. It also
proves useful as we clean up wildcard address handling.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No new functionality, just a continuation of my work toward completing #665.
I removed the common members of `struct withdrawal` and `struct fund_channel`
and placed them in a new `struct wallet_tx`. Then it was fairly straightforward
to reimplement the existing code in terms of `wallet_tx`.
Since I made some structural changes I wanted to get this approved before I
go any farther.
Added 'all' to fundchannel help message.
We're getting spurious closures, even on mainnet. Using --ignore-fee-limits
is dangerous; it's slightly less so to lower the minimum (which is the
usual cause of problems).
So let's halve it, but beware the floor.
This is a workaround, until we get independent feerates in the spec.
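A rough sketch of the adjustment (the names and the floor constant are
assumptions, not the actual code):

    /* Accept half the previous minimum feerate, but never drop below
     * the hard floor. */
    u32 min_feerate = get_min_feerate(topo) / 2;
    if (min_feerate < FEERATE_FLOOR)
            min_feerate = FEERATE_FLOOR;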
Fixes: #613
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we intuit it from the fd being closed, but that may happen out
of order with when the master thinks it's dead.
So now if the gossip fd closes we just ignore it, and we'll get a
notification from the master when the peer is disconnected.
The notification is slightly ugly in that we have to disable it for
a channel when we manually hand the channel back to gossipd.
Note: as stands, this is racy with reconnects. See the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enhancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Lifetime of 'struct reaching' now only while we're actively doing connect.
2. Always free after a single attempt: if it's an important peer, retry
on a timer.
3. Have a single response message to master, rather than relying on
peer_connected on success and other msgs on failure.
4. If we are actively connecting and we get another command for the same
id, just increment the counter.
The result is much simpler in the master daemon, and much nicer for
reconnection: if they say to connect they get an immediate response,
rather than waiting for 10 retries. Even if it's an important peer,
it fires off another reconnect attempt, unless it's actively
connecting now.
This removes exponential backoff: that's restored in next patch. It
also doesn't handle multiple addresses for a single peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And on channel_fail_permanent and closing (the two places we drop to
chain), we tell gossipd it's no longer important.
Fixes: #1316
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will allow us in the next commit to store the transactions that triggered
this event in the DB, allowing us to replay them later on.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
d822ba1ee accidentally removed this case, which is important: if the
other side didn't get our final matching closing_signed, it will
reconnect and try again. We consider the channel no longer "active"
and thus ignore it, and get upset when it sends the
`channel_reestablish` message.
We could just consider CLOSINGD_COMPLETE to be active, but then we'd
have to wait for the closing transaction to be mined before we'd allow
another connection.
We can't special case it when the peer reconnects, because there
could be (in theory) multiple channels for that peer in CLOSINGD_COMPLETE,
and we don't know which one to reestablish.
So, we need to catch this when they send the reestablish, and hand
that msg to closingd to do negotiation again. We already have code
to note that we're in CLOSINGD_COMPLETE and thus ignore any result
it gives us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The new connect code revealed an existing race: we tell gossipd to
release the peer, but at the same time it connects in. gossipd fails
the release because the peer is remote, and json_fundchannel fails.
Instead, we catch this race when we get peer_connected() and we were
trying to open a channel. It means keeping a list of fundchannels which
are awaiting a gossipd response though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep watching if we can't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We'd usually commit to the db soon, but there's a window where it
could be missed.
Also moves loc into the block where it's used and makes it tmpctx to avoid
an explicit free.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also report tx and txid, and whether we closed unilaterally or
bilaterally, if we could close the channel.
Also make a manpage.
Fixes: #1207
Fixes: #714
Fixes: #622
All of the callback functions were only using the tx to generate the txid again,
so we just pass that in directly and save passing the tx itself.
This is a simplification to move to the DB backed depth callbacks. It'd be
rather wasteful to read the rawtx and deserialize just to serialize right away
again to find the txid, when we already searched the DB for exactly that txid.
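A hedged sketch of the shape of the change (the typedef and parameter names
are assumptions):

    /* Depth callbacks now receive the txid they were registered for,
     * so no caller has to deserialize a tx just to recompute its hash. */
    typedef enum watch_result (*txid_depth_cb)(struct lightningd *ld,
                                               struct channel *channel,
                                               const struct bitcoin_txid *txid,
                                               unsigned int depth);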
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I saw a failure in test_funding_fail():
assert l2.rpc.listpeers()['peers'][0]['connected']
This can happen if l2 hasn't yet handed back to gossipd. Turns out
we didn't mark uncommitted channels as connected:
[{'id': '03afa3c78bb39217feb8aac308852e6383d59409839c2b91955b2d992421f4a41e', 'connected': False, 'channels': [{'state': 'OPENINGD', 'owner': 'lightning_openingd', 'funder': 'REMOTE', 'status': ['Incoming channel: accepted, now waiting for them to create funding tx']}]}]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So we know how much the counterparty could theoretically steal from us
(msatoshi_to_us - msatoshi_to_us_min) and how much we could
theoretically steal from the counterparty (msatoshi_to_us_max -
msatoshi_to_us).
For more piloting goodness.
This bug is a classic case of being lazy:
1. peer_accept_channel() allocated its return off the input message,
rather than taking an explicit allocation context. This concealed the
lifetime nature of the return.
2. The context for sanitize_error was the error itself, rather than the
more obvious tmpctx (which connect_failed does not take ownership of).
The global tmpctx removes the "efficiency" excuse for grabbing a random
object to use as context, and is also nice and explicit.
All-the-hard-work-by: @ZmnSCPxj
As we add more features, the current code is insufficient.
1. Keep an array of single feature bits, for easy switching on and off.
2. Create feature_offered() which checks for both compulsory and optional
variants.
3. Invert requires_unsupported_features() and unsupported_features(),
which tend to be double negatives, into all_supported_features() and
features_supported().
4. Move single feature definition from wire/peer_wire.h to common/features.h.
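A hedged sketch of the check from point 2 (feature_is_set() is an assumed
helper that tests one bit in the feature bitmap):

    /* A feature is "offered" if either the compulsory (even) bit or the
     * optional (odd) bit is set. */
    static bool feature_offered(const u8 *features, size_t f)
    {
            return feature_is_set(features, f)          /* compulsory */
                    || feature_is_set(features, f + 1); /* optional */
    }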
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I leave all the now-unnecessary accessors in place to avoid churn, but
the use of bitfields has been more pain than help.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The billboard is now far more useful to tell what's going on, and this
gets us closer to a state == owner mapping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also fold opening_got_hsm_funding_sig() into the caller; it was
previously a callback before we decided to always use the HSM
synchronously.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd. We don't need that any more.
This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And now we can finally do the db upgrade to remove any OPENINGD
channels once, since we never put them back.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's giant, but it's encapsulating at least. It is called from the wallet
code when loading channels, or from the opening code when converting
an uncommitted_channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that any struct channel is a genuine channel, the following fields are
always valid:
1. funding_txid: doesn't need to be a pointer.
2. our_msatoshi: doesn't need to be a pointer.
3. last_sig: doesn't need to be a pointer.
4. channel_info: doesn't need to be a pointer.
In addition, 'last_tx' is always valid.
The main effect is to remove a whole heap of branches from the wallet code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each peer can have one 'uncommitted' channel, which is in the process
of opening. This is used for openingd, and then on return we convert
it into a full-fledged struct channel and commit it into the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means the caller needs to supply an explicit log to base the
subd log on, and also a callback for error handling.
The callback is kind of ugly, but it gets reworked towards the end
of this series.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we rely on the logbook outlasting the peer, we can't refer to the
peer from the logbook function:
Valgrind error file: valgrind-errors.26567
==26567== Invalid read of size 8
==26567== at 0x126297: copy_to_parent_log (peer_control.c:690)
==26567== by 0x11C06B: maybe_print (log.c:253)
==26567== by 0x11C145: logv (log.c:270)
==26567== by 0x11C448: log_ (log.c:319)
==26567== by 0x132951: destroy_subd (subd.c:537)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A120: del_tree (tal.c:410)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x16DEB5: io_close (io.c:443)
==26567== by 0x1328BC: sd_msg_read (subd.c:516)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Address 0x6cf9ca0 is 48 bytes inside a block of size 216 free'd
==26567== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x17A1A9: del_tree (tal.c:421)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x124B6C: delete_peer (peer_control.c:180)
==26567== by 0x12B369: destroy_uncommitted_channel (peer_control.c:2505)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x12B31E: opening_channel_errmsg (peer_control.c:2496)
==26567== by 0x13243A: handle_peer_error (subd.c:407)
==26567== by 0x1326E4: sd_msg_read (subd.c:472)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Block was alloc'd at
==26567== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x179C83: allocate (tal.c:250)
==26567== by 0x17A250: tal_alloc_ (tal.c:448)
==26567== by 0x124950: new_peer (peer_control.c:151)
==26567== by 0x12B3EC: new_uncommitted_channel (peer_control.c:2521)
==26567== by 0x12B5C5: peer_accept_channel (peer_control.c:2569)
==26567== by 0x126099: peer_sent_nongossip (peer_control.c:641)
==26567== by 0x113B28: peer_nongossip (gossip_control.c:55)
==26567== by 0x113D9D: gossip_msg (gossip_control.c:144)
==26567== by 0x132783: sd_msg_read (subd.c:487)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== by 0x16D1FE: next_plan (io.c:59)
==26567==
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Background: Each log has a log_book: many logs can share the same one,
as each one can have a separate prefix.
Testing tickled a bug at the end of this series, where subd was
logging to the peer's log_book on shutdown, but the peer was already
freed. We've already had issues with logging while lightningd is
shutting down.
There are times when reference counting really is the right answer, and
this seems to be one of them: the 'struct log's share the 'struct
log_book', and the last 'struct log' cleans it up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We derive the seed from this, so it needs to be unique, but using
rowid forced us to put the channel into the db early, before it
was ready.
Instead, use a counter to ensure uniqueness, initialized when we load
existing peers. This doesn't need to touch the database at all.
As we now have only two places where the channel is committed (the
funder and fundee paths), we create a new explicit
'wallet_channel_insert()' function: 'wallet_channel_save()' now just
updates.
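A minimal sketch of the counter approach (names are illustrative):

    /* Highest dbid seen while loading existing channels; new channels
     * just take the next value, with no db round-trip. */
    static u64 max_channel_dbid;

    static u64 next_channel_dbid(void)
    {
            return ++max_channel_dbid;
    }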
Note that this also fixes some weirdness in
wallet_channels_load_active: we strangely avoided loading channels in
CLOSINGD_COMPLETE (which fortunately was a transient state, so it's
unlikely anyone hit this). Since the lines above already delete all
the OPENINGD channels, we now simply load them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This provides a sanity check that we are in sync, and also keeps the
logic in the program and out of the SQL.
Since the destructor now doesn't clean up the peer, there are some
wider changes to be made when cleaning up. Most notably we create
lots of channels in run-wallet.c and they previously freed the peer:
now we need to free the peer explicitly, so we need to free them first.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>