connectd tells master about every disconnection, and master knows
whether it's important to reconnect. Just get the master to invoke a new
connect command if it considers the peer important!
The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect. To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to separate implicit connection requests (ie. timed retries
for important peers) and explicit ones, and send a
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT for the latter.
In the success case, that's now redundant, since we hand the connected
peer to the master using WIRE_CONNECT_PEER_CONNECTED; we just need a
message for the failure case. And we might as well tell the master
every failure, so we don't have to distinguish internally.
This also solves a race we had before: connectd would send
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT which completes the incoming
JSON connect command, then send WIRE_CONNECT_PEER_CONNECTED. So
there's a window where the JSON command can return, but the peer isn't
known to lightningd yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't want to exit just because channel parameter negotiation
failed, but we do want to tell the master if it was a channel we were
trying to fund.
Note that lightningd still needs to fail the funding cmd if it gets a
fromwire_opening_fundee (they raced us and won), or an outright
failure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously master would fail once the channel has been negotiated,
which is terrible, since the funder will have already broadcast tx.
Now we tell them if we have an active channel, and update if it goes away.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.
lightningd no longer queries us for getpeers: it knows more than we do
already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.
Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.
This is a deep structural change, so we do the minimum here and cleanup
in the following patches.
Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
find the race between us asking connectd to release the peer and it
reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
channel (until it has succeeded): it's very transitory anyway so not
worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
or a fundee, or an error in response to our request to fund a channel.
so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.
Openingd:
1. We always have information we need to accept them funding a channel (in
the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.
Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
exist; 'state' is now per-channel. It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
the peer hand over the connection to openingd. Our tests sometimes
checked getpeers() on the peer, and didn't see anything, so line_graph
needed updating.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd calculates our reserve based on the channel amount (even if
we're funding, to keep the calculation in one place), but it wasn't
reporting it back to the master daemon. We initialized it to 0 so that
valgrind wouldn't get upset, as it's part of a structure we send over
the wire.
Have openingd report back, and also initialize it to an impossible value
as extra assurance. And remove a stray (harmless but weird) semicolon.
Reported-by: Gálli Zoltán
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's only used so we can timeout being fundee after a few hundred
blocks, but when openingd is started for idle connections, the
difference can be huge.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Checking in the master doesn't help anything, and it's weird.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1diff --git a/connectd/connect.c b/connectd/connect.c
index 138b73fc..b01d1546 100644
Fortunately, we hit the assert in wallet_peer_delete() if this happens,
since there are still active channels.
This latent bug becomes far more likely in followup patches, where
openingd is used for idle peers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gives the network options a chance to load from arguments before usage
exits, so that the proper defaults are shown.
This didn't work:
lightningd --mainnet --help
before it showed testnet defaults, now it shows mainnet defaults.
Signed-off-by: William Casarin <jb55@jb55.com>
This adds one line with the onion and the channel_update we extract from
it. This in turn allows us to check that the channel_update in the onion is not
type prefixed, and that we patch it correctly before passing it to gossipd.
As was pointed out by @robtex we have underspecified the format of the nested
`channel_update` in the onionreply: lnd and eclair inserted the raw
channel_update without the type prefix, while we went for the full wire format,
including the type prefix. While we agreed that with the type it is more
flexible, and consistent, we decided to adapt to the majority and at least be
compatibly broken.
This commit takes care of being able to interpret either format correctly. It's
not perfect since signatures can happen to start with 0x0102 (the channel_update
type) but that'll happen only once ever 65k failures.
The easiest way to do this is to play with the 'wallet_tx' semantics
and have 'amount' have meaning even when 'all_funds' is set.
Note that we change the string 'Cannot afford funding transaction' to
'Cannot afford transaction' as this code is also used for withdrawls.
Inspired-by: molz on #c-lightning
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In several places we use low-level tal functions because we want the
label to be something other than the default. ccan/tal is adding
tal_*_label so replace them and shim it for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_count() is used where there's a type, even if it's char or u8, and
tal_bytelen() is going to replace tal_len() for clarity: it's only needed
where a pointer is void.
We shim tal_bytelen() for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Valgrind error file: valgrind-errors.772802
==772802== Invalid read of size 1
==772802== at 0x4C32D04: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==772802== by 0x14479C: escape (json_escaped.c:41)
==772802== by 0x144B6C: json_escape (json_escaped.c:117)
==772802== by 0x118518: json_getnodes_reply (gossip_control.c:209)
==772802== by 0x139394: sd_msg_reply (subd.c:281)
==772802== by 0x139972: sd_msg_read (subd.c:418)
==772802== by 0x17ABB1: next_plan (io.c:59)
==772802== by 0x17B6A9: do_plan (io.c:387)
==772802== by 0x17B6E7: io_ready (io.c:397)
==772802== by 0x17D2C8: io_loop (poll.c:310)
==772802== by 0x121973: main (lightningd.c:450)
==772802== Address 0x6fe5168 is 0 bytes after a block of size 72 alloc'd
==772802== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==772802== by 0x18843E: allocate (tal.c:245)
==772802== by 0x18899D: tal_alloc_ (tal.c:421)
==772802== by 0x188B5E: tal_alloc_arr_ (tal.c:464)
==772802== by 0x119BAB: fromwire_gossip_getnodes_entry (gossip_msg.c:35)
==772802== by 0x15CCD6: fromwire_gossip_getnodes_reply (gen_gossip_wire.c:111)
==772802== by 0x118436: json_getnodes_reply (gossip_control.c:192)
==772802== by 0x139394: sd_msg_reply (subd.c:281)
==772802== by 0x139972: sd_msg_read (subd.c:418)
==772802== by 0x17ABB1: next_plan (io.c:59)
==772802== by 0x17B6A9: do_plan (io.c:387)
==772802== by 0x17B6E7: io_ready (io.c:397)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems like a premature optimization: it tried to cut down the number of
allocations by reusing the same `struct invoice_details` while iterating through
a number of results. But this sidesteps the checks by `valgrind` and we'd miss a
missing field that was set by the previous iteration.
Reported-by: @rustyrussell
Signed-off-by: Christian Decker <@cdecker>
Several users have noticed that they cannot pay satoshis.place or similar places
that have tiny payment amounts if they are not directly connected. This is due
to the forwarding fee dominating the transferred amount.
This commit adds a new option, exempting tiny fees (up to 5 satoshis by default)
from having to pass the maxfeepercent flag. While we could have told users to
tweak maxfeepercent I think it is usefull to have a default exemption.
[Squashed --RR]
Developer errors result in command_fail being called
just like other errors. The bad_programmer() Test is now updated
and passing.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
They now just call command_fail() and cause param() to return false.
Temporarily disabled all the run-param.c tests that redirect
asserts so CI would still pass.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
We use these for receiving arrays at init time, we should also use them
for fulfull/fail of HTLCs in normal operation. That we we benefit from all
those assertions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The master tells us the short_channel_id of the outgoing channel when
failing an HTLC, but channeld didn't store it anywhere. It also
didn't tell channeld the short_channel_id in the case where we're
reconnecting and it's feeding us an array of failed htlcs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to just manually set ROUTING_FLAGS_DISABLED, but that means we
then suppressed the real channel_update because we thought it was a
duplicate!
So use a local flag: set it for the channel when the peer disconnects,
and clear it when channeld sends a local update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Just log the failed ones, not every connection and successful commands.
Before (VALGRIND=0 -n10):
111 passed, 1 skipped in 175.78 seconds
After:
111 passed, 1 skipped in 173.92 seconds
111 passed, 1 skipped in 164.16 seconds
111 passed, 1 skipped in 171.30 seconds
111 passed, 1 skipped in 180.05 seconds
111 passed, 1 skipped in 180.04 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>