When compiled without DEVELOPER this will now filter out `remote_addr` that
come from localhost. The testcase checks for DEVELOPER to test for correct
function of `remote_addr`.
Also, I renamed "test_connect" to "test_connect_basic" so it can be started
without all the other tests in that file that start with "test_connect..."
For example, if you do:
```
./lightningd/lightningd --network=regtest --experimental-websocket-port=19846
```
Then you're trying to reuse the normal port as the websocket port, but this
only fails at *listen* time, when we activate connectd. Catch this too.
Fixes incorrect fatal() message, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By accessing `addr` after the loop, it's possible that it's one which
failed, in complex scenarios.
Also gives us a chance to warn if they specify a websocket but don't
actually end up advertizing it (you *must* advertize a normal addr as
well).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always added to both arrays, might as well just keep one.
We make mayfail an explicit flag, rather than relying on the presence
of errstr, which is never NULL now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us give a better error message if listen fails, and also
moved the callback closer to where it's needed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd now simply gets told by channeld when peers arrive or leave.
(it only needs to know for the seeker).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is neater than what we had before, and slightly more general.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
We don't need to log msgs from subds, but we do our own, and we weren't.
1. Rename queue_peer_msg to inject_peer_msg for clarity, make it do logging
2. In the one place where we're relaying, call msg_queue() directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to stream gossip through this, but currently connectd treats the
fd as synchronous. While we work on getting rid of that, it's easiest to
have two fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to shut down peers atomically, but now we flush the
connections there's a delay. If we are asked to connect in that time,
we ignore it, as we are already connected, but that's wrong: we need
to remember that we were told to connect and reconnect.
This should solve a few weird test failures where "connect" would hang
indefinitely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is critical in the common case where peer sends an error and
hangs up: we almost never get to relay the error to the subd in time.
This also applies in the other direction: we need to flush the queue
to the peer when the subd closes. Note we only free the actual peer
struct when lightningd reaps us with connectd_peer_disconnected().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would lose packets sometimes due to this previously, but it
doesn't happen over localhost so our tests didn't notice. However,
now we have connectd being sole thing talking to peers, we can do
a more elegant shutdown, which should fix closing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Always flush sockets to increase chance that final message get to peer (esp. error packets).
msg_queue was originally designed for inter-daemon comms, and so it has
a special mechanism to mark that we're trying to send an fd. Unfortunately,
a peer could also send such a message, confusing us!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
dev_blackhole_fd was a hack, and doesn't work well now we are async
(it worked for sync comms in per-peer daemons, but now we could sneak
through a read before we get to the next write).
So, make explicit flags and use them. This is much easier now we
have all peer comms in one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's weird to have connectd ask gossipd, when lightningd can just do it
and hand all the addresses together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now let gossipd do it.
This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only effects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
channeld can't do it any more: it's using local sockets. Connectd
can do it, and simply does it by type.
Amazingly, on my machine the timing change *always* caused
test_channel_receivable() to fail, due to a latent race.
Includes feedback from @cdecker.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As connectd handles more packets itself, or diverts them to/from gossipd,
it's the only place we can implement the dev_disconnect logic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We temporarily hack to sync_crypto_write/sync_crypto_read functions to
not do any crypto, and do it all in connectd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of passing the incoming socket to lightningd for the
subdaemon, create a new one and simply shuffle data between them,
keeping connectd in the loop.
For the moment, we don't decrypt at all, just shuffle. This means our
buffer code is kind of a hack, but that goes away once we start
actually decrypting and understanding message boundaries.
This implementation is naive: it closes the socket to the local daemon
as soon as the peer closes the socket to us. This is fixed in a
successive patch series (along with many other similar issues).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd will be keeping the conn open, so it needs to free this
"conn_timeout" timer. Pass it through, so we can do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once connectd is doing this, we can't close as soon as we send,
and in fact we can't do 'fail write' either.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Adds the missing DNS error massages so they can be handled by
connect_control.
2. Prepends a 'All addresses failed' to code 401 message, so we
always have at least some error message to the user.
Changelog-None
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would fail connectd when listening on the IPv6 version failed; instead we should
allow that.
Changelog-Experimental: experimental-websocket-port fixed to work with default addresses.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
valgrind locally complains about the allocations in autodata leaking:
```
==138200== 16 bytes in 1 blocks are still reachable in loss record 1 of 2
==138200== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==138200== by 0x10D41A: autodata_register_ (autodata.c:20)
==138200== by 0x10E7B8: register_autotype_type_to_string (type_to_string.h:79)
==138200== by 0x10F5CA: register_one_type_to_string0 (block.c:259)
==138200== by 0x19734C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/common/test/run-route-specific)
==138200== by 0x4A3D03F: (below main) (libc-start.c:264)
==138200==
==138200== 176 bytes in 1 blocks are still reachable in loss record 2 of 2
==138200== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==138200== by 0x10D472: autodata_register_ (autodata.c:26)
==138200== by 0x122D37: register_autotype_type_to_string (type_to_string.h:79)
==138200== by 0x122F1F: register_one_type_to_string0 (node_id.c:50)
==138200== by 0x19734C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/common/test/run-route-specific)
==138200== by 0x4A3D03F: (below main) (libc-start.c:264)
==138200==
make: *** [Makefile:638: unittest/common/test/run-route-specific] Error 7
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
blob[] is really a string from the commandline; leave it as a char.
And parsing is much simpler than this code makes it seem!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
October was the date Torv2 is no longer supported by the Tor Project;
it will probably not work at all by next release, so we should remove
it now even though it's not quite the 6 months we prefer for
deprecation cycles.
I still see 110 nodes advertizing Torv2 (vs 10,292 Torv3); we still
parse and display it, we just don't advertize or connect to it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
WebSocket is a bit weird:
1. It starts like an HTTP connection, but they send special headers.
2. We reply with special headers, one of which involves SHA1 of one of theirs.
3. We are then in WebSocket mode, where each frame starts with a 2-20 byte
header.
We relay data in a simplistic way: if either side sends something, we
read it and relay it synchronously. That avoids any gratuitous
buffering.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the port is set, we spawn it (lightning_websocketd) on any
connection to that port. That means websocketd is a per-peer daemon,
but it means every other daemon uses the connection normally (it's
just actually talking to websocketd instead of the client directly).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By popular merge-hell demand.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Build: Python is now required to build, as generated files are no longer checked into the repository.
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is best-practice (to ensure prototypes match up), but there were a
few places we didn't (at least, directly). Make it a requirement,
either of form "foo.h" or <dir/foo.h>.
The noise is the change to our print templates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make it a first-class citizen internally, even though we won't use
it over the wire (at least, non-experimental builds). This scheme
follows the latest draft, in which features are flagged compulsory.
We also add several helper functions.
Since uses the *even* bits (as per latest spec), not the *odd* bits,
we have some other fixups.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to use this to handle the simple description for channel_type.
It also needs to handle variable-size types (just like subtypes).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This renames all occurences of use_proxy_always to always_use_proxy
to keep it inline with config values. This was a bit confusing.
Only significant change is that the payload in the plugins init
requests also contained the old name. No plugin currently seems to make
use of this variable yet. The old name 'use_proxy_always' is added when
deprecated APIs is enabled.
Changelog-Deprecated: Plugins: Renames plugin init 'use_proxy_always' to 'always_use_proxy'
This does two things:
- It moves non-tor addresses upfront so it prefers normal connection
which are less laggy and more reliable.
- It prevents connectd from trying the same wire_addr twice when the
addr_hint was given and gossip also added the same address.
Changelog-Changed: connectd: try non-TOR connections first
Changelog-Fixed: connectd: do not try address hint twice
In fact, we make it compulsory, which means if you don't understand it
you'll hang up on us!
Add some logging for that in future.
Changelog-Changed: Protocol: All new invoices require a payment_secret (i.e. modern TLV format onion)
Changelog-Changed: Protocol: We can no longer connect to peers which don't support `payment_secret`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to ensure a packet is read by the other end, but we
don't read anything else from them or write anything to them.
Using '+' is similar, but because it closes the connection, the peer
might notice before receiving the packet (such as if it does a write).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were accidentally using the port that the tor service was
connecting to, not the /torport the user said to use.
Fixes: #4597
Reported-by: @openoms
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Config: `addr` autotor and statictor /torport arguments now advertized correctly.
I did this by copying the updated bech32 code, and then re-patching in
our minor changes:
1. Headers modded (we need size_t)
2. Explicit length for bech32_encode/decode (not 90).
3. Exposing and bech32_ prefix for convert_bits, charset, charset_rev.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we abuse openingd and dualopend to do this, but connectd already
has the ability to talk to peers, so it's more efficient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matters: if we connected, the address is probably usable for future connections.
But if they connected, the port is probably not (but the IP address may be).
Changelog-Added: JSON-RPC: `connect` returns "direction" ("in": they iniatated, or "out": we initiated)
Changelog-Added: plugins: `peer_connected` hook and `connect` notifications have "direction" field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No more sending "all-channel" errors; in particular, gossipd now only
sends warnings (which make us hang up), not errors, and peer_connected
rejections are warnings (and disconnect), not errors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `peer_connected` rejections now send a warning, not an error, to the peer.
We should actually be including this (as it may define _GNU_SOURCE
etc) before any system headers. But where we include <assert.h> we
often didn't, because check-includes would complain that the headers
included it too.
Weaken that check, and include config.h in C files before assert.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Took me a while (stressing under valgrind) to reproduce this,
then longer to figure out how it happened.
Turns out io_new_conn() can fail if the init function fails.
In our case, this can happen if connect() immediately returns
an error (inside io_connect). But we've already set the finish
function, which (if this was the last address), will free connect,
making the assignment `connect->conn = ...` write to a freed address.
Either way, if it fails, try_connect_one_addr() has taken care to
update connect->conn, or free connect, and the caller should not do it.
Here's the valgrind trace:
```
==384981== Invalid write of size 8
==384981== at 0x11127C: try_connect_one_addr (connectd.c:880)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1400B3: do_plan (io.c:401)
==384981== by 0x140134: io_ready (io.c:423)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
==384981== Address 0x4d67020 is 64 bytes inside a block of size 160 free'd
==384981== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x14A020: del_tree (tal.c:421)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x1110C5: try_connect_one_addr (connectd.c:806)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1405DC: io_connect_ (io.c:345)
==384981== Block was alloc'd at
==384981== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x149CF1: allocate (tal.c:250)
==384981== by 0x14A3C6: tal_alloc_ (tal.c:428)
==384981== by 0x1114F2: try_connect_peer (connectd.c:1526)
==384981== by 0x111717: connect_to_peer (connectd.c:1558)
==384981== by 0x1124F5: recv_req (connectd.c:1627)
==384981== by 0x1188B2: handle_read (daemon_conn.c:31)
==384981== by 0x13FBCB: next_plan (io.c:59)
==384981== by 0x140076: do_plan (io.c:407)
==384981== by 0x140113: io_ready (io.c:417)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Occasional crash in connectd due to use-after-free
Fixes: #4343