msg_queue was originally designed for inter-daemon comms, and so it has
a special mechanism to mark that we're trying to send an fd. Unfortunately,
a peer could also send such a message, confusing us!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
dev_blackhole_fd was a hack, and doesn't work well now we are async
(it worked for sync comms in per-peer daemons, but now we could sneak
through a read before we get to the next write).
So, make explicit flags and use them. This is much easier now we
have all peer comms in one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's weird to have connectd ask gossipd, when lightningd can just do it
and hand all the addresses together.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now let gossipd do it.
This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.
The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only effects one place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
channeld can't do it any more: it's using local sockets. Connectd
can do it, and simply does it by type.
Amazingly, on my machine the timing change *always* caused
test_channel_receivable() to fail, due to a latent race.
Includes feedback from @cdecker.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As connectd handles more packets itself, or diverts them to/from gossipd,
it's the only place we can implement the dev_disconnect logic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We temporarily hack to sync_crypto_write/sync_crypto_read functions to
not do any crypto, and do it all in connectd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of passing the incoming socket to lightningd for the
subdaemon, create a new one and simply shuffle data between them,
keeping connectd in the loop.
For the moment, we don't decrypt at all, just shuffle. This means our
buffer code is kind of a hack, but that goes away once we start
actually decrypting and understanding message boundaries.
This implementation is naive: it closes the socket to the local daemon
as soon as the peer closes the socket to us. This is fixed in a
successive patch series (along with many other similar issues).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd will be keeping the conn open, so it needs to free this
"conn_timeout" timer. Pass it through, so we can do that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Adds the missing DNS error massages so they can be handled by
connect_control.
2. Prepends a 'All addresses failed' to code 401 message, so we
always have at least some error message to the user.
Changelog-None
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).
config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would fail connectd when listening on the IPv6 version failed; instead we should
allow that.
Changelog-Experimental: experimental-websocket-port fixed to work with default addresses.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
blob[] is really a string from the commandline; leave it as a char.
And parsing is much simpler than this code makes it seem!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
October was the date Torv2 is no longer supported by the Tor Project;
it will probably not work at all by next release, so we should remove
it now even though it's not quite the 6 months we prefer for
deprecation cycles.
I still see 110 nodes advertizing Torv2 (vs 10,292 Torv3); we still
parse and display it, we just don't advertize or connect to it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the port is set, we spawn it (lightning_websocketd) on any
connection to that port. That means websocketd is a per-peer daemon,
but it means every other daemon uses the connection normally (it's
just actually talking to websocketd instead of the client directly).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before:
Ten builds, laptop -j5, no ccache:
```
real 0m36.686000-38.956000(38.608+/-0.65)s
user 2m32.864000-42.253000(40.7545+/-2.7)s
sys 0m16.618000-18.316000(17.8531+/-0.48)s
```
Ten builds, laptop -j5, ccache (warm):
```
real 0m8.212000-8.577000(8.39989+/-0.13)s
user 0m12.731000-13.212000(12.9751+/-0.17)s
sys 0m3.697000-3.902000(3.83722+/-0.064)s
```
After:
Ten builds, laptop -j5, no ccache: 8% faster
```
real 0m33.802000-35.773000(35.468+/-0.54)s
user 2m19.073000-27.754000(26.2542+/-2.3)s
sys 0m15.784000-17.173000(16.7165+/-0.37)s
```
Ten builds, laptop -j5, ccache (warm): 1% faster
```
real 0m8.200000-8.485000(8.30138+/-0.097)s
user 0m12.485000-13.100000(12.7344+/-0.19)s
sys 0m3.702000-3.889000(3.78787+/-0.056)s
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This renames all occurences of use_proxy_always to always_use_proxy
to keep it inline with config values. This was a bit confusing.
Only significant change is that the payload in the plugins init
requests also contained the old name. No plugin currently seems to make
use of this variable yet. The old name 'use_proxy_always' is added when
deprecated APIs is enabled.
Changelog-Deprecated: Plugins: Renames plugin init 'use_proxy_always' to 'always_use_proxy'
This does two things:
- It moves non-tor addresses upfront so it prefers normal connection
which are less laggy and more reliable.
- It prevents connectd from trying the same wire_addr twice when the
addr_hint was given and gossip also added the same address.
Changelog-Changed: connectd: try non-TOR connections first
Changelog-Fixed: connectd: do not try address hint twice
In fact, we make it compulsory, which means if you don't understand it
you'll hang up on us!
Add some logging for that in future.
Changelog-Changed: Protocol: All new invoices require a payment_secret (i.e. modern TLV format onion)
Changelog-Changed: Protocol: We can no longer connect to peers which don't support `payment_secret`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did this by copying the updated bech32 code, and then re-patching in
our minor changes:
1. Headers modded (we need size_t)
2. Explicit length for bech32_encode/decode (not 90).
3. Exposing and bech32_ prefix for convert_bits, charset, charset_rev.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we abuse openingd and dualopend to do this, but connectd already
has the ability to talk to peers, so it's more efficient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matters: if we connected, the address is probably usable for future connections.
But if they connected, the port is probably not (but the IP address may be).
Changelog-Added: JSON-RPC: `connect` returns "direction" ("in": they iniatated, or "out": we initiated)
Changelog-Added: plugins: `peer_connected` hook and `connect` notifications have "direction" field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No more sending "all-channel" errors; in particular, gossipd now only
sends warnings (which make us hang up), not errors, and peer_connected
rejections are warnings (and disconnect), not errors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `peer_connected` rejections now send a warning, not an error, to the peer.
Took me a while (stressing under valgrind) to reproduce this,
then longer to figure out how it happened.
Turns out io_new_conn() can fail if the init function fails.
In our case, this can happen if connect() immediately returns
an error (inside io_connect). But we've already set the finish
function, which (if this was the last address), will free connect,
making the assignment `connect->conn = ...` write to a freed address.
Either way, if it fails, try_connect_one_addr() has taken care to
update connect->conn, or free connect, and the caller should not do it.
Here's the valgrind trace:
```
==384981== Invalid write of size 8
==384981== at 0x11127C: try_connect_one_addr (connectd.c:880)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1400B3: do_plan (io.c:401)
==384981== by 0x140134: io_ready (io.c:423)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
==384981== Address 0x4d67020 is 64 bytes inside a block of size 160 free'd
==384981== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x14A020: del_tree (tal.c:421)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x1110C5: try_connect_one_addr (connectd.c:806)
==384981== by 0x112BA1: destroy_io_conn (connectd.c:708)
==384981== by 0x141459: destroy_conn (poll.c:244)
==384981== by 0x14147F: destroy_conn_close_fd (poll.c:250)
==384981== by 0x149EB9: notify (tal.c:240)
==384981== by 0x149F8B: del_tree (tal.c:402)
==384981== by 0x14A51A: tal_free (tal.c:486)
==384981== by 0x140036: io_close (io.c:450)
==384981== by 0x1405DC: io_connect_ (io.c:345)
==384981== Block was alloc'd at
==384981== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==384981== by 0x149CF1: allocate (tal.c:250)
==384981== by 0x14A3C6: tal_alloc_ (tal.c:428)
==384981== by 0x1114F2: try_connect_peer (connectd.c:1526)
==384981== by 0x111717: connect_to_peer (connectd.c:1558)
==384981== by 0x1124F5: recv_req (connectd.c:1627)
==384981== by 0x1188B2: handle_read (daemon_conn.c:31)
==384981== by 0x13FBCB: next_plan (io.c:59)
==384981== by 0x140076: do_plan (io.c:407)
==384981== by 0x140113: io_ready (io.c:417)
==384981== by 0x141A57: io_loop (poll.c:445)
==384981== by 0x112CB0: main (connectd.c:1703)
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Occasional crash in connectd due to use-after-free
Fixes: #4343
If we're already attempting to connect to a peer, we would ignore
new connection requests. This is problematic if your node has bad
connection details for the node -- you can't update it while inflight.
This patch appends new connection suggestions to the list of connections
to try.
Fixes#4154
1. Rename memleak_enter_allocations to memleak_find_allocations.
2. Unify scanning for pointers into memleak_remove_region / memleak_remove_pointer.
3. Document the functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is simple, and we now can multifundchannel to every node on testnet
(one simply hangs once we connect).
Changelog-Fixed: Protocol: We now hang up if peer doesn't respond to init message after 60 seconds.
This avoids overwriting the ones in git, and generally makes things neater.
We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that other directories were explicitly depending on the generated
file, instead of relying on their (already existing) dependency on
$(LIGHTNINGD_HSM_CLIENT_OBJS), so we remove that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies our test matrix, as we never have to handle talking
to peers that specify one but not the other.
This is particularly important for option_anchor_outputs which
assumes option_static_remotekey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@thestick613 noticed that since tor version below 0.3.2.2-alpha
will not support V3 ed25519 address formats, the error handling
is not that helpful in the error message from cli.
So now we add an hint.
Changelog-None:
Signed-off-by: Saibato <saibato.naga@pm.me>
connectd/connectd.h; Add helper function to update conn error list
Signed-off-by: Saibato <saibato.naga@pm.me>
common/onion is going to need to use this for the case where it finds a blinding
seed inside the TLV. But how it does ecdh is daemon-specific.
We already had this problem for devtools/gossipwith, which supplied a
special hsm_do_ecdh(). This just makes it more general.
So we create a generic ecdh() interface, with a specific implementation
which subdaemons and lightningd can use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's almost always "their_features" and "our_features" respectively, so
make those names clear.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turns out that unnecessary: all callers can access the feature_set,
so make it much more like a normal primitive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will help with the next patch, where we wean off using a global
for features: connectd.c has access to the feature bits.
Since connectd might now want to send a message, it needs the crypto_state
non-const, which makes this less trivial than it would otherwise be.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This restriction was removed from the spec as of
86c2ebcc5973a4133d3ce4d80ae1c203061a1646.
We also fix up some strange formatting in that part of the documentation.
Changelog-changed: We now announce multiple addresses of the same type, if given.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common thing to do, so create a macro.
Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The `init_featurebits` are computed at startup, and then cached
indefinitely. They are then used whenever a new `init` handshake is performed.
We could add a new message to push updates to `connectd` whenever a plugin is
added or removed, but that's up for discussion.
Before this patch we used `int` for error codes. The problem with
`int` is that we try to pass it to/from wire and the size of `int` is
not defined by the standard. So a sender with 4-byte `int` would write
4 bytes to the wire and a receiver with 2-byte `int` (for example) would
read just 2 bytes from the wire.
To resolve this:
* Introduce an error code type with a known size:
`typedef s32 errcode_t`.
* Change all error code macros to constants of type `errcode_t`.
Constants also play better with gdb - it would visualize the name of
the constant instead of the numeric value.
* Change all functions that take error codes to take the new type
`errcode_t` instead of `int`.
* Introduce towire / fromwire functions to send / receive the newly added
type `errcode_t` and use it instead of `towire_int()`.
In addition:
* Remove the now unneeded `towire_int()`.
* Replace a hardcoded error code `-2` with a new constant
`INVOICE_EXPIRED_DURING_WAIT` (903).
Changelog-Changed: The waitinvoice command would now return error code 903 to designate that the invoice expired during wait, instead of the previous -2
Add towire_int() and fromwire_int() functions to "(de)serialize"
"int". This will only work as long as both the caller of towire_int()
and the caller of fromwire_int() use the same in-memory representation
of signed integers and have the same sizeof(int).
Changelog-None
We want to have a static Tor service created from a blob bound to
our node on cmdline
Changelog-added: persistent Tor address support
Changelog-added: allow the Tor inbound service port differ from 9735
Signed-off-by: Saibato <saibato.naga@pm.me>
Add base64 encode/decode to common
We need this to encode the blob for the tor service
Signed-off-by: Saibato <saibato.naga@pm.me>
Even if it is on startup only once ...
Like @bitcoin-software indicated the expected UX should be in
line with what a user expects the software will do
so we should not dns if we say so with a flag that suggest that.
Changelog-Fixed: We disable all dns even on startup the scan for bogus dns servers, if --always-use-proxy is set true
Signed-off-by: Saibato <saibato.naga@pm.me>
I was wondering why TAGS was missing some functions, and finally
tracked it down: PRINTF_FMT() confuses etags if it's at the start
of a function, and it ignores the rest of the file.
So we put PRINTF_FMT at the end, but that doesn't work for
*definitions*, only *declarations*. So we remove it from definitions
and add gratuitous declarations in the few static places.1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were using `i` as index variable in two nested loops. This works as long as
the DNS seed resolves to a single address, but will crash if the node has both
an A as well as an AAAA entry, at which point we'll try to index the hostname
without a matching entry.
Signed-off-by: Christian Decker <@cdecker>
This is mainly an internal-only change, especially since we don't
offer any globalfeatures.
However, LND (as of next release) will offer global features, and also
expect option_static_remotekey to be a *global* feature. So we send
our (merged) feature bitset as both global and local in init, and fold
those bitsets together when we get an init msg.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With enable-autotor-v2 defined in cmdline the default behavior to create
v3 onions with the tor service call, is set to v2 onions.
Signed-off-by: Saibato <saibato.naga@pm.me>
This encoding scheme is no longer just used for short_channel_ids, so make
the names more generic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's generally clearer to have simple hardcoded numbers with an
#if DEVELOPER around it, than apparent variables which aren't, really.
Interestingly, our pruning test was always kinda broken: we have to pass
two cycles, since l2 will refresh the channel once to avoid pruning.
Do the more obvious thing, and cut the network in half and check that
l1 and l3 time out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than reaching into data structures, let them register their own
callbacks. This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.
Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to generate this in the caller, then save it in case we needed
to retry. We're about to change the message we send to lightningd, so
we'll need to regenerate it every time; just hand all the extra args
into peer_connected() and we can generate the `connect_peer_connected`
msg there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1diff --git a/connectd/connectd.c b/connectd/connectd.c
index 94fe50b56..459c9ac63 100644
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pubkeys are not not actually DER encoding, but Pieter Wuille corrected
me: it's SEC 1 documented encoding.
Results from 5 runs, min-max(mean +/- stddev):
store_load_msec,vsz_kb,store_rewrite_sec,listnodes_sec,listchannels_sec,routing_sec,peer_write_all_sec
38922-39297(39180.6+/-1.3e+02),2880728,41.040000-41.160000(41.106+/-0.05),2.270000-2.530000(2.338+/-0.097),44.570000-53.980000(49.696+/-3),32.840000-33.080000(32.95+/-0.095),43.060000-44.950000(43.696+/-0.72)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They don't clean up after themselves, so best we do it here (by this
point we've already done the pid check to make sure we're the only
lightningd here anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's supposed to be `--bind-addr=/socket` since you can't advertize a
local address, but the parser accepts `--addr=` too, and the intent is
clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightning_connectd(19780): STATUS_FAIL_INTERNAL_ERROR: Failed to bind on 2 socket: Address family not supported by protocol
"Untested code is buggy code"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian and I both unwittingly used it in form:
*tal_arr_expand(&x) = tal(x, ...)
Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().
The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's more natural than using a zero-secret when something goes wrong.
Also note that the HSM will actually kill the connection if the ECDH
fails, which is fortunately statistically unlikely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently hand the feature set from lightningd, but that's confusing
if they were ever different.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need several notleak() annotations here:
1. The temporary structure which is handed to retry_peer_connected().
It's waiting for the master to respond to our connect_reconnected
message.
2. We don't keep a pointer to the io_conn for a peer, so we need to
mark those as not being a leak.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It means an extra allocation at startup, but it means we can hide the definition,
and use standard patterns (new_daemon_conn and typesafe callbacks).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When the wrong key is used, the remote end simply hangs up.
We used to get a random errno, which tends to be "Operation now in progress."
Now it's defined to be 0, detect and provide a better error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I split the peer_connected() function into the peer_reconnected(),
which is basically an entire separate path from the rest of
peer_connected().
Also, removed unused TAKEN annotation from `id` parameter. Nobody actually
hands us take() there, and just as well, since we don't take it!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd is the only user of the cryptomsg async APIs; better to
open-code it here. We need to expose a little from cryptomsg(),
but we remove the 'struct peer' entirely from connectd.
One trick is that we still need to defer telling lightningd when a
peer reconnects (until it tells us the old one is disconnected). So
now we generate the message for lightningd and send it once we're woken.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only call it once, so don't free the "old" one. And fix some indenting.
And make hostname const.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That matches the other CSV names (HSM was the first, so it was written
before the pattern emerged).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@renepickhardt: why is it actually lightningd.c with a d but hsm.c without d ?
And delete unused gossipd/gossip.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>