We weakened this progressively over time, and gossip v1.5 makes spam
impossible by protocol, so we can wait until then.
Removing this code simplifies things a great deal!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Protocol: we no longer ratelimit gossip messages by channel, making our code far simpler.
Make sure the plugin has got its message to connectd before we send!
```
    def test_even_sendcustommsg(node_factory):
        l1, l2 = node_factory.get_nodes(2, opts={'log-level': 'io',
                                                 'allow_warning': True})
        l1.connect(l2)
        # Even-numbered message
        msg = hex(43690)[2:] + ('ff' * 30) + 'bb'
        # l2 will hang up when it gets this.
        l1.rpc.sendcustommsg(l2.info['id'], msg)
        l2.daemon.wait_for_log(r'\[IN\] {}'.format(msg))
        l1.daemon.wait_for_log('Invalid unknown even msg')
        wait_for(lambda: l1.rpc.listpeers(l2.info['id'])['peers'] == [])
        # Now with a plugin which allows it
        l1.connect(l2)
        l2.rpc.plugin_start(os.path.join(os.getcwd(), "tests/plugins/allow_even_msgs.py"))
        l1.rpc.sendcustommsg(l2.info['id'], msg)
        l2.daemon.wait_for_log(r'\[IN\] {}'.format(msg))
>       l2.daemon.wait_for_log(r'allow_even_msgs.*Got message 43690')

tests/test_misc.py:3623:
...
>       raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E       TimeoutError: Unable to find "[re.compile('allow_even_msgs.*Got message 43690')]" in logs.

contrib/pyln-testing/pyln/testing/utils.py:327: TimeoutError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we get a WIRE_TX_ABORT then another message, we send the other message to the same
subd (even though the tx abort causes it to shutdown). This means we effectively
lose the next message, and timeout (see below from CI, reproduced locally).
So, have connectd ignore the subd after it forwards the WIRE_TX_ABORT. The next
message will, correctly, cause a fresh subdaemon to be spawned.
```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    @pytest.mark.openchannel('v2')
    def test_v2_rbf_multi(node_factory, bitcoind, chainparams):
        l1, l2 = node_factory.get_nodes(2,
                                        opts={'may_reconnect': True,
                                              'dev-no-reconnect': None,
                                              'allow_warning': True})
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        amount = 2**24
        chan_amount = 100000
        bitcoind.rpc.sendtoaddress(l1.rpc.newaddr()['bech32'], amount / 10**8 + 0.01)
        bitcoind.generate_block(1)
        # Wait for it to arrive.
        wait_for(lambda: len(l1.rpc.listfunds()['outputs']) > 0)
        res = l1.rpc.fundchannel(l2.info['id'], chan_amount)
        chan_id = res['channel_id']
        vins = bitcoind.rpc.decoderawtransaction(res['tx'])['vin']
        assert(only_one(vins))
        prev_utxos = ["{}:{}".format(vins[0]['txid'], vins[0]['vout'])]
        # Check that we're waiting for lockin
        l1.daemon.wait_for_log(' to DUALOPEND_AWAITING_LOCKIN')
        # Attempt to do abort, should fail since we've
        # already gotten an inflight
        with pytest.raises(RpcError):
            l1.rpc.openchannel_abort(chan_id)
        rate = int(find_next_feerate(l1, l2)[:-5])
        # We 4x the feerate to beat the min-relay fee
        next_feerate = '{}perkw'.format(rate * 4)
        # Initiate an RBF
        startweight = 42 + 172  # base weight, funding output
        initpsbt = l1.rpc.utxopsbt(chan_amount, next_feerate, startweight,
                                   prev_utxos, reservedok=True,
                                   min_witness_weight=110,
                                   excess_as_change=True)
        # Do the bump
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount,
                                       initpsbt['psbt'],
                                       funding_feerate=next_feerate)
        # Abort this open attempt! We will re-try
        aborted = l1.rpc.openchannel_abort(chan_id)
        assert not aborted['channel_canceled']
        # We no longer disconnect on aborts, because magic!
        assert only_one(l1.rpc.listpeers()['peers'])['connected']
        # Do the bump, again, same feerate
>       bump = l1.rpc.openchannel_bump(chan_id, chan_amount,
                                       initpsbt['psbt'],
                                       funding_feerate=next_feerate)

tests/test_opening.py:668:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
contrib/pyln-client/pyln/client/lightning.py:1206: in openchannel_bump
    return self.call("openchannel_bump", payload)
contrib/pyln-testing/pyln/testing/utils.py:718: in call
    res = LightningRpc.call(self, method, payload, cmdprefix, filter)
contrib/pyln-client/pyln/client/lightning.py:398: in call
    resp, buf = self._readobj(sock, buf)
contrib/pyln-client/pyln/client/lightning.py:315: in _readobj
    b = sock.recv(max(1024, len(buff)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyln.client.lightning.UnixSocket object at 0x7f34675aae80>
length = 1024

    def recv(self, length: int) -> bytes:
        if self.sock is None:
            raise socket.error("not connected")
>       return self.sock.recv(length)
E       Failed: Timeout >1200.0s
```
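To make the routing rule concrete, here is a toy Python model of it, not connectd's actual C code; `PeerConn`, `Subd` and the plain-string message names are hypothetical stand-ins. Once a `tx_abort` has been forwarded, the connection forgets that subdaemon, so the next message spawns a fresh one instead of being lost to a subd that is shutting down.

```python
class Subd:
    """Toy stand-in for a per-channel subdaemon: it only records messages."""

    def __init__(self):
        self.received = []


class PeerConn:
    """Hypothetical model of connectd routing one peer's channel messages."""

    def __init__(self):
        self.subd = None   # subdaemon currently receiving messages, if any
        self.spawned = []  # every subdaemon started so far, in order

    def forward(self, msg):
        if self.subd is None:
            # No live subdaemon: start a fresh one for this message.
            self.subd = Subd()
            self.spawned.append(self.subd)
        self.subd.received.append(msg)
        if msg == "tx_abort":
            # The subd shuts down after tx_abort, so stop routing to it;
            # whatever arrives next goes to a newly spawned subd.
            self.subd = None


conn = PeerConn()
for msg in ["tx_init_rbf", "tx_abort", "tx_init_rbf"]:
    conn.forward(msg)
print(len(conn.spawned))  # 2
```

The design choice is to drop the reference at the moment of forwarding rather than waiting for the subd to exit, which closes the window in which a following message could still be routed to it.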
Previously, we would forward the message to a subd, but now we have
the case where the subd is gone, but we're still connected. If the
peer sends anything but a reestablish in that state, we drop the connection.
Instead, an error should always make us fail the channel.
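A minimal sketch of the resulting decision for a channel whose subdaemon has exited while the peer stays connected, written as a hypothetical Python function (the real logic is C in connectd, and the message names are plain strings chosen for illustration). That other messages still cause a disconnect is an assumption drawn from the paragraph above.

```python
def action_without_subd(msg_type):
    """Hypothetical: decide what to do with a channel message that arrives
    after the channel's subdaemon has exited but the peer is still connected."""
    if msg_type == "error":
        # An error from the peer always fails the channel.
        return "fail_channel"
    if msg_type == "channel_reestablish":
        # A reestablish restarts the channel with a fresh subdaemon.
        return "spawn_subd"
    # Assumed: anything else still makes us drop the connection.
    return "disconnect"


for m in ("error", "channel_reestablish", "update_add_htlc"):
    print(m, "->", action_without_subd(m))
```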
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On macOS, most tests report BROKEN because libsodium creates an untracked fd pointing to /dev/random. dev_report_fds finds it at teardown and reports a BROKEN message.
We now allow a single “char special” fd without reporting it as broken, improving quality of life for Mac developers.
While we’re here, we also add the fd mode to the log to help with future rogue fd issues.
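The tolerance rule can be modelled in a few lines of Python. This is a hypothetical sketch, not CLN's C implementation: `report_fds` takes an explicit list of fds to inspect, lets the first character-special one through, and reports every other open fd together with its mode in octal.

```python
import os
import stat


def report_fds(fds):
    """Return (fd, mode) pairs that a teardown check would flag as BROKEN.

    Hypothetical model: the first character-special fd (e.g. libsodium's
    /dev/random handle on macOS) is tolerated; every other open fd in
    `fds` is reported together with its st_mode in octal.
    """
    broken = []
    allowed_chr = False
    for fd in fds:
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # fd is not open, so there is nothing to report
        if stat.S_ISCHR(mode) and not allowed_chr:
            allowed_chr = True  # allow exactly one "char special" fd
            continue
        broken.append((fd, oct(mode)))
    return broken
```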
Changelog-None
This makes it easier to use outside simple subds, and now lightningd can
simply dump to log rather than returning JSON.
JSON formatting was a lot of work, and we only did it for lightningd, not for
subdaemons. It's easier to use the logs in all cases.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still refuse to run dev commands if lightningd sends it to us
despite us not being in developer mode, but that's mainly paranoia.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This also requires us to expose memleak when !DEVELOPER; however, we only
ever used the memleak tracking when the LIGHTNINGD_DEV_MEMLEAK
environment variable was set, so keep that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Update the lightningd <-> channeld interface with many new commands needed to facilitate splicing.
Implement the channeld splicing protocol leveraging the interactivetx protocol.
Implement lightningd’s channel_control to support channeld in its splicing efforts.
Changelog-Added: Added the features to enable splicing & resizing of active channels.
Fixes: #6368
Changelog-Fixed: Protocol: we no longer gossip about recently-closed channels (Eclair gets upset with this).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise we access the freed connection to gossipd, which is confusing to track
down when the *actual* issue is that gossipd died!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I never really liked this hack: websockets are useful, advertising
them not so much.
Note that we never actually documented that we would advertise these!
Changelog-EXPERIMENTAL: Protocol: Removed support for advertising websocket addresses in gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Make it the standard "return the error" pattern.
2. Rather than flags to indicate what types are allowed, have the callers
check the return explicitly.
3. Document the APIs.
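A sketch of the "return the error" pattern in Python, with every name hypothetical: `parse_addr` returns `(result, None)` on success or `(None, message)` on failure instead of raising, and accepts every address type; a caller such as `parse_listen_addr` then rejects the types it cannot use, rather than passing permission flags down. The `.onion` check is deliberately naive.

```python
import ipaddress


def parse_addr(text):
    """Parse "host:port" into (kind, host, port).

    Returns (result, None) on success or (None, error) on failure; it never
    raises and never filters by type, leaving that to the caller.
    """
    host, sep, port_str = text.rpartition(":")
    if not sep or not host:
        return None, "missing port"
    try:
        port = int(port_str)
    except ValueError:
        return None, f"invalid port {port_str!r}"
    if not 0 < port < 65536:
        return None, f"port {port} out of range"
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # bracketed IPv6 literal, e.g. [::1]:9735
    try:
        kind = "ipv%d" % ipaddress.ip_address(host).version
    except ValueError:
        # Naive: treat .onion names as Tor, anything else as a DNS name.
        kind = "torv3" if host.endswith(".onion") else "dns"
    return (kind, host, port), None


def parse_listen_addr(text):
    """Hypothetical caller: it checks the returned type explicitly."""
    addr, err = parse_addr(text)
    if err is not None:
        return None, err
    if addr[0] == "dns":
        return None, "cannot listen on a DNS name"
    return addr, None
```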
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This contained cut & paste code, and it wasn't clear to me that
the first loop included DNS entries with IPv6 entries.
Instead, allow the iterator to take multiple types, and use
a switch statement so that compilation breaks when new types are added.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
After the first iteration of the loop, we call memmem with a buflen that
extends past the end of buf.
In practice we probably never read the uninitialized memory since we
guarantee the buffer ends with "\r\n", and since most/all libc
implementations probably read the haystack sequentially. But maybe
there's some libc with a crazy optimization out there. It's good to use
an accurate buflen just in case.
Discovered this while running some unit tests with MSan.
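A Python model of the bug and the fix, assuming an HTTP-header-style buffer of CRLF-terminated lines. `memmem` here is a hypothetical helper that mimics C's `memmem(buf + start, length, ...)` but raises instead of reading out of bounds; the point is that each search length must shrink as the offset advances.

```python
def memmem(buf, start, length, needle):
    """Model of C's memmem(buf + start, length, needle, len(needle)).

    Returns the absolute index of `needle`, or None. Unlike C, an oversized
    `length` raises instead of silently reading past the end of `buf`.
    """
    if start + length > len(buf):
        raise IndexError("search window extends past the end of buf")
    idx = buf.find(needle, start, start + length)
    return None if idx < 0 else idx


def split_crlf(buf):
    """Split buf into CRLF-terminated lines, shrinking the search length
    as the offset advances (the accurate-buflen fix)."""
    lines = []
    off = 0
    while off < len(buf):
        # Buggy form: memmem(buf, off, len(buf), b"\r\n") overruns once off > 0.
        end = memmem(buf, off, len(buf) - off, b"\r\n")
        if end is None:
            break
        lines.append(buf[off:end])
        off = end + 2
    return lines
```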
The push bit was convenient for connectd to send our own gossip
to peers upon connecting by naively traversing the gossip_store
and sending anything flagged `push`. This function is now
performed by gossipd, leaving no use for the push bit.
Changelog-Changed: `gossipd`: gossip_store PUSH bit is no longer set.
This implements the proposal to simply use timestamp as "all", "none"
or "stream". There's also a rough spec draft which I will post soon.
This *also* removes the last place where we would sometimes sweep the
entire gossip_store looking for their given timestamps.
We could also get rid of the actual timestamp filtering logic in
gossip_store_next if we want to, as it's now basically unused.
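Purely as an illustration, the three-way classification could look like the hypothetical function below. The thresholds are assumptions, not taken from the spec draft mentioned above: a filter starting at 0 whose range reaches the present means "all", one starting in the future means "none", and anything else means streaming only recent gossip.

```python
import time


def classify_timestamp_filter(first_timestamp, timestamp_range, now=None):
    """Hypothetical mapping of a gossip_timestamp_filter to "all", "none"
    or "recent". The thresholds below are illustrative assumptions."""
    if now is None:
        now = int(time.time())
    if first_timestamp > now:
        # Starts in the future: the peer wants no gossip.
        return "none"
    if first_timestamp == 0 and timestamp_range >= now:
        # Covers everything up to the present: send all gossip.
        return "all"
    # Anything else: just stream new gossip as it arrives.
    return "recent"
```

Reducing the filter to three states is what removes the need to sweep the gossip_store for arbitrary timestamps.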
Changelog-Changed: Protocol: Simplify gossip_timestamp_filter handling to "all", "none" or "recent" instead of exact timestamp.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This removes the gossip_store sweep we did as soon as a peer connects. This should save
connectd a significant number of CPU cycles and make @whitslack finally
stop hitting me.
Changelog-Changed: `connectd` no longer sweeps gossip_store file when peer connects, saving CPU for large nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We accept that we will fail to listen if we bind both IPv6 and IPv4 to
the same port on a dual-stack machine (e.g. normal Linux), but we weren't
closing the fd when that happened.
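A minimal Python sketch of the fix, with a hypothetical helper name: if `bind()` fails, close the socket before reporting the error so the fd does not leak. The usage provokes the failure by binding a second IPv4 socket to a port that is already listening, which stands in for the dual-stack IPv6/IPv4 conflict described above.

```python
import socket


def bind_or_close(sock, addr):
    """Bind `sock` to `addr`. On failure, close it so the fd is not leaked
    and return the error message; on success return None."""
    try:
        sock.bind(addr)
    except OSError as e:
        sock.close()  # the fix: don't leave the failed socket's fd open
        return str(e)
    return None


first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
bind_or_close(first, ("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
err = bind_or_close(second, ("127.0.0.1", port))  # fails: port in use
print(err, second.fileno())  # fileno() is -1 once the socket is closed
first.close()
```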
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of this is piping the flag through so we know it's a websocket!
Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This changes connectd to use `status_fail()` on TOR problems during startup
instead of `err()`. Using `err()` did not write to the logfile.
To find TOR problems during startup, the user had to stop the system
daemon and run `lightningd` manually in a console to see the error.
`status_fail()` logs and exits, but also prints a whole stacktrace,
which is a bit much for config errors, IMHO. But currently there is
no `status_SOMETHING` method that logs, prints and exits on an error
without a stacktrace.
Changelog-None
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` as that context, and since we reallocate that array,
io can end up using a stale pointer.
```
0x3e1700 call_error
ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
ccan/ccan/io/poll.c:444
0x3419fa main
connectd/connectd.c:2081
```
Fixes: #6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The `tmpctx` is freed before the error is read out and sent over the wire;
there's a call that copies the array before sending it, so let's use
that instead and take() the object.
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2181501
==2181501== Syscall param write(buf) points to unaddressable byte(s)
==2181501== at 0x49E4077: write (write.c:26)
==2181501== by 0x1C79A3: do_write (io.c:189)
==2181501== by 0x1C80AB: do_plan (io.c:394)
==2181501== by 0x1C81BA: io_ready (io.c:423)
==2181501== by 0x1CA45B: io_loop (poll.c:453)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501== Address 0x4afb158 is 40 bytes inside a block of size 140 free'd
==2181501== at 0x483F0C3: free (vg_replace_malloc.c:872)
==2181501== by 0x1D103C: del_tree (tal.c:421)
==2181501== by 0x1D130A: tal_free (tal.c:486)
==2181501== by 0x1364B8: clean_tmpctx (utils.c:172)
==2181501== by 0x1266DD: daemon_poll (daemon.c:87)
==2181501== by 0x1CA334: io_loop (poll.c:420)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501== Block was alloc'd at
==2181501== at 0x483C855: malloc (vg_replace_malloc.c:381)
==2181501== by 0x1D0AC5: allocate (tal.c:250)
==2181501== by 0x1D1086: tal_alloc_ (tal.c:428)
==2181501== by 0x1D124F: tal_alloc_arr_ (tal.c:471)
==2181501== by 0x126204: cryptomsg_encrypt_msg (cryptomsg.c:161)
==2181501== by 0x11335F: peer_connected (connectd.c:318)
==2181501== by 0x118A8A: peer_init_received (peer_exchange_initmsg.c:135)
==2181501== by 0x1C751E: next_plan (io.c:59)
==2181501== by 0x1C8126: do_plan (io.c:407)
==2181501== by 0x1C8168: io_ready (io.c:417)
==2181501== by 0x1CA45B: io_loop (poll.c:453)
==2181501== by 0x118593: main (connectd.c:2053)
==2181501==
{
<insert_a_suppression_name_here>
Memcheck:Param
write(buf)
fun:write
fun:do_write
fun:do_plan
fun:io_ready
fun:io_loop
fun:main
}
--------------------------------------------------------------------------------