Dualopend is not listening to the peer fd when it hangs up, so doesn't
notice it's gone. We don't clean up the channel until it's done (usually
a good thing: it could be about to lock it in), but this harms us
here.
Fix the test failure and make a comment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only places which should call try_reconnect now are the "connect"
command, and the disconnect path when it decides there's still an
active channel.
This introduces one subtlety: if we disconnect when there's no active
channel, but then the subd makes one, we have to catch that case!
This temporarily reverts "slow" reconnections to fast ones: see next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Connectd already does this when we *receive* an error or warning, but
now do it on send. This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).
When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc. That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In various places, we assumed that when `connected` is false,
everything is finished. This is not true: we should wait for the
state we expect.
In addition, various places allows reconnections, which interfered
with the logic; suppress them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to avoid lost messages in the common cases.
This generalizes our drain code, by giving the subds each 5 seconds to
close themselves, but continue to allow them to send us traffic (if
peer is still connected) and continue to send them traffic.
We continue to send traffic *out* to the peer (if it's still
connected), until all subds are gone. We still have a 5 second timer
to close the connection to peer.
On reconnects, we don't do this "drain period" on reconnects: we kill
immediately.
We fix up one test which was looking for the "disconnect" message
explicitly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's caused by a reconnection race: we hold the new incoming connection while we
ask lightningd to kill the old connection. But under some circumstances we leave
the new incoming hanging (with, in this case, old reestablish messages unread!)
and another connection comes in.
Then, later we service the long-gone "incoming" connection, channeld
reads the ancient reestablish message and gets upset.
This test used to hang, but now we've fixed reconnection races it is fine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Got complaints about us hanging up on some nodes because they don't respond
to pings in a timely manner (e.g. ACINQ?), but that turned out to be something
else.
Nonetheless, we've had reports in the past of LND badly prioritizing gossip
traffic, and thus important messages can get queued behind gossip dumps!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: give busy peers more time to respond to pings.
If we can't broadcast the tx, confirm that it didn't end up in the
mempool or the utxo set before throwing an error.
Note that this doesn't protect us in the case where the funding
output has already been *spent*... but that's extremely rare, right?
Fixes#5296
Reported-By: @rustyrussell
Collab-With: @vincenzopalazzo
There's a 1 in 256 chance that our signature on the transaction is 70,
not 71 bytes long. This changes the freerate. So fix up the weight in
this case, to be the expected weight.
```
@unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different")
@pytest.mark.developer("uses dev-fail")
@pytest.mark.openchannel('v1') # v2 the weight calculation is off by 3
deftest_multifunding_feerates(node_factory, bitcoind):
'''
Test feerate parameters for multifundchannel
'''
funding_tx_feerate = '10000perkw'
commitment_tx_feerate_int = 2000
commitment_tx_feerate = str(commitment_tx_feerate_int) + 'perkw'
l1, l2, l3 = node_factory.get_nodes(3, opts={'log-level': 'debug'})
l1.fundwallet(1 << 26)
def_connect_str(node):
return'{}@localhost:{}'.format(node.info['id'], node.port)
destinations = [{"id": _connect_str(l2), 'amount': 50000}]
res = l1.rpc.multifundchannel(destinations, feerate=funding_tx_feerate,
commitment_feerate=commitment_tx_feerate)
entry = bitcoind.rpc.getmempoolentry(res['txid'])
weight = entry['weight']
expected_fee = int(funding_tx_feerate[:-5]) * weight // 1000
> assert expected_fee == entry['fees']['base'] * 10 ** 8
E AssertionError: assert 7000 == (Decimal('0.00007010') * (10 ** 8))
tests/test_connection.py:1982: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can fail to use larger channel if it's not ready yet:
```
2022-05-23T01:20:05.5325600Z # Check it used the larger channel!
2022-05-23T01:20:05.5326376Z > assert before[chan23a_idx]['to_us_msat'] == after[chan23a_idx]['to_us_msat']
2022-05-23T01:20:05.5326961Z E assert 1000000000msat == 900000000msat
2022-05-23T01:20:05.5327240Z
2022-05-23T01:20:05.5327621Z tests/test_connection.py:3896: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Over time, it has cost us more developer cycles than it has gained.
It has hidden intermittant bugs, and allowed cruft to accumulate:
when we eventually tried to figure out what was going wrong, the
actual change which caused it was now stale and forgotten.
This was a particular bane during the connectd rewrite, and I
worked through some issues which had occurred before, but were not
more likely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should be using amount_msat always. Many tests were not. Plus,
deprecating it simplifies the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: JSONRPC: `sendpay` `route` elements `msatoshi` (use `amount_msat`)
The new msat fields are turned into Millisatoshi, so handle that correctly
too in tests too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: Plugins: `coin_movement` notification: `balance`, `credit`, `debit` and `fees` (use `balance_msat`, `credit_msat`, `debit_msat` and `fees_msat`)
Before this fix, there was the situation where a DEVELOPER=1 node would
announce non-public addresses on mainnet if detected. Since there
are some nodes on the internet that falsely report local addresses
we move this 'testing feature' to 'dev-allow-locahost' nodes.
Changelog-None
We call out to connectd to activate the peer, and while we do that,
channel->owner is NULL. A better pattern would be to set up the unsaved
channel once connectd has given us the peer, but this works for now.
Fixes: #5204
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make it always a number; this makes the JSON request specification
simpler. We allowed a number since v0.10.1.
(reserve=True is the default anyway, so usually it can be omitted:
reserve=False becomes reserve=0).
Changelog-Deprecated: JSON-RPC: `fundpsbt`/`utxopsbt` `reserve` must be a number, not bool (for `true` use 72/don't specify, for `false` use 0). Numbers have been allowed since v0.10.1.
I have a separate branch which fixes this race properly, but it's not anything
to do with this PR.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was missed in e8d2176e6b.
```
> raise ValueError(str(errors))
E ValueError:
E Node errors:
E - lightningd-2: had bad gossip messages
E - lightningd-3: had bad gossip messages
E Global errors:
contrib/pyln-testing/pyln/testing/fixtures.py:201: ValueError
...
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Ignoring future channel_announcment for 105x1x2 (current block 104)
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 105x1x2/0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure it sees disconnect before reconnect, otherwise the next command
fails since we're now disconnected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We may not see a disconnect instantly:
```
> assert len(l2.rpc.listpeers()['peers']) == 0
E assert 1 == 0
E +1
E -0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is generally verboten now, since there can be multiple. There are a
few exceptions:
1. We sometimes want to know if there are *any* active channels.
2. Some dev commands still take peer id when they mean channel_id.
3. We still allow peer id when it's fully determined.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `close` by peer id will fail if there is more than one live channel (use `channel_id` or `short_channel_id` as id arg).
Rather than intuiting whether this is a new channel / active channel,
use the channel_id. This simplifies things and makes them explicit,
and prepares for multiple live channels per peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd currently holds the connection to idle peers, but we're about
to change that: it will only look after peers which are actively
opening a connection. We can start this process by disconnecting
whenever we have a negotiation failure.
We could stay connected if we wanted to, but that would be up to
connectd to decide. Right now it's easier if we disconnect from any
idle peer once it's been active.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When compiled without DEVELOPER this will now filter out `remote_addr` that
come from localhost. The testcase checks for DEVELOPER to test for correct
function of `remote_addr`.
Also, I renamed "test_connect" to "test_connect_basic" so it can be started
without all the other tests in that file that start with "test_connect..."
These tests have proven to be:
a) very expensive, as they spin up many nodes, and perform long setup
b) are not testing anything specific, they just fuzz functionality
that is already tested otherwise
c) have not helped pinpoint any issues in living memory
d) are very flaky, making for really bad signal-to-noise, so much
that devs usually just restart without even looking at the logs
e) even if we were to look at the logs, we'd be unable to reproduce
due to the inherent randomness involved in these tests
f) are really noisy neighbors, causing other tests to flake as well,
further muddying the water
All in all, these tests are a waste of time, and source of
frustration.
[ Cleaned up python unused imports --RR ]
Changelog-None