We want to stream gossip through this, but currently connectd treats the
fd as synchronous. While we work on getting rid of that, it's easiest to
have two fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we send funding_locked, gossipd could start seeing channel_updates
from the peer (which get sent so we can use the channel in routehints
even before it's announcable).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The last change exposed a race: the peer sends funding_locked
then immediately sends an update_channel. channeld used to process
the funding_locked from the peer, tell gossipd about the new
channel, then finally forward the channel_update.
We can have the channel_update hit gossipd before we've told it about
the channel. It ignores the channel_update for the currently-unknown
channel: we get a 'bad gossip' message, but the immediate symptom
is a timeout in tests/test_closing.py::test_onchain_multihtlc_their_unilateral:
```
node_factory = <pyln.testing.utils.NodeFactory object at 0x7fdf93f42190>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7fdf940b99d0>
@pytest.mark.developer("needs DEVELOPER=1 for dev_ignore_htlcs")
@pytest.mark.slow_test
def test_onchain_multihtlc_their_unilateral(node_factory, bitcoind):
"""Node pushes a channel onchain with multiple HTLCs with same payment_hash """
> h, nodes = setup_multihtlc_test(node_factory, bitcoind)
tests/test_closing.py:2938:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_closing.py:2780: in setup_multihtlc_test
nodes = node_factory.line_graph(7, wait_for_announce=True,
/usr/local/lib/python3.8/dist-packages/pyln/testing/utils.py:1416: in line_graph
self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
/usr/local/lib/python3.8/dist-packages/pyln/testing/utils.py:1394: in join_nodes
nodes[i + 1].wait_channel_active(scids[i])
/usr/local/lib/python3.8/dist-packages/pyln/testing/utils.py:958: in wait_channel_active
wait_for(lambda: self.is_channel_active(chanid))
```
Note that we are usually much faster to send between subds than we are
between peers, but during CI this is common, as we're all running on
the same machine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also no longer strip the type off: everyone handles both forms, and
Eclair doesn't strip (and it's easier!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even if we're deferring putting them in the store and broadcasting them,
we tell lightningd so it will use it in any error messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want it to keep the latest, so it can make its own error msgs without
asking us. This installs (but does not use!) the message handler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is in preparation for gossipd feeding us the latest channel_updates,
rather than having lightningd and channeld query gossipd when it wants
to send an onion error with an update included.
This means gossipd will start telling us the updates, so we need the
channels loaded first.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With this change, we get more fine-grained error messages if something
goes wrong in the course of communicating with the SQLite database. To
pick some random examples, the error codes SQLITE_IOERR_NOMEM,
SQLITE_IOERR_CORRUPTFS or SQLITE_IOERR_FSYNC are way more specific
than just a plain SQLITE_IOERR, and the corresponding error messages
generated by sqlite3_errstr() will hence give a better hint to the
user (or also to the developers, if an error report is sent) what the
cause for a failure is.
Changelog-None
Firstly, we were not adding the extra fee output on our dummy tx,
because the fee amount was 0. We probably should always do this, even
if it's 0.
Secondly, there are 6 witnesses, not 1, for elements txs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes lightningd's chronic weight underestimate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: closingd: more accurate weight estimation helps mutual closing near min/max feerates.
And in particular, fix onchaind grinding code which used the
actual number of inputs and outputs (which already includes the
fee output); that breaks with the next patch which fixes other
calculations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The blockheight is zero though, since these aren't included in a block
yet.
We also don't issue an 'external' deposit event if we can tell that the
address you're sending to actually belongs to our wallet (we'll issue a
deposit event when it gets included in a block)
If a coin move concerns an external account, it's really useful to know
which 'internal' account initiated the transfer.
We're about to add a notification for withdrawals, so we can use this to
track wallet pushes to outside addresses
Changelog-Added: JSONRPC: `coin_movement` to 'external' accounts now include an 'originating_account' field
```
l1.rpc.disconnect(l2.info['id'], force=True)
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
> l1.daemon.wait_for_log('option_static_remotekey enabled at 2/2')
tests/test_connection.py:3653:
```
If l2's channeld gets killed (due to reconnect) before it tells
lightningd it got the revoke_and_ack it will need a retransmission
*again*.
This makes the test more robust, and does more checks too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
An "err" is only returned if the channel_update is malformed: more common
is that it's fine, but we don't know the scid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightningd would race with the subd destructor to do the waitpid(),
resulting in UNUSUAL log messages, but also us missing if a plugin
was killed via a signal.
We can also get rid of the gratuitous waitpid() in test_subdaemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
check-dbstmts was just running the normal pytest, AFAICT:
```
export TEST_CHECK_DBSTMTS=0
+ TEST_CHECK_DBSTMTS=0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
connectd does this internally now using ccan/io, with appropriate
credit for ZmnSCPxj who wrote this code in the first place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
OK, now this test makes more sense! Now we don't ignore errors, we
*will* drop to chain if we reconnect after one side has dropped to
chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was put in late 2019, and @t-bast says Eclair doesn't ignore their
errors and has had no issues.
It also conflicts with https://github.com/lightning/bolts/pull/932
which suggests you *should* fail when you receive an error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to trigger now, especially on PostgresQL (maybe it's faster
to process blocks?).
e.g. test_closing_simple() hangs in close(), because the close is unilateral
because the HTLC timed out, so it's waiting for a block (other lines removed):
```
lightningd-1: 2022-01-12T00:33:46.258Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_COMMITMENT_SIGNED
lightningd-1: 2022-01-12T00:33:46.278Z DEBUG lightningd: close_command: timeout = 172800
2022-01-12T01:03:36.9757201Z lightningd-2: 2022-01-12T00:33:46.384Z DEBUG lightningd: Adding block 104: 73ffa19d27d048613b2731e1682b4efff0dc226807d8cc99d724523c2ea58204
2022-01-12T01:03:36.9759053Z lightningd-2: 2022-01-12T00:33:46.396Z DEBUG lightningd: Adding block 105: 44fd06ed053a0d0594abcfefcfa69089351fc89080826799fb4b278a68fe5c20
2022-01-12T01:03:36.9760865Z lightningd-2: 2022-01-12T00:33:46.406Z DEBUG lightningd: Adding block 106: 0fee2dcbd1376249832642079131275e195bba4fb49cc9968df3a899010bba0f
2022-01-12T01:03:36.9762632Z lightningd-2: 2022-01-12T00:33:46.418Z DEBUG lightningd: Adding block 107: 7f24f2d7d3e83fe3256298bd661e57cdf92b058440738fd4d7e1c8ef4a4ca073
2022-01-12T01:03:36.9773411Z lightningd-2: 2022-01-12T00:33:46.429Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_REVOKE_AND_ACK
2022-01-12T01:03:36.9794707Z lightningd-2: 2022-01-12T00:33:46.437Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: Commits outstanding after recv revoke_and_ack
2022-01-12T01:03:36.9788197Z lightningd-2: 2022-01-12T00:33:46.433Z DEBUG lightningd: Adding block 108: 283b371fb5d1ef42980ea10ab9f5965a179af8e91ddf31c8176e79820e1ec54d
2022-01-12T01:03:36.9799347Z lightningd-2: 2022-01-12T00:33:46.439Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: HTLC 0[REMOTE] => RCVD_REMOVE_REVOCATION
2022-01-12T01:03:36.9808057Z lightningd-2: 2022-01-12T00:33:46.447Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer permanent failure in CHANNELD_NORMAL: Fulfilled HTLC 0 RCVD_REMOVE_REVOCATION cltv 109 hit deadline
```
This is because `pay` returns from l1 when it has the preimage, not
when the HTLC is fully resolved. Add a helper for this, and call it
at the end of the pay test helper. We might need this elsewhere
though!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's actually a bug in our closing tx size estimation; I'll do
a separate patch for this, though.
Seems this used to be flaky, now we always flush queues, so it's
more reliably caught.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to shut down peers atomically, but now we flush the
connections there's a delay. If we are asked to connect in that time,
we ignore it, as we are already connected, but that's wrong: we need
to remember that we were told to connect and reconnect.
This should solve a few weird test failures where "connect" would hang
indefinitely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We seem to hit a race between manual reconnect (with address hint) and an automatic
reconnection attempt which fails:
```
> l4.rpc.connect(l3.info['id'], 'localhost', l3.port)
...
E pyln.client.lightning.RpcError: RPC call failed: method: connect, payload: {'id': '035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d', 'host': 'localhost', 'port': 41285}, error: {'code': 401, 'message': 'All addresses failed: 127.0.0.1:36678: Connection establishment: Connection refused. '}
```
See how it didn't even try the given address?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the case where the peer sends an error (and hangs up) immediately
after init, connectd *doesn't actually read the error* (even after all the
previous fixes so it actually receives the error!).
This is because to tried to first write WIRE_CHANNEL_REESTABLISH, and
that fails, so it never tries to read. Generally, we should ignore
write failures; we'll find out if the socket is closed when we read
nothing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
l1 might split in a commitment_signed before it notices the disconnect, and this test fails:
```
for i in range(0, len(disconnects)):
with pytest.raises(RpcError):
l1.rpc.sendpay(route, rhash, payment_secret=inv['payment_secret'])
> l1.rpc.waitsendpay(rhash)
E Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>