Since we now log directly, we don't prepend the prefix ourselves, making it really
hard to tell *which* lightningd the log applies to!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had to parse quite a few of these files debugging zeroconf, so I
thought it might be nice to have direct access here.
Changelog-Added: pyln-testing: Added utilities to read and parse `gossip_store` file for nodes.
Ok this should be fixed the following stack trace
```
2022-06-29T14:19:41.183Z DEBUG lightningd: Command returned result after jcon close
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.55581
==55581== Syscall param statx(file_name) points to unaddressable byte(s)
==55581== at 0x4B0188E: statx (statx.c:29)
==55581== by 0x1133481: std::sys::unix::fs::try_statx (weak.rs:178)
==55581== by 0x11265E0: std::fs::buffer_capacity_required (fs.rs:851)
==55581== by 0x112675B: <std::fs::File as std::io::Read>::read_to_string (fs.rs:644)
==55581== by 0x10DACA8: num_cpus::linux::Cgroup::param (linux.rs:214)
==55581== by 0x10DAB39: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==55581== by 0x10DA9C2: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==55581== by 0x10DA5A1: num_cpus::linux::load_cgroups (linux.rs:149)
==55581== by 0x10DA23D: num_cpus::linux::init_cgroups (linux.rs:129)
==55581== by 0x10DCDC8: core::ops::function::FnOnce::call_once (function.rs:227)
==55581== by 0x10DC749: std::sync::once::Once::call_once::{{closure}} (once.rs:276)
==55581== by 0x21EE89: std::sync::once::Once::call_inner (once.rs:434)
==55581== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==55581==
==55581== Syscall param statx(buf) points to unaddressable byte(s)
==55581== at 0x4B0188E: statx (statx.c:29)
==55581== by 0x1133481: std::sys::unix::fs::try_statx (weak.rs:178)
==55581== by 0x11265E0: std::fs::buffer_capacity_required (fs.rs:851)
==55581== by 0x112675B: <std::fs::File as std::io::Read>::read_to_string (fs.rs:644)
==55581== by 0x10DACA8: num_cpus::linux::Cgroup::param (linux.rs:214)
==55581== by 0x10DAB39: num_cpus::linux::Cgroup::quota_us (linux.rs:203)
==55581== by 0x10DA9C2: num_cpus::linux::Cgroup::cpu_quota (linux.rs:188)
==55581== by 0x10DA5A1: num_cpus::linux::load_cgroups (linux.rs:149)
==55581== by 0x10DA23D: num_cpus::linux::init_cgroups (linux.rs:129)
==55581== by 0x10DCDC8: core::ops::function::FnOnce::call_once (function.rs:227)
==55581== by 0x10DC749: std::sync::once::Once::call_once::{{closure}} (once.rs:276)
==55581== by 0x21EE89: std::sync::once::Once::call_inner (once.rs:434)
==55581== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==55581==
--------------------------------------------------------------------------------
Leaving base_dir /tmp/ltests-hzt9ppqp intact, it still has test sub-directories with failure details: ['test_peers_1']
```
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Even though we generally wait until a node has seen the gossip,
that doesn't mean that connectd has processed it! This means when
we connect it may still send us "old" gossip.
So we set the OPT_GOSSIP_QUERIES bit, which means don't send until we
ask. But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs
to filter that out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I saw another "only_one()" fail on alias checking: it's not
entirely clear to me with the more aggressive sending of
own gossip, that we necessarily process in order, so we might
not have actually seen all channels just because we saw
the farthest one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Get it to log direct to stdout, so we see what's happening *as it
happens* rather than as we read it. We could restore the thread we
were using before, but that added more problems than it solved.
This means that we need the hsm password prompts in the log though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some flakes are caused by weird races in this code. Plus, if we
get things to write straight to files, we might see things in
there on post-mortem which happen after the python runner exits.
It's a bit less efficient, but much simpler. Let's see if it helps!
Some tests need a rework now, since we don't get a failure (except
eventual timeout), but they're simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use ephemeral_port_reserve to grab ports, but this can fail when we
restart a node, since the port can be reallocated at that point.
Attempt to overcome this using a global reserved list (is there a
neater way?).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happens under CI, but it's not informative. Sleep and retry.
Also, "except (OSError, Exception)" does not seem to do what you'd think:
this clause never gets run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Over time, it has cost us more developer cycles than it has gained.
It has hidden intermittant bugs, and allowed cruft to accumulate:
when we eventually tried to figure out what was going wrong, the
actual change which caused it was now stale and forgotten.
This was a particular bane during the connectd rewrite, and I
worked through some issues which had occurred before, but were not
more likely.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So this was quite a journey:
- We want relative depdendencies (using the `path` argument) whenever
developing locally. Otherwise we would have to install each
dependency every time we change a single character, which
undoubtedly would cause us to waste time trying to debug an issue
just because we forgot to install.
- When publishing however we want to rely on the version number,
since the repo context gets lost upon publishing, and path
dependencies cause failures.
The solution then it seems is to use `dev-dependencies` (not that
surprising once you find it) with relative paths, so that `poetry
install` uses these over the normal dependencies (no idea how they
dedup them) and use `dependencies` when publishing. The paths are
still in there when publishing, but `pip install` ignores them.
I checked that `poetry install` from an unrelated project doesn't
accidentally use the path dependencies, even when adding them as
dev-dependencies. This should hopefully also allow installing them
as a repo link, though I can't test that right now.
We should be using amount_msat always. Many tests were not. Plus,
deprecating it simplifies the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: JSONRPC: `sendpay` `route` elements `msatoshi` (use `amount_msat`)
This prepares for when they start being u64, not strings with msat appended.
This has a strange side effect on our schema: despite the name,
decodepay's `fee_base_msat` is actually a u64, which we now convert to
msat on decode.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is consistent with our output changes, and increases consistency.
It also keeps future sanity checks happy, that we only use JSON msat
helpers with '_msat' fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `invoice`, `sendonion`, `sendpay`, `pay`, `keysend`, `fetchinvoice`, `sendinvoice`: `msatoshi` argument is now called `amount_msat` to match other fields.
Changelog-Deprecated: JSON-RPC: `invoice`, `sendonion`, `sendpay`, `pay`, `keysend`, `fetchinvoice`, `sendinvoice` `msatoshi` (use `amount_msat`)
Mostly comments and docs: some places are actually paths, which
I have avoided changing. We may migrate them slowly, particularly
when they're user-visible.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Types are fixed, in particular:
* rename "OutputDesc" to more consistent "outputdesc".
* rename "utxo" to more consistent "outpoint".
* it's "boolean" not "bool".
* "number" means int or float, usually it should be u32.
Specific commands:
* close `id` can be by channel id, scid.
* close `feerange` is a feerate type.
* datastore/deldatastore/listdatastore `key` can be singleton.
* delexpiredinvoice: `maxexpirytime` is not required, is a u64.
* invoice/delinvoice/listinvoice `label` can be an integer
* fundpsbt: many fields are u32 not number (JSON for int or float).
* invoice: `msatoshi` can be "any".
* invoice: `expiry` has a type (now must be numeric).
* invoice: `exposeprivatechannels` can be bool or array of scids.
* invoice: `deschashonly` added
* keysend: there's no "float" type, use "number" or "u32" etc.
* keysend: `routehints` is a valid arg, as is `extratlvs` (EXPERIMENTAL_FEATURES)
* listdatastore: `key` is not required.
* newaddr: `addresstype` can be "all"
* pay: `exemptfee` is "msat", new fields `locaofferid` and `exclude`
* sendonion: was mis-formatted, missed `localofferid` and `groupid` fields.
* sendpay: add `localofferid` and `groupid` params.
* signpsbt: add `signonly` param.
* txprepare "outptus" typo.
* waitsendpay: add `groupid` and fix `partid` type.
* withdraw: `destination` is a bitcoin address, not a pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are useful for requests:
1. "outpoint": <txid>:<outnum>
2. "feerate": strings or a number
3. "outputdesc": bitcoin-style addresses-as-keys
4. "msat_or_all": amount or "all"
4. "msat_or_any": amount or "any"
5. "short_channel_id_dir": scid with /0 or /1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.
We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
But this requires a watch-only wallet, and python-bitcoinlib doesn't support
multiple wallets, so we need to unload the original one, but then we need
to generate a block, so that can't generate a new address, so we need
an address arg to generate_block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we fund a channel between two nodes, then mine all the blocks to
announce it, any other nodes may see the announcement before the
blocks, causing CI to complain about "bad gossip":
```
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Ignoring future channel_announcment for 113x1x1 (current block 112)
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/0
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 113x1x1/1
lightningd-4: 2022-01-25T22:33:25.468Z DEBUG 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e-gossipd: Bad gossip order: WIRE_NODE_ANNOUNCEMENT before announcement 032cf15d1ad9c4a08d26eab1918f732d8ef8fdc6abb9640bf3db174372c491304e
```
Add a new helper for this case, and use it where there are more than 2 nodes.
Cleans up test_routing_gossip and a few other places which did this manually.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to trigger now, especially on PostgresQL (maybe it's faster
to process blocks?).
e.g. test_closing_simple() hangs in close(), because the close is unilateral
because the HTLC timed out, so it's waiting for a block (other lines removed):
```
lightningd-1: 2022-01-12T00:33:46.258Z DEBUG 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_COMMITMENT_SIGNED
lightningd-1: 2022-01-12T00:33:46.278Z DEBUG lightningd: close_command: timeout = 172800
2022-01-12T01:03:36.9757201Z lightningd-2: 2022-01-12T00:33:46.384Z DEBUG lightningd: Adding block 104: 73ffa19d27d048613b2731e1682b4efff0dc226807d8cc99d724523c2ea58204
2022-01-12T01:03:36.9759053Z lightningd-2: 2022-01-12T00:33:46.396Z DEBUG lightningd: Adding block 105: 44fd06ed053a0d0594abcfefcfa69089351fc89080826799fb4b278a68fe5c20
2022-01-12T01:03:36.9760865Z lightningd-2: 2022-01-12T00:33:46.406Z DEBUG lightningd: Adding block 106: 0fee2dcbd1376249832642079131275e195bba4fb49cc9968df3a899010bba0f
2022-01-12T01:03:36.9762632Z lightningd-2: 2022-01-12T00:33:46.418Z DEBUG lightningd: Adding block 107: 7f24f2d7d3e83fe3256298bd661e57cdf92b058440738fd4d7e1c8ef4a4ca073
2022-01-12T01:03:36.9773411Z lightningd-2: 2022-01-12T00:33:46.429Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_REVOKE_AND_ACK
2022-01-12T01:03:36.9794707Z lightningd-2: 2022-01-12T00:33:46.437Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: Commits outstanding after recv revoke_and_ack
2022-01-12T01:03:36.9788197Z lightningd-2: 2022-01-12T00:33:46.433Z DEBUG lightningd: Adding block 108: 283b371fb5d1ef42980ea10ab9f5965a179af8e91ddf31c8176e79820e1ec54d
2022-01-12T01:03:36.9799347Z lightningd-2: 2022-01-12T00:33:46.439Z DEBUG 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: HTLC 0[REMOTE] => RCVD_REMOVE_REVOCATION
2022-01-12T01:03:36.9808057Z lightningd-2: 2022-01-12T00:33:46.447Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer permanent failure in CHANNELD_NORMAL: Fulfilled HTLC 0 RCVD_REMOVE_REVOCATION cltv 109 hit deadline
```
This is because `pay` returns from l1 when it has the preimage, not
when the HTLC is fully resolved. Add a helper for this, and call it
at the end of the pay test helper. We might need this elsewhere
though!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This loads up 20MB of plugins temporarily; we seem to be getting OOM
killed under CI and I wonder if this is contributing.
Doesn't significantly reduce runtime here, but I have lots of memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On my machine it should have produced:
/tmp/ltests-1p0v5z1j/.locks/lock-1631090663.1066182
but instead it produced:
/private/tmp/ltests-1p0v5z1j/.locks/lock-1631090663.105848
The mismatch resulted in a hang inside flock.
Path .resolve() seems to be causing problems for others as well. It appears the library was not meant to handle complex path situations and isn't maintained as such (see link reference).
Since we have already built a full path here anyway, the call to .resolve() appears redundant.
Tested on python 3.9.6 on my Mac OS X 11.4 (20F71), Xcode 12.5.1 (12E507)
Relevant discussion: https://discuss.python.org/t/pathlib-absolute-vs-resolve/2573
As suggested in this issue https://github.com/python/mypy/issues/7484#issuecomment-529363083 we skip following import becuase with the recent version of mypy the __init__.py file make confusione inside the analysis (in the python issue it is unclear the main motivation of this issue. At list unclear to me).
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
We would fail even if a process exited cleanly after having the line we
were looking for, because we checked if it died before reading the logs.
Co-Authored-by: Daniela Brozzoni <daniela@revault.dev>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>