If there are two HTLCs with the same preimage, lightningd would always
find the first one. By including the id in the `struct htlc_stub`
it's both faster (normal HTLC lookup) and allows lightningd to detect
that onchaind wants to fail both of them.
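
A minimal sketch of why the id matters, assuming a simplified record rather than the real `struct htlc_stub` (every name below is hypothetical, and the linear scans do not model the speed of the normal HTLC lookup):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified HTLC record (not the real struct htlc_stub). */
struct htlc_sketch {
    uint64_t id;            /* unique per channel */
    unsigned char hash[32]; /* may be shared by several HTLCs */
    int failed;
};

/* Lookup by hash: always stops at the first match. */
struct htlc_sketch *find_by_hash(struct htlc_sketch *h, size_t n,
                                 const unsigned char hash[32])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(h[i].hash, hash, 32) == 0)
            return &h[i];
    return NULL;
}

/* Lookup by id: unambiguous even when two HTLCs share a hash. */
struct htlc_sketch *find_by_id(struct htlc_sketch *h, size_t n, uint64_t id)
{
    for (size_t i = 0; i < n; i++)
        if (h[i].id == id)
            return &h[i];
    return NULL;
}

/* Two HTLCs (ids 7 and 9) share one hash and both must be failed.
 * Returns how many distinct HTLCs end up marked as failed. */
size_t fail_both(int use_id)
{
    struct htlc_sketch h[2] = { { .id = 7 }, { .id = 9 } };
    const uint64_t wanted[2] = { 7, 9 };

    memset(h[0].hash, 0xAB, 32);
    memset(h[1].hash, 0xAB, 32);

    for (size_t i = 0; i < 2; i++) {
        struct htlc_sketch *found = use_id
            ? find_by_id(h, 2, wanted[i])
            : find_by_hash(h, 2, h[i].hash);
        if (found)
            found->failed = 1;
    }
    return (size_t)(h[0].failed + h[1].failed);
}
```

With the hash-only lookup both requests land on the first HTLC, so only one is failed; with the id each request reaches its own HTLC.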
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We spend quite a bit of time in libsecp256k1 moving them to and from
DER encoding. With a bit of care, we can transfer the raw bytes from
gossipd and manually decode them so a malformed one can't make us
abort().
Before:
real 0m0.629000-0.695000(0.64985+/-0.019)s
After:
real 0m0.359000-0.433000(0.37645+/-0.023)s
At this point, the main issues are 11% of time spent in ccan/io's
backend_wake (I tried using a hash table there, but that actually makes
the small-number-of-fds case slower), and 65% of gossipd's time is
in marshalling the response (all those tal_resize add up!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
My test case is a mainnet gossip store with 22107 channels, and
time to do `lightning-cli listchannels`:
Before: `lightning-cli listchannels` DEVELOPER=0
real 0m1.303000-1.324000(1.3114+/-0.0091)s
After:
real 0m0.629000-0.695000(0.64985+/-0.019)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a very ugly one-liner; really ccan/io should have an io_replan
for this, but it would have to be written carefully as it makes
assumptions currently about plans not changing. In this case, we know
it's in io_write, and we're just moving a pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Such an API is required for when we stream it directly. Almost all our
handlers fit this pattern already, or nearly do.
We remove new_json_result() in favor of explicit json_stream_success()
and json_stream_fail(), but still allow command_fail() if you just
want a simple all-in-one fail wrapper.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This isn't a big change, since we basically dump the entire JSON
result string into the membuf then write it out, but it's prep for the
next changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We occasionally had a Travis hang in test_multirpc, and it's due to a
thinko in the prior patch: if a command completes immediately, it will
do the wake before we go to sleep. That means we don't digest the
rest of the buffer until the next write.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's a DoS if we keep reading commands and don't insist the client
read the responses.
My initial implementation simply removed the io_duplex, but that
doesn't work if we want to inject notifications in the stream (as we
will eventually want to do), so we operate it as duplex but have each
side wake the other when it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
My test case is a mainnet gossip store with 22107 channels, and
time to do `lightning-cli listchannels`:
Before: `lightning-cli listchannels` DEVELOPER=0
real 0m1.396000-1.409000(1.4022+/-0.005)s
After:
real 0m1.307000-1.320000(1.3128+/-0.005)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's the only user of them, and it's going to get optimized.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some people were alarmed that the state was set to "Loaded from
database" indefinitely. Saying that we are trying to reconnect may be
more informative.
And use wallet_forward_status_in_db() everywhere in db code.
And clean up extra CHANGELOG.md entry (looks like rebase error?)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The left join should make sure we still get the results but
referencing the fields and/or attempting to write them to the JSON-RPC
result will cause unforeseen problems. So just omit if we forgot
something.
We initialize it to 30 seconds, but it's *always* overridden by the
gossip_init message (and usually to 60 seconds, so it's doubly
misleading).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd provided a generic "get endpoints of this scid" and we only
use it in one place: to look up htlc forwards. But lightningd just
assumed that one would be us.
Instead, provide a simpler API which only returns the peer node
if any, and now we handle it much more gracefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Right now, the `config` file is read *after* the daemon has already moved into its working directory. However, the `lightning-dir` option, which can be set in the `config` file, is what selects that working directory. Because the directory (which defaults to `$HOME/.lightning`) is already opened before the configuration is read, the configured directory will not be used.
This patch parses the configuration file before opening the working directory, fixing this bug.
[ Update CHANGELOG.md and man pages -- RR ]
It went something like:
niftynei: Hey, cppcheck complains this might be NULL, so I put in a check.
rusty: cppcheck is dumb. Make it an assert("Rusty always right!").
niftynei: You seem certain of this so I shall do that.
https://github.com/ElementsProject/lightning/pull/1994
...
renepickhardt: I asked fiatjaf to run
`lightning-cli sendpay "[{'id':'02db8f487fcc0a'}]" 4efe0ba89b`
and his node crashed!
rusty: grep Assertion logs/*
lightningd/jsonrpc.c:326: connection_complete_error: Assertion `Rusty is always right!' failed.
It turns out that in the 'can't parse' error case, we hand NULL cmd to
connection_complete_error.
Next time, less asserting, more grepping!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit of overkill now that we simply accumulate the entire
JSON response in the buffer before flushing, but when we move to
streamed responses it allows us to have a single command that has
exclusive access to the out direction of the JSON-RPC connection.
This is the source of failure in the test_restart_many_payments stress
test: we don't commit the outgoing HTLC immediately, instead waiting for
gossip to tell us the peer for the outgoing channel, then waiting for
that channeld to tell us it's committed. The result was incoming HTLCs
with no outgoing.
I initially pushed the HTLCs through that same path, but of course
(since peers are not connected yet!) the only result was that we failed
these HTLCs immediately. So I chose the far simpler course of just
failing them directly.
To reproduce this, I had to increase the test_restart_many_payments
num to 10, and run it with nice -20 taskset -c 0.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
During tests, this is half our log! And Travis truncates it if we get
a failure in test_restart_many_payments.
Interestingly, test_logging had a bug which relied on this spam :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of two code paths that return different help objects, simplify things by
always returning the full help object. This not only includes description and
the command name, but the verbose description as well.
Signed-off-by: William Casarin <jb55@jb55.com>
If another channel has set the optional `htlc_maximum_msat` field,
we should correctly parse that field and respect it when drawing up
routes for payments.
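
As a rough sketch of what respecting the field means during route selection, assuming a hypothetical per-direction channel record (the struct and function names are illustrative, not the routing code's own):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one direction of a channel as learned from gossip. */
struct half_chan_sketch {
    bool has_htlc_maximum;      /* did the peer set htlc_maximum_msat? */
    uint64_t htlc_minimum_msat;
    uint64_t htlc_maximum_msat; /* only meaningful if has_htlc_maximum */
};

/* True if this direction may carry an HTLC of amount_msat. */
bool can_carry(const struct half_chan_sketch *hc, uint64_t amount_msat)
{
    if (amount_msat < hc->htlc_minimum_msat)
        return false;
    /* The maximum is optional: only enforce it when it was announced. */
    if (hc->has_htlc_maximum && amount_msat > hc->htlc_maximum_msat)
        return false;
    return true;
}
```

A route builder would skip any channel direction for which such a check returns false.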
Noted by @cdecker, the term 'local' is grossly overused, and the hout
preimage is basically only used as a sanity check (though I've just put
a FIXME there for now).
Also eliminated spurious blank line which crept into wallet.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't expect payment or payment->route_channels to be NULL without an
old db, but putting an assert there reveals that we try to fail an HTLC
which has already succeeded in 'test_onchain_unwatch'.
Obviously we only want to fail an HTLC which goes onchain if we don't
already have the preimage!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
failoutchannel tells us which channel to send an update for (specifically
for temporary_channel_failure); but we don't save it into the db. It's
not even clear we should, since it's a corner case and the channel might
not even exist when we come back.
So on db restore, change such errors to WIRE_TEMPORARY_NODE_FAILURE
which doesn't need an update.
We also don't memset it to 0 in the normal case (we only access it if
failcode has the UPDATE bit set) so valgrind will trigger if we're
wrong.
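
The db-restore rewrite can be sketched as follows. The flag and code values are the ones BOLT 4 defines; the macro and function names are hypothetical stand-ins, not the generated wire definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Failure-code flags and values as defined in BOLT 4. */
#define SKETCH_UPDATE 0x1000
#define SKETCH_NODE 0x2000
#define SKETCH_TEMPORARY_NODE_FAILURE (SKETCH_NODE | 2)
#define SKETCH_TEMPORARY_CHANNEL_FAILURE (SKETCH_UPDATE | 7)

/* Hypothetical helper run on each failcode loaded from the db.
 * Codes with the UPDATE bit need a channel_update for failoutchannel,
 * which is not stored, so they are downgraded to a code that does not. */
uint16_t failcode_after_db_load(uint16_t failcode)
{
    if (failcode & SKETCH_UPDATE)
        return SKETCH_TEMPORARY_NODE_FAILURE;
    return failcode;
}
```

Codes without the UPDATE bit pass through unchanged.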
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't save them to the database, so fix things up as we load them.
Next patch will actually save them into the db, and this will become
COMPAT code.
Also: call htlc_in_check() with NULL on db load, as otherwise it aborts
internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we need to check when we've altered the state, so the checks
are moved to the callers of htlc_in_update_state and htlc_out_update_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
globalfeatures should not be accessed if we haven't received a
channel_update. Treat it like the other fields which are only
initialized and marshalled/unmarshalled if the timestamp is positive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And use ARRAY_SIZE() everywhere which will break compile if it's not a
literal array, plus assertions that it's the same length.
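
For illustration, here is a sketch of how such a macro can reject pointers at compile time, and how a length assertion ties a name table to its enum. It relies on GCC/Clang extensions; the `SKETCH_` names and the example enum are hypothetical, and this is not a copy of the ccan macro:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Evaluates to 0, or fails to compile (negative array size) if cond is false. */
#define SKETCH_BUILD_ASSERT_OR_ZERO(cond) (sizeof(char[1 - 2 * !(cond)]) - 1)

/* Element count of a real array. Passing a pointer makes typeof(arr) and
 * typeof(&(arr)[0]) compatible, which trips the build assertion. */
#define SKETCH_ARRAY_SIZE(arr)                                        \
    (sizeof(arr) / sizeof((arr)[0])                                   \
     + SKETCH_BUILD_ASSERT_OR_ZERO(!__builtin_types_compatible_p(     \
           __typeof__(arr), __typeof__(&(arr)[0]))))

enum sketch_state { STATE_A, STATE_B, STATE_C, STATE_COUNT };

static const char *const sketch_state_names[] = { "A", "B", "C" };

/* Compile-time check that the name table and the enum stay in step. */
_Static_assert(SKETCH_ARRAY_SIZE(sketch_state_names) == STATE_COUNT,
               "name table must have one entry per state");

const char *sketch_state_name(enum sketch_state s)
{
    if ((size_t)s >= SKETCH_ARRAY_SIZE(sketch_state_names))
        return "unknown";
    return sketch_state_names[s];
}
```

Adding a state to the enum without extending the table, or applying the macro to a pointer, stops the build instead of reading out of bounds at runtime.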
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We split json_invoice(), as it now needs to round-trip to the gossipd,
and uniqueness checks need to happen *after* gossipd replies to avoid
a race.
For every candidate channel gossipd gives us, we check that it's in
state NORMAL (not shutting down, not still waiting for lockin), that
it's connected, and that it has capacity. We then choose one with
probability weighted by excess capacity, so larger channels are more
likely.
As a side effect of this, we can tell if an invoice is unpayable (no
channels have sufficient incoming capacity) or difficult (no *online*
channels have sufficient capacity), so we add those warnings.
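
The weighted choice could look roughly like this. It is a sketch under assumptions: the struct, the function name and the `+ 1` on the weight (so that an exact-fit channel remains selectable) are illustrative choices, and the random value is passed in by the caller to keep the function deterministic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate record for one of our channels. */
struct candidate_sketch {
    uint64_t capacity_msat; /* incoming capacity */
    int usable;             /* state NORMAL and peer connected */
};

/* Weight of a candidate: 0 if ineligible, else excess capacity + 1. */
static uint64_t weight_of(const struct candidate_sketch *c, uint64_t amount)
{
    if (!c->usable || c->capacity_msat < amount)
        return 0;
    return c->capacity_msat - amount + 1;
}

/* Returns the index of the chosen candidate, or -1 if none is eligible.
 * r must be a uniformly random 64-bit value; modulo bias and overflow
 * of the weight sum are ignored in this sketch. */
int pick_weighted(const struct candidate_sketch *c, size_t n,
                  uint64_t amount_msat, uint64_t r)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += weight_of(&c[i], amount_msat);
    if (total == 0)
        return -1;

    uint64_t target = r % total;
    for (size_t i = 0; i < n; i++) {
        uint64_t w = weight_of(&c[i], amount_msat);
        if (target < w)
            return (int)i;
        target -= w;
    }
    return -1; /* not reached: target is always below total */
}
```

A return of -1 corresponds to the case where no connected channel can take the payment.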
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For routeboost, we want to select from all our enabled channels with
sufficient incoming capacity. Gossipd knows which are enabled (ie. we
have received a `channel_update` from the peer), but doesn't know the
current incoming capacity.
So we get gossipd to give us all the candidates, and lightningd
selects from those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
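
To illustrate the calling convention only, here is a plain `realloc` sketch for an int array. It is not the tal-based helper, and `struct int_arr` and `int_arr_expand` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct int_arr {
    int *items;
    size_t count;
};

/* Grows the array by one element and returns a pointer to the new,
 * uninitialized last slot. Returns NULL (array unchanged) on failure. */
int *int_arr_expand(struct int_arr *a)
{
    int *p = realloc(a->items, (a->count + 1) * sizeof(*p));

    if (!p)
        return NULL;
    a->items = p;
    return &a->items[a->count++];
}
```

Because the helper hands back the new slot, a caller can write `*int_arr_expand(&a) = value;` in one step, which is why "expand" describes it better than "append".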
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The help command now adds command usage to its output by calling each
command handler in CMD_USAGE mode.
Instead of seeing, for example:
decodepay
Decode {bolt11}, using {description} if necessary
we see:
decodepay bolt11 [description]
Decode {bolt11}, using {description} if necessary
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Callers to param() can now optionally set a flag to see if command_fail was
called.
This is necessary because the `cmd` is freed in case of failure.
I spent a bit of time trying to extend the lifetime of the `cmd` to the end
of parse_request(), but the destructors still needed to be called when they
were, and it was getting ugly. So I took this minimal approach.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Now call param() even for commands that don't accept any parameters.
This is a bugfix of sorts. For example, before you could call:
lightning-cli getinfo blah
and the blah parameter would be ignored.
Now you will get an error: "too many parameters: got 1, expected 0"
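
A minimal sketch of the count check behind that message, assuming a hypothetical standalone helper rather than the real param() internals:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 if the parameter count is acceptable,
 * otherwise writes an error message into err and returns 0. */
int check_param_count(size_t got, size_t expected_max,
                      char *err, size_t errlen)
{
    if (got > expected_max) {
        snprintf(err, errlen,
                 "too many parameters: got %zu, expected %zu",
                 got, expected_max);
        return 0;
    }
    return 1;
}
```

A command that takes no parameters simply passes an expected maximum of 0, so any stray argument is reported instead of silently ignored.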
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Added the concept of a "command mode". The
behavior of param() changes based on the mode.
Added and tested the command mode of CMD_USAGE for
setting the usage of a command without running it.
Only infrastructure and test. No functional changes.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
BOLT 7's been updated to split the flags field in `channel_update`
into two: `channel_flags` and `message_flags`. This changeset does the
minimal necessary to get to building with the new flags.
We couldn't use it before because it asserted dbid was non-zero. Remove
assert and save some code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_lightningd__use_hsm_get_client_fd()_helper_for_global_daemons_too.patch':
fixup! lightningd: use hsm_get_client_fd() helper for global daemons too.
Suggested-by: @ZmnSCPxj
That matches the other CSV names (HSM was the first, so it was written
before the pattern emerged).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current code sends hsmstatus_client_bad_request via the req fd;
this won't work, since lightningd uses that synchronously and only
expects a reply to its commands. So send it via status_conn.
We also enhance hsmstatus_client_bad_request to include details, and
create convenience functions for it. Our previous handling was ad-hoc;
we sometimes just closed on the client without telling lightningd,
and sometimes we didn't tell lightningd *which* client was broken.
Also make every handler the exact same prototype, so they now use the
exact same patterns (hsmd *only* handles requests, makes replies).
I tested this manually by corrupting a request to hsmd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently just ignore them. This is one reason the hsm (in some places)
explicitly calls log_broken so we get some idea.
This was the only subdaemon which had a NULL msgcb and msgname, so eliminate
those checks in subd.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The `json_tok_percentage` parser is used for the `fuzzpercent` in `getroute` and
`maxfeepercent` in `pay`. In both cases it seems reasonable to allow values
larger than 100%. This has bitten users in the past when they transferred single
satoshis to things like satoshis.place over a route longer than 2 hops.
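
A sketch of a percentage parser without the upper bound, assuming a plain C string as input (the real parser works on JSON tokens; `parse_percentage` is a hypothetical name, and rejecting negative values is an assumption of this sketch):

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a non-negative, finite percentage. Values above 100 are
 * deliberately accepted. Returns false on any malformed input. */
bool parse_percentage(const char *s, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;
    /* Rejects negatives and NaN (comparison fails), and infinity. */
    if (!(v >= 0.0) || v > DBL_MAX)
        return false;
    *out = v;
    return true;
}
```

For a 1 satoshi payment that costs 3 satoshis in fees, the fee is 300% of the amount, which is exactly the kind of value a 100% cap would refuse.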
With the previous patch, we could still get stuck behind a low-prio
request. Generalize it into separate queues, and allow more than one
request in parallel.
Worth noting that the test time for `VALGRIND=0 pytest -vx tests/ -n 10`
doesn't change measurably.
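
The scheduling idea can be sketched as one FIFO per priority plus a cap on requests in flight. This is an illustration under assumptions: the types, the fixed-size queues and the function names are hypothetical, not the bitcoind request code itself:

```c
#include <assert.h>
#include <stddef.h>

enum sketch_prio { PRIO_HIGH, PRIO_LOW, PRIO_COUNT };

#define SKETCH_QUEUE_MAX 16

/* Simple fixed-size FIFO of request ids. */
struct req_queue {
    int ids[SKETCH_QUEUE_MAX];
    size_t head, tail;
};

struct sched_sketch {
    struct req_queue q[PRIO_COUNT];
    size_t running;     /* requests currently in flight */
    size_t max_running; /* parallelism limit */
};

/* Queues a request; returns 0 if that priority's queue is full. */
int sched_add(struct sched_sketch *s, enum sketch_prio p, int id)
{
    struct req_queue *q = &s->q[p];

    if (q->tail == SKETCH_QUEUE_MAX)
        return 0;
    q->ids[q->tail++] = id;
    return 1;
}

/* Starts the next request, highest priority first, if the parallelism
 * limit allows. Returns its id, or -1 if nothing can start. */
int sched_start_next(struct sched_sketch *s)
{
    if (s->running >= s->max_running)
        return -1;
    for (int p = 0; p < PRIO_COUNT; p++) {
        struct req_queue *q = &s->q[p];
        if (q->head < q->tail) {
            s->running++;
            return q->ids[q->head++];
        }
    }
    return -1;
}

/* Marks one in-flight request as finished. */
void sched_done(struct sched_sketch *s)
{
    if (s->running > 0)
        s->running--;
}
```

A high-priority request added late still starts before queued low-priority ones, and the limit keeps the number of simultaneous requests bounded.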
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
fiatjaf has a cheap VPS, connecting remotely to his home bitcoind node.
fiatjaf's latency on bitcoin-cli getblock is between 10 and 37 seconds.
fiatjaf's c-lightning node is getting one block per hour.
fiatjaf is sad.
We single-file our bitcoind requests, because bitcoind has a limited
thread pool and it *fails* rather than queueing if you upset it. We
would probably be fine using separate queues for each command type, but simply
allowing some requests to cut in line should prove my theory that we're
getting stuck behind gossip verification requests.
fiatjaf now gets one block per 2 minutes.
fiatjaf is less sad.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Everything depends on common headers etc, and the HSM_CLIENT_HEADERS was removed
quite a while ago.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would never complete further ping commands if we had fewer responses
than pings. Oops.
Fixes: #1928
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to try it before --daemon, in case we error, but we don't know
the pid yet, so we split into 'lock' and 'write'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we run two daemons on the same directory we'd be getting the failure from
trying to listen to the same file before we'd hit the pid-file error, which was
causing confusion.
The first argument of 'ping' was documented as 'peerid', however
internally it is expected to be just 'id'.
To avoid breaking the API, opt to fix the documentation.
This was found because it means we have a non-zero feerate without
filling in the history of that feerate:
==15895== Conditional jump or move depends on uninitialised value(s)
==15895== at 0x408699: feerate_max (chaintopology.c:828)
==15895== by 0x41BE49: peer_start_openingd (opening_control.c:733)
==15895== by 0x425FE9: peer_connected (peer_control.c:515)
==15895== by 0x40CB8F: connectd_msg (connect_control.c:304)
==15895== by 0x42DB4E: sd_msg_read (subd.c:475)
==15895== by 0x42D499: read_fds (subd.c:302)
==15895== by 0x46EB18: next_plan (io.c:59)
==15895== by 0x46F5E9: do_plan (io.c:387)
==15895== by 0x46F627: io_ready (io.c:397)
==15895== by 0x471187: io_loop (poll.c:310)
==15895== by 0x41683D: main (lightningd.c:732)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Documentation changes:
1. Lots of extra detail suggested by @renepickhardt.
2. typo fixes from @practicalswift.
3. A section on 'const' usage.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Code changes:
1. Expose daemon_poll() so lightningd can call it directly, which avoids us
having to store a global and document it.
2. Remove the (undocumented, unused, forgotten) --rpc-file="" option to disable
JSON RPC.
3. Move the ickiness of finding the executable path into subd.c, so it doesn't
distract from lightningd.c overview.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>