We need several notleak() annotations here:
1. The temporary structure which is handed to retry_peer_connected().
It's waiting for the master to respond to our connect_reconnected
message.
2. We don't keep a pointer to the io_conn for a peer, so we need to
mark those as not being a leak.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids some very ugly switch() statements which mixed the two,
but we also take the chance to rename 'towire_gossip_' to
'towire_gossipd_' for those inter-daemon messages; they're messages to
gossipd, not gossip messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now keep multiple commands for a json_connection, and an array of
json_streams.
When a command wants to write something, we allocate a new json_stream
at the end of the array.
We always output from the first available json_stream; once that
command has finished, we free that and move to the next. Once all are
done, we wake the reader.
This means we won't read a new command if output is still pending, but
as most commands don't start writing until they're ready to write
everything, we still get command parallelism.
In particular, you can now 'waitinvoice' and 'delinvoice' and it will
work even though the 'waitinvoice' blocks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
json_stream_success / json_stream_fail belong in jsonrpc.c, and the
json_tok helpers for special types belong in json.x
json_add_object() isn't used, remove it rather than moving it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We promote 'struct json_stream' to contain the membuf; we only attach
the json_stream to the command when we actually call
json_stream_success / json_stream_fail.
This means we are closer to 'struct json_stream' being an independent
layer; the tests are already modified to use it directly to create
JSON.
This is also the first step toward re-enabling non-serial command
execution.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the final step to get the plugins working. After parsing the
early options (including `--plugin`), then starting and asking the
plugins for options, and finally reading in the options we just
registered, we just need to assemble the options and send them over.
Signed-off-by: Christian Decker <@cdecker>
We also make `--help` a non-early arg so it allows for the plugins to
register their options before printing the help message. The options
themselves are stored in a separate struct inbetween them being
registered and them being forwarded to the plugin. Currently only
supports string options.
Signed-off-by: Christian Decker <@cdecker>
Also includes some sanity checks for the results returned by the
plugin, such as ensuring that the ID is as expected and that we have
either an error or a real result.
The idea is that `plugin` is an early arg that is parsed (from command
line or the config file). We can then start the plugins and have them
tell us about the options they'd like to add to the mix, before we
actually parse them.
Signed-off-by: Christian Decker <@cdecker>
The spec says so, and it's right: with the right pattern of packet loss
(thanks Travis!) the other end can still be in channeld, waiting for our
`shutdown` message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This way there's no need for a context pointer, and freeing a msg_queue
frees its contents, as expected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When developing in regtest or testnet it is really inconvenient to
have to fake traffic and generate blocks just to get estimatesmartfee
to return a valid estimate. This just sets the minfee if bitcoind
doesn't return a valid estimate.
Reported-by: Rene Pickhardt <@renepickhardt>
Signed-off-by: Christian Decker <@cdecker>
The only change is that the final_incorrect_htlc_amount field is now 64
bit. Since no implementation yet parses that field, we just updated it
quietly in the spec.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always have an addr entry in the db (though it may be an ephemeral socket
the peer connected in from). We don't have to handle a NULL address.
While we're there, simplify new_peer not to take the features args;
the caller who cares immediately updates the features anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In e46ce0fc84 I accidentally removed the
actual code which fails the command. As a result, if we retry and it
succeeds later, we can end up "succeeding" the started-failing
command, causing us to hit the 'assert(!cmd->have_json_stream);' in
new_json_stream.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If there are two HTLCs with the same preimage, lightningd would always
find the first one. By including the id in the `struct htlc_stub`
it's both faster (normal HTLC lookup) and allows lightningd to detect
that onchaind wants to fail both of them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We spend quite a bit of time in libsecp256k1 moving them to and from
DER encoding. With a bit of care, we can transfer the raw bytes from
gossipd and manually decode them so a malformed one can't make us
abort().
Before:
real 0m0.629000-0.695000(0.64985+/-0.019)s
After:
real 0m0.359000-0.433000(0.37645+/-0.023)s
At this point, the main issues are 11% of time spent in ccan/io's
backend_wake (I tried using a hash table there, but that actually makes
the small-number-of-fds case slower), and 65% of gossipd's time is
in marshalling the response (all those tal_resize add up!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
My test case is a mainnet gossip store with 22107 channels, and
time to do `lightning-cli listchannels`:
Before: `lightning-cli listchannels` DEVELOPER=0
real 0m1.303000-1.324000(1.3114+/-0.0091)s
After:
real 0m0.629000-0.695000(0.64985+/-0.019)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a very ugly one-liner; really ccan/io should have an io_replan
for this, but it would have to be written carefully as it makes
assumptions currently about plans not changing. In this case, we know
it's in io_write, and we're just moving a pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Such an API is required for when we stream it directly. Almost all our
handlers fit this pattern already, or nearly do.
We remove new_json_result() in favor of explicit json_stream_success()
and json_stream_fail(), but still allowing command_fail() if you just
want a simple all-in-one fail wrapper.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This isn't a big change, since we basically dump the entire JSON
resuly string into the membuf then write it out, but it's prep for the
next changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We occasionaly had a travis hang in test_multirpc, and it's due to a
thinko in the prior patch: if a command completes immediately, it will
do the wake before we go to sleep. That means we don't digest the
rest of the buffer until the next write.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's a DoS if we keep reading commands and don't insist the client
read the responses.
My initial implementation simply removed the io_duplex, but that
doesn't work if we want to inject notifications in the stream (as we
will eventually want to do), so we operate it as duplex but have each
side wake the other when it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
My test case is a mainnet gossip store with 22107 channels, and
time to do `lightning-cli listchannels`:
Before: `lightning-cli listchannels` DEVELOPER=0
real 0m1.396000-1.409000(1.4022+/-0.005)s
After:
real 0m1.307000-1.320000(1.3128+/-0.005)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's the only user of them, and it's going to get optimized.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip.pydiff --git a/common/test/run-json.c b/common/test/run-json.c
index 956fdda35..db52d6b01 100644
Some people were alarmed that the state was set to "Loaded from
database" indefinitely. Saying that we are trying to reconnect may be
more informative.
And use wallet_forward_status_in_db() everywhere in db code.
And clean up extra CHANGELOG.md entry (looks like rebase error?)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The left join should make sure we still get the results but
referencing the fields and/or attempting to write them to the JSON-RPC
result will cause unforeseen problems. So just omit if we forgot
something.
We initialize it to 30 seconds, but it's *always* overridden by the
gossip_init message (and usually to 60 seconds, so it's doubly
misleading).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd provided a generic "get endpoints of this scid" and we only
use it in one place: to look up htlc forwards. But lightningd just
assumed that one would be us.
Instead, provide a simpler API which only returns the peer node
if any, and now we handle it much more gracefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Right now, the `config` file is read *after* the configuration working directory is moved to in the software. However one configuration option `lightning-dir` settable in the `config` file sets this working directory. As the directory is already opened (which defaults to `$HOME/.lightning`) before the configuration is read, the configured directory will not be used.
This patch parses the configuration file before opening the working directory, fixing this bug.
[ Update CHANGELOG.md and man pages -- RR ]
It went something like:
niftynei: Hey, cppcheck complains this might be NULL, so I put in a check.
rusty: cppcheck is dumb. Make it an assert("Rusty always right!").
niftynei: You seem certain of this so I shall do that.
https://github.com/ElementsProject/lightning/pull/1994
...
renepickhardt: I asked fiatjaf to run
`lightning-cli sendpay "[{'id':'02db8f487fcc0a'}]" 4efe0ba89b`
and his node crashed!
rusty: grep Assertion logs/*
lightningd/jsonrpc.c:326: connection_complete_error: Assertion `Rusty is always right!' failed.
It turns out that in the 'can't parse' error case, we hand NULL cmd to
connection_compete_error.
Next time, less asserting, more grepping!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit of overkill now that we simply accumulate the entire
JSON response in the buffer before flushing, but when we move to
streamed responses it allows us to have a single command that has
exclusive access to the out direction of the JSON-RPC connection.
This is the source of failure in the test_restart_many_payments stress
test: we don't commit the outgoing HTLC immediately, instead waiting for
gossip to tell us the peer for the outgoing channel, then waiting for
that channeld to tell is it's committed. The result was incoming HTLCs
with no outgoing.
I initially pushed the HTLCs through that same path, but of course
(since peers are not connected yet!) the only result was that we failed
these HTLCs immediately. So I chose the far simpler course of just
failing them directly.
To reproduce this, I had to increase the test_restart_many_payments
num to 10, and run it with nice -20 taskset -c 0.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
During tests, this is half our log! And Travis truncates it if we get
a failure in test_restart_many_payments.
Interestingly, test_logging had a bug which relied on this spam :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of two code paths that return different help objects, simplify things by
always returning the full help object. This not only includes description and
the command name, but the verbose description as well.
Signed-off-by: William Casarin <jb55@jb55.com>
If another channel has set the optional `htlc_maximum_msat` field,
we should correctly parse that field and respect it when drawing up
routes for payments.
Noted by @cdecker, the term 'local' is grossly overused, and the hout
preimage is basically only used as a sanity check (though I've just put
a FIXME there for now).
Also eliminated spurious blank line which crept into wallet.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't expect payment or payment->route_channels to be NULL without an
old db, but putting an assert there reveals that we try to fail an HTLC
which has already succeeded in 'test_onchain_unwatch'.
Obviously we only want to fail an HTLC which goes onchain if we don't
already have the preimage!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
failoutchannel tells us which channel to send an update for (specifically
for temporary_channel_failure); but we don't save it into the db. It's
not even clear we should, since it's a corner case and the channel might
not even exist when we come back.
So on db restore, change such errors to WIRE_TEMPORARY_NODE_FAILURE
which doesn't need an update.
We also don't memset it to 0 in the normal case (we only access if it
failcode has the UPDATE bit set) so valgrind will trigger if we're
wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't save them to the database, so fix things up as we load them.
Next patch will actually save them into the db, and this will become
COMPAT code.
Also: call htlc_in_check() with NULL on db load, as otherwise it aborts
internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we need to check when we've altered the state, so the checks
are moved to the callers of htlc_in_update_state and htlc_out_update_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
globalfeatures should not be accessed if we haven't received a
channel_update. Treat it like the other fields which are only
initialized and marshalled/unmarshalled if the timestamp is positive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And use ARRAY_SIZE() everywhere which will break compile if it's not a
literal array, plus assertions that it's the same length.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We split json_invoice(), as it now needs to round-trip to the gossipd,
and uniqueness checks need to happen *after* gossipd replies to avoid
a race.
For every candidate channel gossipd gives us, we check that it's in
state NORMAL (not shutting down, not still waiting for lockin), that
it's connected, and that it has capacity. We then choose one with
probability weighted by excess capacity, so larger channels are more
likely.
As a side effect of this, we can tell if an invoice is unpayble (no
channels have sufficient incoming capacity) or difficuly (no *online*
channels have sufficient capacity), so we add those warnings.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For routeboost, we want to select from all our enabled channels with
sufficient incoming capacity. Gossipd knows which are enabled (ie. we
have received a `channel_update` from the peer), but doesn't know the
current incoming capacity.
So we get gossipd to give us all the candidates, and lightningd
selects from those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this a lot, and had boutique helpers in various places. So add
a more generic one; for convenience it returns a pointer to the new
end element.
I prefer the name tal_arr_expand to tal_arr_append, since it's up to
the caller to populate the new array entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The help command now adds command usage to its output by calling each
command handler in CMD_USAGE mode.
Instead of seeing, for example:
decodepay
Decode {bolt11}, using {description} if necessary
we see:
decodepay bolt11 [description]
Decode {bolt11}, using {description} if necessary
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Callers to param() can now optionally set a flag to see if command_fail was
called.
This is necessary because the `cmd` is freed in case of failure.
I spent a bit of time trying to extend the lifetime of the `cmd` to the end
of parse_request(), but the destructors still needed to be called when they
were, and it was getting ugly. So I took this minimal approach.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Now call param() even for commands that don't accept any parameters.
This is a bugfix of sorts. For example, before you could call:
bitcoin-cli getinfo blah
and the blah parameter would be ignored.
Now you will get an error: "too many parameters: got 1, expected 0"
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Added the concept of a "command mode". The
behavior of param() changes based on the mode.
Added and tested the command mode of CMD_USAGE for
setting the usage of a command without running it.
Only infrastructure and test. No functional changes.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
BOLT 7's been updated to split the flags field in `channel_update`
into two: `channel_flags` and `message_flags`. This changeset does the
minimal necessary to get to building with the new flags.
We couldn't use it before because it asserted dbid was non-zero. Remove
assert and save some code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_lightningd__use_hsm_get_client_fd()_helper_for_global_daemons_too.patch':
fixup! lightningd: use hsm_get_client_fd() helper for global daemons too.
Suggested-by: @ZmnSCPxj
That matches the other CSV names (HSM was the first, so it was written
before the pattern emerged).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current code sends hsmstatus_client_bad_request via the req fd;
this won't work, since lightningd uses that synchronously and only
expects a reply to its commands. So send it via status_conn.
We also enhance hsmstatus_client_bad_request to include details, and
create convenience functions for it. Our previous handling was ad-hoc;
we sometimes just closed on the client without telling lightningd,
and sometimes we didn't tell lightningd *which* client was broken.
Also make every handler the exact same prototype, so they now use the
exact same patterns (hsmd *only* handles requests, makes replies).
I tested this manually by corrupting a request to hsmd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently just ignore them. This is one reason the hsm (in some places)
explicitly calls log_broken so we get some idea.
This was the only subdaemon which had a NULL msgcb and msgname, so eliminate
those checks in subd.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The `json_tok_percentage` parser is used for the `fuzzpercent` in `getroute` and
`maxfeepercent` in `pay`. In both cases it seems reasonable to allow values
larger than 100%. This has bitten users in the past when they transferred single
satoshis to things like satoshis.place over a route longer than 2 hops.
With the previous patch, we could still get stuck behind a low-prio
request. Generalize it into separate queues, and allow more than one
request in parallel.
Worth noting that the test time for `VALGRIND=0 pytest -vx tests/ -n 10`
doesn't change measurably.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
fiatjaf has a cheap VPS, connecting remotely to his home bitcoind node.
fiatjaf's latency on bitcoin-cli getblock is between 10 and 37 seconds.
fiatjaf's c-lightning node is getting one block per hour.
fiatjaf is sad.
We single-file our bitcoind requests, because bitcoind has a limited
thread pool and it *fails* rather than queueing if you upset it. We
probably be fine using separate queues for each command type, but simply
allowing some requests to cut in line should prove my theory that we're
getting stuck behind gossip verification requests.
fiatjaf now gets one block per 2 minutes.
fiatjaf is less sad.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Everything depends on common headers etc, and the HSM_CLIENT_HEADERS was removed
quite a while ago.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would never complete further ping commands if we had < responses
than pings. Oops.
Fixes: #1928
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to try it before --daemon, in case we error, but we don't know
the pid yet, so we split into 'lock' and 'write'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we run two daemons on the same directory we'd be getting the failure from
trying to listen to the same file before we'd hit the pid-file error, which was
causing confusion.
The first argument of 'ping' was documented as 'peerid', however
internally it is expected to be just 'id'.
To avoid breaking the API, opt to fix the documentation.
This was found because it means we have a non-zero feerate without
filling in the history of that feerate:
==15895== Conditional jump or move depends on uninitialised value(s)
==15895== at 0x408699: feerate_max (chaintopology.c:828)
==15895== by 0x41BE49: peer_start_openingd (opening_control.c:733)
==15895== by 0x425FE9: peer_connected (peer_control.c:515)
==15895== by 0x40CB8F: connectd_msg (connect_control.c:304)
==15895== by 0x42DB4E: sd_msg_read (subd.c:475)
==15895== by 0x42D499: read_fds (subd.c:302)
==15895== by 0x46EB18: next_plan (io.c:59)
==15895== by 0x46F5E9: do_plan (io.c:387)
==15895== by 0x46F627: io_ready (io.c:397)
==15895== by 0x471187: io_loop (poll.c:310)
==15895== by 0x41683D: main (lightningd.c:732)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Documentation changes:
1. Lots of extra detail suggested by @renepickhardt.
2. typo fixes from @practicalswift.
3. A section on 'const' usage.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Code changes:
1. Expose daemon_poll() so lightningd can call it directly, which avoids us
having store a global and document it.
2. Remove the (undocumented, unused, forgotten) --rpc-file="" option to disable
JSON RPC.
3. Move the ickiness of finding the executable path into subd.c, so it doesn't
distract from lightningd.c overview.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This could have been a local static but its used by the run-param test,
so putting it in json.c made things easier.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
Note: Unlike before, this will now accept positional parameters.
Note: In case of error we no longer report the hop number. Is this acceptable?
We still report the name of the bad param and its value.
One option is to log the hop number if param() returns false. This would require
a change to command_fail so it doesn't delete the cmd, so we can still
access cmd->ld->log.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
The `json_tok_X` functions now consistently check the success case first
and call `command_fail` at the end.
Signed-off-by: Mark Beckwith <wythe@intrig.com>