It went something like:
niftynei: Hey, cppcheck complains this might be NULL, so I put in a check.
rusty: cppcheck is dumb. Make it an assert("Rusty always right!").
niftynei: You seem certain of this so I shall do that.
https://github.com/ElementsProject/lightning/pull/1994
...
renepickhardt: I asked fiatjaf to run
`lightning-cli sendpay "[{'id':'02db8f487fcc0a'}]" 4efe0ba89b`
and his node crashed!
rusty: grep Assertion logs/*
lightningd/jsonrpc.c:326: connection_complete_error: Assertion `Rusty is always right!' failed.
It turns out that in the 'can't parse' error case, we hand NULL cmd to
connection_compete_error.
Next time, less asserting, more grepping!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Totally forgot to add this test. It just shows how a writer can take
exclusive access of a socket over multiple `io_write` calls, and
queuing all others behind it.
This is a bit of overkill now that we simply accumulate the entire
JSON response in the buffer before flushing, but when we move to
streamed responses it allows us to have a single command that has
exclusive access to the out direction of the JSON-RPC connection.
We've done this a number of times already where we're getting
exclusive access to either the out direction of a connection, or we
try to lock out the read side while we are responding to a previous
request. They usually are really cumbersome because we reach around to
the other direction to stop it from proceeding, or we flag our
exclusive access somewhere, and we always need to know whom to notify.
PR ElementsProject/lightning#1970 adds two new instances of this:
- Streaming a JSON response requires that nothing else should write
while the stream is active.
- We also want to stop reading new requests while we are responding
to one.
To remove the complexity of having to know whom to stop and notify
when we're done, this adds a simple `io_lock` primitive that can be
used to get exclusive access to a connection. This inverts the
requirement for notifications, since everybody registers interest in
the lock and they get notified if the lock holder releases it.
This is the source of failure in the test_restart_many_payments stress
test: we don't commit the outgoing HTLC immediately, instead waiting for
gossip to tell us the peer for the outgoing channel, then waiting for
that channeld to tell is it's committed. The result was incoming HTLCs
with no outgoing.
I initially pushed the HTLCs through that same path, but of course
(since peers are not connected yet!) the only result was that we failed
these HTLCs immediately. So I chose the far simpler course of just
failing them directly.
To reproduce this, I had to increase the test_restart_many_payments
num to 10, and run it with nice -20 taskset -c 0.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
During tests, this is half our log! And Travis truncates it if we get
a failure in test_restart_many_payments.
Interestingly, test_logging had a bug which relied on this spam :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we're returning all the help data, we need to update the human_help
formatter to handle the extra data.
Signed-off-by: William Casarin <jb55@jb55.com>
Instead of exiting when we can't find a manpage, set the command and continue so
that we can try the json rpc for help.
Signed-off-by: William Casarin <jb55@jb55.com>
Instead of two code paths that return different help objects, simplify things by
always returning the full help object. This not only includes description and
the command name, but the verbose description as well.
Signed-off-by: William Casarin <jb55@jb55.com>
If another channel has set the optional `htlc_maximum_msat` field,
we should correctly parse that field and respect it when drawing up
routes for payments.
Noted by @cdecker, the term 'local' is grossly overused, and the hout
preimage is basically only used as a sanity check (though I've just put
a FIXME there for now).
Also eliminated spurious blank line which crept into wallet.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't expect payment or payment->route_channels to be NULL without an
old db, but putting an assert there reveals that we try to fail an HTLC
which has already succeeded in 'test_onchain_unwatch'.
Obviously we only want to fail an HTLC which goes onchain if we don't
already have the preimage!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
failoutchannel tells us which channel to send an update for (specifically
for temporary_channel_failure); but we don't save it into the db. It's
not even clear we should, since it's a corner case and the channel might
not even exist when we come back.
So on db restore, change such errors to WIRE_TEMPORARY_NODE_FAILURE
which doesn't need an update.
We also don't memset it to 0 in the normal case (we only access if it
failcode has the UPDATE bit set) so valgrind will trigger if we're
wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't save them to the database, so fix things up as we load them.
Next patch will actually save them into the db, and this will become
COMPAT code.
Also: call htlc_in_check() with NULL on db load, as otherwise it aborts
internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, it fails (test_restart_many_payments, the next test).
I suspect a deep misunderstanding in the comparison code, will chase
separately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Usually, we only close an incoming HTLC once the outgoing HTLC is completely
resolved. However, we short-cut this in the FULFILL case: we have the
preimage, so might as well use it immediately (in fact, we wait for it to
be committed, but we don't need to in theory).
As a side-effect of this, our assumption that every outgoing HTLC has
a corresponding incoming HTLC is incorrect, and this test (xfail) tickles
that corner case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we need to check when we've altered the state, so the checks
are moved to the callers of htlc_in_update_state and htlc_out_update_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
globalfeatures should not be accessed if we haven't received a
channel_update. Treat it like the other fields which are only
initialized and marshalled/unmarshalled if the timestamp is positive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It seems to be having a bit of trouble understanding the control flow to realize
it's not actually uninitialized.
Add an error handler after the switch in case we miss a real uninitialized error
in the future.
Signed-off-by: William Casarin <jb55@jb55.com>