Up until now, riskfactor was useless due to implementation bugs, and
also the default setting is wrong (too low to have an effect on
reasonable payment scenarios).
Let's simplify the definition (by assuming that P(failure) of a node
is 1), to make it a simple percentage. I examined the current network
fees to see what would work, and under this definition, a default of
10 seems reasonable (equivalent to 1000 under the old definition).
It is *this* change which finally fixes our test case! The riskfactor
is now 40msat (1500000 * 14 * 10 / 5259600 = 39.9), comparable with
worst-case fuzz is 50msat (1001 * 0.05 = 50).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were only comparing by total msatoshis.
Note, this *still* isn't sufficient to fix our indirect problem, as
our risk values are all 1 (the minimum):
lightning_gossipd(25480): 2 hop solution: 1501990 + 2
lightning_gossipd(25480): 3 hop solution: 1501971 + 3
...
lightning_gossipd(25480): => chose 3 hop solution
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used a u16, and a 1000 multiplier, which meant we wrapped at
riskfactor 66. We also never undid the multiplier, so we ended up
applying 1000x the riskfactor they specified.
This changes us to pass the riskfactor with a 1M multiplier. The next
patch changes the definition of riskfactor to be more useful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The test sometimes passes: our routing logic always chooses between
the shorter of two equal-cost routes (because we compare best with <
not <=).
By adding another hop, we add more noise, and by making the alternate
route fee 0 we provide the worst case.
But to be fair, we make the amount of the payment ~50c (15,000,000
msat), and increase our cltv-delay to 14 and fee-base 1000 to match
mainnet. The final patch shows the effect of this choice.
Otherwise our risk penalty is completely in the noise on
mainnet which has the vast majority of fees set at 1000msat + 1ppm.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the direct cause of the failure of the original
test_pay_direct test and it makes sense: invoice routehints may not be
necessary, so try without them *first* rather than last.
We didn't mention the use of routehints in CHANGELOG at all yet, so
do that now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a seed, which is for (future!) unit testing consistency. This
makes it change every time, so our pay_direct_test is more useful.
I tried restarting the noed around the loop, but it tended to fail
rebinding to the same port for some reason?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add the change output to owned_txfilter so its entry in db will
get a confirmation_height when detected in a block by filter_block_txs
before this commit, after a 'withdraw' command, 'listfunds' would
not show our change outputs as confirmed
Modified the log message in wallet_extract_owned_outputs to
append 'CONFIRMED' when it is called with a blockheight arg.
To make distinction between (1st call) when adding owned output to the
db and (2th call) when confirmed in block.
Otherwise a straight "make install" gives:
install: cannot stat 'plugins/pay': No such file or directory
make: *** [Makefile:482: install-program] Error 1
Fixes: #2288
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Similar to the previous "handle peer input before gossip input", this
fixes similar potential deadlock for closingd and openingd which use
peer_or_gossip_sync_read.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This solves (or at least reduces probability of) a deadlock in channeld
when there is lot of gossip traffic, see issue #2286. That issue is
almost identical to #1943 (deadlock in openingd) and so is the fix.
- result fundchannel command now depends on successful or failed broadcast of the funding tx
- failure returns error code FUNDING_BROADCAST_FAIL
- don't fail the channel when broadcast failed, but keep in CHANNELD_AWAITING_LOCKIN
- after fixing the initial broadcast failure, the user could manually rebroadcast the tx and
keep the channel
openingd/opening_funder_finished:
- broadcast_tx callback function now handles both success and failure
jsonrpc: added error code FUNDING_BROADCAST_FAIL
manpage: added error code returned by fundchannel command
This makes the user more aware of broadcast failure, so it hopefully doesn't
try to broadcast new tx's that depend on its change_outputs. Some users have reported (see
issue #2171) a whole sequence of fundings failing, because each funding was using the change
output of the previous one, which would not confirm.
We actually produce an invalid JSON error at the moment: bitcoin-cli
complains "JSON value is not an integer as expected" rather than returning
the given error. Make our error a valid JSON RPC error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It is suddenly timing out a lot and is breaking master, so we
temporarily disable it until it is fixed.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were restarting the with the nodes before, which was causing some
port contention. This is more natural since `bitcoind` will take care
of terminating all proxies it returned.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Logging an empty line (without newline character) would raise an
Exception due to out of bounds check.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The use of `json_tok_full_len` and `json_tok_full` in addition to
single quotes will result in double quoting, which is really weird. I
opted to single quoting using `'` instead which does not need to be
escaped.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
My manpage viewer did not know what to do with curly braces,
so I switched them to quotes and it works fine.
Signed-off-by: Mark Beckwith <wythe@intrig.com>
List the final one instead; if there's an error from the node it
may actually make sense to blame that channel (ie. previous node
did something wrong).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We no longer need a 'sendpay_result' structure, we can pass
appropriate parameter directly now they're simple calls.
Every waitsendpay command ends in tell_waiters_failed or
tell_waiters_success, which call sendpay_success or sendpay_fail on
all matching waiters. These all return 'struct command_result *'.
In cases where the result is immediately known, we call
sendpay_success/sendpay_fail directly for the command.
This also adds a helpful 'failcodename' field to the JSON output.
[ This was four separate cleanup patches, but that contained much
redundancy and was even worse to review ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With only one caller, we don't need a callback pointer any more; we can simply
call the function.
This required some code shuffling, and I changed the callback function
arguments to be in a more natural order, now they're not used as
callbacks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we don't have a second caller for these routines, we can move
them back into pay.c and make the functions static.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a general rule, lightningd shouldn't parse user packets. We move the
parsing into gossipd, and have it respond only to permanent failures.
Note that we should *not* unconditionally remove a channel on
WIRE_INVALID_ONION_HMAC, as this can be triggered (and we do!) by
feeding sendpay a route with an incorrect pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a bug 0ba547ee10 caused by
short_channel_id overflow. If we'd caught this, we'd have terminated
the peer instead of crashing, so add appropriate checks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I upgraded my node with --disable-compat, and a heap of channels closed like:
CHANNELD_NORMAL:We disagree on short_channel_ids: I have 557653x0x1351, you say 557653x2373x1",
This is because the scids are strings in the databases, and it failed to parse
them properly.
Now we'll not start if that happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Internally libplugin turns ' into ", which causes these messages to produce
bad JSON.
The real fix is to remove the '->" convenience substitution and port the
JSON creation APIs into common/ from lightningd/
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Spurious errors were occuring around checking the provided
current commitment point from the peer on reconnect when
option_data_loss_protect is enabled. The problem was that
we were using an inaccurate measure to screen for which
commitment point to compare the peer's provided one to.
This fixes the problem with screening, plus makes our
data_loss test a teensy bit more robust.
Valgrind seems to be slowing the pay-plugin down enough for the 10
seconds timeout to get triggered on a semi-regular basis.
Reported-by: Rusty Russell <@rustyrussell>
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Christian points out that we can iterate by ->size rather than calling
json_next() to find the end (which traverses the entire object!).
Now ->size is reliable (since previous patch), this is OK.
Reported-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jsmn would accept invalid JSON objects. This is bad because it would
set ->size incorrectly: we expect to have at least size * 2 tokens (in
pairs). We want to rely on ->size, but this would create an exploitable
buffer overflow!
Fortunately, this is fixed upstream, so we add a test and upgrade to v1.0.0.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Wasn't using valid JSON, but worked anyway. This is actually OK
because we don't rely on tok->size, but we want to, so another fix
coming.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The external/jsmn/README.md only says:
int size; // Number of child (nested) tokens
But it only counts *direct* children, or *direct* members for an object.
This test verifies this (the bug proved to be elsewhere: see next patch!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Avoid the unnecessary extra var, and don't use "capacity" since
that usually refers to static capacity.
Reported-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Had a couple of tests randomly fail because a valgrind error file was
not empty. It contained:
lightning_channeld: Writing out status 65520: Broken pipe
This shouldn't happen, but the simplest workaround is not to print
that (useless) error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So add a new 'strategy' field. This makes it clearer what is going
on, currently one of:
* "Initial attempt"
* "Excluded channel <scid>"
* "Removed route hint"
* "Excluded expensive channel <scid>"
* "Excluded delaying channel <scid>"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>