Paid invoices need to know how much was actually paid: both for the case
where no 'msatoshi' amount was specified, and for the normal case, where
clients are permitted to overpay in order to help them disguise their
payments.
While we migrate the db, we leave this field as 0 for old paid
invoices. This is unhelpful for accounting, but at least clearly
indicates what happened if we find this in the wild.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'rhash' is the old terminology, but 'payment_preimage' and
'payment_hash' were decided on for the BOLTs, so we should fix that here.
We still use rhash internally, but that's much easier to fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> assert [c['active'] for c in l2.rpc.getchannels()['channels']] == [True, True]
E AssertionError: assert [False, True] == [True, True]
E At index 0 diff: False != True
E Full diff:
E - [False, True]
E + [True, True]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to make sure all the updates are known to gossip. Since
one is the local update, we change that message to look the same.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Receiving them in channeld is not enough to avoid the race:
route = l1.rpc.getroute(l3.info['id'], 4999999, 1)["route"]
...
ValueError: RPC call failed: Could not find a route
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wait the the receipt of the CHANNEL_UPDATE message by channeld,
but that doesn't mean it reached gossipd yet, causing spurious test
failure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we seem to have some isolation concerns when re-generating the
same HSM secret and re-parsing the blockchain some blocks in the past.
This also alleviates the problem of printing to a logging stream that
has been closed. Previously bitcoind would keep running despite a test
had failed and continue logging to the, now closed, StringIO that
py.test uses when capturing stdout.
The performance impact seems to be 1-3 second per test, not too bad
IMHO for increased test isolation and cleaner logs:
|--------------------+---------------+----------|
| | No_valgrind | Valgrind |
|--------------------+---------------+----------|
| bitcoind per suite | 10 min 24 sec | 46:15.31 |
| bitcoind per test | 11 min 38 sec | 49:21.64 |
|--------------------+---------------+----------|
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I was examining a test_onchain_timeout failure, and realized that we
were forgetting a peer even though we'd just spent the HTLC_TIMEOUT_TX!
This reveals that we weren't resolving an output when we stole the preimage
from it, like we should.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we panic when we see our root reorg out, even if we're not doing
anything yet, restoring the 100 block margin is the simplest fix.
Unfortunately this means adding a 100-block spacer in the tests, so things
don't get confused.
Fixes: #511
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We stopped too early; we should continue and make sure it all goes well.
This means we have to fix them to be deterministic: by generating 2
blocks at once in test_htlc_in_timeout, we raced between fulfill and
timeout on the HTLC. Now it's always fulfilled.
Also, fixed confusing comments: l1 doesn't drop to chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
OUR_HTLC_TIMEOUT_TO_US = normal tx, used to timeout htlc in their commit tx.
OUR_HTLC_TIMEOUT_TX = dual-sig tx with delay, used to timeout htlc in our commit tx.
Only one test looks at that string, so fix that too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The test_reconnect_normal test is failing rather consistently on 32bit
architectures, disabling to reduce noise. Issue #468 tracks progress.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Relatively simple: until we reach funding-depth the channels should be
known locally, so we can already route through them, but they should
not be announced to peers to which the connection is non-local.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If send_htlc_out() fails, it doesn't initialize pc->out; that can
make us think it's still in progress.
Reported-by: Jonas Nick
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should not disconnect from a peer just because it fails opening; we
should return it to gossipd, and give a meaningful error.
Closes: #401
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't use it yet, but now we'll decode correctly.
See: https://github.com/lightningnetwork/lightning-rfc/pull/317
lightning-rfc commit: ef053c09431442697ab46e83f9d3f86e3510a18e
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If you run locally, it fails occasionally; presumably because it
sees previous funds. Use a random HSM key for that teste.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I noticed some breakage with git master:
1. getinfo no longer supported (for us, use getblockchaininfo)
2. generate no longer supported (use generatetoaddress)
Both these options are supported at least in 0.15, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems to happen when we manage to check between the
channel_announcement and the channel_update being processed.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Add two simple tests: one for a single direct payment and one with
hundreds of parallel payments, reusing the same route.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are announcing that we are willing to accept incoming payments with
current_height + min_final_cltv_expiry + slack, assuming that the
sender adds some slack. In particular we'd reject the payment if
slack=0 which is allowed by the spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Raising the exception about non-zero exit values into the
teardown. This was previously masking the valgrind errors. Now
valgrind errors > crash errors > non-zero return value.
Still hoping to catch that elusive [7, 0] return value on travis.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These need to be different for testing the example in BOLT 11.
We also use the cltv_final instead of deadline_blocks in the final hop:
various tests assumed 5 was OK, so we tweak utils.py.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I run with ulimit -c unlimited, and valgrind leaves core files like
valgrind-errors.22114.core.22114 which test_lightning.py tries to
parse as log files.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It crashes under valgrind, causing a valgrind error: valgrind gives us a
backtrace anyway, so we don't need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones. We can handle many callbacks internally now.
We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently assume the daemon gives up; gossipd won't, and we want to
test it there too.
This reveals a bug (returning io_close() is bad if the call is to
duplex()), and breaks a test which now continues after dropping a
packet..
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful if we want to drop & suppress, for example. We change '=' to mean
do nothing to the packet.
We use this to clean up the test_reconnect_sender_add test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make the ip/port optional, so they should go at the end.
In addition, using ip:port is nicer, for gethostbyaddr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer. It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.
The free order is now different, and the test_reconnect_openingd was
overprecise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we have HTLC persistence we'd also like to test it. This
kills the second node in the middle of an HTLC, it'll recover and
finish the flow.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Moved the flagging for allowed failures into the factory getter, and
renamed into `may_fail`. Also stopped the teardown of a node from
throwing an exception if we are allowed to exit non-cleanly.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
A failed returncode check could result in the cleanup for other
lightningds to be skipped. Now make sure to cleanup all and then
rethrow an exception that contains all returncodes.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We used to simply kill the daemon, which in some cases could result in
half-written crashlogs and similar artifacts such as half-completed
RPC calls. Now we ask lightningd to stop nicely, give it some time and
only then kill it. We also return the returncode of the daemon.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
peer_fail_permanent() frees peer->owner, but for bad_peer() we're
being called by the sd->badpeercb(), which then goes on to
io_close(conn) which is a child of sd.
We need to detach the two for this case, so neither tries to free the
other.
This leads to a corner case when the subd exits after the peer is gone:
subd->peer is NULL, so we have to handle that too.
Fixes: #282
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that it should really be a flag to daemon on construction, too,
but that may interfere with another concurrent branch so I've deferred.
Suggested-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We re-use the value for reasonable_depth given by the master, and we
tell it when our timeout transactions reach that depth.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we sent out an HTLC-Timeout or HTLC-Success tx, we need to spend
it after the timeout so it's safely in our wallet.
We generalize the tx_type OUR_UNILATERAL_TO_US_RETURN_TO_WALLET to
OUR_DELAYED_RETURN_TO_WALLET, since we use it for HTLC transactions too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we see an offered HTLC onchain, we need to use the preimage if we
know it. So we dump all the known HTLC preimages at startup, and send
new ones as we discover them.
This doesn't cover preimages we know because we're the final
recipient; that can happen if an HTLC hasn't been irrevocably
committed yet. We'll do that in a followup patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lightningd can crash on shutdown if it's in the middle of getchaintips;
we free the conn, the finished callback is called (process_chaintips),
and it reports that it received an empty result.
The simplest fix is to set a flag in the struct bitcoind destructor,
and avoid the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We simply kill lightningd; we should stop it properly and have a timeout
to kill it if that fails. However, that's beyond my python skills :(
So we just look for crash.log. Unfortunately, we usually kill
lightningd before it's finished writing it. So we look for it and
don't kill lightningd, just wait in this case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the step where we broadcast the transaction to the network and
a nice place to extract the change from the transaction.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
For the permfail tests the sendpay call is supposed to fail, so this
was printing stacktraces upon success. Running in futures captures any
thrown exceptions and rethrows them when calling `result()`, in our
case we just ignore them.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far we were always using the deadline in the announcements, that's
obviously not good, so this introduces the parameter as per spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't killing it. Eventually it would die, and peer_owner_finished()
would access subd->peer->owner, but that peer was freed already.
Closes: #261
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jl777 reported a crash when we try to pay past reserve. Fix that (and
a whole class of related bugs) and add tests.
In test_lightning.py I had to make non-async path for sendpay() non-threaded
to get the exception passed through for testing.
Closes: #236
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was hoping to defer HTLC updates until we actually store HTLCs, but
we need to flush to DB whenever balances update as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the big one, and it's completely anticlimactic: it loads all
channels that have reached opening and are not marked as
closingd_complete into memory, that's it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We'll need this for testing nodes going down during payment.
However, there's no good way to silence the threads that I can tell,
so we get a nasty backtrace from it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tracked down a bug, and couldn't figure out why valgrind wasn't
finding it.
From man valgrind:
--log-file=<filename>
Specifies that Valgrind should send all of its messages to the
specified file. If the file name is empty, it causes an abort.
There are three special format specifiers that can be used in the
file name.
%p is replaced with the current process ID. This is very useful for
program that invoke multiple processes. WARNING: If you use
--trace-children=yes and your program invokes multiple processes OR
your program forks without calling exec afterwards, and you don't
use this specifier (or the %q specifier below), the Valgrind output
from all those processes will go into one file, possibly jumbled
up, and possibly incomplete.
"possibly incomplete" indeed!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't registering reconnecting peers for broadcasts. Just
starting a timer is enough. Also added an integration test to check
that the gossip sync is being resumed.
This is a transitional state, while we're waiting to see the
closing tx onchain (which is To Be Implemented).
The simplest way to do re-transmission is to re-use closingd, and just
disallow any updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually don't need to transition if we're reconnecting, and logic
to go to CHANNELD_NORMAL was wrong: we checked that we'd seen funding tx
locked, but not that we'd received a msg from the remote peer.
We need to fix the tests now we no longer double-transition, too.
Fixes: #188
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we drop an HTLC_ADD packet, sometimes the commit timer fires
before we try to read from the fd. In this case, the payment is
considered committed and we don't fail.
We need manual commit to work around this, and also we'd need to do
the pay command asynchronously from python because it will block.
That's a bit out of scope for now, so just handle either way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep the scriptpubkey to send until after a commitment_signed (or,
in the corner case, if there's no pending commitment). When we
receive a shutdown from the peer, we pass it up to the master.
It's up to the master not to add any more HTLCs, which works because
we move from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's fairly ad-hoc, but we need to tell it to channeld when
it restarts, so we define it as the non-HTLC balance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When adding their HTLCs, it needs all the information. When failing,
it needs the id as key and the failure reason. When fulfilling, it
needs the id and payment preimage.
It also needs to know when we have received an revoke_and_ack or a
commitment_signed, to place in the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On my laptop under load, 5 seconds was no longer enough for legacy.
But this breaks async (they all see mempool increase, and fire
prematurely), so stop doing that.
I can't get this test to work at all, in fact, without this patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I actually hit this very hard to reproduce race: if we haven't process
the channeld message when block #6 comes in, we won't send the gossip
message. We wait for logs, but don't generate new blocks, and timeout
on l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES').
The solution, which also tests that we don't send announcement signatures
immediately, is to generate a single block, wait for CHANNELD_NORMAL,
then (in gossip tests), generate 5 more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>