d822ba1ee accidentally removed this case, which is important: if the
other side didn't get our final matching closing_signed, it will
reconnect and try again. We consider the channel no longer "active"
and thus ignore it, and get upset when it send the
`channel_reestablish` message.
We could just consider CLOSINGD_COMPLETE to be active, but then we'd
have to wait for the closing transaction to be mined before we'd allow
another connection.
We can't special case it when the peer reconnects, because there
could be (in theory) multiple channels for that peer in CLOSINGD_COMPLETE,
and we don't know which one to reestablish.
So, we need to catch this when they send the reestablish, and hand
that msg to closingd to do negotiation again. We already have code
to note that we're in CLOSINGD_COMPLETE and thus ignore any result
it gives us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to remove automatic retrying of connect, and that uncovered
that we actually print out our "Server started" message before we create
the listening socket.
Move the init higher (outside the db transaction) and make it a
request/response, the loop until it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The new connect code revealed an existing race: we tell gossipd to
release the peer, but at the same time it connects in. gossipd fails
the release because the peer is remote, and json_fundchannel fails.
Instead, we catch this race when we get peer_connected() and we were
trying to open a channel. It means keeping a list of fundchannels which
are awaiting a gossipd response though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep waching if we cant't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We'd usually commit to the db soon, but there's a window where it
could be missed.
Also moves loc into the block it's used and make it tmpctx to avoid
an explicit free.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Without this, we can get errors on shutdown:
Valgrind error file: valgrind-errors.27444
==27444== Invalid read of size 8
==27444== at 0x1950E2: secp256k1_pubkey_load (secp256k1.c:127)
==27444== by 0x19CF87: secp256k1_ec_pubkey_serialize (secp256k1.c:189)
==27444== by 0x14FED9: towire_pubkey (towire.c:59)
==27444== by 0x15AAFB: towire_gossipctl_peer_disconnected (gen_gossip_wire.c:969)
==27444== by 0x1253EF: opening_channel_errmsg (opening_control.c:526)
==27444== by 0x1386A3: destroy_subd (subd.c:589)
==27444== by 0x18222C: notify (tal.c:240)
==27444== by 0x1826E1: del_tree (tal.c:400)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11FC53: main (lightningd.c:410)
==27444== Address 0x6c3af98 is 72 bytes inside a block of size 216 free'd
==27444== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x1827BC: del_tree (tal.c:421)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11F3C7: shutdown_subdaemons (lightningd.c:211)
==27444== by 0x11FC27: main (lightningd.c:406)
==27444== Block was alloc'd at
==27444== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x182296: allocate (tal.c:250)
==27444== by 0x182863: tal_alloc_ (tal.c:448)
==27444== by 0x12F2DF: new_peer (peer_control.c:74)
==27444== by 0x125600: new_uncommitted_channel (opening_control.c:576)
==27444== by 0x125870: peer_accept_channel (opening_control.c:668)
==27444== by 0x13032A: peer_sent_nongossip (peer_control.c:427)
==27444== by 0x116B9E: peer_nongossip (gossip_control.c:60)
==27444== by 0x116F2B: gossip_msg (gossip_control.c:172)
==27444== by 0x138323: sd_msg_read (subd.c:503)
==27444== by 0x137C02: read_fds (subd.c:330)
==27444== by 0x175550: next_plan (io.c:59)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also report tx and txid, and whether we closed unilaterally or
bilaterally, if we could close the channel.
Also make a manpage.
Fixes: #1207Fixes: #714Fixes: #622
We had an intermittant test failure, where the fee we negotiated was
further from our ideal than the final commitment transaction. It worked
fine if the other side sent the mutual close first, but not if we sent
our unilateral close first.
ERROR: test_closing_different_fees (__main__.LightningDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_lightningd.py", line 1319, in test_closing_different_fees
wait_for(lambda: p.rpc.listpeers(l1.info['id'])['peers'][0]['channels'][0]['status'][1] == 'ONCHAIN:Tracking mutual close transaction')
File "tests/test_lightningd.py", line 74, in wait_for
raise ValueError("Error waiting for {}", success)
ValueError: ('Error waiting for {}', <function LightningDTests.test_closing_different_fees.<locals>.<lambda> at 0x7f4b43e31a60>)
Really, if we're prepared to negotiate it, we should be prepared to
accept it ourselves. Simply take the cheapest tx which is above our
minimum.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only use for these was to compute their txids so we could notify depth
in case of reorgs.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are slowly hollowing out the in-memory blockchain representation to make
restarts easier.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
All of the callback functions were only using the tx to generate the txid again,
so we just pass that in directly and save passing the tx itself.
This is a simplification to move to the DB backed depth callbacks. It'd be
rather wasteful to read the rawtx and deserialize just to serialize right away
again to find the txid, when we already searched the DB for exactly that txid.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This will later allow us to determine the transaction confirmation count, and
recover transactions for rebroadcasts.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
In the short_channel_id check we were copying the entire result into the next
bitcoin-cli call, including the newline character.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-By: @gdassori
Creating the pid-file before daemonizing results in the pid-file containing the
pid of the process that started the daemon, but is now dead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-By: Torkel Rogstad @torkelrogstad
We can have more than one; eg we might offer both bech32 and a p2sh
address, and in future we might offer v1 segwit, etc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are two very hard problems in software engineering:
1. Off-by-one errors
In this case we were rolling back further than needed and we were starting the
catchup one block further than expected.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I saw a failure in test_funding_fail():
assert l2.rpc.listpeers()['peers'][0]['connected']
This can happen if l2 hasn't yet handed back to gossipd. Turns out
we didn't mark uncommitted channels as connected:
[{'id': '03afa3c78bb39217feb8aac308852e6383d59409839c2b91955b2d992421f4a41e', 'connected': False, 'channels': [{'state': 'OPENINGD', 'owner': 'lightning_openingd', 'funder': 'REMOTE', 'status': ['Incoming channel: accepted, now waiting for them to create funding tx']}]}]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is probably covered by our "channel capacity" heuristic which
requires the channel be significant, but best to be explicit and sure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So we know how much counterparty could theoretically steal from us
(msatoshi_to_us - msatoshi_to_us_min) and how much we could
theoretically steal from counterparty (msatoshi_to_us_max -
msatoshi_to_us).
For more piloting goodness.
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I didn't convert all tests: they can still use a standalone context.
It's just marginally more efficient to share the libwally one for all
our daemons which link against it anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we only remember the actions that added channels then we'd restore them when
re-reading the gossip_store, so put a tombstone in there to remember to delete
it. These will be cleared upon re-writing the store since the announcements wont
be written anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is necessary since we have onchaind tell us about the
their_unilateral/to_us output, after it is already in a block.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We don't handle \u, since we assume everyone sane is using UTF-8. We'd
still have to reject '\u0000' and maybe other weird cases if we did.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we may want to extend the on-disk format by adding custom information we
may as well just go the extra mile and reuse the serialization primitives we
already have.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
But only if we're actually going to change the feerate, otherwise we'd
log every time.
Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Naively, this would be 250 satoshi per sipa, but it's not since bitcoind's
fee calculation was not rewritten to deal with weight, but instead bolted
on using vbytes.
The resulting calculations made me cry; I dried my tears on the thorns
of BUILD_ASSERT (I know that makes no sense, but bear with me here as I'm
trying not to swear at my bitcoind colleagues right now).
Fixes: #1194
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This bug is a classic case of being lazy:
1. peer_accept_channel() allocated its return off the input message,
rather than taking an explicit allocation context. This concealed the
lifetime nature of the return.
2. The context for sanitize_error was the error itself, rather than the
more obvious tmpctx (connect_failed does not take).
The global tmpctx removes the "efficiency" excuse for grabbing a random
object to use as context, and is also nice and explicit.
All-the-hard-work-by: @ZmnSCPxj
This fixes the root cause of https://github.com/ElementsProject/lightning/issues/1212
where we deleted the payment because we wanted to retry, then retry failed
so we had an (old) HTLC without a matching payment. We then fed that
HTLC to onchaind, which tells us it's missing, and we try to fail the
payment and deref a NULL pointer.
Fixes: #1212
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would `block_map_add` inside `add_tip`, but we never
`block_map_del` inside `remove_tip`, which is dangerous as
we actually `tal_free` the block inside `remove_tip`.
Our CI did not reliably trap this problem since block
hashes are random and rerunning the `test_blockchaintrack`
often passed spuriously.
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did a brief audit of tmpctx uses, and we do leak them in various
corner cases. Fortunely, all our daemons are based on some kind of
I/O loop, so it's fairly easy to clean a global tmpctx at that point.
This makes things a bit neater, and slightly more efficient, but also
clearer: I avoided creating a tmpctx in a few places because I didn't
want to add another allocation. With that penalty removed, I can use
it more freely and hopefully write clearer code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Needed for particular race condition: client calls `sendpay` with
intent to call `waitsendpay` later to get information, but the
payment fails after `sendpay` returns but before client can invoke
`waitsendpay`.
This lets client know of information even if it manages to invoke
`waitsendpay` "late".
As we add more features, the current code is insufficient.
1. Keep an array of single feature bits, for easy switching on and off.
2. Create feature_offered() which checks for both compulsory and optional
variants.
3. Invert requires_unsupported_features() and unsupported_features()
which tend to be double-negative, all_supported_features() and
features_supported().
4. Move single feature definition from wire/peer_wire.h to common/features.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Improve usability in these scenarios:
* bitcoin-cli not available in PATH and/or bitcoind not running
* bitcoin-cli available in PATH but bitcoind is not running
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our testing also reveals a bug: we start lightningd and shut it down
before fully processing the blockchain, so we don't set
last_processed_block. Fix that by setting it immediately once we have
a block: worst case it goes backwards a little.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In general, it is true that accessors should take const and discard it,
but chainparams is *always* const.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Transaction filters are strongly related to the wallet, this move just
makes it a bit more explicit.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I leave all the now-unnecessary accessors in place to avoid churn, but
the use of bitfields has been more pain than help.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Let's have a simple function that allows us to check whether a channel
still has an HTLC open.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The following command can be used to trigger these messages:
```
$ timeout 0.01 cli/lightning-cli connect [insert-syntactically-valid-peer-id-here] 123.123.123.123 # where 123.123.123.123 is unreachable
```
These error codes will cause `pay` to retry, so `pay` will never
actually report those error codes.
Those error codes will only get reported at the `sendpay` level.
* Modifies invoice command to have the following format
invoice <msatoshi> <label> <desc> <?expiry> <?fallbackaddr>
* Adds support for Segwit bcrt1 addresses for withdraw
* Add test case for fallback address in invoice creation
* Create a common json_tok_address_scriptpubkey to be used
by invoice and withdraw commands.
There are two recurring calls: the estimatefee call and the
getblockcount call. Currently we simply discard them on error, the
timer isn't rearmed.
This should fix a number of cases where bitcoind has an intermittant
failure and lightningd simply stops collecting blocks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, process_getblockhash() exits with status 8 when the block
number is out of range, which is expected. Any other exit status should
be treated as a spurious error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The billboard is now far more useful to tell what's going on, and this
gets us closer to a state == owner mapping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use NULL on the callback to mean "clear the slot", and call it.
We have do this in two places: the old daemon might die, or the new
daemon might start first.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each state (effectively, each daemon) has two slots: a permanent slot
if something permanent happens (usually, a failure), and a transient
slot which summarizes what's happening right now.
Uncommitted channels only have a transient slot, by their very nature.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In #1018 we got no information, except "Internal error". At least
if we tell the other side what went wrong, we're more likely to get
an answer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the source channel is onchain, we try to send a message to onchaind
which (1) doesn't care, (2) doesn't take a channel_fail_htlc msg, and
(3) causes us to crash in subd.c:
assert(!strstarts(sd->msgname(fromwire_peektype(msg_out)), "INVALID"));
Fixes: #821
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also fold opening_got_hsm_funding_sig() into the caller; it was
previously a callback before we decided to always use the HSM
synchronously.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we clear and recreate ltmp, we attach it to whatever logbook it's on.
This, of course, is fraught, since it may be freed.
We could make it NULL-parented, but that makes YA special-case to free
when we exit (we try to keep valgrind happy by freeing everything). So
since the first log_book is the permanent one attached to lightningd,
just keep that parent when we re-build it after use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd. We don't need that any more.
This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers. The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.
We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction.
We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And now we can finally do the db upgrade to remove any OPENINGD
channels once, since we never put them back.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's giant, but it's encapsulating at least. It is called from the wallet
code when loading channels, or from the opening code when converting
an uncommitted_channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now any struct channel is a genuine channel, the following fields are
always valid:
1. funding_txid: doesn't need to be a pointer.
2. our_msatoshi: doesn't need to be a pointer.
3. last_sig: doesn't need to be a pointer.
4. channel_info: doesn't need to be a pointer.
In addition, 'last_tx' is always valid.
The main effect is to remove a whole heap of branches from the wallet code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each peer can have one 'uncommitted' channel, which is in the process
of opening. This is used for openingd, and then on return we convert
it into a full-fledged struct channel and commit it into the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means the caller needs to supply an explicit log to base the
subd log on, and also a callback for error handling.
The callback is kind of ugly, but it gets reworked towards the end
of this series.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we rely on the logbook outlasting the peer, we can't refer to the
peer from the logbook function:
Valgrind error file: valgrind-errors.26567
==26567== Invalid read of size 8
==26567== at 0x126297: copy_to_parent_log (peer_control.c:690)
==26567== by 0x11C06B: maybe_print (log.c:253)
==26567== by 0x11C145: logv (log.c:270)
==26567== by 0x11C448: log_ (log.c:319)
==26567== by 0x132951: destroy_subd (subd.c:537)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A120: del_tree (tal.c:410)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x16DEB5: io_close (io.c:443)
==26567== by 0x1328BC: sd_msg_read (subd.c:516)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Address 0x6cf9ca0 is 48 bytes inside a block of size 216 free'd
==26567== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x17A1A9: del_tree (tal.c:421)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x124B6C: delete_peer (peer_control.c:180)
==26567== by 0x12B369: destroy_uncommitted_channel (peer_control.c:2505)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x12B31E: opening_channel_errmsg (peer_control.c:2496)
==26567== by 0x13243A: handle_peer_error (subd.c:407)
==26567== by 0x1326E4: sd_msg_read (subd.c:472)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Block was alloc'd at
==26567== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x179C83: allocate (tal.c:250)
==26567== by 0x17A250: tal_alloc_ (tal.c:448)
==26567== by 0x124950: new_peer (peer_control.c:151)
==26567== by 0x12B3EC: new_uncommitted_channel (peer_control.c:2521)
==26567== by 0x12B5C5: peer_accept_channel (peer_control.c:2569)
==26567== by 0x126099: peer_sent_nongossip (peer_control.c:641)
==26567== by 0x113B28: peer_nongossip (gossip_control.c:55)
==26567== by 0x113D9D: gossip_msg (gossip_control.c:144)
==26567== by 0x132783: sd_msg_read (subd.c:487)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== by 0x16D1FE: next_plan (io.c:59)
==26567==
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
BackgroundL Each log has a log_book: many logs can share the same one,
as each one can have a separate prefix.
Testing tickled a bug at the end of this series, where subd was
logging to the peer's log_book on shutdown, but the peer was already
freed. We've already had issues with logging while lightningd is
shutting down.
There are times when reference counting really is the right answer,
this seems to be one of them: the 'struct log' share the 'struct
log_book' and the last 'struct log' cleans it up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We derive the seed from this, so it needs to be unique, but using
rowid forced us to put the channel into the db early, before it
was ready.
Instead, use a counter to ensure uniqueness, initialized when we load
existing peers. This doesn't need to touch the database at all.
As we now have only two places where the channel is committed (the
funder and fundee paths), so we create a new explicit
'wallet_channel_insert()' function: 'wallet_channel_save()' now just
updates.
Note that this also fixes some weirdness in
wallet_channels_load_active: we strangely avoided loading channels in
CLOSINGD_COMPLETE (which fortunately was a transient state, so
unlikely anyone hit this). Note that since the lines above already
delete all the OPENINGD channels, we now simply load them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Adds a simple check that compares genesis-blockhashes from the
chainparams against the blockhash that the wallet was created
with. The wallet is network specific, so mixing is always a bad idea.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We now keep a list of commands for the jcon instead of a simple
'current' pointer: the assertions become a bit more complex, but
the rest is fairly mechanical.
Fixes: #1007
Reported-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do a complicated dance because we don't know the current block
height before setting up the topology.
If we're starting at a particular block, we want to go back 100 blocks
before that to cover any reorgs.
If we're not (fresh startup), we still want to go back 100 blocks
because we don't bother handling a reorg which removes all the blocks
we know.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With fallback depending on chainparams: this means the first upgrade
will be slow, but after that it'll be fast.
Fixes: #990
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We error out for all kinds of reasons early on (eg. bitcoind down),
and printing a backtrace for them is pretty confusing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Includes closing off stdout and stderr. We don't do it directly in the
arg parser, as we want to interact normally (eg with other errors) before
we turn off stdout/stderr.
Fixes: #986
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This interacts badly with --daemon (next patch) which then tries to
reap a child it didn't create, which took me a couple of hours to
figure out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we read a command, we are supposed to io_wait until it finishes.
However, we are actually woken in two places: when it's complete
(which is correct), and when it's written out (which is wrong).
We don't care when it's written out, only when it's finished:
refactor to make json_done() free and NULL the old ->current,
rather than have the callers do it. Now it's clear that it's
ready for both new output and new input.
Fixes: #934
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We will have probably failed the others, but either way, don't try to
fulfill an HTLC we've already failed.
Fixes: #394
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We usually did this, but sometimes they were named after what they did,
rather than what they cleaned up.
There are still a few exceptions:
1. I didn't bother creating destroy_xxx wrappers for htable routines
which already existed.
2. Sometimes destructors really are used for side-effects (eg. to simply
mark that something was freed): these are clearer with boutique names.
3. Generally destructors are static, but they don't need to be: in some
cases we attach a destructor then remove it later, or only attach
to *some* cases. These are best with qualifiers in the destroy_<type>
name.
Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This provides a sanity check that we are in sync, and also keeps the
logic in the program and out of the SQL.
Since the destructor now doesn't clean up the peer, there are some
wider changes to be made when cleaning up. Most notably we create
lots of channels in run-wallet.c and they previously freed the peer:
now we need free the peer explicitly, so we need to free them first.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And return the correct error message for the channel they give, if
they try to re-establish on an error channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Channels are within the peer structure, but the peer is freed only
when the last channel is freed.
We also implement channel_set_owner() and make peer_set_owner() a temporary
wrapper.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Much like the database; peer contains id, address, channel contains
per-channel information. Where we create a channel, we always create
the peer too.
For the moment, peer->log and channel->log coexist side-by-side, to
reduce some of the churn.
Note that this changes the API to dev-forget-channel: if we have more
than one channel, we insist they specify the short-channel-id.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is not connected yet; during the transition, there will be a 1:1
mapping from channel to peer, so we can use channel2peer and peer2channel
to shim between them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Both when we forget about an opening peer, and at startup. We're
going to be relying on this, and the next patch, as we refactor
peer/channel handling to mirror the db.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Combining the two was just awkward, so it's clearer to have separate
functions. And we make the lower-level functions do the escaping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The JSON-RPC spec specifies that if the request is unparseable we
should return an error with a NULL id. This is a bit more friendly
than slamming the door in the face.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
As reported by @practicalswift in #945 it is possible to inject
non-printable, or shell escape, characters in a json command, that
will fail to parse and then clear the shell.
Reported-by: @practicalswift
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Now we have wirestring, this is much more natural. And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`activate_peer` does little more than wiring up some txwatches and
asking `gossipd` to reconnect to the peer. If the peer manages to
reconnect before we activate then we would crash.
This just changes the `assert` causing the crash into a conditional
whether we need to reconnect or not.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Due to the broadcast failure quite a few users are reporting channels
stuck in awaiting lockin. This commit adds a `dev-forget-channel`
command that checks whether the funding outpoint is in the UTXO, and
forgets the channel if not. The UTXO check can be overridden with the
`force` parameter, but that is dangerous.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were sideloading it, which is awkward, now it's a field that we can
actually use in the code.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We currently don't handle LOG_IO properly, and we turn it into a string
before handing it to the ->print function, which makes it ugly for
the case where we're using copy_to_parent_log, and also means in
that case we lose *what peer* the IO is coming from.
Now, we handle the io as a separate arg, which is much neater.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Logging often gets called in error paths, so this is just good hygiene.
Also, log_io does this already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
libunwind does not accept a NULL parameter for the error callback. It
will simply call into the NULL pointer. So add an error callback.
This makes the crash output somewhat more sensible on FreeBSD, where
there is no libunwind stack trace available:
2018-02-05T20:24:50.598Z lightningd(75556): error getting backtrace: no stack trace because unwind library not available (0)
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Maintaining it was always fraught, since the command could go away
if the JSON RPC died. Most recently, it was broken again on shutdown
(see below).
In future we may allow pay commands to block on previous payments, so
it won't even be a 1:1 mapping. Generalize it: keep commands in a
simple list and do a lookup when a payment fails/succeeds.
Valgrind error file: valgrind-errors.5732
==5732== Invalid read of size 8
==5732== at 0x4149FD: remove_cmd_from_hout (pay.c:292)
==5732== by 0x468BAB: notify (tal.c:237)
==5732== by 0x469077: del_tree (tal.c:400)
==5732== by 0x4690C7: del_tree (tal.c:410)
==5732== by 0x46948A: tal_free (tal.c:509)
==5732== by 0x40F1EA: main (lightningd.c:362)
==5732== Address 0x69df148 is 1,512 bytes inside a block of size 1,544 free'd
==5732== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5732== by 0x469150: del_tree (tal.c:421)
==5732== by 0x46948A: tal_free (tal.c:509)
==5732== by 0x4198F2: free_htlcs (peer_control.c:1281)
==5732== by 0x40EBA9: shutdown_subdaemons (lightningd.c:209)
==5732== by 0x40F1DE: main (lightningd.c:360)
==5732== Block was alloc'd at
==5732== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5732== by 0x468C30: allocate (tal.c:250)
==5732== by 0x4691F7: tal_alloc_ (tal.c:448)
==5732== by 0x40A279: new_htlc_out (htlc_end.c:143)
==5732== by 0x41FD64: send_htlc_out (peer_htlcs.c:397)
==5732== by 0x41511C: send_payment (pay.c:388)
==5732== by 0x41589E: json_sendpay (pay.c:513)
==5732== by 0x40D9B1: parse_request (jsonrpc.c:600)
==5732== by 0x40DCAC: read_json (jsonrpc.c:667)
==5732== by 0x45C706: next_plan (io.c:59)
==5732== by 0x45D1DD: do_plan (io.c:387)
==5732== by 0x45D21B: io_ready (io.c:397)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a transitional patch so we can still close channels cleanly;
for want of a better option, I hooked it into --deprecated-apis.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We shouldn't fail negotiation just because they exceeded what we thought
fair: we're better off as long as it's actually <= final commitment fee.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We move it into jsonrpc where it belongs, and make it fail the command.
This means it can tell us exactly what was wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the new 'human-readable' mode of lightning-cli, this actually produces
a valid config file. It's a bit hacky though...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We may need to lookup UTXO entries for other reasons, so here we
disentangle it and make it into its own method.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Might help alleviate some of the issues of having to run a full-node
on the same machine as `lightningd`.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Exception: Node /tmp/lightning-t5gxc6gs/test_closing_different_fees/lightning-2/ has memory leaks: [{'value': '0x55caa0a0b8d0', 'label': 'ccan/ccan/tal/str/str.c:90:char[]', 'backtrace': ['ccan/ccan/tal/tal.c:467 (tal_alloc_)', 'ccan/ccan/tal/tal.c:496 (tal_alloc_arr_)', 'ccan/ccan/tal/str/str.c:90 (tal_vfmt)', 'lightningd/log.c:131 (new_log)', 'lightningd/subd.c:632 (new_subd)', 'lightningd/subd.c:686 (new_peer_subd)', 'lightningd/peer_control.c:2487 (peer_accept_channel)', 'lightningd/peer_control.c:674 (peer_sent_nongossip)', 'lightningd/gossip_control.c:55 (peer_nongossip)', 'lightningd/gossip_control.c:142 (gossip_msg)', 'lightningd/subd.c:477 (sd_msg_read)', 'lightningd/subd.c:319 (read_fds)', 'ccan/ccan/io/io.c:59 (next_plan)', 'ccan/ccan/io/io.c:387 (do_plan)', 'ccan/ccan/io/io.c:397 (io_ready)', 'ccan/ccan/io/poll.c:305 (io_loop)', 'lightningd/lightningd.c:347 (main)', '(null):0 ((null))', '(null):0 ((null))', '(null):0 ((null))'], 'parents': ['lightningd/log.c:103:struct log_book', 'lightningd/lightningd.c:43:struct lightningd']}]
Technically, true, but we save more memory by sharing the prefix pointer
than we lose by leaking it.
However, we'd ideally refcount so it's freed if the log is freed and
all the entries using it are pruned from the log book.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes much more sense when you ask for a specific peer's log.
Also, we put the peerid rather than pid ().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We added code to allow a few spurious failures, but it didn't unmark
the request running.
IRC user 'mlz' (@molxyz) provided logs from his stuck-at-old-block lightningd:
lightningd(31981): Adding block 1261159: 00000000da3890ccd0f313a74fccfd4789654b496836da5c28a8d2ad28852264
lightningd(31981): Adding block 1261160: 00000000f70938a33aecbdd7b047cb5cf5b095ea4770c1335acf1859bad1e767
lightningd(31981): bitcoin-cli -testnet estimatesmartfee 2 CONSERVATIVE exited with status 1
Fixes: #749
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There is an interaction between --ipaddr and --port, namely that the
default port is used when parsing --ipaddr if --port comes after the
--ipaddr, and --port is used if it comes before it. Adding a port to
--ipaddr still trumps everything else, but this way we correctly set
port in the address.
Reported-by: Wladimir J. van der Laan @laanwj
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The JSON connect command wouldn't terminate if peer reconnected
in a state CHANNELD_AWAITING_LOCKIN or above.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Such an htlc is invalid, and will be failed cleanly by our channeld
(which also checks that it meets the minimum amount), but it's
not the master's job to check it, and in fact, it asserts if we were
to try to pay or forward such a thing.
Fixes: #686
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The peer shouldn't try, and channeld won't try to add it if it does,
but we shouldn't trust it. And it would make our htlc_in_check() code
assert.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before this patch:
```
$ lightningd/lightningd
lightningd(PID): Creating lightningd dir /root/.lightning (because chdir gave No such file or directory)
lightningd(PID): Creating database
```
After this patch:
```
$ lightningd/lightningd
lightningd(PID): Creating lightningd dir /root/.lightning
lightningd(PID): Creating database
```
delinvoice was orginally documented to only allow deletion of unpaid
invoices, but there might be reasons to delete paid ones or unexpired ones.
But we have to avoid the race where someone pays as it's deleted: the
easiest way is to have the caller tell us the status, and fail if
it's wrong.
Fixes: #477
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Error code is inverted (which makes sense: who returns 'true' on
error?), and anyway there's a leak if we do error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to have to support multiple channels per peer, even if only
when some are onchain. This would break the current listpeers, so
change it to an array (single element for now).
Other cleanups:
1. Only set connected true if daemon is not onchaind.
2. Only show netaddr if connected; don't make it an array, call it `address`
in comparison with `addresses` in listnodes.
3. Rename `channel` to `short_channel_id`
4. Add `funding_txid` field for voyeurism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to add other fields, such as version information,
warnings or invoiceless payments, later.
(Note: the deprecated listinvoice is unchanged)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matches the other names, and also the return value is about to change.
This will be removed before release!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can be used for upgrades to make sure you're not using deprecated
options, JSON commands, JSON fields, etc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For performance, we delay entering the 'wallet_payment' into the db
until we actually commit to the HTLC (when we have to touch the DB
anyway).
This opens a race where we can try to pay twice, and since it's not in
the database yet, we don't notice the duplicate.
So remove the temporary payment field from htlc_out, which was always
an uncomfortable hack, and make the wallet code abstract over the
deferred entry a little by maintaining a 'unstored_payments' list
and incorporating that in results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need these to decode any returned errors.
We remove it from struct pay_command too, and load directly from db
when we need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should be saving this, as it's our proof of payment. Also, we return
it if they try to pay again.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This, of course, should never be used. But it helps maintain connections
for the moment while we dig deeper into feerates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common occurence on pruned nodes. By calling the callback
upon failures, we communicate that we couldn't verify the txoutput. We
fail safe rejecting any channel we can't verify.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This means we print out the correct path with --debugger, which
can be vital if there are multiple binaries (eg. compiled vs installed).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
json_get_params does this for us.
Fixes: 78adf0b (pay: allow 'null' msatoshi field.)
Reported-by: ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pulling up the save call from `peer_save_commitsig_received` into its
caller `peer_got_commitsig` and adding a call to
`peer_sending_commitsig`
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Message buffer `why` is allocated in the `peer` context and also freed when peer is freed.
Only explicitly free the buffer when peer itself is not freed yet.
exit status is not enough to detect spent outputs. gettxout will return a
success exit code and 0 bytes.
Signed-off-by: William Casarin <jb55@jb55.com>
We'll pass this down to gossip and make sure to re-announce/update
channels every so often. This is also used as a pruning timer, i.e.,
channels that have not been updated in 2 x channel-update-interval
will be pruned from the local view.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since most callers use positional arguments, we should allow a 'null'
literal where we require no value at all.
Also adds some more value tests.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Paid invoices need to know how much was actually paid: both for the case
where no 'msatoshi' amount was specified, and for the normal case, where
clients are permitted to overpay in order to help them disguise their
payments.
While we migrate the db, we leave this field as 0 for old paid
invoices. This is unhelpful for accounting, but at least clearly
indicates what happened if we find this in the wild.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'rhash' is the old terminology, but 'payment_preimage' and
'payment_hash' were decided on for the BOLTs, so we should fix that here.
We still use rhash internally, but that's much easier to fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Different commands (listinvoice, delinvoice, waitinvoice,
waitanyinvoice) returned different fields, as not all were updated.
This makes them uniform.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This reuses the same code internally, and also now means that we deal
correctly with "any" msatoshi invoices: the old code would a return
'msatoshi' of 0 in that case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The manfile and the online help use 'msatoshi', the returned
response uses 'msatoshi', nearly every invoice-related
monetary amount is labelled 'msatoshi' and not 'amount'.
It would be nice if bitcoind had an RPC to do this in one, but that's
a bit much to ask for. We could also hand around proofs, for lite nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. htlc->fail has been changed to a u8 *.
2. wallet_get_newindex saves to the db.
3. peer->next_htlc_id is saved to the db in peer_save_commitsig_sent() below.
4. We do store commit in peer_save_commitsig_received(peer, commitnum),
and the fixme below talks about HTLC sigs.
5. We do commit shachain and next_per_commit_point in wallet_shachain_add_hash
and update_per_commit_point respectively.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All other users of json_get_params(...) check the return value:
```
lightningd/chaintopology.c: if (!json_get_params(buffer, params,
lightningd/chaintopology.c: if (!json_get_params(buffer, params,
lightningd/dev_ping.c: if (!json_get_params(buffer, params,
lightningd/gossip_control.c: if (!json_get_params(buffer, params,
lightningd/invoice.c: if (!json_get_params(buffer, params,
lightningd/invoice.c: if (!json_get_params(buffer, params,
lightningd/invoice.c: if (!json_get_params(buffer, params,
lightningd/invoice.c: if (!json_get_params(buffer, params,
lightningd/invoice.c: if (!json_get_params(buffer, params, "label", &labeltok, NULL)) {
lightningd/invoice.c: if (!json_get_params(buffer, params,
lightningd/jsonrpc.c: if (!json_get_params(buffer, params,
lightningd/pay.c: if (!json_get_params(buffer, params,
lightningd/pay.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
lightningd/peer_control.c: if (!json_get_params(buffer, params,
wallet/walletrpc.c: if (!json_get_params(buffer, params,
wallet/walletrpc.c: if (!json_get_params(buffer, params, "tx", &txtok, NULL)) {
```
I've only seen this under travis, so I can't verify that this fixes it,
but it's certainly a bug which could cause that issue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is necessary to grad the their_unilateral/to-us outputs since
they aren't being harvested by `onchaind`
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the scriptpubkey that onchaind spends all funds to, except for
the their_unilateral/to-us case, so we better recognize that address.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the only case in which we don't respend to a simple keyindex'd
pubkey, so we need to handle this for future spends.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We always arm the funding_lockin_cb, even if we don't need to. If we
have an short_channel_id already from the db, this was replacing it
and leaking the old one.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we panic when we see our root reorg out, even if we're not doing
anything yet, restoring the 100 block margin is the simplest fix.
Unfortunately this means adding a 100-block spacer in the tests, so things
don't get confused.
Fixes: #511
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is surprisingly simple. We set up the watches for funding tx
depth and the funding output, then if it's not onchain we ask gossipd
to reconnect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Load the first block we're possibly interested in, then load the peers so
we can restore the tx watches, then finally replay to the current tip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Eventually we want to save blockchain in db to avoid this scan, but
for the moment, we need to reload as far back as we may be interested in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gives us a lower bound on where funding tx could be.
In theory, it could be lower than this if we get a reorganization, but
in practice this is already a 1-block buffer (since we can't get into
current block, only the next one).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the normal (peer-to-peer) path, the HTLC state prevents us fulfilling
twice, but this goes out the window with onchain HTLCs.
The actual assert which caught it was lightningd/pay.c:70 (payment_succeeded)
in the test_htlc_in_timeout test, after the next commit.
So add an assert earlier (in fulfill_our_htlc_out) and check in the
one caller where it can be true.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to load the new tip and work backwards until we joined up with
the previous tip. That consumed quite a lot of memory if there were
many blocks.
Instead, just poll on blocknum+1, and grab it once that succeeds. If
prev is different from what we expect (reorg), we free the current tip
and try again.
We could theoretically miss a reorg which is the same length (2 block
reorg with more work due to difficulty adjustment), but even if that
happened we'd catch up on the next block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It definitely changes when we get a block, but it also changes between
blocks as mempool fills. So put it on its own timer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's just a sha256_double, but importantly when we convert it to a
string (in type_to_string, which is used in logging) we use
bitcoin_blkid_to_hex() so it's reversed as people expect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's just a sha256_double, but importantly when we convert it to a
string (in type_to_string, which is used in logging) we use
bitcoin_txid_to_hex() so it's reversed as people expect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I prefer the typesafety of specific functions, rather than having the
caller know that txids are traditionally reversed in bitcoin.
And we already have a bitcoin_txid_to_hex() function for this.
Closes: #411
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Add port parsing support to parse_wireaddr. This is in preparation for storing
addresses in the peers table. This also makes parse_wireaddr a proper inverse of
fmt_wireaddr.
* Move parse_wireaddr to common/wireaddr.c this seems like a better place for
it. I bring along parse_ip_port with it for convenience. This also fixes some
issues with the upcoming ip/port parsing tests.
Signed-off-by: William Casarin <jb55@jb55.com>
We set hout->key.id when channeld tells us what it is, but if channeld
dies before that we free the hout, and our destructor logs it:
Valgrind error file: valgrind-errors.20312
==20312== Use of uninitialised value of size 8
==20312== at 0x53ABC9B: _itoa_word (_itoa.c:179)
==20312== by 0x53B041F: vfprintf (vfprintf.c:1642)
==20312== by 0x53B17D5: buffered_vfprintf (vfprintf.c:2330)
==20312== by 0x53AEAA5: vfprintf (vfprintf.c:1301)
==20312== by 0x53B7D63: fprintf (fprintf.c:32)
==20312== by 0x128BAC: hout_subd_died (peer_htlcs.c:316)
==20312== by 0x16D8E0: notify (tal.c:240)
==20312== by 0x16DD95: del_tree (tal.c:400)
==20312== by 0x16DDE7: del_tree (tal.c:410)
==20312== by 0x16DDE7: del_tree (tal.c:410)
==20312== by 0x16E1B4: tal_free (tal.c:509)
==20312== by 0x162B5C: io_close (io.c:443)
==20312== by 0x12D563: sd_msg_read (subd.c:508)
==20312== by 0x161EA5: next_plan (io.c:59)
==20312== by 0x1629A2: do_plan (io.c:387)
==20312== by 0x1629E0: io_ready (io.c:397)
==20312== by 0x164319: io_loop (poll.c:305)
==20312== by 0x118E21: main (lightningd.c:334)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Accuracy improvements:
1. We assumed the output was a p2wpkh, but it can be user-supplied now.
2. We assumed we always had change; remove this for wallet_select_all.
Calculation out-by-one fixes:
1. We need to add 1 byte (4 sipa) for the input count.
2. We need to add 1 byte (4 sipa) for the output count.
3. We need to add 1 byte (4 sipa) for the output script length for each output.
4. We need to add 1 byte (4 sipa) for the input script length for each input.
5. We need to add 1 byte (4 sipa) for the PUSH optcode for each P2SH input.
The results are now a slight overestimate (due to guessing 73 bytes
for signature, whereas they're 71 or 72 in practice).
Fixes: #458
Reported-by: Jonas Nick @jonasnick
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Two changes:
- Fixed the function signature of noleak_ to match in both
configurations
- Added memleak.o to linker for tests
Generating the stubs for the unit tests doesn't really work since the
stubs are checked in an differ between the two configurations, so
adding memleak to the linker fixes that, by not requiring stubs to be
generated in the first place.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can call this multiple times. The best solution is to add and remove
the signature so it's always unsigned as we expect it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The pay command in particular, attaches a reasonable number of
temporaries to cmd, knowing they'll be freed once cmd is done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is called when we load from database: clearly our tests aren't thorough
enough because we were allocating and initializing `r` in an unused structure.
invs is also the owner already; functions which steal are a bit surprising
to callers, so we either document them, or just don't do it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have things which we don't keep a pointer to, but aren't leaks.
Some are simply eternal (eg. listening sockets), others cases are
io_conn tied to the lifetime of an fd, and timers which expire.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
memleak doesn't detect pointers to within an object, only pointers to their
exact address (it's simpler this way). Moving the linked list to the
top of the structure means it can follow the chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
memleak doesn't detect pointers to within an object, only pointers to their
exact address (it's simpler this way). Moving the linked list to the
top of the structure means it can follow the chain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is not a child of cmd, since they have independent lifetimes, but
we don't want to noleak them all, since it's only the one currently in
progress (and its children) that we want to exclude.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the tal notifiers to attach a `backtrace` object on every
allocation.
This also means moving backtrace_state from log.c into lightningd.c, so
we can hand it to memleak_init().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a primitive mark-and-sweep-style garbage detector. The core is
in common/ for later use by subdaemons, but for now it's just lightningd.
We initialize it before most other allocations.
We walk the tal tree to get all the pointers, then search the `ld`
object for those pointers, recursing down. Some specific helpers are
required for hashtables (which stash bits in the unused pointer bits,
so won't be found).
There's `notleak()` for annotating things that aren't leaks: things
like globals and timers, and other semi-transients.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jsonrpc handlers usually directly call command_success or
command_fail; not doing that implies they're waiting for something
async.
Put an explicit call (currently a noop) there, and add debugging
checks to make sure it's used.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Couldn't find a good place to put these messages, we probably want to
do the same capability based request routing that we did for the HSM,
but for now this just defines the message in the master messages file.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If send_htlc_out() fails, it doesn't initialize pc->out; that can
make us think it's still in progress.
Reported-by: Jonas Nick
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When gossipd sends a message, have a gossip_index. When it gets back a
peer, the current gossip_index is included, so it can know exactly where
it's up to.
Most of this is mechanical plumbing through openingd, channeld and closingd,
even though openingd and closingd don't (currently) read gossip, so their
gossip_index will be unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As demonstrated in the test at the end of this series, openingd dying
spontaneously causes the conn to be freed which causes the subd to be
destroyed, which fails the peer, which hits the db.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than using the destructor, hook up the cmd so we can close it.
peers are allocated off ld, so they are only destroyed explicitly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We are still generating only char* style aliases, but the field is
defined to be unicode, which doesn't mix too well with char.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We don't use it yet, but now we'll decode correctly.
See: https://github.com/lightningnetwork/lightning-rfc/pull/317
lightning-rfc commit: ef053c09431442697ab46e83f9d3f86e3510a18e
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Change all calls to use the correct serialization and deserialization
functions, include the correct headers and remove the control
messages.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The master now hands channeld either an error code, and channeld
generates the error message, or an error message relayed from another
node to pass through.
This doesn't fill in the channel_update yet: we need to wire up gossipd
to give us that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently lightningd does this, but channeld is perfectly capable of doing it.
channeld is also in a far better position to add channel_updates to it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
estimatesmartfee 4 ECONOMICAL was too high for lnd, so drop it, with some
increased security risk.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The filter is being populated while initializing the daemon and by
adding new keys as they are being generated. The filter is then used
in connect_block to identify transactions of interest.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is mainly used to filter for transactions that may be of interest
to us, i.e., whether one of our keys is the recipient. It currently
does onyl simple scriptpubkey checks, but will eventually be extended
to use bloomfilters and add more sophisticated checks.
For now the goal is to speed up the processing of blocks during startup.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This addresses a performance regression introduced by
6ceb375650. We were storing it in an
otherwise empty DB transaction, which means that DB transaction was no
longer a no-op. Now we defer storing until we need to store the
corresponding HTLC anyway, so we can just piggyback on top of that
transaction.
This is also more consistent since we'd be forgetting the payment
anyway if we restart between adding the HTLC and committing to it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We only send them when we're not awaiting revoke_and_ack: our
simplified handling can't deal with multiple in flights.
Closes: #244
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The wire protocol uses this, in the assumption that we'll never see feerates
in excess of 4294967 satoshi per kiloweight.
So let's use that consistently internally as well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Depending on what we're doing, we can want different ones. So use
IMMEDIATE (estimatesmartfee 2 CONSERVATIVE), NORMAL (estimatesmartfee
4 ECONOMICAL) and SLOW (estimatesmartfee 100 ECONOMICAL).
If one isn't available, we try making each one half the previous.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we convert it when retrieving from bitcoind; internally it's
always satoshi-per-1000-weight aka millisatoshi-per-weight.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Test objects must be added to $(ALL_OBJS) so they correctly depend on
CCAN headers etc.
Also, each test in a subdir must depend on headers and src in the parent
directory, as it will often #include them directly.
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our testsuite uses --dev-fail-on-subdaemon-fail, so I didn't notice this
until I turned that off to chase a bug.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All the callers need to pass it in: currently channeld and openingd just
fake it by copying the payment point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There were two bugs: we weren't returning the next from the given
label but the one matching the label, and we were appending new
invoices to the head instead of the tail, which meant we'd be
traversing in the wrong order.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Thought we don't handle it at the moment, nodes can certainly have multiple
addresses, and we should display them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't track them accurately when in onchaind, but we don't want to:
onchaind can be restarted at any time.
Once it's all settled, we're clear to clean them up.
Before this, valgrind could complain about deferncing hout->key.peer:
Valgrind error file: valgrind-errors.10876
==10876== Invalid read of size 4
==10876== at 0x41F8AF: peer_on_chain (peer_control.h:127)
==10876== by 0x42340D: notify_new_block (peer_htlcs.c:1461)
==10876== by 0x40A08D: connect_block (chaintopology.c:96)
==10876== by 0x40A96B: topology_changed (chaintopology.c:313)
==10876== by 0x40AC85: add_block (chaintopology.c:384)
==10876== by 0x40ABF0: gather_previous_blocks (chaintopology.c:363)
==10876== by 0x4051B3: process_rawblock (bitcoind.c:410)
==10876== by 0x4044DD: bcli_finished (bitcoind.c:155)
==10876== by 0x454665: destroy_conn (poll.c:183)
==10876== by 0x454685: destroy_conn_close_fd (poll.c:189)
==10876== by 0x45DF89: notify (tal.c:240)
==10876== by 0x45E43A: del_tree (tal.c:400)
==10876== Address 0x6929208 is 2,120 bytes inside a block of size 2,416 free'd
==10876== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10876== by 0x45E513: del_tree (tal.c:421)
==10876== by 0x45E849: tal_free (tal.c:509)
==10876== by 0x41A8E9: handle_irrevocably_resolved (peer_control.c:1172)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We are announcing that we are willing to accept incoming payments with
current_height + min_final_cltv_expiry + slack, assuming that the
sender adds some slack. In particular we'd reject the payment if
slack=0 which is allowed by the spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We save location where transaction was started, in case we try to nest.
There's now no error case; db_exec_mayfail() is the only one.
This means the tests need to override fatal() if they want to intercept
these errors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a subset of a "bitcoind: wrap callbacks in transaction." from
the everything-in-transaction branch, but we need the ld pointer now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And nail "make check-source" to that specific version (which is a commit id,
not a branch name, so needs a different syntax for git).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These need to be different for testing the example in BOLT 11.
We also use the cltv_final instead of deadline_blocks in the final hop:
various tests assumed 5 was OK, so we tweak utils.py.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It crashes under valgrind, causing a valgrind error: valgrind gives us a
backtrace anyway, so we don't need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normally, we get an error as soon as we send WIRE_REVOKE_AND_ACK. But if the
commit timer goes off, we get some extra cycles, during which the other side
can reconnect. In this case, we simply kill the channeld before it fails,
and never check for the permfail string.
b'lightning_channeld(18613): TRACE: dev_disconnect: -WIRE_REVOKE_AND_ACK'
b'lightning_channeld(18613): TRACE: Trying commit'
b'lightning_channeld(18613): TRACE: htlc 0: SENT_ADD_REVOCATION->SENT_ADD_ACK_COMMIT'
b'lightning_channeld(18613): TRACE: htlc added REMOTE: local +0 remote -200000000'
b'lightning_channeld(18613): TRACE: sending_commit: HTLC REMOTE 0 = SENT_ADD_ACK_COMMIT/RCVD_ADD_ACK_COMMIT'
b'lightning_gossipd(18590): TRACE: Responder: Act 1'
b'lightning_channeld(18613): TRACE: Derived key 034aab0b5cb755de836cffb34c053ba115fba6fe75414e8f56261e23c80eabb1fe from basepoint 03e0a7bb422b254f54bc954be05bd6823a7b7a4b996ff8d3079ca211590fb5df39, point 02f3bf525b6ca595bf85d63e89c95fc59c0fde3ae434b55c8093bbb5c64849da37'
b'lightningd(18465): Connected json input'
b'lightningd(18465):jcon fd 16: Success'
b'lightningd(18465):jcon fd 16: Closing (Bad file descriptor)'
b'lightning_gossipd(18590): TRACE: Responder: Act 2'
b'lightning_gossipd(18590): TRACE: Responder: Act 3'
b'lightning_gossipd(18590): UPDATE WIRE_GOSSIP_PEER_CONNECTED'
b'lightning_gossipd(18590): UPDATE WIRE_GOSSIP_PEER_CONNECTED'
b'lightningd(18465): peer 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518: Peer has reconnected, state CHANNELD_NORMAL'
b'lightning_channeld(18613): Status closed, but not exited. Killing'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In future it will have TOR support, so the name will be awkward.
We collect the to/fromwire functions in common/wireaddr.c, and the
parsing functions in lightningd/netaddress.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They don't currently, since callers check, but be safe. In addition,
handle NULL returns from these in the bitcoind code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are others, but they really are casued by bad failure. We need a
parachute system for these.
Closes: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit messier than I'd like, but we want to clearly remove all
dev code (not just have it uncalled), so we remove fields and functions
altogether rather than stub them out. This means we put #ifdefs in callers
in some places, but at least it's explicit.
We still run tests, but only a subset, and we run with NO_VALGRIND under
Travis to avoid increasing test times too much.
See-also: #176
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
test_routing_gossip (__main__.LightningDTests) ... lightningd: Outstanding taken pointers: lightningd/peer_control.c:2352:towire_errorfmt(ld, ((void *)0), "Can't resolve your address")
This caused by the other end closing due to the next bug.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There is a race we see sometimes under valgrind on Travis which shows
gossipd receiving the node_announce from master before it reads the
channel_announce from channeld, and thus fails. The simplest solution
is to send the channel_announce and channel_update to master as well,
so it can ensure it sends them to gossipd in order
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It makes it impossible to embed an ipaddr in another structure, since we
always try to skip over any zeroes, which may swallow a following field.
Do the skip specially for the case where we're parsing routing messages:
we never use padding for our own internal messages anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes it easier to compare before/after failures. Ideally, we should
run under Travis both with this option and with the seed based on the
entire tmp path (which is still reproducible with determination, but
not fixed every run like this is).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are now only two kinds of subdaemons: global ones (hsmd, gossipd) and
per-peer ones. We can handle many callbacks internally now.
We can have a handler to set a new peer owner, and automatically do
the cleanup of the old one if necessary, since we now know which ones
are per-peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently rely on a zero exit status. That's the only difference between
onchain finished handling and other per-peer daemons, so instead we should
have an explicit "done" message. This is both clearer, and allows us to
unify.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to make the ip/port optional, so they should go at the end.
In addition, using ip:port is nicer, for gethostbyaddr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have to do a dance when we get a reconnect in openingd, because we
don't normally expect to free both owner and peer. It's a layering
violation: freeing a peer should clean up the owner's pointer to it,
to avoid a double free, and we can eliminate this dance.
The free order is now different, and the test_reconnect_openingd was
overprecise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes the only case where the master currently has to write directly
to the peer: re-sending an error. We make gossipd do it, by adding
a new gossipctl_fail_peer message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, the main daemon needs to pass it about (marshal/unmarshal)
but it won't need to actually use it after the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We pull them from the database on-demand, where we're storing them
anyway. No need to keep them in memory as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
No idea why we were iterating over the list of stubs and then passing
in the index instead of a pointer to the stub directly.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This wires in the loading of `struct htlc_stub`s on-demand when
starting `onchaind` so that we don't need to keep them in memory.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far we were tracking the status by including it either in the paid
or the unpaid list. This refactor makes the state explicit, which
matches the planned DB schema much better.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
While loading HTLCs from the database we might not yet have all the
incoming HTLCs loaded when loading a dependent htlc_out. So we defer
the wiring of the HTLCs until we are sure we have them loaded.
This is also the first step towards keeping that association only in
the database, since otherwise we cannot selectively load channels from
DB.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Especially when testing we might want to disable the automatic
reconnection logic in order not to masquerade bugs that disappear when
reconnecting.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Seems to go out to lunch on reorgs:
+136792.168286138 lightningd(9465):BROKEN: bitcoin-cli getchaintips exited 28: 'error code: -28
error message:
Rewinding blocks...
Closes: #286
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't hit this in testing, since we wait for startup already. Hacking
tests to avoid that, I tested this code by hand.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using pc after free in the pay_command_destroyed destructor, so
we just steal cmd onto pc so free order is the one we want.
[ Edit: expanded comment, split commit ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
So far only happens during normal shutdown, but it may happen in other
cases as well. We simply define a new destructor that unregisters the
`cmd` from the `jcon`.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These were fun to hunt down. The jcon and the conn are allocated off
of ld, so the free order is unspecified and if conn is freed before
conn then the finish_jcon destructor uses conn after free.
[ Edit: split commit, modified to use a destructor directly on jcon,
which is more robust than relying on it only being freed via conn --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
peer_fail_permanent() frees peer->owner, but for bad_peer() we're
being called by the sd->badpeercb(), which then goes on to
io_close(conn) which is a child of sd.
We need to detach the two for this case, so neither tries to free the
other.
This leads to a corner case when the subd exits after the peer is gone:
subd->peer is NULL, so we have to handle that too.
Fixes: #282
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a race where we start onchaind, but state is unchanged, so checks
like peer_control.c's:
peer_ready = (peer->owner && peer->state == CHANNELD_AWAITING_LOCKIN);
if (!peer_ready) {
log_unusual(peer->log,
"Funding tx confirmed, but peer state %s %s",
peer_state_name(peer->state),
peer->owner ? peer->owner->name : "unowned");
} else {
subd_send_msg(peer->owner,
take(towire_channel_funding_locked(peer,
peer->scid)));
}
Can send to the wrong daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were sending a channeld message to onchaind, which was v. confusing
due to overlap. We make all the numbers distinct, which means we can
also add an assert() that it's valid for that daemon, which catches
such errors immediately.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We re-use the value for reasonable_depth given by the master, and we
tell it when our timeout transactions reach that depth.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we see an offered HTLC onchain, we need to use the preimage if we
know it. So we dump all the known HTLC preimages at startup, and send
new ones as we discover them.
This doesn't cover preimages we know because we're the final
recipient; that can happen if an HTLC hasn't been irrevocably
committed yet. We'll do that in a followup patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the HSM is slow it might happen that the timestamp has changed the
second time we come around, so we generate the timestamp externally
and pass it in so we're sure it won't change between calls.
Reported-by: Rusty Russell
Signed-off-by: Christian Decker <decker.christian@gmail.com>
lightningd can crash on shutdown if it's in the middle of getchaintips;
we free the conn, the finished callback is called (process_chaintips),
and it reports that it received an empty result.
The simplest fix is to set a flag in the struct bitcoind destructor,
and avoid the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either when it exits with a signal, or sends an error status message.
Then we make test_lightningd.py use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Remove reference to old $(LIGHTNINGD_OLD_LIB_OBJS) var (in handshaked too).
2. Make check depend directly on unit tests, insteadof weird lightningd/tests
variable.
3. check-source-bolt and check-whitespace are automatic for $(ALL_TEST_PROGRAMS)
so we don't need them here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the step where we broadcast the transaction to the network and
a nice place to extract the change from the transaction.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
It's no longer used and we definitely do not want to run with an
outdated or future db, so we'll terminate if we can't upgrade or
the version is newer than what we understand.
Signed-off-by: Christian Decker <decker.christian@gmai.com>
So far we were always using the deadline in the announcements, that's
obviously not good, so this introduces the parameter as per spec.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't killing it. Eventually it would die, and peer_owner_finished()
would access subd->peer->owner, but that peer was freed already.
Closes: #261
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To reproduce the next bug, I had to ensure that one node keeps thinking it's
disconnected, then the other node reconnects, then the first node realizes
it's disconnected.
This code does that, adding a '0' dev-disconnect modifier. That means
we fork off a process which (due to pipebuf) will accept a little
data, but when the dev_disconnect file is truncated (a hacky, but
effective, signalling mechanism) will exit, as if the socket finally
realized it's not connected any more.
The python tests hang waiting for the daemon to terminate if you leave
the blackhole around; to give a clue as to what's happening in this
case I moved the log dump to before killing the daemon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, we unset the old subd->peer, then freed subd.
peer_owner_finished dereferenced subd->peer->owner, and boom:
test_disconnect_funder (__main__.LightningDTests) ... Fatal signal 11. Log dumped in crash.log
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2882
==2882== Invalid read of size 8
==2882== at 0x413F74: peer_owner_finished (peer_control.c:679)
==2882== by 0x41EA2C: destroy_subd (subd.c:381)
==2882== by 0x459700: notify (tal.c:240)
==2882== by 0x459BB1: del_tree (tal.c:400)
==2882== by 0x459FC0: tal_free (tal.c:509)
==2882== by 0x413796: peer_reconnected (peer_control.c:493)
==2882== by 0x413A6A: add_peer (peer_control.c:592)
==2882== by 0x40ED1F: handshake_succeeded (new_connection.c:186)
==2882== by 0x41E3DD: sd_msg_reply (subd.c:262)
==2882== by 0x41E6BB: sd_msg_read (subd.c:318)
==2882== by 0x41E4E6: read_fds (subd.c:283)
==2882== by 0x44DEB4: next_plan (io.c:59)
==2882== Address 0x838 is not stack'd, malloc'd or (recently) free'd
==2882==
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logic of dispatching the announcement_signatures message was
distributed over several places and daemons. This aims to simplify it
by moving it all into `channeld`, making peer_control only report
announcement depth to `channeld`, which then takes care of the
rest. We also do not reuse the funding_locked tx watcher since it is
easier to just fire off a new watcher with the specific purpose of
waiting for the announcement_depth.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
1. The code to skip over padding didn't take into account max.
2. It also didn't use symbolic names.
3. We are not supposed to fail on unknown addresses, just stop parsing.
4. We don't use the read_ip/write_ip code, so get rid of it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a negative timestamp as the flag for this, making the test simple.
This allows valgrind to detect that we're accessing them prematurely,
including across the wire on gossip_getchannels_entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
jl777 reported a crash when we try to pay past reserve. Fix that (and
a whole class of related bugs) and add tests.
In test_lightning.py I had to make non-async path for sendpay() non-threaded
to get the exception passed through for testing.
Closes: #236
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You will want to 'make distclean' after this.
I also removed libsecp; we use the one in in libwally anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some fields were redundant, some are simply moved into 'struct lightningd'.
All routines updated to hand 'struct lightningd *ld' now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also, we split the more sophisticated json_add helpers to avoid pulling in
everything into lightning-cli, and unify the routines to print struct
short_channel_id (it's ':', not '/' too).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To avoid everything pulling in HTLCs stuff to the opening daemon, we
split the channel and commit_tx routines into initial_channel and
initial_commit_tx (no HTLC support) and move full HTLC supporting versions
into channeld.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Other places require the flags and states, but the structure is
only needed in channeld, and even then we can remove several fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was hoping to defer HTLC updates until we actually store HTLCs, but
we need to flush to DB whenever balances update as well.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We're very simple about it: if there's a reorganization, we restart. Otherwise
we tell it about everything.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's in the shachain, so storing it is completely redundant. We leave
it in for the moment so we can assert() that nothing has changed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When loading from DB the list of htlcs was not being initialized which
caused a segfault when the first commit came around, this fixes it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The peer->seed needs to be unique for each channel, since bitcoin
pubkeys and the shachain are generated from it. However we also need
to guarantee that the same seed is generated for a given channel every
time, e.g., upon a restart. The DB channel ID is guaranteed to be
unique, and will not change throughout the lifetime of a channel, so
we simply mix it in, instead of a separate increasing counter.
We also needed to make sure to store in the DB before deriving the
seed, in order to get an ID assigned by the DB.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is the big one, and it's completely anticlimactic: it loads all
channels that have reached opening and are not marked as
closingd_complete into memory, that's it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was supposed to be a temporary solution anyway, and I had a
rather annoying mixup between peer_id and unique_id, the latter of
which is actually a connection identifier.
Add the channel to the peer on the two open paths (fundee and funder)
and store it into the database. Currently fails when opening a channel
to a known peer after loading from DB because we attempt to insert a
new peer with the same node_id. Will fix later.
As per lightning-rfc change 956e8809d9d1ee87e31b855923579b96943d5e63
"BOLT 7: add chain_hashes values to channel_update and channel_announcment"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This brings us up to 955e874acc535ab2c74c1cf0eab61896ea4224ff in
https://github.com/lightningnetwork/lightning-rfc
This doesn't actually change anything; the only actual change is held back
for the next commit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is required for onchaind: we want to watch all descendents by default,
as to do otherwise would be racy, which means we need to traverse the outputs
when a tx appears.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And store in peer->last_tx/peer->last_sig like all other places,
that way we broadcast it if we need to.
Note: the removal of tmpctx in funder_channel() is needed because we
use txs[0], which was allocated off tmpctx.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to check if we exit after sending a revoke_and_ack, otherwise
channeld ends up getting the closing_signed packet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tal_strdup() doesn't set tal_count(), so we end up sending an ERROR
packet with an empty message. Wrap this and get it right.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There was a race condition that would cause an assertion to segfault
if a call to release_peer was interleaved with a fail_peer. The
release_peer was making the peer non-local, which was then causing the
assertion in fail_peer to fail. Now we just have 3 cases: not found,
local, and non-local.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
After quite some back and forth we seem to finally agree on the bit
3 (mask 0x08) to signal optional initial_routing_sync.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This was causing failures on testnet where confirmations are not
immediate.
Reported-by: Fabrice Drouin @sstone
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We support a number of features already, so failing connections
whenever we see an even bit set is not a good idea. This turned out to
kill our connections to eclair.
Also, the spec says that the LSB / bit 0 is to be counted as index 0, and
therefore even. So we need to check the lower of each 2-bit-tuple not
the higher one.
We were using the bitcoin genesis blockhash for all networks, which is
not correct, and would result in the open being aborted when talking
to other implementations.
Reported-by: @sstone and @pm47
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't registering reconnecting peers for broadcasts. Just
starting a timer is enough. Also added an integration test to check
that the gossip sync is being resumed.
test_closing_negotiation_reconnect (__main__.LightningDTests) ... peer state CLOSINGD_COMPLETE should be CLOSINGD_SIGEXCHANGE
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a transitional state, while we're waiting to see the
closing tx onchain (which is To Be Implemented).
The simplest way to do re-transmission is to re-use closingd, and just
disallow any updates.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I made the mistake of thinking it was a [NUM_SIDES] array, but
it's actually our balance, and it's in millisatoshi. Rename
for clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As tracked down by Christian; by setting up the master conn first,
we make the master fd async. This means that the synchronous read
(in init_channel) can fail with -EAGAIN, and indeed, Christian
saw this when not running under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We actually don't need to transition if we're reconnecting, and logic
to go to CHANNELD_NORMAL was wrong: we checked that we'd seen funding tx
locked, but not that we'd received a msg from the remote peer.
We need to fix the tests now we no longer double-transition, too.
Fixes: #188
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Important: a non-standard one can make the closing tx not propagate.
Drive-by cut&paste message fix, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what it actually is, and makes it clearer when we refer to the
spec. It's the commitment we're currently updating, which is the next
commitment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently have the problem that the master can send new HTLCs before
we've processed the incoming reestablish message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patch includes wire/peer_wire.h and causes a compile error
as lightningd/gossip_control.c defined its own gossip_msg function.
New names are clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen even without a protocol violation, if the incoming
update_add_htlc crosses over our outgoing shutdown.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep the scriptpubkey to send until after a commitment_signed (or,
in the corner case, if there's no pending commitment). When we
receive a shutdown from the peer, we pass it up to the master.
It's up to the master not to add any more HTLCs, which works because
we move from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need to keep this around any more: by handing it to
subdaemons we ensure we'll close it if the peer disconnects, and we
also add code to get a new one on reconnection.
Because getting a gossip_fd is async, we re-check the peer state after
it gets back. This is kind of annoying: perhaps if we were to hand
the reconnected peer through gossipd (with a flag to immediately
return it) we could get the gossip fd that way and unify the paths?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
At the moment, master simply keeps the gossip fd open when peer
disconnects. That's inefficient, and wrong anyway (it may want a
complete new sync, or may not, but we'll currently send all the
messages including stale ones).
This interface will be required for restart anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we're always sync, just use an fd. Put the hsm_sync_read() helper
here, too, and do HSM init sync which makes things much simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With no async calls left, we can just use a stack variable for the fd.
And we're now *always* in the hands of some daemon, unless we're
disconnected, so owner is only NULL in that case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a terrible hack in gossip when a peer didn't exist. Formalize
a pattern when code+200 is a failure (with no fds passed), and use it
here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's no GETTING_HSMFD state at all any more. We
temporarily play games with the hsm fd; those will go away once we're
done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's no GETTING_SIG_FROM_HSM state at all any more. We
temporarily play games with the hsm fd; those will go away once we're
done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Wallet should really be the container for anything bip32 related, so
I'd like to slowly wean off of `ld->bip32_base` in favor of
`ld->wallet->bip32_base`
We'll re-use them a few times so having them at a central location is
nice. We also fix a bug that was unreserving UTXO entries upon free,
instead of promoting them to being spent.
So far we always needed to know the public key, which was not the case
for addresses that we don't own. Moving the hashing outside of the
script construction allows us to send to arbitrary addresses. I also
added the hash computation to the pubkey primitives.
When we get a fail/fulfill on an outgoing HTLC, we tell the correspoding
incoming HTLC about it. But if that peer is disconnected, we don't.
The better solution is to copy the preimage/malformed/failmessage and mark
the incoming HTLC as resolved. This is done most simply by marking it
SENT_REMOVE_HTLC, which will work in the database case as well.
channeld now re-transmits appropriately when it gets started with an HTLC
in that state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matches what the master does: increments commit index when we send
commit_sig. Thus if we restart at that point, we match.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This matters in one case: channeld receiving a bad message is a
permenant failure, whereas losing a connection is transient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need the old remote per_commitment_point so we can validate the
per_commitment_secret when we get it.
We unify this housekeeping in the master daemon using
update_per_commit_point().
This patch also saves whether remote funding is locked, and disallows
doing that twice (channeld should ignore it).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a bit tricky since we want to hand more verbose errors to the local
case, but the locally-created and forwarded paths had diverged (the local
one missing some things).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are two ways we can do retransmission on reconnect: re-derive
what we would have sent, or remember it and simply re-send. The
rederivation is difficult: unwinding state depends on whether we sent
a revoke_and_ack before or after the commitment_signed, and unwinding
a revoke_and_ack would require us to remember HTLCs we would have
normally forgotten at this point.
So we simply tell the master to remember the old signatures for us,
and hand them back in case we need to re-send.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the case where we can't decrypt the onion, we can't fail it in the
normal way (which is encrypted using the onion shared secret), we need
to respond with a update_fail_malformed_htlc message.
Moreover, we need to remember this for persistence. This means that
we really have three conclusions for an HTLC: fulfilled, failed,
malformed. Fix up the logic everywhere which assumed failed or
fulfilled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's easiest to have the master keep the last commit we sent, for
re-transmission. We could recalculate it, but it's made more difficult
by the before/after revoke case.
And because revoke_and_ack changes the channel state, we need to
remember which order we sent them in for re-transmission.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need this for reestablishing a channel.
(Note: this patch changes quite a bit in this series, but reshuffling was
tedious).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently it's fairly ad-hoc, but we need to tell it to channeld when
it restarts, so we define it as the non-HTLC balance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It needs to save them to the db in case of restart; this means we tell
it about funding_locked, as well as the next_per_commit_point given
in revoke_and_ack.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The channel daemon gets the shared secrets from the HSM to save
the master daemon some work. It used to hand these over at
revoke_and_ack receive, which is when the master daemon needs them.
However, it's a bit simpler to hand them over when we first tell
the master about the incoming HTLC (the first commitsig).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They share some fields, but they're basically different, and it's clearest
to treat them differently in most places.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When adding their HTLCs, it needs all the information. When failing,
it needs the id as key and the failure reason. When fulfilling, it
needs the id and payment preimage.
It also needs to know when we have received an revoke_and_ack or a
commitment_signed, to place in the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to change to a batch interface, where we tell the master
before we send certain packets (eg. commit, revoke). We need to wait
for it to respond before doing anything else, but it might cross-over
and be sending us commands at the same time.
This queues those requests until we're ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This prepares us for handlers turning off peer I/O, rather than assuming
we always want to handle the next incoming message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still get the shared secret, since that requires a round trip to the HSM
(why waste the master daemon's time?) but it does the processing, which
simplifies the message passing and things like realm handling which
have nothing to do with this particular channeld.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some paths were still sending unencrypted failure messages; unify them
all. We need to keep the fail_msg around for resubmission if the
channeld dies; similarly, we need to keep the htlc_end structure
itself after failure, in case the failed HTLC is committed: we can
move it to a minimal archive once it's flushed from both sides,
however.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Means caller has to do some more work, but this is closer to what we want:
we're going to want to send them to the master daemon for atomic commit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used level -1 to mean "append to log", but that doesn't actually
work, and results in an assert if we try to prune the logs:
lightningd: daemon/pseudorand.c:36: pseudorand: Assertion `max' failed.
Expose logv_add and use that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use --log-level to control this, but we could add another switch.
It makes the test infrastructure simpler, since we can just look in the
main logs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, it reassured me that the ammag obfuscation step occurs
even for the initial failmsg creator.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I actually hit this very hard to reproduce race: if we haven't process
the channeld message when block #6 comes in, we won't send the gossip
message. We wait for logs, but don't generate new blocks, and timeout
on l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES').
The solution, which also tests that we don't send announcement signatures
immediately, is to generate a single block, wait for CHANNELD_NORMAL,
then (in gossip tests), generate 5 more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we have a simple way to query the database for UTXOs we can
simplify some of the coin selection logic. That gets rid of the
in-memory list of UTXOs.
Not the nicest code, but it allows us to store the bip32_max_index so
that we don't forget our addresses upon restart. We could have done
the same by retrieving the max index from our index, but then we'd
forget addresses that don't have an associated output. Conversion
to/from string is so that we can store arbitrary one off values in the
DB in the future, independent of type.
The format we use to generate marshal/unmarshal code is from
the spec's tools/extract-formats.py which includes the offset:
we don't use it at all, so rather than having manually-calculated
(and thus probably wrong) values, or 0, emit it altogther.
Reported-by: Christian Decker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use this to make it send the funding_signed message, rather than having
the master daemon do it (which was even more hacky). It also means it
can handle the crypto, so no need for the packet to be handed up encrypted,
and also make --dev-disconnect "just work" for this packet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a file descriptor, so when we consume an entry, we move past it
(and everyone shares a file offset, so this works).
The file contains packet names prefixed by - (treat fd as closed when
we try to write this packet), + (write the packet then ensure the file
descriptor fails), or @ ("lose" the packet then ensure the file
descriptor fails).
The sync and async peer-write functions hook this in automatically.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'test-run-cryptomsg__fix_compilation.patch':
test/run-cryptomsg: fix compilation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Valgrind error file: /tmp/lightning-8k06jbb3/test_disconnect/lightning-7/valgrind-errors
==32307== Uninitialised byte(s) found during client check request
==32307== at 0x11EBAD: memcheck_ (mem.h:247)
==32307== by 0x11EC18: towire (towire.c:14)
==32307== by 0x11EF19: towire_short_channel_id (towire.c:92)
==32307== by 0x12203E: towire_channel_update (gen_peer_wire.c:918)
==32307== by 0x1148D4: send_channel_update (channel.c:185)
==32307== by 0x1175C5: peer_conn_broken (channel.c:1010)
==32307== by 0x13186F: destroy_conn (poll.c:173)
==32307== by 0x13188F: destroy_conn_close_fd (poll.c:179)
==32307== by 0x13B279: notify (tal.c:235)
==32307== by 0x13B721: del_tree (tal.c:395)
==32307== by 0x13BB3A: tal_free (tal.c:504)
==32307== by 0x130522: io_close (io.c:415)
==32307== Address 0xffefff87d is on thread 1's stack
==32307== in frame #2, created by towire_short_channel_id (towire.c:88)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is simpler than passing back and forth, for the moment at least. That
means we don't need to ask for a new one on reconnect.
This partially reverts the gossip handling in openingd, since it no longer
passes the gossip fd back. We also close it when peer is freed, so it
needs initializing to -1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can go to release a gossip peer, and it can fail at the same time.
We work around the problem that the reply must be a gossipctl_release_peer_reply
with two fds, but it's not pretty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We kill the existing connection if possible; this may mean simply
forgetting the prior peer altogether if it's in an early state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead, send it the funding_signed message; it can watch, save to
database, and send it.
Now the openingd fundee path is a simple request and response, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simplifies state machine. Master still has to calculate the tx to get
the signature and broadcast, but now the opening daemon funding path
is a simple request/response.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to use it in peer_control to generate the transaction, but we
really only need the funding_pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Like the fd, it's only useful when the peer is not in a daemon, so we
free & NULL it when that happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We steal it when we're closing connection, but we normally want to forget
it if connection just dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We explicitly assert what state we're coming from, to make transitions
clearer.
2. Every transition has a state, even between owners while waiting for HSM.
3. Explictly step though getting the HSM signature on the funding tx
before starting channeld, rather than doing it in parallel: makes
states clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to do this on every connection, whether reconnecting or not,
so it makes sense for the handshake daemon to handle it and return
the feature fields.
Longer term I'm considering having the handshake daemon handle the
listening and connecting, and simply hand the fds back once the peers
are ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently create a peer struct, then complete handshake to find out
who it is. This means we have a half-formed peer, and worse: if it's
a reconnect we get two peers the same.
Add an explicit 'struct connection' for the handshake phase, and
construct a 'struct peer' once that's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now in sync with 8ee57b97738b1e9467a1342ca8373d40f0c4aca5.
Our tool doesn't need to convert them any more, but we actually had a
mis-typed field in the HSM which needed fixing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The single string-based hostname and port has been retired in favor of
having multiple `struct ipaddr`s from the `node_announcement`. This
breaks the hostnames and ports from IRC, but I didn't bother to
backport ipaddr for it since it is only used in the legacy daemon.
Rather a big commit, but I couldn't figure out how to split it
nicely. It introduces a new message from the channel to the master
signaling that the channel has been announced, so that the master can
take care of announcing the node itself. A provisorial announcement is
created and passed to the HSM, which signs it and passes it back to
the master. Finally the master injects it into gossipd which will take
care of broadcasting it.
We alternated between using a sha256 and using a privkey, but there are
numerous places where we have a random 32 bytes which are neither.
This fixes many of them (plus, struct privkey is now defined in terms of
struct secret).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, the tests can mine blocks too soon, and the funding never
locks. This gives more of a chance, at least.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were getting an assert "!secp256k1_fe_is_zero(&ge->x)", because
an all-zero pubkey is invalid. We allow marshal/unmarshal of NULL for
now, and clean up the error handling.
1. Use status_failed if master sends a bad message.
2. Similarly, kill the gossip daemon if it gives a bad reply.
3. Use an array for returned pubkeys: 0 or 2.
4. Use type_to_string(trc, struct short_channel_id, &scid) for tracing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I implemented this because a bug causes us to consider the HTLC malformed,
so I can trivially test it for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we now use the short_channel_id to identify the next hop we need
to resolve the channel_id to the pubkey of the next hop. This is done
by calling out to `gossipd` and stuffing the necessary information
into `htlc_end` and recovering it from there once we receive a reply.
This was overly complex since it was off-by-one and we were storing
some information elsewhere. Now this just loads the route as is into
structs, extracts some information for our outgoing HTLC, and then
shifts by the array of structs by one, and finally fills in the last
instruction, which is the terminal.
The new onion uses the `channel_id` instead of the `node_id` of the
next hop to identify where to forward the payment. So we return the
exact channel chosen by the routing algo, to avoid having to look it
up again later.
Mainly switching from the old include to the new include and adjusting
the actual size of the onion packet. It also moves `channel.c` to use
`struct hop_data`.
It introduces a dummy next hop in `channel.c` that will be replaced in
the next commit.
Adds a new command line flag `--dev-broadcast-interval=<ms>` that
allows us to specify how often the staggered broadcast should
trigger. The value is passed down to `gossipd` via an init message.
This is mainly useful for integration tests, since we do not want to
wait forever for gossip to propagate.
We were using an uninitialized `broadcast_index` on the peer which
would occasionally result in no forwardings at all, segmenting the
network. And during the `msg_queue` refactor, some wait targets were
not updated, resulting in the waits never to be woken up.
This moves all the non-legacy blackbox testing into python.
Before:
real 10m18.385s
After:
real 9m54.877s
Note that this doesn't valgrind the subdaemons: that patch seems to cause
some issues in the python framework which I am still chasing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than dumping all gossip messages then handling local ones again.
This should help us give timely ping replies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fails on the old dev-restart tests, so we need to only enable it
for the new tests:
rusty@rusty-XPS-13-9360:~/devel/cvs/lightning (guilt/ping-pong)$ daemon/test/test-basic --restart --verbose
...
{ }
RESTARTING
dev-restart failed!
valgrind: mmap(0x38000000, 2265088) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only the side *accepting* the connection gives a `minumum_depth`, but both
sides are supposed to wait that long:
BOLT #2:
### The `funding_locked` message
...
#### Requirements
The sender MUST wait until the funding transaction has reached
`minimum-depth` before sending this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have two partially overlapping state-machines: the channel
state and the announcement state. We need to request signatures from
the HSM to exchange them with the peer, and we need to have both sets
of signatures before we can proceed and send the actual announcements.
Instead of reusing HSMFD_ECDH, we have an explicit channeld hsm fd,
which can do ECDH and will soon do channel announce signatures as well.
Based-on: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We *should* split the struct into key and data, rather than only comparing
the key parts in the htlc_end_eq function. But meanwhile, this fixes
the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us link HTLCs from one peer to another; but for the moment it
simply means we can adjust balance when an HTLC is fulfilled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is an approximate result (it's only our confirmed balance, not showing
outstanding HTLCs), but it gives an easy way to check HTLCs have been
resolved.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a peer dies, and then we get a reply, that can cause access after free.
The usual way to handle this is to make the request a child of the peer,
but in fact we still want to catch (and disard) it, so it's a little
more complex internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We call channel_sent_commit *before* sending (so we know if we need
to), so the name is wrong. Similarly channel_sent_revoke_and_ack.
We can usefully have them tell is if there is outstanding work to do,
too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Passing through 'struct peer *' was a layering violation.
Reported-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The three cases we care about only happen on specific transitions:
1. They can no longer spend our failed HTLC: we can fail the source now.
2. They are fully committed to their new HTLC htlc: we can forward now.
3. They can no longer timeout their fulfilled HTLC: the funds are ours.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The direction bit was computed in several spots and was inconsistent
in some cases. Now we compute it just in routing, and once when
starting up `channeld`, this avoids recomputing it all over the place.
Now we correctly use the remote revocation basepoint, we need to set
it in run-channel (instead of the local revocation basepoint).
We also update all the comments, as per (pending) spec commit:
https://github.com/lightningnetwork/lightning-rfc/pull/137
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before exiting, `channeld` constructs and sends a `channel_update`
marking the channel as disabled. This is the pro-active signalling
that the channel may no longer be used.
Copied the JSON-request parsing from `pay.c`, passing through to
`gossipd`, filling the reply with the `route_hop` serialization, and
serializing as JSON-RPC response.
The `route_hop` struct introduced in the previous refactoring is
reused when returning the reply to a `getroute` request. Since these
are nested messages I added the serialization and deserialization
methods.
This came up while debugging the gossip daemon breaking upon calling
`getroute`. It turns out that log was still writing to stdout, but
stdout had been reused for an inter-daemon socket, which would
break...
The STDOUT fd being reused as communication sockets with other daemons
was causing some unexpected crashes if the sub-daemon wrote something,
e.g., using `log_*`. Not closing it should avoid that conflict.
Some of the struct array helpers need to allocate data when
deserializing their fields. The `getnodes` reply is one such example
that allocates the hostname. Since the change to calling array helpers
the getnodes call was broken because it was attempting to allocate off
of the entry, which did not have a tal header, thus failing.
Use msg_enqueue's wake and msg_queue_wait, and don't clone packets since
msg_enqueue() respects take.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We remove the unused status_send_fd, and rename status_send_sync (it
should only be used for that case now).
We add a status_setup_async(), and wire things internally to use that
if it's set up: status_setup() is renamed status_setup_sync().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a little more awkward, as we used to do some work
synchronously (the init message), but it's still pretty clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we use async IO, we can't use status_send. We keep a pointer to the
master daemon_conn, and use that to send.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The gossip subdaemon previously passed the fd after init: this is
unnecessary for peers which simply want to gossip (and not establish
channels).
Now we hand the gossip fd back with the peer fd. This adds another
error message for when we fail to create the gossip fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of indicating where to place the fd, you say how many: the
fd array gets passed into the callback.
This is also clearer for the users.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This measn that gossip (which also wants to wake it) needs to wake
the queue, not the daemon_conn.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the fourth value (size) to determine the type, unless the fifth
value is suppled. That's silly: allow the fourth value to be a typename,
since that's the only reason we care about the size at all!
Unfortunately there are places in the spec where we use a raw fieldname
without '*1' for a length, so we have to distingish this from the
typename case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Except for the trivial case of u8 arrays, have the generator create
the loop code for the array iteration.
This removes some trivial helpers, and avoids us having to write more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
subd_req() needs to get the type before it calls subd_send_msg, because
if it's take() then msg_enqueue() may reallocate.
Which also made me realize that subd_send_message() should not try to dup,
since msg_enqueue() handles that itself.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have some duplication in handling queues, so this is an attempt at
deduplicating some of that work. `daemon_conn` now uses the
`msg_queue` and `channeld` was also migrated to `msg_queue`. At the
same time I made `msg_queue` create a copy of the messages or takes
over messages marked with `take()`. This should make cleaning up
messages easier.
This allows us to break out of the normal queue-based write loop and
handle things ourself for a while. Currently this is used to trigger
regular gossip dumps that do not proceed until the buffers have been
cleared in order to avoid memory-explosions.
We were firing off the wakeup timers all over the place, out of fear
that we would be triggering two concurrent broadcasts. This is not
really the case since the wakeup calls are idempotent. This also
allows us not to differentiate between triggering a broadcast on a
local peer or on a proxied peer.
This includes some code duplication, but since the two write targets
are fundamentally different we might need to refactor a bit more to
unify them again.
We will eventually ween off of the logging, or replace it with status
messages that log in `lightningd`, but for now we still have the
routing module that does some logging.
This uses a single fd for both status and control.
To make this work, we enforce the convention that replies are the same
as requests + 100, and that their name ends in "_REPLY".
This also means that various daemons can simply exit when done; there's
no race between reading request and closing status fds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was included in the lightningnetwork/lightning-rfc#105 update
to the test vectors, and it's a good idea. Takes a bit of work to
calculate (particularly, being aware of rounding issues).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
aka "BOLT 3: Use revocation key hash rather than revocation key",
which builds on top of lightningnetwork/lightning-rfc#105 "BOLT 2,3,5:
Make htlc outputs of the commitment tx spendable with revocation key".
This affects callers, since they now need to hand us the revocation
pubkey, but commit_tx has that already anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All the daemons will use a common seed for point derivation, so drag
it out of lightningd/opening.
This also provide a nice struct wrapper to reduce argument count.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should start watching for the transaction before we send the
signature; we might miss it otherwise. In practice, we only see
transactions as they enter a block, so it won't happen, but be
thorough.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit tricky: for our signing code, we don't want scriptsigs,
but to calculate the txid, we need them. For most transactions in lightning,
they're pure segwit so it doesn't matter, but funding transactions can
have P2SH-wrapped P2WPKH inputs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
built_utxos needs to calculate fees for figuring out how many utxos to
use, so fix that logic and rely on it.
We make build_utxos return a pointer array, so funding_tx can simply hand
that to permute_inputs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec 4af8e1841151f0c6e8151979d6c89d11839b2f65 uses a 32-byte 'channel-id'
field, not to be confused with the 8-byte short ID used by gossip. Rename
appropriately, and update to the new handshake protocol.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>