This replacement is a little menial, but it explicitly catches all
the places where we allow a local socket. The actual implementation of
opening a AF_UNIX socket is almost hidden in the patch.
The detection of "valid address" is now more complex:
p->addr.itype != ADDR_INTERNAL_WIREADDR || p->addr.u.wireaddr.type != ADDR_TYPE_PADDING
But most places we do this, we should audit: I'm pretty sure we can't
get an invalid address any more from gossipd (they may be in db, but
we should fix that too).
Closes: #1323
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It does all the other address handling, do this too. It also proves useful
as we clean up wildcard address handling.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we only bind to addresses in our wireaddrs array, we would not
autobind to local sockets if they couldn't reach google's nameserver.
That's clearly wrong: we should only not bind if there's a protocol
issue (eg. no IPv6 support).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's become clear that our network options are insufficient, with the coming
addition of Tor and unix domain support.
Currently:
1. We always bind to local IPv4 and IPv6 sockets, unless --port=0, --offline,
or any address is specified explicitly. If they're routable, we announce.
2. --addr is used to announce, but not to control binding.
After this change:
1. --port is deprecated.
2. --addr controls what we bind to and announce.
3. --bind-addr/--announce-addr can be used to control one and not the other.
4. Unless --autolisten=0, we add local IPv4 & IPv6 port 9735 (and announce if they are routable).
5. --offline still overrides listening (though announcing is still the same).
This means we can bind to as many ports/interfaces as we want, and for
special effects we can announce different things (eg. we're sitting
behind a port forward or a proxy).
What remains to implement is semi-automatic binding: we should be able
to say '--addr=0.0.0.0:9999' and have the address resolve at bind
time, or even '--addr=0.0.0.0:0' and have the port autoresolve too
(you could determine what it was from 'lightning-cli getinfo'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We set no_reconnect with --offline, but that doesn't work if !DEVELOPER.
Make the flag positive, and non-DEVELOPER mode for gossipd.
We also don't override portnum with --offline, but have an explicit
'listen' flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can create the hsm file from python directly; that works even if we
don't have DEVELOPER set, and is simpler.
We add a test that the aliases are correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Originally we were supposed to tell the HSM we had just created the directory,
otherwise it wouldn't create a new seed. But we modified it to check if
there was a seed file anyway: just move that logic into a branch of hsmd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No new functionality, just a continuation of my work toward completing #665.
I removed the common members of `struct withdrawal` and `struct fund_channel`
and placed them in a new `struct wallet_tx`. Then it was fairly straightforward
to reimplement the existing code in terms of `wallet_tx`.
Since I made some structural changes I wanted to get this approved before I
go any farther.
Added 'all' to fundchannel help message.
Fixes: #1445
Hacky fix, possibly. First cut at avoiding starting up onchaind and gossipd (which might make queries of chaintopology, which might start up a bitcoin-cli) before we can daemonize.
We're getting spurious closures, even on mainnet. Using --ignore-fee-limits
is dangerous; it's slightly less so to lower the minimum (which is the
usual cause of problems).
So let's halve it, but beware the floor.
This is a workaround, until we get independent feerates in the spec.
Fixes: #613
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means gossipd is live and we can tell it things, but it won't
receive incoming connections. The split also means that the main daemon
continues (eg. loading peers from db) while gossipd is loading from the store,
potentially speeding startup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we start accepting peer connections before we initialized some of the other
parts (mainly the chaintopology) we could end up asking for stuff that isn't
ready yet (blockchain head for example).
Signed-off-by: Christian Decker <decker.christian@gmail.com>
If channeld dies for some reason (eg, reconnect) and we didn't yet announce
the channel, we can miss doing so. This is unusual, because if lightningd
restarts it rearms the callback which gives us funding_locked, so it only
happens if just channel dies before sending the announcement message.
This problem applies to both temporary announcement (for gossipd) and
the real one. For the temporary one, simply re-send on startup, and
remote the error msg gossipd gives if it sees a second one. For the
real one, we need a flag to tell us the depth is sufficient; the peer
will ignore re-sends anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we intuit it from the fd being closed, but that may happen out
of order with when the master thinks it's dead.
So now if the gossip fd closes we just ignore it, and we'll get a
notification from the master when the peer is disconnected.
The notification is slightly ugly in that we have to disable it for
a channel when we manually hand the channel back to gossipd.
Note: as stands, this is racy with reconnects. See the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Lifetime of 'struct reaching' now only while we're actively doing connect.
2. Always free after a single attempt: if it's an important peer, retry
on a timer.
3. Have a single response message to master, rather than relying on
peer_connected on success and other msgs on failure.
4. If we are actively connecting and we get another command for the same
id, just increment the counter
The result is much simpler in the master daemon, and much nicer for
reconnection: if they say to connect they get an immediate response,
rather than waiting for 10 retries. Even if it's an important peer,
it fires off another reconnect attempt, unless it's actively
connecting now.
This removes exponential backoff: that's restored in next patch. It
also doesn't handle multiple addresses for a single peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And on channel_fail_permanent and closing (the two places we drop to
chain), we tell gossipd it's no longer important.
Fixes: #1316
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These don't have a maximum number of reconnect attempts, and ensure
that we try to reconnect when the peer dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simplification of the offset calculation to use the rescan parameter, and rename
of `wallet_first_blocknum`. We now use either relative rescan from our last
known location, or absolute if a negative rescan was given. It's all handled in
a single location (except the case in which the blockcount is below our
precomputed offset), so this should reduce surprises.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is intended to recover from an inconsistent state, involving
`onchaind`. Should we for some reason not restore the `onchaind` process
correctly we can instruct `lightningd` to go back in time and just replay
everything.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we reference the channel ID to allow cascades in the database we also need
the ability to look up a channel by its database ID.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This will allow us in the next commit to store the transactions that triggered
this event in the DB and thus allowing us to replay them later on.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We used to queue the preimages to be sent to onchaind only after receiving the
onchaind_init_reply. Once we start replaying we might end up in a situation in
which we queue the tx that onchaind should react to before providing it with the
preimages. This commit just moves the preimages being sent, making it atomic
with the init, and without changing the order.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
These were so far only used for bolt11 construction, but we'll need them for the
DNS seed as well, so here we just pull them out into their own unit and prefix
them.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
d822ba1ee accidentally removed this case, which is important: if the
other side didn't get our final matching closing_signed, it will
reconnect and try again. We consider the channel no longer "active"
and thus ignore it, and get upset when it send the
`channel_reestablish` message.
We could just consider CLOSINGD_COMPLETE to be active, but then we'd
have to wait for the closing transaction to be mined before we'd allow
another connection.
We can't special case it when the peer reconnects, because there
could be (in theory) multiple channels for that peer in CLOSINGD_COMPLETE,
and we don't know which one to reestablish.
So, we need to catch this when they send the reestablish, and hand
that msg to closingd to do negotiation again. We already have code
to note that we're in CLOSINGD_COMPLETE and thus ignore any result
it gives us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to remove automatic retrying of connect, and that uncovered
that we actually print out our "Server started" message before we create
the listening socket.
Move the init higher (outside the db transaction) and make it a
request/response, the loop until it's done.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The new connect code revealed an existing race: we tell gossipd to
release the peer, but at the same time it connects in. gossipd fails
the release because the peer is remote, and json_fundchannel fails.
Instead, we catch this race when we get peer_connected() and we were
trying to open a channel. It means keeping a list of fundchannels which
are awaiting a gossipd response though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We missed it in some corner cases where we crashed/were killed between
being told of the lockin and sending the channel_normal_operation message.
When we were restarted, we were told both sides were locked in already,
so we never updated the state.
Pull the entire "tell channeld" logic into channel_control.c, and make
it clear that we need to keep waching if we cant't tell channeld. I think
we did get this correct in practice, since funding_announce_cb has the
same test, but it's better to be clear.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We'd usually commit to the db soon, but there's a window where it
could be missed.
Also moves loc into the block it's used and make it tmpctx to avoid
an explicit free.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Without this, we can get errors on shutdown:
Valgrind error file: valgrind-errors.27444
==27444== Invalid read of size 8
==27444== at 0x1950E2: secp256k1_pubkey_load (secp256k1.c:127)
==27444== by 0x19CF87: secp256k1_ec_pubkey_serialize (secp256k1.c:189)
==27444== by 0x14FED9: towire_pubkey (towire.c:59)
==27444== by 0x15AAFB: towire_gossipctl_peer_disconnected (gen_gossip_wire.c:969)
==27444== by 0x1253EF: opening_channel_errmsg (opening_control.c:526)
==27444== by 0x1386A3: destroy_subd (subd.c:589)
==27444== by 0x18222C: notify (tal.c:240)
==27444== by 0x1826E1: del_tree (tal.c:400)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11FC53: main (lightningd.c:410)
==27444== Address 0x6c3af98 is 72 bytes inside a block of size 216 free'd
==27444== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x1827BC: del_tree (tal.c:421)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11F3C7: shutdown_subdaemons (lightningd.c:211)
==27444== by 0x11FC27: main (lightningd.c:406)
==27444== Block was alloc'd at
==27444== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x182296: allocate (tal.c:250)
==27444== by 0x182863: tal_alloc_ (tal.c:448)
==27444== by 0x12F2DF: new_peer (peer_control.c:74)
==27444== by 0x125600: new_uncommitted_channel (opening_control.c:576)
==27444== by 0x125870: peer_accept_channel (opening_control.c:668)
==27444== by 0x13032A: peer_sent_nongossip (peer_control.c:427)
==27444== by 0x116B9E: peer_nongossip (gossip_control.c:60)
==27444== by 0x116F2B: gossip_msg (gossip_control.c:172)
==27444== by 0x138323: sd_msg_read (subd.c:503)
==27444== by 0x137C02: read_fds (subd.c:330)
==27444== by 0x175550: next_plan (io.c:59)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also report tx and txid, and whether we closed unilaterally or
bilaterally, if we could close the channel.
Also make a manpage.
Fixes: #1207Fixes: #714Fixes: #622
We had an intermittant test failure, where the fee we negotiated was
further from our ideal than the final commitment transaction. It worked
fine if the other side sent the mutual close first, but not if we sent
our unilateral close first.
ERROR: test_closing_different_fees (__main__.LightningDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_lightningd.py", line 1319, in test_closing_different_fees
wait_for(lambda: p.rpc.listpeers(l1.info['id'])['peers'][0]['channels'][0]['status'][1] == 'ONCHAIN:Tracking mutual close transaction')
File "tests/test_lightningd.py", line 74, in wait_for
raise ValueError("Error waiting for {}", success)
ValueError: ('Error waiting for {}', <function LightningDTests.test_closing_different_fees.<locals>.<lambda> at 0x7f4b43e31a60>)
Really, if we're prepared to negotiate it, we should be prepared to
accept it ourselves. Simply take the cheapest tx which is above our
minimum.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only use for these was to compute their txids so we could notify depth
in case of reorgs.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are slowly hollowing out the in-memory blockchain representation to make
restarts easier.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
All of the callback functions were only using the tx to generate the txid again,
so we just pass that in directly and save passing the tx itself.
This is a simplification to move to the DB backed depth callbacks. It'd be
rather wasteful to read the rawtx and deserialize just to serialize right away
again to find the txid, when we already searched the DB for exactly that txid.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This will later allow us to determine the transaction confirmation count, and
recover transactions for rebroadcasts.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Internally both payment and routing use 64-bit, but the interface
between them used 32-bit.
Since both components already support 64-bit we should use that.
In the short_channel_id check we were copying the entire result into the next
bitcoin-cli call, including the newline character.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-By: @gdassori
Creating the pid-file before daemonizing results in the pid-file containing the
pid of the process that started the daemon, but is now dead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-By: Torkel Rogstad @torkelrogstad
We can have more than one; eg we might offer both bech32 and a p2sh
address, and in future we might offer v1 segwit, etc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are two very hard problems in software engineering:
1. Off-by-one errors
In this case we were rolling back further than needed and we were starting the
catchup one block further than expected.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I saw a failure in test_funding_fail():
assert l2.rpc.listpeers()['peers'][0]['connected']
This can happen if l2 hasn't yet handed back to gossipd. Turns out
we didn't mark uncommitted channels as connected:
[{'id': '03afa3c78bb39217feb8aac308852e6383d59409839c2b91955b2d992421f4a41e', 'connected': False, 'channels': [{'state': 'OPENINGD', 'owner': 'lightning_openingd', 'funder': 'REMOTE', 'status': ['Incoming channel: accepted, now waiting for them to create funding tx']}]}]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is probably covered by our "channel capacity" heuristic which
requires the channel be significant, but best to be explicit and sure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So we know how much counterparty could theoretically steal from us
(msatoshi_to_us - msatoshi_to_us_min) and how much we could
theoretically steal from counterparty (msatoshi_to_us_max -
msatoshi_to_us).
For more piloting goodness.
In particular, the main daemon and subdaemons share the backtrace code,
with hooks for logging.
The daemon hook inserts the io_poll override, which means we no longer
need io_debug.[ch]. Though most daemons don't need it, they still link
against ccan/io, so it's harmess (suggested by @ZmnSCPxj).
This was tested manually to make sure we get backtraces still.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I didn't convert all tests: they can still use a standalone context.
It's just marginally more efficient to share the libwally one for all
our daemons which link against it anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we only remember the actions that added channels then we'd restore them when
re-reading the gossip_store, so put a tombstone in there to remember to delete
it. These will be cleared upon re-writing the store since the announcements wont
be written anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This is necessary since we have onchaind tell us about the
their_unilateral/to_us output, after it is already in a block.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We don't handle \u, since we assume everyone sane is using UTF-8. We'd
still have to reject '\u0000' and maybe other weird cases if we did.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we may want to extend the on-disk format by adding custom information we
may as well just go the extra mile and reuse the serialization primitives we
already have.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
But only if we're actually going to change the feerate, otherwise we'd
log every time.
Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Naively, this would be 250 satoshi per sipa, but it's not since bitcoind's
fee calculation was not rewritten to deal with weight, but instead bolted
on using vbytes.
The resulting calculations made me cry; I dried my tears on the thorns
of BUILD_ASSERT (I know that makes no sense, but bear with me here as I'm
trying not to swear at my bitcoind colleagues right now).
Fixes: #1194
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This bug is a classic case of being lazy:
1. peer_accept_channel() allocated its return off the input message,
rather than taking an explicit allocation context. This concealed the
lifetime nature of the return.
2. The context for sanitize_error was the error itself, rather than the
more obvious tmpctx (connect_failed does not take).
The global tmpctx removes the "efficiency" excuse for grabbing a random
object to use as context, and is also nice and explicit.
All-the-hard-work-by: @ZmnSCPxj
This fixes the root cause of https://github.com/ElementsProject/lightning/issues/1212
where we deleted the payment because we wanted to retry, then retry failed
so we had an (old) HTLC without a matching payment. We then fed that
HTLC to onchaind, which tells us it's missing, and we try to fail the
payment and deref a NULL pointer.
Fixes: #1212
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would `block_map_add` inside `add_tip`, but we never
`block_map_del` inside `remove_tip`, which is dangerous as
we actually `tal_free` the block inside `remove_tip`.
Our CI did not reliably trap this problem since block
hashes are random and rerunning the `test_blockchaintrack`
often passed spuriously.
If we're going to simply take() a pointer, don't allocate it off a random
object. Using NULL makes our intent clear, particularly with allocating
packets we're going to take() onto a queue.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did a brief audit of tmpctx uses, and we do leak them in various
corner cases. Fortunely, all our daemons are based on some kind of
I/O loop, so it's fairly easy to clean a global tmpctx at that point.
This makes things a bit neater, and slightly more efficient, but also
clearer: I avoided creating a tmpctx in a few places because I didn't
want to add another allocation. With that penalty removed, I can use
it more freely and hopefully write clearer code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Needed for particular race condition: client calls `sendpay` with
intent to call `waitsendpay` later to get information, but the
payment fails after `sendpay` returns but before client can invoke
`waitsendpay`.
This lets client know of information even if it manages to invoke
`waitsendpay` "late".
As we add more features, the current code is insufficient.
1. Keep an array of single feature bits, for easy switching on and off.
2. Create feature_offered() which checks for both compulsory and optional
variants.
3. Invert requires_unsupported_features() and unsupported_features()
which tend to be double-negative, all_supported_features() and
features_supported().
4. Move single feature definition from wire/peer_wire.h to common/features.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Improve usability in these scenarios:
* bitcoin-cli not available in PATH and/or bitcoind not running
* bitcoin-cli available in PATH but bitcoind is not running
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our testing also reveals a bug: we start lightningd and shut it down
before fully processing the blockchain, so we don't set
last_processed_block. Fix that by setting it immediately once we have
a block: worst case it goes backwards a little.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In general, it is true that accessors should take const and discard it,
but chainparams is *always* const.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Transaction filters are strongly related to the wallet, this move just
makes it a bit more explicit.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
I leave all the now-unnecessary accessors in place to avoid churn, but
the use of bitfields has been more pain than help.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Let's have a simple function that allows us to check whether a channel
still has an HTLC open.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
The following command can be used to trigger these messages:
```
$ timeout 0.01 cli/lightning-cli connect [insert-syntactically-valid-peer-id-here] 123.123.123.123 # where 123.123.123.123 is unreachable
```
These error codes will cause `pay` to retry, so `pay` will never
actually report those error codes.
Those error codes will only get reported at the `sendpay` level.
* Modifies invoice command to have the following format
invoice <msatoshi> <label> <desc> <?expiry> <?fallbackaddr>
* Adds support for Segwit bcrt1 addresses for withdraw
* Add test case for fallback address in invoice creation
* Create a common json_tok_address_scriptpubkey to be used
by invoice and withdraw commands.
There are two recurring calls: the estimatefee call and the
getblockcount call. Currently we simply discard them on error, the
timer isn't rearmed.
This should fix a number of cases where bitcoind has an intermittant
failure and lightningd simply stops collecting blocks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, process_getblockhash() exits with status 8 when the block
number is out of range, which is expected. Any other exit status should
be treated as a spurious error.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The billboard is now far more useful to tell what's going on, and this
gets us closer to a state == owner mapping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use NULL on the callback to mean "clear the slot", and call it.
We have do this in two places: the old daemon might die, or the new
daemon might start first.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each state (effectively, each daemon) has two slots: a permanent slot
if something permanent happens (usually, a failure), and a transient
slot which summarizes what's happening right now.
Uncommitted channels only have a transient slot, by their very nature.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In #1018 we got no information, except "Internal error". At least
if we tell the other side what went wrong, we're more likely to get
an answer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the source channel is onchain, we try to send a message to onchaind
which (1) doesn't care, (2) doesn't take a channel_fail_htlc msg, and
(3) causes us to crash in subd.c:
assert(!strstarts(sd->msgname(fromwire_peektype(msg_out)), "INVALID"));
Fixes: #821
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also fold opening_got_hsm_funding_sig() into the caller; it was
previously a callback before we decided to always use the HSM
synchronously.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we clear and recreate ltmp, we attach it to whatever logbook it's on.
This, of course, is fraught, since it may be freed.
We could make it NULL-parented, but that makes YA special-case to free
when we exit (we try to keep valgrind happy by freeing everything). So
since the first log_book is the permanent one attached to lightningd,
just keep that parent when we re-build it after use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because peer_failed would previously drop the connection, we had a
special 'negotiation_failed' message which made the master hand it
back to gossipd. We don't need that any more.
This also meant we no longer need a special hook in read_peer_msg
for openingd to send this message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers. The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.
We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction.
We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And now we can finally do the db upgrade to remove any OPENINGD
channels once, since we never put them back.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's giant, but it's encapsulating at least. It is called from the wallet
code when loading channels, or from the opening code when converting
an uncommitted_channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now any struct channel is a genuine channel, the following fields are
always valid:
1. funding_txid: doesn't need to be a pointer.
2. our_msatoshi: doesn't need to be a pointer.
3. last_sig: doesn't need to be a pointer.
4. channel_info: doesn't need to be a pointer.
In addition, 'last_tx' is always valid.
The main effect is to remove a whole heap of branches from the wallet code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each peer can have one 'uncommitted' channel, which is in the process
of opening. This is used for openingd, and then on return we convert
it into a full-fledged struct channel and commit it into the database.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means the caller needs to supply an explicit log to base the
subd log on, and also a callback for error handling.
The callback is kind of ugly, but it gets reworked towards the end
of this series.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we rely on the logbook outlasting the peer, we can't refer to the
peer from the logbook function:
Valgrind error file: valgrind-errors.26567
==26567== Invalid read of size 8
==26567== at 0x126297: copy_to_parent_log (peer_control.c:690)
==26567== by 0x11C06B: maybe_print (log.c:253)
==26567== by 0x11C145: logv (log.c:270)
==26567== by 0x11C448: log_ (log.c:319)
==26567== by 0x132951: destroy_subd (subd.c:537)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A120: del_tree (tal.c:410)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x16DEB5: io_close (io.c:443)
==26567== by 0x1328BC: sd_msg_read (subd.c:516)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Address 0x6cf9ca0 is 48 bytes inside a block of size 216 free'd
==26567== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x17A1A9: del_tree (tal.c:421)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x124B6C: delete_peer (peer_control.c:180)
==26567== by 0x12B369: destroy_uncommitted_channel (peer_control.c:2505)
==26567== by 0x179C19: notify (tal.c:240)
==26567== by 0x17A0CE: del_tree (tal.c:400)
==26567== by 0x17A4ED: tal_free (tal.c:509)
==26567== by 0x12B31E: opening_channel_errmsg (peer_control.c:2496)
==26567== by 0x13243A: handle_peer_error (subd.c:407)
==26567== by 0x1326E4: sd_msg_read (subd.c:472)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== Block was alloc'd at
==26567== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26567== by 0x179C83: allocate (tal.c:250)
==26567== by 0x17A250: tal_alloc_ (tal.c:448)
==26567== by 0x124950: new_peer (peer_control.c:151)
==26567== by 0x12B3EC: new_uncommitted_channel (peer_control.c:2521)
==26567== by 0x12B5C5: peer_accept_channel (peer_control.c:2569)
==26567== by 0x126099: peer_sent_nongossip (peer_control.c:641)
==26567== by 0x113B28: peer_nongossip (gossip_control.c:55)
==26567== by 0x113D9D: gossip_msg (gossip_control.c:144)
==26567== by 0x132783: sd_msg_read (subd.c:487)
==26567== by 0x1320AC: read_fds (subd.c:328)
==26567== by 0x16D1FE: next_plan (io.c:59)
==26567==
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
BackgroundL Each log has a log_book: many logs can share the same one,
as each one can have a separate prefix.
Testing tickled a bug at the end of this series, where subd was
logging to the peer's log_book on shutdown, but the peer was already
freed. We've already had issues with logging while lightningd is
shutting down.
There are times when reference counting really is the right answer,
this seems to be one of them: the 'struct log' share the 'struct
log_book' and the last 'struct log' cleans it up.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We derive the seed from this, so it needs to be unique, but using
rowid forced us to put the channel into the db early, before it
was ready.
Instead, use a counter to ensure uniqueness, initialized when we load
existing peers. This doesn't need to touch the database at all.
As we now have only two places where the channel is committed (the
funder and fundee paths), so we create a new explicit
'wallet_channel_insert()' function: 'wallet_channel_save()' now just
updates.
Note that this also fixes some weirdness in
wallet_channels_load_active: we strangely avoided loading channels in
CLOSINGD_COMPLETE (which fortunately was a transient state, so
unlikely anyone hit this). Note that since the lines above already
delete all the OPENINGD channels, we now simply load them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Adds a simple check that compares genesis-blockhashes from the
chainparams against the blockhash that the wallet was created
with. The wallet is network specific, so mixing is always a bad idea.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We now keep a list of commands for the jcon instead of a simple
'current' pointer: the assertions become a bit more complex, but
the rest is fairly mechanical.
Fixes: #1007
Reported-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do a complicated dance because we don't know the current block
height before setting up the topology.
If we're starting at a particular block, we want to go back 100 blocks
before that to cover any reorgs.
If we're not (fresh startup), we still want to go back 100 blocks
because we don't bother handling a reorg which removes all the blocks
we know.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With fallback depending on chainparams: this means the first upgrade
will be slow, but after that it'll be fast.
Fixes: #990
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We error out for all kinds of reasons early on (eg. bitcoind down),
and printing a backtrace for them is pretty confusing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Includes closing off stdout and stderr. We don't do it directly in the
arg parser, as we want to interact normally (eg with other errors) before
we turn off stdout/stderr.
Fixes: #986
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This interacts badly with --daemon (next patch) which then tries to
reap a child it didn't create, which took me a couple of hours to
figure out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once we read a command, we are supposed to io_wait until it finishes.
However, we are actually woken in two places: when it's complete
(which is correct), and when it's written out (which is wrong).
We don't care when it's written out, only when it's finished:
refactor to make json_done() free and NULL the old ->current,
rather than have the callers do it. Now it's clear that it's
ready for both new output and new input.
Fixes: #934
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We will have probably failed the others, but either way, don't try to
fulfill an HTLC we've already failed.
Fixes: #394
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We usually did this, but sometimes they were named after what they did,
rather than what they cleaned up.
There are still a few exceptions:
1. I didn't bother creating destroy_xxx wrappers for htable routines
which already existed.
2. Sometimes destructors really are used for side-effects (eg. to simply
mark that something was freed): these are clearer with boutique names.
3. Generally destructors are static, but they don't need to be: in some
cases we attach a destructor then remove it later, or only attach
to *some* cases. These are best with qualifiers in the destroy_<type>
name.
Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This provides a sanity check that we are in sync, and also keeps the
logic in the program and out of the SQL.
Since the destructor now doesn't clean up the peer, there are some
wider changes to be made when cleaning up. Most notably we create
lots of channels in run-wallet.c and they previously freed the peer:
now we need free the peer explicitly, so we need to free them first.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And return the correct error message for the channel they give, if
they try to re-establish on an error channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Channels are within the peer structure, but the peer is freed only
when the last channel is freed.
We also implement channel_set_owner() and make peer_set_owner() a temporary
wrapper.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Much like the database; peer contains id, address, channel contains
per-channel information. Where we create a channel, we always create
the peer too.
For the moment, peer->log and channel->log coexist side-by-side, to
reduce some of the churn.
Note that this changes the API to dev-forget-channel: if we have more
than one channel, we insist they specify the short-channel-id.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is not connected yet; during the transition, there will be a 1:1
mapping from channel to peer, so we can use channel2peer and peer2channel
to shim between them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Both when we forget about an opening peer, and at startup. We're
going to be relying on this, and the next patch, as we refactor
peer/channel handling to mirror the db.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Combining the two was just awkward, so it's clearer to have separate
functions. And we make the lower-level functions do the escaping.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The JSON-RPC spec specifies that if the request is unparseable we
should return an error with a NULL id. This is a bit more friendly
than slamming the door in the face.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
As reported by @practicalswift in #945 it is possible to inject
non-printable, or shell escape, characters in a json command, that
will fail to parse and then clear the shell.
Reported-by: @practicalswift
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Now we have wirestring, this is much more natural. And with the
24M length limit, we needn't be so concerned about dumping 64k peer
messages in hex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are now logically arrays of pointers. This is much more natural,
and gets rid of the horrible utxo array converters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
`activate_peer` does little more than wiring up some txwatches and
asking `gossipd` to reconnect to the peer. If the peer manages to
reconnect before we activate then we would crash.
This just changes the `assert` causing the crash into a conditional
whether we need to reconnect or not.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Due to the broadcast failure quite a few users are reporting channels
stuck in awaiting lockin. This commit adds a `dev-forget-channel`
command that checks whether the funding outpoint is in the UTXO, and
forgets the channel if not. The UTXO check can be overridden with the
`force` parameter, but that is dangerous.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were sideloading it, which is awkward, now it's a field that we can
actually use in the code.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We currently don't handle LOG_IO properly, and we turn it into a string
before handing it to the ->print function, which makes it ugly for
the case where we're using copy_to_parent_log, and also means in
that case we lose *what peer* the IO is coming from.
Now, we handle the io as a separate arg, which is much neater.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Logging often gets called in error paths, so this is just good hygiene.
Also, log_io does this already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
libunwind does not accept a NULL parameter for the error callback. It
will simply call into the NULL pointer. So add an error callback.
This makes the crash output somewhat more sensible on FreeBSD, where
there is no libunwind stack trace available:
2018-02-05T20:24:50.598Z lightningd(75556): error getting backtrace: no stack trace because unwind library not available (0)
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Maintaining it was always fraught, since the command could go away
if the JSON RPC died. Most recently, it was broken again on shutdown
(see below).
In future we may allow pay commands to block on previous payments, so
it won't even be a 1:1 mapping. Generalize it: keep commands in a
simple list and do a lookup when a payment fails/succeeds.
Valgrind error file: valgrind-errors.5732
==5732== Invalid read of size 8
==5732== at 0x4149FD: remove_cmd_from_hout (pay.c:292)
==5732== by 0x468BAB: notify (tal.c:237)
==5732== by 0x469077: del_tree (tal.c:400)
==5732== by 0x4690C7: del_tree (tal.c:410)
==5732== by 0x46948A: tal_free (tal.c:509)
==5732== by 0x40F1EA: main (lightningd.c:362)
==5732== Address 0x69df148 is 1,512 bytes inside a block of size 1,544 free'd
==5732== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5732== by 0x469150: del_tree (tal.c:421)
==5732== by 0x46948A: tal_free (tal.c:509)
==5732== by 0x4198F2: free_htlcs (peer_control.c:1281)
==5732== by 0x40EBA9: shutdown_subdaemons (lightningd.c:209)
==5732== by 0x40F1DE: main (lightningd.c:360)
==5732== Block was alloc'd at
==5732== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5732== by 0x468C30: allocate (tal.c:250)
==5732== by 0x4691F7: tal_alloc_ (tal.c:448)
==5732== by 0x40A279: new_htlc_out (htlc_end.c:143)
==5732== by 0x41FD64: send_htlc_out (peer_htlcs.c:397)
==5732== by 0x41511C: send_payment (pay.c:388)
==5732== by 0x41589E: json_sendpay (pay.c:513)
==5732== by 0x40D9B1: parse_request (jsonrpc.c:600)
==5732== by 0x40DCAC: read_json (jsonrpc.c:667)
==5732== by 0x45C706: next_plan (io.c:59)
==5732== by 0x45D1DD: do_plan (io.c:387)
==5732== by 0x45D21B: io_ready (io.c:397)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a transitional patch so we can still close channels cleanly;
for want of a better option, I hooked it into --deprecated-apis.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We shouldn't fail negotiation just because they exceeded what we thought
fair: we're better off as long as it's actually <= final commitment fee.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>