Commit Graph

11629 Commits

Author SHA1 Message Date
Rusty Russell
35d7449259 connectd: initialize peer->conn.
It's only used in one place, but that's enough.

Fixes: #1434
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell
f8aed1b4b0 pytest: add reconnection stress test.
It sometimes triggers a crash like #1434 (though never under valgrind).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell
fefb7faba7 pytest: try a simple reconnection test.
This passes, but that's OK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Cryptcoin Junkey
16190805cc Use port 80 instead of 11371 for accessing the keyserver.
Some firewalls may block port 11371.
2018-08-10 15:45:15 +02:00
Rusty Russell
9d8b3a070b pytest: make test_htlc_send_timeout more reliable.
Waiting for three node_announcments isn't always enough, since l2 can
publish two of them (an independent bug).  Do the more Right Thing and
just wait for 30 seconds of no input...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
65c882ca3a Minor cleanups.
1. connect convenience variable for improved readabilty.
2. a comment explaining that timer is on channel, not HTLC.
3. use modern python style in test_htlc_send_timeout

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
63e4ea17af channeld: don't commit until we've seen recent incoming msg, ping if required.
Now sending a ping makes sense: it should force the other end to send
a reply, unblocking the commitment process.

Note that rather than waiting for a reply, we're actually spinning on
a 100ms loop in this case.  But it's simple and it works.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
93e445daf5 channeld: send our own pings whenever we indicate we want to send a commitment.
This doesn't do much (though we might get an error before we send the
commitment_signed), but it's infrastructure for the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
223cd97c94 lightningd: kill channeld if we added an HTLC and it didn't commit in 30 seconds.
This effectively constrains how long we'll delay an outgoing HTLC.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
ebd229fb37 pytest: add test for HTLC timeout when a node is unreachable (xfail).
Currently, if we don't realize a TCP connection is down, we almost
certainly don't find out until *after* we're sent the
commitment_signed message, in which case we cannot fail the incoming
HTLC.

This test demonstrates that.  Note the 30 second sleep: we should really
run Travis tests in parallel!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
86a46cb1d4 channeld: push TCP output on commitment and revocation messages.
These are the really time-critical ones.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
db22d2366e crypto_sync: sync_crypto_write_no_delay to flush TCP after critical packets.
Avoid that 200ms loss.  We don't want to disable nagle generally,
since it's great for gossip and other traffic; we just want to push at
critical times.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
71575b2115 ping: no longer a dev_ command.
Fixes: #1407
Suggested-by: conanoc@gmail.com
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
be7a27a765 connect: randomize backoff a little.
Since we now fixed the bug where nodes receiving a connection would
try to reconnect to the source IP/port of that connection, we now expose
an issue mentioned by other implementers: we can continually cross over
reconnections unless we add some fuzz.  One second should be sufficient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
4f1186c4b1 connectd: iterate through all known addresses for a peer, not just one.
If we have an address hint, we start with that, but we'll use
node_announcement information if required.

Note: we (ab)use the address hint when restoring from the database
or reconnecting, even if the connection was *incoming*.  That meant
that the recipient of a connection would *never* manage to connect out.

We still don't take multiple addresses from the DNS seeds: I assume we
should, since there could be IPv4 and IPv6.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
74568a1c50 lightningd: peer_start_channeld always returns true; make it void.
It is always true, and we always ignore it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell
7c856470e2 wallet: add buildtime and runtime assertions on db enums.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:44:28 +02:00
Rusty Russell
1dde233a48 wallet: put explicit hook in for fatal error testing.
We're currently overriding fatal() with something that actually
returns, which contrasts with its declaration as NORETURN.

This breaks in the next patch which wants a real fatal() in wallet.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:44:28 +02:00
Rene Pickhardt
8b902729b9 added a test case so that the enum wallet_payment_status which is critical for the communitcation with the data store will not accidentely being changed by some developer
[ Folded "cleaned up the test code" -- RR ]
[ Fixed prototype -- RR ]
2018-08-10 12:44:28 +02:00
Mark Beckwith
bd5bf1f168 Enhanced param parsing
[ Squashed into single commit --RR ]

This adds two new macros, `p_req_tal()` and `p_opt_tal()`. These support
callbacks that take a `struct command *` context.  Example:

	static bool json_tok_label_x(struct command *cmd,
                                      const char *name,
				      const char *buffer,
				      const jsmntok_t *tok,
				      struct json_escaped **label)

The above is taken from the run-param unit test (near the bottom of the diff).
The return value is true on success, or false (and it calls command_fail itself).

We can pretty much remove all remaining usage of `json_tok_tok` in the codebase
with this type of callback.
2018-08-10 02:15:30 +00:00
Christian Decker
d124f22421 travis: Actually build with clang if we want to 2018-08-10 01:31:24 +00:00
Rusty Russell
51c65ebc42 CHANGELOG: add openingd change.
So much code for so little noticable difference, but mention that
`state` (an accidental legacy from before we moved the field into
`channels` long ago) no longer exists.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
b4e6a0fcad peer_failed: write error message to peer directly.
We currently hand the error back to the master, who then stores it for
future connections and hands it back to another openingd to send and exit.

Just send directly; it's more reliable and simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
d8d4b19f3a connectd: remove separate address hint message.
Include it as an optional field in the connect_to_peer message (it was
added before we had optional fields).

The only issue is that reconnects want it too, so again connectd hands
it back to master in connectctl_connect_failed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
8939a5001b connectd: rely on the master to tell us to reconnect.
connectd tells master about every disconnection, and master knows
whether it's important to reconnect.  Just get the master to invoke a new
connect command if it considers the peer important!

The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect.  To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
30f08cc2b0 connectd: always tell master when connection fails/succeeded.
We used to separate implicit connection requests (ie. timed retries
for important peers) and explicit ones, and send a
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT for the latter.

In the success case, that's now redundant, since we hand the connected
peer to the master using WIRE_CONNECT_PEER_CONNECTED; we just need a
message for the failure case.  And we might as well tell the master
every failure, so we don't have to distinguish internally.

This also solves a race we had before: connectd would send
WIRE_CONNECTCTL_CONNECT_TO_PEER_RESULT which completes the incoming
JSON connect command, then send WIRE_CONNECT_PEER_CONNECTED.  So
there's a window where the JSON command can return, but the peer isn't
known to lightningd yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
684d60dbda lightningd: don't call connectd if we already know about peer.
The semantic here are that we 'succeed' if we're already connected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
8b5c80f42a opening_control.c: make sure we always clean up in error cases.
Especially by closing the file descriptors we were handed!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
035362e151 openingd: don't exit when we receive an error.
In particular, all opening_read_peer_msg() callers need to know there
was an error (presumably, negotiating) so they can stop, but we should
not exit.

This lets us reenable the final disabled test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
174c79acad openingd: tell master if funding failed, but don't exit.
We don't want to exit just because channel parameter negotiation
failed, but we do want to tell the master if it was a channel we were
trying to fund.

Note that lightningd still needs to fail the funding cmd if it gets a
fromwire_opening_fundee (they raced us and won), or an outright
failure.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
b39ee8bef5 openingd: make remoteconf a non-pointer member.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
909cd4136b openingd: get told if we can't let them open a new channel.
Previously master would fail once the channel has been negotiated,
which is terrible, since the funder will have already broadcast tx.

Now we tell them if we have an active channel, and update if it goes away.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
9ad2b26224 connectd: remove 'local_peer_state'
We now only have peers during the init handshake, so they're all 'local'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
5624afc340 connectd: clean up unused structure fields.
They can be local variables.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
02966a4857 connectd: remove unused handback APIs and code.
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.

lightningd no longer queries us for getpeers: it knows more than we do
already.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
e59cbb3e2c pytest: make sure receiving peer's openingd is ready.
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.

(Previously the connection stayed in connectd, so there was no such
window).

This is an occasional issue in a few places.

Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
50f5eb34b4 openingd: take peer before we're opening, wait for explicit funding msg.
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.

Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.

This is a deep structural change, so we do the minimum here and cleanup
in the following patches.

Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
   find the race between us asking connectd to release the peer and it
   reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
   channel (until it has succeeded): it's very transitory anyway so not
   worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
   for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
   or a fundee, or an error in response to our request to fund a channel.
   so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
   hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.

Openingd:
1. We always have information we need to accept them funding a channel (in
   the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
   the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.

Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
   openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
   exist; 'state' is now per-channel.  It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
   the peer hand over the connection to openingd.  Our tests sometimes
   checked getpeers() on the peer, and didn't see anything, so line_graph
   needed updating.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
5cd72c9620 connectd: explicitly log whether connection is IN or OUT.
Useful for debugging: it wasn't immediately obvious from the logs
which side was spuriously reconnecting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
5135d3ad7d pytest: make sure we truncate disconnect file for blackhole processes.
In particular, I found lightning_openingd processes after running
tests.  When we use the dev_disconnect blackhole '0' option, they
stick around until the dev_disconnect file is truncated (there is only
so much you can do with only a file descriptor), so let's do that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
8668b0028e pytest: make sure both sides of channel are ready before returning from fund_channel
The following changes revealed this race, where expecting listchannels()
to contain two channels immediately after fund_channel() was racy.

We also derive the short_channel_id first, so we can search logs for the
exact messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell
329270525c pytest: only use dev-allow-localhost when needed.
The next patches get better at reconecting, so if we use dev-allow-localhost
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Ronald Mannak
e47ac33a37 Fix readme docker-compose port setting 2018-08-09 13:13:38 +02:00
Rusty Russell
fedcfd661f pytest: hand 'True' to decoderawtransaction so it doesn't get confused.
This explains the very-very occasional issue we had parsing (hence the
random-looking fixes!).  The decoderawtransaction heuristic sometimes
thinks it's a zero-input tx, not a segwit tx.  Setting 'iswitness' to
true makes it reliable.

Here's the example I finally caught:

```
rusty$ bitcoin-cli decoderawtransaction  0200000000010180f80017ceb208d84cd5be0d4e21c1acb91798c55ada6541f7633a2739453b4e0100000000ffffffff0269300f0000000000160014774b1c651a1b409213057783547e2bd37a71731240420f00000000002200205f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e02483045022100ba65a905cf4ebbb9728dc682fcf17cb73ade0ca224729a1878f689a8afa9a49e02206f323f224c5171d170aafb8ff57e2761411a27dea304ac8f9a3663c456d21f3e012102225ce166e84b3833d9f620863b4e713099de616f559e8768f44ff674054bb07d00000000
{
  "txid": "3b8b78c18d30036f93b10a67eb8731325927fb046be969d24075e5b2e1e66e07",
  "hash": "3b8b78c18d30036f93b10a67eb8731325927fb046be969d24075e5b2e1e66e07",
  "version": 2,
  "size": 235,
  "vsize": 235,
  "locktime": 0,
  "vin": [
  ],
  "vout": [
    {
      "value": 6267898963.53775617,
      "n": 0,
      "scriptPubKey": {
        "asm": "be0d4e21c1acb91798c55ada6541f7633a2739453b4e0100000000ffffffff0269300f0000000000160014774b1c651a1b409213057783547e2bd37a71731240420f00000000002200205f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e02483045022100ba65a905cf4ebbb9728dc682fcf17cb73ade0ca224729a1878f689a8afa9a49e02206f323f224c5171d170aafb8ff57e2761411a27dea304ac8f9a3663c456d21f3e012102225ce166e84b3833d9f620863b4e713099de616f559e8768f44ff674054bb0 OP_TUCK",
        "hex": "4cd5be0d4e21c1acb91798c55ada6541f7633a2739453b4e0100000000ffffffff0269300f0000000000160014774b1c651a1b409213057783547e2bd37a71731240420f00000000002200205f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e02483045022100ba65a905cf4ebbb9728dc682fcf17cb73ade0ca224729a1878f689a8afa9a49e02206f323f224c5171d170aafb8ff57e2761411a27dea304ac8f9a3663c456d21f3e012102225ce166e84b3833d9f620863b4e713099de616f559e8768f44ff674054bb07d",
        "type": "nonstandard"
      }
    }
  ]
}
rusty$ bitcoin-cli decoderawtransaction  0200000000010180f80017ceb208d84cd5be0d4e21c1acb91798c55ada6541f7633a2739453b4e0100000000ffffffff0269300f0000000000160014774b1c651a1b409213057783547e2bd37a71731240420f00000000002200205f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e02483045022100ba65a905cf4ebbb9728dc682fcf17cb73ade0ca224729a1878f689a8afa9a49e02206f323f224c5171d170aafb8ff57e2761411a27dea304ac8f9a3663c456d21f3e012102225ce166e84b3833d9f620863b4e713099de616f559e8768f44ff674054bb07d00000000 true
{
  "txid": "d1f0e478ada951d4ee2d952a526a90cda181da6226980d69b345f644ed57a05d",
  "hash": "3b8b78c18d30036f93b10a67eb8731325927fb046be969d24075e5b2e1e66e07",
  "version": 2,
  "size": 235,
  "vsize": 153,
  "locktime": 0,
  "vin": [
    {
      "txid": "4e3b4539273a63f74165da5ac59817b9acc1214e0dbed54cd808b2ce1700f880",
      "vout": 1,
      "scriptSig": {
        "asm": "",
        "hex": ""
      },
      "txinwitness": [
        "3045022100ba65a905cf4ebbb9728dc682fcf17cb73ade0ca224729a1878f689a8afa9a49e02206f323f224c5171d170aafb8ff57e2761411a27dea304ac8f9a3663c456d21f3e01",
        "02225ce166e84b3833d9f620863b4e713099de616f559e8768f44ff674054bb07d"
      ],
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 0.00995433,
      "n": 0,
      "scriptPubKey": {
        "asm": "0 774b1c651a1b409213057783547e2bd37a717312",
        "hex": "0014774b1c651a1b409213057783547e2bd37a717312",
        "reqSigs": 1,
        "type": "witness_v0_keyhash",
        "addresses": [
          "bc1qwa93ceg6rdqfyyc9w7p4gl3t6da8zucjnugke0"
        ]
      }
    },
    {
      "value": 0.01000000,
      "n": 1,
      "scriptPubKey": {
        "asm": "0 5f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e",
        "hex": "00205f743123f9584a76058bac1142ec2bc6c60b4b2af1d3145e74418d41ae51009e",
        "reqSigs": 1,
        "type": "witness_v0_scripthash",
        "addresses": [
          "bc1qta6rzgletp98vpvt4sg59mptcmrqkje278f3ghn5gxx5rtj3qz0qgkydgs"
        ]
      }
    }
  ]
}

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 13:10:39 +02:00
Rusty Russell
58d090c3c2 pytest: fix flaky test.
Saw this in Travis: technically we return from the dev_set_max_scids...
cmd after sending it to gossipd, but we should wait for it to log.
Adding an internal reply message for a dev command seems overkill.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 12:39:59 +02:00
Rusty Russell
60b95d4764 pylightning: add method and payload as proper fields.
They're much more useful being programatically-accessible, AFAICT.

The string stays the same so they're backwards compatible.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-08 16:05:58 +02:00
Rusty Russell
d8a6028214 connectd: fix binding to same port on IPv4 and IPv6.
1. If the IPv6 address was public, that changed the wireaddr and thus the ipv4 bind
   would not be to a wildcard and would fail.
2. Binding two fds to the same port on both wildcard IPv4 and IPv6 succeeds; we only
   fail when we try to listen, so allow error at this point.

For some reason this triggered on my digital ocean machine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-08 15:04:06 +02:00
Rusty Russell
17f7f50814 wireaddr: correctly parse ':portnum' (meaning IPv4 and IPv6)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-08 15:04:06 +02:00
Corné Plooy
f1df9dfa4d lightning-cli: consider empty inputs to be strings 2018-08-07 13:47:07 +02:00
Christian Decker
2a14a98ead pytest: Migrate test_withdraw to use LightningNode.db_query
This was causing lock issues in some cases.
2018-08-07 00:54:19 +00:00
Christian Decker
90f74907f9 pytest: Use the file object and don't use print without line-endings 2018-08-07 00:54:19 +00:00