In order to maintain the original essence of the test, we need to clear
the state of missionControl with each attempt, essentially advancing
time between each payment attempt.
In this commit we modify the SendPayment loop to optimize for
time-to-first-payment-success-or-failure. The prior logic would first
attempt to find at least 100 routes to the destination, then
iteratively prune them away as errors were encountered. In this commit,
we modify this approach to instead take a lazy approach: we first find
the current “best” path, attempt to send to that, and if an error
occurs we prune a section of the graph by reporting to missionControl,
then continue.
With this new approach, if the first known path has sufficient
capacity, and is available, then the payment speed is greatly improved
from the PoV of users. Additionally, we avoid the excessive computation
of crawling most of the graph in the k-shortest paths loop. With the
decay on missionControl, all routes will now feed information into the
central knowledge hung, allowing all payments to iteratively find out
the inactive portions of the payment graph.
This commit adds a new system within the ChannelRouter: missionControl.
The purpose of this system to is to act as a shared memory of sorts
between payment sending attempts, recording which edges/vertexes word
or didn’t work. Allowing execution attempts to pass on their iterative
knowledge of the graph to later attempts will reduce the number of
failures encountered, and generally lead to a better UX when sending
payments.
The current capabilities of missionControl are rather limited just to
introduce the new abstraction. Later follow up commits will also add
preferential treatment for reliable nodes, knowledge the impact that
target payments have on unbalancing the payment graph, etc.
This commit fixes a bug that could lead to a deadlock inside bolt db
itself. In a recent commit we allowed a db transaction to be passed
directly into findPath, however, the initial call to graph.ForEachNode
instead passed a _nil_ transaction causing the method itself to create
a _new_ transaction, leading to a deadlock.
We fix this issue by instead re-using the transaction pointer.
This commit modifies the errors that we return within the
handleLocalDispatch method. Rather than returning a regular error, or
simply the matching error code in some instances, we now _always_
return an instance of ForwardingError. This will allow the router to
make more intelligent decisions w.r.t routing HTLC’s as with this
information it will now be able to differentiate errors that occur
within the switch (before sending out the HTLC), from errors that occur
within the HTLC route itself.
This commit adds a new field to the switch’s Config, namely the public
key of the backing lightning node. This field will soon be used to
return more detailed errors messages back to the ChannelRouter itself.
This commit adds a new field to the ForwardingError struct: ExtraMsg.
The purpose of this field is to allow the htlcswitch to tack on
additional error context to ForwardingError messages returned to the L3
router.
This commit fixes an existing bug within the iteration between the
queueHandler and the writeHandler. Under certain scenarios, if the
writeHandler was blocked for a non negligible period of time, then the
queueHandler would enter a very tight spinning loop. This was due to
the fact that the break statement in the inner select loop of the
queueHandler wouldn’t actually break the inner for loop, instead it
would cause the execution logic to re-enter that same select loop,
causing a very tight spin.
In this commit, we fix the issue by adding to things: we now label the
inner select loop so we can break out of it if we detect that the
writeHandler has blocked. Secondly, we introduce a new channel between
the queueHandler and the writeHandler to signal the queueHandler that
the writeHandler has finished processing the last message.
In this commit, we add an idle timer to the readHandler itself. This
will serve to slowly prune away inactive TCP connections as a result of
remote peer being blocked either upon reading or writing to the socket.
Our ping timer interval is 1 minute, so an idle timer interval of 5
minutes seem reasonable.
In this commit we add a new option to lnd, cpuprofile. With this option
we add additional telemetry hooks into and, allowing users to generate
CPU profiling files to measure hot spots within the daemon.
This commit removes another case of unnecessary blockage, by modifying
the sendToPeer method to be fully asynchronous. From the PoV of the
callers that utilize this method currently, there’s no reason to block
until the completion of this method. Additionally, as the graph grows
larger without more intelligent the number of messages sent during
initial dump will start to be prohibitive to waiting for full
completion before proceeding.
In this commit, we make the BroadcastMessage method on the server more
asynchronous by abandoning the two wait groups that it used for
synchronization. It has been observed that a circular waiting loop
between the AuthenticatedGossiper and a peer’s readHandler can cause
the system to dead lock.
By removing this unnecessary synchronization, we avoid the deadlock
case and allow the gossiper itself to no longer block in this scenario.
This commit modifies the path finding logic such that all path finding
is done inside a _single_ database transaction. With this change, we
ensure that we don’t end up possibly creating hundreds of database
transactions slowing down the path finding and payment sending process
all together.
This commit adds basic route pruning in response to HTLC onion errors.
With this new change, the router will now prune routes in response to
HTLC errors, which will reduce the time to payment success, and also
avoid a bunch of unnecessary network traffic.
We now respond to two errors lnwire.FailTemporaryChannelFailure and
lnwire.FailUnknownNextPeer. In response to the first error, we’ll prune
all routes that contain the channel which was unable to be routed over.
In response to the second error we’ll prune all routes that contain the
node which couldn’t be found.
In this commit we modify the newRoute function to also add the source
node to the nextHopMap index. With this addition the indexes will now
allow the router to react based on failures that occur during the
_first_ hop, meaning the channel directly attached to the source node.
This commit adds three new indexes to the Route struct. These indexes
allow a caller to check if a channel is in the route, check if a node
is in the route, query the next node after a target node, and query the
next channel after a target node. The combination of these new indexes
will allow the ChannelRouter to prune away routes from the available
set in response to any received errors.
This commit renames the Deobfuscator interface to ErrorDecrypter and
the Obfuscator interface to ErrorEncrypter. With this rename, the
purpose of these two interfaces are a bit clearer.
Additionally, DecryptError (which was formerly Deobfuscate) now
directly returns an ForwardingError type instead of the
lnwire.FailureMessage.
This commit introduces a new type to the package: ForwardingError. It
wraps an existing lnwire.FailureMessage interface, and also includes
the _source_ of the error message. By including the source of the
message, the router can now prune the set of available routes down in
order to reduce the number of subsequent failures based on the source
of the error and the type of the error itself.
In this commit we modify the main loop within the peerBootstrapper
slightly to check for a sufficient amount of connections, _before_
checking to see if we need to back off the main loop. With this, we
avoid unnecessarily backing off unless an actual error occurs.
This commit adds a tutorial on fuzzing with the go-fuzz library
into the docs folder. It includes an introduction to fuzzing,
setup and installation steps to run go-fuzz with lnd, tips to
generate a valid corpus for use with go-fuzz, and finally it
includes a small explanation of the test harness that was used
to find bugs in lnd.
This commit adds an additional return value to the updateChannel
method. We also now return the original ChannelAnnouncement, this can
be useful as it let’s the caller ensure that the channel announcement
will be broadcast along side the new channel update.
This commit does two things. First we fix a deadlock bug within boltdb
that could arise when the channel router attempted to open a
transaction, while the retransmission loop was attempting to send a
message to the channel router (circular wait). Second, we’ve refactored
out the retransmission into its own function. This allows us to kick
off retransmission as soon as the AuthenticatedGossiper is created.
This commit implements 2-week zombie channel pruning. This means that
every GraphPruneInterval (currently set to one hour), we’ll scan the
channel graph, marking any channels which haven’t had *both* edges
updated in 2 weeks as a “zombie”. During the second pass, all “zombie”
channel are removed from the channel graph all together.
Adding this functionality means we’ll ensure that we maintain a
“healthy” network view, which will cut down on the number of failed
HTLC routing attempts, and also reflect an active portion of the graph.