Since we want to support AMP payment using a different unique payment
identifier (AMP payments don't go to one specific hash), we change the
nomenclature to be Identifier instead of PaymentHash.
We'll let the payment's lifecycle register each shard it's sending with
the ShardTracker, canceling failed shards. This will be the foundation
for correct AMP derivation for each shard we'll send.
We'll use this to keep track of the outstanding shards and which
preimages we are using for each. For now this is a simple map from
attempt ID to hash, but later we'll hide the AMP child derivation behind
this interface.
To distinguish the attempt's unique ID from the overall payment
identifier, we name it attemptID everywhere, and note that the
paymentHash argument won't be the actual payment hash for AMP payments.
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it would wait for our shards to
complete then cleanly exit.
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrTerminalPayment. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment, This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.
The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (eg,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next once.
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
In this commit, we add strict zombie pruning as a config level param.
This allow us to add the option for those that want a tighter graph, and
not change the default composition of the channel graph for most users
over night.
In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.
Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
Previously, we would always allow dependent jobs to be processed,
regardless of the result of its parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs to
run.
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initial
AssumeChannelValid properly.
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.
We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`, if that's bigger than
`MaxShardAmt, then we'll use the latter instead.
Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.
Fixes#4730
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to store a
single result, AddResult, to the store.
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
* mod: bump btcwallet version to accept db timeout
* btcwallet: add DBTimeOut in config
* kvdb: add database timeout option for bbolt
This commit adds a DBTimeout option in bbolt config. The relevant
functions walletdb.Open/Create are updated to use this config. In
addition, the bolt compacter also applies the new timeout option.
* channeldb: add DBTimeout in db options
This commit adds the DBTimeout option for channeldb. A new unit
test file is created to test the default options. In addition,
the params used in kvdb.Create inside channeldb_test is updated
with a DefaultDBTimeout value.
* contractcourt+routing: use DBTimeout in kvdb
This commit touches multiple test files in contractcourt and routing.
The call of function kvdb.Create and kvdb.Open are now updated with
the new param DBTimeout, using the default value kvdb.DefaultDBTimeout.
* lncfg: add DBTimeout option in db config
The DBTimeout option is added to db config. A new unit test is
added to check the default DB config is created as expected.
* migration: add DBTimeout param in kvdb.Create/kvdb.Open
* keychain: update tests to use DBTimeout param
* htlcswitch+chainreg: add DBTimeout option
* macaroons: support DBTimeout config in creation
This commit adds the DBTimeout during the creation of macaroons.db.
The usage of kvdb.Create and kvdb.Open in its tests are updated with
a timeout value using kvdb.DefaultDBTimeout.
* walletunlocker: add dbTimeout option in UnlockerService
This commit adds a new param, dbTimeout, during the creation of
UnlockerService. This param is then passed to wallet.NewLoader
inside various service calls, specifying a timeout value to be
used when opening the bbolt. In addition, the macaroonService
is also called with this dbTimeout param.
* watchtower/wtdb: add dbTimeout param during creation
This commit adds the dbTimeout param for the creation of both
watchtower.db and wtclient.db.
* multi: add db timeout param for walletdb.Create
This commit adds the db timeout param for the function call
walletdb.Create. It touches only the test files found in chainntnfs,
lnwallet, and routing.
* lnd: pass DBTimeout config to relevant services
This commit enables lnd to pass the DBTimeout config to the following
services/config/functions,
- chainControlConfig
- walletunlocker
- wallet.NewLoader
- macaroons
- watchtower
In addition, the usage of wallet.Create is updated too.
* sample-config: add dbtimeout option
In this commit, we extend the `BuildRoute` method and RPC on the router
sub-server to accept a raw payment address which will be included as
part of an MPP payload for the finla hop. This change actually also
allows users to craft their own MPP paths using BuildRoute+SendToRoute.
Our primary goal however, was to fix some broken itests since we now
require the payAddr to be present for ALL payments other than key send
payments.
This allows for a 1000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
Similarly as with kvdb.View this commits adds a reset closure to the
kvdb.Update call in order to be able to reset external state if the
underlying db backend needs to retry the transaction.
This commit adds a reset() closure to the kvdb.View function which will
be called before each retry (including the first) of the view
transaction. The reset() closure can be used to reset external state
(eg slices or maps) where the view closure puts intermediate results.
This commit clamps all user-chosen CLTVs in LND to be at least 18, which
is the new conservative value used in the sepc. This minimum is applied
uniformly to forwarding CLTV deltas (via channel updates) as well as
final CLTV deltas for new invoices.
This commit reverts cb4cd49dc8 to bring
back the insufficient local balance failure.
Distinguishing betweeen this failure and a regular "no route" failure
prevents meaningless htlcs from being sent out.
It can happen that the receiver times out some htlcs of the set if it
took to long to complete. But because the sender's mission control is
now updated, it is worth to keep trying to send those shards again.
Modifies the payment session to launch additional pathfinding attempts
for lower amounts. If a single shot payment isn't possible, the goal is
to try to complete the payment using multiple htlcs. In previous
commits, the payment lifecycle has been prepared to deal with
partial-amount routes returned from the payment session. It will query
for additional shards if needed.
Additionally a new rpc payment parameter is added that controls the
maximum number of shards that will be used for the payment.
With mpp it isn't possible anymore for findPath to determine that there
isn't enough local bandwidth. The full payment amount isn't known at
that point.
In a follow-up, this payment outcome can be reintroduced on a higher
level (payment lifecycle).
This commit fixes the inconsistency between the payment state as
reported by routerrpc.SendPayment/routerrpc.TrackPayment and the main
rpc ListPayments call.
In addition to that, payment state changes are now sent out for every
state change. This opens the door to user interfaces giving more
feedback to the user about the payment process. This is especially
interesting for multi-part payments.
We whitelist a set of "expected" errors that can be returned from
RequestRoute, by converting them into a new type noRouteError. For any
other error returned by RequestRoute, we'll now exit immediately.
We add validation making sure we are not trying to register MPP shards
for non-MPP payments, and vice versa. We also add validtion of total
sent amount against payment value, and matching MPP options.
We also add methods for copying Route/Hop, since it is useful to use
for modifying the route amount in the test.
This commit enables MPP sends for SendToRoute, by allowing launching
another payment attempt if the hash is already registered with the
ControlTower.
We also set the total payment amount of of the payment from mpp record,
to indicate that the shard value might be different from the total
payment value.
We only mark non-MPP payments as failed in the database after
encountering a failure, since we might want to try more shards for MPP.
For now this means that MPP sendToRoute payments will be failed
only after a restart has happened.
This commit finally enables MP payments within the payment lifecycle
(used for SendPayment). This is done by letting the loop launch shards
as long as there is value remaining to send, inspecting the outcomes for
the sent shards when the full payment amount has been filled.
The method channeldb.MPPayment.SentAmt() is added to easily look up how
much value we have sent for the payment.
(almost) PURE CODE MOVE
The only code change is to change a few select cases from
case _ <- channel:
to
case <- channel:
to please the linter.
The test is testing the payment lifecycle, so move it to
payment_lifecycle_test.go
This commit redefines how the control tower handles shard and payment
level settles and failures. We now consider the payment in flight as
long it has active shards, or it has no active shards but has not
reached a terminal condition (settle of one of the shards, or a payment
level failure has been encountered).
We also make it possible to settle/fail shards regardless of the payment
level status (since we must allow late shards recording their status
even though we have already settled/failed the payment).
Finally, we make it possible to Fail the payment when it is already
failed. This is to allow multiple concurrent shards that reach terminal
errors to mark the payment failed, without havinng to synchronize.
In preparation for MPP we return the terminal errors recorded with the
control tower. The reason is that we cannot return immediately when a
shard fails for MPP, since there might be more shards in flight that we
must wait for. For that reason we instead mark the payment failed in the
control tower, then return this error when we inspect the payment,
seeing it has been failed and there are no shards in flight.
To move towards how we will handle existing attempt in case of MPP
(collecting their outcome will be done in separate goroutines separate
from the payment loop), we move to collect their outcome first.
To easily fetch HTLCs that are still not resolved, we add the utility
method InFlightHTLCs to channeldb.MPPayment.
Now that SendToRoute is no longer using the payment lifecycle, we move
the max hop check out of the payment shard's launch() method, and return
the error directly, such that it can be handled in SendToRoute.
Now that SendToRoute is no longer using the payment lifecycle, we
remove the error structs and vars used to cache the last encountered
error. For SendToRoute this will now be returned directly after a shard
has failed.
For SendPayment this means that the last error encountered durinng
pathfinding no longer will be returned. All errors encounterd can
instead be inspected from the HTLC list.
Instead of having SendToRoute pull routes from the payment session in
the payment lifecycle, we utilize the new methods on the paymentShard to
launch and collect the result for this single route.
This also let us remove the check for noRouteError, as we will always
have the result from the tried attempt returned. A result of this is
that we can finally remove lastError from the payment lifecycle (see
next commits).
Fetching the final shard result will also be done for calls to
SendToRoute, so we extract this code into a new method.
We move the call to the ControlTower to set the payment level failure
out into the payment loop, as this must be handled differently when
multiple shards are in flight, and for SendToRoute.
Define shardHandler which is a struct holding what is needed to send
attempts along given routes. The reason we define the logic on this
struct instead of the paymentLifecycle is that we later will make
SendToRoute calls not go through the payment lifecycle, but only using
this struct.
The launch shard is responsible for registering the attempt with the
control tower, failing it if the launch fails. Note that it is NOT
responsible for marking the _payment_ failed in case a terminal error is
encountered. This is important since we will later reuse this method for
SendToRoute, where whether to fail the payment cannot be decided on the
shard level.
We replace the cached attempt, and instead use the control tower
(database) to fetch any in-flight attempt. This is done as a
preparation for having multiple attempts in flight.
In addition we remove the cached circuit, as it won't be applicable when
multiple shards are in flight.
Instead of tracking the attemp we consult the database on every
iteration, and pick up any existing attempt. This also let us avoid
having to pass in the existing attempts from the payment loop, as we
just fetch them direclty.
This method is used to fetch a payment and all HTLC attempt that have
been made for that payment. It will both be used to resume inflight
attempts, and to fetch the final outcome of previous attempts.
We also update the the mock control tower to mimic the real control
tower, by letting it track multiple HTLC attempts for a given payment
hash, laying the groundwork for later enabling it for MPP.
The test case's preimage was (mistakenly) overwritten after crafting the
lightning payment, causing the parts of the testcases use the same
preimage causing problems when we are using the payment hash and
preimage in the mock control tower to distinguish paymennts.
In our quest to move calls to the ControlTower into the main payment
lifecycle loop, we move the edge case of a too long route out of
createNewPaymentAttempt.
loop
To prepare for multiple in flight payment attempts, we move
checkpointing the payment attempt out of createNewPaymentAttempt and
into the main payment lifecycle loop.
We'll attempt to move all calls to the DB via the ControlTower into this
loop, so we can more easily handle them in sequence.
active shards
In preparation for doing pathfinding for routes sending a value less
than the total payment amount, we let the payment session take the max
amount to send and the fee limit as arguments to RequestRoute.
This commit moves supplying of the information in the LightningPayment
to the initialization of the paymentSession, away from every call to
RequestRoute.
Instead the paymentSession will store this information internally, as it
doesn't change between payment attempts.
This is done to rid the RequestRoute call of the LightingPayment
argument, as for SendToRoute calls, it is not needed to supply the next
route.
This commit extends the htlc fail info with the full failure reason that
was received over the wire. In a later commit, this info will also be
exposed on the rpc interface. Furthermore it serves as a building block
to make SendToRoute reliable across restarts.
This commit converts the database structure of a payment so that it can
not just store the last htlc attempt, but all attempts that have been
made. This is a preparation for mpp sending.
In addition to that, we now also persist the fail time of an htlc. In a
later commit, the full failure reason will be added as well.
A key change is made to the control tower interface. Previously the
control tower wasn't aware of individual htlc outcomes. The payment
remained in-flight with the latest attempt recorded, but an outcome was
only set when the payment finished. With this commit, the outcome of
every htlc is expected by the control tower and recorded in the
database.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
To better distinguish payments from HTLCs, we rename the attempt info
struct to HTLCAttemptInfo. We also embed it into the HTLCAttempt struct,
to avoid having to duplicate this information.
The paymentID term is renamed to attemptID.
Adds an integrated routing test of probability extrapolation for untried
channels. The larger part of this commit is mock code to simulate the
Lightning Network.
The difference between this test and the existing pathfinding tests, is that
this test focuses on the feedback loop from result interpretation via
mission control updates and probability estimation back to pathfinding.
Improvements like probability extrapolation were previously only
validated by reasoning, while this setup makes it possible to assert the
improvement in a test and guard it for the future.
Previously we only penalized the outgoing connections of a failing node.
This turned out not to be sufficient, because the next route sometimes
went into the same failing node again to try a different outgoing
connection that wasn't yet known to mission control and therefore not
penalized before.
This shortcut does not work when the destination is a private node. We
also don't have this shortcut for regular payments. This commit
aligns the behavior between SendPayment and QueryRoutes.
The default was increased for the main sendpayment RPC in commit
d3fa9767a9. This commit sets the
same default for QueryRoutes, routerrpc.SendPayment and
router.EstimateRouteFee.
Update the type check used for checking local payment
failures to check on the ClearTextError interface rather
than on the ForwardingError type. This change prepares
for splitting payment errors up into Link and Forwarding
errors.
This commit adds a ClearTextError interface
which is implemented by non-opaque errors that
we know the underlying wire failure message for.
This interface is implemented by ForwardingErrors,
because we can fully decrypt the onion blob to
obtain the underlying failure reason. This interface
will also be implemented by errors which originate
at our node in following commits, because we know
the failure reason when we fail the htlc.
The lnwire interface is un-embedded in the
ForwardingError struct in favour of implementing
this interface. This change is made to protect
against accidental passing of a ForwardingError
to the wire, where the embedded FailureMessage
interface will present as wire failure but
will not serialize properly.
Add a constructor for the creation of forwarding errors.
A special constructor is added for the case where we have
an unknown wire failure, and must set a nil failure message.
Modifies TestMissingFeatureDep and TestDestPaymentAddr to use the test
ctx directly instead of generating a closure and using local state to
modify restrictions.
This commit brings us inline with recent modifications to the spec, that
say we shouldn't pay nodes whose feature vectors signal unknown required
features, and also that we shouldn't route through nodes signaling
unknown required features.
Currently we assert that invoices don't have such features during
decoding, but now that users can specify feature vectors via the rpc
interface, it makes sense to perform this check deeper in call stack.
This will also allow us to remove the check from decoding entirely,
making decodepayreq more useful for debugging.
In this commit, we update the routing package to use the new
`sphinx.NewOnionPacket` method. The new version of this method allows us
to specify _how_ the packet should be filled before it's used to create
a mix-header. This isn't a fundamental change (totally backwards
compatible), instead it plugs a privacy leak that may have revealed to
the destination how long the true route was.
This commit adds success mission control
results for all hops along the route in
a mpp timeout and takes no action for
the final hop along the route. This is a
temporary measure to prevent the default
logic from penalizing the final node while
we decide how to penalize mpp timeouts.
This commit adds a getResolutionFailure function
which returns an appropriate wire failure based
on the outcome of a htlc resolution. It also updates
the MissionControlStore test to ensure that lnd
can handle failures which occur due to mpp timeout.
Also the max hop count check can be removed, because the real bound is
the payload size. By moving the check inside the search loop, we now
also backtrack when we hit the limit.
This commit fixes a potential bug in our test harness, by ensuring that
the constructed node policies are configured _after_ sorting. Currently
the node pubkeys are sorted, but additional parameters (max htlc,
disabled, etc) are applied using the unsorted policies.
Most of the constructors used today use the symmetric channel
constructor, so this shouldn't cause an issue with the majority of our
tests. We recently introduced an asymmetric channel constructor for
which this could have been an issue, however, no known issues were
discovered.
Lastly, we remove the direction from the configuration altogether, and
derive it purely from the final sorting of the pubkeys.
We move up the check for TLV support, since we will later use it to
determine if we can use dependent features, e.g. TLV records and payment
addresses.
This commit creates a wrapper struct, grouping all parameters that
influence the final hop during route construction. This is a preliminary
step for passing in the receiver's invoice feature bits, which will be
used to select an appropriate payment or payload type.
In this commit, we overwrite the final hop's features with either the
destination features or those loaded from the graph fallback. This
ensures that the same features used in pathfinding will be provided to
route construction.
In an earlier commit, we validated the final hop's transitive feature
dependencies, so we also add validation to non-final nodes.
This commit adds an optional PaymentAddr field to the RestrictParams, so
that we can verify the final hop can support it before doing an
expensive round of pathfindig.
In this commit, we fix a bug that prevents us from sending custom
records to nodes that aren't in the graph. Previously we would simply
fail if we were unable to retrieve the node's features.
To remedy, we add the option of supplying the destination's feature bits
into path finding. If present, we will use them directly without
consulting the graph, resolving the original issue. Instead, we will
only consult the graph as a fallback, which will still fail if the node
doesn't exist since the TLV features won't be populated in the empty
feature vector.
Furthermore, this also permits us to provide "virtual features" into the
pathfinding logic, where we make assumptions about what the receiver
supports even if the feature vector isn't actually taken from an
invoice. This can useful in cases like keysend, where we don't have an
invoice, but we can still attempt the payment if we assume the receiver
supports TLV.
This commit allows custom node features to be populated in specific test
instances. For consistency, we auto-populate an empty feature vector for
nodes that have nil feature vectors before writing them to the database.
Previously if a payment was sent with custom records attached, path
finding wouldn't perform a check whether the final node was capable of
receiving custom records in a tlv payload.
This commit prepares for more manipulation of custom records. A list of
tlv.Record types is more difficult to use than the more basic
map[uint64][]byte.
Furthermore fields and variables are renamed to make them more
consistent.