We'll let the payment's lifecycle register each shard it's sending with
the ShardTracker, canceling failed shards. This will be the foundation
for correct AMP derivation for each shard we'll send.
We'll use this to keep track of the outstanding shards and which
preimages we are using for each. For now this is a simple map from
attempt ID to hash, but later we'll hide the AMP child derivation behind
this interface.
To distinguish the attempt's unique ID from the overall payment
identifier, we name it attemptID everywhere, and note that the
paymentHash argument won't be the actual payment hash for AMP payments.
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it would wait for our shards to
complete then cleanly exit.
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrTerminalPayment. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment, This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.
The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (eg,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next once.
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
In this commit, we add strict zombie pruning as a config level param.
This allow us to add the option for those that want a tighter graph, and
not change the default composition of the channel graph for most users
over night.
In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.
Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
Previously, we would always allow dependent jobs to be processed,
regardless of the result of its parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs to
run.
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initial
AssumeChannelValid properly.
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.
We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`, if that's bigger than
`MaxShardAmt, then we'll use the latter instead.
Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.
Fixes#4730
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to store a
single result, AddResult, to the store.
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
* mod: bump btcwallet version to accept db timeout
* btcwallet: add DBTimeOut in config
* kvdb: add database timeout option for bbolt
This commit adds a DBTimeout option in bbolt config. The relevant
functions walletdb.Open/Create are updated to use this config. In
addition, the bolt compacter also applies the new timeout option.
* channeldb: add DBTimeout in db options
This commit adds the DBTimeout option for channeldb. A new unit
test file is created to test the default options. In addition,
the params used in kvdb.Create inside channeldb_test is updated
with a DefaultDBTimeout value.
* contractcourt+routing: use DBTimeout in kvdb
This commit touches multiple test files in contractcourt and routing.
The call of function kvdb.Create and kvdb.Open are now updated with
the new param DBTimeout, using the default value kvdb.DefaultDBTimeout.
* lncfg: add DBTimeout option in db config
The DBTimeout option is added to db config. A new unit test is
added to check the default DB config is created as expected.
* migration: add DBTimeout param in kvdb.Create/kvdb.Open
* keychain: update tests to use DBTimeout param
* htlcswitch+chainreg: add DBTimeout option
* macaroons: support DBTimeout config in creation
This commit adds the DBTimeout during the creation of macaroons.db.
The usage of kvdb.Create and kvdb.Open in its tests are updated with
a timeout value using kvdb.DefaultDBTimeout.
* walletunlocker: add dbTimeout option in UnlockerService
This commit adds a new param, dbTimeout, during the creation of
UnlockerService. This param is then passed to wallet.NewLoader
inside various service calls, specifying a timeout value to be
used when opening the bbolt. In addition, the macaroonService
is also called with this dbTimeout param.
* watchtower/wtdb: add dbTimeout param during creation
This commit adds the dbTimeout param for the creation of both
watchtower.db and wtclient.db.
* multi: add db timeout param for walletdb.Create
This commit adds the db timeout param for the function call
walletdb.Create. It touches only the test files found in chainntnfs,
lnwallet, and routing.
* lnd: pass DBTimeout config to relevant services
This commit enables lnd to pass the DBTimeout config to the following
services/config/functions,
- chainControlConfig
- walletunlocker
- wallet.NewLoader
- macaroons
- watchtower
In addition, the usage of wallet.Create is updated too.
* sample-config: add dbtimeout option
In this commit, we extend the `BuildRoute` method and RPC on the router
sub-server to accept a raw payment address which will be included as
part of an MPP payload for the finla hop. This change actually also
allows users to craft their own MPP paths using BuildRoute+SendToRoute.
Our primary goal however, was to fix some broken itests since we now
require the payAddr to be present for ALL payments other than key send
payments.
This allows for a 1000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
Similarly as with kvdb.View this commits adds a reset closure to the
kvdb.Update call in order to be able to reset external state if the
underlying db backend needs to retry the transaction.
This commit adds a reset() closure to the kvdb.View function which will
be called before each retry (including the first) of the view
transaction. The reset() closure can be used to reset external state
(eg slices or maps) where the view closure puts intermediate results.
This commit clamps all user-chosen CLTVs in LND to be at least 18, which
is the new conservative value used in the sepc. This minimum is applied
uniformly to forwarding CLTV deltas (via channel updates) as well as
final CLTV deltas for new invoices.
This commit reverts cb4cd49dc8 to bring
back the insufficient local balance failure.
Distinguishing betweeen this failure and a regular "no route" failure
prevents meaningless htlcs from being sent out.
It can happen that the receiver times out some htlcs of the set if it
took to long to complete. But because the sender's mission control is
now updated, it is worth to keep trying to send those shards again.
Modifies the payment session to launch additional pathfinding attempts
for lower amounts. If a single shot payment isn't possible, the goal is
to try to complete the payment using multiple htlcs. In previous
commits, the payment lifecycle has been prepared to deal with
partial-amount routes returned from the payment session. It will query
for additional shards if needed.
Additionally a new rpc payment parameter is added that controls the
maximum number of shards that will be used for the payment.
With mpp it isn't possible anymore for findPath to determine that there
isn't enough local bandwidth. The full payment amount isn't known at
that point.
In a follow-up, this payment outcome can be reintroduced on a higher
level (payment lifecycle).
This commit fixes the inconsistency between the payment state as
reported by routerrpc.SendPayment/routerrpc.TrackPayment and the main
rpc ListPayments call.
In addition to that, payment state changes are now sent out for every
state change. This opens the door to user interfaces giving more
feedback to the user about the payment process. This is especially
interesting for multi-part payments.
We whitelist a set of "expected" errors that can be returned from
RequestRoute, by converting them into a new type noRouteError. For any
other error returned by RequestRoute, we'll now exit immediately.