This commit renames the mock structs by appending Old in their names. In
doing so the old tests stay unchanged and new mock structs can be added
in the following commit.
This commit adds payment session to shardHandler to enable private edge
policies being updated in shardHandler. The relevant interface and mock
are updated. From now on, upon seeing a ChannelUpdate message,
shardHandler will first try to find the target policy in additionalEdges
and update it. If nothing found, it will then check the database for
edge policy to update.
This commit adds the method UpdateAdditionalEdge in PaymentSession,
which allows the addtional channel edge policy to be updated from a
ChannelUpdate message. Another method, GetAdditionalEdgePolicy is added
to allow querying additional edge policies.
This commit moves the handleSendError method from ChannelRouter to
shardHandler. In doing so, shardHandler can now apply updates to the
in-memory paymentSession if they are found in the error message.
The simulated error returned was rejected due to signature failure,
and didn't simulate correctly the insufficient fees error as
intended. Fix error by including correct signature.
With this patch, we'll fail out earlier in the cycle in case of
some wonky parameters, and not leave zombie payments in the router
which currently are not cleaned up.
It seems #5246 introduced a subtle bug that lead to the error "out of
order block: expecting height=1, got height=XXX" some times during
startup. Apparently it can happen that during pruning of the graph tip
some blocks can come in before we start our chain view and the new block
subscription. By querying the chain backend for the best height before
syncing with the graph we ensure that we never miss a block.
The router subsystem has its own goroutine that receives chain updates
and then does its (quite time consuming) work on each new block. To make
it possible to find out what block the router currently is synced to, we
export its internal best height through a new method.
In this commit we add a new error for when we fail to validate the
funding transaction (invalid script, etc) and mark it as a zombie like
the other failed validation cases.
In this commit, we start to add any channels that fail the normal chain
validation to the zombie index. With this change, we'll ensure that we
won't continue to re-process the same set of spent channels over and
over again.
Fixes#5191.
This ensures the waiting receiving channel always receives an error to
prevent a deadlock when processing a network update that fails due to
the validation barrier.
On commit d5aedbcbd9:
1000 @ 0x43a285 0x44a38f 0xc42e86 0xc80fda 0xc8682d 0xc976c9 0x46fce1
github.com/lightningnetwork/lnd/routing.(*ChannelRouter).AddNode+0x245 github.com/lightningnetwork/lnd/routing/router.go:2218
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).addNode+0x3b9 github.com/lightningnetwork/lnd/discovery/gossiper.go:1510
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).processNetworkAnnouncement+0x574c github.com/lightningnetwork/lnd/discovery/gossiper.go:1554
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).networkHandler.func1+0x24github.com/lightningnetwork/lnd/discovery/gossiper.go:1043
This commit adds the block cache to the CfFilteredChainView struct
and wraps its GetBlock function so that block cache mutex map is used
when the call to neutrino's GetBlock function is called.
Since we want to support AMP payment using a different unique payment
identifier (AMP payments don't go to one specific hash), we change the
nomenclature to be Identifier instead of PaymentHash.
We'll let the payment's lifecycle register each shard it's sending with
the ShardTracker, canceling failed shards. This will be the foundation
for correct AMP derivation for each shard we'll send.
We'll use this to keep track of the outstanding shards and which
preimages we are using for each. For now this is a simple map from
attempt ID to hash, but later we'll hide the AMP child derivation behind
this interface.
To distinguish the attempt's unique ID from the overall payment
identifier, we name it attemptID everywhere, and note that the
paymentHash argument won't be the actual payment hash for AMP payments.
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it would wait for our shards to
complete then cleanly exit.
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrTerminalPayment. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment, This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.
The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (eg,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next once.
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
In this commit, we add strict zombie pruning as a config level param.
This allow us to add the option for those that want a tighter graph, and
not change the default composition of the channel graph for most users
over night.
In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.
Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
Previously, we would always allow dependent jobs to be processed,
regardless of the result of its parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs to
run.
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initial
AssumeChannelValid properly.
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.
We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`, if that's bigger than
`MaxShardAmt, then we'll use the latter instead.
Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.
Fixes#4730
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to store a
single result, AddResult, to the store.
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
* mod: bump btcwallet version to accept db timeout
* btcwallet: add DBTimeOut in config
* kvdb: add database timeout option for bbolt
This commit adds a DBTimeout option in bbolt config. The relevant
functions walletdb.Open/Create are updated to use this config. In
addition, the bolt compacter also applies the new timeout option.
* channeldb: add DBTimeout in db options
This commit adds the DBTimeout option for channeldb. A new unit
test file is created to test the default options. In addition,
the params used in kvdb.Create inside channeldb_test is updated
with a DefaultDBTimeout value.
* contractcourt+routing: use DBTimeout in kvdb
This commit touches multiple test files in contractcourt and routing.
The call of function kvdb.Create and kvdb.Open are now updated with
the new param DBTimeout, using the default value kvdb.DefaultDBTimeout.
* lncfg: add DBTimeout option in db config
The DBTimeout option is added to db config. A new unit test is
added to check the default DB config is created as expected.
* migration: add DBTimeout param in kvdb.Create/kvdb.Open
* keychain: update tests to use DBTimeout param
* htlcswitch+chainreg: add DBTimeout option
* macaroons: support DBTimeout config in creation
This commit adds the DBTimeout during the creation of macaroons.db.
The usage of kvdb.Create and kvdb.Open in its tests are updated with
a timeout value using kvdb.DefaultDBTimeout.
* walletunlocker: add dbTimeout option in UnlockerService
This commit adds a new param, dbTimeout, during the creation of
UnlockerService. This param is then passed to wallet.NewLoader
inside various service calls, specifying a timeout value to be
used when opening the bbolt. In addition, the macaroonService
is also called with this dbTimeout param.
* watchtower/wtdb: add dbTimeout param during creation
This commit adds the dbTimeout param for the creation of both
watchtower.db and wtclient.db.
* multi: add db timeout param for walletdb.Create
This commit adds the db timeout param for the function call
walletdb.Create. It touches only the test files found in chainntnfs,
lnwallet, and routing.
* lnd: pass DBTimeout config to relevant services
This commit enables lnd to pass the DBTimeout config to the following
services/config/functions,
- chainControlConfig
- walletunlocker
- wallet.NewLoader
- macaroons
- watchtower
In addition, the usage of wallet.Create is updated too.
* sample-config: add dbtimeout option
In this commit, we extend the `BuildRoute` method and RPC on the router
sub-server to accept a raw payment address which will be included as
part of an MPP payload for the finla hop. This change actually also
allows users to craft their own MPP paths using BuildRoute+SendToRoute.
Our primary goal however, was to fix some broken itests since we now
require the payAddr to be present for ALL payments other than key send
payments.
This allows for a 1000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
Similarly as with kvdb.View this commits adds a reset closure to the
kvdb.Update call in order to be able to reset external state if the
underlying db backend needs to retry the transaction.
This commit adds a reset() closure to the kvdb.View function which will
be called before each retry (including the first) of the view
transaction. The reset() closure can be used to reset external state
(eg slices or maps) where the view closure puts intermediate results.
This commit clamps all user-chosen CLTVs in LND to be at least 18, which
is the new conservative value used in the sepc. This minimum is applied
uniformly to forwarding CLTV deltas (via channel updates) as well as
final CLTV deltas for new invoices.
This commit reverts cb4cd49dc8 to bring
back the insufficient local balance failure.
Distinguishing betweeen this failure and a regular "no route" failure
prevents meaningless htlcs from being sent out.
It can happen that the receiver times out some htlcs of the set if it
took to long to complete. But because the sender's mission control is
now updated, it is worth to keep trying to send those shards again.
Modifies the payment session to launch additional pathfinding attempts
for lower amounts. If a single shot payment isn't possible, the goal is
to try to complete the payment using multiple htlcs. In previous
commits, the payment lifecycle has been prepared to deal with
partial-amount routes returned from the payment session. It will query
for additional shards if needed.
Additionally a new rpc payment parameter is added that controls the
maximum number of shards that will be used for the payment.
With mpp it isn't possible anymore for findPath to determine that there
isn't enough local bandwidth. The full payment amount isn't known at
that point.
In a follow-up, this payment outcome can be reintroduced on a higher
level (payment lifecycle).
This commit fixes the inconsistency between the payment state as
reported by routerrpc.SendPayment/routerrpc.TrackPayment and the main
rpc ListPayments call.
In addition to that, payment state changes are now sent out for every
state change. This opens the door to user interfaces giving more
feedback to the user about the payment process. This is especially
interesting for multi-part payments.
We whitelist a set of "expected" errors that can be returned from
RequestRoute, by converting them into a new type noRouteError. For any
other error returned by RequestRoute, we'll now exit immediately.
We add validation making sure we are not trying to register MPP shards
for non-MPP payments, and vice versa. We also add validtion of total
sent amount against payment value, and matching MPP options.
We also add methods for copying Route/Hop, since it is useful to use
for modifying the route amount in the test.
This commit enables MPP sends for SendToRoute, by allowing launching
another payment attempt if the hash is already registered with the
ControlTower.
We also set the total payment amount of of the payment from mpp record,
to indicate that the shard value might be different from the total
payment value.
We only mark non-MPP payments as failed in the database after
encountering a failure, since we might want to try more shards for MPP.
For now this means that MPP sendToRoute payments will be failed
only after a restart has happened.
This commit finally enables MP payments within the payment lifecycle
(used for SendPayment). This is done by letting the loop launch shards
as long as there is value remaining to send, inspecting the outcomes for
the sent shards when the full payment amount has been filled.
The method channeldb.MPPayment.SentAmt() is added to easily look up how
much value we have sent for the payment.
(almost) PURE CODE MOVE
The only code change is to change a few select cases from
case _ <- channel:
to
case <- channel:
to please the linter.
The test is testing the payment lifecycle, so move it to
payment_lifecycle_test.go
This commit redefines how the control tower handles shard and payment
level settles and failures. We now consider the payment in flight as
long it has active shards, or it has no active shards but has not
reached a terminal condition (settle of one of the shards, or a payment
level failure has been encountered).
We also make it possible to settle/fail shards regardless of the payment
level status (since we must allow late shards recording their status
even though we have already settled/failed the payment).
Finally, we make it possible to Fail the payment when it is already
failed. This is to allow multiple concurrent shards that reach terminal
errors to mark the payment failed, without havinng to synchronize.
In preparation for MPP we return the terminal errors recorded with the
control tower. The reason is that we cannot return immediately when a
shard fails for MPP, since there might be more shards in flight that we
must wait for. For that reason we instead mark the payment failed in the
control tower, then return this error when we inspect the payment,
seeing it has been failed and there are no shards in flight.
To move towards how we will handle existing attempt in case of MPP
(collecting their outcome will be done in separate goroutines separate
from the payment loop), we move to collect their outcome first.
To easily fetch HTLCs that are still not resolved, we add the utility
method InFlightHTLCs to channeldb.MPPayment.
Now that SendToRoute is no longer using the payment lifecycle, we move
the max hop check out of the payment shard's launch() method, and return
the error directly, such that it can be handled in SendToRoute.
Now that SendToRoute is no longer using the payment lifecycle, we
remove the error structs and vars used to cache the last encountered
error. For SendToRoute this will now be returned directly after a shard
has failed.
For SendPayment this means that the last error encountered durinng
pathfinding no longer will be returned. All errors encounterd can
instead be inspected from the HTLC list.
Instead of having SendToRoute pull routes from the payment session in
the payment lifecycle, we utilize the new methods on the paymentShard to
launch and collect the result for this single route.
This also let us remove the check for noRouteError, as we will always
have the result from the tried attempt returned. A result of this is
that we can finally remove lastError from the payment lifecycle (see
next commits).
Fetching the final shard result will also be done for calls to
SendToRoute, so we extract this code into a new method.
We move the call to the ControlTower to set the payment level failure
out into the payment loop, as this must be handled differently when
multiple shards are in flight, and for SendToRoute.
Define shardHandler which is a struct holding what is needed to send
attempts along given routes. The reason we define the logic on this
struct instead of the paymentLifecycle is that we later will make
SendToRoute calls not go through the payment lifecycle, but only using
this struct.
The launch shard is responsible for registering the attempt with the
control tower, failing it if the launch fails. Note that it is NOT
responsible for marking the _payment_ failed in case a terminal error is
encountered. This is important since we will later reuse this method for
SendToRoute, where whether to fail the payment cannot be decided on the
shard level.
We replace the cached attempt, and instead use the control tower
(database) to fetch any in-flight attempt. This is done as a
preparation for having multiple attempts in flight.
In addition we remove the cached circuit, as it won't be applicable when
multiple shards are in flight.
Instead of tracking the attemp we consult the database on every
iteration, and pick up any existing attempt. This also let us avoid
having to pass in the existing attempts from the payment loop, as we
just fetch them direclty.
This method is used to fetch a payment and all HTLC attempt that have
been made for that payment. It will both be used to resume inflight
attempts, and to fetch the final outcome of previous attempts.
We also update the the mock control tower to mimic the real control
tower, by letting it track multiple HTLC attempts for a given payment
hash, laying the groundwork for later enabling it for MPP.
The test case's preimage was (mistakenly) overwritten after crafting the
lightning payment, causing the parts of the testcases use the same
preimage causing problems when we are using the payment hash and
preimage in the mock control tower to distinguish paymennts.
In our quest to move calls to the ControlTower into the main payment
lifecycle loop, we move the edge case of a too long route out of
createNewPaymentAttempt.
loop
To prepare for multiple in flight payment attempts, we move
checkpointing the payment attempt out of createNewPaymentAttempt and
into the main payment lifecycle loop.
We'll attempt to move all calls to the DB via the ControlTower into this
loop, so we can more easily handle them in sequence.
active shards
In preparation for doing pathfinding for routes sending a value less
than the total payment amount, we let the payment session take the max
amount to send and the fee limit as arguments to RequestRoute.
This commit moves supplying of the information in the LightningPayment
to the initialization of the paymentSession, away from every call to
RequestRoute.
Instead the paymentSession will store this information internally, as it
doesn't change between payment attempts.
This is done to rid the RequestRoute call of the LightingPayment
argument, as for SendToRoute calls, it is not needed to supply the next
route.
This commit extends the htlc fail info with the full failure reason that
was received over the wire. In a later commit, this info will also be
exposed on the rpc interface. Furthermore it serves as a building block
to make SendToRoute reliable across restarts.
This commit converts the database structure of a payment so that it can
not just store the last htlc attempt, but all attempts that have been
made. This is a preparation for mpp sending.
In addition to that, we now also persist the fail time of an htlc. In a
later commit, the full failure reason will be added as well.
A key change is made to the control tower interface. Previously the
control tower wasn't aware of individual htlc outcomes. The payment
remained in-flight with the latest attempt recorded, but an outcome was
only set when the payment finished. With this commit, the outcome of
every htlc is expected by the control tower and recorded in the
database.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
To better distinguish payments from HTLCs, we rename the attempt info
struct to HTLCAttemptInfo. We also embed it into the HTLCAttempt struct,
to avoid having to duplicate this information.
The paymentID term is renamed to attemptID.
Adds an integrated routing test of probability extrapolation for untried
channels. The larger part of this commit is mock code to simulate the
Lightning Network.
The difference between this test and the existing pathfinding tests, is that
this test focuses on the feedback loop from result interpretation via
mission control updates and probability estimation back to pathfinding.
Improvements like probability extrapolation were previously only
validated by reasoning, while this setup makes it possible to assert the
improvement in a test and guard it for the future.
Previously we only penalized the outgoing connections of a failing node.
This turned out not to be sufficient, because the next route sometimes
went into the same failing node again to try a different outgoing
connection that wasn't yet known to mission control and therefore not
penalized before.
This shortcut does not work when the destination is a private node. We
also don't have this shortcut for regular payments. This commit
aligns the behavior between SendPayment and QueryRoutes.
The default was increased for the main sendpayment RPC in commit
d3fa9767a9. This commit sets the
same default for QueryRoutes, routerrpc.SendPayment and
router.EstimateRouteFee.
Update the type check used for checking local payment
failures to check on the ClearTextError interface rather
than on the ForwardingError type. This change prepares
for splitting payment errors up into Link and Forwarding
errors.
This commit adds a ClearTextError interface
which is implemented by non-opaque errors that
we know the underlying wire failure message for.
This interface is implemented by ForwardingErrors,
because we can fully decrypt the onion blob to
obtain the underlying failure reason. This interface
will also be implemented by errors which originate
at our node in following commits, because we know
the failure reason when we fail the htlc.
The lnwire interface is un-embedded in the
ForwardingError struct in favour of implementing
this interface. This change is made to protect
against accidental passing of a ForwardingError
to the wire, where the embedded FailureMessage
interface will present as wire failure but
will not serialize properly.
Add a constructor for the creation of forwarding errors.
A special constructor is added for the case where we have
an unknown wire failure, and must set a nil failure message.
Modifies TestMissingFeatureDep and TestDestPaymentAddr to use the test
ctx directly instead of generating a closure and using local state to
modify restrictions.
This commit brings us inline with recent modifications to the spec, that
say we shouldn't pay nodes whose feature vectors signal unknown required
features, and also that we shouldn't route through nodes signaling
unknown required features.
Currently we assert that invoices don't have such features during
decoding, but now that users can specify feature vectors via the rpc
interface, it makes sense to perform this check deeper in call stack.
This will also allow us to remove the check from decoding entirely,
making decodepayreq more useful for debugging.
In this commit, we update the routing package to use the new
`sphinx.NewOnionPacket` method. The new version of this method allows us
to specify _how_ the packet should be filled before it's used to create
a mix-header. This isn't a fundamental change (totally backwards
compatible), instead it plugs a privacy leak that may have revealed to
the destination how long the true route was.