This commit adds a new method `SendToRouteSkipTempErr` that skips
failing the payment unless a terminal error occurred. This is
accomplished by demoting the original `SendToRoute` to a private method
and creating two new methods on top of it to minimize code change.
feature-bit channels
This allows opening zero-conf chan-type, scid-alias chan-type, and
scid-alias feature-bit channels. scid-alias chan-type channels are
required to be private. Two paths are available for opening a zero-conf
channel:
* explicit chan-type negotiation
* LDK carve-out where chan-types are not used, LND is on the
receiving end, and a ChannelAcceptor is used to enable zero-conf
When a zero-conf channel is negotiated, the funding manager:
* sends a FundingLocked with an alias
* waits for a FundingLocked from the remote peer
* calls addToRouterGraph to persist the channel using our alias in
the graph. The peer's alias is used to send them a ChannelUpdate.
* wait for six confirmations. If public, the alias edge in the
graph is deleted and replaced (not atomically) with the confirmed
edge. Our policy is also read-and-replaced, but the counterparty's
policy won't exist until they send it to us.
When a scid-alias-feature channel is negotiated, the funding manager:
* sends a FundingLocked with an alias:
* calls addToRouterGraph, sends ChannelUpdate with the confirmed SCID
since it exists.
* when six confirmations occurs, the edge is deleted and re-inserted
since the peer may have sent us an alias ChannelUpdate that we are
storing in the graph.
Since it is possible for a user to toggle the scid-alias-feature-bit
to on while channels exist in the funding manager, care has been taken
to ensure that an alias is ALWAYS sent in the funding_locked message
if this happens.
This allows the router to determine what is and isn't an alias from
lnd's definition of an alias. Any ChannelAnnouncement that has an
alias ShortChannelID field is not verified on-chain. To prevent a
DoS vector from existing, the gossiper ensures that only the local
lnd node can send its ChannelAnnouncements to the router with an
alias ShortChannelID.
This commit fixes a formatting issue in the router. The commit is in
this PR to demonstrate how the .editorconfig settings also affect the
way GitHub displays the code diff.
Add a new chainview interface test that runds the chainview tests
against a bitcoind node that we are getting block and tx notifications
from using the rpc interface.
The unit test sometimes fails with a connection timeout when trying to
connect to the reorg mining node. We attempt to make things more robust
by doubling both the number of retries as well as the retry timeout
itself.
Base the calculation on the actual float64 overflow point rather than an
indirect limit on probability.
This is a preparation for an infinite attempt cost.
In case of a multi shard payment with more than one in-flight shards,
one shard quitting with a terminal failure will stop the payment
lifecycle and close the `shardHandler`'s `quit` channel. In the
`collectResult` function we're waiting for the `Switch` to
asynchronously return a result for each shard. This may have been
interrupted by the aformentioned `quit` channel's closing skipping
attempt failure (or success) notification towards the control tower
and therefore skipping proper settle/fail info fill in the channel db.
Since payments have a composite state of a global failure reason and
settle/fail info for all attempts, any attempt with an unfilled
settle/fail info keeps a payment in-flight even if the payment itself
isn't in-flight anymore.
This commit was previously split into the following parts to ease
review:
- 2d746f68: replace imports
- 4008f0fd: use ecdsa.Signature
- 849e33d1: remove btcec.S256()
- b8f6ebbd: use v2 library correctly
- fa80bca9: bump go modules
This commit adds the `force` flag to the `XImportMissionControl` RPC
which allows skipping rules around the pair import except for what is
mandatory to make values meaningful. This can be useful for when clients
would like to forcibly override MC state in order to manipulate routing
results.
In case the channeldb package is used as a library in external tools, it
can be useful to allow read-only access to a DB. This allows such a
tool to access a DB even if not all migrations were executed, which can
be useful for recovery purposes.
To make it possible to even start the DB with a read-only backend, we
need to disable the automatic migration step.
Since bbolt returns references to internally stored data when storing
locally it's best to copy the byte slices returned or alternatively
convert them to string (which also makes a copy) to avoid crashes casued
by memory corruption.
This commit partially reverts bf27d05a.
To avoid creating multiple database transactions during a single path
finding operation, we create an explicit transaction when the cached
graph is instantiated.
We cache the source node to avoid needing to look that up for every path
finding session.
The database transaction will be nil in case of the in-memory graph.
With this commit we forward the config option for disabling the channel
graph cache as a boolean to the channeldb. But we invert its meaning to
make the flag easier to understand.
Pass htlc amount down to the channel so that we don't need to rely
on minHtlc (and pad it when the channel sets a 0 min htlc). Update
test to just check some sane values since we're no longer relying
on minHtlc amount at all.
To avoid the channel map needing to be re-grown while we fill the cache
initially, we might as well pre-allocate it with a somewhat sane value
to decrease the number of grow events.
To further separate the channel graph from the channel state, we
refactor the AddrsForNode method to use the graphs's public methods
instead of directly accessing any buckets. This makes sure that we can
have the channel state cached with just its buckets while not using a
kvdb level cache for the graph.
At the same time we refactor the graph's test to also be less dependent
upon the channel state DB.
It was being considered a misbehaviour from the intermediate hop which
penalized that hop, which is a bit harsh considering mobile nodes. We
change it be considered as other channel update types and only penalize
the channel that failed.
This fixes a flake I've seen in the wild lately:
```
--- FAIL: TestBlockDifferenceFix (0.01s)
router_test.go:4335: height should have been updated to 5, instead got 4
FAIL
FAIL github.com/lightningnetwork/lnd/routing 3.865s
FAIL
```
We wrap things in an assertion loop to ensure that timing quirks don't
cause the test to fail sporadically.
Fixes an issue where an out of order block error occurs in the router. When this occurs, the change uses the chain notifier to catch up on missed blocks and uses those blocks to fully update the routing graph with closed channels. Fixes#4710, #5132
Adds an optional tx parameter to ForAllOutgoingChannels and FetchChannel
so that data can be queried within the context of an existing database
transaction.
This commit changes missioncontrol's store update from per payment to
every second. Updating the missioncontrol store on every payment caused
gradual slowdown when using etcd.
We also completely eliminate the use of the cursor, further reducing
the performance bottleneck.
In this commit, we fix a regression introduced by a recent bug fix in
this area. Before this change, we'd inspect the error returned by
`processSendError`, and then fail the payment from the PoV of mission
control using the returned error.
A recent refactoring removed `processSendError` and combined the logic
with `tryApplyChannelUpdate` in order to introduce a new
`handleSendError` method that consolidates the logic within the
`shardHandler`. Along the way, the behavior of the prior check was
replicated in the form of a new internal `failPayment` closure. However,
the new function closure ends up returning a `channeldb.FailureReason`
instance, which is actually an `error`.
In the wild, when `SendToRoute` fails due to an error at the
destination, then this new logic caused the `handleSendErorr` method to
fail with an error, returning an unstructured error back to the caller,
instead of the usual payment failure details.
We fix this by no longer checking the `handleSendErorr` for an error as
normal. The `handleSendErorr` function as is will always return an error
of type `*channeldb.FailureReason`, therefore we don't need to treat it
as a normal error. Instead, we check for the type of error returned, and
update the control tower state accordingly.
With this commit, the test added in the prior commit now passes.
Fixes#5477.
In this commit, we modify the existing `TestSendToRouteStructuredError`
test to return an error that doesn't trigger the second chance logic.
Otherwise, we'll get a nil failure result from the mission control
interpretation, meaning we won't exercise the full code path. Instead,
we use a terminal error to ensure that the expected code path is hit.
As is, this test will fail as a recent refactoring causes us to return a
`channeldb.FailureReason` error, since the newly added `handleSendError`
code path in the `SendToRoute` method will return the raw error, rather
than the `shardError`, which is of the expected type.
A followup commit for PR#5332. In this commit we add more docs, rename
function updatePaymentState to fetchePaymentState, and add back the
check for channeldb.ErrPaymentTerminal after we launch shard.
This commit refactors the resumePayment to extract some logics back to
paymentState so that the code is more testable. It also adds unit tests
for paymentState, and breaks the original MPPayment tests into independent tests
so that it's easier to maintain and debug. All the new tests are built
using mock so that the control flow is eaiser to setup and change.
This commit renames the mock structs by appending Old in their names. In
doing so the old tests stay unchanged and new mock structs can be added
in the following commit.
This commit adds payment session to shardHandler to enable private edge
policies being updated in shardHandler. The relevant interface and mock
are updated. From now on, upon seeing a ChannelUpdate message,
shardHandler will first try to find the target policy in additionalEdges
and update it. If nothing found, it will then check the database for
edge policy to update.
This commit adds the method UpdateAdditionalEdge in PaymentSession,
which allows the addtional channel edge policy to be updated from a
ChannelUpdate message. Another method, GetAdditionalEdgePolicy is added
to allow querying additional edge policies.
This commit moves the handleSendError method from ChannelRouter to
shardHandler. In doing so, shardHandler can now apply updates to the
in-memory paymentSession if they are found in the error message.
The simulated error returned was rejected due to signature failure,
and didn't simulate correctly the insufficient fees error as
intended. Fix error by including correct signature.
With this patch, we'll fail out earlier in the cycle in case of
some wonky parameters, and not leave zombie payments in the router
which currently are not cleaned up.
It seems #5246 introduced a subtle bug that lead to the error "out of
order block: expecting height=1, got height=XXX" some times during
startup. Apparently it can happen that during pruning of the graph tip
some blocks can come in before we start our chain view and the new block
subscription. By querying the chain backend for the best height before
syncing with the graph we ensure that we never miss a block.
The router subsystem has its own goroutine that receives chain updates
and then does its (quite time consuming) work on each new block. To make
it possible to find out what block the router currently is synced to, we
export its internal best height through a new method.
In this commit we add a new error for when we fail to validate the
funding transaction (invalid script, etc) and mark it as a zombie like
the other failed validation cases.
In this commit, we start to add any channels that fail the normal chain
validation to the zombie index. With this change, we'll ensure that we
won't continue to re-process the same set of spent channels over and
over again.
Fixes#5191.
This ensures the waiting receiving channel always receives an error to
prevent a deadlock when processing a network update that fails due to
the validation barrier.
On commit d5aedbcbd9:
1000 @ 0x43a285 0x44a38f 0xc42e86 0xc80fda 0xc8682d 0xc976c9 0x46fce1
github.com/lightningnetwork/lnd/routing.(*ChannelRouter).AddNode+0x245 github.com/lightningnetwork/lnd/routing/router.go:2218
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).addNode+0x3b9 github.com/lightningnetwork/lnd/discovery/gossiper.go:1510
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).processNetworkAnnouncement+0x574c github.com/lightningnetwork/lnd/discovery/gossiper.go:1554
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).networkHandler.func1+0x24github.com/lightningnetwork/lnd/discovery/gossiper.go:1043
This commit adds the block cache to the CfFilteredChainView struct
and wraps its GetBlock function so that block cache mutex map is used
when the call to neutrino's GetBlock function is called.
Since we want to support AMP payment using a different unique payment
identifier (AMP payments don't go to one specific hash), we change the
nomenclature to be Identifier instead of PaymentHash.
We'll let the payment's lifecycle register each shard it's sending with
the ShardTracker, canceling failed shards. This will be the foundation
for correct AMP derivation for each shard we'll send.
We'll use this to keep track of the outstanding shards and which
preimages we are using for each. For now this is a simple map from
attempt ID to hash, but later we'll hide the AMP child derivation behind
this interface.
To distinguish the attempt's unique ID from the overall payment
identifier, we name it attemptID everywhere, and note that the
paymentHash argument won't be the actual payment hash for AMP payments.
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it would wait for our shards to
complete then cleanly exit.
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrTerminalPayment. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment, This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.
The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (eg,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next once.
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
In this commit, we add strict zombie pruning as a config level param.
This allow us to add the option for those that want a tighter graph, and
not change the default composition of the channel graph for most users
over night.
In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.
Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
Previously, we would always allow dependent jobs to be processed,
regardless of the result of its parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs to
run.
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initial
AssumeChannelValid properly.
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.
We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`, if that's bigger than
`MaxShardAmt, then we'll use the latter instead.
Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.
Fixes#4730
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to store a
single result, AddResult, to the store.
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
* mod: bump btcwallet version to accept db timeout
* btcwallet: add DBTimeOut in config
* kvdb: add database timeout option for bbolt
This commit adds a DBTimeout option in bbolt config. The relevant
functions walletdb.Open/Create are updated to use this config. In
addition, the bolt compacter also applies the new timeout option.
* channeldb: add DBTimeout in db options
This commit adds the DBTimeout option for channeldb. A new unit
test file is created to test the default options. In addition,
the params used in kvdb.Create inside channeldb_test is updated
with a DefaultDBTimeout value.
* contractcourt+routing: use DBTimeout in kvdb
This commit touches multiple test files in contractcourt and routing.
The call of function kvdb.Create and kvdb.Open are now updated with
the new param DBTimeout, using the default value kvdb.DefaultDBTimeout.
* lncfg: add DBTimeout option in db config
The DBTimeout option is added to db config. A new unit test is
added to check the default DB config is created as expected.
* migration: add DBTimeout param in kvdb.Create/kvdb.Open
* keychain: update tests to use DBTimeout param
* htlcswitch+chainreg: add DBTimeout option
* macaroons: support DBTimeout config in creation
This commit adds the DBTimeout during the creation of macaroons.db.
The usage of kvdb.Create and kvdb.Open in its tests are updated with
a timeout value using kvdb.DefaultDBTimeout.
* walletunlocker: add dbTimeout option in UnlockerService
This commit adds a new param, dbTimeout, during the creation of
UnlockerService. This param is then passed to wallet.NewLoader
inside various service calls, specifying a timeout value to be
used when opening the bbolt. In addition, the macaroonService
is also called with this dbTimeout param.
* watchtower/wtdb: add dbTimeout param during creation
This commit adds the dbTimeout param for the creation of both
watchtower.db and wtclient.db.
* multi: add db timeout param for walletdb.Create
This commit adds the db timeout param for the function call
walletdb.Create. It touches only the test files found in chainntnfs,
lnwallet, and routing.
* lnd: pass DBTimeout config to relevant services
This commit enables lnd to pass the DBTimeout config to the following
services/config/functions,
- chainControlConfig
- walletunlocker
- wallet.NewLoader
- macaroons
- watchtower
In addition, the usage of wallet.Create is updated too.
* sample-config: add dbtimeout option
In this commit, we extend the `BuildRoute` method and RPC on the router
sub-server to accept a raw payment address which will be included as
part of an MPP payload for the finla hop. This change actually also
allows users to craft their own MPP paths using BuildRoute+SendToRoute.
Our primary goal however, was to fix some broken itests since we now
require the payAddr to be present for ALL payments other than key send
payments.
This allows for a 1000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
Similarly as with kvdb.View this commits adds a reset closure to the
kvdb.Update call in order to be able to reset external state if the
underlying db backend needs to retry the transaction.
This commit adds a reset() closure to the kvdb.View function which will
be called before each retry (including the first) of the view
transaction. The reset() closure can be used to reset external state
(eg slices or maps) where the view closure puts intermediate results.
This commit clamps all user-chosen CLTVs in LND to be at least 18, which
is the new conservative value used in the sepc. This minimum is applied
uniformly to forwarding CLTV deltas (via channel updates) as well as
final CLTV deltas for new invoices.
This commit reverts cb4cd49dc8 to bring
back the insufficient local balance failure.
Distinguishing betweeen this failure and a regular "no route" failure
prevents meaningless htlcs from being sent out.
It can happen that the receiver times out some htlcs of the set if it
took to long to complete. But because the sender's mission control is
now updated, it is worth to keep trying to send those shards again.
Modifies the payment session to launch additional pathfinding attempts
for lower amounts. If a single shot payment isn't possible, the goal is
to try to complete the payment using multiple htlcs. In previous
commits, the payment lifecycle has been prepared to deal with
partial-amount routes returned from the payment session. It will query
for additional shards if needed.
Additionally a new rpc payment parameter is added that controls the
maximum number of shards that will be used for the payment.
With mpp it isn't possible anymore for findPath to determine that there
isn't enough local bandwidth. The full payment amount isn't known at
that point.
In a follow-up, this payment outcome can be reintroduced on a higher
level (payment lifecycle).
This commit fixes the inconsistency between the payment state as
reported by routerrpc.SendPayment/routerrpc.TrackPayment and the main
rpc ListPayments call.
In addition to that, payment state changes are now sent out for every
state change. This opens the door to user interfaces giving more
feedback to the user about the payment process. This is especially
interesting for multi-part payments.