In this commit, we eliminate some code duplication by removing the old
`HashMutex` struct, as it just duplicated all of the Mutex code for a
different key type (hash instead of uint64). We then make the main
Mutex struct take a type parameter, so the key type can be specified
when the struct is instantiated.
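Purely as an illustration (not lnd's actual implementation, which also
does reference counting and cleanup of idle entries), a mutex map keyed
by a comparable type parameter can look roughly like this:

```go
package multimutex

import "sync"

// Mutex provides a set of mutexes indexed by keys of a comparable
// type, so callers can lock per-uint64 or per-hash without duplicating
// the struct for each key type.
type Mutex[T comparable] struct {
	mtx   sync.Mutex
	locks map[T]*sync.Mutex
}

// NewMutex returns an empty keyed mutex map.
func NewMutex[T comparable]() *Mutex[T] {
	return &Mutex[T]{locks: make(map[T]*sync.Mutex)}
}

// Lock acquires the mutex associated with the given key, creating it
// lazily on first use.
func (m *Mutex[T]) Lock(key T) {
	m.mtx.Lock()
	l, ok := m.locks[key]
	if !ok {
		l = &sync.Mutex{}
		m.locks[key] = l
	}
	m.mtx.Unlock()

	l.Lock()
}

// Unlock releases the mutex associated with the given key.
func (m *Mutex[T]) Unlock(key T) {
	m.mtx.Lock()
	l := m.locks[key]
	m.mtx.Unlock()

	l.Unlock()
}
```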
This commit adds a new `MarshalOutPoint` helper in the `lnrpc` package
that can be used to convert a `wire.OutPoint` to an `lnrpc.OutPoint`.
By using this helper, we are less likely to forget to populate both the
string and byte form of the TXID.
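A hedged sketch of what such a helper could look like (the exact
signature and field names in the generated `lnrpc` code may differ):

```go
import "github.com/btcsuite/btcd/wire"

// MarshalOutPoint converts a wire.OutPoint into its RPC representation,
// filling in both the byte and the string form of the TXID so callers
// can't forget one of them.
func MarshalOutPoint(op *wire.OutPoint) *OutPoint {
	return &OutPoint{
		TxidBytes:   op.Hash[:],
		TxidStr:     op.Hash.String(),
		OutputIndex: op.Index,
	}
}
```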
The way we calculate a total probability from the direct and node
probability is modified to give more weight to the direct probability.
This is important if we know about a recent failure: we should not try
to send over this channel again if we know we can't. Otherwise this can
lead to endless retries over the channel, when the probability is
pinned at a high value by the other node-level results.
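Purely as an illustration of the idea (not the formula used in lnd), the
combination can be thought of as a weighted mix in which the direct
observation dominates, so that node-level results cannot pin a recently
failed channel at a high probability:

```go
// combineProbability is an illustrative sketch only. It mixes the
// probability derived from direct results on the channel with the
// node-wide probability, giving the direct observation the larger
// weight so that a recent direct failure is not washed out by good
// results on the node's other channels.
func combineProbability(direct, node, directWeight float64) float64 {
	// directWeight close to 1 gives the direct observation more
	// importance than the node-level estimate.
	return directWeight*direct + (1-directWeight)*node
}
```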
Having a capacity available is important for liquidity estimation.
* We add a dummy capacity to hop hints. Hop hints specify neither the
capacity nor a maxHTLC size, which is why we assume the channel to
have a high capacity relative to the amount we send.
* We add a capacity to local edges. These channels should always have a
capacity associated with them, even in the neutrino case (a fallback
to maxHTLC is not necessary). This is just for completeness, as the
probability calculation for local channels is done separately.
* The maximal reduction in the probability is limited to 0.5 (previously
~0.05), such that we don't get too low apriori probabilities.
Otherwise, this may lead to an overly strong selection of large (and
maybe expensive) channels. A two-hop path would get total probability
penalties of:
- 1000 PPM/(0.6*0.6) = 2778 PPM in the unsaturated case
- 1000 PPM/(0.6*(0.6*0.5)) = 5556 PPM in the saturated case, where the
second hop is saturated
The difference of 2778 PPM should be enough to bias towards the first
path (see the arithmetic sketch after this list).
* The smearing factor is reduced. Previously we had to keep a higher
smearing factor in order to make the capacity factor not go to zero
for high amounts, to still give a fully saturated channel a chance.
This is not needed anymore due to the cap at 0.5. A lower smearing
factor lets us choose a capacity fraction more precisely, and the
capacity factor stays more neutral for intermediate amounts.
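The penalties quoted above follow from dividing the per-attempt cost by
the product of the hop probabilities. A quick sketch of that arithmetic
(illustrative only; the real pathfinding weight also includes fees and
other terms):

```go
// probabilityPenaltyPPM divides the attempt cost (in PPM) by the
// product of the per-hop success probabilities to obtain a fee-like
// penalty for the whole path.
func probabilityPenaltyPPM(attemptCostPPM float64, hopProbs ...float64) float64 {
	pathProb := 1.0
	for _, p := range hopProbs {
		pathProb *= p
	}
	return attemptCostPPM / pathProb
}

// probabilityPenaltyPPM(1000, 0.6, 0.6)     == ~2778 PPM (unsaturated)
// probabilityPenaltyPPM(1000, 0.6, 0.6*0.5) == ~5556 PPM (saturated 2nd hop)
```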
We set a conservative default value for the capacity fraction, which
still has the effect of discarding exhausted channels, giving a
noticeable effect when about 90% of the capacity is being used.
We make the capacity factor configurable via an lnd.conf routerrpc
apriori parameter. The capacity factor trades off increased success
probability with a reduced set of channel candidates, which may lead to
increased fees. To let users choose whether the factor is active or not,
we add a config setting where a capacity fraction of 1.0 disables the
factor. We limit the capacity fraction to values between 0.75 and 1.0.
Lower values may discard too many channels.
We require channel updates to have the max HTLC message flag set.
Several flows need to pass that check before channel updates are
forwarded to peers:
* after channel funding: `addToRouterGraph`
* after receiving channel updates from a peer:
`ProcessRemoteAnnouncement`
* after we update channel policies: `PropagateChanPolicyUpdate`
We rename `ChanUpdateOptionMaxHtlc` to `ChanUpdateRequiredMaxHtlc`
as with the latest changes it is now required.
Similarly, we rename `validateOptionalFields` to
`ValidateChannelUpdateFields` and export it for use in a later commit.
We use a more general `Estimator` interface for probability estimation
in missioncontrol.
The estimator is created outside of `NewMissionControl`, passed in as a
`MissionControlConfig` field, to facilitate usage of externally supplied
estimators.
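A rough sketch of the shape such an interface can take (the exact
method set and signatures in missioncontrol may differ; it assumes the
existing route.Vertex, lnwire.MilliSatoshi and btcutil.Amount types,
with NodeResults standing for the package's per-node set of historical
pair results):

```go
// Estimator is a sketch of a pluggable probability estimator as
// described above.
type Estimator interface {
	// PairProbability estimates the probability of successfully
	// forwarding amt over the channel(s) towards toNode, given the
	// historical results we have for that node pair.
	PairProbability(now time.Time, results NodeResults,
		toNode route.Vertex, amt lnwire.MilliSatoshi,
		capacity btcutil.Amount) float64

	// LocalPairProbability estimates the probability for channels
	// of our own node, where exact balances are known.
	LocalPairProbability(now time.Time, results NodeResults,
		toNode route.Vertex) float64
}
```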
Implements a new probability estimator based on a probability theory
framework.
The computed probability consists of:
* the direct channel probability, which is estimated based on a
depleted liquidity distribution model, with formulas and the broader
concept derived from Pickhardt et al., https://arxiv.org/abs/2103.08576
(see the sketch after this list)
* an extension of the probability model to incorporate the decay of
knowledge over time for previous successes and failures
* a mixed node probability taking into account successes/failures on other
channels of the node (similar to the apriori approach)
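As a rough illustration of the direct-channel part only (no time decay,
no conditioning on previous successes or failures), a bimodal liquidity
density and the resulting success probability could be sketched like
this; the scale parameter s and the exact density are assumptions of
this sketch, not necessarily lnd's implementation:

```go
import "math"

// bimodalSuccessProbability assumes the channel liquidity x in
// [0, capacity] follows an (unnormalized) density
//   f(x) = exp(-x/s) + exp((x-capacity)/s),
// i.e. most of the liquidity sits at one of the two channel ends. The
// probability of being able to forward amt is the probability that the
// available liquidity is at least amt.
func bimodalSuccessProbability(amt, capacity, s float64) float64 {
	// integral computes the integral of f over [a, b] in closed form.
	integral := func(a, b float64) float64 {
		left := s * (math.Exp(-a/s) - math.Exp(-b/s))
		right := s * (math.Exp((b-capacity)/s) - math.Exp((a-capacity)/s))
		return left + right
	}

	return integral(amt, capacity) / integral(0, capacity)
}
```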
We introduce a probability `Estimator` interface which is implemented by
the current apriori probability estimator. A second implementation, the
bimodal probability estimator follows.
* we rename the current probability estimator to be the "apriori"
probability estimator to distinguish from a different implementation
later
* the AprioriEstimator is exported to later be able to type switch
* getLocalPairProbability -> LocalPairProbability (later part of an
exported interface)
* getPairProbability -> PairProbability (later part of an exported
interface)
The test cases in `TestUpdatePaymentState` run in parallel. One of the
parameters is a pointer, and the struct it points to gets modified
during the test.
The race condition was introduced in 8d49dfb07e.
To verify the fix, run `go test ./routing/. -race` from the main folder
before and after this change.
We multiply the apriori probability with a factor to take capacity into
account:
P *= 1 - 1 / [1 + exp(-(amount - cutoff)/smearing)]
The factor takes values between 1 (small amounts) and 0 (high
amounts). The zero limit may not be reached exactly, depending on the
smearing and cutoff combination. The function is a logistic function
mirrored about the y-axis. The cutoff determines the amount at which a
significant reduction in probability takes place, and the smearing
parameter defines how smooth the transition from 1 to 0 is. Both the
cutoff and the smearing parameters are defined in terms of fixed
fractions of the capacity.
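A direct transcription of that factor (parameter names here are
illustrative, not the actual config names), with cutoff and smearing
expressed as fractions of the capacity:

```go
import "math"

// capacityFactor mirrors the formula above: a mirrored logistic
// function of the amount that is ~1 for small amounts and approaches 0
// as the amount nears the channel capacity.
func capacityFactor(amt, capacity, cutoffFraction, smearingFraction float64) float64 {
	cutoff := cutoffFraction * capacity
	smearing := smearingFraction * capacity

	return 1 - 1/(1+math.Exp(-(amt-cutoff)/smearing))
}
```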
Extends the pathfinder with a capacity argument for later usage.
In tests, the inserted testCapacity has no effect, but will be used
later to estimate reduced probabilities from it.
This commit adds the maximal capacity between two nodes to the unified
edge data. We use MaxHTLC as a replacement if the channel capacity is
not available.
In tests we use larger maxHTLC values to be able to convert to a
non-zero sat capacity.
This commit refactors the semantics of unified policies to unified
edges. The main changes are the following renamings:
* unifiedPolicies -> nodeEdgeUnifier
* unifiedPolicy -> edgeUnifier
* unifiedPolicyEdge -> unifiedEdge
Comments and shortened variable names are changed to reflect the new
semantics.
We encapsulate the capacity inside a unifiedPolicyEdge for later usage.
The meaning of "policy" has changed now, which will be refactored in the
next commit.
This commit refactors the `networkHandler` to use the new method
`handleNetworkUpdate`. Because the `select` runs inside a for loop,
which is equivalent to firing a goroutine inside a range loop, it's
possible that a variable captured by a previously launched goroutine
references the current iteration's value. This is now fixed by passing
the parameters used for the network update to the goroutine explicitly.
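The underlying pattern is the classic loop-variable capture; a minimal,
self-contained sketch of the bug and the fix (simplified, not the
actual gossiper code):

```go
package main

import (
	"fmt"
	"sync"
)

type networkUpdate struct{ id int }

func main() {
	updates := []networkUpdate{{1}, {2}, {3}}
	var wg sync.WaitGroup

	// Buggy: every goroutine captures the same loop variable u, so
	// with Go versions before 1.22 they may all observe the value of
	// a later iteration.
	for _, u := range updates {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("buggy:", u.id)
		}()
	}
	wg.Wait()

	// Fixed: the update is passed to the goroutine as a parameter,
	// so each goroutine works on its own copy.
	for _, u := range updates {
		wg.Add(1)
		go func(u networkUpdate) {
			defer wg.Done()
			fmt.Println("fixed:", u.id)
		}(u)
	}
	wg.Wait()
}
```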
This commit renames the method `GetPaymentResult` to be
`GetAttemptResult` to avoid potential confusion and to address the
one-to-many relationship between a payment and its attempts.
Add an ignore condition for additional edges that connect to self.
These edges are already known, and avoiding these hints protects the
payment from malformed channel ids which could lead to an infinite
loop.
Fixes lightningnetwork#6169.
Co-authored-by: lsunsi <lsunsi@pm.me>
Test stream cancellation of the TrackPayments rpc call. In order to
achieve this, ControlTowerSubscriber is converted to an interface to
avoid trying to close a nil channel when closing the subscription. By
returning a mock implementation of the ControlTowerSubscriber in the
test, that problem is avoided.
Add a method 'SubscribeAllPayments' to the control tower, in order to be able to
subscribe to any payment, rather than subscribing to a specific payment hash.
This commit removes the old multi-shard test for `SendToRoute` as the
method doesn't support sending MPP. The original test passed due to a
flawed mocking method, where the mockers bypassed the public interfaces
to maintain their internal state, which created a non-existent
situation in which a temp error wouldn't fail the payment. A new unit
test is added to demonstrate the actual case.
This commit adds a new method `SendToRouteSkipTempErr` that skips
failing the payment unless a terminal error occurred. This is
accomplished by demoting the original `SendToRoute` to a private method
and creating two new methods on top of it to minimize code change.
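Conceptually the split looks something like the following sketch (types
and signatures simplified; it assumes the router's existing
`ChannelRouter`, `lntypes`, `route` and `channeldb` types):

```go
// SendToRoute keeps the previous behavior: any failure, temporary or
// terminal, fails the payment.
func (r *ChannelRouter) SendToRoute(hash lntypes.Hash,
	rt *route.Route) (*channeldb.HTLCAttempt, error) {

	return r.sendToRoute(hash, rt, false)
}

// SendToRouteSkipTempErr only fails the payment when a terminal error
// is encountered.
func (r *ChannelRouter) SendToRouteSkipTempErr(hash lntypes.Hash,
	rt *route.Route) (*channeldb.HTLCAttempt, error) {

	return r.sendToRoute(hash, rt, true)
}
```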
This allows opening zero-conf chan-type, scid-alias chan-type, and
scid-alias feature-bit channels. scid-alias chan-type channels are
required to be private. Two paths are available for opening a zero-conf
channel:
* explicit chan-type negotiation
* LDK carve-out where chan-types are not used, LND is on the
receiving end, and a ChannelAcceptor is used to enable zero-conf
When a zero-conf channel is negotiated, the funding manager:
* sends a FundingLocked with an alias
* waits for a FundingLocked from the remote peer
* calls addToRouterGraph to persist the channel using our alias in
the graph. The peer's alias is used to send them a ChannelUpdate.
* waits for six confirmations. If public, the alias edge in the
graph is deleted and replaced (not atomically) with the confirmed
edge. Our policy is also read-and-replaced, but the counterparty's
policy won't exist until they send it to us.
When a scid-alias-feature channel is negotiated, the funding manager:
* sends a FundingLocked with an alias
* calls addToRouterGraph and sends a ChannelUpdate with the confirmed
SCID since it exists
* when six confirmations occur, the edge is deleted and re-inserted
since the peer may have sent us an alias ChannelUpdate that we are
storing in the graph
Since it is possible for a user to toggle the scid-alias-feature-bit
to on while channels exist in the funding manager, care has been taken
to ensure that an alias is ALWAYS sent in the funding_locked message
if this happens.
This allows the router to determine what is and isn't an alias from
lnd's definition of an alias. Any ChannelAnnouncement that has an
alias ShortChannelID field is not verified on-chain. To prevent a
DoS vector from existing, the gossiper ensures that only the local
lnd node can send its ChannelAnnouncements to the router with an
alias ShortChannelID.
This commit fixes a formatting issue in the router. The commit is in
this PR to demonstrate how the .editorconfig settings also affect the
way GitHub displays the code diff.
Add a new chainview interface test that runs the chainview tests
against a bitcoind node from which we receive block and tx
notifications over the rpc interface.
The unit test sometimes fails with a connection timeout when trying to
connect to the reorg mining node. We attempt to make things more robust
by doubling both the number of retries as well as the retry timeout
itself.
Base the calculation on the actual float64 overflow point rather than an
indirect limit on probability.
This is a preparation for an infinite attempt cost.
In case of a multi-shard payment with more than one in-flight shard,
one shard quitting with a terminal failure will stop the payment
lifecycle and close the `shardHandler`'s `quit` channel. In the
`collectResult` function we're waiting for the `Switch` to
asynchronously return a result for each shard. This wait may have been
interrupted by the aforementioned closing of the `quit` channel,
skipping the attempt failure (or success) notification towards the
control tower and therefore skipping the proper filling of the
settle/fail info in the channel db.
Since payments have a composite state of a global failure reason and
settle/fail info for all attempts, any attempt with an unfilled
settle/fail info keeps a payment in-flight even if the payment itself
isn't in-flight anymore.
This commit was previously split into the following parts to ease
review:
- 2d746f68: replace imports
- 4008f0fd: use ecdsa.Signature
- 849e33d1: remove btcec.S256()
- b8f6ebbd: use v2 library correctly
- fa80bca9: bump go modules
This commit adds the `force` flag to the `XImportMissionControl` RPC
which allows skipping rules around the pair import except for what is
mandatory to make values meaningful. This can be useful for when clients
would like to forcibly override MC state in order to manipulate routing
results.
In case the channeldb package is used as a library in external tools, it
can be useful to allow read-only access to a DB. This allows such a
tool to access a DB even if not all migrations were executed, which can
be useful for recovery purposes.
To make it possible to even start the DB with a read-only backend, we
need to disable the automatic migration step.
Since bbolt returns references to internally stored data, when storing
these values locally it's best to copy the returned byte slices, or
alternatively convert them to strings (which also makes a copy), to
avoid crashes caused by memory corruption.
This commit partially reverts bf27d05a.
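For illustration, the safe read pattern under a bbolt transaction looks
like this (bucket and key names are hypothetical):

```go
import "go.etcd.io/bbolt"

// readValueCopy returns a copy of the value stored under key. The
// slice returned by bbolt's Get is only valid for the lifetime of the
// transaction, so retaining it without copying can lead to reads of
// unmapped or reused memory.
func readValueCopy(db *bbolt.DB, bucket, key []byte) ([]byte, error) {
	var valCopy []byte
	err := db.View(func(tx *bbolt.Tx) error {
		b := tx.Bucket(bucket)
		if b == nil {
			return nil
		}
		if v := b.Get(key); v != nil {
			// Copy the bytes instead of keeping the reference.
			valCopy = append([]byte(nil), v...)
		}
		return nil
	})

	return valCopy, err
}
```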
To avoid creating multiple database transactions during a single path
finding operation, we create an explicit transaction when the cached
graph is instantiated.
We cache the source node to avoid needing to look that up for every path
finding session.
The database transaction will be nil in case of the in-memory graph.
With this commit we forward the config option for disabling the channel
graph cache as a boolean to the channeldb. But we invert its meaning to
make the flag easier to understand.
Pass htlc amount down to the channel so that we don't need to rely
on minHtlc (and pad it when the channel sets a 0 min htlc). Update
test to just check some sane values since we're no longer relying
on minHtlc amount at all.
To avoid the channel map needing to be re-grown while we fill the cache
initially, we might as well pre-allocate it with a somewhat sane value
to decrease the number of grow events.
To further separate the channel graph from the channel state, we
refactor the AddrsForNode method to use the graph's public methods
instead of directly accessing any buckets. This makes sure that we can
have the channel state cached with just its buckets while not using a
kvdb level cache for the graph.
At the same time we refactor the graph's test to also be less dependent
upon the channel state DB.
It was being considered a misbehaviour of the intermediate hop, which
penalized that hop; this is a bit harsh considering mobile nodes. We
change it to be treated like other channel update types and only
penalize the channel that failed.
This fixes a flake I've seen in the wild lately:
```
--- FAIL: TestBlockDifferenceFix (0.01s)
router_test.go:4335: height should have been updated to 5, instead got 4
FAIL
FAIL github.com/lightningnetwork/lnd/routing 3.865s
FAIL
```
We wrap things in an assertion loop to ensure that timing quirks don't
cause the test to fail sporadically.
Fixes an issue where an out-of-order block error occurs in the router.
When this occurs, the change uses the chain notifier to catch up on
missed blocks and uses those blocks to fully update the routing graph
with closed channels. Fixes #4710, #5132.
Adds an optional tx parameter to ForAllOutgoingChannels and FetchChannel
so that data can be queried within the context of an existing database
transaction.
This commit changes missioncontrol's store update from per payment to
every second. Updating the missioncontrol store on every payment caused
gradual slowdown when using etcd.
We also completely eliminate the use of the cursor, further reducing
the performance bottleneck.
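The idea is an accumulate-and-flush loop driven by a ticker instead of
a database write per payment. A simplified sketch (not the actual
missioncontrol store code; `paymentResult` and `flush` are
placeholders):

```go
import "time"

type paymentResult struct{}

// storeFlusher queues incoming results in memory and writes them to
// the database in a single batch once per second.
func storeFlusher(results <-chan *paymentResult,
	flush func([]*paymentResult) error, quit <-chan struct{}) {

	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	var pending []*paymentResult
	for {
		select {
		case r := <-results:
			pending = append(pending, r)

		case <-ticker.C:
			if len(pending) == 0 {
				continue
			}
			// Best effort: on error, keep the pending results
			// and retry on the next tick.
			if err := flush(pending); err != nil {
				continue
			}
			pending = nil

		case <-quit:
			return
		}
	}
}
```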
In this commit, we fix a regression introduced by a recent bug fix in
this area. Before this change, we'd inspect the error returned by
`processSendError`, and then fail the payment from the PoV of mission
control using the returned error.
A recent refactoring removed `processSendError` and combined the logic
with `tryApplyChannelUpdate` in order to introduce a new
`handleSendError` method that consolidates the logic within the
`shardHandler`. Along the way, the behavior of the prior check was
replicated in the form of a new internal `failPayment` closure. However,
the new function closure ends up returning a `channeldb.FailureReason`
instance, which is actually an `error`.
In the wild, when `SendToRoute` fails due to an error at the
destination, this new logic caused the `handleSendError` method to
fail with an error, returning an unstructured error back to the caller
instead of the usual payment failure details.
We fix this by no longer treating the value returned by
`handleSendError` as a normal error. The `handleSendError` function as
is will always return an error of type `*channeldb.FailureReason`,
therefore we don't need to treat it as a normal error. Instead, we
check the type of the error returned and update the control tower
state accordingly.
With this commit, the test added in the prior commit now passes.
Fixes #5477.
In this commit, we modify the existing `TestSendToRouteStructuredError`
test to return an error that doesn't trigger the second chance logic.
Otherwise, we'll get a nil failure result from the mission control
interpretation, meaning we won't exercise the full code path. Instead,
we use a terminal error to ensure that the expected code path is hit.
As is, this test will fail as a recent refactoring causes us to return a
`channeldb.FailureReason` error, since the newly added `handleSendError`
code path in the `SendToRoute` method will return the raw error, rather
than the `shardError`, which is of the expected type.
A followup commit for PR #5332. In this commit we add more docs, rename
the function updatePaymentState to fetchPaymentState, and add back the
check for channeldb.ErrPaymentTerminal after we launch a shard.
This commit refactors resumePayment to extract some logic back into
paymentState so that the code is more testable. It also adds unit tests
for paymentState and breaks the original MPPayment tests into
independent tests so that they are easier to maintain and debug. All
the new tests are built using mocks so that the control flow is easier
to set up and change.
This commit renames the mock structs by appending Old to their names.
In doing so the old tests stay unchanged and new mock structs can be
added in the following commit.
This commit adds a payment session to shardHandler to enable private
edge policies to be updated in shardHandler. The relevant interface and
mock are updated. From now on, upon seeing a ChannelUpdate message,
shardHandler will first try to find the target policy in
additionalEdges and update it. If nothing is found, it will then check
the database for the edge policy to update.
This commit adds the method UpdateAdditionalEdge to PaymentSession,
which allows the additional channel edge policy to be updated from a
ChannelUpdate message. Another method, GetAdditionalEdgePolicy, is
added to allow querying additional edge policies.
This commit moves the handleSendError method from ChannelRouter to
shardHandler. In doing so, shardHandler can now apply updates to the
in-memory paymentSession if they are found in the error message.
The simulated error returned was rejected due to a signature failure
and didn't correctly simulate the intended insufficient fees error. Fix
the error by including a correct signature.
With this patch, we'll fail out earlier in the cycle in case of
some wonky parameters, and not leave zombie payments in the router
which currently are not cleaned up.
It seems #5246 introduced a subtle bug that led to the error "out of
order block: expecting height=1, got height=XXX" sometimes during
startup. Apparently it can happen that during pruning of the graph tip
some blocks can come in before we start our chain view and the new
block subscription. By querying the chain backend for the best height
before syncing with the graph, we ensure that we never miss a block.
The router subsystem has its own goroutine that receives chain updates
and then does its (quite time consuming) work on each new block. To make
it possible to find out what block the router currently is synced to, we
export its internal best height through a new method.
In this commit we add a new error for when we fail to validate the
funding transaction (invalid script, etc.) and mark the channel as a
zombie like the other failed validation cases.
In this commit, we start to add any channels that fail the normal chain
validation to the zombie index. With this change, we'll ensure that we
won't continue to re-process the same set of spent channels over and
over again.
Fixes #5191.
This ensures the waiting receiving channel always receives an error to
prevent a deadlock when processing a network update that fails due to
the validation barrier.
On commit d5aedbcbd9:
```
1000 @ 0x43a285 0x44a38f 0xc42e86 0xc80fda 0xc8682d 0xc976c9 0x46fce1
github.com/lightningnetwork/lnd/routing.(*ChannelRouter).AddNode+0x245 github.com/lightningnetwork/lnd/routing/router.go:2218
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).addNode+0x3b9 github.com/lightningnetwork/lnd/discovery/gossiper.go:1510
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).processNetworkAnnouncement+0x574c github.com/lightningnetwork/lnd/discovery/gossiper.go:1554
github.com/lightningnetwork/lnd/discovery.(*AuthenticatedGossiper).networkHandler.func1+0x24 github.com/lightningnetwork/lnd/discovery/gossiper.go:1043
```
This commit adds the block cache to the CfFilteredChainView struct
and wraps its GetBlock function so that the block cache's mutex map is
used when neutrino's GetBlock function is called.