Prior to this commit, taproot channels had a bug:
- If a disconnect happened before peer.AddNewChannel was called,
then the subsequent reconnect would call peer.AddNewChannel and
attempt the ChannelReestablish dance.
- peer.AddNewChannel would call NewLightningChannel with
populated nonce ChannelOpts. This in turn would call
InitRemoteMusigNonces which would create a new musig pair session
and set the channel's pendingVerificationNonce to nil.
- During the reestablish dance, ProcessChanSyncMsg would be called.
This would also call InitRemoteMusigNonces, except it would fail
since pendingVerificationNonce was set to nil in the previous
invocation.
To fix this, we add a new functional option to signal to the init logic
that it doesn't need to call InitRemoteMusigNonces in in
ProcessChanSyncMsg.
This commit refactors some of the bookkeeping around the ping logic
inside of the Brontide. If the pong response is noncompliant with
the spec or if it times out, we disconnect from the peer.
This change makes the generation of the ping payload a no-arg
closure parameter, relieving the pingHandler of having to
directly monitor the chain state. This makes use of the
BestBlockView that was introduced in earlier commits.
This PR is a follow up, to a [follow
up](https://github.com/lightningnetwork/lnd/pull/7938) of an [initial
concurrency issue](https://github.com/lightningnetwork/lnd/pull/7856)
fixed in the peer goroutine.
In #7938, we noticed that the introduction of `p.startReady` can cause
`Disconnect` to block. This happens as `Disconnect` cannot be called
until `p.startReady` has been closed. `Disconnect` is also called from
`InboundPeerConnected` (the case of concurrent peers, so we need to
remove one of the connections) while the main server mutex is held. If
`p.Start` blocks for any reason, then this leads to the deadlock as: we
can't disconnect until we've finished starting, and we can't finish
starting as we need the disconnect caller to exit as it has the mutex.
In this commit, we now make the call to `prunePersistentPeerConnection`
async. The call to `prunePersistentPeerConnection` eventually wants to
grab the server mutex, which triggers the circular waiting scenario
above.
The main learning here is that no calls to the main server mutex path
can block from `p.Start`. This is more or less a stop gap to resolve the
issue initially introduced in v0.16.4. Assuming we want to move forward
with this fix, we should reexamine `p.startReady` all together, and also
revisit attempt to refactor this section of the code to eliminate the
mega mutex in the server in favor of a dedicated event loop.
* lnwallet: fix log output msg
The log message is off by one.
* htlcswitch: fail channel when revoking it fails.
When the revocation of a channel state fails after receiving a new
CommitmentSigned msg we have to fail the channel otherwise we
continue with an unclean state.
* docs: update release-docs
* htlcswitch: tear down connection if revocation processing fails
If we couldn't revoke due to a DB error, then we want to also tear down
the connection, as we don't want the other party to continue to send
updates. That may lead to de-sync'd state an eventual force close.
Otherwise, the database might be able to recover come the next
reconnection attempt.
* kvdb: use sql.LevelSerializable for all backends
In this commit, we modify the default isolation level to be
`sql.LevelSerializable. This is the strictness isolation type for
postgres. For sqlite, there's only ever a single writer, so this doesn't
apply directly.
* kvdb/sqlbase: add randomized exponential backoff for serialization failures
In this commit, we add randomized exponential backoff for serialization
failures. For postgres, we''ll his this any time a transaction set fails
to be linearized. For sqlite, we'll his this if we have many writers
trying to grab the write lock at time same time, manifesting as a
`SQLITE_BUSY` error code.
As is, we'll retry up to 10 times, waiting a minimum of 50 miliseconds
between each attempt, up to 5 seconds without any delay at all. For
sqlite, this is also bounded by the busy timeout set, which applies on
top of this retry logic (block for busy timeout seconds, then apply this
back off logic).
* docs/release-notes: add entry for sqlite/postgres tx retry
---------
Co-authored-by: ziggie <ziggie1984@protonmail.com>
In this commit, we attempt to fix circular waiting scenario introduced
inadvertently when [fixing a race condition
scenario](https://github.com/lightningnetwork/lnd/pull/7856). With that
PR, we added a new channel that would block `Disconnect`, and
`WaitForDisconnect` to ensure that only until the `Start` method has
finished would those calls be allowed to succeed.
The issue is that if the server is trying to disconnect a peer due to a
concurrent connection, but `Start` is blocked on `maybeSendNodeAnn`,
which then wants to grab the main server mutex, then `Start` can never
exit, which causes `startReady` to never be closed, which then causes
the server to be blocked.
This PR attempts to fix the issue by calling `maybeSendNodeAnn` in a
goroutine, so it can grab the server mutex and not block the `Start`
method.
Fixes https://github.com/lightningnetwork/lnd/issues/7924
Fixes https://github.com/lightningnetwork/lnd/issues/7928
Fixes https://github.com/lightningnetwork/lnd/issues/7866
In this commit, we add support for the new musig2 channel funding flow.
This flow is identical to the existing flow, but not both sides need to
exchange local nonces up front, and then signatures sent are now partial
signatures instead of regular signatures.
The funding manager also gains some new state of the local nonces it
needs to generate in order to send the funding locked message, and also
process the funding locked message from the remote party.
In order to allow the funding manger to generate the nonces that need to
be applied to each channel, then AddNewChannel method has been modified
to accept a set of options that the peer will then use to bind the
nonces to a new channel.
In this commit, we update the co-op close flow to support the new musig2
keyspend flow. We'll use some new functional options to allow a caller
to pass in an active musig2 session. If this is present, then we'll use
that to complete the musig2 flow by signing with a partial signature,
and then ultimately combining the signatures at the end.
In this commit, we modify the starting logic to note attempt to add a
tower client for taproot channels. Instead, we'll just log that this
isn't available yet.
This commit now sends messages to `chanStream` for both pending and
active channels. If the message is sent to a pending channel, it will be
queued in `chanStream`. Once the channel link becomes active, the early
messages will be processed.
This commit adds a new interface method, `RemovePendingChannel`, to be
used when the funding flow is failed after calling `AddPendingChannel`
such that the Brontide has the most up-to-date view of the active
channels.
This commit adds a new channel `newPendingChannel` and its dedicated
handler `handleNewPendingChannel` to keep track of pending open
channels. This should not affect the original handling of new active
channels, except `addedChannels` is now updated in
`handleNewPendingChannel` such that this new pending channel won't be
reestablished in link.
In this commit, we make sure that all the `wg.Add(1)` calls succeed
before we attempt to wait on the shutdown of all the goroutines. Under
rare scheduling scenarios, if both `Start` and `Disconnect` are called
concurrently, then this internal race error can be hit, causing the
panic to occur.
Fixes https://github.com/lightningnetwork/lnd/issues/7853
As new pending channels will be tracked in the following commits, the
`newChannels` is now renamed to `newActiveChannel` to explicitly refer
to active, non-pending channels.
In this commit, we add a new LinkFailureDisconnect action that'll be
used if we detect that the remote party hasn't sent a revoke and ack
when it actually should.
Before this commit, we would log our action, tear down the link, but
then not actually force a connection recycle, as we assumed that if the
TCP connection was actually stale, then the read/write timeout would
expire.
In practice this doesn't always seem to be the case, so we make a strong
action here to actually force a disconnection in hopes that either side
will reconnect and keep the good times rollin' 🕺.
In this commit, we add a new LinkFailureAction enum to take over the old
force close bool. Force closing isn't the only thing we might want to do
when we decide to fail the link, so this is a prep refactoring for an
upcoming change.
In preparation for a more complex function signature for set node
announcement, separate get and set so that readonly callers don't need
to handle the extra arguments.
The right way to solve the problem of the link not being up to date with
custom user set forwarding policies once the channel is announced would
be to pass in those custom values when the link is created initially.
This requires a bit more of a refactor and is not addressed in this bug
fix.
This commit makes retrying enabling channels conditional. We now would
only retry sending the enable request when the `ChanActiveTimeout` is no
greater than 1 min.
This commit adds a retry logic to the channels that failed with
`ErrEnableInactiveChan` when requesting enabling. We now subscribe the
channel events to decide what to do with the failed channels.
In this commit, we modify the way we compute the starting ideal fee for
the co-op close transaction. Before thsi commit, channel.CalcFee was
used, which'll compute the fee based on the commitment transaction
itself, rathern than the co-op close transaction. As the co-op close
transaction is potentailly bigger (two P2TR outputs) than the commitment
transaction, this can cause us to under estimate the fee, which can
result in the fee rate being too low to propagate.
To remedy this, we now compute a fee estimate from scratch, based on the
delivery fees of the two parties.
We also add a bug fix in the chancloser unit tests that wasn't caught
due to loop variable shadowing.
The wallet import itest has been updated as well, since we'll now pay
600 extra saothis to close the channel, since we're accounting for the
added weight of the P2TR outputs.
Fixes#6953
In this commit, we parse the new max fee field, and pass it through the
switch, all the way to the peer where it's ultimately passed into the
chan closer state machine.
This commit modifies the netann subsystem to use the peer's alias
for ChannelUpdates where appropriate (i.e. in case we are sending
the alias to the peer). It also modifies the loadActiveChannels
function in the peer package to handle upgrading a channel when the
scid-alias feature bit is turned on.
Warning messages are intended to add "softer" failure modes for peers,
so to start with we simply log the warnings sent to us. While we "may"
disconnect from the peer according to the spec, we start with the least
extreme option (which is also not a change in behavior because
previously we'd just log that we received an unknown odd message).
On startup, we'll check whether we have the coop close chan status
and have already broadcasted a coop close txn, and then make a
decision on whether to restart the process based on that.
This commit was previously split into the following parts to ease
review:
- 2d746f68: replace imports
- 4008f0fd: use ecdsa.Signature
- 849e33d1: remove btcec.S256()
- b8f6ebbd: use v2 library correctly
- fa80bca9: bump go modules
This was not properly enforced and would be a spec violation on the
peer's end. Also re-use a pong buffer to save on heap allocations if
there are a lot of peers. The pong buffer is only read from, so this
is concurrent safe.
In this commit, we fix an inadvertent memory leak by ensuring we always
use `defer` to clean up the allocated objects/memory we use to be
notified of new blocks to update what we send within the set of ping
headers.
A further optimization here would be using a single global block epoch
housed within the server, that all peer `pingHandler` goroutines use
directly.
Fixes#6143.
In this commit, we take an initial step towards converting the existing
breach arbiter and utxo nursery logic into contract resolvers by moving
the files as is, into the `contractcourt` pacakge.
This commit is primarily move only, though we had to massage some
interfaces and config names along the way to make things compile and the
tests run properly.
This commit adds a missing return statement to pingHandler. This
prevents a nil pointer dereference panic from happening if an error is
returned from RegisterBlockEpochNtfn.
Adds a new Brontide struct method tryLinkShutdown that attempts to
fetch the target link and calls ShutdownIfChannelClean on it. This
allows the coop close process to guarantee atomicity of the underlying
channel state. Also removes the UnregisterChannel method from the
chancloser's config as the link is shut down before the chancloser
is created.
In this commit, which builds on top of the prior commit, rather than
using the returned buffer outside of the closure (which means it'll be
copied), we instead use it within the `Submit` closure instead. With the
recent changes to the `brontide` package, we won't allocate any new
buffer when we decrypt, as a result, the `rawMsg` byte slice actually
just slices into the passed `buf` slice (obtained from the pool)`.
With this change, we ensure that the buffer pool can release the slice
back to the pool and eliminate any extra allocations along the way.
In this commit, we implement a long discussed mechanism to use the
Lightning Network as a redundant block header source. By sending our
latest block header with each ping message, we give peers another source
(outside of the Bitcoin P2P network) they can use to spot check their
chain state. Peers can also use this information to detect if they've
been eclipsed from the traditional Bitcoin P2P network itself.
As is, we only send this data in Ping messages (which are periodically
sent), in the future we could also move to send them as the partial
payload for our pong messages, and also randomize the payload size
requested as well.
This commit allows the peer to be tested without relying on a raw
htlcswitch.Switch pointer. This is accomplished by using a messageSwitch
interface and adding the CreateAndAddLink method to the
htlcswitch.Switch.
This commit adds a new config option: "--coop-close-target-confs"
which allows a user to override the default target confirmations of 6
that is used to estimate a fee rate to use during a co-op closure
initiated by a remote peer.
Add a new boolean parameter without changing any existing
functionality. The parameter will be used to indicate
whether a request to update a channel's status was made
manually by a user (currently always false).
In this commit, we fix an issue that would cause peers running lnd 0.12
to not be able to connect to existing peers due to a feature bit
compatibility issue. In a recent PR we started to downgrade our required
feature bit for static key from required to optional, if we had a legacy
(non-tweakless) open with the peer then we would unset the required bit
and set the optional bit to ensure we could still connect to them.
The change implementing this new version of downgrade failed _also_
unset the bit (the required bit) in the "legacy global" feature bit
section. This caused the `RawFeatureVector.Merge` method to fail as we
would have the required bit set in the `GlobalFeatures` section, but the
optional bit set in the `Features` section. The `Merge` method ensures
that a required and optional bit can't be set in two different locations
for the same feature.
This PR fixes this issue by also unsetting the bit in the
`GlobalFeatures` field in the init message.
Fixes#4871
This commit caps the update fee the initiator will send when the anchors
channel type is used. We do not limit anything on the receiver side.
10 sat/vbyte is the current default max fee rate we use. This should be
enough to ensure propagation before anchoring down the commitment
transaction.
This commit unsets option static remote key required for peers that
we have existing legacy channels with. This ensures that we can still
connect to peers that do not recognize this feature bit that we have
existing channels with.
This commit changes the logic when garbage collecting forwarding
packages such that they are removed once when the function is called,
and then again upon subsequent ticks. This allows us to bump the
peer timer to 1 hour to limit the number of db transactions happening
in lnd. The forwarding packages need to be removed initially as
otherwise a flappy node will never have them garbage collected.