This PR is a follow up, to a [follow
up](https://github.com/lightningnetwork/lnd/pull/7938) of an [initial
concurrency issue](https://github.com/lightningnetwork/lnd/pull/7856)
fixed in the peer goroutine.
In #7938, we noticed that the introduction of `p.startReady` can cause
`Disconnect` to block. This happens as `Disconnect` cannot be called
until `p.startReady` has been closed. `Disconnect` is also called from
`InboundPeerConnected` (the case of concurrent peers, so we need to
remove one of the connections) while the main server mutex is held. If
`p.Start` blocks for any reason, then this leads to the deadlock as: we
can't disconnect until we've finished starting, and we can't finish
starting as we need the disconnect caller to exit as it has the mutex.
In this commit, we now make the call to `prunePersistentPeerConnection`
async. The call to `prunePersistentPeerConnection` eventually wants to
grab the server mutex, which triggers the circular waiting scenario
above.
The main learning here is that no calls to the main server mutex path
can block from `p.Start`. This is more or less a stop gap to resolve the
issue initially introduced in v0.16.4. Assuming we want to move forward
with this fix, we should reexamine `p.startReady` all together, and also
revisit attempt to refactor this section of the code to eliminate the
mega mutex in the server in favor of a dedicated event loop.
In this commit, we modify the incoming contest resolver to use a
concurrent queue. This is meant to ensure that the invoice registry
subscription loop never blocks. This change is meant to be minimal and
implements option `5` as outlined here:
https://github.com/lightningnetwork/lnd/issues/8023.
With this change, the inner loop of the subscription dispatch method in
the invoice registry will no longer block, as the concurrent queue uses
a fixed buffer of a queue, then overflows into another queue when that
gets full.
Fixes https://github.com/lightningnetwork/lnd/issues/7917
This tests that the funding manager doesn't immediately consider
a coinbase transaction that is also a funding transaction usable
until the maturity has been reached.
In this commit, we add the ability to obtain blocking and mutex
profiles. The blocking profile will show which goroutines are
consistently blocked on synchronization primitives like channels, or
I/O. The mutex profile will show which mutexes are very contested.
The blocking profile can be enabled with a new arg: `--blockingprofile`.
The mutex profile can be enabled with a new arg: `--mutexprofile`. These
are both ignored if the profile port isn't set.
Activating these profiles requires the caller to pass in a sampling
rate. For now I've set it just to `1` to test things out. Unfortunately
documentation is rather scarce, so there aren't any good guides re what
these values should be set to. AFAICT, these add more overhead than the
other prowling options, so they shouldn't necessarily be enabled
persistently in production.
In this commit, we update the set of protos to accept the local secret
nonces over RPC. This is actually a 97 byte value, as it includes the
two 32 byte nonces, as well as the 33 byte value of the public key of
the signer.
This is needed in order to be able to open taproot channels over the RPC
interface.
In this commit, we modify the musig2 interfaces to instead use an
explicit value for the local nonces. Before this commit, we used the
functional option, but we want to also support specifying this value
over RPC for the remote signer. The functional option pattern is opaque,
so we can't get the nonce value we need. To get around this, we'll just
make this an explicit pointer, then map this to the functional option at
the very last moment.
In this commit, we increase the max message size for the ws proxy. We
have a similar setting for the normal gRPC server which was tuned to be
able to support decoding `GetNetworkInfo` as the channel graph got
larger. We keep the default buffer size of 64 KB, but allow that to be
expanded to up to 4 MB (current value) to decode larger messages.
One alternative would be to modify the `Split` function to break up
larger lines into smaller ones. We'd need to double check that the
libraries at a higher level of abstraction can handle the chunks. The
scan function would look something like:
```go
splitFunc := func(data []byte, eof bool) (int, []byte, error) {
if len(data) >= chunkSize {
return chunkSize, data[:chunkSize], nil
}
return bufio.ScanLines(data, eof))
}
scanner.Split(splitFunc)
```
In this commit, update the start up logic to gracefully handle a
seemingly rare case. In this case, a peer detects local data loss with a
set of active HTLCs. These HTLCs then eventually expire (they may or may
not actually "exist"), causing a force close decision. Before this PR,
this attempt would fail with a fatal error that can impede start up.
To better handle such a scenario, we'll now catch the error when we fail
to force close due to entering the DLP and instead terminate the state
machine at the broadcast state. When a commitment transaction eventually
confirms, we'll play it as normal.
Fixes https://github.com/lightningnetwork/lnd/issues/7984
In this commit, we introduce the concept of a rogue update. An update is
rogue if we need to ACK it but we have already deleted all the data for
the associated channel due to the channel being closed. In this case, we
now no longer error out and instead keep count of how many rogue updates
a session has backed-up.
This commit adds a new test to the tower client to demonstrate a bug
that can happen if a channel is closed while an update for it has yet to
be acked by the tower server. This will be fixed in an upcomming commit.
The watchtower client test framework currently uses a mock version of
the tower client DB. This can lead to bugs if the mock DB works slightly
differently to the actual bbolt DB. So this commit ensures that we only
use the bbolt db for the tower client tests. We also increment the
`waitTime` used in the tests a bit to account for the slightly longer DB
read and write times. Doing this switch resulted in one bug being
caught: we were not removing sessions from the in-memory set on deletion
of the session and so that is fixed here too.
In this commit, we use exhaustive build tags to ensure that we can
always build the `sqlbase` package, independent of the set build tags.
To do this, we move the type declarations _into_ the parsing functions.
This then allows us to create two versions for each db: with the db, and
without it.
To avoid a module tag round trip to get this working, we use a local
replace for now. Once this is merged in, we can do the tag (along side
rc3), then remove the replace.