In this commit, we add the ability to obtain blocking and mutex
profiles. The blocking profile will show which goroutines are
consistently blocked on synchronization primitives like channels or I/O.
The mutex profile will show which mutexes are heavily contended.
The blocking profile can be enabled with a new arg: `--blockingprofile`.
The mutex profile can be enabled with a new arg: `--mutexprofile`. These
are both ignored if the profile port isn't set.
Activating these profiles requires the caller to pass in a sampling
rate. For now I've set it just to `1` to test things out. Unfortunately
documentation is rather scarce, so there aren't any good guides regarding
what these values should be set to. AFAICT, these add more overhead than
the other profiling options, so they shouldn't necessarily be enabled
persistently in production.
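For reference, the runtime hooks behind these flags are `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`. Below is a minimal sketch of the wiring, assuming a hypothetical `enableProfiles` helper and illustrative parameter names rather than lnd's actual code:
```go
package profiling

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

// enableProfiles mirrors the behavior described above: both sampling rates
// are ignored if the profile port isn't set.
func enableProfiles(profilePort, blockRate, mutexFraction int) {
	if profilePort == 0 {
		return
	}

	// A rate of 1 samples every blocking event and every mutex
	// contention event, which is the most expensive (but most complete)
	// setting.
	runtime.SetBlockProfileRate(blockRate)
	runtime.SetMutexProfileFraction(mutexFraction)

	go func() {
		addr := fmt.Sprintf("localhost:%d", profilePort)
		_ = http.ListenAndServe(addr, nil)
	}()
}
```
The profiles are then exposed at `/debug/pprof/block` and `/debug/pprof/mutex` on that port.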
In this commit, we update the set of protos to accept the local secret
nonces over RPC. This is actually a 97-byte value, as it includes the
two 32-byte nonces, as well as the 33-byte public key of
the signer.
This is needed in order to be able to open taproot channels over the RPC
interface.
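For illustration, the layout works out as follows; the `parseLocalNonces` helper and its types are made up for this sketch and aren't the actual proto or signer code:
```go
package noncesketch

import "fmt"

const (
	nonceSize  = 32
	pubKeySize = 33

	// localNonceLen is the total wire size: two 32-byte nonces plus the
	// 33-byte compressed public key of the signer.
	localNonceLen = 2*nonceSize + pubKeySize // 97 bytes
)

type localNonces struct {
	nonce1 [nonceSize]byte
	nonce2 [nonceSize]byte
	pubKey [pubKeySize]byte
}

// parseLocalNonces splits the 97-byte RPC value into its components.
func parseLocalNonces(b []byte) (*localNonces, error) {
	if len(b) != localNonceLen {
		return nil, fmt.Errorf("expected %d bytes, got %d",
			localNonceLen, len(b))
	}

	var n localNonces
	copy(n.nonce1[:], b[:nonceSize])
	copy(n.nonce2[:], b[nonceSize:2*nonceSize])
	copy(n.pubKey[:], b[2*nonceSize:])

	return &n, nil
}
```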
In this commit, we modify the musig2 interfaces to instead use an
explicit value for the local nonces. Before this commit, we used a
functional option, but we also want to support specifying this value
over RPC for the remote signer. The functional option pattern is opaque,
so we can't get the nonce value we need. To get around this, we'll
make the nonces an explicit pointer, then map that to the functional
option at the very last moment.
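As a generic sketch of that pattern (all names here are illustrative, not the actual musig2 or lnd APIs), we accept the nonces as an explicit, optional pointer and only translate them into the opaque functional option right before building the session:
```go
package musigsketch

// Nonces is a stand-in for the musig2 local nonce pair.
type Nonces struct {
	PubNonce [66]byte
}

type sessionCfg struct {
	localNonces *Nonces
}

// SessionOption is the opaque functional option used internally.
type SessionOption func(*sessionCfg)

func withLocalNonces(n *Nonces) SessionOption {
	return func(cfg *sessionCfg) {
		cfg.localNonces = n
	}
}

// newSession takes the nonces as an explicit pointer so callers (e.g. the
// RPC layer) can supply or inspect the value directly.
func newSession(localNonces *Nonces) *sessionCfg {
	var opts []SessionOption

	// Map the explicit value to the functional option at the very last
	// moment.
	if localNonces != nil {
		opts = append(opts, withLocalNonces(localNonces))
	}

	cfg := &sessionCfg{}
	for _, opt := range opts {
		opt(cfg)
	}

	return cfg
}
```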
In this commit, we increase the max message size for the ws proxy. We
have a similar setting for the normal gRPC server, which was tuned to be
able to support decoding `GetNetworkInfo` responses as the channel graph
got larger. We keep the default buffer size of 64 KB, but allow that to be
expanded up to 4 MB (the current value) to decode larger messages.
One alternative would be to modify the `Split` function to break up
larger lines into smaller ones. We'd need to double-check that the
libraries at a higher level of abstraction can handle the chunks. The
split function would look something like:
```go
splitFunc := func(data []byte, atEOF bool) (int, []byte, error) {
	// Hand out fixed-size chunks so a single over-long line can't
	// overflow the scanner's buffer.
	if len(data) >= chunkSize {
		return chunkSize, data[:chunkSize], nil
	}

	// Otherwise, fall back to the default line-based splitting.
	return bufio.ScanLines(data, atEOF)
}
scanner.Split(splitFunc)
```
In this commit, we update the start-up logic to gracefully handle a
seemingly rare case. In this case, a peer detects local data loss with a
set of active HTLCs. These HTLCs then eventually expire (they may or may
not actually "exist"), causing a force close decision. Before this PR,
this attempt would fail with a fatal error that could impede start-up.
To better handle such a scenario, we'll now catch the error when we fail
to force close due to entering the DLP (data loss protection) and instead
terminate the state
machine at the broadcast state. When a commitment transaction eventually
confirms, we'll play it as normal.
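Roughly, the new behavior looks like the sketch below; the error value, state names, and helper are placeholders rather than lnd's actual contractcourt code:
```go
package dlpsketch

import "errors"

// errLocalDataLoss stands in for the error returned when a force close is
// refused because we've entered the data loss protection (DLP) protocol.
var errLocalDataLoss = errors.New("cannot force close with local data loss")

type closeState int

const (
	stateError closeState = iota
	stateBroadcast
)

// attemptForceClose no longer treats the DLP error as fatal; it terminates
// the state machine at the broadcast state so the eventual commitment
// confirmation can be handled as normal.
func attemptForceClose(forceClose func() error) (closeState, error) {
	if err := forceClose(); err != nil {
		if errors.Is(err, errLocalDataLoss) {
			return stateBroadcast, nil
		}

		return stateError, err
	}

	return stateBroadcast, nil
}
```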
Fixes https://github.com/lightningnetwork/lnd/issues/7984
In this commit, we introduce the concept of a rogue update. An update is
rogue if we need to ACK it but we have already deleted all the data for
the associated channel due to the channel being closed. In this case, we
now no longer error out and instead keep count of how many rogue updates
a session has backed up.
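Conceptually, the change looks like the following; the types and fields are made-up stand-ins for the tower client's session state:
```go
package towersketch

type session struct {
	// committedUpdates tracks updates, keyed by channel ID, that still
	// need to be ACKed.
	committedUpdates map[uint64]struct{}

	// numRogueUpdates counts ACKs for channels whose data has already
	// been deleted because the channel was closed.
	numRogueUpdates int
}

// ackUpdate no longer errors out when the channel's data is already gone; it
// counts the update as rogue instead.
func (s *session) ackUpdate(chanID uint64) error {
	if _, ok := s.committedUpdates[chanID]; !ok {
		s.numRogueUpdates++
		return nil
	}

	// ... normal ACK handling for a known channel.
	delete(s.committedUpdates, chanID)

	return nil
}
```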
This commit adds a new test to the tower client to demonstrate a bug
that can happen if a channel is closed while an update for it has yet to
be acked by the tower server. This will be fixed in an upcoming commit.
The watchtower client test framework currently uses a mock version of
the tower client DB. This can lead to bugs if the mock DB behaves slightly
differently from the actual bbolt DB. So this commit ensures that we only
use the bbolt DB for the tower client tests. We also increase the
`waitTime` used in the tests a bit to account for the slightly longer DB
read and write times. Doing this switch resulted in one bug being
caught: we were not removing sessions from the in-memory set on deletion
of a session, and so that is fixed here too.
In this commit, we use exhaustive build tags to ensure that we can
always build the `sqlbase` package, independent of the set of build tags.
To do this, we move the type declarations _into_ the parsing functions.
This then allows us to create two versions for each db: one with the db
driver, and one without it.
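As an illustration of the layout (the file names, build tag, and driver check below are examples only, not the actual `sqlbase` code), the package can ship two tag-guarded versions of the same function:
```go
// errors_postgres.go: compiled only when the driver tag is set.

//go:build kvdb_postgres

package sqlbase

import "github.com/jackc/pgconn"

// isSerializationError keeps the driver-specific types inside the function
// body, so nothing driver-specific leaks into the package API.
func isSerializationError(err error) bool {
	pgErr, ok := err.(*pgconn.PgError)
	return ok && pgErr.Code == "40001" // serialization_failure
}
```
And the stub that is compiled whenever the tag is absent, so the package always builds:
```go
// errors_nopostgres.go: compiled when the driver tag is not set.

//go:build !kvdb_postgres

package sqlbase

func isSerializationError(err error) bool {
	return false
}
```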
To avoid a module tag round trip to get this working, we use a local
replace for now. Once this is merged in, we can do the tag (alongside
rc3), then remove the replace.
The docker image has been updated so we are using another protobuf
version to generate the files. The generated files include the version of
the compiler used to create them, so we need this commit to pass the
`rpc-check` step in our CI.
Bump all build go versions to v1.21.0
Bump the minimum build package version to v1.19.0
Debian "buster" is no longer supported. Security updates have been
discontinued since June 30th, 2022. We will build using the latest
version, "bookworm".
When `numTweaks` is zero, we should return nil instead of
initializing an empty map, as we'd otherwise get the following error,
```
Diff:
--- Expected
+++ Actual
@@ -11007,4 +11007,3 @@
},
- BreachedHtlcTweaks: (contractcourt.htlcTapTweaks) {
- },
+ BreachedHtlcTweaks: (contractcourt.htlcTapTweaks) <nil>,
```
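The fix boils down to the following pattern; the map type and function name are placeholders for the actual `contractcourt` code:
```go
package tweaksketch

// buildHtlcTapTweaks returns nil rather than an allocated, empty map when
// there's nothing to tweak, so deep-equality checks against a nil map pass.
func buildHtlcTapTweaks(numTweaks int) map[uint64][32]byte {
	if numTweaks == 0 {
		return nil
	}

	tweaks := make(map[uint64][32]byte, numTweaks)

	// ... populate the tweaks here.

	return tweaks
}
```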
Fix the following unit test flake,
```
--- FAIL: TestHistoricalConfDetailsTxIndex (0.00s)
--- FAIL: TestHistoricalConfDetailsTxIndex/rpc_polling_enabled (1.16s)
bitcoind_test.go:174: should have found the transaction within the mempool, but did not: TxNotFoundIndex
FAIL
```
This commit fixes the error,
```
go: version constraints conflict:
github.com/dhui/dktest@v0.3.16 requires golang.org/x/time@v0.0.0-20220224211638-0e9765cccd65, but golang.org/x/time@v0.0.0-20210220033141-f8bda1e9f3ba is requested
github.com/golang-migrate/migrate/v4@v4.16.1 requires golang.org/x/time@v0.0.0-20220224211638-0e9765cccd65, but golang.org/x/time@v0.0.0-20210220033141-f8bda1e9f3ba is requested
```
By calling `go get golang.org/x/time@v0.0.0-20220224211638-0e9765cccd65`
and `go mod tidy`.
Fix the following conflict,
```
go: conflicting replacements for github.com/ulikunitz/xz:
github.com/ulikunitz/xz@v0.5.11
github.com/ulikunitz/xz@v0.5.8
```
* lnwallet: fix log output msg
The log message is off by one.
* htlcswitch: fail channel when revoking it fails.
When the revocation of a channel state fails after receiving a new
CommitmentSigned msg, we have to fail the channel; otherwise we would
continue with an unclean state.
* docs: update release-docs
* htlcswitch: tear down connection if revocation processing fails
If we couldn't revoke due to a DB error, then we want to also tear down
the connection, as we don't want the other party to continue to send
updates. That may lead to de-synced state and an eventual force close.
This way, the database might be able to recover come the next
reconnection attempt.
* kvdb: use sql.LevelSerializable for all backends
In this commit, we modify the default isolation level to be
`sql.LevelSerializable`. This is the strictest isolation level for
postgres. For sqlite, there's only ever a single writer, so this doesn't
apply directly.
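In terms of the standard library, this corresponds to opening transactions with `sql.LevelSerializable`; a minimal, self-contained example (not the kvdb code itself):
```go
package isosketch

import (
	"context"
	"database/sql"
)

// beginSerializable starts a transaction at the serializable isolation
// level, the strictest level offered by database/sql.
func beginSerializable(ctx context.Context, db *sql.DB) (*sql.Tx, error) {
	return db.BeginTx(ctx, &sql.TxOptions{
		Isolation: sql.LevelSerializable,
	})
}
```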
* kvdb/sqlbase: add randomized exponential backoff for serialization failures
In this commit, we add randomized exponential backoff for serialization
failures. For postgres, we'll hit this any time a transaction set fails
to be linearized. For sqlite, we'll hit this if we have many writers
trying to grab the write lock at the same time, manifesting as a
`SQLITE_BUSY` error code.
As is, we'll retry up to 10 times, waiting a minimum of 50 milliseconds
between each attempt, up to a maximum of 5 seconds. For
sqlite, this is also bounded by the busy timeout set, which applies on
top of this retry logic (block for busy timeout seconds, then apply this
backoff logic).
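A self-contained sketch of that retry loop, using the numbers above but with an illustrative helper and error check rather than the actual kvdb/sqlbase code:
```go
package retrysketch

import (
	"math/rand"
	"time"
)

const (
	maxRetries   = 10
	initialDelay = 50 * time.Millisecond
	maxDelay     = 5 * time.Second
)

// retryTx re-runs txFn on serialization failures, sleeping a randomized,
// exponentially growing amount of time between attempts.
func retryTx(txFn func() error, isSerializationErr func(error) bool) error {
	delay := initialDelay

	var err error
	for i := 0; i < maxRetries; i++ {
		err = txFn()
		if err == nil || !isSerializationErr(err) {
			return err
		}

		// Add jitter so concurrent writers don't all retry at the
		// same instant, then double the delay, capped at maxDelay.
		jitter := time.Duration(rand.Int63n(int64(delay / 2)))
		time.Sleep(delay/2 + jitter)

		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}

	return err
}
```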
* docs/release-notes: add entry for sqlite/postgres tx retry
---------
Co-authored-by: ziggie <ziggie1984@protonmail.com>
In this commit, we attempt to fix a circular waiting scenario introduced
inadvertently when [fixing a race condition
scenario](https://github.com/lightningnetwork/lnd/pull/7856). With that
PR, we added a new channel that would block `Disconnect` and
`WaitForDisconnect` to ensure that those calls could only succeed once
the `Start` method has finished.
The issue is that if the server is trying to disconnect a peer due to a
concurrent connection, but `Start` is blocked on `maybeSendNodeAnn`,
which then wants to grab the main server mutex, then `Start` can never
exit, which causes `startReady` to never be closed, which then causes
the server to be blocked.
This PR attempts to fix the issue by calling `maybeSendNodeAnn` in a
goroutine, so it can grab the server mutex and not block the `Start`
method.
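A stripped-down sketch of that fix; the `peer` type, its fields, and the stubbed `maybeSendNodeAnn` only illustrate the ordering and aren't lnd's actual peer implementation:
```go
package peersketch

import "sync"

type peer struct {
	wg         sync.WaitGroup
	startReady chan struct{}
}

// maybeSendNodeAnn stands in for the call that needs the main server mutex.
func (p *peer) maybeSendNodeAnn() {}

func (p *peer) Start() error {
	// ... the rest of the start-up work.

	// Run in a goroutine so Start doesn't block on the server mutex while
	// the server is, in turn, waiting for startReady to be closed.
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		p.maybeSendNodeAnn()
	}()

	// Closing startReady unblocks Disconnect and WaitForDisconnect.
	close(p.startReady)

	return nil
}
```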
Fixes https://github.com/lightningnetwork/lnd/issues/7924
Fixes https://github.com/lightningnetwork/lnd/issues/7928
Fixes https://github.com/lightningnetwork/lnd/issues/7866