This commit moves over the last two methods, `RegisterChannel` and
`BackupState` from the `Client` to the `Manager` interface. With this
change, we no longer need to pass around the individual clients around
and now only need to pass the manager around.
To do this change, all the goroutines that handle channel closes,
closable sessions needed to be moved to the Manager and so a large part
of this commit is just moving this code from the TowerClient to the
Manager.
This commit removes the mutex from ClientStats and instead puts that in
clientStats which wraps ClientStats with a mutex. This is so that the
tower client interface can return a ClientStats struct without worrying
about copying a mutex.
Simiarly to the previous commit, this commit moves the RemoveTower
method from the Client to the TowerClientManager interface. The manager
handles any DB related handling. The manager will first attempt to
remove the tower from the in-memory state of each client and then will
attempt to remove the tower from the DB. If the removal from the DB
fails, the manager will re-add the tower to the in-memory state of each
client.
In this commit we move the AddTower method from the Client interface to
the TowerClientManager interface. The wtclientrpc is updated to call the
`AddTower` method of the Manager instead of calling the `AddTower`
method of each individual client. The TowerClient now is also only
concerned with adding a new tower (or new tower address) to its
in-memory state; the tower Manager will handle adding the tower to the
DB.
In this commit, the `Stop` and `Start` methods are removed from the
`Client` interface and instead added to the new `Manager`. Callers now
only need to call the Manager to start or stop the clients instead of
needing to call stop/start on each individual client.
Introduce a wtclient `Manager` which handles tower clients. It indexes
clients by the policy used. The policy field is thus removed from the
`Config` struct which configures the Manager and is instead added to a
new `towerClientCfg` which configures a specific client managed by the
manager. For now, only the `NewClient` method is added to the Manager.
It can be used to construct a new `TowerClient`. The Manager currently
does notthing with the clients added to it.
This commit also adds tests for the DB changes made in the previous
commit since we can now read the new field with the FetchChanInfos
method.
The commit following this one does the backfill migration.
In this commit, we introduce the concept of a rogue update. An update is
rogue if we need to ACK it but we have already deleted all the data for
the associated channel due to the channel being closed. In this case, we
now no longer error out and instead keep count of how many rogue updates
a session has backed-up.
This commit adds a new test to the tower client to demonstrate a bug
that can happen if a channel is closed while an update for it has yet to
be acked by the tower server. This will be fixed in an upcomming commit.
The watchtower client test framework currently uses a mock version of
the tower client DB. This can lead to bugs if the mock DB works slightly
differently to the actual bbolt DB. So this commit ensures that we only
use the bbolt db for the tower client tests. We also increment the
`waitTime` used in the tests a bit to account for the slightly longer DB
read and write times. Doing this switch resulted in one bug being
caught: we were not removing sessions from the in-memory set on deletion
of the session and so that is fixed here too.
In this commit, we update the Sig type to support ECDSA and schnorr
signatures. We need to do this as the HTLC signatures will become
schnorr sigs for taproot channels. The current spec draft opts to
overload this field since both the sigs are actually 64 bytes in length.
The only consideration with this move is that callers need to "coerce" a
sig to the proper type if they need schnorr signatures.
In this commit, the bugs demonstrated in prior commits are fixed. In the
case where an session has persisted a CommittedUpdate and the tower is
being removed, the session will now replay that update on to the main
task pipeline so that it can be backed up using a different session.
Add a new DeleteCommittedUpdate method to the wtdb In preparation for an
upcoming commit that will replay committed updates from one session to
another.
This commit demonstrates that if a session has persisted committed
updates and the client is restarted _after_ these committed updates have
been persisted, then removing the tower will fail.
In this commit, we demonstrate the situation where a client has
persisted CommittedUpdates but has not yet recieved Acks for them from
the tower. If this happens and the client attempts to remove the tower,
it will with the "tower has unacked updates" error.
This commit does a few things:
- First, it gives the sessionQueue access to the TowerClient task
pipeline so that it can replay backup tasks onto the pipeline on Stop.
- Given that the above is done, the ForceQuit functionality of the
sessionQueue and TowerClient can be removed.
- The bug demonstrated in a prior commit is now fixed due to the above
changes.
In preparation for an upcoming commit where multiple threads will have
access to the TowerClient sessionQueueSet, we turn it into a thread safe
struct.
This commit demonstrates a bug. It shows that if backup tasks have been
bound to a session with a tower (ie, the tasks are in the session's
pendingQueue) and then the tower is removed and a new one is added, then
the tasks from the pendingQueue are _not_ replayed to the session with
the new tower. Instead, they are silently lost. This will be fixed in an
upcoming commit.
This commit adds a new watchtower client test to demonstrate that a
client is able to successfully switch to a new tower and continue
backing up updates to that new tower.
Test AddressIterator for the absence of panics, nil addresses, and empty
lists.
This fuzz test finds https://github.com/lightningnetwork/lnd/issues/7552
in seconds. No other panics found after 300+ CPU-hours of fuzzing.
In this commit, we add an Identifier method to the blob.Type struct
which returns a unique identifier for a given blob type. This identifier
is then used for initialising the disk overflow queue of the given
client.
In this commit, a new generic DiskOverflowQueue implementation is added.
This allows a user to specify a maximum number of items that the queue
can hold in-memory. Any new items will then overflow to disk. The
producer and consumer of the queue items will interact with the queue
just like a normal in-memory queue.
This commit adds a test to the wtclient. The test demonstrates that if a
client tries to back up states while it has no active sessions with a
server then those updates are accumlated in memory and lost on restart.
This will be fixed in upcoming commits.
Lock the `backupMu` when accessing `c.chanCommitHeights` in the `New`
function. It is not strictly necessary right now but good to add it so
that there is no accidental oversight if the `perUpdate` method is ever
extracted and reused in future.
Since the retrubution info of a backup task is now only constructed at
the time that the task is being bound to a session, the in-memory queue
only needs to carry the BackupID of the task.
Since the TowerClient now has a callback that it can use to retrieve the
retribution for a certain channel and commit height, let it use this
call back instead of requiring the info to be passed to it through
BackupState.
In this commit, a new BuildBreachRetribution callback is added to the
tower client's Config struct. The main LND server provides the client
with an implementation of the callback.
If the tower returns CreateSessionCodeAlreadyExists in response to the
CreateSession message from the client, then skip forward a few key
indices until we find one that the server does not return the error
for. This will allow a client to recover after a data loss incident.
This commit adds a forceNext boolean parameter to NextSessionKeyIndex.
Setting this param to true will force the index to cycle over 1000 key
indices before returning the new key.
In this commit, a test is added to demonstrate how clients can end up
getting the StateUpdateCodeClientBehind error from a tower server. This
can happen if a client ever deletes their db. If they do this then the
sessions they create with the tower will have the same IDs as the
sessions created in the now deleted db. This is because the session keys
(and thus session IDs) are calculated deterministically from a counter
(which is reset if the db is deleted). The tower server then throws this
error because the client would say that the sequence ID is 1 for the
next update.
Ensure that calling Next twice in a row without first calling Reset is
safe when the iterator is at the end of its list. Also alter the
towerListIterator to call Reset after hitting an error on Next.