Running test cases in parallel with shared state means that they
can intermingle if they're run at the same time and set up incorrect
values when we assert on the number of channels we have. Moving the
peer setup into the parallel test run fixes this because no single
test case can interfere with the other.
In processZombieUpdate, the SCID passed to MarkEdgeLive should _not_ be
derived from the ChannelEdgeInfo ChannelID field since this field will
not be populated when GetChannelByID returns a ChannelEdgeInfo along
with an ErrZombieEdge error. So this commit ensures that a usable
SCID is provided to processZombieUpdate.
Let MarkEdgLive return a new ErrNotZombieEdge error if an entry with the
given channel ID cannot be found. In processZombieUpdate, we then
check for this error and log accordingly.
This commit makes sure a testing payment is created via
`createDummyLightningPayment` to ensure the payment hash is unique to
avoid collision of the same payment hash being used in uint tests. Since
the tests are running in parallel and accessing db, if two difference
tests are using the same payment hash, no clean test state can be
guaranteed.
This commit adds unit tests for `resumePayment`. In addition, the
`resumePayment` has been split into two parts so it's easier to be
tested, 1) sending the htlc, and 2) collecting results. As seen in the
new tests, this split largely reduces the complexity involved and makes
the unit test flow sequential.
This commit also makes full use of `mock.Mock` in the unit tests to
provide a more clear testing flow.
The old payment lifecycle is removed due to it's not "unit" -
maintaining these tests probably takes as much work as the actual
methods being tested, if not more so. Moreover, the usage of the old
mockers in current payment lifecycle test is removed as it re-implements
other interfaces and sometimes implements it uniquely just for the
tests. This is bad as, not only we need to work on the actual interface
implementations and test them , but also re-implement them again in the
test without testing them!
This commit adds a new interface method `TerminalInfo` and changes its
implementation to return an `*HTLCAttempt` so it includes the route for
a successful payment. Method `GetFailureReason` is now removed as its
returned value can be found in the above method.
This commit adds a new struct, `stateStep`, to decide the workflow
inside `resumePayment`.
It also refactors `collectResultAsync` introducing a new channel
`resultCollected`. This channel is used to signal the payment
lifecycle that an HTLC attempt result is ready to be processed.
This commit makes sure we only fail attempt inside `handleSwitchErr` to
ensure the orders in failing payment and attempts. It refactors
`collectResult` to return `attemptResult`, and expands `handleSwitchErr`
to also handle the case where the attemptID is not found.
This commit refactors the `resumePayment` method by adding the methods
`checkTimeout` and `requestRoute` so it's easier to understand the flow
and reason about the error handling.
This commit removes the method `launchShard` and splits its original
functionality into two steps - first create the attempt, second send the
attempt. This enables us to have finer control over "which error is
returned from which system and how to handle it".
This commit starts handling switch error inside `sendAttempt` when an
error is returned from sending the HTLC. To make sure the updated
`HTLCAttempt` is always returned to the callsite, `handleSwitchErr` now
also returns a `attemptResult`.
This commit removes the `launchOutcome` and `shardResult` and uses
`attemptResult` instead. This struct is also used in `failAttempt` so we
can future distinguish critical vs non-critical errors when handling
HTLC attempts.
`handleSwitchErr` is now responsible for failing the given HTLC attempt
after deciding to fail the payment or not. This is crucial as
previously, we might enter into a state where the payment's HTLC has
already been marked as failed, and while we are marking the payment as
failed, another HTLC attempt can be made at the same time, leading to
potential stuck payments.
This commit removes the unclear abstraction `shardHandler` that's used
in our payment lifecycle. As we'll see in the following commits,
`shardHandler` is an unnecessary layer and everything can be cleanly
managed inside `paymentLifecycle`.
Without this the following could happen:
* InboundPeerConnected is called while we already have an inbound
connection with the peer. This calls removePeer which calls Disconnect.
* If the peer is starting up in Start, it may be sending messages
synchronously via SendMessage(true, ...). This eventually calls the
writeMessage function which will exit if disconnect is set to 1.
* Since Disconnect was called, disconnect will be 1 and writeMessage
will exit, causing writeHandler to exit.
* If there is more than 1 message being sent, later messages will
queue in queueHandler but be unable to get into sendQueue as the
writeHandler goroutine has exited.
* The synchronous sends will be waiting on the errChan indefinitely
and startReady will never get closed meaning Disconnect will never
proceed.
The end result is that the server's mutex will be held until shutdown.
Avoid this by using writeMessage to bypass the writeHandler goroutine.