* Add test to check that we split short channel ids correctly
reply_channel_range messages should not overlap, i.e. different replies should not contain
channel ids that have the same block height.
The test in this commit fails, because our 'split' function needs to be updated.
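The constraint above can be illustrated with a minimal sketch. It assumes the BOLT 7 short channel id layout (3 bytes block height, 3 bytes transaction index, 2 bytes output index); the names `ScidSplit`, `blockHeight` and `split` are hypothetical and are not the project's actual code. A chunk is only closed on a block height boundary, so it may grow beyond `chunkSize`.

```scala
object ScidSplit {
  // BOLT 7 layout: 3 bytes block height | 3 bytes tx index | 2 bytes output index
  def blockHeight(scid: Long): Int = ((scid >>> 40) & 0xFFFFFFL).toInt

  // Helper to build an id for illustration purposes
  def scid(height: Int, txIndex: Int, outputIndex: Int): Long =
    ((height & 0xFFFFFFL) << 40) | ((txIndex & 0xFFFFFFL) << 16) | (outputIndex & 0xFFFFL)

  /** Splits sorted ids into chunks of roughly `chunkSize` ids, never putting
    * ids that share a block height into two different chunks. */
  def split(sortedIds: List[Long], chunkSize: Int): List[List[Long]] = {
    require(chunkSize > 0, "chunk size must be positive")
    val chunks = scala.collection.mutable.ListBuffer.empty[List[Long]]
    var current = Vector.empty[Long]
    for (id <- sortedIds) {
      // Only close the current chunk when the block height changes
      if (current.size >= chunkSize && blockHeight(current.last) != blockHeight(id)) {
        chunks += current.toList
        current = Vector.empty[Long]
      }
      current = current :+ id
    }
    if (current.nonEmpty) chunks += current.toList
    chunks.toList
  }
}
```

With a chunk size of 2 and three channels in the same block, the first chunk keeps all three ids rather than splitting the block across two replies.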
* Channel Queries: make sure that our replies match the request range (fixes #1269)
Even though it's not completely explicit in the specs, we should make sure that
the [firstBlock, numBlock] range that we cover in our replies is not computed
from the ids that we actually have but instead matches the [firstBlock, numBlock] range
that was requested.
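One possible way to derive reply ranges from the request rather than from the ids is sketched below. This is an illustration under assumptions, not the project's actual algorithm: chunks are assumed non-empty, sorted, inside the requested range and not sharing block heights, and `ReplyRanges`, `Range` and `ranges` are hypothetical names. The first reply starts at the requested `firstBlock`, each following reply starts at the height of its first id, and the last reply ends where the request ends.

```scala
object ReplyRanges {
  final case class Range(firstBlock: Long, numBlocks: Long)

  // BOLT 7 layout: the block height is stored in the 3 most significant bytes
  def blockHeight(scid: Long): Long = (scid >>> 40) & 0xFFFFFFL

  /** Computes one range per chunk so that, together, the ranges cover exactly
    * the requested [firstBlock, firstBlock + numBlocks) interval. */
  def ranges(firstBlock: Long, numBlocks: Long, chunks: List[List[Long]]): List[Range] = {
    val end = firstBlock + numBlocks // exclusive upper bound of the request
    if (chunks.isEmpty) {
      // Nothing to send: a single empty reply still covers the whole request
      List(Range(firstBlock, numBlocks))
    } else {
      val starts = firstBlock :: chunks.tail.map(chunk => blockHeight(chunk.head))
      val ends = starts.tail :+ end
      starts.zip(ends).map { case (s, e) => Range(s, e - s) }
    }
  }
}
```

The sum of all `numBlocks` values always equals the requested `numBlocks`, even when the ids we hold only cover part of the requested interval.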
* Make sure that serialised replies stay below the 65 kB limit
We prune short channel id chunks to make sure that serialised replies stay below the 65 kB limit.
The pruning algo is very simple: for each chunk we randomly keep the first or last 3200 ids.
Selection is random so peers that re-connect will eventually receive all channel info.
The limit of 3200 was chosen for the worst case where replies are not compressed and include timestamps and checksum.
It is a fairly conservative boundary: the highest number of public channels in a single block so far is <300, and
3200 is roughly the currently observed number of transactions in a "full" block.
* Set default ids chunk size to 1500
Having smaller chunks (smaller than 3200 / 2) reduces the probability of merging 2 chunks and having to prune the result because the encoded reply would be over 65 kB.
* Smarter algo for enforcing max chunk size policy
Instead of keeping either the first or last items, we use a random offset. This way peers will eventually receive info about all channels even if chunks are much larger than the max chunk size and are pruned.
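The random offset idea can be sketched as follows. This is a minimal illustration, assuming a chunk is just a vector of ids; `ChunkPruning` and `enforceMaximumSize` are hypothetical names chosen for the example.

```scala
import scala.util.Random

object ChunkPruning {
  /** Keeps at most `maxSize` consecutive ids, starting at a random offset.
    * Chunks that already fit are returned unchanged. */
  def enforceMaximumSize[A](ids: Vector[A], maxSize: Int, random: Random = new Random()): Vector[A] = {
    require(maxSize > 0, "max size must be positive")
    if (ids.size <= maxSize) ids
    else {
      // Any window of maxSize consecutive ids can be selected
      val offset = random.nextInt(ids.size - maxSize + 1)
      ids.slice(offset, offset + maxSize)
    }
  }
}
```

Because the window position changes on every call, a peer that reconnects several times should eventually see every id, including those in the middle of a very large chunk that a "first or last" policy would never send.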
There is currently a backwards-compatibility issue with eclair-mobile.
Eclair-mobile mistakes feature bit 15 (payment_secret) for the
gossip_queries_ex prototype (which is incompatible with the spec-ed version).
To temporarily avoid this issue (until eclair-mobile is patched and all users have updated),
we never advertise those ambiguous bits in Init.
They're only really needed in the invoice so it's ok.
Implement https://github.com/lightningnetwork/lightning-rfc/pull/666
Keep the global/local split in Commitments to avoid backwards incompatibility in the codec.
Remove allowMultiPart API field: we instead rely on the MPP feature being set in nodeParams.
That means MPP-enabled nodes need to update their reference.conf.
Rework features:
* Add types to allow cleaner dependency validation.
* Most of the time we don't care whether a feature is activated as optional or mandatory, which caused duplicate code. This is now handled more cleanly.
* It also paves the way to annotate features with the places they should be advertised (Init vs NodeAnn vs ChannelAnn vs invoice).
This is safer for now since the splitting algorithm isn't working
well on nodes with a large number of channels and we don't
expect too many payments from Phoenix to non-Phoenix to
actually need MPP in the short term.
Mockito sometimes throws an unnecessary stubbing exception; it's unclear whether the test is faulty or mockito has issues with our parallel setup.
Rewriting the switchboard tests without mockito makes them more flexible.
In case they randomly fail we should get more useful data to help troubleshooting.
Start relaying trampoline payments with multi-part aggregation (disabled by default,
must be enabled with config).
Recovery after a restart is correctly handled, even if payments were being forwarded.
No DB schema update in this commit.
The trampoline UX will be somewhat bad because many improvements/polish are missing.
Some shortcuts were taken, a few hacks here and there need to be fixed, but nothing too scary.
Those improvements will be done in separate commits before the next release.
Randomization is necessary, otherwise if two peers attempt to reconnect
to each other in a synchronized fashion, they will enter a
disconnect-reconnect loop.
We already had randomization for the initial reconnection attempt, but
further reconnection attempts were using a deterministic schedule
following an exponential backoff curve.
Fixes #1238.
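A randomized exponential backoff can be sketched as below. The base delay, the cap and the jitter window are assumptions made for the example, not the values used by the project, and `ReconnectBackoff.nextDelay` is a hypothetical name.

```scala
import scala.concurrent.duration._
import scala.util.Random

object ReconnectBackoff {
  /** Returns the delay before reconnection attempt number `attempt` (0-based).
    * The ceiling doubles on each attempt up to `max`, and the actual delay is
    * picked at random in the upper half of [0, ceiling]. */
  def nextDelay(attempt: Int,
                base: FiniteDuration = 1.second,
                max: FiniteDuration = 1.hour,
                random: Random = new Random()): FiniteDuration = {
    require(attempt >= 0, "attempt must be non-negative")
    val exponent = math.min(attempt, 30) // avoid overflowing the shift
    val ceilingMs = math.min(base.toMillis * (1L << exponent), max.toMillis)
    val floorMs = ceilingMs / 2
    // The random component is what breaks the lockstep between two peers
    (floorMs + (random.nextDouble() * (ceilingMs - floorMs)).toLong).millis
  }
}
```

Two peers that disconnect at the same moment are unlikely to draw the same delay, so one of them normally reconnects first and the loop is avoided.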
* Add a configurable time-out to onchain fee provider requests
We configure a timeout of 5 seconds, applicable to all fee providers. If a provider times out we switch to the next one in our list.
Our mobile app needs a feerate to start properly and currently waits too long when a fee provider is online but very slow to respond.
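The fallback behaviour can be sketched as follows. This is a simplified, blocking illustration that uses only the Scala standard library; a real implementation would more likely be non-blocking. The provider type and the name `firstResponsive` are hypothetical.

```scala
import scala.annotation.tailrec
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object FallbackFees {
  /** Queries providers in order and returns the first feerate obtained within
    * `timeout`. A provider that fails or times out is skipped. */
  @tailrec
  def firstResponsive(providers: List[() => Future[Long]], timeout: FiniteDuration): Option[Long] =
    providers match {
      case Nil => None
      case provider :: rest =>
        // Await.result throws a TimeoutException when the provider is too slow
        Try(Await.result(provider(), timeout)) match {
          case Success(feerate) => Some(feerate)
          case Failure(_) => firstResponsive(rest, timeout)
        }
    }
}
```

Providers are only called when the previous one has failed, so a healthy first provider is the only one that receives a request.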
We can't guarantee with the current algorithm that the last HTLC won't be
a small one (the leftovers).
If we see that happen in a real scenario, we'll need to add heuristics to avoid it.
This allows us to only use logback.xml to control the log level.
From akka docs [1]:
> If you set the loglevel to a higher level than DEBUG, any DEBUG events
will be filtered out already at the source and will never reach the
logging backend, regardless of how the backend is configured.
> You can enable DEBUG level for akka.loglevel and control the actual
level in the SLF4J backend without any significant overhead, also for
production.
[1] https://doc.akka.io/docs/akka/current/logging.html
Using the `max()` aggregating function on outgoing payments'
timestamps, we can ensure that the non-aggregated columns
for the outgoing payments contain the most recent/pertinent data.
If a chain re-org happens and a new ShortChannelId is assigned,
the `Relayer` kept both entries (new and old).
This resulted in an incorrect balance because we effectively counted this channel twice.
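The double counting and one way to avoid it can be illustrated with a small sketch. The data model below is an assumption made for the example (`UsableBalances`, `ChannelEntry` and `add` are hypothetical names), and the actual fix in the `Relayer` may be structured differently. The idea is that an entry is keyed by short channel id, so a stale entry for the same channel must be dropped when a new id is assigned.

```scala
object UsableBalances {
  final case class ChannelEntry(channelId: String, shortChannelId: Long, balanceMsat: Long)

  /** Adds or updates an entry, removing any previous entry that belongs to the
    * same channel but was registered under an older short channel id. */
  def add(entries: Map[Long, ChannelEntry], entry: ChannelEntry): Map[Long, ChannelEntry] =
    entries.filterNot { case (_, existing) => existing.channelId == entry.channelId } +
      (entry.shortChannelId -> entry)

  def totalBalanceMsat(entries: Map[Long, ChannelEntry]): Long =
    entries.values.map(_.balanceMsat).sum
}
```

Without the `filterNot` step, a re-org that changes the short channel id would leave two entries for the same channel and the total would count its balance twice.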
While #1222 was being reviewed, a new unit test was added to OnionCodecsSpec.
It didn't cause any file conflict so GitHub didn't warn about merging #1222.
However this test needed to be updated to the new truncated int format.
The spec defines tu64 (and friends) without the length prefix.
Multi-part uses a tu64 without a length prefix inside the PaymentData record.
Our previous implementation only supported using tu64 alone in a TLV record.
We make this more flexible by separating the length encoding.
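The separation between the value encoding and the length encoding can be sketched as below. This is a standalone illustration using plain byte arrays rather than the project's codecs; `TruncatedInt` and its methods are hypothetical names. It relies on the spec definition of tu64 (big-endian, leading zero bytes omitted, so 0 to 8 bytes) and on the payment data record being a 32-byte payment secret followed by a tu64 total amount.

```scala
import java.nio.ByteBuffer

object TruncatedInt {
  /** tu64 without any length prefix: big-endian with leading zero bytes
    * removed, so 0 encodes to an empty array. The value is treated as unsigned. */
  def encodeTu64(value: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(value).array().dropWhile(_ == 0)

  def decodeTu64(bytes: Array[Byte]): Long = {
    require(bytes.length <= 8, "tu64 is at most 8 bytes")
    require(bytes.isEmpty || bytes.head != 0, "tu64 must be minimally encoded")
    bytes.foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xFFL))
  }

  /** The length is handled separately: here a single length byte is prepended,
    * which is enough since a tu64 never exceeds 8 bytes. */
  def encodeWithLength(value: Long): Array[Byte] = {
    val payload = encodeTu64(value)
    payload.length.toByte +: payload
  }

  /** Inside a larger record the tu64 has no prefix of its own: its length is
    * whatever remains after the fixed-size fields. */
  def encodePaymentData(paymentSecret: Array[Byte], totalMsat: Long): Array[Byte] = {
    require(paymentSecret.length == 32, "payment secret must be 32 bytes")
    paymentSecret ++ encodeTu64(totalMsat)
  }
}
```

Keeping `encodeTu64` free of any prefix lets the same function serve both the standalone TLV record case and the case where the integer is the trailing field of a record.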
MPP implies payment secret.
Avoid raising exceptions in PaymentInitiator: validate invoice instead of using a require.
This way senders always get a response.
We previously had some logic where we would fail incoming HTLCs
for which we were the final recipient when a channel would come online.
That made sense when we didn't have MPP, but with MPP we cannot do that.
There is a risk that we would be failing HTLCs that are considered received by the MPP FSM.
Instead we need to use the CommandBuffer when we are the final recipient.
This way pending commands cannot be lost and HTLCs are cleaned-up on restart.
This includes a bit of refactoring in `MultiPartPaymentLifecycle`. Note
that we can't use the `onTermination` handler to finish the spans,
because it is asynchronous and may not be called until much later.
That's why we use a dedicated `myStop` function.
In Kamon 2.0, by default spans are automatically generated for tracked
actors, which we don't want because we define our own spans. That's why
there is an additional configuration in `application.conf`.
MPP split/retry improvements:
* Only use public channels when sending to remote node
* Don't retry when sending to direct peer
* Blacklist channels that are a bad route prefix
When paying a multi-part payment, we tell the PaymentLifecycle to use a route prefix that contains the first hop (for example a -> b via channel 1).
We need to also tell the router to ignore the nodes that are in the route prefix, otherwise when retrying it may try some completely dumb routes that have no chance of succeeding.
* Fix `allUpdates` API when used with the public key filter: the API now returns all updates for every channel on which the filter key has published an update
This is due to a callback being executed after the parent actor has been
cleaned up. We don't really care about the result anyway, so we can
safely ignore it, even if the issue only arises in tests.
The root problem here is that we are making references to actor methods
from a callback, which we shouldn't do, because whatever we reference
may have disappeared by the time the callback tries to access it. A
better pattern would be to `pipe` the results of the `Future` to
oneself, but that would require more work and possibly change the FSM,
which seems overkill for the issue at hand.
When an actor sends a message to itself as part of its class definition,
there is no guarantee that this message will be processed first. Relying
on that to set the default payment handler is problematic and causes
race conditions in tests.
Add support for multi-part payments (MPP).
We can now send and receive multi-part payments, with a somewhat basic splitting algorithm that will be refined based on real-world usage.
Compatibility with other implementations hasn't been tested yet as they don't have a branch ready.
This compatibility testing may reveal small details that need to be changed and may invalidate pending multi-part invoices.