We do this instead of using the source of the AnnounceSignatures
message, as we filter out the source when broadcasting any
announcements, leading to the remote node not receiving our channel
update. Note that this is done more for the sake of correctness and to
address a flake within the integration tests, as channel updates are
sent directly and reliably to channel counterparts.
As similarly done with premature channel announcements, we'll no longer
allow premature channel updates to be rebroadcast once mature. This is
no longer necessary as channel announcements that we're not aware of are
usually broadcast to us with their accompanying channel updates.
In this commit, we add a new option to toggle gossip rate limiting. This
new option can be useful in contexts that require near instant
propagation of gossip messages like integration tests.
This change was largely motivated by an increase in high disk usage as a
result of channel update spam. With an in memory graph, this would've
gone mostly undetected except for the increased bandwidth usage, which
this doesn't aim to solve yet. To minimize the effects to disks, we
begin to rate limit channel updates in two ways. Keep alive updates,
those which only increase their timestamps to signal liveliness, are now
limited to one per lnd's rebroadcast interval (current default of 24H).
Non keep alive updates are now limited to one per block per direction.
This allows for a 1000 different validation operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the gossiper busy while batches are constructed.
This reworks the locking behavior of the Gossiper so that a race
condition on channel updates and block notifications doesn't cause any
loss of messages.
This fixes an issue that manifested mostly as flakes on itests during
WaitForNetworkChannelOpen calls.
The previous behavior allowed ChannelUpdates to be missed if they
happened concurrently to block notifications. The
processNetworkAnnoucement call would check for the current block height,
then lock the gossiper and add the msg to the prematureAnnoucements
list. New blocks would trigger an update to the current block height
then a lock and check of the aforementioned list.
However, specially during itests it could happen that the missing lock
before checking the height could case a race condition if the following
sequence of events happened:
- A new ChannelUpdate message was received and started processing on a
separate goroutine
- The isPremature() call was made and verified that the ChannelUpdate
was in fact premature
- The goroutine was scheduled out
- A new block started processing in the gossiper. It updated the block
height, asked and was granted the lock for the gossiper and verified
there was zero premature announcements. The lock was released.
- The goroutine processing the ChannelUpdate asked for the gossiper lock
and was granted it. It added the ChannelUpdate in the
prematureAnnoucements list. This can never be processed now.
The way to fix this behavior is to ensure that both isPremature checks
done inside processNetworkAnnoucement and best block updates are made
inside the same critical section (i.e. while holding the same lock) so
that they can't both check and update the prematureAnnoucements list
concurrently.
The policy update logic that resided part in the gossiper and
part in the rpc server is extracted into its own object.
This prepares for additional validation logic to be added for policy
updates that would otherwise make the gossiper heavier.
It is also a small first step towards separation of our own channel data
from the rest of the graph.
As a preparation for making the gossiper less responsible for validating
and supplementing local channel policy updates, this commits moves the
on-the-fly max htlc migration up the call tree. The plan for a follow up
commit is to move it out of the gossiper completely for local channel
updates, so that we don't need to return a list of final applied policies
anymore.
In this commit, we fix a bug where if a user updates a forwarding policy to be
zero, the update will be applied to the policy correctly on-disk, but not
in-memory.
We solve this issue by having the gossiper return the list of on-disk updated
policies and passing these policies to the switch, so the switch can assume
that zero-valued fields are intentional and not just uninitialized.
There's no need to broadcast these as we assume that online nodes have
already received them. For nodes that were offline, they should receive
them as part of their initial graph sync.
We do this to ensure the node announcement propagates to our channel
counterparty. At times, the node announcement does not propagate to them
when opening our first channel due to a race condition between
IsPublicNode and processing announcement signatures. This isn't
necessary for channel updates and announcement signatures as we send
those to our channel counterparty directly through the reliable sender.
Since ActiveSync GossipSyncers no longer synchronize our state with the
remote peers, none of the logic surrounding the round-robin is required
within the SyncManager.
In this commit, we extend the gossiper with support for external callers
to provide optional fields that can serve as useful when processing a
specific network announcement. This will serve useful for light clients,
which are unable to obtain the channel point and capacity for a given
channel, but can provide them manually for their own set of channels.
In this commit, we modify the main loop in `processChanPolicyUpdate` to
send updates for private channels directly to the remote peer via the
reliable message sender. This fixes a prior issue where the remote peer
wouldn't receive new updates as this method doesn't go through the
traditional path for channel updates.
In this commit, we introduce a new type: SyncerType. This type denotes
the type of sync a GossipSyncer is currently under. We only introduce
the two possible entry states, ActiveSync and PassiveSync. An ActiveSync
GossipSyncer will exchange channels with the remote peer and receive new
graph updates from them, while a PassiveSync GossipSyncer will not and
will only response to the remote peer's queries.
This commit does not modify the behavior and is only meant to be a
refactor.
In this commit, we address an assumption of the gossiper's recently
introduce reliable sender. The reliable sender is currently only used
for messages of unannounced channels. This makes sense as peers should
be able to retrieve messages from the network if they've previously
announced. However, within isMsgStale, we assumed that the reliable
sender would be used for every ChannelUpdate being sent, even if the
channel is already announced. Due to this, checking if the policy is
stale was unnecessary. But since this isn't the case, we should actually
be checking whether it is stale to prevent sending it later on.
In this commit, we leverage the recently introduced zombie edge index to
quickly reject announcements for edges we've previously deemed as
zombies. Care has been taken to ensure we don't reject fresh updates for
edges we've considered zombies.
In this commit, we also allow channel updates for our channels to be
sent reliably to our channel counterparty. This is especially crucial
for private channels, since they're not announced, in order to ensure
each party can receive funds from the other side.
In this commit, we alter the ValidateChannelUpdateAnn function in
ann_validation to validate a remote ChannelUpdate's message flags
and max HTLC field. If the message flag is set but the max HTLC
field is not set or vice versa, the ChannelUpdate fails validation.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit:
* we partition lnwire.ChanUpdateFlag into two (ChanUpdateChanFlags and
ChanUpdateMsgFlags), from a uint16 to a pair of uint8's
* we rename the ChannelUpdate.Flags to ChannelFlags and add an
additional MessageFlags field, which will be used to indicate the
presence of the optional field HtlcMaximumMsat within the ChannelUpdate.
* we partition ChannelEdgePolicy.Flags into message and channel flags.
This change corresponds to the partitioning of the ChannelUpdate's Flags
field into MessageFlags and ChannelFlags.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit, we allow the gossiper to also broadcast the
corresponding node announcements, if we know of them, of a channel when
constructing its full proof. We do this to ensure peers (other than our
remote peer) receive all the relevant announcements for a channel.
The tests changes were made to ensure the new behavior introduced works
as intended. Previously, the node announcements for each test channel
announcement were not processed, so they never existed from the
gossiper's point of view.
This also addresses an existing flake in the integration test
`testNodeAnnouncement`. This problem arose due to the node announcement
being sent before the connection between Dave (node announcement sender)
and Alice (node announcement receiver) was initiated and the full
channel proof was constructed.
This commit passes the peer's quit signal to the
gossipSyncer when attempt to hand off gossip query
messages. This allows a rate-limited peer's read
handler to break out immediately, which would
otherwise remain stuck until the rate-limited
gossip syncer pulled the message.
This commit restructures the delivery of gossip
query related messages, such that they are delivered
directly to the gossip syncers. Gossip query rate
limiting was introduced in #1824 on a per-peer basis.
However, since all gossip query messages were being
delivered in the main event loop, the end result is
that one rate-limited peer could stall all other
peers.
In addition, since no other peers would be able to
submit gossip-related messages through the blocked
event loop, the back pressure would eventually rate
limit the read handlers of all peers as well.
The end result would be lengthy delays in reading
messages related to htlc forwarding.
The fix is to lift the delivery of gossip query
messages outside of the main event loop. With
this change, the rate limiting backpressure is
delivered only to the intended peer.
In this commit, we modify the gossiper to no longer broadcast
NodeAnnouncements of nodes who intend to remain private. We do this to
prevent leaking their information to the greater network.
Previously, gossiper was the only object that validated channel
updates. Because updates can also be received as part of a
failed payment session in the routing package, validation logic
needs to be available there too. Gossiper already depends on
routing and having routing call the validation logic inside
gossiper would be a circular dependency. Therefore the validation
was moved to routing.
This commit removes the fallback in fetchGossipSyncer
that creates a gossip syncer if one is not registered
w/in the gossiper. Now that we register gossip syncers
explicitly before reading any gossip query messages,
this should not longer be required. The fallback also
did not honor the cfg.NoChanUpdates flag, which may
have led to inconsistencies between configuration and
actual behavior.
restransmitStaleChannels
In this commit, we add an additional error check for
ErrNoGraphEdgesFound when restransmitting stale channels during the
gossiper's startup. We do this to prevent benign log messages as we'll
log that we were unable to retransmit stale channels when we didn't have
any channels in our graph to begin with.
In this commit, we aim to resolve an issue with nodes requesting for
channel announcements when receiving a channel update for a channel
they're not aware of. This can happen if a node is not caught up with
the chain or if they receive updates for zombie channels. This would
lead to a spam issue, as if a node is not caught up with the chain,
every new update they receive is premature, causing them to manually
request the backing channel announcement. Ideally, we should be able to
detect this as a potential DoS vector and ban the node responsible, but
for now we'll simply remove this functionality.
In this commit, we select on the peer's QuitSignal to allow the caller
to unblock if the peer itself is disconnecting. With this change, we now
ensure that it isn't possible for a peer to block on this method and
prevent a graceful exit.
Previosuly we would immediately return nil on the error channel for
premature ChannelUpdates, which would break the expection that a a
returned non-error meant the update was successfully added to the
database. This meant that the caller would believe the update was added
to the database, while it is actually still in volatile memory and can
be lost during restarts.
This change makes us handle premature ChannelUpdates as we handle other
premature announcements within the gossiper, by deferring sending on the
error channel until we have reprocessed the update.
Previously we wouldn't return anything in the case where the
announcement were meant for a chain we didn't recognize. After this
change we should return an error on the error channel in all flows
within the gossiper.
Corrects an instance that holds a reference to a boltdb
byte slice after returning from the transaction. This
can cause panics under certain conditions, which is
avoided by creating a copy of the key.
In this commit, we allow the gossiper syncer to store the chunk size for
its respective encoding type. We do this to prevent a race condition
that would arise within the unit tests by modifying the values of the
encodingTypeToChunkSize map to allow for easier testing.
In this commit, we fix the logging when adding new gossip syncers. The
old log would log the byte array, rather than the byte slice. We fix
this by slicing before logging.
This commit changes the gossiper to direct messages to
peer objects, instead of sending them through the
server every time. The primary motivation is to reduce
contention on the server's mutex and, more importantly,
avoid deadlocks in the Triangle of Death.