This BIP describes a structure for compact filters on block data, for use in the
BIP 157 light client protocol<ref>bip-0157.mediawiki</ref>. The filter
construction proposed is an alternative to Bloom filters, as used in BIP 37,
that minimizes filter size by using Golomb-Rice coding for compression. This
document specifies two initial types of filters based on this construction that
enable basic wallets and applications with more advanced smart contracts.
== Motivation ==
[[bip-0157.mediawiki|BIP 157]] defines a light client protocol based on
deterministic filters of block content. The filters are designed to
minimize the expected bandwidth consumed by light clients when downloading
filters and full blocks. This document defines two initial filter types, ''basic'' and
''extended'', to provide support for advanced applications while reducing the
filter size for regular wallets.
== Definitions ==
<code>[]byte</code> represents a vector of bytes.
<code>[N]byte</code> represents a fixed-size byte array with length N.
''CompactSize'' is a compact encoding of unsigned integers used in the Bitcoin
P2P protocol.
''Data pushes'' are byte vectors pushed to the stack according to the rules of
Bitcoin script.
''Bit streams'' are readable and writable streams of individual bits. The
following functions are used in the pseudocode in this document:
* <code>new_bit_stream</code> instantiates a new writable bit stream
* <code>new_bit_stream(vector)</code> instantiates a new bit stream reading data from <code>vector</code>
* <code>write_bit(stream, b)</code> appends the bit <code>b</code> to the end of the stream
* <code>read_bit(stream)</code> reads the next available bit from the stream
* <code>write_bits_big_endian(stream, n, k)</code> appends the <code>k</code> least significant bits of integer <code>n</code> to the end of the stream in big-endian bit order
* <code>read_bits_big_endian(stream, k)</code> reads the next available <code>k</code> bits from the stream and interprets them as the least significant bits of a big-endian integer
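The bit stream functions above could be sketched in Python as follows. This is a minimal illustration, not a normative definition: the class name <code>BitStream</code>, the <code>to_bytes</code> helper, and the choice to pack bits most-significant-bit first within each byte (padding the final byte with zero bits) are assumptions made for the example.

```python
class BitStream:
    """Hypothetical in-memory bit stream used only for illustration.

    Assumption: bits are packed MSB-first within each byte, and the last
    byte is padded with zero bits when serialized.
    """

    def __init__(self, data=b""):
        # new_bit_stream() / new_bit_stream(vector)
        self.bits = []
        for byte in data:
            for shift in range(7, -1, -1):
                self.bits.append((byte >> shift) & 1)
        self.pos = 0  # read cursor

    def write_bit(self, b):
        # Append a single bit to the end of the stream.
        self.bits.append(b & 1)

    def read_bit(self):
        # Read the next available bit; raises IndexError past the end.
        bit = self.bits[self.pos]
        self.pos += 1
        return bit

    def write_bits_big_endian(self, n, k):
        # Append the k least significant bits of n, highest bit first.
        for shift in range(k - 1, -1, -1):
            self.write_bit((n >> shift) & 1)

    def read_bits_big_endian(self, k):
        # Read k bits and interpret them as a big-endian integer.
        value = 0
        for _ in range(k):
            value = (value << 1) | self.read_bit()
        return value

    def to_bytes(self):
        # Serialize the stream, zero-padding the final byte (assumption).
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)
```

A writer would call <code>write_bit</code> and <code>write_bits_big_endian</code> and then serialize with <code>to_bytes</code>; a reader would construct a new <code>BitStream</code> from those bytes and consume the bits in the same order.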
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.
== Specification ==
=== Golomb-Coded Sets ===
For each block, compact filters are derived containing sets of items associated
with the block (e.g. addresses sent to, outpoints spent, etc.). A set of such
data objects is compressed into a probabilistic structure called a
''Golomb-coded set'' (GCS), which matches all items in the set with probability
1, and matches other items with probability <code>2^(-P)</code> for some integer
parameter <code>P</code>.
At a high level, a GCS is constructed from a set of <code>N</code> items by:
# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
# sorting the hashed values in ascending order
# computing the differences between each value and the previous one
# writing the differences sequentially, compressed with Golomb-Rice coding
The following sections describe each step in greater detail.
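As a rough illustration of the sorting and differencing steps, the sketch below takes values that have already been hashed into the target range and produces the list of differences that would then be compressed. The function names <code>sorted_deltas</code> and <code>restore_values</code> are hypothetical, and treating the value before the first element as 0 is an assumption of this example.

```python
def sorted_deltas(hashed_values):
    """Sort the hashed values ascending and return consecutive differences.

    Assumption: the first difference is taken against an implicit
    previous value of 0.
    """
    deltas = []
    previous = 0
    for value in sorted(hashed_values):
        deltas.append(value - previous)
        previous = value
    return deltas


def restore_values(deltas):
    """Invert sorted_deltas by accumulating the differences."""
    values = []
    running = 0
    for delta in deltas:
        running += delta
        values.append(running)
    return values
```

Because the hashed values are spread roughly uniformly over the range, the differences tend to be small and similarly sized, which is what makes them suitable for Golomb-Rice coding.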
==== Hashing Data Objects ====
The first step in the filter construction is hashing the variable-sized raw
items in the set to the range <code>[0, F)</code>, where <code>F = N *
2^P</code>. Set membership queries against the hash outputs will have a false
positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
items <code>N</code> MUST be less than <code>2^32</code> and <code>P</code> MUST be at most 32.
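The bounds can be checked with a few lines of arithmetic. The sketch below is illustrative only; the helper name <code>filter_range</code> and the constants <code>MAX_N</code> and <code>MAX_P</code> are hypothetical. It computes <code>F = N * 2^P</code> and rejects parameters outside the stated limits, under which <code>F</code> always fits in an unsigned 64-bit integer.

```python
MAX_N = 2**32 - 1  # N MUST be less than 2^32
MAX_P = 32         # P MUST be at most 32


def filter_range(n, p):
    """Return F = N * 2^P after validating the limits on N and P."""
    if not 0 <= n <= MAX_N:
        raise ValueError("N must be less than 2^32")
    if not 0 <= p <= MAX_P:
        raise ValueError("P must be at most 32")
    return n << p  # same as n * 2**p
```

With the largest permitted values, <code>F = (2^32 - 1) * 2^32</code>, which is strictly below <code>2^64</code>.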
The items are first passed through the pseudorandom function ''SipHash'', which
takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
a uniformly random 64-bit output. Implementations of this BIP MUST use the
SipHash parameters <code>c = 2</code> and <code>d = 4</code>.
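For readers unfamiliar with SipHash, the following is a compact Python sketch of the function with the parameters named above as defaults. It follows the publicly documented SipHash construction but is meant as an illustration only; the function name <code>siphash</code> and its signature are assumptions of this example, and production code should rely on a vetted implementation.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl(x, b):
    # Rotate a 64-bit value left by b bits.
    return ((x << b) | (x >> (64 - b))) & MASK64


def siphash(key, data, c=2, d=4):
    """Illustrative SipHash-c-d; returns a 64-bit integer."""
    if len(key) != 16:
        raise ValueError("key must be 128 bits (16 bytes)")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v = [
        k0 ^ 0x736F6D6570736575,
        k1 ^ 0x646F72616E646F6D,
        k0 ^ 0x6C7967656E657261,
        k1 ^ 0x7465646279746573,
    ]

    def rounds(count):
        for _ in range(count):
            v[0] = (v[0] + v[1]) & MASK64
            v[1] = _rotl(v[1], 13) ^ v[0]
            v[0] = _rotl(v[0], 32)
            v[2] = (v[2] + v[3]) & MASK64
            v[3] = _rotl(v[3], 16) ^ v[2]
            v[0] = (v[0] + v[3]) & MASK64
            v[3] = _rotl(v[3], 21) ^ v[0]
            v[2] = (v[2] + v[1]) & MASK64
            v[1] = _rotl(v[1], 17) ^ v[2]
            v[2] = _rotl(v[2], 32)

    # Split the input into little-endian 64-bit words; the final word
    # holds the leftover bytes plus the message length in its top byte.
    full = len(data) - (len(data) % 8)
    words = [int.from_bytes(data[i:i + 8], "little") for i in range(0, full, 8)]
    last = ((len(data) & 0xFF) << 56) | int.from_bytes(data[full:], "little")
    words.append(last)

    for m in words:
        v[3] ^= m
        rounds(c)  # compression rounds
        v[0] ^= m

    v[2] ^= 0xFF
    rounds(d)      # finalization rounds
    return v[0] ^ v[1] ^ v[2] ^ v[3]
```

The result is an integer in <code>[0, 2^64)</code>, which is the input expected by the range-mapping step described next.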
The 64-bit SipHash outputs are then mapped uniformly over the desired range by
multiplying with F and taking the top 64 bits of the 128-bit result. This
algorithm is a faster alternative to modulo reduction, as it avoids the expensive division operation.
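The multiply-and-shift mapping can be sketched in a single line of Python, whose integers are arbitrary precision, so the 128-bit intermediate product needs no special handling. The function name <code>map_to_range</code> is hypothetical; in a language with fixed-width integers the same step would need a 64x64 to 128-bit multiplication.

```python
def map_to_range(hash64, f):
    """Map a 64-bit hash uniformly onto [0, f).

    Multiplies the hash by f and keeps the top 64 bits of the
    128-bit product.
    """
    return (hash64 * f) >> 64
```

Since <code>hash64</code> is below <code>2^64</code>, the product is below <code>f * 2^64</code>, so the shifted result is always strictly less than <code>f</code>.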
Test vectors for a P value of 20 on five testnet blocks, including the filters and filter headers, can be found [[bip-0158/testnet-20.csv|here]]. The code to generate these vectors for P values of 1 through 32 can be found [[bip-0158/gentestvectors.go|here]].