Currently there is no standard for bitcoin wallet clients when ordering transaction inputs and outputs.
As a result, wallet clients often have a discernible blockchain fingerprint, and can leak private information about their users.
By contrast, a standard for non-deterministic sorting could be difficult to audit.
This document proposes deterministic lexicographical sorting, using hashes of previous transactions and output indices to sort transaction inputs, as well as values and scriptPubKeys to sort transaction outputs.
Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs.
Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances.
For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation.
Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers.
Such information should remain private, not only for the benefit of consumers: in higher-order financial systems it must be kept secret to prevent fraud.
A researcher recently demonstrated this principle when he detected that Bitstamp leaked information when creating exchange transactions, enabling potential espionage among traders. [1]
One way to address these privacy weaknesses is by randomly ordering inputs and outputs. [2]
After all, the order of inputs and outputs does not impact the function of the transaction they belong to, making random sorting viable.
Unfortunately, it can be difficult to prove from code or run-time analysis that an ordering is genuinely random, especially if the software is closed source.
A malicious software developer can abuse the ordering of inputs and outputs as a side channel for leaking information.
For example, if an attacker can patch a victim’s HD wallet client to order inputs and outputs based on the bits of a master private key, then the attacker can eventually steal all of the victim’s funds by monitoring the blockchain.
Non-deterministic methods of sorting are difficult to audit because they are not repeatable.
The lack of standardization between wallet clients when ordering inputs and outputs can yield predictable quirks that characterize particular wallet clients or services.
Such quirks create unique fingerprints that a privacy attacker can employ through simple passive blockchain observation.
The solution is to create an algorithm for sorting transaction inputs and outputs that is deterministic.
Since it is deterministic, it should also be unambiguous — that is, given a particular transaction, the proper order of inputs and outputs should be obvious.
To make this standard as widely applicable as possible, it should rely on information that is downloaded by both full nodes (with or without typical efficiency techniques such as pruning) and SPV nodes.
In order to ensure that it does not leak confidential data, it must rely on information that is publicly accessible through the blockchain.
The use of public blockchain information also allows a transaction to be sorted even when it is a multi-party transaction, such as in the example of a CoinJoin.
This BIP applies to any transaction for which the order of its inputs and outputs does not impact the transaction’s function.
Currently, this refers to any transaction that employs the SIGHASH_ALL signature hash type, in which signatures commit to the exact order of inputs and outputs.
Transactions that use SIGHASH_ANYONECANPAY and/or SIGHASH_NONE may include inputs and/or outputs that are not signed; however, compliant software should still emit transactions with lexicographically sorted inputs and outputs, even though they may later be modified by others.
In the event that future protocol upgrades introduce new signature hash types, compliant software should apply the lexicographical ordering principle analogously.
While out of scope of this BIP, protocols that do require a specified order of inputs/outputs (e.g. due to use of SIGHASH_SINGLE) should consider the goals of this BIP and how best to adapt them to the specific needs of those protocols.
Transaction inputs are defined by the hash of a previous transaction, the output index of a UTXO from that previous transaction, the size of an unlocking script, the unlocking script, and a sequence number. [3]
For sorting inputs, the hash of the previous transaction and the output index within that transaction are sufficient for sorting purposes; each transaction hash has an extremely high probability of being unique in the blockchain — this is enforced for coinbase transactions by BIP30 — and output indices within a transaction are unique.
For the sake of efficiency, transaction hashes should be compared before output indices, since output indices from different transactions are often equivalent, while all bytes of the transaction hash are effectively random variables.
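The input comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Outpoint` tuple and the `sort_inputs` name are hypothetical, and the sketch assumes each previous transaction hash is supplied as a hex string and compared byte by byte in that same order. The byte order used for comparison is an assumption here and should be checked against the normative rules before relying on it.

```python
from typing import List, Tuple

# Hypothetical representation of an input's outpoint:
# (previous transaction hash as a hex string, output index)
Outpoint = Tuple[str, int]


def sort_inputs(inputs: List[Outpoint]) -> List[Outpoint]:
    """Return inputs ordered by previous tx hash, then by output index.

    Assumption: the hex string is decoded to bytes and compared
    lexicographically in the order given. Ties on the hash are broken
    by the integer value of the output index, in ascending order.
    """
    return sorted(inputs, key=lambda i: (bytes.fromhex(i[0]), i[1]))
```

Because the sort key is built only from the outpoint, every participant in a multi-party transaction who holds the same set of inputs would arrive at the same order.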
For the sake of efficiency, output amounts should be compared first when sorting outputs, since they contain fewer bytes of information (8 bytes) compared to a standard P2PKH scriptPubKey (25 bytes). [4]
Transaction output amounts (as 64-bit unsigned integers) are to be sorted in ascending order.
In the event of two matching output amounts, the respective output scriptPubKeys (in their little-endian, byte-array form) will be compared lexicographically, in ascending order.
If the scriptPubKeys match, the outputs are considered equal.
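The output ordering described above can be sketched similarly. The `Output` tuple and the `sort_outputs` name are hypothetical, amounts are assumed to be integers in satoshis, and scriptPubKeys are assumed to be raw `bytes`. Python compares `bytes` objects lexicographically by unsigned byte value, and a script that is a strict prefix of another sorts first; that prefix behaviour comes from Python and is an assumption with respect to this text.

```python
from typing import List, Tuple

# Hypothetical representation of an output:
# (amount in satoshis, raw scriptPubKey bytes)
Output = Tuple[int, bytes]


def sort_outputs(outputs: List[Output]) -> List[Output]:
    """Return outputs ordered by amount, then by scriptPubKey.

    Amounts are compared first, in ascending order. Outputs with equal
    amounts are ordered by a lexicographic comparison of their
    scriptPubKey bytes, also ascending.
    """
    return sorted(outputs, key=lambda o: (o[0], o[1]))
```

Outputs that match in both amount and scriptPubKey are interchangeable, so their relative order does not affect the serialized result.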
* [[https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-June/008484.html|<nowiki>[Bitcoin-development]</nowiki> Lexicographical Indexing of Transaction Inputs and Outputs]]
* [[https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-June/008487.html|<nowiki>[Bitcoin-development] [RFC]</nowiki> Canonical input and output ordering in transactions]]
Danno Ferrin <danno@numisight.com>, Sergio Demian Lerner <sergiolerner@certimix.com>, Justus Ranvier <justus@openbitcoinprivacyproject.org>, and Peter Todd <pete@petertodd.org> contributed to the design and motivations for this BIP.
A similar proposal was submitted to the Bitcoin-dev mailing list independently by Rusty Russell <rusty@rustcorp.com.au>.