However, both the character set and the checksum algorithm have limitations:
* Base58 needs a lot of space in QR codes, as it cannot use the ''alphanumeric mode''.
* The mixed case in base58 makes it inconvenient to reliably write down, type on mobile keyboards, or read out loud.
* The double SHA256 checksum is slow and has no error-detection guarantees.
* Most of the research on error-detecting codes only applies to character-set sizes that are a [https://en.wikipedia.org/wiki/Prime_power prime power], which 58 is not.
* Base58 decoding is complicated and relatively slow.
Included in the Segregated Witness proposal are a new class of outputs
We first describe the general checksummed base32<ref>'''Why use base32 at all?''' The lack of mixed case makes it more
efficient to read out loud or to put into QR codes. It does come with a 15% length
increase, but that does not matter when copy-pasting addresses.</ref> format called
''Bech32'' and then define Segregated Witness addresses using it.
===Bech32===
A Bech32<ref>'''Why call it Bech32?''' "Bech" contains the characters BCH (the error
detection algorithm used) and sounds a bit like "base".</ref> string is at most 90 characters long and consists of:
* The '''human-readable part''', which is intended to convey the type of data or anything else that is relevant for the reader. Its validity (including the used set of characters) is application specific, but restricted to ASCII characters with values in the range 33-126.
* The '''separator''', which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator<ref>'''Why include a separator in addresses?''' That way the human-readable
part is unambiguously separated from the data part, avoiding potential
collisions with other human-readable parts that share a prefix. It also
allows us to avoid having character-set restrictions on the human-readable part. The
separator is ''1'' because using a non-alphanumeric character would
complicate copy-pasting of addresses (with no double-click selection in
several applications). Therefore an alphanumeric character outside the normal character set
was chosen.</ref>.
* The '''data part''', which is at least 6 characters long and only consists of alphanumeric characters excluding "1", "b", "i", and "o"<ref>'''Why not use an existing character set like [http://www.faqs.org/rfcs/rfc3548.html RFC3548] or [https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt z-base-32]'''?
The character set is chosen to minimize ambiguity according to
[https://hissa.nist.gov/~black/GTLD/ this] visual similarity data, and
the ordering is chosen to minimize the number of pairs of similar
characters (according to the same data) that differ in more than 1 bit.
As the checksum is chosen to maximize detection capabilities for low
numbers of bit errors, this choice improves its performance under some
error models.</ref>.
{| class="wikitable"
|-
!
!0
!1
!2
!3
!4
!5
!6
!7
|-
!+0
|q||p||z||r||y||9||x||8
|-
!+8
|g||f||2||t||v||d||w||0
|-
!+16
|s||3||j||n||5||4||k||h
|-
!+24
|c||e||6||m||u||a||7||l
|}
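The table can be read row by row into a single 32-character string, so that a character's value is simply its index in that string. The following is a minimal sketch of that mapping; the names <tt>CHARSET</tt>, <tt>chars_to_values</tt>, and <tt>values_to_chars</tt> are chosen for illustration, and the sketch assumes an all-lowercase data part.

```python
# Bech32 character set, read row by row from the table above:
# the character at index i encodes the 5-bit value i.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def chars_to_values(data_part):
    """Map data-part characters to their 5-bit values.

    Raises ValueError if a character is not in the Bech32 character set.
    Assumes lowercase input (an assumption of this sketch).
    """
    return [CHARSET.index(c) for c in data_part]


def values_to_chars(values):
    """Map a list of 5-bit values (0..31) back to data-part characters."""
    return "".join(CHARSET[v] for v in values)


print(chars_to_values("qpzry9x8"))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(values_to_chars([24, 31, 15]))  # cl0
```

The excluded characters "1", "b", "i", and "o" do not appear in the string, so looking one of them up fails.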
'''Checksum'''
The last six characters of the data part form a checksum and contain no
information. Valid strings MUST pass the criteria for validity specified
by the Python3 code snippet below. The function
<tt>bech32_verify_checksum</tt> must return true when its arguments are:
* <tt>hrp</tt>: the human-readable part as a string
* <tt>data</tt>: the data part as a list of integers representing the characters after conversion using the table above
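<pre>
def bech32_polymod(values):
    # Generator constants of the BCH code
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = (chk >> 25)  # top 5 bits of the 30-bit checksum state
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(s):
    # High bits of each character, then a zero, then the low 5 bits of each
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1
</pre>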
This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
guarantees detection of '''any error affecting at most 4 characters'''
and has less than a 1 in 10<sup>9</sup> chance of failing to detect more
errors. More details about the properties can be found in the
Checksum Design appendix. The human-readable part is processed by first
feeding the higher bits of each character's ASCII value into the
checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the
human readable part only change the low 5 bits (like changing an alphabetical character into another), errors are restricted to the ''[low hrp] [data]''
part, which is at most 89 characters, and thus all error detection properties (see appendix) remain applicable.</ref>.
To construct a valid checksum given the human-readable part and (non-checksum) values of the data-part characters, the code below can be used:
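A minimal sketch of such a construction follows. It repeats the checksum functions so that the block runs on its own, and the name <tt>bech32_create_checksum</tt> follows the Python reference implementation linked under Reference implementations. The payload values are arbitrary and serve only as an illustration.

```python
def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk


def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


def bech32_create_checksum(hrp, data):
    """Return the six 5-bit checksum values for hrp and non-checksum data."""
    values = bech32_hrp_expand(hrp) + data
    # Run the polymod over the data followed by six zero placeholders,
    # then flip the low bit so that verification yields exactly 1.
    polymod = bech32_polymod(values + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


payload = [0, 14, 20, 15]  # arbitrary example values, not a real address
checksum = bech32_create_checksum("bc", payload)
print(bech32_verify_checksum("bc", payload + checksum))  # True

# Changing a single character value must be detected.
corrupted = payload + checksum
corrupted[1] ^= 1
print(bech32_verify_checksum("bc", corrupted))  # False
```

Appending the six returned values to the data part yields a string for which <tt>bech32_verify_checksum</tt> returns true.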
===Segwit address format===
A segwit address<ref>'''Why not make an address format that is generic for all scriptPubKeys?'''
That would lead to confusion about addresses for
existing scriptPubKey types. Furthermore, if addresses that do not have a one-to-one mapping with scriptPubKeys (such as ECDH-based
addresses) are ever introduced, having a fully generic old address type available would
permit reinterpreting the resulting scriptPubKeys using the old address
format, with lost funds as a result if bitcoins are sent to them.</ref> is a Bech32 encoding of:
* The human-readable part "bc"<ref>'''Why use 'bc' as human-readable part and not 'btc'?''' 'bc' is shorter.</ref> for mainnet, and "tb"<ref>'''Why use 'tb' as human-readable part for testnet?''' It was chosen to
be of the same length as the mainnet counterpart (to simplify
implementations' assumptions about lengths), but still be visually
distinct.</ref> for testnet.
* The data-part values:
** 1 value: the witness version
** A conversion of the 2-to-40-byte witness program (as defined by [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]) to base32:
*** Start with the bits of the witness program, most significant bit per byte first.
*** Re-arrange those bits into groups of 5, and pad with zeroes at the end if needed.
*** Translate those bits to characters using the table above.
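The encoding steps above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: <tt>bytes_to_5bit</tt> and <tt>encode_segwit_address</tt> are hypothetical helper names, and the example witness program is the 20-byte value <tt>751e76e8199196d454941c45d1b3a323f1433bd6</tt>.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk


def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def bytes_to_5bit(program):
    """Regroup bytes into 5-bit values, most significant bit first,
    padding the final group with zero bits if it is incomplete."""
    acc, bits, out = 0, 0, []
    for byte in program:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def encode_segwit_address(hrp, witver, program):
    """Encode a witness version and witness program as a segwit address."""
    if not 0 <= witver <= 16:
        raise ValueError("witness version must be between 0 and 16")
    if not 2 <= len(program) <= 40:
        raise ValueError("witness program must be 2 to 40 bytes")
    data = [witver] + bytes_to_5bit(program)
    checksum = bech32_create_checksum(hrp, data)
    return hrp + "1" + "".join(CHARSET[v] for v in data + checksum)


program = bytes.fromhex("751e76e8199196d454941c45d1b3a323f1433bd6")
print(encode_segwit_address("bc", 0, program))
# bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4
```

The witness version is not regrouped with the program bits; it occupies a single data character of its own, which is why every version 0 address starts with "bc1q" on mainnet.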
'''Decoding'''
Software interpreting a segwit address:
* MUST verify that the human-readable part is "bc" for mainnet and "tb" for testnet.
* MUST verify that the first decoded data value (the witness version) is between 0 and 16, inclusive.
* Convert the rest of the data to bytes:
** Translate the values to 5 bits, most significant bit first.
** Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, MUST be all zeroes, and is discarded.
** There MUST be between 2 and 40 groups, which are interpreted as the bytes of the witness program.
Decoders SHOULD enforce known-length restrictions on witness programs.
For example, BIP141 specifies ''If the version byte is 0, but the witness
program is neither 20 nor 32 bytes, the script must fail.''
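The decoding rules above can be sketched as a single function. The name <tt>decode_segwit_address</tt> and its error messages are hypothetical, and the sketch assumes an all-lowercase input string; it is an illustration of the listed checks, not a replacement for the reference implementations.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk


def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


def decode_segwit_address(expected_hrp, addr):
    """Return (witness_version, witness_program) or raise ValueError."""
    pos = addr.rfind("1")  # the last "1" is the separator
    if pos < 1 or len(addr) > 90 or len(addr) - pos - 1 < 6:
        raise ValueError("malformed Bech32 string")
    hrp, data_part = addr[:pos], addr[pos + 1:]
    if hrp != expected_hrp:
        raise ValueError("unexpected human-readable part")
    if any(c not in CHARSET for c in data_part):
        raise ValueError("invalid data character")
    values = [CHARSET.index(c) for c in data_part]
    if not bech32_verify_checksum(hrp, values):
        raise ValueError("invalid checksum")
    values = values[:-6]  # strip the checksum
    if not values or values[0] > 16:
        raise ValueError("invalid witness version")
    witver = values[0]
    # Regroup 5-bit values into bytes, most significant bit first.
    acc, bits, out = 0, 0, bytearray()
    for v in values[1:]:
        acc = (acc << 5) | v
        bits += 5
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    # Leftover bits: at most 4, and all zero.
    if bits > 4 or (acc & ((1 << bits) - 1)):
        raise ValueError("invalid padding")
    if not 2 <= len(out) <= 40:
        raise ValueError("invalid witness program length")
    if witver == 0 and len(out) not in (20, 32):
        raise ValueError("invalid length for version 0 program")
    return witver, bytes(out)


version, program = decode_segwit_address(
    "bc", "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4")
print(version, program.hex())
# 0 751e76e8199196d454941c45d1b3a323f1433bd6
```

The caller passes the human-readable part it expects, so a testnet address is rejected by mainnet software before any further interpretation.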
As a result of the previous rules, addresses are always between 14 and 74 characters long, and their length modulo 8 cannot be 0, 3, or 5.
Version 0 witness addresses are always 42 or 62 characters, but implementations MUST allow the use of any version.
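These length claims can be checked by enumerating every permitted witness program size. The short sketch below assumes a two-character human-readable part ("bc" or "tb"), one separator, one version character, and six checksum characters.

```python
# Enumerate the address length for every witness program size from 2 to 40 bytes.
lengths = {}
for n in range(2, 41):
    program_chars = -(-8 * n // 5)  # ceil(8 * n / 5) characters for n bytes
    lengths[n] = 2 + 1 + 1 + program_chars + 6  # hrp + separator + version + program + checksum

print(min(lengths.values()), max(lengths.values()))  # 14 74
print(sorted({l % 8 for l in lengths.values()}))     # [1, 2, 4, 6, 7]
print(lengths[20], lengths[32])                      # 42 62
```

The residues 0, 3, and 5 never occur, which matches the statement about the length modulo 8.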
===Compatibility===
Only new software will be able to use these addresses, and only for
receivers with segwit-enabled new software. In all other cases, P2SH or
P2PKH addresses can be used.
==Rationale==
<references />
==Reference implementations==
* Reference encoder and decoder:
** [https://github.com/sipa/bech32/tree/master/ref/c For C]
** [https://github.com/sipa/bech32/tree/master/ref/javascript For JavaScript]
** [https://github.com/sipa/bech32/tree/master/ref/python For Python]
This means that when 5 changed characters occur randomly distributed in
the 39 characters of a P2WPKH address, there is a chance of
''0.756 per billion'' that it will go undetected. When those 5 changes
occur randomly within a 19-character window, that chance goes down to
''0.093 per billion''. As the number of errors goes up, the chance
converges towards ''1 in 2<sup>30</sup>'' = ''0.931 per billion''.
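The limiting figure follows from the checksum size: six characters of 5 bits each give 30 checksum bits, so a string with many random errors passes with probability 1 in 2<sup>30</sup>. A one-line check of the arithmetic:

```python
# Six checksum characters of 5 bits each give a 30-bit checksum.
checksum_bits = 6 * 5
per_billion = 1e9 / 2 ** checksum_bits
print(f"{per_billion:.3f} per billion")  # 0.931 per billion
```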
Even though the chosen code performs reasonably well up to 1023 characters,
other designs are preferable for lengths above 89 characters (excluding the
separator).
==Acknowledgements==
This document is inspired by the [https://rusty.ozlabs.org/?p=578 address proposal] by Rusty Russell and the
[https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2014-February/004402.html base32] proposal by Mark Friedenbach, and had input from Luke Dashjr,
Johnson Lau, Eric Lombrozo, Peter Todd, and various other reviewers.