This converts the GetSigOpCount function to make use of the new
tokenizer instead of the far less efficient parseScript thereby
significantly optimizing the function.
A new function named countSigOpsV0 which accepts the raw script is
introduced to perform the bulk of the work so it can be reused for
precise signature operation counting as well in a later commit. It
retains the same semantics in terms of counting the number of signature
operations either up to the first parse error or the end of the script
in the case it parses successfully as required by consensus.
Finally, this also deprecates the getSigOpCount function that requires
opcodes in favor of the new function and modifies the comment on
GetSigOpCount to explicitly call out the script version semantics.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkGetSigOpCount-8 61051 677 -98.89%
benchmark old allocs new allocs delta
BenchmarkGetSigOpCount-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkGetSigOpCount-8 311299 0 -100.00%
This moves the check for non push-only pay-to-script-hash signature
scripts before the script parsing logic when creating a new engine
instance to avoid the extra overhead in the error case.
This modifies the check for whether or not a pay-to-script-hash
signature script is a push only script to make use of the new and more
efficient raw script function.
Also, since the script will have already been checked further above when
the ScriptVerifySigPushOnly flags is set, avoid checking it again in
that case.
Backport of af67951b9a66df3aac1bf3d6376af0730287bbf2
This converts the IsUnspendable function to make use of a combination of
raw script analysis and the new tokenizer instead of the far less
efficient parseScript thereby significantly optimizing the function.
It is important to note that this new implementation intentionally has a
semantic difference from the existing implementation in that it will now
report scripts that are larger than the max allowed script size are
unspendable as well.
Finally, the comment is modified to explicitly call out the script
version semantics.
Note: this function was recently optimized in master, so the gains here
are less noticable than other optimizations.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkIsUnspendable-8 656 584 -10.98%
benchmark old allocs new allocs delta
BenchmarkIsUnspendable-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsUnspendable-8 1 0 -100.00%
This converts the IsNullData function to analyze the raw script instead
of using the far less efficient parseScript, thereby significantly
optimizing the function.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkIsNullDataScript-8 62495 2.65 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsNullDataScript-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsNullDataScript-8 311299 0 -100.00%
This converts the IsPayToWitnessScriptHash function to analyze the raw
script instead of using the far less efficient parseScript, thereby
significantly optimizing the function.
In order to accomplish this, it introduces two new functions. The first
one is named extractWitnessScriptHash and works with the raw script byte
to simultaneously deteremine if the script is a p2wsh script, and in the
case that is is, extract and return the hash. The second new function is
named isWitnessScriptHashScript and is defined in terms of the former.
The extract function approach was chosed because it is common for
callers to want to only extract relevant details from a script if the
script is of the specific type. Extracting those details requires
performing the exact same checks to ensure the script is of the correct
type, so it is more efficient to combine the two into one and define the
type determination in terms of the result, so long as the extraction
does not require allocations.
Finally, this also deprecates the isWitnessScriptHash function that
requires opcodes in favor of the new functions and modifies the comment
on IsPayToWitnessScriptHash to call out the script version semantics.
The following is a before and after comparison of executing
IsPayToWitnessScriptHash on a large script:
benchmark old ns/op new ns/op delta
BenchmarkIsWitnessScriptHash-8 62774 0.63 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsWitnessScriptHash-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsWitnessScriptHash-8 311299 0 -100.00%
This converts the IsPayToWitnessPubKeyHash function to analyze the raw
script instead of the far less efficient parseScript, thereby
significantly optimizing the function.
In order to accomplish this, it introduces two new functions. The first
one is named extractWitnessPubKeyHash and works with the raw script
bytes to simultaneously deteremine if the script is a p2wkh, and in case
it is, extract and return the hash. The second new function is name
isWitnessPubKeyHashScript which is defined in terms of the former.
The extract function is approach was chosen because it is common for
callers to want to only extract relevant details from the script if the
script is of the specific type. Extracting those details requires the
exact same checks to ensure the script is of the correct type, so it is
more efficient to combine the two and define the type determination in
terms of the result so long as the extraction does not require
allocations.
Finally, this deprecates the isWitnessPubKeyHash function that requires
opcodes in favor of the new functions and modifies the comment on
IsPayToWitnessPubKeyHash to explicitly call out the script version
semantics.
The following is a before and after comparison of executing
IsPayToWitnessPubKeyHash on a large script:
benchmark old ns/op new ns/op delta
BenchmarkIsWitnessPubKeyHash-8 68927 0.53 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsWitnessPubKeyHash-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsWitnessPubKeyHash-8 311299 0 -100.00%
This converts the IsPushOnlyScript function to make use of the new
tokenizer instead of the far less efficient parseScript thereby
significantly optimizing the function.
It also deprecates the isPushOnly function that requires opcodes in
favor of the new function and modifies the comment on IsPushOnlyScript
to explicitly call out the script version semantics.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkIsPushOnlyScript-8 62412 622 -99.00%
benchmark old allocs new allocs delta
BenchmarkIsPushOnlyScript-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsPushOnlyScript-8 311299 0 -100.00%
This converts the IsMultisigSigScript function to analyze the raw script
and make use of the new tokenizer instead of the far less efficient
parseScript thereby significantly optimizing the function.
In order to accomplish this, it first rejects scripts that can't
possibly fit the bill due to the final byte of what would be the redeem
script not being the appropriate opcode or the overall script not having
enough bytes. Then, it uses a new function that is introduced named
finalOpcodeData that uses the tokenizer to return any data associated
with the final opcode in the signature script (which will be nil for
non-push opcodes or if the script fails to parse) and analyzes it as if
it were a redeem script when it is non nil.
It is also worth noting that this new implementation intentionally has
the same semantic difference from the existing implementation as the
updated IsMultisigScript function in regards to allowing zero pubkeys
whereas previously it incorrectly required at least one pubkey.
Finally, the comment is modified to explicitly call out the script
version semantics.
The following is a before and after comparison of analyzing a large
script that is not a multisig script and both a 1-of-2 multisig public
key script (which should be false) and a signature script comprised of a
pay-to-script-hash 1-of-2 multisig redeem script (which should be true):
benchmark old ns/op new ns/op delta
BenchmarkIsMultisigSigScriptLarge-8 69328 2.93 -100.00%
BenchmarkIsMultisigSigScript-8 2375 146 -93.85%
benchmark old allocs new allocs delta
BenchmarkIsMultisigSigScriptLarge-8 5 0 -100.00%
BenchmarkIsMultisigSigScript-8 3 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsMultisigSigScriptLarge-8 330035 0 -100.00%
BenchmarkIsMultisigSigScript-8 9472 0 -100.00%
This converts the IsMultisigScript function to make use of the new
tokenizer instead of the far less efficient parseScript thereby
significantly optimizing the function.
In order to accomplish this, it introduces two new functions. The first
one is named extractMultisigScriptDetails and works with the raw script
bytes to simultaneously determine if the script is a multisignature
script, and in the case it is, extract and return the relevant details.
The second new function is named isMultisigScript and is defined in
terms of the former.
The extract function accepts the script version, raw script bytes, and a
flag to determine whether or not the public keys should also be
extracted. The flag is provided because extracting pubkeys results in
an allocation that the caller might wish to avoid.
The extract function approach was chosen because it is common for
callers to want to only extract relevant details from a script if the
script is of the specific type. Extracting those details requires
performing the exact same checks to ensure the script is of the correct
type, so it is more efficient to combine the two into one and define the
type determination in terms of the result so long as the extraction does
not require allocations.
It is important to note that this new implementation intentionally has a
semantic difference from the existing implementation in that it will now
correctly identify a multisig script with zero pubkeys whereas
previously it incorrectly required at least one pubkey. This change is
acceptable because the function only deals with standardness rather than
consensus rules.
Finally, this also deprecates the isMultiSig function that requires
opcodes in favor of the new functions and deprecates the error return on
the export IsMultisigScript function since it really does not make sense
given the purpose of the function.
The following is a before and after comparison of analyzing both a large
script that is not a multisig script and a 1-of-2 multisig public key
script:
benchmark old ns/op new ns/op delta
BenchmarkIsMultisigScriptLarge-8 64166 5.52 -99.99%
BenchmarkIsMultisigScript-8 630 59.4 -90.57%
benchmark old allocs new allocs delta
BenchmarkIsMultisigScriptLarge-8 1 0 -100.00%
BenchmarkIsMultisigScript-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsMultisigScriptLarge-8 311299 0 -100.00%
BenchmarkIsMultisigScript-8 2304 0 -100.00%
This converts the IsPayToScriptHash function to analyze the raw script
instead of using the far less efficient parseScript thereby
significantly optimizing the function.
In order to accomplish this, it introduces two new functions. The first
one is named extractScriptHash and works with the raw script bytes to
simultaneously determine if the script is a p2sh script, and in the case
it is, extract and return the hash. The second new function is named
isScriptHashScript and is defined in terms of the former.
The extract function approach was chosen because it is common for
callers to want to only extract relevant details from a script if the
script is of the specific type. Extracting those details requires
performing the exact same checks to ensure the script is of the correct
type, so it is more efficient to combine the two into one and define the
type determination in terms of the result so long as the extraction does
not require allocations.
Finally, this also deprecates the isScriptHash function that requires
opcodes in favor of the new functions and modifies the comment on
IsPayToScriptHash to explicitly call out the script version semantics.
The following is a before and after comparison of analyzing a large
script that is not a p2sh script:
benchmark old ns/op new ns/op delta
BenchmarkIsPayToScriptHash-8 62393 0.60 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsPayToScriptHash-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsPayToScriptHash-8 311299 0 -100.00%
This converts the IsPayToPubKeyHash function to analyze the raw script
instead of using the far less efficient parseScript, thereby
significantly optimization the function.
In order to accomplish this, it introduces two new functions. The first
one is named extractPubKeyHash and works with the raw script bytes
to simultaneously determine if the script is a pay-to-pubkey-hash script,
and in the case it is, extract and return the hash. The second new
function is named isPubKeyHashScript and is defined in terms of the
former.
The extract function approach was chosen because it is common for
callers to want to only extract relevant details from a script if the
script is of the specific type. Extracting those details requires
performing the exact same checks to ensure the script is of the correct
type, so it is more efficient to combine the two into one and define the
type determination in terms of the result so long as the extraction does
not require allocations.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkIsPubKeyHashScript-8 62228 0.45 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsPubKeyHashScript-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsPubKeyHashScript-8 311299 0 -100.00%
This converts the IsPayToScriptHash function to analyze the raw script
instead of using the far less efficient parseScript, thereby
significantly optimizing the function.
In order to accomplish this, it introduces four new functions:
extractCompressedPubKey, extractUncompressedPubKey, extractPubKey, and
isPubKeyScript. The extractPubKey function makes use of
extractCompressedPubKey and extractUncompressedPubKey to combine their
functionality as a convenience and isPubKeyScript is defined in terms of
extractPubKey.
The extractCompressedPubKey works with the raw script bytes to
simultaneously determine if the script is a pay-to-compressed-pubkey
script, and in the case it is, extract and return the raw compressed
pubkey bytes.
Similarly, the extractUncompressedPubKey works in the same way except it
determines if the script is a pay-to-uncompressed-pubkey script and
returns the raw uncompressed pubkey bytes in the case it is.
The extract function approach was chosen because it is common for
callers to want to only extract relevant details from a script if the
script is of the specific type. Extracting those details requires
performing the exact same checks to ensure the script is of the correct
type, so it is more efficient to combine the two into one and define the
type determination in terms of the result so long as the extraction does
not require allocations.
The following is a before and after comparison of analyzing a large
script:
benchmark old ns/op new ns/op delta
BenchmarkIsPubKeyScript-8 62323 2.97 -100.00%
benchmark old allocs new allocs delta
BenchmarkIsPubKeyScript-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkIsPubKeyScript-8 311299 0 -100.00%
This converts the asSmallInt function to accept an opcode as a byte
instead of the internal opcode data struct in order to make it more
flexible for raw script analysis.
It also updates all callers accordingly.
This converts the isSmallInt function to accept an opcode as a byte
instead of the internal opcode data struct in order to make it more
flexible for raw script analysis.
The comment is modified to explicitly call out the script version
semantics.
Finally, it updates all callers accordingly.
This converts the tests for calculating signature hashes to use the
exported function which handles the raw script versus the now deprecated
variant requiring parsed opcodes.
Backport of 06f769ef72e6042e7f2b5ff1c512ef1371d615e5
This modifies the CalcSignatureHash function to make use of the new
signature hash calculation function that accepts raw scripts without
needing to first parse them. Consequently, it also doubles as a slight
optimization to the execution time and a significant reduction in the
number of allocations.
In order to convert the CalcScriptHash function and keep the same
semantics, a new function named checkScriptParses is introduced which
will quickly determine if a script can be fully parsed without failure
and return the parse failure in the case it can't.
The following is a before and after comparison of analyzing a large
multiple input transaction:
benchmark old ns/op new ns/op delta
BenchmarkCalcSigHash-8 3627895 3619477 -0.23%
benchmark old allocs new allocs delta
BenchmarkCalcSigHash-8 1335 801 -40.00%
benchmark old bytes new bytes delta
BenchmarkCalcSigHash-8 1373812 1293354 -5.86%
This introduces a new function named calcSignatureHashRaw which accepts
the raw script bytes to calculate the script hash versus requiring the
parsed opcode only to unparse them later in order to make it more
flexible for working with raw scripts.
Since there are several places in the rest of the code that currently
only have access to the parsed opcodes, this modifies the existing
calcSignatureHash to first unparse the script before calling the new
function.
Backport of decred/dcrd:f306a72a16eaabfb7054a26f9d9f850b87b00279
This converts the DisasmString function to make use of the new
zero-allocation script tokenizer instead of the far less efficient
parseScript thereby significantly optimizing the function.
In order to facilitate this, the opcode disassembly functionality is
split into a separate function called disasmOpcode that accepts the
opcode struct and data independently as opposed to requiring a parsed
opcode. The new function also accepts a pointer to a string builder so
the disassembly can be more efficiently be built.
While here, the comment is modified to explicitly call out the script
version semantics.
The following is a before and after comparison of a large script:
benchmark old ns/op new ns/op delta
BenchmarkDisasmString-8 102902 40124 -61.01%
benchmark old allocs new allocs delta
BenchmarkDisasmString-8 46 51 +10.87%
benchmark old bytes new bytes delta
BenchmarkDisasmString-8 389324 130552 -66.47%
This implements an efficient and zero-allocation script tokenizer that
is exported to both provide a new capability to tokenize scripts to
external consumers of the API as well as to serve as a base for
refactoring the existing highly inefficient internal code.
It is important to note that this tokenizer is intended to be used in
consensus critical code in the future, so it must exactly follow the
existing semantics.
The current script parsing mechanism used throughout the txscript module
is to fully tokenize the scripts into an array of internal parsed
opcodes which are then examined and passed around in order to implement
virtually everything related to scripts.
While that approach does simplify the analysis of certain scripts and
thus provide some nice properties in that regard, it is both extremely
inefficient in many cases, and makes it impossible for external
consumers of the API to implement any form of custom script analysis
without manually implementing a bunch of error prone tokenizing code or,
alternatively, the script engine exposing internal structures.
For example, as shown by profiling the total memory allocations of an
initial sync, the existing script parsing code allocates a total of
around 295.12GB, which equates to around 50% of all allocations
performed. The zero-alloc tokenizer this introduces will allow that to
be reduced to virtually zero.
The following is a before and after comparison of tokenizing a large
script with a high opcode count using the existing code versus the
tokenizer this introduces for both speed and memory allocations:
benchmark old ns/op new ns/op delta
BenchmarkScriptParsing-8 63464 677 -98.93%
benchmark old allocs new allocs delta
BenchmarkScriptParsing-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkScriptParsing-8 311299 0 -100.00%
The following is an overview of the changes:
- Introduce new error code ErrUnsupportedScriptVersion
- Implement zero-allocation script tokenizer
- Add a full suite of tests to ensure the tokenizer works as intended
and follows the required consensus semantics
- Add an example of using the new tokenizer to count the number of
opcodes in a script
- Update README.md to include the new example
- Update script parsing benchmark to use the new tokenizer
This resolves the more fundamental flake in the unit tests noted in the
prior commit.
Because multiple unit tests call rand.Seed in parallel, it's possible
they can be executed with the same unix timestamp (in seconds). If the
second call happens between generating the hash cache and checking that
the cache doesn't contain a random txn, the random transaction is in
fact a duplicate of one generated earlier since the RNG state was reset.
To remedy, we initialize rand.Seed once in the init function.
TestHashCacheAddContainsHashes flakes fairly regularly when rebasing
PR #1684 with:
txid <txid> wasn't inserted into cache but was found.
With probabilty 1/10^2 there will be no inputs on the transaction. This
reduces the entropy in the txid, and I belive is the primary cause of
the flake.
- create benchmarks to measure allocations
- add test for benchmark input
- create a low alloc parseScriptTemplate
- refactor parsing logic for a single opcode
In this commit, we extend the txscript package to support re-deriving
the PkScript of an output by looking at the input's signature
script/witness attempting to spend it. As of this commit, the only
supported types are P2SH, v0 P2WSH, and v0 P2WPKH.
This will serve useful to detect when a particular script has been spent
on-chain.
A set of test vectors has also been added for the supported script types
to ensure its correctness.
This cleans up the code for handling the checksig and checkmultisig
opcode strict signatures to explicitly call out any semantics that are
likely not obvious and improve readability.
It also introduce new distinct errors for each condition which can
result in a signature being rejected due to not following the strict
encoding requirements and updates reference test adaptor accordingly.
This modifies calcSignatureHash to use a shallow copy of the transaction
versus a deep copy since the actual scripts themselves are not modified
and therefore don't need to be copied.
This is being done because profiling the most overall allocated space
shows that the deep copy performed in calcSignatureHash accounts for
nearly 20% of all allocations on a synced running instance. Also,
copying all of the additional data makes it more time consuming as well.
With this change, that figure drops from ~20% to ~5% of all allocations.
The following benchmark shows the relative speedups and allocation
reduction as a result of the optimization on my system. In particular,
the changes result in approximately a 15% speedup and a whopping 99.89%
reduction in allocations when using a large transaction with thousands
of inputs which was the worst case scenario.
benchmark old allocs new allocs delta
--------------------------------------------------------------------
BenchmarkCalcSignatureHash 11151 12 -99.89%
benchmark old ns/op new ns/op delta
--------------------------------------------------------------------
BenchmarkCalcSignatureHash 3599845 3056359 -15.10%
This commit adds verification of the post-segwit standardness
requirement that all pubkeys involved in checks operations MUST be
serialized as compressed public keys. A new ScriptFlag has been added
to guard this behavior when executing scripts.
This commit modifies the op-code execution for OP_IF and OP_NOTIF to
enforce the additional “minimal if” constraints which require the
top-stack item when the op codes are encountered to be either an empty
vector, or exactly [0x01].
This commit implements the flag activation portion of BIP 0147. The
verification behavior triggered by the NULLDUMMY script verification
flag has been present within btcd for some time, however it wasn’t
activated by default.
With this commit, once segwit has activated, the ScriptStrictMultiSig
will also be activated within the Script VM. Additionally, the
ScriptStrictMultiSig is now a standard script verification flag which
is used unconditionally within the mempool.
This commit implements full witness program validation for the
currently defined version 0 witness programs. This includes validation
logic for nested p2sh, p2wsh, and p2wkh. Additionally, when in witness
validation mode, an additional set of constrains are enforced such as
using the new sighash digest algorithm and enforcing clean stack
behavior within witness programs.
This commit fixes an off-by-one error which is only manifested by the
new behavior of OP_CODESEPARATOR within sig hashes triggered by the
segwit behavior. The current behavior within the Script VM
(txscript.Engine) is known to be fully correct to the extent that it has
been verified. However, once segwit activates a consensus divergence
would emerge due to *when* the program counter was incremented in the
previous code (pre-this-commit).
Currently (pre-segwit) when calculating the pre-image to a transaction
sighash for signature verification, *all* instances of OP_CODESEPARATOR
are removed from the subScript being signed before generating the final
sighash. SegWit has additional nerfed the behavior of OP_CODESEPARATOR
by no longer removing them (and starting after the last instance), but
instead simply starting the subScript to be directly *after* the last
instance of an OP_CODESEPARATOR within the pkScript.
Due to this new behavior, without this commit, an off-by-one error
(which only matters post-segwit), would cause txscript to generate an
incorrect subScript since the instance of OP_CODESEPARATOR would remain
as part of the subScript instead of being sliced off as the new behavior
dictates. The off-by-one error itself is manifested due to a slight
divergence in txscript.Engine’s logic compared to Bitcoin Core. In
Bitcoin Core script verification is as follows: first the next op-code
is fetched, then program counter is incremented, and finally the op-code
itself is executed. Before this commit, btcd flipped the order
of the last two steps, executing the op-code *before* the program
counter was incremented.
This commit fixes the post-segwit consensus divergence by incrementing
the program-counter *before* the next op-code is executed. It is
important to note that this divergence is only significant post-segwit,
meaning that txscript.Engine is still consensus compliant independent of
this commit.
This commit introduces a series of internal and external helper
functions which enable the txscript package to be aware of the new
standard script templates introduced as part of BIP0141. The two new
standard script templates recognized are pay-to-witness-key-hash
(P2WKH) and pay-to-witness-script-hash (P2WSH).
This commit implements most of BIP0143 by adding logic to implement the
new sighash calculation, signing, and additionally introduces the
HashCache optimization which eliminates the O(N^2) computational
complexity for the SIGHASH_ALL sighash type.
The HashCache struct is the equivalent to the existing SigCache struct,
but for caching the reusable midstate for transactions which are
spending segwitty outputs.
The btclog package has been changed to defining its own logging
interface (rather than seelog's) and provides a default implementation
for callers to use.
There are two primary advantages to the new logger implementation.
First, all log messages are created before the call returns. Compared
to seelog, this prevents data races when mutable variables are logged.
Second, the new logger does not implement any kind of artifical rate
limiting (what seelog refers to as "adaptive logging"). Log messages
are outputted as soon as possible and the application will appear to
perform much better when watching standard output.
Because log rotation is not a feature of the btclog logging
implementation, it is handled by the main package by importing a file
rotation package that provides an io.Reader interface for creating
output to a rotating file output. The rotator has been configured
with the same defaults that btcd previously used in the seelog config
(10MB file limits with maximum of 3 rolls) but now compresses newly
created roll files. Due to the high compressibility of log text, the
compressed files typically reduce to around 15-30% of the original
10MB file.
The github markdown interpreter has been changed such that it no longer
allows spaces in between the brackets and parenthesis of links and now
requires a newline in between anchors and other formatting. This
updates all of the markdown files accordingly.
While here, it also corrects a couple of inconsistencies in some of the
README.md files.
Now that glide is used for version management and a specific commit of
the upstream repository can be locked it is no longer necessary to
maintain a fork of the package specifically to keep a stable dependency.
While here, update the glide dependency for btcutil as well since it was
switched to use the upstream path as well.
This simplifies the code based on the recommendations of the gosimple
lint tool.
Also, it increases the deadline for the linters to run to 10 minutes and
reduces the number of threads that is uses. This is being done because
the Travis environment has become increasingly slower and it also seems
to be hampered by too many threads running concurrently.
ScriptVerifyNullFail defines that signatures must be empty if a
CHECKSIG or CHECKMULTISIG operation fails.
This commit also enables ScriptVerifyNullFail at the mempool policy
level.
This updates the data driven transaction script tests to use the most
recent format and test data as implemented by Core so the test data can
more easily be updated and help prove cross-compatibility correctness.
In particular, the new format combines the previously separate valid and
invalid test data files into a single file and adds a field for the
expected result. This is a nice improvement since it means tests can
now ensure script failures are due to a specific expected reason as
opposed to only generically detecting failure as the previous format
required.
The btcd script engine typically returns more fine grained errors than
the test data expects, so the test adapter handles this by allowing
expected errors in the test data to be mapped to multiple txscript
errors.
It should also be noted that the tests related to segwit have been
stripped from the data since the segwit PR has not landed in master yet,
however the test adapter does recognize the new ability for optional
segwit data to be supplied, though it will need to properly construct
the transaction using that data when the time comes.
This converts the majority of script errors from generic errors created
via errors.New and fmt.Errorf to use a concrete type that implements the
error interface with an error code and description.
This allows callers to programmatically detect the type of error via
type assertions and an error code while still allowing the errors to
provide more context.
For example, instead of just having an error the reads "disabled opcode"
as would happen prior to these changes when a disabled opcode is
encountered, the error will now read "attempt to execute disabled opcode
OP_FOO".
While it was previously possible to programmatically detect many errors
due to them being exported, they provided no additional context and
there were also various instances that were just returning errors
created on the spot which callers could not reliably detect without
resorting to looking at the actual error message, which is nearly always
bad practice.
Also, while here, export the MaxStackSize and MaxScriptSize constants
since they can be useful for consumers of the package and perform some
minor cleanup of some of the tests.
The CSV consensus rules dictate that the opcode fails when the
transaction version is not at least version 2, however that only applies
if the disable flag is not set in the sequence.
This is not an issue at the current time because we do not yet enforce
CSV at a consensus level, however, I noticed this discrepancy when doing
a thorough audit of the CSV paths due to the ongoing work to add full
consensus-enforced CSV support.
As a result, this must be merged prior to enabling consensus enforcement
for CSV or it would open up the potential for a hard fork.
This modifies the NewMsgTx function to accept the transaction version as
a parameter and updates all callers.
The reason for this change is so the transaction version can be bumped
in wire without breaking existing tests and to provide the caller with
the flexibility to create the specific transaction version they desire.
This modifies the recently-added NullDataScript function in several
ways in an effort to make them more consistent with the tests in the
rest of the code base and improve/correct the logic:
- Use the hexToBytes and mustParseShortForm functions
- Consistently format the test errors
- Replace the valid bool flag with an expected error and test against it
- Ensure the returned script type is the expected type in all cases
This adds a new function named NullDataScript to the txscript package that returns a provably-pruneable OP_RETURN script with the provided data. The function will return an error if the provided data is larger than the maximum allowed length for a nulldata script to be be considered standard.
Putting the test code in the same package makes it easier for forks
since they don't have to change the import paths as much and it also
gets rid of the need for internal_test.go to bridge.
Also, do some light cleanup on a few tests while here.
This corrects the isNullData standard transaction type test to work
properly with canonically-encoded data pushes. In particular, single
byte data pushes that are small integers (0-16) are converted to the
equivalent numeric opcodes when canonically encoded and the code failed
to detect them properly.
It also adds several tests to ensure that both canonical and
non-canonical nulldata scripts are recognized properly and modifies the
test failure print to include the script that failed.
This does not affect consensus since it is just a standardness check.
This exposes a new function on the ScriptBuilder type named AddOps that
allows multiple opcodes to be added via a single call and adds tests to
exercise the new function.
Finally, it updates a couple of places in the signing code that were
abusing the interface by setting its private script directly to use the
new public function instead.
This is mostly a backport of some of the same modifications made in
Decred along with a few additional things cleaned up. In particular,
this updates the code to make use of the new chainhash package.
Also, since this required API changes anyways and the hash algorithm is
no longer tied specifically to SHA, all other functions throughout the
code base which had "Sha" in their name have been changed to Hash so
they are not incorrectly implying the hash algorithm.
The following is an overview of the changes:
- Remove the wire.ShaHash type
- Update all references to wire.ShaHash to the new chainhash.Hash type
- Rename the following functions and update all references:
- wire.BlockHeader.BlockSha -> BlockHash
- wire.MsgBlock.BlockSha -> BlockHash
- wire.MsgBlock.TxShas -> TxHashes
- wire.MsgTx.TxSha -> TxHash
- blockchain.ShaHashToBig -> HashToBig
- peer.ShaFunc -> peer.HashFunc
- Rename all variables that included sha in their name to include hash
instead
- Update for function name changes in other dependent packages such as
btcutil
- Update copyright dates on all modified files
- Update glide.lock file to use the required version of btcutil
This changes the script template parsing function to use a pointer into
the constant global opcode array for parsed opcodes as opposed to making
a copy of the opcode entries which causes unnecessary allocations.
Profiling showed that after roughly 48 hours of operation, this
copy was the culprit of 207 million unnecessary allocations.
Profiles discovered that lookups into the signature cache included an
expensive comparison to the stored `sigInfo` struct. This lookup had the
potential to be more expensive than directly verifying the signature
itself!
In addition, evictions were rather expensive because they involved
reading from /dev/urandom, or equivalent, for each eviction once the
signature cache was full as well as potentially iterating over every
item in the cache in the worst-case.
To remedy this poor performance several changes have been made:
* Change the lookup key to the fixed sized 32-byte signature hash
* Perform a full equality check only if there is a cache hit which
results in a significant speed up for both insertions and existence
checks
* Override entries in the case of a colliding hash on insert Add an
* .IsEqual() method to the Signature and PublicKey types in the
btcec package to facilitate easy equivalence testing
* Allocate the signature cache map with the max number of entries in
order to avoid unnecessary map re-sizes/allocations
* Optimize evictions from the signature cache Delete the first entry
* seen which is safe from manipulation due to
the pre image resistance of the hash function
* Double the default maximum number of entries within the signature
cache due to the reduction in the size of a cache entry
* With this eviction scheme, removals are effectively O(1)
Fixes#575.
This modifies the conversion of the output index from the JSON-based
test data for valid and invalid transactions as well as the signature
hash type for signature hash tests to first convert to a signed int and
then to an unsigned int. This is necessary because the result of a
direct conversion of a float to an unsigned int is implementation
dependent and doesn't result in the expected value on all platforms.
Also, while here, change the function names in the error prints to match
the actual names.
Fixes#600.
500 tests with various transactions and scripts, verifying that
calcSignatureHash generates the expected hash in each case.
This requires changing SigHashType to uint32; that won't affect the
standard use-cases, but will make calcSignatureHash behave more like the
Core counterpart for non-standard SigHashType settings, like those in
some of these tests.
First, it removes the documentation section from all the README.md files
and instead puts a web-based godoc badge and link at the top with the
other badges. This is being done since the local godoc tool no longer
ships with Go by default, so the instructions no longer work without
first installing godoc. Due to this, pretty much everyone uses the
web-based godoc these days anyways. Anyone who has manually installed
godoc won't need instructions.
Second, it makes sure the ISC license badge is at the top with the other
badges and removes the textual reference in the overview section.
Finally, it's modifies the Installation section to Installation and
Updating and adds a '-u' to the 'go get' command since it works for both
and thus is simpler.
isMultiSig was not verifying the number of pubkeys specified matched
the number of pubkeys provided. This caused certain non-standard
scripts to be considered multisig scripts.
However, the script still would have failed during execution.
NOTE: This only affects whether or not the script is considered
standard and does NOT affect consensus.
Also, add a test for this check.
See https://github.com/bitcoin/bips/blob/master/bip-0065.mediawiki for
more information.
This commit mimics Bitcoin Core commit bc60b2b4b401f0adff5b8b9678903ff8feb5867b
and includes additional tests from Bitcoin Core commit
cb54d17355864fa08826d6511a0d7692b21ef2c9
We've already been generating lowS sigs for quite a while. This removes
the malleability vector.
This mimics Bitcoin Core commit 49dd5c629df0a08cf3b1ea8085c03312d1a81696
Introduce an ECDSA signature verification into btcd in order to
mitigate a certain DoS attack and as a performance optimization.
The benefits of SigCache are two fold. Firstly, usage of SigCache
mitigates a DoS attack wherein an attacker causes a victim's client to
hang due to worst-case behavior triggered while processing attacker
crafted invalid transactions. A detailed description of the mitigated
DoS attack can be found here: https://bitslog.wordpress.com/2013/01/23/fixed-bitcoin-vulnerability-explanation-why-the-signature-cache-is-a-dos-protection/
Secondly, usage of the SigCache introduces a signature verification
optimization which speeds up the validation of transactions within a
block, if they've already been seen and verified within the mempool.
The server itself manages the sigCache instance. The blockManager and
txMempool respectively now receive pointers to the created sigCache
instance. All read (sig triplet existence) operations on the sigCache
will not block unless a separate goroutine is adding an entry (writing)
to the sigCache. GetBlockTemplate generation now also utilizes the
sigCache in order to avoid unnecessarily double checking signatures
when generating a template after previously accepting a txn to the
mempool. Consequently, the CPU miner now also employs the same
optimization.
The maximum number of entries for the sigCache has been introduced as a
config parameter in order to allow users to configure the amount of
memory consumed by this new additional caching.
While current existing numeric opcodes are limited to 4 bytes, new
opcodes may need different limits.
This mimics Bitcoin Core commit 99088d60d8a7747c6d1a7fd5d8cd388be1b3e138
This commit modifies the DisasmString function to use a bytes buffer for
constructing the disassembled string instead of naive string
concatenation. This makes a huge difference when disassembling scripts
with large numbers of opcodes.
IsUnspendable takes a public key script and returns whether it is
spendable.
Additionally, hook this into the mempool isDust function, since
unspendable outputs can't be spent.
This mimics Bitcoin Core commit 0aad1f13b2430165062bf9436036c1222a8724da
This change moves IsFinalizedTransaction to txscript and also changes
the first argument to take a wire.MsgTx instead of btcutil.Tx. This
is needed for an upcoming diff in which txscript will require
IsFinalizedTransaction and we do not want to import the btcd/blockchain.
This commit contains fixes from the results of a thorough audit of
txscript to find any cases of script evaluation which doesn't match the
required consensus behavior. These conditions are fairly obscure and
highly unlikely to happen in any real scripts, but they could have
nevertheless been used by a clever attacker with malicious intent to
cause a fork.
Test cases which exercise these conditions have been added to the
reference tests and will contributed upstream to improve the quality for
the entire ecosystem.
Unlike OP_IF and OP_NOTIF which interpret the top stack item as a
number, OP_IFDUP interprets it as a boolean. This has important
consequences because numbers are imited to int32s while booleans can be
an arbitrary number of bytes.
The offending script was found and reported by Jonas Nick through the
use of fuzzing.
- Move reference tests to test package since they are intended to
exercise the engine as callers would
- Improve the short form script parsing to allow additional opcodes:
DATA_#, OP_#, FALSE, TRUE
- Make use of a function to decode hex strings rather than manually
defining byte slices
- Update the tests to make use of the short form script parsing logic
rather than manually defining byte slices
- Consistently replace all []byte{} and [][]byte{} with nil
- Define tests only used in a specific function inside that func
- Move invalid flag combination test to engine_test since that is what
it is testing
- Remove all redundant script tests in favor of the JSON-based tests in
the data directory.
- Move several functions from internal_test.go to the test files
associated with what the tests are checking
This commit moves all code related to standard scripts into a separate
file named standard.go as well as the associated tests into
standard_test.go. Since the code in address.go and address_test.go is
only related to standard scripts, it has been combined into the new
files and the old files deleted.
The intent here is to make it clear that the code in standard.go is not
related to consensus.
This commit implements a new type, named scriptNum, for handling all
numeric values used in scripts and converts the code over to make use of
it. This is being done for a few of reasons.
First, the consensus rules for handling numeric values in the scripts
require special handling with subtle semantics. By encapsulating those
details into a type specifically dedicated to that purpose, it
simplifies the code and generally helps prevent improper usage.
Second, the new type is quite a bit more efficient than big.Ints which
are designed to be arbitrarily large and thus involve a lot of heap
allocations and additional multi-precision bookkeeping. Because this
new type is based on an int64, it allows the numbers to be stack
allocated thereby eliminating a lot of GC and also eliminates the extra
multi-precision arithmetic bookkeeping.
The use of an int64 is possible because the consensus rules dictate that
when data is interpreted as a number, it is limited to an int32 even
though results outside of this range are allowed so long as they are not
interpreted as integers again themselves. Thus, the maximum possible
result comes from multiplying a max int32 by itself which safely fits
into an int64 and can then still appropriately provide the serialization
of the larger number as required by consensus.
Finally, it more closely resembles the implementation used by Bitcoin
Core and thus makes is easier to compare the behavior between the two
implementations.
This commit also includes a full suite of tests with 100% coverage of
the semantics of the new type.
This commit contains a lot of cleanup on the txscript code to make it
more consistent with the code throughout the rest of the project. It
doesn't change any operational logic.
The following is an overview of the changes:
- Add a significant number of comments throughout in order to better
explain what the code is doing
- Fix several comment typos
- Move a couple of constants only used by the engine to engine.go
- Move a variable only used by the engine to engine.go
- Fix a couple of format specifiers in the test prints
- Reorder functions so they're defined before/closer to use
- Make the code lint clean with the exception of the opcode definitions
- Remove all redundant opcode tests in favor of the JSON-based tests
in the data directory.
- Remove duplicate stack nip test
- Add new tests to data/script_invalid.json to exercise additional
negative error paths
- Remove old unneeded pubkey trace code from opcodeCheckSig
- Simplify and improve the disassembly print function
- Add new tests to directly test all individual opcode disassembly
- Add new tests to directly test opcode disabled function which does not
get invoked during ordinary execution
- Improve test coverage of opcode.go
This commit moves the opcode execution logic from the opcode type to the
engine type because execution of an opcode modifies the engine state
(primarily the main and alternate data stacks) as opposed to the state
of the opcode. Making the engine the receiver more clearly indicates
this fact.
This commit very slightly optimizes the cryptographic hashing performed
by the script opcodes by calling the hash sum routines directly (for
those that support it) rather than allocating a new generic hash.Hash
hasher instance for them.
This commit unexports the Stack type since it is only intended to be
used internally during script execution. Further, the engine exposes
the {G,S}etStack and {G,S}etAltStack functions which return the items as
a slice of byte slices ([][]byte) for caller access while stepping.
This commit improves the way the conditional execution stack is handled in
a few ways.
First, the current execution state is now pushed onto the end of the slice
rather than the front of it. This has been done because it results in
fewer allocations and is therefore more efficient.
Second, the need for allocating and setting an initial true in the
conditional stack has been eliminated. The vast majority of scripts don't
contain any conditionals, so there is no reason to allocate a slice when
it isn't needed.
Third, a new function has been added to the engine to determine if the
current conditional branch is executing named isBranchExecuting which
handles the fact the conditional execution stack can now be empty and
improves the readability of the code.
Finally, it removes a couple of TODOs which I have verified do not apply.
This commit exports a new map named OpcodeByName which can be used to
lookup an opcode value given a human-readable opcode name.
It also modifies the test function which does short form parsing to use
the new map instead of the internal array.
Closes#267.
This commit converts the opcode map to an array to improve performance.
Benchmark of executing a standard p2pk transaction:
New: BenchmarkExecute 2000 784349 ns/op
Old: BenchmarkExecute 2000 792600 ns/op
The time is dominated by the signature checking as expected, however there
is still an increase in speed.
This commit modifies the definition of the opcodes to their hex
counterparts rather than decimal since it is far more common to see
scripts in hex. This makes it easier when manually looking at script
dumps to correlate opcodes. However, since there are also cases where it
is useful to see the decimal value of the opcode, the decimal value has
been left as a comment. Obviously converting the numbers is trivial, but
it is handy when looking at the opcode definitions to already have it
there.
In addition, it syncs the opcodes with the latest Bitcoin Core internal
opcodes for completeness and modifies the tests accordingly.
Rather than storing a separate bool for whether or not each flag is set in
every script engine instance, store the flags and check if the relevant
flag is set from each specific location.
This reduces the memory needed by each script engine instance and means
future flags will not require new fields.
This commit renames the Script type to Engine to better reflect its
purpose. It also renames the NewScript function to NewEngine to match.
This is being done because name Script for the engine is confusing since
it implies it is an actual script rather than the execution environment
for the script. It also paves the way for eventually supplying a
ParsedScript type which will be less likely to be confused with the
execution environment.
While moving the code, some additional variable names and comments have
been updated to better match the style used throughout the rest of the
code base. In addition, an attempt has been made to use consistent naming
of the engine as 'vm' instead of using different variables names as it was
previously.
Finally, the relevant engine code has been moved into a new file named
engine.go and related tests moved to engine_test.go.
This commit separates the test functions and associated helper functions
which are used to execute the reference transaction and script tests from
Bitcoin Core into a separate file named reference_test.go.
Also, add a few comments and fix a couple of typos along the way.
This commit removes the unnecessary sigScript parameter from the
txscript.NewScript function. This has bothered me for a while because it
can and really should be obtained from the provided transaction and input
index. The way it was, the passed script could technically be different
than what is in the transaction. Obviously that would be an improper use
of the API, but it's safer and more convenient to simply pull it from the
provided transaction and index.
Also, since the function signature is changing anyways, make the input
index parameter come after the transaction which it references.