Overview
Bitcoin Core includes an internal benchmarking framework for measuring performance of critical components. Benchmarks cover:
- Cryptographic algorithms (SHA1, SHA256, SHA512, RIPEMD160, Poly1305, ChaCha20)
- Rolling bloom filter
- Coin selection
- Thread queue
- Wallet balance
- And more
Building the Benchmark Binary
To compile the benchmark binary:
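A minimal sketch, assuming a current CMake-based tree (the BUILD_BENCH option name and the binary location may differ across versions; older releases used autotools with --enable-bench):

```sh
# Configure with the benchmark binary enabled
cmake -B build -DBUILD_BENCH=ON

# Build everything, including bench_bitcoin
cmake --build build
```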
The bench runner will warn if you configure with -DCMAKE_BUILD_TYPE=Debug. Debug builds disable compiler optimizations and enable extra logging and lock analysis, so consider whether benchmark results from such a build will reflect real performance.

Running Benchmarks
Execute all benchmarks:
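A sketch, assuming the binary was placed under build/bin/ (the exact path depends on the build layout of your version):

```sh
build/bin/bench_bitcoin
```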
Example Output

The columns in the output are:
- ns/op: Nanoseconds per operation
- op/s: Operations per second
- err%: Error percentage (variance)
- total: Total time in seconds
- ns/byte: Nanoseconds per byte (for data processing benchmarks)
- byte/s: Bytes per second throughput
Benchmark Options
View all available options (a command sketch follows this list). The options cover:
- Listing benchmarks without running them
- Using regex filters to run specific benchmarks
- Controlling number of iterations
- Adjusting time limits
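A sketch of how to print the option list, assuming the build/bin/ path used above:

```sh
# Print the runner's full help text
build/bin/bench_bitcoin -h

# Options documented by current versions include:
#   -list                     list benchmarks without running them
#   -filter=<regex>           run only benchmarks matching the regex
#   -min-time=<milliseconds>  minimum measurement time per benchmark
```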
Filter Benchmarks
Run specific benchmarks using regex patterns:
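A sketch; the SHA256 pattern is illustrative (use -list, shown below, to find real benchmark names):

```sh
# Run only benchmarks whose names match the regular expression
build/bin/bench_bitcoin -filter='SHA256.*'
```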
List Benchmarks

See all available benchmarks without running them:
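A sketch using the same assumed path:

```sh
# Print benchmark names without executing them
build/bin/bench_bitcoin -list
```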
What to Benchmark

Benchmarks should focus on performance-critical components where degradation has a high cost.

Initial Block Download (IBD)
Cost: Slow IBD makes full node operation less accessible.

Benchmark candidates:
- Block validation
- Script verification
- Signature checking
- Database operations
- UTXO set updates
Block Template Creation
Cost: Slow template creation may reduce fee revenue for miners.

Benchmark candidates:
- Transaction selection algorithms
- Ancestor/descendant calculation
- Fee estimation
- Mempool operations
Block Propagation
Cost: Slow propagation may increase orphaned blocks and mining centralization.

Benchmark candidates:
- Compact block encoding/decoding
- Block serialization
- Network message processing
Best Practices
When to Add Benchmarks
Benchmarks are appropriate for:
- Core consensus code (validation, signatures)
- Frequently called functions
- Data structure operations on large datasets
- Cryptographic primitives
- Database operations
- Network protocol processing
When NOT to Use Benchmarks
Benchmarks are ill-suited for testing denial-of-service issues as they use restricted input sets (introducing bias). Use fuzz tests instead, which explore the full input space.
Performance Improvements
A performance improvement may be rejected if:
- Clear end-to-end performance gain cannot be demonstrated
- Code bloat is too high relative to the improvement
- Review/maintenance burden outweighs the benefit
- The benchmark doesn’t reflect real-world usage patterns
Writing Good Benchmarks
- Isolate what you’re measuring: Minimize setup/teardown in the benchmark loop
- Use realistic data: Benchmark with representative inputs
- Avoid optimization artifacts: Ensure the compiler doesn’t optimize away the code (see the sketch after this list)
- Measure what matters: Focus on real bottlenecks, not micro-optimizations
- Consider cache effects: Benchmark with cold and warm caches
- Document assumptions: Note any special conditions or configurations
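The internal framework is built on the nanobench library, which provides a helper for the optimization-artifact pitfall above. A minimal sketch (ExpensiveComputation is a hypothetical stand-in for the code under test):

```cpp
#include <bench/nanobench.h>

// Hypothetical stand-in for the code being measured.
static int ExpensiveComputation(int x) { return x * x + x; }

static void LoopBody(int x)
{
    // If the result were simply discarded, the compiler could delete the
    // call entirely; doNotOptimizeAway forces the value to be materialized,
    // so the work actually happens inside the measured loop.
    ankerl::nanobench::doNotOptimizeAway(ExpensiveComputation(x));
}
```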
Example Benchmark
Here’s the structure of a typical benchmark:
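A sketch of the pattern used by files under src/bench/ (the benchmark body and name here are illustrative; the BENCHMARK registration macro and priority levels follow current versions of the framework):

```cpp
#include <bench/bench.h>

#include <numeric>
#include <vector>

// Illustrative benchmark: measure the cost of summing a large vector.
static void ExampleVectorSum(benchmark::Bench& bench)
{
    // Setup runs once, outside the measured loop.
    std::vector<int> data(100'000, 1);

    // Only the lambda body is timed; the framework runs it repeatedly
    // until it has collected stable statistics.
    bench.run([&] {
        int sum = std::accumulate(data.begin(), data.end(), 0);
        ankerl::nanobench::doNotOptimizeAway(sum);
    });
}

// Register the benchmark with the runner and assign a priority level.
BENCHMARK(ExampleVectorSum, benchmark::PriorityLevel::HIGH);
```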
Advanced Benchmarking

For more comprehensive performance analysis beyond the internal framework:

Benchcoin
For in-depth performance monitoring of operations like reindex or IBD:

Repository: bitcoin-dev-tools/benchcoin

Benchcoin provides:
- Full IBD timing
- Reindex performance
- Memory usage tracking
- CPU profiling integration
- Historical performance comparison
System-Level Profiling
Complement benchmarks with system profiling tools (a perf sketch follows this list):
- perf: CPU profiling on Linux (see Developer Notes)
- Valgrind: Memory profiling and cache analysis
- gprof: Function-level profiling
- Hotspot: Visual profiling (Linux)
- Instruments: Profiling on macOS
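For example, perf can profile a single filtered benchmark run (a sketch; paths and the filter pattern follow the earlier assumptions):

```sh
# Record call-graph samples while the filtered benchmarks run
perf record -g build/bin/bench_bitcoin -filter='SHA256.*'

# Browse the hottest functions afterwards
perf report
```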
Interpreting Results
Statistical Significance
The err% column shows measurement variance. Lower is better:
- < 1%: Very stable, reliable results
- 1-5%: Good, results are trustworthy
- 5-10%: Acceptable, but consider running more iterations (see the sketch after this list)
- > 10%: High variance, results may be unreliable
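One way to tighten a noisy err% is to give each benchmark more measurement time, as sketched below (the -min-time value is in milliseconds):

```sh
# Spend at least 5 seconds measuring each selected benchmark
build/bin/bench_bitcoin -filter='SHA256.*' -min-time=5000
```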
Comparing Results
When comparing benchmark runs:
- Control variables: Use the same hardware, OS, and build configuration
- Multiple runs: Run benchmarks several times to account for variance
- System state: Close unnecessary programs, ensure consistent CPU frequency
- Warm-up: First run may be slower due to cold caches
- Statistical analysis: Use median or mean with confidence intervals
Red Flags
- Results varying wildly between runs (high err%)
- Unexpectedly fast results (compiler may have optimized away code)
- Results not matching real-world performance observations
- Benchmark spending most time in setup/teardown
Continuous Integration
Benchmarks in CI:
- Compile benchmarks to ensure they build correctly
- Run smoke tests to verify they execute without crashing
- Compare against baseline to detect regressions (in some projects)