FABRIC Testbed (HyperLedgerLab)
- FABRIC Testbed, also known as HyperLedgerLab, is a fully automated, Kubernetes-based infrastructure for benchmarking and analyzing transaction failures in Hyperledger Fabric networks.
- It combines realistic workloads, detailed instrumentation, and a formal failure taxonomy to measure performance, transaction reliability, and concurrency control under diverse configurations.
- The testbed supports comparative experiments on advanced optimizations like Fabric++, Streamchain, and FabricSharp, guiding adaptive tuning and optimal system configurations.
The FABRIC Testbed, known as HyperLedgerLab, is a comprehensive, fully automated Kubernetes-based benchmarking infrastructure for studying transaction failures in Hyperledger Fabric 1.4.x networks. By deploying realistic workloads, supporting detailed instrumentation, and offering formal taxonomies of failure types, HyperLedgerLab enables rigorous, repeatable experiments on permissioned blockchain performance, transaction reliability, and concurrency control. The testbed incorporates standard and synthetic chaincodes, scales from default to large clusters, and captures the nuanced interplay of block sizes, endorsement policies, key distributions, and network conditions. HyperLedgerLab has yielded foundational empirical findings, practical operational guidelines, and has evaluated recent advanced optimizations—Fabric++, Streamchain, and FabricSharp—providing comparative analysis and pathways for further research (Chacko et al., 2021).
1. HyperLedgerLab Architecture and Key Operational Flow
HyperLedgerLab integrates cloud infrastructure provisioning (OpenStack or any cloud API), automated Kubernetes cluster setup, and deployment of Fabric 1.4 network components as pods, including peers, orderers, Certificate Authorities, and CouchDB/LevelDB state databases. Client-side workloads are generated via Caliper adapters (Node.js SDK), with custom Caliper extensions tracing precise failure causes.
The Fabric transaction lifecycle within HyperLedgerLab is structured by the Execute–Order–Validate (E–O–V) paradigm:
- Execution: Clients submit transaction proposals (chaincode name, function, arguments) to endorsing peers, which execute chaincode against their local world-state, yielding read-set and write-set artifacts, and sign the result as endorsements.
- Ordering: The client assembles the collected endorsements (satisfying the policy threshold, e.g., all organizations, any 2 of N organizations, or a quorum) and submits the signed envelope to the ordering service. Ordering nodes (using Raft or Kafka consensus) batch transactions into blocks based on transaction-count, timeout, or byte-size criteria and broadcast the blocks to all peers.
- Validation & Commit: Each peer verifies that endorsements satisfy the endorsement policy (VSCC stage) and executes an MVCC check: for each key in the read-set, the local world-state version must match the recorded version. Transactions passing both checks commit their write-set and update key versions; otherwise, they are marked aborted. The block, annotated with commit/abort metadata, is appended to the ledger.
Key architectural variables include the number of organizations, peers per org, orderers, client processes, block size, block timeout, state database selection, endorsement policy configuration, transaction arrival rates, and synthetic workload distribution generators.
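As a rough illustration of how these variables form a single experiment descriptor, the sketch below groups them into a Go struct. The field names and types are hypothetical and do not reflect HyperLedgerLab's actual configuration files.

```go
package main

// ExperimentConfig is an illustrative grouping of the tunable parameters
// listed above; names, types, and defaults are assumptions for this sketch.
type ExperimentConfig struct {
	Organizations     int     // number of organizations in the network
	PeersPerOrg       int     // endorsing/committing peers per organization
	Orderers          int     // ordering service nodes (Raft/Kafka)
	ClientProcesses   int     // Caliper client workers
	BlockSizeTxs      int     // maximum transactions per block
	BlockTimeoutSec   float64 // cut a block after this timeout even if not full
	StateDB           string  // "LevelDB" or "CouchDB"
	EndorsementPolicy string  // e.g. "all orgs" or "any 2 of N orgs"
	ArrivalRateTPS    int     // client transaction submission rate
	ZipfSkew          float64 // key-access skew for synthetic workloads
}
```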
2. Formal Taxonomy of Transaction Failure Types
HyperLedgerLab formally defines transaction failures in Fabric, enabling reproducible tracking and benchmarking. It employs the following notation:
- $P$ = set of endorsing peers
- $T$ = set of transactions
- $B$ = set of blocks
- $rs_p(t)$ = read-set produced by peer $p \in P$ for transaction $t \in T$
- $ws_p(t)$ = write-set produced by peer $p$ for transaction $t$
- $W$ = world-state key-version set
Three principal failure types are delineated:
- Endorsement Policy Failure: Occurs if two endorsing peers disagree on any read-set version for the same key; formally, for a transaction $t$ there exist $p_1, p_2 \in P$ and a key $k$ such that $rs_{p_1}(t)[k] \neq rs_{p_2}(t)[k]$.
- MVCC Read Conflict: At validation, transaction $t$ fails if, for some key $k$ in its read-set, the recorded version $rs(t)[k]$ does not match the world-state version $W[k]$. The conflict is intra-block if the conflicting write came from an earlier transaction in the same block, and inter-block if it came from a previously committed block.
- Phantom Read Conflict: For range queries over an interval $[k_1, k_2]$, the validator aborts $t$ if re-scanning the range at validation reveals insertions, deletions, or version changes relative to the scan performed at endorsement.
The taxonomy further differentiates intra-block from inter-block MVCC conflicts and formally links range-scan validation to phantom detection.
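The intra-/inter-block distinction can be illustrated by replaying a block in commit order and checking each transaction's read-set against keys written either earlier in the same block or in previously committed blocks. The Go sketch below does exactly that; it reuses the simplified version/read-set/write-set types from the earlier sketch and is not Fabric's actual validation pipeline.

```go
package main

import "fmt"

type Version struct{ BlockNum, TxNum uint64 }
type ReadSet map[string]Version
type WriteSet map[string]struct{}

// Tx is a simplified transaction carrying its endorsement-time read and write sets.
type Tx struct {
	ID     string
	Reads  ReadSet
	Writes WriteSet
}

type FailureType int

const (
	Committed      FailureType = iota
	IntraBlockMVCC             // conflicting write came earlier in the same block
	InterBlockMVCC             // conflicting write was committed in a prior block
)

// classifyBlock validates transactions in block order against the committed
// world state, tracking keys already written by valid transactions in this block.
func classifyBlock(blockNum uint64, txs []Tx, ws map[string]Version) map[string]FailureType {
	result := make(map[string]FailureType)
	writtenInBlock := make(map[string]bool)

	for txNum, tx := range txs {
		status := Committed
		for key, readVer := range tx.Reads {
			if writtenInBlock[key] {
				status = IntraBlockMVCC
				break
			}
			if ws[key] != readVer {
				status = InterBlockMVCC
				break
			}
		}
		result[tx.ID] = status
		if status == Committed {
			for key := range tx.Writes {
				writtenInBlock[key] = true
				ws[key] = Version{BlockNum: blockNum, TxNum: uint64(txNum)}
			}
		}
	}
	return result
}

func main() {
	ws := map[string]Version{"voter1": {BlockNum: 3, TxNum: 0}}
	txs := []Tx{
		{ID: "t1", Reads: ReadSet{"voter1": {3, 0}}, Writes: WriteSet{"voter1": {}}},
		{ID: "t2", Reads: ReadSet{"voter1": {3, 0}}}, // reads the key t1 just wrote
	}
	// t1 commits (0); t2 is an intra-block MVCC conflict (1).
	fmt.Println(classifyBlock(4, txs, ws))
}
```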
3. Chaincode Workloads and Synthetic Load Generation
HyperLedgerLab provides four canonical, realistic chaincodes written in Go, supplemented by a configurable chaincode/workload generator that supports fine-tuned experimentation:
- EHR (Electronic Health Records): 100 patient profiles plus 100 EHR documents; functions include access management and profile/EHR updates, typically driving one read and one write per access grant.
- DV (Digital Voting): 1,000 voters across 12 parties; functions encompass voting, election management, querying parties/results, with range-read operations for party queries.
- SCM (Supply Chain Management): 5 LSPs, 2,400 logistics units; functions model ASN and shipping workflows with substantial range-read usage.
- DRM (Digital Rights Management): 200 artwork metadata records and 200 right-holder IDs; includes creation, play count increments, rights queries, and revenue calculation via range scans.
The synthetic chaincode/workload generator allows specification of up to 100,000 keys, configurable function types (read/insert/update/delete/scan), key distributions (Zipfian or uniform), transaction type distributions (read-heavy, update-heavy, insert-heavy, delete-heavy, range-heavy), range-scan counts, and a configurable Zipfian skew parameter to modulate conflict rates.
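A minimal sketch of Zipf-skewed key selection for such a synthetic workload is shown below, using Go's standard `math/rand` Zipf generator. The operation mix, key count, and skew value are arbitrary examples, not HyperLedgerLab's generator defaults.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Op is one synthetic chaincode invocation against a numbered key.
type Op struct {
	Kind string // "read", "insert", "update", "delete", or "scan"
	Key  string
}

// generateOps draws n operations with Zipf-skewed key access.
// skew must be > 1; larger values concentrate accesses on a few hot keys.
// The operation mix below is an arbitrary update-heavy example.
func generateOps(n int, numKeys uint64, skew float64, seed int64) []Op {
	r := rand.New(rand.NewSource(seed))
	zipf := rand.NewZipf(r, skew, 1, numKeys-1)

	kinds := []string{"update", "update", "update", "read", "scan"} // ~60% updates
	ops := make([]Op, n)
	for i := range ops {
		ops[i] = Op{
			Kind: kinds[r.Intn(len(kinds))],
			Key:  fmt.Sprintf("key%06d", zipf.Uint64()),
		}
	}
	return ops
}

func main() {
	for _, op := range generateOps(5, 100000, 1.2, 42) {
		fmt.Println(op.Kind, op.Key)
	}
}
```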
4. Experimental Methodology and Metrics
HyperLedgerLab operates two reference clusters:
| Cluster | Worker nodes | Peers | Orderers | Clients | Peak TPS |
|---|---|---|---|---|---|
| C1 | 3 | 4 per org | 3 | 5 | ~200 |
| C2 | 32 | 8 orgs × 4 peers | 3 | 25 | ~200 |
Parameter sweeps cover Fabric version (1.4.x), state DB (LevelDB or CouchDB), block size (10–500 txs), block timeout (2 sec), endorsement policies, number of orgs, tx arrival rates (10–200 tps), workload mixes, key skew, and injected network delay (100±10 ms).
Instrumentation collects:
- Total txs submitted/committed/aborted, with percentages for each failure type
- Endorsement policy failures, intra-block/inter-block MVCC conflicts, phantom reads
- End-to-end latency (execution, ordering, validation)
- Committed throughput (number of committed txs per unit time)
These metrics enable granular mapping of configuration, workload, and environmental factors to observed transactional reliability and system performance.
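A hedged sketch of how these counters can be rolled up into per-type failure rates and committed throughput is shown below; the `RunMetrics` struct and its field names are illustrative, not Caliper's or HyperLedgerLab's actual reporting schema.

```go
package main

import "fmt"

// RunMetrics collects the per-run counters described above; field names are illustrative.
type RunMetrics struct {
	Submitted            int
	Committed            int
	EndorsementFailures  int
	IntraBlockMVCC       int
	InterBlockMVCC       int
	PhantomReadConflicts int
	AvgLatencyMs         float64
}

// FailureRates returns each failure type as a percentage of submitted transactions.
func (m RunMetrics) FailureRates() map[string]float64 {
	pct := func(n int) float64 { return 100 * float64(n) / float64(m.Submitted) }
	return map[string]float64{
		"endorsement":      pct(m.EndorsementFailures),
		"mvcc_intra_block": pct(m.IntraBlockMVCC),
		"mvcc_inter_block": pct(m.InterBlockMVCC),
		"phantom_read":     pct(m.PhantomReadConflicts),
		"total_aborted":    pct(m.Submitted - m.Committed),
	}
}

// CommittedThroughputTPS divides committed transactions by the run duration.
func (m RunMetrics) CommittedThroughputTPS(durationSec float64) float64 {
	return float64(m.Committed) / durationSec
}

func main() {
	m := RunMetrics{Submitted: 10000, Committed: 7500, EndorsementFailures: 500,
		IntraBlockMVCC: 1200, InterBlockMVCC: 600, PhantomReadConflicts: 200, AvgLatencyMs: 310}
	fmt.Println(m.FailureRates(), m.CommittedThroughputTPS(100)) // 75 tps over a 100 s run
}
```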
5. Empirical Findings: Failure Modes, Performance, and Optimizations
Key results from extensive HyperLedgerLab experimentation reveal:
- Block Size Tuning: Optimal block size scales roughly linearly with arrival rate; tuning can reduce aggregate transaction failures by up to 60%.
- MVCC Dynamics: The intra-block MVCC failure rate increases with block size while the inter-block rate decreases; overall MVCC failures are minimized at an intermediate block size.
- State Database Selection: LevelDB delivers lower latency than CouchDB because it is embedded directly in the peer process rather than accessed over a REST API; CouchDB range scans are orders of magnitude slower per query than LevelDB's.
- Organizations and Endorsement Policy: Increasing the number of organizations and using more complex endorsement policies result in 30–50% more endorsement failures and increased validation latency.
- Workload Mix Impact: Insert-heavy and delete-heavy workloads also generate conflicts; update-heavy workloads can trigger up to 40% MVCC failures under worst-case access patterns; reads and range-reads fail predominantly when racing with concurrent writes (10–20%).
- Key Distribution: Zipfian key-access skew substantially multiplies conflict rates compared to uniform access.
- Network Delay: An injected 100 ms RTT increases both endorsement failures and MVCC conflicts by 20–30%.
Advanced optimizations have distinct effects:
- Fabric++ (intra-block reordering, early abort): Reduces failures by 20–30% under mixed workloads and is best applied to large blocks; effectiveness diminishes when reordering opportunities are scarce.
- Streamchain (streaming, RAM-disk): Reduces MVCC failures by up to 50% under low loads (100 tps), but throughput degrades for higher rates or without RAM-disk.
- FabricSharp (global conflict graph, early abort): Eliminates on-chain MVCC failures, at the cost of lowered throughput and modest increase in endorsement failures due to stale snapshots; not applicable to workloads requiring range scans.
Comparison (EHR chaincode, 100 tps, block size=10, C1 cluster):
| Variant | Failure Rate | Latency | Throughput |
|---|---|---|---|
| Fabric 1.4 | ~25% | ~300 ms | ~180 tps |
| Fabric++ | ~15% | ~320 ms | ~175 tps |
| Streamchain | ~12% | ~80 ms | ~160 tps |
| FabricSharp | ~8% (endorsement only) | ~310 ms | ~150 tps |
6. Operational Guidelines and Lessons Learned
The evidence supports specific best practices:
- Adaptive Block Sizing: Continuously monitoring transaction arrival rates and chaincode-induced conflict profiles enables dynamic adjustment of block size to minimize MVCC failure rates (a heuristic sketch follows this list).
- Minimizing Endorsement Failures: Limiting the number of organizations, co-locating endorsers, and simplifying endorsement policies reduce failure rates and latency.
- State DB Selection: LevelDB is preferred unless rich queries or extensive range scans are essential; CouchDB incurs much higher latency and inhibits phantom detection.
- Chaincode/Keyspace Engineering: Sharding composite keys and minimizing range queries (favoring rollup counters or off-chain read replication) mitigate “hot” key contention and reduce conflicts.
- Client-Side Optimization: Avoid routing pure read-only transactions through ordering unless an audit trail is required; batch read-only queries or utilize off-chain replicas.
- Optimization Selection: Fabric++ suits medium block sizes and moderate MVCC rates; Streamchain should only be used at low loads and with a RAM-disk; FabricSharp fits workloads that require zero on-chain MVCC failures, can tolerate reduced throughput, and do not use range scans.
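The adaptive block sizing guideline above can be sketched as a simple feedback heuristic: since intra-block MVCC conflicts grow with block size and inter-block conflicts shrink with it, a tuner can move the block size toward whichever side currently shows fewer failures. The Go sketch below is an assumption-laden illustration (step size, bounds, and the measurement windows are arbitrary), not a published algorithm.

```go
package main

import "fmt"

// BlockSizeTuner adjusts the max-transactions-per-block setting between
// measurement windows based on which MVCC conflict type dominates.
// All constants here are illustrative, not values from the paper.
type BlockSizeTuner struct {
	Size, Min, Max, Step int
}

// Next returns the block size to use for the following window.
// Intra-block conflicts grow with block size and inter-block conflicts
// shrink with it, so the tuner steps toward the side with fewer failures.
func (t *BlockSizeTuner) Next(intraRate, interRate float64) int {
	switch {
	case intraRate > interRate && t.Size-t.Step >= t.Min:
		t.Size -= t.Step // too many same-block conflicts: cut smaller blocks
	case interRate > intraRate && t.Size+t.Step <= t.Max:
		t.Size += t.Step // too many cross-block conflicts: batch more per block
	}
	return t.Size
}

func main() {
	tuner := &BlockSizeTuner{Size: 50, Min: 10, Max: 500, Step: 10}
	// Feed per-window failure rates (e.g. from the RunMetrics sketch above).
	fmt.Println(tuner.Next(0.12, 0.04)) // intra dominates -> shrink to 40
	fmt.Println(tuner.Next(0.03, 0.09)) // inter dominates -> grow back to 50
}
```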
7. Research Directions and Prospects
The HyperLedgerLab results motivate extensions and novel investigations:
- Adaptive Block Sizing Algorithms: Propose algorithms that tune block sizes reactively utilizing live transactional conflict metrics.
- High-Performance Rich Query Engines: Identify or develop alternatives to CouchDB, targeting efficient JSON queries and lower latency, while enabling phantom detection.
- World-State Synchronization: New protocols to synchronize endorsing peer world-states, decreasing endorsement policy failures arising from data inconsistencies.
- Automated Chaincode Optimization: Analytical tools to restructure keyspaces and range queries, thereby lowering conflict rates and maximizing throughput.
- Hybrid Concurrency Control Techniques: Combine optimistic reordering and selective pessimistic locking to improve transactional reliability and reduce abort rates.
A plausible implication is that fine-grained, adaptive control over network, workload, and policy parameters is essential for scalable, failure-resilient Fabric deployments. HyperLedgerLab's rigorous, reproducible methodology sets a precedent for empirical blockchain systems research (Chacko et al., 2021).