RosettaStone 2.0 Benchmarking Backplane

Updated 31 January 2026

RosettaStone 2.0 is an open-source benchmarking backplane offering a reproducible RTL-to-GDS flow for both conventional 2D and Pin-3D F2F designs.
It integrates co-versioned tools, standardized METRICS2.1 reporting, and CI-based regression to ensure transparent and fair comparisons of key PPA metrics.
The framework fosters community trust with a governance model based on GitHub pull requests, DCO compliance, and a branch-based leaderboard for reproducible results.

RosettaStone 2.0 is an open-source benchmarking “backplane” designed to enable sustainable, transparent, and reproducible benchmarking for academic VLSI physical-design (PD) research, with support for both conventional planar (2D) designs and Pin-3D-style face-to-face (F2F) hybrid-bonded 3D designs. Built atop the OpenROAD-Research and OpenROAD-flow-scripts (ORFS)-Research repositories, it delivers fully co-versioned RTL-to-GDS reference flows, continuous-integration (CI) regression testing, standardized reporting via the METRICS2.1 convention, and a community-facing governance model anchored in verifiable GitHub pull requests and Developer Certificate of Origin (DCO) compliance (Jiang et al., 24 Jan 2026).

1. Goals, Motivations, and Scope

Academic physical design (PD) research has faced long-standing obstacles due to fragmented flows, unavailable or inconsistent PDK enablements (e.g., Liberty files, LEFs, parasitics), and a lack of standardized evaluation protocols. These obstacles have prevented rigorous apples-to-apples comparisons across both designs and integration styles, particularly for the emerging Pin-3D F2F 3D integration class, for which no reference flow existed. RosettaStone 2.0 addresses these gaps by:

Co-versioning tools, benchmarks, and evaluation scripts within the OpenROAD-Research/ORFS-Research environment, ensuring all flow or PDK updates automatically trigger regression and metric validation.
Providing a fully open RTL-to-GDS pipeline for Pin-3D F2F designs, with explicit modeling of hybrid bonding terminals (HBTs) as vias, tier-aware standard-cell libraries, and stage-wise checkpoints for comparable 2D/3D studies.
Standardizing reporting and metric definitions using the METRICS2.1 schema, reducing ambiguity in area, power, timing, and wirelength comparisons.
Enforcing transparent, reproducible results through a community-managed leaderboard, with posts validated via CI and DCO-signed contributions.

2. System Architecture and RTL-to-GDS Workflow

RosettaStone 2.0 instantiates parallel RTL-to-GDS flows for both planar and Pin-3D F2F designs, sharing the ORFS-Research scripting infrastructure. Each flow is divided into six major stages:

Stage	2D Flow	Pin-3D F2F Flow
1	Synthesis & 2D abstraction: Input RTL, output gate-level netlist	Identical to 2D
2	Floorplanning	Floorplanning & timing-driven bipartitioning with TritonPart, partitioning via UBfactor sweep (bootstrapping for min cutsize); remap to tier-specific LEFs, apply COVER views
3	Iterative placement & legalization	Alternating tier placement, using "Restricted" (cell masters only on active tier, COVER on inactive) or "Flexible" (both masters available, with tier-legalization) strategies
4	Clock Tree Synthesis	CTS built on one tier and sinks connected via HBTs (homogeneous: bottom, heterogeneous: top)
5	Global/detailed routing & parasitic extraction	Same steps, with HBTs modeled as cut-layer vias; SPEF parasitics include inter-tier R/C
6	Metrics/reporting	ORFS-Research generates METRICS2.1 JSON logs; reports from COMM (commercial) flows are parsed into the same schema

Key enabling mechanisms for 3D flows include unified technology abstraction (HBTs as special cut-layer vias), tier-aware standard-cell libraries, per-tier reconstructed PDNs, and COVER LEF views that hide the inactive tier.
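
As a rough illustration of the stage-2 UBfactor sweep, the toy sketch below brute-forces a bipartition of a tiny hypergraph under several balance bounds and reports the minimum cut size for each. It does not call TritonPart. The function names, the example netlist, and the hMETIS-style reading of UBfactor (each tier holds between (50 - UB)% and (50 + UB)% of the total cell weight) are assumptions made for illustration only.

```python
from itertools import product


def cut_size(hyperedges, side):
    """Count hyperedges whose cells end up on both tiers."""
    return sum(1 for edge in hyperedges if len({side[v] for v in edge}) > 1)


def best_bipartition(weights, hyperedges, ub_factor):
    """Exhaustively find a min-cut bipartition under the balance bound.

    Assumed convention: tier 0 must hold between (50 - ub_factor)% and
    (50 + ub_factor)% of the total weight (tier 1 then satisfies the same
    bound by symmetry). Returns (cut, side) or None if nothing is feasible.
    """
    total = sum(weights)
    lo = total * (50 - ub_factor) / 100
    hi = total * (50 + ub_factor) / 100
    best = None
    for side in product((0, 1), repeat=len(weights)):
        tier0_weight = sum(w for w, s in zip(weights, side) if s == 0)
        if not lo <= tier0_weight <= hi:
            continue
        cut = cut_size(hyperedges, side)
        if best is None or cut < best[0]:
            best = (cut, side)
    return best


def sweep_ub(weights, hyperedges, ub_factors):
    """Run the exhaustive partitioner once per UBfactor value."""
    return {ub: best_bipartition(weights, hyperedges, ub) for ub in ub_factors}


# Hypothetical 6-cell netlist: cells 0-3 form a tight cluster, cells 4-5 a pair.
WEIGHTS = [1, 1, 1, 1, 1, 1]
NETS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

if __name__ == "__main__":
    for ub, (cut, side) in sweep_ub(WEIGHTS, NETS, [5, 20]).items():
        print(f"UBfactor={ub}: cutsize={cut}, tiers={side}")
```

In this toy case the looser bound lets the tight cluster stay on one tier, so fewer nets cross tiers; a real sweep would weigh that reduction against the resulting tier area imbalance.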

3. Integration, Continuous Integration, and Leaderboard Governance

RosettaStone 2.0 is distributed as an ORFS-Research submodule, deeply integrated with CI-based quality assurance. Each commit to the main ORFS-Research branch triggers a GitHub Actions workflow that:

Pins and checks out consistent OpenROAD-Research/RosettaStone 2.0 versions.
Runs both ORD (open) and COMM (commercial) flows on a suite of benchmarks.
Validates that all METRICS2.1 reports meet schema completeness and do not regress beyond fixed thresholds.
Publishes results and dashboards as artifacts for community review.
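
The threshold check in the third step might look roughly like the sketch below, which compares a candidate report against a stored baseline. The function name, the flat metric layout, and the tolerance values are hypothetical; the text does not state the actual CI limits.

```python
def find_regressions(baseline, candidate, rules):
    """Return {metric: (baseline_value, candidate_value)} for every metric
    that worsened by more than its allowed absolute tolerance.

    rules maps metric name -> (higher_is_better, abs_tolerance).
    """
    regressions = {}
    for metric, (higher_is_better, tol) in rules.items():
        base, new = baseline[metric], candidate[metric]
        # Positive value means the candidate is worse than the baseline.
        worsened_by = (base - new) if higher_is_better else (new - base)
        if worsened_by > tol:
            regressions[metric] = (base, new)
    return regressions


# Hypothetical thresholds, chosen only to make the example concrete.
RULES = {
    "WNS_ns": (True, 0.010),         # slack: higher is better
    "wirelength_mm": (False, 5.0),   # wirelength: lower is better
    "DRVs": (False, 0),              # no additional violations allowed
}
```

A CI job built this way would fail whenever `find_regressions` returns a non-empty dictionary and print the offending metrics alongside their baseline values.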

The leaderboard operates as a dedicated branch (e.g., leaderboard/2026-03) in the ORFS-Research repository, where each metrics update corresponds to a new commit. Contributor pull requests must include valid “Signed-off-by” lines (DCO compliance) and pass all CI regression tests, and they are merged only after full verification. As a result, any third party who checks out the same commit and reruns CI can reproduce the posted results.
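
A minimal sketch of a sign-off check is shown below. It parses the `Signed-off-by: Name <email>` trailer that `git commit -s` appends and assumes the check requires the sign-off email to match the commit author's email. The function names are hypothetical, and this is not the project's actual CI code.

```python
import re

# Trailer format produced by `git commit -s`: "Signed-off-by: Name <email>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:\s+(?P<name>.+?)\s+<(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def signoffs(commit_message):
    """Return every (name, email) pair found in Signed-off-by trailers."""
    found = []
    for line in commit_message.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("email")))
    return found


def is_dco_signed(commit_message, author_email):
    """True if some sign-off email matches the author (case-insensitive)."""
    return any(
        email.lower() == author_email.lower()
        for _, email in signoffs(commit_message)
    )
```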

4. Evaluation Methodologies and METRICS2.1 Schema

RosettaStone 2.0 evaluation is standardized via METRICS2.1, which defines canonical schema elements for key PD metrics:

StdCell Area: $A_{\mathrm{stdcell}} = \sum_{c\in\mathcal{C}} \mathrm{area}(c)$
Core Die Area: Directly from final floorplan
Routed Wirelength: $\mathrm{rWL} = \sum_{e\in E} \ell(e)$
Worst-Negative Slack (WNS): $\mathrm{WNS} = \min_{i\in\mathrm{endpoints}} (T_{\mathrm{req},i} - T_{\mathrm{arr},i})$
Total-Negative Slack (TNS): $\mathrm{TNS} = \sum_{i: s_i < 0} s_i, \quad s_i = T_{\mathrm{req},i} - T_{\mathrm{arr},i}$
Dynamic Power (vectorless): $P_{\mathrm{dyn}} = \sum_{n \in \mathrm{nets}} C_n V^2 f \alpha_n$, for fixed switching activity $\alpha_n$

Violations are categorized as design rule violations (DRVs) and failing endpoints (FEPs). All results, including power, runtime, memory, and domain-specific data (such as HBT count for 3D), are encoded in structured JSON logs. Commercial flow results are parsed into this schema for direct comparability.
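
The timing and power definitions above reduce to a few lines of arithmetic. The sketch below assumes per-endpoint slacks $s_i$ are already available (negative values mark violations) and that each net is given as a (capacitance, switching activity) pair. The function names and input layout are illustrative, not part of METRICS2.1.

```python
def timing_summary(slacks_ns):
    """Compute WNS, TNS, and failing-endpoint count from endpoint slacks."""
    negatives = [s for s in slacks_ns if s < 0]
    return {
        "WNS_ns": min(slacks_ns),   # worst slack over all endpoints
        "TNS_ns": sum(negatives),   # sum of negative slacks only
        "FEPs": len(negatives),     # number of failing endpoints
    }


def dynamic_power_w(nets, vdd_v, freq_hz):
    """Sum C_n * V^2 * f * alpha_n over nets given as (C_n in farads, alpha_n)."""
    return sum(cap * vdd_v**2 * freq_hz * alpha for cap, alpha in nets)
```

Note that WNS as defined is simply the minimum slack, so it is positive when every endpoint meets timing, whereas TNS and FEPs are then zero.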

Sample METRICS2.1-compliant report:

{
  "metrics_version": "METRICS2.1",
  "design": "aes",
  "enablement": "ASAP7_3D",
  "flow": "ORD",
  "clock_period_ns": 0.82,
  "area": { "core": 13342.6, "stdcell": 16446.5 },
  "wirelength_mm": 232.2,
  "timing": { "WNS_ns": -0.016, "TNS_ns": -0.043 },
  "power_mW": 20.34,
  "violations": { "DRVs": 3, "FEPs": 6 },
  "hbt_count": 904,
  "runtime_s": 1240.5,
  "memory_GB": 12.8
}
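
A schema-completeness check for such a report could be sketched as follows. The required-field list is inferred from the sample report above and is an assumption; the official METRICS2.1 schema may require more or different fields. The `hbt_count` field is deliberately left optional here because it only applies to 3D runs.

```python
import json

NUMBER = (int, float)

# Assumed top-level fields and types, inferred from the sample report.
REQUIRED_FIELDS = {
    "metrics_version": str, "design": str, "enablement": str, "flow": str,
    "clock_period_ns": NUMBER, "area": dict, "wirelength_mm": NUMBER,
    "timing": dict, "power_mW": NUMBER, "violations": dict,
    "runtime_s": NUMBER, "memory_GB": NUMBER,
}
# Assumed numeric fields inside each nested section.
REQUIRED_NESTED = {
    "area": ("core", "stdcell"),
    "timing": ("WNS_ns", "TNS_ns"),
    "violations": ("DRVs", "FEPs"),
}


def schema_problems(report):
    """Return a list of human-readable problems; an empty list means complete."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in report:
            problems.append(f"missing field: {key}")
        elif not isinstance(report[key], expected):
            problems.append(f"wrong type for field: {key}")
    for parent, children in REQUIRED_NESTED.items():
        section = report.get(parent)
        if not isinstance(section, dict):
            continue  # already reported by the top-level pass
        for child in children:
            if not isinstance(section.get(child), NUMBER):
                problems.append(f"missing or non-numeric field: {parent}.{child}")
    return problems


def check_report_text(text):
    """Parse a JSON report string and return its schema problems."""
    return schema_problems(json.loads(text))
```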

5. Transparency, Reproducibility, and Community Processes

RosettaStone 2.0 codifies reproducibility through a branch-based leaderboard submission system governed by both CI and DCO enforcement. Submissions require:

Forking ORFS-Research and updating to the new result commit.
Running the prescribed regression script for the chosen design/enablement.
Inspecting and staging the output metrics JSON.
Creating a GitHub pull-request subject to automated CI regression and DCO signature verification.
Merging only after CI reports zero regressions and the authorship certification is verified.

This workflow fosters community trust, guarantees that posted benchmark results are always reproducible given the exact code and flow state, and prevents unauthorized or unverifiable updates.

6. Key Findings and Impact

RosettaStone 2.0 has enabled comprehensive 3D versus 2D baseline benchmarking across three RTL designs (aes, ibex, jpeg) and five enablements (ASAP7, NanGate45, 7+7, 45+45, 7+45). Notable experimental insights include:

Consistent, stage-wise reporting elucidates the impact of 3D stacking on standard PPA metrics (area, power, timing).
Tier strategy comparison demonstrated that a "Flexible" cross-tier master strategy typically reduces HBT usage and improves timing closure relative to "Restricted" assignment.
Mixed-toolchain ablation studies identified Yosys synthesis as a primary contributor to dynamic power increases, whereas OpenROAD routing dominates runtime overhead.
Detailed runtime breakdowns (e.g., for jpeg) indicate that detailed routing dominates total elapsed time in ORD-based flows.
Reducing F2F via pitch in Pin-3D designs (from 1.0 μm to ~0.5 μm) drastically lowers DRVs—enabling precise physical trade-off exploration not previously feasible in open-access flows.
Timing constraint sweeps reveal directly the coupling between slack (WNS, TNS), wirelength, power, HBT count, and violation metrics—demonstrating the utility of the benchmark in multi-objective optimization contexts.

7. Limitations and Future Directions

The Pin-3D reference flow currently does not achieve best-possible quality of results (QoR), instead serving as an evolving, maintained baseline for fair comparison. Certain optimization limitations arise from row rebuilding and pin hiding in heterogeneous stacks, suggesting the need for multi-library synthesis support. Planned roadmap extensions include:

Bookshelf/“fake” LEF/DEF translation via OpenDB
ArtNet-synthesized netlist integration
Expansion to TSV (Through-Silicon Via) and Monolithic 3D (M3D) enablements under the METRICS2.1 schema

These extensions suggest that the framework will continue to evolve toward broader academic technology exploration and tighter alignment with industry-standard PD methodologies.

RosettaStone 2.0 provides a reproducible, community-driven infrastructure, co-versioning all flows and benchmarks, enforcing standardized reporting, and enabling direct comparison across both 2D and Pin-3D F2F integration styles, thereby supporting the advancement of transparent and sustainable physical design research (Jiang et al., 24 Jan 2026).

References (1)

Jiang et al., “Invited: Toward Sustainable and Transparent Benchmarking for Academic Physical Design Research” (2026)
