Signature Semantic Decoder

Updated 4 July 2026

Signature Semantic Decoder is an approach that converts raw program behavior into compact semantic signatures for cross-program performance prediction.
It employs a two-stage process using an RWKV-based encoder for basic block embeddings and a Set Transformer for order-invariant signature aggregation.
The decoded signatures enable CPI regression, clustering, and universal program point selection, with a reported 7143x simulation speedup at 86.3% average CPI accuracy in the cross-program setting.

A signature semantic decoder denotes an encoder–decoder pattern in which complex inputs are compressed into a compact semantic signature and subsequently decoded into task-relevant properties rather than reconstructed at the raw-data level. In the microarchitecture-simulation setting, the most explicit formulation is SemanticBBV, which maps executed basic blocks to learned embeddings, aggregates them into an order-invariant interval signature, and decodes that signature into performance quantities such as CPI while also supporting clustering, nearest-neighbor reuse, and cross-program inference (Liu et al., 11 Dec 2025). Related work in semantic communication and task-oriented decoding uses the same broad pattern with different signatures (semantic maps, explicit semantic bases, or scene graphs), suggesting that the term is best understood as an architectural principle rather than a single fixed algorithm (Yang et al., 2023).

1. Conceptual definition

In the SemanticBBV formulation, the signature semantic decoder is built around the idea that program behavior should be represented by a semantic and performance-aware signature rather than by arbitrary identifiers. Traditional BBV-based sampling represents each interval as counts over basic-block IDs assigned in order of first appearance, which makes the representation single-program in scope and prevents meaningful comparison across binaries. SemanticBBV replaces that representation with a learned signature that is comparable across programs, captures program semantics, and is explicitly aligned with microarchitectural behavior such as CPI (Liu et al., 11 Dec 2025).

This interpretation separates two operations. The encoder transforms raw symbolic inputs into a compact latent code; the decoder maps that code into behavioral quantities, cluster memberships, or reusable representative points. In SemanticBBV, the decoder is partly explicit, in the form of a CPI regression head $h(\mathbf{s}_I)$, and partly geometric, because distances in signature space support clustering and cross-program reuse. This suggests a definition of a signature semantic decoder as a system in which latent codes are not merely compressed descriptors, but structured semantic objects whose geometry is intended to support downstream inference.

A common misconception is to equate such a decoder with conventional decompression. In the SemanticBBV framing, decoding is not recovery of the original instruction stream. It is recovery of performance-relevant semantics from a signature that has been trained to preserve semantic similarity, structural distinctiveness, and CPI sensitivity (Liu et al., 11 Dec 2025).

2. Representational shift from BBV to semantic signatures

The immediate motivation for SemanticBBV is the inadequacy of the classical Basic Block Vector for cross-program reasoning. In SimPoint-style sampling, an execution is partitioned into intervals such as 10M instructions, and each interval is represented by execution counts of basic blocks. Because each distinct basic block is assigned an ID in order of first appearance, BBV coordinate systems differ across runs and across programs. BB ID 0 in one program has no semantic relation to BB ID 0 in another. As a result, distances and centroids are meaningful only within a single execution trace, which blocks cross-program reuse of simulation knowledge (Liu et al., 11 Dec 2025).
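The coordinate-system problem can be seen in a minimal sketch. The function `build_bbv` and the block names below are hypothetical and only mimic the first-appearance ID assignment described above; they are not taken from SimPoint or from the paper.

```python
from collections import Counter

def build_bbv(trace):
    """Build a BBV-style count vector; IDs are assigned in order of first appearance."""
    ids = {}
    counts = Counter()
    for block in trace:
        if block not in ids:
            ids[block] = len(ids)  # next unused ID
        counts[ids[block]] += 1
    return ids, [counts[i] for i in range(len(ids))]

# Two hypothetical traces that execute the same blocks the same number of
# times, but encounter them in a different order.
trace_a = ["loop_body", "loop_body", "call_site", "loop_body"]
trace_b = ["call_site", "loop_body", "loop_body", "loop_body"]

ids_a, bbv_a = build_bbv(trace_a)
ids_b, bbv_b = build_bbv(trace_b)

print(bbv_a, bbv_b)  # same behavior, different vectors
```

Although both traces have identical block counts, coordinate 0 refers to a different block in each vector, so a distance computed between the two vectors carries no behavioral meaning.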

A second limitation is the absence of semantic content predictive of hardware behavior. BBV entries are counts indexed by arbitrary IDs; semantically similar intervals compiled differently need not occupy nearby points in BBV space, while intervals with similar BBV shapes may have different CPI. SemanticBBV addresses both defects by constructing signatures from the semantics of basic blocks and by training those signatures to reflect performance similarity (Liu et al., 11 Dec 2025).

Formally, an interval $I$ is represented as a multiset of basic blocks $\{b_1,\dots,b_{n(I)}\}$, each with an embedding $\mathbf{v}_{\text{BBE}(b_i)}$ and execution frequency $f_i$. The interval signature is

$\mathbf{s}_I = g\Big(\{(\mathbf{v}_{\text{BBE}(b_i)}, f_i)\}_{i=1}^{n(I)}\Big),$

where $g$ is a learned set function implemented by a Set Transformer. Because $g$ operates on a set, the resulting signature is invariant to input ordering. The signature therefore occupies a canonical shared space rather than a per-program index space (Liu et al., 11 Dec 2025).
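Order invariance can be illustrated with a much simpler set function than the one the paper uses. The sketch below substitutes a frequency-weighted mean for the learned Set Transformer $g$; this stand-in, the function name `signature`, and the toy embeddings are assumptions made only to show that a function of a set ignores input order.

```python
def signature(blocks):
    """Frequency-weighted mean of block embeddings (a simple stand-in for g)."""
    total = sum(freq for _, freq in blocks)
    dim = len(blocks[0][0])
    return [sum(vec[d] * freq for vec, freq in blocks) / total for d in range(dim)]

# Hypothetical interval: (embedding, execution frequency) pairs.
interval = [([1.0, 0.0], 3), ([0.0, 1.0], 1), ([0.5, 0.5], 4)]
shuffled = [interval[2], interval[0], interval[1]]

s1 = signature(interval)
s2 = signature(shuffled)
print(s1, s2)
```

A learned set function such as a Set Transformer shares this invariance property while being far more expressive than a weighted mean.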

This shift from identifier-count vectors to learned semantic signatures is the core representational move that makes the “decoder” meaningful: once the latent space is shared and order-invariant, it can support cross-program clustering, universal representative points, and performance inference.

3. Stage 1: semantic encoding of basic blocks

The first stage of SemanticBBV is an RWKV-based semantic encoder that converts an assembly basic block into a Basic Block Embedding. Tokenization uses six semantic dimensions: assembly token, instruction type, operand type, register type, access type, and flags. Immediate values and addresses are normalized into a single IMM token. The embedding layer size is 0.32M parameters, and the representation is intended to preserve structural information such as the dependence encoded in forms like [rsp+IMM] (Liu et al., 11 Dec 2025).
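The immediate-normalization step can be sketched as follows. The regular expression and the function `normalize` are illustrative assumptions, not the paper's tokenizer, and the sketch covers only the IMM replacement rather than the six semantic dimensions.

```python
import re

# Assumed pattern: hex literals (0x...) or standalone decimal numbers.
# The word boundaries keep register names such as r8 or xmm0 intact.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")

def normalize(instruction):
    """Replace immediate values and numeric offsets with a single IMM token."""
    return _IMMEDIATE.sub("IMM", instruction)

print(normalize("mov rax, [rsp+0x18]"))
print(normalize("add r8, 16"))
```

Normalizing in this way keeps the structural form of an operand such as [rsp+IMM] while discarding the specific constant.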

RWKV is used as a linear-time Transformer-like recurrent model with time-mixing and delta-rule-like state updates. The motivation given is computational efficiency for long assembly sequences, with linear complexity and constant memory with respect to sequence length. Pre-training is performed on BinaryCorp with two self-supervised tasks: Next Token Prediction and Next Instruction Prediction. The former learns local syntax and short-range structure, while the latter forces the model to encode higher-level control- and data-flow relations across instructions. The corresponding MLP heads are discarded before fine-tuning (Liu et al., 11 Dec 2025).

For a basic block with hidden states $\mathbf{H}=[\mathbf{h}_1,\dots,\mathbf{h}_N]$, the BBE is obtained by self-attention pooling. The attention scores are

$e_i = \mathbf{u}_a^\top \tanh(\mathbf{W}_a \mathbf{h}_i^\top + \mathbf{b}_a),$

with normalized weights $\alpha_i = \exp(e_i) / \sum_{j=1}^{N} \exp(e_j)$, and the embedding is the weighted sum of token states, $\sum_{i=1}^{N} \alpha_i \mathbf{h}_i$. The encoder is then fine-tuned with a triplet loss on binary code similarity, using an anchor and positive compiled from the same function at different optimization levels and a negative from a different function. This makes the embeddings compiler-robust and semantically meaningful (Liu et al., 11 Dec 2025).
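The pooling and fine-tuning objective can be sketched in plain Python. The function names, the toy parameters, the Euclidean distance, and the margin of 1.0 are assumptions for illustration; the source does not specify the distance metric or margin used in the paper.

```python
import math

def attention_pool(H, W_a, b_a, u_a):
    """Self-attention pooling with scores e_i = u_a^T tanh(W_a h_i + b_a)."""
    scores = []
    for h in H:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(W_a, b_a)]
        scores.append(sum(u * z for u, z in zip(u_a, hidden)))
    top = max(scores)                     # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]    # softmax-normalized weights
    dim = len(H[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, H)) for d in range(dim)]
    return alphas, pooled

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances (margin is an assumption)."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

# Toy hidden states; all-zero weights give uniform attention.
H = [[1.0, 0.0], [0.0, 1.0]]
alphas, pooled = attention_pool(H, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0])
print(alphas, pooled)
```

The triplet loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is what pushes differently optimized builds of the same function together.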

The Stage 1 model is trained and fine-tuned exclusively on BinaryCorp, which contains 26M functions from 10k projects across optimization levels O0–O3 and Os. The RWKV-based encoder has about 22M parameters total and achieves the reported BCSD results of MRR 0.911 and Recall@1 0.858 with pool size 100, and 0.581 and 0.505 with pool size 10,000. These results are used in the paper as evidence that the learned basic-block semantics are strong enough to support the downstream signature construction (Liu et al., 11 Dec 2025).
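For readers unfamiliar with the two retrieval metrics, the sketch below uses their standard definitions. The function name and the example ranks are hypothetical, and whether the paper's evaluation script matches this exactly is an assumption.

```python
def mrr_and_recall_at_1(ranks):
    """ranks[i] is the 1-based rank of the true match for query i within its pool."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    recall_at_1 = sum(1 for r in ranks if r == 1) / n
    return mrr, recall_at_1

# Hypothetical ranks for four queries.
mrr, recall = mrr_and_recall_at_1([1, 1, 2, 4])
print(mrr, recall)
```

Both metrics fall as the candidate pool grows, because the true match competes with more distractors, which is consistent with the lower figures reported at pool size 10,000.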

4. Stage 2: order-invariant aggregation and performance-aware decoding

The second stage aggregates BBEs for an interval into a single SemanticBBV signature using a Set Transformer composed of Self-Attention Blocks in the encoder and Pooling by Multi-head Attention in the decoder. With a single PMA seed, the architecture produces one signature vector $\mathbf{s}_I$. Because the aggregation is permutation-invariant, shuffling the basic blocks does not change the signature, which removes the dependence on arbitrary basic-block ordering that defined the BBV limitation (Liu et al., 11 Dec 2025).

Execution frequency is incorporated during this stage. Each basic block carries a frequency $f_i$, analogous to BBV counts, and the model uses these frequencies either as concatenated features or as attention/pooling weights. Conceptually, the signature is constructed from the set $\{(\mathbf{v}_{\text{BBE}(b_i)}, f_i)\}_{i=1}^{n(I)}$, allowing frequently executed blocks to dominate the aggregate representation in a manner analogous to classical interval profiles (Liu et al., 11 Dec 2025).

The crucial distinction from a purely semantic encoder is that Stage 2 is trained so that the latent signature can be decoded into performance. The total loss is

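$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{triplet}} + \lambda_1 \mathcal{L}_{\text{CPI}} + \lambda_2 \mathcal{L}_{\text{cons}},$

where $\mathcal{L}_{\text{triplet}}$, $\mathcal{L}_{\text{CPI}}$, and $\mathcal{L}_{\text{cons}}$ are the interval-level triplet, CPI regression, and CPI consistency terms described below, and $\lambda_1$, $\lambda_2$ are weighting coefficients.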

The interval-level triplet loss preserves structural distinctiveness using positive and negative pairs defined from traditional BBV phase similarity. A CPI regression head predicts

$\widehat{\text{CPI}}_I = h(\mathbf{s}_I),$

with ground-truth CPI from gem5 and a Huber loss rather than MSE. The CPI consistency term penalizes intervals that are close in signature space but far apart in CPI, thereby aligning geometric proximity with performance similarity (Liu et al., 11 Dec 2025).
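The two performance-related terms can be sketched as follows. The Huber loss follows its standard definition with an assumed threshold `delta=1.0`. The function `consistency_penalty` is only one plausible hinge-style form of a consistency term, chosen for illustration; the source does not give the exact expression used in the paper.

```python
import math

def huber(pred, target, delta=1.0):
    """Standard Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def consistency_penalty(s_i, s_j, cpi_i, cpi_j, scale=1.0):
    """Illustrative form (assumption): penalize a CPI gap that exceeds the
    scaled distance between the two signatures."""
    return max(0.0, abs(cpi_i - cpi_j) - scale * math.dist(s_i, s_j))

print(huber(1.0, 1.5))    # small error, quadratic regime
print(huber(1.0, 31.0))   # large error, linear regime
print(consistency_penalty([0.0, 0.0], [0.0, 0.0], 1.0, 3.0))
```

In the linear regime the Huber loss grows far more slowly than a squared error, so a single interval with a very large CPI residual contributes less to the gradient than it would under MSE.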

This combination gives the “decoder” its specific meaning. The signature is not only a latent descriptor; it is a latent code from which one can decode CPI directly through $h(\mathbf{s}_I)$ or indirectly through clustering and representative reuse. In the paper’s terminology, the signature space itself becomes a performance-aware semantic embedding space (Liu et al., 11 Dec 2025).

5. Cross-program reuse, universal program points, and transfer

The most distinctive use of the signature semantic decoder in SemanticBBV is cross-program simulation reuse. On 10 SPEC CPU 2017 integer benchmarks, the study takes 100k intervals of 10M instructions, totaling 1T instructions, computes a SemanticBBV signature for each interval, and applies K-means with $K = 14$. Each cluster is interpreted as a universal behavioral archetype, and the interval closest to each centroid is selected as a universal program point (Liu et al., 11 Dec 2025).
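Selecting the interval closest to each centroid can be sketched as below. The function name, the Euclidean distance, and the toy two-dimensional signatures are assumptions; the centroids are taken as given rather than computed by K-means.

```python
import math

def pick_representatives(signatures, centroids):
    """For each centroid, return the index of the nearest interval signature."""
    return [min(range(len(signatures)),
                key=lambda i: math.dist(signatures[i], centroid))
            for centroid in centroids]

# Hypothetical signatures for four intervals and two cluster centroids.
signatures = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]
centroids = [[0.02, 0.0], [5.15, 5.0]]

representatives = pick_representatives(signatures, centroids)
print(representatives)
```

Only the intervals at the returned indices need detailed simulation; every other interval inherits the CPI of its cluster's representative.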

For a benchmark $P$, each interval is assigned to its nearest centroid, producing a behavior profile

$\mathbf{w}^{(P)} = \big(w_1^{(P)}, \dots, w_K^{(P)}\big),$

where $w_k^{(P)}$ is the fraction of the intervals of $P$ assigned to cluster $k$, with $\sum_{k=1}^{K} w_k^{(P)} = 1$. If $\text{CPI}_k$ denotes the simulated CPI of the representative interval for cluster $k$, the benchmark CPI is estimated by

$\widehat{\text{CPI}}_P = \sum_{k=1}^{K} w_k^{(P)} \, \text{CPI}_k.$

This procedure estimates the performance of all 10 benchmarks by simulating only 14 representative intervals, or 140M instructions in total, instead of the full 1T instructions (Liu et al., 11 Dec 2025).
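The estimation procedure and the instruction-count arithmetic can be checked with a short sketch. The function names and the toy signatures, centroids, and representative CPI values are hypothetical; only the interval counts and sizes come from the text above.

```python
import math
from collections import Counter

def assign(signatures, centroids):
    """Assign each interval signature to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(s, centroids[k]))
            for s in signatures]

def behavior_profile(assignments, k):
    """Fraction of a benchmark's intervals that fall in each of the k clusters."""
    counts = Counter(assignments)
    return [counts[c] / len(assignments) for c in range(k)]

def estimate_cpi(profile, representative_cpi):
    """Profile-weighted average of the representatives' simulated CPI."""
    return sum(w * cpi for w, cpi in zip(profile, representative_cpi))

# Toy benchmark with four intervals and two clusters (hypothetical values).
signatures = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [0.1, 0.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
profile = behavior_profile(assign(signatures, centroids), k=2)
estimate = estimate_cpi(profile, [1.0, 3.0])

# Instruction counts from the text: 100k intervals vs. 14 representatives.
full_instructions = 100_000 * 10_000_000
sampled_instructions = 14 * 10_000_000
speedup = full_instructions / sampled_instructions
print(profile, estimate, round(speedup))
```

The ratio of 1T to 140M instructions is about 7143, which matches the reported simulation speedup.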

The reported result is 86.3% average accuracy for CPI estimation with a 7143x simulation speedup. At the same time, on single-program SimPoint-like tasks on floating-point benchmarks, SemanticBBV matches traditional BBV closely: BBVs achieve 98.56% average accuracy excluding one outlier, and SemanticBBV differs by approximately −0.24 percentage points on average. This is significant because it shows that the semantic signature does not sacrifice established intra-program behavior while enabling cross-program reuse that BBV cannot support (Liu et al., 11 Dec 2025).

The same signature space is also adapted to a new microarchitecture. The model is first trained on TimingSimpleCPU and then fine-tuned to O3CPU using only 20% of 10B instructions from two benchmarks, 600.perlbench and 602.gcc, with updates restricted to the Set Transformer and the CPI head. On the O3CPU evaluation, unseen benchmarks such as 625.x264 achieve 84.6% CPI accuracy, and time-series CPI trends such as the periodic behavior in x264 are captured. The paper also identifies a limitation: memory-bound workloads such as xz and deepsjeng are poorly predicted because the target is only CPI, not richer memory-system metrics, and the model misses cold-cache spikes with CPI greater than 30 (Liu et al., 11 Dec 2025).

The SemanticBBV paper explicitly describes its own pipeline as a semantic codec: Stage 1 encodes basic blocks into BBEs, Stage 2 encodes intervals into a signature, and the decoder maps that signature to CPI or to reusable cluster memberships. Other recent work uses the same high-level pattern with different semantic carriers, indicating that the notion of a signature semantic decoder is broader than microarchitecture alone.

Work	Semantic signature	Decoded output
SemanticBBV (Liu et al., 11 Dec 2025)	Set-aggregated interval signature from BBEs and frequencies	CPI, clustering, universal program points
SCDGSC (Yang et al., 2023)	Semantic map plus local static reference image	Conditional DDPM-generated monitoring scene
Explicit Seb framework (Zheng et al., 2023)	Seb codebook and usage indices	Reference image plus residual reconstruction
GBSED (Ribouh et al., 9 Mar 2026)	Compressed scene-graph tensor and node features	Reconstructed scene graph and risk prediction

In the remote-monitoring framework of SCDGSC, the source transmits a semantic map of relevant object locations and contours, while the receiver uses a conditional DDPM, a local static scene image, and classifier-free guidance to generate semantically consistent frames. The encoder and decoder are optimized independently, the semantic map is about 5 kB versus JPEG frames of about 82–128 kB, and the segmentation module reports mean IoU 81.7 with class IoUs 97.9 and 65.6. Here the decoder is generative rather than regressive, but it still operates by decoding a compact semantic representation into task-aligned output rather than reconstructing raw transmitted pixels (Yang et al., 2023).

In the explicit semantic base framework, the shared signature is a transmitted codebook of latent patch prototypes, the Sebs, plus per-patch usage indices. The decoder reconstructs a Seb-based reference image and then refines it with compensation and residual features. The framework is trained end-to-end with a rate–distortion objective and a regularizer aligning latent features to Sebs, and it reports gains of about 0.5–1.5 dB PSNR over state-of-the-art baselines across SNR values. This formulation emphasizes that a semantic signature may also take the form of an explicit, interpretable codebook rather than a continuous latent embedding (Zheng et al., 2023).

In the CAV-oriented GBSED architecture, the signature is a compressed scene graph comprising a relation tensor and node feature matrix. The decoder reconstructs the full graph and performs risk assessment through a GNN, LSTM, and MLP. The framework reports up to 99% semantic compression relative to raw images, semantic fidelity exceeding 0.9 above 10 dB, and an overall raw-image compression ratio of 2425 with about 99.9% reduction. In this case the semantic decoder reconstructs a structured world model and directly supports a downstream control-related task (Ribouh et al., 9 Mar 2026).

Across these examples, the recurring principle is that decoding is organized around meaning-bearing signatures rather than around raw observations. In the SemanticBBV case, the signature semantic decoder is distinguished by three properties: order-invariant aggregation, explicit performance supervision, and cross-program reuse. Its main limitation is that a CPI-only target under-represents memory behavior. The paper therefore suggests extending the decoder to richer microarchitectural targets such as branch miss rates and L1/L2 miss rates, which would turn the current CPI decoder into a multi-output microarchitectural decoder and broaden the scope of the signature space (Liu et al., 11 Dec 2025).