AMR Conceptual Entropy
- AMR-based conceptual entropy is an unsupervised, information-theoretic method that uses AMR graphs to quantify and extract semantically pivotal content.
- It computes entropy scores for AMR graph nodes via subword token probabilities, enabling precise selection of high-impact concepts.
- The approach compresses document contexts by about 50% while preserving semantic integrity, thereby boosting LLM retrieval and inference efficiency.
AMR-based conceptual entropy is an unsupervised, information-theoretic approach for quantifying semantic importance within textual contexts by operating over Abstract Meaning Representation (AMR) graphs. This framework extracts and retains the core conceptual elements from long documents, facilitating context compression for applications such as retrieval-augmented generation (RAG) in LLMs while discarding redundant information. The process centers on assigning entropy scores to AMR graph nodes, thereby enabling principled selection of semantically essential content. The conceptual entropy paradigm leverages stable linguistic features, theoretical information measures, and token-level statistical testing to generate compressed, semantically focused contexts that outperform traditional and neural baselines in both accuracy and efficiency (Shi et al., 24 Nov 2025).
1. Mathematical Definition of AMR-based Conceptual Entropy
Given a document $D$, each sentence $d_i$ is parsed into an AMR graph $G_i = (V_i, E_i)$, where $V_i$ are concept nodes (entities, predicates, modifiers) and $E_i$ comprise semantic relations. Each node $v \in V_i$ is mapped to its corresponding surface-text realization.
The information-theoretic score for each node $v$ is computed using an AMR-generation model $P_\theta$. The surface form of $v$ is decomposed into subword tokens $s_1, \dots, s_{m_v}$, and the entropy contribution of each token is estimated as

$$H(s_j) = \exp\!\bigl(-\log P_\theta(s_j \mid s_{<j}, G)\bigr),$$

where $s_{<j}$ denotes the tokens preceding $s_j$. The concept-level entropy is normalized across tokens:

$$H(v) = \frac{1}{m_v} \sum_{j=1}^{m_v} H(s_j).$$

High-entropy nodes (positive outliers of the per-document entropy distribution) are likely to represent semantically pivotal concepts. Statistical significance is determined via a one-sample $t$-test of $H(v)$ against the concept-entropy distribution,

$$t_v = \frac{H(v) - \mu_H}{\sigma_H},$$

where $\mu_H$ and $\sigma_H$ are the mean and standard deviation of $\{H(v)\}_{v \in V}$ over the document, with selection based on the resulting two-tailed $p$-value ($p < \alpha$ for a chosen significance level $\alpha$).
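As a concrete illustration of the scoring step, the sketch below computes a token-normalized concept score with a generic causal language model standing in for the graph-conditioned AMR-generation model $P_\theta$; the model choice (gpt2) and the helper name `concept_entropy` are assumptions for illustration, not the authors' implementation.

```python
# A sketch of concept-level scoring. A generic causal LM (gpt2) stands in for the
# graph-conditioned AMR-generation model P_theta; names and model choice are
# illustrative, not the authors' implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def concept_entropy(surface_form: str, context: str = "") -> float:
    """Token-normalized score H(v) of a concept's surface realization."""
    sf_ids = tokenizer(surface_form, return_tensors="pt").input_ids     # subwords of v
    if context:
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        ids, n_ctx = torch.cat([ctx_ids, sf_ids], dim=1), ctx_ids.shape[1]
    else:
        ids, n_ctx = sf_ids, 0
    with torch.no_grad():
        logits = model(ids).logits                                      # (1, T, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    concept_lp = token_lp[max(n_ctx - 1, 0):]                           # log P(s_j | s_<j)
    return torch.exp(-concept_lp).mean().item()                         # mean of exp(-log P)

print(concept_entropy("radium", context="She discovered"))
```

In the actual pipeline the conditioning context would include the linearized AMR graph $G$ rather than plain text.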
2. Pipeline Structure and Workflow
The AMR-based conceptual entropy pipeline comprises three principal stages:
- AMR Parsing: Supporting documents are split into sentences and parsed with an AMR parser (mBART-based, trained on AMR 3.0), yielding sentence-level graphs $G_i$ (see the node-extraction sketch at the end of this section).
- Entropy Scoring: Each node $v$ is linearized to subwords and scored using the generation model $P_\theta$ to calculate $H(v)$.
- Concept Selection and Context Reconstruction: Statistically significant high-entropy nodes are selected via one-sample $t$-testing. Selected nodes are mapped back to their surface-text fragments, temporal expressions are reconstructed, duplicates removed, and the outputs concatenated to form the compressed context $C'$.
This pipeline provides a linguistically grounded mechanism to distill essential information while aggressively pruning irrelevant or redundant content. The process is designed for offline preprocessing and scales linearly in the number of documents $|D|$ and total concept nodes $|V|$.
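For the parsing stage, the candidate node set $V$ can be read directly off a parsed graph. A minimal sketch using the penman library (the hand-written AMR string is illustrative; in the pipeline the graphs come from the mBART-based parser):

```python
# Minimal sketch of extracting concept nodes V from a parsed AMR graph with the
# penman library. The AMR string is hand-written for illustration; in the
# pipeline it would be produced by the mBART-based parser.
import penman

amr_string = """
(d / discover-01
   :ARG0 (p / person :name (n / name :op1 "Marie" :op2 "Curie"))
   :ARG1 (r / radium))
"""

graph = penman.decode(amr_string)

# Instance triples give (variable, :instance, concept) for every node.
concept_nodes = {var: concept for var, _, concept in graph.instances()}
print(concept_nodes)   # {'d': 'discover-01', 'p': 'person', 'n': 'name', 'r': 'radium'}

# Attribute triples (e.g. :op1 "Marie") carry surface constants such as names.
for source, role, target in graph.attributes():
    print(source, role, target)
```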
3. Compression Algorithm and Data-driven Pruning
The core compression algorithm is described with the following pseudocode:
```python
def CompressContext(D, alpha):
    """Compress documents D by retaining statistically high-entropy AMR concepts."""
    C_prime = []
    for d in D:
        G = AMR_parse(d)                      # sentence-level AMR graph(s) for d
        H = {}                                # per-concept entropy scores
        for v in G.nodes:
            subwords = tokenize(v)            # subword tokens of v's surface form
            # per-token score from the AMR-generation model P_theta
            E = [exp(-log_P_theta(s_j, subwords[:j], G))
                 for j, s_j in enumerate(subwords)]
            H[v] = sum(E) / len(E)            # token-normalized concept entropy
        mean_H = mean(H.values())
        std_H = std(H.values())
        # keep concepts whose entropy is a significant outlier (one-sample t-test)
        selected_V = [v for v in G.nodes if p_value(H[v], mean_H, std_H) < alpha]
        fragments = [surface_reconstruct(v) for v in selected_V]
        C_prime.extend(fragments)
    return C_prime
```
Selection of conceptual nodes for $C'$ is strictly statistical, utilizing local $t$-test significance against the per-document entropy distribution. Post-processing enforces text de-duplication and temporal normalization. The outcome is a reduction of document size by roughly 50% while preserving or improving semantic coverage (Shi et al., 24 Nov 2025).
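The `p_value` step in the listing can be realized with a standard two-tailed test on the standardized entropy score. A minimal self-contained sketch, assuming a normal approximation of the per-document entropy distribution (the helper names, the high-entropy-only guard, and the example scores are illustrative):

```python
# Statistical selection step: keep concepts whose entropy is a significant
# high-side outlier of the per-document distribution. The normal approximation
# and the explicit high-entropy guard are simplifying assumptions.
from math import erfc, sqrt
from statistics import mean, stdev

def two_tailed_p(h_v: float, mu: float, sigma: float) -> float:
    """Two-tailed p-value of H(v) against the document-level entropy distribution."""
    z = (h_v - mu) / sigma
    return erfc(abs(z) / sqrt(2))

def select_concepts(entropy: dict, alpha: float) -> list:
    mu, sigma = mean(entropy.values()), stdev(entropy.values())
    # only the high-entropy side is retained, per the method's focus on pivotal concepts
    return [v for v, h in entropy.items()
            if h > mu and two_tailed_p(h, mu, sigma) < alpha]

# Illustrative usage with made-up scores
scores = {"concept-a": 3.5, "concept-b": 1.1, "concept-c": 1.2,
          "concept-d": 1.0, "concept-e": 1.3}
print(select_concepts(scores, alpha=0.1))   # ['concept-a']
```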
4. Experimental Validation, Baseline Comparison, and Metrics
Extensive empirical evaluation is performed on PopQA (14k entity-rich question-answer pairs) and EntityQuestions datasets. Answer-containing documents are retained to isolate compression efficacy.
Retrievers: Contriever (PopQA), BM25 (EntityQuestions). Backbone LLMs: Diverse architectures (GPT-Neo, OPT, BLOOM, Llama-2-chat, Llama-3.1-Instruct, DeepSeek-V2-Lite, Qwen3).
Baselines: TF-IDF term selection; prompt-based keyword extraction; summary generation; dedicated compression (SelCon, LLMLingua); raw documents.
Metrics:
- Accuracy (exact-match).
- AUC of accuracy vs. document count, computed over both short and long context intervals (by number of retrieved documents); see the sketch after this list.
- $\sigma_{\mathrm{AUC}}$: AUC standard deviation across LLM backbones.
- Inference latency (tokens processed per second).
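The AUC metric integrates exact-match accuracy over the number of retrieved documents. A minimal sketch using the trapezoidal rule (the document counts and accuracy values are invented for illustration):

```python
# AUC of the accuracy-vs.-document-count curve via the trapezoidal rule.
# The document counts and accuracies below are invented for illustration.
def auc(doc_counts, accuracies):
    """Trapezoidal area under accuracy as a function of retrieved-document count."""
    area = 0.0
    for i in range(1, len(doc_counts)):
        width = doc_counts[i] - doc_counts[i - 1]
        area += 0.5 * (accuracies[i] + accuracies[i - 1]) * width
    return area

doc_counts = [1, 2, 5, 10, 20]                   # number of retrieved documents
accuracies = [31.0, 35.2, 38.1, 40.0, 39.4]      # exact-match accuracy (%)
print(auc(doc_counts, accuracies))
```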
Key Results:
- Average AUC improves from 553.32 (vanilla) to 600.62 (AMR conceptual entropy), a gain of 47.30; $\sigma_{\mathrm{AUC}}$ drops from 119.6 to 104.3.
- Long-context PopQA: AUC rises from 262.07 to 283.54 (+21.47).
- EntityQuestions: a net AUC gain over the vanilla baseline is likewise observed.
- Compression ratio: approximately 50% of the original token count (Figure 1).
- Latency improved by 10% or more (Table 5).
5. Illustrative Example: Conceptual Distillation in Practice
Consider the toy input: "Marie Curie was born in Warsaw in 1867. She discovered radium."
AMR parsing yields nodes: {marie-curie, be-01, warsaw, 1867, she, discover-01, radium}. Entropy scores assigned by $P_\theta$ are:
- H(marie-curie)=2.1, H(warsaw)=1.8, H(1867)=1.2
- H(she)=1.0, H(discover-01)=2.3, H(radium)=2.8
After $t$-test filtering, the selected concepts are {marie-curie, discover-01, radium}. The reconstructed compressed context preserves the semantic core while reducing the context from 15 tokens to 3 fragments, supporting enhanced downstream LLM reasoning (Shi et al., 24 Nov 2025).
6. Limitations, Computational Complexity, and Prospective Extensions
Limitations:
- AMR parsing errors can omit critical concepts.
- Sentence-level graph analysis ignores discourse links across sentences.
- Evaluation isolates compression efficacy by retaining answer-containing documents; the full noisy retrieval stack remains to be tested.
- Preprocessing cost for AMR parsing and entropy calculation is nontrivial, though feasible offline.
Computational Attributes:
- AMR parsing: linear in document count, $O(|D|)$.
- Entropy scoring: $O(|V| \cdot \bar{m})$, where $|V|$ is the number of concept nodes and $\bar{m}$ the average number of subword tokens per concept.
- Statistical testing and reconstruction: $O(|V|)$.
Future Directions:
- Extension to multi-modal AMRs (image+text).
- Cross-document graph modeling to aggregate distributed evidence.
- Adaptive thresholding depending on query properties.
- Lightweight parsers for latency-constrained use cases.
- Hybridization with surface redundancy measures.
This framework demonstrates the utility of AMR-based conceptual entropy as a robust, linguistically motivated context compression strategy, distinct from classical redundancy and summary techniques. It significantly advances the state-of-the-art in semantic focus and inference efficiency for LLM contexts (Shi et al., 24 Nov 2025).
7. Conceptual Entropy in Physical Simulations: Connection and Distinction
In adaptive mesh refinement (AMR) simulations of physical systems (e.g., galaxy clusters), conceptual entropy refers to a measure of the thermodynamic state of the gas, commonly defined via the entropy proxy $K = k_B T / n_e^{2/3}$, where $T$ is the gas temperature and $n_e$ the electron number density (Power et al., 2013). This entropy profile diagnoses cluster assembly, since entropy is preserved in smooth hydrodynamic flows and generated in shocks. AMR codes capture physical entropy generation via shock resolution using Riemann solvers, ensuring that dissipation is strictly resolution-limited and benign. Entropy cores thus emerge in AMR as robust, convergent, physical phenomena rather than numerical artifacts, in contrast to deficiencies observed in classic SPH.
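For reference, this entropy proxy is computed directly from the temperature and electron-density profiles; a minimal numpy sketch with invented profile values (with $k_B$ absorbed when $T$ is quoted in keV, as is conventional):

```python
# Entropy proxy K = k_B T / n_e^(2/3) for a cluster gas profile. The radius,
# temperature, and density values are invented for illustration, not simulation output.
import numpy as np

def entropy_profile(T_keV, n_e):
    """K in keV cm^2, with T in keV (k_B absorbed) and n_e in cm^-3."""
    return T_keV / n_e ** (2.0 / 3.0)

r = np.array([0.05, 0.1, 0.3, 0.5, 1.0])           # radius in units of r_500
T_keV = np.array([4.0, 4.5, 5.0, 4.2, 3.0])         # gas temperature (keV)
n_e = np.array([1e-2, 6e-3, 2e-3, 8e-4, 2e-4])      # electron density (cm^-3)

print(entropy_profile(T_keV, n_e))                  # increases outward for these illustrative values
```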
While the term “conceptual entropy” is adapted for AMR graph-based context engineering (Shi et al., 24 Nov 2025), its mathematical and methodological rigor echoes the principles established in physical AMR modeling (Power et al., 2013): both rely on well-defined entropy quantification (thermodynamic or information-theoretic) at localized resolution scales, avoiding global contamination or spurious artifacts.
A plausible implication is that the stability and interpretability inherent in AMR-based entropy—whether in linguistics or physics—arise from principled, scale-limited measurement and selective aggregation, thus underpinning reliable inference or prediction in their respective domains.