Hierarchical Sentence Graphs: Models & Applications
- Hierarchical Sentence Graphs are structured models that represent documents at multiple levels—words, sentences, sections, and the entire document—to capture complex linguistic dependencies.
- They offer scalable and efficient alternatives to full graph models by focusing dense interactions on local segments and summarizing higher-order structures.
- These graphs enable fine-grained reasoning, multi-hop question answering, and improved summarization through hierarchical message passing and cross-level aggregation.
A hierarchical sentence graph is a structured, multi-level graph representation of the sentence- and discourse-level relationships within a text. It models intra-sentence, inter-sentence, section-level, and document-level dependencies, enabling fine-grained reasoning, retrieval, summarization, and classification over long or complex documents. Hierarchical sentence graphs form the backbone of several state-of-the-art neural architectures for multi-hop question answering, document summarization, text classification, and semantic matching, offering a scalable alternative to quadratic full-graph or sequence models by exploiting linguistic hierarchy and the structured nature of discourse.
1. Formal Structure and Variants
The hierarchical sentence graph paradigm encompasses a range of architectures that can be instantiated at the level of local segments, sentences, sections, or global document memory, and are often specialized for their target task.
General Schema: Hierarchical sentence graphs are typically layered, with nodes and edges defined at different granularity levels. Common abstractions include:
- Word–Sentence–Section–Document Hierarchies: Nodes at each tier represent words, sentences, sections, or entire documents, with cross-level edges encoding composition (word–in–sentence, sentence–in–section, etc.). Edges within each level encode logical, rhetorical, or proximity relations (Zhang et al., 2023, Zhao et al., 2024, Liu et al., 17 Sep 2025).
- Segment–Summary Graphs: Input sequences are partitioned into segments, with local graphs built per segment, each summarized by a compact node. These summaries form the nodes of a global graph, which approximates the semantic topology of the entire document (Liu et al., 17 Sep 2025).
- Latent Tree and Factorization: In approaches such as latent matrix-tree based induction, sentences serve as nodes of a soft, weighted tree learned from the data, or, in compositional factorization, as levels in a semantic tree (Qiu et al., 2022, Liu et al., 2018).
- Rhetorical/Discourse-Based Graphs: Graph construction is guided by discourse parsing (e.g., Rhetorical Structure Theory), distinguishing nucleus/satellite sentences and labeling edges with semantic/rhetorical relations (Liang et al., 6 Jan 2026, Zhang et al., 2023).
Examples of node and edge types (a minimal schema sketch follows the table):
| Graph Level | Nodes | Key Edge Types |
|---|---|---|
| Segment | Tokens/phrases | Local semantic similarity |
| Sentence | Sentences | Discourse/logical proximity |
| Section/Topic | Section summaries | Intra-/inter-section links |
| Document/Global | Summary vectors | Entity bridges, topic relations |
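To make this node and edge vocabulary concrete, the following is a minimal Python sketch of such a schema; the class names (`Node`, `Edge`, `HierarchicalSentenceGraph`) and level labels are illustrative, not drawn from any specific cited system.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative level labels, mirroring the table above.
LEVELS = ("segment", "sentence", "section", "document")

@dataclass
class Node:
    node_id: int
    level: str               # one of LEVELS
    text: str                # surface text or summary for this unit
    parent: Optional[int]    # node one level up that this unit composes into

@dataclass
class Edge:
    src: int
    dst: int
    kind: str                # e.g. "similarity", "adjacency", "rhetorical", "composition"
    weight: float = 1.0

@dataclass
class HierarchicalSentenceGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, level: str, text: str, parent: Optional[int] = None) -> int:
        nid = len(self.nodes)
        self.nodes.append(Node(nid, level, text, parent))
        if parent is not None:
            # Cross-level (hierarchical) edge encoding composition, e.g. sentence-in-section.
            self.edges.append(Edge(nid, parent, "composition"))
        return nid

# Usage: one document node, one section, and two member sentences.
g = HierarchicalSentenceGraph()
doc = g.add_node("document", "whole-document summary")
sec = g.add_node("section", "Introduction", parent=doc)
g.add_node("sentence", "Hierarchical graphs model discourse.", parent=sec)
g.add_node("sentence", "They scale better than full graphs.", parent=sec)
```

Composition edges here capture only the containment hierarchy; intra-level edges (similarity, adjacency, rhetorical) are added separately, following the construction principles in Section 2.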
2. Graph Construction Principles
Segmentation and Hierarchy Induction: The document is decomposed into semantic units—segments, sentences, sections, sometimes inferred trees or clusters—using surface boundaries, linguistic parsers, or latent structure induction. Segment lengths may be tuned (e.g., segment length $m \approx \sqrt{n}$, where $n$ is the document length) for computational efficiency (Liu et al., 17 Sep 2025). Hierarchical sentence factorization relies on AMR parsing and ordered tree construction (Liu et al., 2018).
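A minimal sketch of segmentation and intra-segment edge formation, assuming sentence-level units and toy random embeddings; the $\sqrt{n}$ segment length follows the tuning heuristic above, while the cosine threshold value is an arbitrary illustrative choice.

```python
import math
import numpy as np

def segment(units, seg_len=None):
    """Partition a list of units (e.g., sentences) into segments of ~sqrt(n) length."""
    n = len(units)
    m = seg_len or max(1, int(math.sqrt(n)))
    return [units[i:i + m] for i in range(0, n, m)]

def local_similarity_edges(embeddings, threshold=0.5):
    """Intra-segment edges from thresholded cosine similarity -> (i, j, weight) triples."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    return [(i, j, float(sim[i, j]))
            for i in range(len(X)) for j in range(i + 1, len(X))
            if sim[i, j] >= threshold]

rng = np.random.default_rng(0)
sentences = [f"sentence {i}" for i in range(16)]
segments = segment(sentences)               # 4 segments of 4 sentences each
seg_embeddings = rng.normal(size=(4, 64))   # toy embeddings for one segment's sentences
print(len(segments), local_similarity_edges(seg_embeddings, threshold=0.1))
```

Dense edge formation stays inside each segment; each segment is later summarized into a node of the global graph, as described below.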
Edge Formation:
- Intra-level edges: Local semantic similarity (e.g., thresholded cosine between token or sentence embeddings); proximity (e.g., adjacency in the document); or rhetorical/AMR-inspired logical relations (Liu et al., 17 Sep 2025, Qiu et al., 2022, Liang et al., 6 Jan 2026).
- Inter-level (hierarchical) edges: Node composition or assignment (e.g., sentence–to–section, section–to–document); summary node creation; alignment via mapping or aggregation functions (Zhang et al., 2023, Zhao et al., 2024).
- Cross-document edges: Created via entity overlap, bridge relations, or LLM-prompted connections for multi-document reasoning (Liang et al., 6 Jan 2026).
Special structures:
- Hypergraphs: Used to represent higher-order section–sentence grouping, where hyperedges connect multiple sentences in the same section, supporting aggregation and attention over sections (Zhao et al., 2024); a small incidence-matrix sketch follows this list.
- Latent Trees: Edge weights are soft probabilities learned by matrix-tree algorithms, optimized for integration in end-to-end systems (Qiu et al., 2022).
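The hypergraph variant can be illustrated with a binary incidence matrix between sentences and section hyperedges. The degree normalization and residual update below are simplifying assumptions for exposition, not the exact aggregation scheme of Zhao et al. (2024).

```python
import numpy as np

# Toy setup: 5 sentences grouped into 2 section hyperedges.
# H[i, e] = 1 iff sentence i belongs to section hyperedge e.
H = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
], dtype=float)

sent = np.random.default_rng(0).normal(size=(5, 8))   # sentence embeddings

# Hyperedge (section) representation: degree-normalized mean of member sentences.
deg_e = H.sum(axis=0, keepdims=True)                   # sentences per section
section = (H.T @ sent) / deg_e.T                       # shape (2, 8)

# Message back from sections to sentences (simple hypergraph-style aggregation).
deg_v = H.sum(axis=1, keepdims=True)                   # sections per sentence
sent_updated = sent + (H @ section) / deg_v
print(section.shape, sent_updated.shape)
```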
3. Hierarchical Message Passing and Learning
Graph Neural Networks process information hierarchically, reflecting the multi-level graph structure:
- Local GCN/GAT Layers: Run within segments or sections to encode fine-grained relationships among tokens or sentences under local context (Liu et al., 17 Sep 2025, Zhang et al., 2023).
- Cross-level Aggregation: Representation at each node level is pooled and propagated to higher-level nodes (e.g., mean/max pooling from words to sentences, sentences to sections) (Liu et al., 17 Sep 2025, Zhao et al., 2024, Hua et al., 2022).
- Attention and Contrastive Objectives: Multi-head attention mechanisms are used both to weight intra- and inter-level messages and to selectively fuse global and local context information. Graph contrastive learning augments node representations with theme-aware global signals (Zhang et al., 2023).
A representative update defines node states at level $\ell$ via neighbor aggregation, e.g.,

$$h_v^{(\ell+1)} = \sigma\Big(\sum_{u \in \mathcal{N}(v)} \alpha_{uv}\, W^{(\ell)} h_u^{(\ell)}\Big),$$

where $\alpha_{uv}$ encodes attention or edge weight, and message passing is typically task- and level-dependent.
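A minimal sketch of one such update combined with cross-level pooling, assuming a dot-product attention score and mean pooling from word nodes to sentence nodes; both are illustrative choices rather than any specific cited architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_step(h, adj, W):
    """One attention-weighted update: h_v <- relu(sum_u alpha_uv W h_u) over neighbors."""
    z = h @ W                                      # transformed states, shape (n, d')
    scores = z @ z.T                               # toy dot-product attention logits
    scores = np.where(adj > 0, scores, -1e9)       # mask non-neighbors
    alpha = softmax(scores, axis=1)                # alpha_uv, normalized per node
    return np.maximum(alpha @ z, 0.0)              # ReLU nonlinearity

def pool_to_parent(h_children, assignment):
    """Mean-pool child node states (e.g., words) into parent nodes (e.g., sentences)."""
    parents = sorted(set(assignment))
    return np.stack([h_children[[i for i, p in enumerate(assignment) if p == pid]].mean(axis=0)
                     for pid in parents])

rng = np.random.default_rng(0)
h_words = rng.normal(size=(6, 16))
adj = (rng.random((6, 6)) > 0.5).astype(float)
np.fill_diagonal(adj, 1)                           # self-loops keep every row non-empty
W = 0.1 * rng.normal(size=(16, 16))
h_words = attention_step(h_words, adj, W)
h_sents = pool_to_parent(h_words, assignment=[0, 0, 0, 1, 1, 1])   # 2 sentences
print(h_sents.shape)                               # (2, 16)
```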
4. Core Applications
Multi-hop Question Answering: Hierarchical sentence graphs support multi-hop evidence selection by modeling fine-grained logical dependencies and topic connections—often using RST-inspired labeling and cross-document entity bridges for evidence path expansion and compositional reasoning (Liang et al., 6 Jan 2026, Xiong, 2020).
Summarization: Both extractive and abstractive summarizers leverage hierarchical graphs to capture salient cross-sentence and section-level structure, identifying and scoring important sentences for extraction or forming global context for neural decoders (Zhang et al., 2023, Zhao et al., 2024, Qiu et al., 2022).
Text Classification: Hierarchical graph architectures improve text classification by aggregating discriminative cues at word, sentence, and document levels, with adaptive weighting to balance local and global information (Hua et al., 2022).
Semantic Matching and Ordering: Latent hierarchical trees or semantic factorization support fine-grained semantic alignment and matching, including unsupervised optimal-transport distances and multi-scale Siamese models for similarity, paraphrase, and order prediction tasks (Liu et al., 2018, Wu et al., 2021).
5. Complexity, Approximation, and Scalability
Computational Tradeoffs: Full global token/sentence graphs scale as $O(n^2)$ in input length $n$, making them impractical for long documents. Hierarchical sentence graph models reduce worst-case complexity to roughly $O(n \cdot m + (n/m)^2)$ for segment length $m$ by restricting dense reasoning to local graphs and summarizing higher levels of the hierarchy; with $m \approx \sqrt{n}$ this is subquadratic, enabling processing of multi-thousand-token documents (Liu et al., 17 Sep 2025).
Approximation Analysis: The block-sparse nature of hierarchical graphs introduces bounded representational error, often quantified as the Frobenius-norm difference between the full and approximated adjacency matrices. Tight control via adaptive thresholds keeps this empirical error small relative to the computational savings (Liu et al., 17 Sep 2025).
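The toy experiment below illustrates both points under simplifying assumptions: it builds a full cosine-similarity adjacency over synthetic sentence embeddings with segment-level topical structure, keeps only intra-segment entries as the block-sparse approximation, and reports the edge-count saving together with the relative Frobenius-norm error. All sizes, the $\sqrt{n}$ segment length, and the synthetic embeddings are illustrative; the exact error depends on how clustered the similarity structure is.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 256
m = int(math.sqrt(n))                              # segment length ~ sqrt(n), here 20

# Embeddings with segment-level topical structure: sentences within a segment
# share a topic vector, so most similarity mass is intra-segment.
topics = rng.normal(size=(n // m, d))
X = np.repeat(topics, m, axis=0) + 0.3 * rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

A_full = X @ X.T                                   # dense O(n^2) similarity graph

# Block-sparse approximation: keep only intra-segment entries.
mask = np.zeros_like(A_full)
for s in range(0, n, m):
    mask[s:s + m, s:s + m] = 1.0
A_hier = A_full * mask

dense_entries = n * n                              # O(n^2)
sparse_entries = int(mask.sum()) + (n // m) ** 2   # ~ n*m local + (n/m)^2 global summary graph
rel_frob_err = np.linalg.norm(A_full - A_hier) / np.linalg.norm(A_full)

print(f"adjacency entries: {dense_entries} -> {sparse_entries}")
print(f"relative Frobenius error: {rel_frob_err:.3f}")
```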
6. Empirical Performance and Insights
Empirical Gains: Across long-text AMR parsing, semantic role labeling, and legal/event extraction, hierarchical sentence graph models achieve 2–4× inference speedup, ≥60% reduction in peak memory usage, and ≥95% retention of end-task accuracy compared to full global-graph models (Liu et al., 17 Sep 2025). In extractive and abstractive summarization, hierarchical graph approaches yield gains of 2–3 ROUGE points over flat baselines and improve relevance and non-redundancy in human evaluation (Zhang et al., 2023, Qiu et al., 2022).
Key Insights:
- Explicitly modeling sentence-level logic and discourse structure outperforms chunk-level or sequential retrieval in multi-hop QA (Liang et al., 6 Jan 2026).
- Hierarchical architectures preserve long-range context while ensuring that local coherence is not lost, crucial for summarization and insertion tasks (Wu et al., 2021, Zhao et al., 2024).
- Latent structure induction (via matrix-tree or AMR-based factorization) gracefully handles documents lacking clear section boundaries or explicit discourse markers (Qiu et al., 2022, Liu et al., 2018).
7. Limitations and Future Directions
Challenges:
- Construction of hierarchical graphs, especially those guided by LLMs or RST parsers, can be resource-intensive and may introduce spurious or noisy connections (Liang et al., 6 Jan 2026).
- Task- and genre-specific definitions of hierarchy or discourse relations may limit transferability across domains.
- Most existing frameworks focus on two- or three-level hierarchies; ultra-deep or recursive hierarchies, or those integrating cross-document knowledge, require further investigation.
Ongoing Directions:
- Incremental, streaming, or self-supervised graph induction to avoid costly offline construction (Liu et al., 17 Sep 2025)
- Integration of more nuanced rhetorical, entity, and temporal relations in graph construction (Liang et al., 6 Jan 2026)
- Enhanced graph contrastive learning and global–local fusion for robustness in summarization and retrieval under extreme document lengths (Zhang et al., 2023, Hua et al., 2022)
Hierarchical sentence graphs thus provide a unifying substrate for modeling, reasoning, and retrieval in complex natural language processing scenarios where both fine-grained and abstracted semantic relations are essential.