Graph-based Dynamic Sequence Compressor (GDC)
- The GDC compresses high-dimensional sequences by dynamically routing features through capsule networks and graph convolution, effectively reducing redundancy and noise.
- It builds an adaptive graph with self-attention to align and merge temporal features, enabling uniform cross-modal processing and efficient genomic data compression.
- Empirical evaluations report improved sentiment analysis accuracy for the neural module and strong compression ratios for genome collections, highlighting GDC's scalability and performance across these domains.
A Graph-based Dynamic Sequence Compressor (GDC) is a neural module designed to reduce sequential redundancy and noise in high-dimensional time series such as acoustic and visual modalities in multimodal sentiment analysis. The GDC paradigm has also been interpreted as an implicit mechanism in large-scale genomic data compression, where a dynamic dictionary captures repeated substrings across sequences. In neural multimodal frameworks, GDC combines capsule networks and graph convolutional mechanisms; in genomics, GDC describes a two-level Lempel–Ziv (LZ)-style factoring, organized as an evolving dictionary that can be viewed as a dynamic graph over substrings. These methods facilitate more efficient downstream analysis or storage by condensing input sequences into compact graph representations while preserving essential information.
1. GDC in Neural Multimodal Sequence Compression
The GDC introduced in "Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection" (Yang et al., 9 Nov 2025) targets temporal feature sequences from non-language modalities, specifically acoustic and visual signals, post-feature-extraction. Its objective is to compress these sequences ($H_m \in \mathbb{R}^{T_m \times d_m}$) into a condensed form $\tilde{H}_m \in \mathbb{R}^{T_l \times d}$, matching the number of time steps $T_l$ and feature dimension $d$ of the language modality, thus enabling uniform cross-modal operations.
The GDC module fulfills several roles:
- Removes temporal redundancy and noise by adaptively concentrating information into salient “nodes.”
- Matches sequence lengths across modalities, which is critical for modality selection, attention, and fusion mechanisms.
2. Capsule Network Layer for Dynamic Projection
Each modality's feature sequence $H_m \in \mathbb{R}^{T_m \times d_m}$ is projected into capsule embeddings via a learned set of transformations. For each time-step $i$ and each target node $j$:

$$u_{j|i} = W_m^{ij} H_m^{(i)}$$

Here, the $W_m^{ij} \in \mathbb{R}^{d \times d_m}$ are trainable parameters. Dynamic routing “by agreement” iteratively computes soft assignment coefficients for merging:

$$r_{ij} = \operatorname{softmax}_j(b_{ij}), \qquad v_j = \sum_{i=1}^{T_m} r_{ij}\, u_{j|i}, \qquad b_{ij} \leftarrow b_{ij} + u_{j|i}^{\top} \tanh(v_j)$$

with the logits $b_{ij}$ initialized to zero. This iterative process (typically a small number of routing rounds $R$) converges to node representations $N_m = [v_1, \dots, v_{T_l}] \in \mathbb{R}^{T_l \times d}$.
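A minimal NumPy sketch of this projection-and-routing step is given below; the function name, the explicit per-pair weight tensor `W`, and the default `R=3` are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def capsule_routing(H_m, W, R=3):
    """H_m: (T_m, d_m) input sequence; W: (T_m, T_l, d, d_m) per-pair projections (assumed shapes)."""
    # Capsule projection: u[i, j] = W[i, j] @ H_m[i] for every (time step, node) pair
    u = np.einsum('ijab,ib->ija', W, H_m)               # (T_m, T_l, d)
    b = np.zeros(W.shape[:2])                           # routing logits, (T_m, T_l)
    for _ in range(R):                                  # routing "by agreement"
        r = softmax(b, axis=1)                          # soft assignment of step i to node j
        v = np.einsum('ij,ija->ja', r, u)               # (T_l, d) candidate node vectors
        b = b + np.einsum('ija,ja->ij', u, np.tanh(v))  # agreement update
    return v                                            # compressed nodes N_m, shape (T_l, d)
```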
3. Graph Construction and Convolution
After capsule aggregation, nodes are connected via an adaptive adjacency matrix determined by self-attention:

$$Q = N_m W_m^{Q}, \qquad K = N_m W_m^{K}, \qquad S = \frac{Q K^{\top}}{\sqrt{d}}, \qquad A_m = \operatorname{softmax}_{\text{row}}\!\big(\operatorname{ReLU}(S)\big)$$

With learnable query and key projections $W_m^{Q}$ and $W_m^{K}$, this attention builds edge weights based on feature similarity, yielding a learned graph over compressed sequence positions.

A stack of $L$ Graph Convolutional Network (GCN) layers then propagates information across this graph:

$$H^{(l+1)} = \operatorname{ReLU}\!\big(\tilde{D}^{-1/2} \tilde{A}\, \tilde{D}^{-1/2} H^{(l)} W_m^{(l)} + b_m^{(l)}\big)$$

with $\tilde{A} = A_m + I$ and $\tilde{D} = \operatorname{diag}(\tilde{A}\mathbf{1})$, where $H^{(0)} = N_m$ and the final layer's output is the compressed sequence $\tilde{H}_m$.
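The adjacency construction and normalized GCN stack admit a similarly compact sketch; the weight shapes and the plain-NumPy formulation are again assumptions for illustration.

```python
import numpy as np

def gdc_graph_layers(N, W_q, W_k, W_layers, b_layers):
    """N: (T_l, d) nodes from routing; W_q, W_k: (d, d) projections;
    W_layers, b_layers: lists of per-layer GCN weights (d, d) and biases (d,) (assumed shapes)."""
    T_l, d = N.shape
    # Adaptive adjacency: scaled dot-product attention, ReLU, then row-wise softmax
    Q, K = N @ W_q, N @ W_k
    E = np.maximum((Q @ K.T) / np.sqrt(d), 0.0)
    E = np.exp(E - E.max(axis=1, keepdims=True))
    A = E / E.sum(axis=1, keepdims=True)
    # GCN stack with self-loops and symmetric normalization
    H = N
    for W_l, b_l in zip(W_layers, b_layers):
        A_tilde = A + np.eye(T_l)
        d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
        A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        H = np.maximum(A_hat @ H @ W_l + b_l, 0.0)   # ReLU(D^-1/2 Ã D^-1/2 H W + b)
    return H                                         # compressed sequence, shape (T_l, d)
```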
4. Compression Mechanism, Learning, and Pseudocode
The dynamic routing selectively pools input time steps into salient nodes, concentrating distributed information and suppressing noise. GCN layers promote further redundancy reduction and structural abstraction by diffusing information across informative nodes. The module lacks a stand-alone reconstruction loss; instead, it is optimized end-to-end for the target sentiment analysis and InfoNCE objectives.
Pseudocode sketch:
```
Input:  H_m ∈ R^{T_m×d_m}, T_l, d, R, L
Output: H̃_m ∈ R^{T_l×d}

for i in range(T_m):
    for j in range(T_l):
        u[i][j] = W_m^{ij} @ H_m[i]

b = zeros(T_m, T_l)
for r in range(R):
    r_coef = softmax(b, axis=1)
    for j in range(T_l):
        v[j] = sum(r_coef[i][j] * u[i][j] for i in range(T_m))
    for i in range(T_m):
        for j in range(T_l):
            b[i][j] += dot(u[i][j], tanh(v[j]))
N_m = stack([v[j] for j in range(T_l)])

Q = N_m @ W_m^Q
K = N_m @ W_m^K
S = (Q @ K.T) / sqrt(d)
E = relu(S)
A_m = row_softmax(E)

H = N_m
for layer in range(L):
    A_tilde = A_m + I
    D_tilde = diag(A_tilde @ ones(T_l))
    H = relu(D_tilde^{-1/2} @ A_tilde @ D_tilde^{-1/2} @ H @ W_m^{layer} + b_m^{layer})
H̃_m = H
return H̃_m
```
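Because gradients reach the GDC only through these downstream losses, the contrastive component can be pictured as a standard InfoNCE term over paired embeddings; the sketch below shows the generic formulation and is not claimed to match the paper's exact loss.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) paired embeddings; matched rows are positives, other rows negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature                   # (batch, batch) similarity matrix
    m = logits.max(axis=1, keepdims=True)                # stabilized row-wise log-softmax
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_prob))                   # encourage matched pairs to score highest
```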
5. Empirical Performance and Ablation
Typical configurations use a small number of routing iterations $R$, $T_l$ of up to 60 (aligned to the language input), $d$ matching the BERT feature dimension, and $L$ of 1–2 GCN layers. Direct tensor shapes: input $H_m \in \mathbb{R}^{T_m \times d_m}$, projected capsules $u \in \mathbb{R}^{T_m \times T_l \times d}$, output $\tilde{H}_m \in \mathbb{R}^{T_l \times d}$.
Ablation experiments indicate that excluding GDC from the MODS framework results in a 5–8% absolute reduction in correlation/accuracy, while omitting the capsule routing mechanism itself causes an additional ~2% loss. This suggests that the capsule and graph components each contribute independently to eliminating non-language sequential redundancy and to improving cross-modal fusion (Yang et al., 9 Nov 2025).
6. Theoretical Lineage and Implicit Graph Structure in Lempel–Ziv Compression
In the context of large-scale genomic sequence collection compression, GDC (as GDC2) has been interpreted as constructing an implicit, dynamically evolving directed acyclic substring dictionary. Each unique substring of a factored tuple stream forms a node; chaining of substrings recorded in the compressed stream corresponds to edges. This structure facilitates redundancy exploitation across collections—first by referencing a fixed reference genome, then by cross-referencing previously observed tuple streams. The directed acyclic graph grows dynamically as more genomes are processed. Though not explicitly stored as a graph, this interpretation aligns with dynamic dictionary methods and redundancy mining in a general graph-structured sequence space (Deorowicz et al., 2015).
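As a conceptual toy for this dictionary-as-graph view, the sketch below greedily factors a sequence against a fixed reference into match/literal tuples, where each tuple can be read as a node and consecutive tuples as edges; GDC2's actual hash-based matching and second-level cross-referencing across tuple streams are substantially more elaborate.

```python
def lz_factor_against_reference(seq, ref, min_match=4):
    """Greedily factor `seq` into ('match', ref_pos, length) tuples against `ref`,
    falling back to ('literal', symbol) when no sufficiently long match exists."""
    factors, i = [], 0
    while i < len(seq):
        best_pos, best_len = -1, 0
        for p in range(len(ref)):                      # naive O(n*m) longest-match search
            l = 0
            while i + l < len(seq) and p + l < len(ref) and seq[i + l] == ref[p + l]:
                l += 1
            if l > best_len:
                best_pos, best_len = p, l
        if best_len >= min_match:
            factors.append(('match', best_pos, best_len))
            i += best_len
        else:
            factors.append(('literal', seq[i]))
            i += 1
    return factors

# e.g. lz_factor_against_reference("ACGTACGTTTACGTACGA", "ACGTACGTAC")
# -> mostly ('match', ...) tuples plus a few ('literal', ...) entries
```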
7. Practical Considerations and Comparative Performance
In a neural MODS framework, GDC requires end-to-end joint optimization, with hyperparameters that can be tuned to balance the degree of compression against computational cost. No explicit reconstruction objective is imposed.
For GDC2 in genomics, empirical results on 1092 human diploid genomes (a multi-terabyte raw collection) report compression ratios in the thousands, with memory usage of 5–24 GB depending on the number of reference sequences retained at the second level. The method is notably faster (2–4×) and more effective than prior art such as FRESCO or RLZ; randomly accessing a single compressed sequence necessitates partial decompression of references, a trade-off manageable via selective retention (Deorowicz et al., 2015).
The GDC formalism thus unifies neural and classical approaches to removing sequential redundancy via graph-based dynamic abstraction, supporting both high-throughput, high-density unsupervised compression and supervised discriminative modeling.