Graph-based Dynamic Sequence Compressor (GDC)
- The GDC compresses high-dimensional sequences by dynamically routing features through capsule networks and graph convolution, effectively reducing redundancy and noise.
- It builds an adaptive graph with self-attention to align and merge temporal features, enabling uniform cross-modal processing and efficient genomic data compression.
- Empirical evaluations report improved sentiment analysis accuracy for the neural module and strong compression ratios for genome collections, highlighting GDC's scalability and performance across these domains.
A Graph-based Dynamic Sequence Compressor (GDC) is a neural module designed to reduce sequential redundancy and noise in high-dimensional time series such as acoustic and visual modalities in multimodal sentiment analysis. The GDC paradigm has also been interpreted as an implicit mechanism in large-scale genomic data compression, where a dynamic dictionary captures repeated substrings across sequences. In neural multimodal frameworks, GDC combines capsule networks and graph convolutional mechanisms; in genomics, GDC describes a two-level Lempel–Ziv (LZ)-style factoring, organized as an evolving dictionary that can be viewed as a dynamic graph over substrings. These methods facilitate more efficient downstream analysis or storage by condensing input sequences into compact graph representations while preserving essential information.
1. GDC in Neural Multimodal Sequence Compression
The GDC introduced in "Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection" (Yang et al., 9 Nov 2025) targets temporal feature sequences from non-language modalities, specifically acoustic and visual signals, post-feature-extraction. Its objective is to compress these sequences ($H_m \in \mathbb{R}^{T_m \times d_m}$) into a condensed form $\tilde{H}_m \in \mathbb{R}^{T_l \times d}$, matching the number of time steps $T_l$ and feature dimension $d$ of the language modality, thus enabling uniform cross-modal operations.
The GDC module fulfills several roles:
- Removes temporal redundancy and noise by adaptively concentrating information into salient “nodes.”
- Matches sequence lengths across modalities, which is critical for modality selection, attention, and fusion mechanisms.
2. Capsule Network Layer for Dynamic Projection
Each modality's feature sequence $H_m \in \mathbb{R}^{T_m \times d_m}$ is projected into capsule embeddings via a learned set of transformations. For each time-step $i$ and each target node $j$:

$$u_{j|i} = W_m^{ij} H_m^{(i)}$$

Here, the $W_m^{ij} \in \mathbb{R}^{d \times d_m}$ are trainable parameters. Dynamic routing “by agreement” iteratively computes soft assignment coefficients for merging:

$$r_{ij} = \operatorname{softmax}_j(b_{ij}), \qquad v_j = \sum_{i=1}^{T_m} r_{ij}\, u_{j|i}, \qquad b_{ij} \leftarrow b_{ij} + u_{j|i}^{\top} \tanh(v_j)$$

with the logits $b_{ij}$ initialized to zero. This iterative process (typically a small number of routing rounds $R$) converges to node representations $N_m = [v_1, \dots, v_{T_l}] \in \mathbb{R}^{T_l \times d}$.
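A minimal NumPy sketch of this projection-and-routing step is given below; the function name, the explicit per-pair weight tensor `W`, and the default `R=3` are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def capsule_routing(H_m, W, R=3):
    """H_m: (T_m, d_m) input sequence; W: (T_m, T_l, d, d_m) per-pair projections (assumed shapes)."""
    # Capsule projection: u[i, j] = W[i, j] @ H_m[i] for every (time step, node) pair
    u = np.einsum('ijab,ib->ija', W, H_m)               # (T_m, T_l, d)
    b = np.zeros(W.shape[:2])                           # routing logits, (T_m, T_l)
    for _ in range(R):                                  # routing "by agreement"
        r = softmax(b, axis=1)                          # soft assignment of step i to node j
        v = np.einsum('ij,ija->ja', r, u)               # (T_l, d) candidate node vectors
        b = b + np.einsum('ija,ja->ij', u, np.tanh(v))  # agreement update
    return v                                            # compressed nodes N_m, shape (T_l, d)
```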
3. Graph Construction and Convolution
After capsule aggregation, nodes are connected via an adaptive adjacency matrix determined by self-attention:

$$Q = N_m W_m^{Q}, \qquad K = N_m W_m^{K}, \qquad S = \frac{Q K^{\top}}{\sqrt{d}}, \qquad A_m = \operatorname{softmax}_{\text{row}}\!\big(\operatorname{ReLU}(S)\big)$$

With learnable query and key projections $W_m^{Q}$ and $W_m^{K}$, this attention builds edge weights based on feature similarity, yielding a learned graph over compressed sequence positions.

A stack of $L$ Graph Convolutional Network (GCN) layers then propagates information across this graph:

$$H^{(l+1)} = \operatorname{ReLU}\!\big(\tilde{D}^{-1/2} \tilde{A}\, \tilde{D}^{-1/2} H^{(l)} W_m^{(l)} + b_m^{(l)}\big)$$

with $\tilde{A} = A_m + I$ and $\tilde{D} = \operatorname{diag}(\tilde{A}\mathbf{1})$, where $H^{(0)} = N_m$ and the final layer's output is the compressed sequence $\tilde{H}_m$.
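The adjacency construction and normalized GCN stack admit a similarly compact sketch; the weight shapes and the plain-NumPy formulation are again assumptions for illustration.

```python
import numpy as np

def gdc_graph_layers(N, W_q, W_k, W_layers, b_layers):
    """N: (T_l, d) nodes from routing; W_q, W_k: (d, d) projections;
    W_layers, b_layers: lists of per-layer GCN weights (d, d) and biases (d,) (assumed shapes)."""
    T_l, d = N.shape
    # Adaptive adjacency: scaled dot-product attention, ReLU, then row-wise softmax
    Q, K = N @ W_q, N @ W_k
    E = np.maximum((Q @ K.T) / np.sqrt(d), 0.0)
    E = np.exp(E - E.max(axis=1, keepdims=True))
    A = E / E.sum(axis=1, keepdims=True)
    # GCN stack with self-loops and symmetric normalization
    H = N
    for W_l, b_l in zip(W_layers, b_layers):
        A_tilde = A + np.eye(T_l)
        d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
        A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        H = np.maximum(A_hat @ H @ W_l + b_l, 0.0)   # ReLU(D^-1/2 Ã D^-1/2 H W + b)
    return H                                         # compressed sequence, shape (T_l, d)
```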
4. Compression Mechanism, Learning, and Pseudocode
The dynamic routing selectively pools input time steps into salient nodes, concentrating distributed information and suppressing noise. GCN layers promote further redundancy reduction and structural abstraction by diffusing information across informative nodes. The module lacks a stand-alone reconstruction loss; instead, it is optimized end-to-end for the target sentiment analysis and InfoNCE objectives.
Pseudocode sketch:
```
Input:  H_m ∈ R^{T_m×d_m}, T_l, d, R, L
Output: H̃_m ∈ R^{T_l×d}

for i in range(T_m):
    for j in range(T_l):
        u[i][j] = W_m^{ij} @ H_m[i]

b = zeros(T_m, T_l)
for r in range(R):
    r_coef = softmax(b, axis=1)
    for j in range(T_l):
        v[j] = sum(r_coef[i][j] * u[i][j] for i in range(T_m))
    for i in range(T_m):
        for j in range(T_l):
            b[i][j] += dot(u[i][j], tanh(v[j]))
N_m = stack([v[j] for j in range(T_l)])

Q = N_m @ W_m^Q
K = N_m @ W_m^K
S = (Q @ K.T) / sqrt(d)
E = relu(S)
A_m = row_softmax(E)

H = N_m
for layer in range(L):
    A_tilde = A_m + I
    D_tilde = diag(A_tilde @ ones(T_l))
    H = relu(D_tilde^{-1/2} @ A_tilde @ D_tilde^{-1/2} @ H @ W_m^{layer} + b_m^{layer})
H̃_m = H
return H̃_m
```
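Because gradients reach the GDC only through these downstream losses, the contrastive component can be pictured as a standard InfoNCE term over paired embeddings; the sketch below shows the generic formulation and is not claimed to match the paper's exact loss.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) paired embeddings; matched rows are positives, other rows negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature                   # (batch, batch) similarity matrix
    m = logits.max(axis=1, keepdims=True)                # stabilized row-wise log-softmax
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_prob))                   # encourage matched pairs to score highest
```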
5. Empirical Performance and Ablation
Typical configurations use a small number of routing iterations $R$, $T_l$ of up to 60 (aligned to the language input), $d$ matching the BERT feature dimension, and $L$ of 1–2 GCN layers. Direct tensor shapes: input $H_m \in \mathbb{R}^{T_m \times d_m}$, projected capsules $u \in \mathbb{R}^{T_m \times T_l \times d}$, output $\tilde{H}_m \in \mathbb{R}^{T_l \times d}$.
Ablation experiments indicate that excluding GDC from the MODS framework results in a 5–8% absolute reduction in correlation/accuracy, while omitting the capsule routing mechanism itself causes an additional ~2% loss. This suggests that the capsule and graph components each contribute independently to eliminating non-language sequential redundancy and to improving cross-modal fusion (Yang et al., 9 Nov 2025).
6. Theoretical Lineage and Implicit Graph Structure in Lempel–Ziv Compression
In the context of large-scale genomic sequence collection compression, GDC (as GDC2) has been interpreted as constructing an implicit, dynamically evolving directed acyclic substring dictionary. Each unique substring of a factored tuple stream forms a node; chaining of substrings recorded in the compressed stream corresponds to edges. This structure facilitates redundancy exploitation across collections—first by referencing a fixed reference genome, then by cross-referencing previously observed tuple streams. The directed acyclic graph grows dynamically as more genomes are processed. Though not explicitly stored as a graph, this interpretation aligns with dynamic dictionary methods and redundancy mining in a general graph-structured sequence space (Deorowicz et al., 2015).
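As a conceptual toy for this dictionary-as-graph view, the sketch below greedily factors a sequence against a fixed reference into match/literal tuples, where each tuple can be read as a node and consecutive tuples as edges; GDC2's actual hash-based matching and second-level cross-referencing across tuple streams are substantially more elaborate.

```python
def lz_factor_against_reference(seq, ref, min_match=4):
    """Greedily factor `seq` into ('match', ref_pos, length) tuples against `ref`,
    falling back to ('literal', symbol) when no sufficiently long match exists."""
    factors, i = [], 0
    while i < len(seq):
        best_pos, best_len = -1, 0
        for p in range(len(ref)):                      # naive O(n*m) longest-match search
            l = 0
            while i + l < len(seq) and p + l < len(ref) and seq[i + l] == ref[p + l]:
                l += 1
            if l > best_len:
                best_pos, best_len = p, l
        if best_len >= min_match:
            factors.append(('match', best_pos, best_len))
            i += best_len
        else:
            factors.append(('literal', seq[i]))
            i += 1
    return factors

# e.g. lz_factor_against_reference("ACGTACGTTTACGTACGA", "ACGTACGTAC")
# -> mostly ('match', ...) tuples plus a few ('literal', ...) entries
```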
7. Practical Considerations and Comparative Performance
In a neural MODS framework, GDC requires end-to-end joint optimization, with hyperparameters that can be tuned to balance the degree of compression against computational cost. No explicit reconstruction objective is imposed.
For GDC2 in genomics, empirical results on 1092 human diploid genomes (a multi-terabyte raw collection) report compression ratios in the thousands, with memory usage of 5–24 GB depending on the number of reference sequences retained at the second level. The method is notably faster (2–4×) and more effective than prior art such as FRESCO or RLZ; randomly accessing a single compressed sequence necessitates partial decompression of references, a trade-off manageable via selective retention (Deorowicz et al., 2015).
The GDC formalism thus unifies neural and classical approaches to removing sequential redundancy via graph-based dynamic abstraction, supporting both high-throughput, high-density unsupervised compression and supervised discriminative modeling.