Seq2Tens: Scalable Sequence Representation
- Seq2Tens is a framework that represents sequences by lifting static features into the tensor algebra, preserving order and encoding complex dependencies.
- It employs low-rank tensor projections to efficiently approximate high-dimensional feature maps, reducing computational costs without sacrificing expressiveness.
- The framework seamlessly integrates with neural, probabilistic, and graph models, enhancing applications in time series analysis and deep learning.
Seq2Tens is a framework for efficient, expressive, and mathematically grounded sequence representation, situated at the interface of modern machine learning, rough path theory, and tensor algebra. It constructs feature maps for sequences of arbitrary length by systematically lifting static feature representations into the tensor algebra, using compositions of low-rank tensor projections. This approach enables the encoding of complex, nonlinear, and global dependencies in sequential data with strong theoretical universality and practical scalability. Seq2Tens is applicable across time series analysis, deep learning, probabilistic modeling, and structured data such as graphs. The sections below elaborate its algebraic formulation, computational innovations, integration with neural and kernel models, theoretical properties, practical performance, and position within signature-based learning.
1. Algebraic and Mathematical Foundations
Seq2Tens builds on the structure of the tensor algebra $T(V)$ over a vector space $V$, where $V$ is the codomain of a static feature map $\phi \colon \mathcal{X} \to V$ for the input domain $\mathcal{X}$. The tensor algebra is the graded sum $T(V) = \bigoplus_{m \ge 0} V^{\otimes m}$; elements are tuples $\mathbf{a} = (a_0, a_1, a_2, \dots)$ with $a_m \in V^{\otimes m}$.
For a sequence $\mathbf{x} = (x_1, \dots, x_L)$ over $\mathcal{X}$, Seq2Tens defines the sequential feature map as
$$\Phi(\mathbf{x}) = \tilde{\phi}(x_1) \cdot \tilde{\phi}(x_2) \cdots \tilde{\phi}(x_L) \in T(V),$$
where $\tilde{\phi}(x) = (1, \phi(x), 0, 0, \dots)$ lifts the static feature map into $T(V)$ and the product is the non-commutative multiplication of $T(V)$. This product, for $\mathbf{a} = (a_m)_{m \ge 0}$ and $\mathbf{b} = (b_m)_{m \ge 0}$, is defined by
$$(\mathbf{a} \cdot \mathbf{b})_m = \sum_{i + j = m} a_i \otimes b_j,$$
which is the convolution product, ensuring sensitivity to input order. For each degree $m \ge 1$, the $m$-th component of $\Phi(\mathbf{x})$ is given by
$$\Phi_m(\mathbf{x}) = \sum_{1 \le i_1 < i_2 < \dots < i_m \le L} \phi(x_{i_1}) \otimes \phi(x_{i_2}) \otimes \cdots \otimes \phi(x_{i_m}) \in V^{\otimes m}.$$
This formula generalizes $n$-gram features to sequences of arbitrary objects and length, providing a universal, hierarchical summary of all (not necessarily contiguous) subsequences.
Non-commutativity is essential: reordering sequence elements changes the result in $T(V)$, allowing for the capture of order-dependent information intrinsic to sequential semantics. This contrasts with commutative summations or bag-of-words style representations, which ignore ordering and therefore cannot distinguish fundamentally different sequences.
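As a concrete illustration (a minimal sketch, assuming the identity static map $\phi(x) = x$ on $\mathbb{R}^d$; the function names are hypothetical, not taken from any Seq2Tens library), the following materializes $\Phi_m(\mathbf{x})$ as a full order-$m$ tensor by summing over increasing index tuples, and checks that reversing the sequence changes the result:

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Static feature map; here simply the identity on R^d (an illustrative choice)."""
    return x

def seq_feature_level(xs, m):
    """Naive degree-m component of the Seq2Tens map:
    the sum of phi(x_{i1}) ⊗ ... ⊗ phi(x_{im}) over all index tuples i1 < ... < im.
    Materializes a full order-m tensor of shape (d, ..., d); for illustration only."""
    d = xs.shape[1]
    out = np.zeros((d,) * m)
    for idx in combinations(range(len(xs)), m):
        term = phi(xs[idx[0]])
        for i in idx[1:]:
            term = np.tensordot(term, phi(xs[i]), axes=0)  # outer product builds the tensor product
        out += term
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                    # a length-5 sequence in R^3
level2 = seq_feature_level(x, 2)               # Phi_2(x), a 3x3 matrix
level2_rev = seq_feature_level(x[::-1], 2)     # same elements, reversed order
print(np.allclose(level2, level2_rev))         # False: the map is order-sensitive
```

This brute-force construction is only feasible for tiny $m$ and $d$; the low-rank projections of the next section avoid ever forming such a tensor.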
2. Low-Rank Tensor Projections and Scalability
Direct computation with full tensors is computationally intractable: a degree-$m$ component has $d^m$ coefficients, growing exponentially with the degree $m$ for input dimension $d$. Seq2Tens circumvents this by restricting evaluations to linear functionals on $T(V)$ that are of low CP rank, such as rank-1 tensors.
A rank-1 linear functional of degree $m$ can be represented as $\ell_m = v_1 \otimes v_2 \otimes \cdots \otimes v_m$ for $v_1, \dots, v_m \in V$. Then, the action of $\ell_m$ on the feature map is
$$\langle \ell_m, \Phi_m(\mathbf{x}) \rangle = \sum_{1 \le i_1 < \dots < i_m \le L} \langle v_1, \phi(x_{i_1}) \rangle \, \langle v_2, \phi(x_{i_2}) \rangle \cdots \langle v_m, \phi(x_{i_m}) \rangle,$$
which can be accumulated by a simple recursion over the sequence, one inner product at a time.
This recursion avoids explicit tensor construction and can be implemented efficiently with complexity $O(L \cdot M \cdot d)$ per functional, where $M$ is the truncation degree and $d$ the input dimension.
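A minimal NumPy sketch of this recursion (the function name and the exact update are illustrative choices, not the reference implementation) evaluates $\langle v_1 \otimes \cdots \otimes v_m, \Phi_m(\mathbf{x}) \rangle$ without ever forming a tensor:

```python
import numpy as np
from itertools import combinations

def rank1_functional(xs, vs):
    """Evaluate <v_1 ⊗ ... ⊗ v_m, Phi_m(x)> by dynamic programming.

    xs: (L, d) array, the already feature-mapped sequence phi(x_1), ..., phi(x_L).
    vs: (m, d) array, the rank-1 components v_1, ..., v_m.
    Cost: O(L * m * d) time and O(m) extra memory -- no tensor is ever formed.
    """
    m = vs.shape[0]
    inner = xs @ vs.T            # inner[i, j] = <v_{j+1}, phi(x_{i+1})>, the only O(L*m*d) step
    r = np.zeros(m + 1)
    r[0] = 1.0                   # empty product
    for i in range(xs.shape[0]):           # stream through the sequence
        for j in range(m, 0, -1):          # highest degree first, so index i enters each tuple at most once
            r[j] += r[j - 1] * inner[i, j - 1]
    return r[m]

# Sanity check against the brute-force sum over increasing index tuples.
rng = np.random.default_rng(1)
xs, vs = rng.normal(size=(6, 3)), rng.normal(size=(2, 3))
brute = sum(np.prod([vs[j] @ xs[i] for j, i in enumerate(idx)])
            for idx in combinations(range(6), 2))
assert np.isclose(rank1_functional(xs, vs), brute)
```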
The framework extends to low-rank sums via
$$\ell_m = \sum_{r=1}^{R} v_1^{(r)} \otimes v_2^{(r)} \otimes \cdots \otimes v_m^{(r)},$$
i.e., functionals of CP rank $R$, which enables a trade-off between expressive power and computational efficiency. Collecting several such functionals as coordinates of a vector-valued map produces a low-rank Seq2Tens map (LS2T). To restore the expressivity lost by enforcing low rank, one stacks or composes several such LS2T layers, similar to layering in neural architectures.
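Under the same assumptions as the previous sketch, a hypothetical sequence-to-sequence LS2T-style layer can be obtained by evaluating $n$ degree-$M$ rank-1 functionals on every prefix of the input, so that the output is again a sequence and layers can be stacked:

```python
import numpy as np

def ls2t_layer(xs, W):
    """Sequence-to-sequence low-rank Seq2Tens layer (illustrative sketch, not the reference code).

    xs: (L, d) input sequence of static features.
    W:  (n, M, d) weights; output channel k is the degree-M rank-1 functional
        w_{k,1} ⊗ ... ⊗ w_{k,M}, evaluated on every prefix x_{1..i}.
    Returns an (L, n) output sequence, so further LS2T layers can be applied on top.
    """
    L, _ = xs.shape
    n, M, _ = W.shape
    out = np.zeros((L, n))
    for k in range(n):
        inner = xs @ W[k].T               # (L, M) inner products <w_{k,j}, phi(x_i)>
        r = np.zeros(M + 1)
        r[0] = 1.0
        for i in range(L):                # same prefix recursion as the rank-1 functional above
            for j in range(M, 0, -1):
                r[j] += r[j - 1] * inner[i, j - 1]
            out[i, k] = r[M]              # functional value on the prefix x_{1..i+1}
    return out

# Two stacked layers: depth partially restores expressivity lost to the rank-1 restriction.
rng = np.random.default_rng(2)
x = rng.normal(size=(20, 4))
h = ls2t_layer(x, 0.1 * rng.normal(size=(8, 2, 4)))   # (20, 8)
y = ls2t_layer(h, 0.1 * rng.normal(size=(8, 2, 8)))   # (20, 8)
```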
3. Integration with Neural, Probabilistic, and Structured Models
Seq2Tens is modularly integrated into neural network architectures as a sequence-to-sequence transformation layer (a minimal end-to-end sketch follows the list below). It is compatible with:
- Feedforward and convolutional layers: Used as front-end static feature maps $\phi$.
- Neural stacks: LS2T modules can be chained, akin to deep stacking in RNNs or CNNs, supporting both unidirectional and bidirectional processing.
- Hybrid architectures: In combination with standard network blocks, including RNNs, CNNs, and transformers, LS2T layers add the capacity for capturing global, high-order, and non-local dependencies.
- Probabilistic models: In Gaussian process and kernel regression models (notably with signature kernels), the Seq2Tens mapping allows for scalable approximation and parameterization of kernel functions, with the weight space parameterized by low-rank tensors.
- Structured data and graphs: The expected signature features of random walks on graphs use the same algebraic and low-rank evaluation techniques, supporting tasks requiring long-range and global reasoning.
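As a hedged end-to-end illustration of this modular pattern (all component names, shapes, and the fixed random "front-end" are assumptions for the sketch, and `ls2t_layer` refers to the sequence-to-sequence sketch from Section 2; a practical model would use trainable layers in a deep-learning framework):

```python
import numpy as np

# Assumes `ls2t_layer` from the Section 2 sketch is in scope.
rng = np.random.default_rng(3)

# Stand-in "front-end": a fixed random affine map + tanh, playing the role of a trainable
# dense/convolutional static feature map phi applied independently to every time step.
A, b = rng.normal(size=(4, 16)), rng.normal(size=16)
def static_features(x):                   # x: (L, 4) -> (L, 16)
    return np.tanh(x @ A + b)

x = rng.normal(size=(50, 4))              # one input sequence
h = static_features(x)                    # step-wise features, no order information yet
z = ls2t_layer(h, 0.1 * rng.normal(size=(8, 3, 16)))   # order-aware LS2T features, (50, 8)
pooled = z[-1]                            # value of each functional on the full sequence
logits = pooled @ rng.normal(size=(8, 2)) # toy classification head
```

In a real hybrid architecture the front-end, LS2T weights, and head would be trained jointly, and the LS2T block can equally be placed after an RNN, CNN, or transformer encoder.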
4. Theoretical Properties: Universality and Invariance
Seq2Tens inherits theoretical strengths from the algebraic structure:
- Universality: If the static feature map $\phi$ is universal (i.e., sufficiently rich to approximate any continuous function on $\mathcal{X}$), then the sequence map $\Phi$ is universal for the space of sequences; any continuous function on sequences can be approximated as $\mathbf{x} \mapsto \langle \ell, \Phi(\mathbf{x}) \rangle$ with a suitable functional $\ell$.
- Order sensitivity: The non-commutative algebra ensures that sequence order is preserved and exploited, supporting tasks where even basic semantics depend on ordering.
- Compatibility with invariance: Seq2Tens can be adapted to be invariant or sensitive to specific transformations (e.g., time reparameterization) by modifying the static map or inputs, matching key properties of signature features from rough path theory.
- Depth vs width: Layered LS2T architectures can approximate high-rank functionals; depth compensates for the restricted rank of each individual layer while retaining its parameter efficiency, analogously to depth-width trade-offs in deep learning.
5. Computational Complexity and Implementation
Efficient evaluation algorithms for Seq2Tens (Algorithms D.1 and D.2 in the appendix of the original paper) exploit the recursive structure of low-rank projections to achieve:
- Time complexity: Linear in sequence length $L$, quadratic or linear in truncation degree $M$ (depending on the variant), and linear in the number of output functionals.
- Space complexity: Linear in $M$ and $d$, avoiding the exponential cost of storing full degree-$M$ tensors.
- Scalability: Amenable to batched, vectorized implementations on GPUs or parallel hardware; empirical timings demonstrate orders-of-magnitude improvements over naive tensor approaches for similar expressiveness.
A plausible implication is that these scalability properties make Seq2Tens suitable for both research prototypes and production-scale deployment on large and heterogeneous sequential datasets, without a dramatic loss in expressivity.
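A rough sketch of the batched, vectorized evaluation style referred to above (plain NumPy; on GPUs the same pattern maps directly onto framework tensor operations, and the function name is illustrative): all batch elements and output functionals advance through the prefix recursion together, leaving Python-level loops only over the sequence length and the degree.

```python
import numpy as np

def ls2t_batched(X, W):
    """Batched LS2T evaluation (illustrative sketch).

    X: (B, L, d) batch of sequences of static features.
    W: (n, M, d) rank-1 functional weights, shared across the batch.
    Returns a (B, n) array: each degree-M functional evaluated on each full sequence.
    """
    B, L, _ = X.shape
    n, M, _ = W.shape
    inner = np.einsum('bld,nmd->blnm', X, W)    # all inner products in one vectorized step
    r = np.zeros((B, n, M + 1))
    r[..., 0] = 1.0
    for i in range(L):                           # remaining loops: sequence length and degree only
        for j in range(M, 0, -1):
            r[..., j] += r[..., j - 1] * inner[:, i, :, j - 1]
    return r[..., M]

# Example: a batch of 32 sequences of length 100 in R^8, with 16 degree-3 functionals.
rng = np.random.default_rng(4)
out = ls2t_batched(rng.normal(size=(32, 100, 8)), 0.1 * rng.normal(size=(16, 3, 8)))  # (32, 16)
```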
6. Empirical Results and Applications
Seq2Tens has demonstrated strong empirical performance on multiple benchmarks:
- Multivariate time series classification: On the UCR/UEA archive, LS2T layers improved or matched state-of-the-art baselines, often reducing parameter counts.
- Healthcare mortality prediction: On the PhysioNet 2012 ICU dataset, FCN+LS2T architectures outperformed strong RNN and Transformer baselines in both accuracy and area under the precision-recall curve.
- Generative modeling: As a component within variational autoencoders (e.g., GP-VAE), LS2T improved negative log-likelihood, mean squared error, and AUROC on video and medical imputation tasks.
- Parameter/sample efficiency: Models with LS2T layers often achieved similar or better accuracy than larger baselines, suggesting improved sample efficiency.
- Graph-structured learning: The same low-rank projection techniques have been applied for high-order path-based encoding in graph models, beneficial for tasks requiring global aggregation.
7. Position within Signature-Based and Tensor-Based Learning
Seq2Tens generalizes and unifies several prior approaches:
- String kernels and $n$-gram models: By capturing subsequence features of all orders in a universal algebraic structure.
- Path signatures from rough path theory: By adopting the tensor algebra and extending signature machinery with low-rank, scalable projections to handle real-world complexities.
- Kernel learning and Gaussian processes: By providing scalable feature maps for kernel methods over sequences and structured domains.
- Deep learning: By enabling the insertion of mathematically principled, parameter-efficient, and order-sensitive sequence representations into modern deep architectures.
- Graph learning: By providing a framework for efficient and expressive path-based aggregation in node and graph representations.
Seq2Tens thus provides a modular, mathematically rigorous, and computationally scalable bridge between algebraic functional analysis, probabilistic modeling, and state-of-the-art deep learning pipelines, alleviating prior bottlenecks in expressive yet practical sequence modeling.