
Seq2Tens: Scalable Sequence Representation

Updated 30 June 2025
  • Seq2Tens is a framework that represents sequences by lifting static features into tensor algebra, preserving order and encoding complex dependencies.
  • It employs low-rank tensor projections to efficiently approximate high-dimensional feature maps, reducing computational costs without sacrificing expressiveness.
  • The framework seamlessly integrates with neural, probabilistic, and graph models, enhancing applications in time series analysis and deep learning.

Seq2Tens is a framework for efficient, expressive, and mathematically grounded sequence representation, situated at the interface of modern machine learning, rough path theory, and tensor algebra. It constructs feature maps for sequences of arbitrary length by systematically lifting static feature representations into the tensor algebra, using compositions of low-rank tensor projections. This approach enables the encoding of complex, nonlinear, and global dependencies in sequential data with strong theoretical universality and practical scalability. Seq2Tens is applicable across time series analysis, deep learning, probabilistic modeling, and structured data such as graphs. The sections below elaborate its algebraic formulation, computational innovations, integration with neural and kernel models, theoretical properties, practical performance, and position within signature-based learning.

1. Algebraic and Mathematical Foundations

Seq2Tens builds on the structure of the tensor algebra $T(V)$ over a vector space $V$, where $V$ is the codomain of a static feature map $\phi : X \to V$ for $X$ the input domain. The tensor algebra is the graded sum $T(V) = \bigoplus_{m=0}^{\infty} V^{\otimes m}$; elements are tuples $t = (t_0, t_1, t_2, \ldots)$ with $t_m \in V^{\otimes m}$.

For a sequence $x = (x_1, \ldots, x_L)$, Seq2Tens defines the sequential feature map as

$$\Phi(x) = \prod_{i=1}^{L} \varphi(x_i)$$

where $\varphi(x) = (1, \phi(x), 0, 0, \ldots) \in T(V)$ and the product is the non-commutative multiplication of $T(V)$. This product, for $s, t \in T(V)$, is defined by

$$(s \cdot t)_m = \sum_{i=0}^{m} s_i \otimes t_{m-i}$$

which is the convolution product, ensuring sensitivity to input order. For each degree $m$, the $m$-th component of $\Phi(x)$ is given by

$$\Phi_m(x) = \sum_{1 \le i_1 < \cdots < i_m \le L} \phi(x_{i_1}) \otimes \cdots \otimes \phi(x_{i_m})$$

This formula generalizes $n$-gram features to sequences of arbitrary objects and length, providing a universal, hierarchical summary of all subsequences.

Non-commutativity is essential: reordering sequence elements changes the result in $T(V)$, allowing for the capture of order-dependent information intrinsic to sequential semantics. This contrasts with commutative summations or bag-of-words style representations, which ignore ordering and therefore cannot distinguish fundamentally different sequences.
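
As a concrete illustration, the following minimal NumPy sketch (not code from the original work; all names are illustrative) evaluates the degree-$m$ component naively by enumerating index tuples. With one-hot static features, $\Phi_2$ reduces to counts of ordered, possibly non-contiguous 2-grams, and reordering a sequence changes it even when a bag-of-words sum cannot tell the difference:

```python
import numpy as np
from itertools import combinations

def phi_m(feats, m):
    """Naive O(L^m) evaluation of the degree-m component Phi_m(x):
    sum of outer products over all index tuples i_1 < ... < i_m."""
    d = feats.shape[1]
    out = np.zeros((d,) * m)
    for idx in combinations(range(len(feats)), m):
        term = feats[idx[0]]
        for i in idx[1:]:
            term = np.tensordot(term, feats[i], axes=0)  # outer product
        out += term
    return out

def one_hot(seq, alphabet="ab"):
    """Static feature map phi: one-hot encoding of each symbol."""
    return np.eye(len(alphabet))[[alphabet.index(s) for s in seq]]

# With one-hot features, entry [i, j] of Phi_2 counts the ordered,
# not necessarily contiguous, 2-gram alphabet[i] -> alphabet[j].
print(phi_m(one_hot("abab"), 2))     # [[1. 3.] [1. 1.]] -- "ab" occurs 3 times

# Order sensitivity: "aab" and "baa" have identical bags of symbols,
# but their degree-2 components differ.
print(np.allclose(one_hot("aab").sum(0), one_hot("baa").sum(0)))        # True
print(np.allclose(phi_m(one_hot("aab"), 2), phi_m(one_hot("baa"), 2)))  # False
```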

2. Low-Rank Tensor Projections and Scalability

Direct computation with full tensors $V^{\otimes m}$ is computationally intractable due to the exponential growth in the number of coefficients with $m$ and the input dimension $d$. Seq2Tens circumvents this by restricting evaluations to linear functionals on $T(V)$ that are of low CP rank, such as rank-1 tensors.

A rank-1 linear functional $\ell$ for degree $m$ can be represented as $\ell_m = v^m_1 \otimes \cdots \otimes v^m_m$ for $v^m_k \in V$. Then, the action of $\ell$ on the feature map is

$$\langle \ell, \Phi(x) \rangle = \sum_{m=0}^{M} \sum_{1 \le i_1 < \cdots < i_m \le L} \prod_{k=1}^{m} \langle v^m_k, \phi(x_{i_k}) \rangle$$

This sum can be evaluated by a prefix recursion over sequence positions and degrees, avoiding explicit tensor construction, with complexity $O(M^2 L d)$, where $M$ is the truncation degree and $d$ the input dimension.
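
A minimal NumPy sketch of this evaluation for a single rank-1 functional (illustrative names, not the reference implementation): the inner products $\langle v^m_k, \phi(x_i) \rangle$ are formed once, a prefix recursion over positions accumulates the sum over increasing index tuples, and a naive enumeration is included as a correctness check.

```python
import numpy as np
from itertools import combinations

def rank1_eval(phi_x, vs):
    """<l, Phi(x)> for one rank-1 functional l_m = v_1 ⊗ ... ⊗ v_m of degree m,
    computed without materialising any tensors.
    phi_x: (L, d), rows are static features phi(x_i); vs: (m, d), rows are v_1..v_m."""
    m = vs.shape[0]
    a = vs @ phi_x.T                  # a[k, i] = <v_{k+1}, phi(x_{i+1})>
    S = np.zeros(m + 1)               # S[k] = sum over i_1 < ... < i_k within the prefix
    S[0] = 1.0
    for i in range(a.shape[1]):       # one pass over the sequence: O(L * m) updates
        for k in range(m, 0, -1):     # deepest degree first keeps the ordering strict
            S[k] += a[k - 1, i] * S[k - 1]
    return S[m]

def rank1_naive(phi_x, vs):
    """Direct evaluation of the defining double sum (for checking only)."""
    m = vs.shape[0]
    return sum(np.prod([vs[k] @ phi_x[i] for k, i in enumerate(idx)])
               for idx in combinations(range(len(phi_x)), m))

rng = np.random.default_rng(0)
phi_x, vs = rng.normal(size=(8, 3)), rng.normal(size=(3, 3))
assert np.isclose(rank1_eval(phi_x, vs), rank1_naive(phi_x, vs))
```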

The framework extends to low-rank sums via

$$\ell_m = \sum_{j=1}^{R} v^m_{j,1} \otimes \cdots \otimes v^m_{j,m}$$

which enables a trade-off between expressive power and computational efficiency. Stacking $N$ such functionals produces a low-rank Seq2Tens map (LS2T):

$$\tilde{\Phi}_{\tilde{\theta}}(x_1, \ldots, x_L) = \big( \langle \ell^j, \Phi(x_1, \ldots, x_L) \rangle \big)_{j=1}^{N}$$

To restore expressivity lost by enforcing low rank, one stacks or composes several such LS2T layers, similar to layering in neural architectures.
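
Below is a hedged sketch of the resulting LS2T layer, under the simplifying assumptions that all $N$ functionals are rank-1 and share a common degree $M$, and with features returned for every prefix so that layers can be composed; the parameter layout `V` and all names are illustrative, not the authors' API.

```python
import numpy as np

def ls2t_seq(phi_x, V):
    """Sequence-to-sequence LS2T layer (illustrative sketch).
    phi_x: (L, d) static features; V: (N, M, d), where V[j, k] is the vector
    v^j_{k+1} of the j-th rank-1 functional. Row i of the output holds the
    N values <l^j, Phi(x_1, ..., x_{i+1})> of the prefix ending at position i."""
    L, d = phi_x.shape
    N, M, _ = V.shape
    a = np.einsum("nkd,ld->nkl", V, phi_x)   # a[j, k, i] = <v^j_{k+1}, phi(x_{i+1})>
    S = np.zeros((N, M + 1))
    S[:, 0] = 1.0
    out = np.empty((L, N))
    for i in range(L):                        # same prefix recursion as above,
        for k in range(M, 0, -1):             # vectorised over the N functionals
            S[:, k] += a[:, k - 1, i] * S[:, k - 1]
        out[i] = S[:, M]
    return out

# Stacking two LS2T layers: the second layer treats the first layer's outputs
# as its static features, which is how depth compensates for the rank restriction.
rng = np.random.default_rng(0)
x  = rng.normal(size=(50, 5))                           # length-50, 5-dim sequence
h1 = ls2t_seq(x,  0.1 * rng.normal(size=(32, 2, 5)))    # -> (50, 32)
h2 = ls2t_seq(h1, 0.1 * rng.normal(size=(16, 2, 32)))   # -> (50, 16)
print(h1.shape, h2.shape)
```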

3. Integration with Neural, Probabilistic, and Structured Models

Seq2Tens is modularly integrated into neural network architectures as a sequence-to-sequence transformation layer. It is compatible with:

  • Feedforward and convolutional layers: Used as front-end static feature maps $\phi$.
  • Neural stacks: LS2T modules can be chained, akin to deep stacking in RNNs or CNNs, supporting both unidirectional and bidirectional processing.
  • Hybrid architectures: In combination with standard network blocks, including RNNs, CNNs, and transformers, LS2T layers add the capacity for capturing global, high-order, and non-local dependencies.
  • Probabilistic models: In Gaussian process and kernel regression models (notably with signature kernels), the Seq2Tens mapping allows for scalable approximation and parameterization of kernel functions, with the weight space parameterized by low-rank tensors.
  • Structured data and graphs: The expected signature features of random walks on graphs use the same algebraic and low-rank evaluation techniques, supporting tasks requiring long-range and global reasoning.

4. Theoretical Properties: Universality and Invariance

Seq2Tens inherits theoretical strengths from the algebraic structure:

  • Universality: If the static feature map $\phi$ is universal (i.e., sufficiently rich to approximate any continuous function on $X$), then the sequence map $\Phi$ is universal for the space of sequences; any continuous function on sequences can be approximated as $f(x) \approx \langle \ell, \Phi(x) \rangle$ with suitable $\ell \in T(V)^*$.
  • Order sensitivity: The non-commutative algebra ensures that sequence order is preserved and exploited, supporting tasks where even basic semantics depend on ordering.
  • Compatibility with invariance: Seq2Tens can be adapted to be invariant or sensitive to specific transformations (e.g., time reparameterization) by modifying the static map or inputs, matching key properties of signature features from rough path theory.
  • Depth vs. width: Layered LS2T architectures can approximate high-rank functionals; depth compensates for the rank restriction of individual layers, analogous to depth-width trade-offs observed in deep learning.

5. Computational Complexity and Implementation

Efficient evaluation algorithms for Seq2Tens (see App. D, Algorithm D.1/D.2) exploit the recursive structure of low-rank projections to achieve:

  • Time complexity: Linear in sequence length $L$, quadratic or linear in truncation degree $M$, and linear in output dimension $N$.
  • Space complexity: Linear in $N$ and $M$, avoiding exponential tensor storage costs.
  • Scalability: Amenable to batched, vectorized implementations on GPUs or parallel hardware; empirical timings demonstrate orders-of-magnitude improvements over naive tensor approaches for similar expressiveness.

A plausible implication is that these scalability properties make Seq2Tens suitable both for research prototypes and for production-scale deployment on large and heterogeneous sequential datasets, without a dramatic loss in expressivity.

6. Empirical Results and Applications

Seq2Tens has demonstrated strong empirical performance on multiple benchmarks:

  • Multivariate time series classification: On the UCR/UEA archive, LS2T layers improved or matched state-of-the-art baselines, often reducing parameter counts.
  • Healthcare (mortality prediction): On Physionet2012 ICU data, FCN+LS2T architectures outperformed strong RNN and Transformer models in both accuracy and area under the precision-recall curve.
  • Generative modeling: As a component within variational autoencoders (e.g., GP-VAE), LS2T improved negative log-likelihood, mean squared error, and AUROC on video and medical imputation tasks.
  • Parameter/sample efficiency: Models with LS2T layers often achieved similar or better accuracy than larger baselines, suggesting improved sample efficiency.
  • Graph-structured learning: The same low-rank projection techniques have been applied for high-order path-based encoding in graph models, beneficial for tasks requiring global aggregation.

7. Position within Signature-Based and Tensor-Based Learning

Seq2Tens generalizes and unifies several prior approaches:

  • String kernels and nn-gram models: By capturing subsequence features of all orders in a universal algebraic structure.
  • Path signatures from rough path theory: By adopting the tensor algebra and extending signature machinery with low-rank, scalable projections to handle real-world complexities.
  • Kernel learning and Gaussian processes: By providing scalable feature maps for kernel methods over sequences and structured domains.
  • Deep learning: By enabling the insertion of mathematically principled, parameter-efficient, and order-sensitive sequence representations into modern deep architectures.
  • Graph learning: By providing a framework for efficient and expressive path-based aggregation in node and graph representations.

Seq2Tens thus provides a modular, mathematically rigorous, and computationally scalable bridge between algebraic functional analysis, probabilistic modeling, and state-of-the-art deep learning pipelines, alleviating prior bottlenecks in expressive yet practical sequence modeling.