Linear-time Online Transformation (LOT)

Updated 23 June 2026

Linear-time Online Transformation (LOT) is a framework that ensures incremental, single-pass updates of data structures with overall linear time complexity.
LOT methodologies are applied in domains like time series, convex optimization, attention in neural networks, string processing, and online suffix trie construction.
Empirical benchmarks demonstrate LOT's advantages in speed, memory efficiency, and exact output over traditional offline techniques.

A Linear-time Online Transformation (LOT) is a class of algorithmic techniques or frameworks that enable incremental, single-pass, linear or near-linear-time computation and data structure updates as data arrives, typically in a streaming or online fashion, without requiring full random access to past data or global recomputation. LOT methodologies have been proposed or adopted in areas as diverse as time series analysis (e.g., online visibility graph construction), online convex optimization (e.g., online-to-batch last-iterate conversion), attention mechanisms in neural sequence models, string processing (e.g., online recognizability of languages under palindromic concatenation), and online suffix trie construction. In each case, the defining property is that, after any incremental update (arrival of a symbol, time step, or query), the relevant data structure or output can be computed by examining only a small, bounded subset of the input, with per-step cost scaling linearly (or near-linearly) in the local problem size and total cost linear in the total input length.

1. Formal Definitions and Common Criteria

The unifying motif of LOT frameworks is the requirement to achieve online, incremental updates to a data structure, index, or state such that, after $n$ input steps (where the input may be a time series, string, sequence of online decisions, etc.), the total computational cost is $O(n)$ (or $O(n \log \sigma)$ if the input is over a size-$\sigma$ alphabet), and each step incurs only $O(1)$ or $O(\mathrm{poly}(\log n))$ amortized work beyond the baseline per-symbol or per-decision cost.

Two typical desiderata for LOT algorithms (exemplified in online visibility graph construction) are:

In-place node removal: At each window advance, eliminate the oldest element and its associated state/edges in $O(N)$ time, where $N$ is the window size.
Incremental update: Determine all new edges/relationships for the incoming data point in $O(N)$ (or constant) time, without rebuilding from scratch.
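
The two desiderata above can be sketched for the horizontal visibility criterion as follows. This is a minimal illustration, not the implementation of Huang et al. (2023): the class name `SlidingHVG`, its fields, and the use of integer sample indices as node labels are assumptions made for the example.

```python
from collections import deque


class SlidingHVG:
    """Horizontal visibility graph over the last `window` samples (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()  # (index, value) pairs, oldest first
        self.adj = {}           # node index -> set of neighbour indices
        self.next_index = 0

    def push(self, value):
        # In-place node removal: drop the oldest sample and its edges.
        # Edges among the remaining nodes are unaffected, because visibility
        # between two nodes depends only on the samples between them.
        if len(self.samples) == self.window:
            old, _ = self.samples.popleft()
            for nb in self.adj.pop(old):
                self.adj[nb].discard(old)

        # Incremental update: scan backwards from the newest stored sample,
        # tracking the running maximum of the samples already passed.
        new = self.next_index
        self.next_index += 1
        self.adj[new] = set()
        highest = float("-inf")
        for idx, val in reversed(self.samples):
            if highest < min(val, value):
                self.adj[new].add(idx)
                self.adj[idx].add(new)
            highest = max(highest, val)
            if highest >= value:
                break  # nothing further back can see the new sample
        self.samples.append((new, value))
        return new
```

Each `push` touches at most `window` stored samples plus the edges of the evicted node, so the per-step cost stays $O(N)$ and no edge is ever recomputed from scratch.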

This principle appears, for instance, in the streaming windowed visibility graph problem (Huang et al., 2023), online-to-batch convex optimization (Cutkosky, 2019), online monotonic attention (Raffel et al., 2017), palindromic language recognition (Kosolobov et al., 2014), and online suffix trie construction (Hendrian et al., 2023).

2. Algorithmic Frameworks and Implementation Patterns

Specific LOT frameworks instantiate the general template in domain-adapted ways:

Online Visibility Graphs (NVG/HVG; Huang et al., 2023): Maintain an adjacency map for a moving window; on each step, remove the oldest node and scan backwards to determine visibility edges for the new node using running minima/maxima (i.e., minimum slope for NVG, maximum height for HVG), in $O(N)$ per-step time.
Anytime Online-to-Batch Averaging (Cutkosky, 2019): Maintain weighted averages of online algorithm iterates for stochastic optimization, permitting last-iterate guarantees and smoothness adaptivity, with $O(d)$ per-iteration complexity and strictly streamed memory costs.
Online Monotonic Attention (Raffel et al., 2017): Enforce a monotonic left-to-right alignment in sequence-to-sequence models so that attention only advances forward through the encoder states and never revisits earlier ones, enabling test-time cost linear in the sequence lengths (vs. quadratic cost for standard soft attention).
Linear Recognizability of $\mathrm{Pal}^k$ (Kosolobov et al., 2014): Maintain bit-vectors and palindromic iterators to decide, after reading each symbol, whether the current prefix lies in $\mathrm{Pal}^k$, with all updates and checks amortized $O(k)$ per symbol (constant for fixed $k$) via bit-manipulation and auxiliary data structures.
Online Linear-size Suffix Trie Construction (Hendrian et al., 2023): Maintain active points, suffix links, and incremental type-1/type-2 node creation when scanning the text stream strictly once in either direction, achieving $O(n \log \sigma)$ time without retaining past input.
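
As an illustration of the anytime online-to-batch averaging item above, the following sketch keeps a weighted running average of the iterates of an online learner and queries the gradient at that average. It is a simplified reading of the idea, not the algorithm of Cutkosky (2019) as published: the base learner (plain online gradient descent), the function name `anytime_online_to_batch`, and the parameters `lr` and `weight` are assumptions chosen for the example.

```python
def anytime_online_to_batch(grad, w0, steps, lr=0.1, weight=lambda t: 1.0):
    """Return the weighted average iterate after `steps` rounds (hypothetical helper).

    grad   -- callable returning the gradient (a list of floats) at a point
    w0     -- starting point, a list of d floats
    weight -- callable giving the averaging weight a_t for round t (assumed > 0)
    """
    w = list(w0)   # iterate of the online learner
    x = list(w0)   # weighted running average of the iterates
    total = 0.0    # sum of the weights seen so far
    for t in range(1, steps + 1):
        a = weight(t)
        total += a
        # Streaming update of the average: O(d) time, no stored history.
        x = [xi + (a / total) * (wi - xi) for xi, wi in zip(x, w)]
        # The gradient is evaluated at the average, not at the raw iterate.
        g = grad(x)
        # Online gradient descent step on the weighted linear loss.
        w = [wi - lr * a * gi for wi, gi in zip(w, g)]
    return x
```

For example, with `grad = lambda x: [x[0] - 3.0, x[1] + 1.0]` (the gradient of a simple quadratic with minimizer `[3.0, -1.0]`), the returned average approaches the minimizer as `steps` grows, while memory use stays at two length-$d$ vectors.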

3. Mathematical Correctness and Complexity Guarantees

LOT algorithms are designed so that each intermediate state, after any incremental input, exactly matches the output or data structure that would be built from scratch on that input prefix or window. Proof strategies typically rely on the following:

Data structure invariants: For example, adjacency lists reflect true visibility relationships in the current moving window (Huang et al., 2023); incrementally maintained averages accurately aggregate all past iterates with specified weighting (Cutkosky, 2019); or all palindromic suffixes and result indicators are up to date for recognition (Kosolobov et al., 2014).
Bounding per-update cost: Using sliding window, monotonic scan, amortized analysis on palindromic iterators, or $O(1)$ update cost in union-find or nearest-marked-ancestor structures (Hendrian et al., 2023).
Global total work: For moving windows or streams of length $n$, total computation is $O(nN)$ or $O(n \log \sigma)$ (input length times the window size, or times a logarithmic alphabet factor) and sometimes exactly $O(n)$, depending on the data model.

These strategies ensure that the term "linear-time" is justified in the context of either the entire data stream or, for moving windows, in the product of window size and overall stream length, providing strong scalability guarantees.
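
The accounting behind this claim can be written out in one line. The symbols $c_t$ (cost of step $t$) and $C$ (a constant independent of $n$ and $N$) are notation introduced here for illustration, not taken from the cited papers.

```latex
\[
  \sum_{t=1}^{n} c_t \;\le\; \sum_{t=1}^{n} C\,N \;=\; C\,nN \;=\; O(nN)
  \qquad \text{(each window advance costs at most } C N\text{)},
\]
\[
  \sum_{t=1}^{n} c_t \;\le\; C\,n \;=\; O(n)
  \qquad \text{(each step costs amortized } O(1)\text{)}.
\]
```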

4. Canonical Application Domains

Time Series and Visibility Graphs

LOT enables real-time conversion of streaming time series into visibility graphs under natural or horizontal criteria. Each window advance involves only local removal/insertion and one traversal per node, ensuring $O(N)$ computation per window advance and outperforming prior baseline methods on both synthetic and real data by more than an order of magnitude for typical window sizes (Huang et al., 2023).
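
For the natural visibility criterion, the backward scan with a running minimum slope can be sketched as below. This is an illustrative reading of the update step, assuming unit spacing between samples; the function name `nvg_new_edges` and its list-based interface are hypothetical.

```python
def nvg_new_edges(window, value):
    """Positions in `window` that are naturally visible from a new sample.

    window -- past values, oldest first, assumed equally spaced in time
    value  -- the new sample, arriving one time step after window[-1]
    """
    visible = []
    min_slope = float("inf")
    n = len(window)
    for pos in range(n - 1, -1, -1):
        # Slope of the line from the stored sample up to the new sample.
        slope = (value - window[pos]) / (n - pos)
        # The sample is visible only if every sample between it and the new
        # one has a strictly larger slope, i.e. lies strictly below the line.
        if slope < min_slope:
            visible.append(pos)
            min_slope = slope
    return visible
```

A single pass over the window suffices, so the cost per incoming sample is linear in the window size. For instance, `nvg_new_edges([5, 1, 2], 3)` reports positions 2 and 0: the sample at position 1 is hidden because the sample at position 2 lies exactly on its line of sight.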

Online Convex Optimization

LOT reductions for online-to-batch conversion provide anytime last-iterate excess loss guarantees, smooth/optimistic accelerations, and adaptivity to unknown problem parameters, all with $O(d)$ per-iteration cost and no global gradient storage. These advances enable strong stochastic guarantees for real-world optimization under streaming data (Cutkosky, 2019).

Sequence-to-Sequence Attention

LOT-style monotonic attention permits real-time, strictly streaming decoding in neural models, with only a small reported loss in predictive accuracy relative to standard softmax attention (measured in BLEU or absolute error), and with order-of-magnitude improvements in online latency and memory (Raffel et al., 2017).
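
A hard, test-time version of this decoding loop can be sketched as follows. It is a simplified illustration rather than the model of Raffel et al. (2017): the `energy` callable stands in for a learned scoring function, and the names `monotonic_decode` and `threshold`, as well as returning `None` when the input is exhausted, are assumptions made for the example.

```python
import math


def monotonic_decode(energy, num_inputs, num_outputs, threshold=0.5):
    """Greedy monotonic alignment (hypothetical helper).

    energy(i, j) -- score for output step i against input position j
    Returns (alignments, evaluations): the attended input position for each
    output step (None if the input ran out) and the number of energy calls.
    """
    alignments = []
    evaluations = 0
    j = 0  # current input position; it never moves backwards
    for i in range(num_outputs):
        chosen = None
        while j < num_inputs:
            evaluations += 1
            p = 1.0 / (1.0 + math.exp(-energy(i, j)))  # selection probability
            if p >= threshold:
                chosen = j  # attend here; the next step resumes from j
                break
            j += 1          # skip this position for good
        alignments.append(chosen)
    return alignments, evaluations
```

Every failed check advances `j` and every successful check ends an output step, so the number of energy evaluations is at most `num_inputs + num_outputs`, in contrast to the `num_inputs * num_outputs` scores that soft attention computes.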

Online String Processing and Pattern Recognition

LOT frameworks enable linear-time recognition for concatenations with palindromic languages (Kosolobov et al., 2014), as well as online construction of full-text indices (LSTs) from read-only or streaming data, matching the efficiency of offline suffix trie/tree constructions while never storing the full input string (Hendrian et al., 2023).
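
To make the recognition task concrete, the sketch below computes by brute force the answers an online recognizer must emit after each symbol. It is a reference baseline only and is far from linear time; it is not the algorithm of Kosolobov et al. (2014). It assumes that membership means "a concatenation of exactly $k$ non-empty palindromes", and the function name `pal_k_prefixes` is hypothetical.

```python
def pal_k_prefixes(text, k):
    """For each prefix of `text`, report whether it splits into exactly k
    non-empty palindromes (brute-force reference, not linear time)."""
    # reach[j] holds the prefix lengths that split into exactly j palindromes.
    reach = [set() for _ in range(k + 1)]
    reach[0].add(0)
    answers = []
    for end in range(1, len(text) + 1):
        for j in range(1, k + 1):
            for start in reach[j - 1]:
                piece = text[start:end]
                # The last piece must be non-empty and read the same reversed.
                if start < end and piece == piece[::-1]:
                    reach[j].add(end)
                    break
        answers.append(end in reach[k])
    return answers
```

For the input `"abaab"` with `k = 2`, the prefixes `"ab"`, `"abaa"` and `"abaab"` are accepted (split as `a|b`, `aba|a` and `a|baab`), while `"a"` and `"aba"` are rejected. A reference of this kind is mainly useful for testing a linear-time implementation against small inputs.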

5. Empirical Evaluation and Benchmarks

Extensive experiments validate the linear-time online transformation frameworks:

Domain	LOT Variant	Per-step Time	Competing Baselines	Relative Speedup
Visibility Graph	LOT-NVG/HVG	0.14–1.0 ms (N=100–2000)	DC, SC, BST, LT	More than an order of magnitude faster (real-world windows)
Sequence Models	LOT Attention	Linear test time	Softmax, CTC	Comparable accuracy, order-of-magnitude lower online latency
Convex Opt.	LOT Online-to-Batch	$O(d)$ per-iter	Classic OTB	Anytime, accelerated, adaptive
Pal-recognition	LOT $\mathrm{Pal}^k$	Amortized $O(k)$ per-symbol (RAM)	Naive enumeration	Optimal (given model)
Online Suffix Trie	LOT-LST	$O(\log \sigma)$ per-symbol	Offline STree+LST conv.	No need to store full $T$

LOT variants consistently achieve state-of-the-art throughput, minimal memory overhead, and exact (i.e., non-approximate) outputs for their respective tasks.

6. Theoretical and Practical Significance

Adaptivity and Black-box Design: LOT methods, especially in online convex optimization, deliver optimal or near-optimal rates while never requiring knowledge of problem smoothness, variance, or other hyperparameters (Cutkosky, 2019).
Strict Online Constraints: Many LOT algorithms are constructed for models where the unseen portion of the data cannot be revisited or retained (e.g., streaming text, live sensor feeds, finite-memory devices), thus the methodology is critical for memory- or latency-constrained deployments (Hendrian et al., 2023).
Extensibility: LOT paradigms generalize to composed transformations (e.g., repeated application in palindromic composition (Kosolobov et al., 2014)), to multi-component data streams, and to more complex sequential models (limited forms of non-monotonic attention or rewinding (Raffel et al., 2017)).

A plausible implication is that as online, streaming, and incremental computation become even more central to large-scale data processing, the LOT paradigm will further underpin advances in algorithm design for latency-sensitive domains.

7. Limitations and Potential Extensions

Monotonicity and Structure Assumptions: Some LOT frameworks (e.g., monotonic attention) sacrifice flexibility for efficiency. Strict monotonicity may degrade performance on tasks requiring arbitrary reordering (e.g., machine translation between syntactically divergent languages) (Raffel et al., 2017).
Training Complexity: While test-time and update procedures are linear, some training variants (e.g., expectation-based differentiable monotonic attention) still involve quadratic computations (proportional to the product of input and output lengths) due to soft recurrences, suggesting a need for further research into strictly linear-time training for large sequences.
Generality: Not all data structures or models admit efficient LOT versions without further structural knowledge (e.g., special periodicity/combinatorics for palindromic recognition (Kosolobov et al., 2014)).
Implementation Overheads: In some settings, attaining the theoretical $O(1)$ update per input requires nontrivial bit-manipulation, amortized analysis, or model-specific optimizations.

The overall consensus across the literature is that the Linear-time Online Transformation abstraction unifies a significant body of modern algorithmic and learning-theoretic development, offering robust, scalable, and deployable alternatives to classical (batched, offline) computation in domains where online and streaming performance is a priority (Huang et al., 2023; Cutkosky, 2019; Raffel et al., 2017; Kosolobov et al., 2014; Hendrian et al., 2023).