
SeqTopK: Efficient Top-K Selection

Updated 16 March 2026
  • SeqTopK is a set of algorithmic techniques designed to efficiently select the top-K sequences or elements from large candidate sets, with applications in spatio-temporal pattern mining, database graphs, and neural network routing.
  • The methods combine depth-first traversal with dynamic thresholding, sampling-based estimation over database graphs, and differentiable relaxations of top-K selection.
  • Reported benefits include near-linear runtime scaling, robust load balancing in mixture-of-experts models, and large empirical speedups for differentially private selection.

SeqTopK refers collectively to a set of algorithms and operator families that enable efficient, often exact or near-exact, top-$K$ selection over sequences. The term is not tied to a single methodological context; it covers core algorithmic ideas in sequential pattern mining, differentiable top-$k$ selection, scalable document retrieval, adaptive routing in mixture-of-experts models, and differentially private selection. Across these applications, SeqTopK techniques share the goal of efficiently identifying the $K$ highest-ranking entities (sequences, patterns, tokens, or experts) from combinatorially large or dynamically generated candidate sets, often under strict constraints on efficiency, scalability, or privacy.

1. SeqTopK in Sequential Pattern Mining

In event-based spatio-temporal data, SeqTopK denotes an algorithm that identifies the $K$ most significant sequential event patterns, each a chain of event types with strong attraction relations in space and time. The SeqTopK framework formalizes the problem as follows: given a finite set of event types $F$, a multiset $D$ of timestamped, spatially localized event instances, and an embedding space $V \subset \mathbb{R}^d \times \mathbb{R}$, find the $K$ sequences of length at least $\text{min\_len}$ with the highest significance scores. Significance is quantified by a recursively defined function $\mathrm{SeqIndex}(s)$ for a sequence $s$, built on local spatio-temporal density ratios:

$$\mathrm{SeqIndex}(s[1] \rightarrow \dots \rightarrow s[k]) = \min\big(\mathrm{SeqIndex}(s[1] \rightarrow \dots \rightarrow s[k-1]),\ \mathrm{DensityRatio}(s[k-1] \rightarrow s[k])\big)$$

The SeqTopK algorithm uses depth-first traversal, dynamically maintains a threshold equal to the significance of the $K$-th best sequence found so far, and prunes subtrees whose significance cannot exceed this threshold. This removes the need for a user-specified significance cutoff and shrinks the search space aggressively; in typical settings more than 80% of the expansion tree is pruned. Empirical evaluation confirms near-linear runtime scaling in $K$ and practical suitability for datasets with a moderate number of event types and moderate pattern lengths (Maciąg, 2017).
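
The pruning rule is sound because SeqIndex is a running minimum: extending a chain can only keep or lower its score, so once a prefix cannot beat the current $K$-th best, no extension of it can either. The following is a minimal Python sketch under stated assumptions, not Maciąg's implementation: DensityRatio values are precomputed in a dictionary keyed by (predecessor, successor) event type, pairs absent from it score 0, and an event type appears at most once per chain. The names `top_k_sequences` and `density_ratio` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def top_k_sequences(types: Sequence[str], density_ratio: Dict[Pair, float],
                    k: int, min_len: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Hypothetical sketch: DFS for the k chains with the highest SeqIndex."""
    assert min_len >= 2, "a single event type has no DensityRatio to score"
    best: List[Tuple[float, Tuple[str, ...]]] = []  # min-heap holding the current top-k

    def threshold() -> float:
        # Score of the k-th best sequence so far; -inf until k results exist.
        return best[0][0] if len(best) == k else float("-inf")

    def dfs(seq: List[str], seq_index: float) -> None:
        if len(seq) >= min_len:
            if len(best) < k:
                heapq.heappush(best, (seq_index, tuple(seq)))
            elif seq_index > best[0][0]:
                heapq.heapreplace(best, (seq_index, tuple(seq)))
        for nxt in types:
            if nxt in seq:  # assumption: each event type appears at most once
                continue
            child = min(seq_index, density_ratio.get((seq[-1], nxt), 0.0))
            # SeqIndex never increases when a chain is extended, so a child that
            # cannot beat the threshold rules out its whole subtree.
            if child <= threshold():
                continue
            seq.append(nxt)
            dfs(seq, child)
            seq.pop()

    for t in types:
        dfs([t], float("inf"))  # a lone event type imposes no constraint yet
    return sorted(best, reverse=True)
```

On a toy dictionary with ratios A→B = 3.0, B→C = 2.0, A→C = 1.0 and smaller values for the reverse pairs, `top_k_sequences(["A", "B", "C"], ratios, k=3)` returns A→B (3.0) followed by B→C and A→B→C (both 2.0); the other branches are pruned as soon as their running minimum drops below the threshold.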

2. SeqTopK in Database Graph Sequential Pattern Mining

Extending the problem to database graphs, in which each vertex holds a transaction database, yields a #P-hard top-$K$ sequential pattern mining problem, because the number of induced transaction sequences explodes combinatorially over all paths. Exact enumeration is infeasible for any realistic pattern length $l$, which motivates a two-step sampling-based SeqTopK framework:

  1. Path Sampling: Randomly sample paths in the database graph using a progressive, length-$l$-aware distribution.
  2. Transaction-Sequence Sampling: For each path, sample a transaction sequence by uniform draws from the associated transaction databases, adjusting for bias with explicit correction factors.

An unbiased estimator computes support counts for patterns over this sample, and an in-memory sequential pattern miner such as PrefixSpan is then applied to produce the empirical top-$K$ patterns. Theoretical bounds guarantee that for a sample size $m = O(\varepsilon^{-2} \log(|\mathcal{P}|/\delta))$, the estimated top-$K$ set matches the true set with probability at least $1-\delta$, where $|\mathcal{P}|$ is the number of candidate patterns. This approach provides a rigorous quality–efficiency tradeoff (Lei et al., 2018).
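
The bias correction can be illustrated with a generic importance-sampling sketch. It is not Lei et al.'s sampling distribution; it assumes a path is drawn by a uniform random walk (uniform start vertex, then a uniform out-neighbor at each step), that paths are simple with `path_len` ($l$) vertices, and that a pattern is an ordered list of single items that must occur in successive transactions. Weighting each hit by the inverse probability of the sampled (path, transaction sequence) pair makes the count estimate unbiased. The graph and database formats and all function names are hypothetical; the graph must list every vertex as a key, including vertices with no out-neighbors.

```python
import random
from typing import Dict, FrozenSet, List, Sequence

Graph = Dict[int, List[int]]               # hypothetical: vertex -> out-neighbors
TxDB = Dict[int, List[FrozenSet[str]]]     # hypothetical: vertex -> its transactions


def contains(tx_seq: Sequence[FrozenSet[str]], pattern: Sequence[str]) -> bool:
    """Pattern items must appear in order, each in a later transaction (greedy match)."""
    i = 0
    for tx in tx_seq:
        if i < len(pattern) and pattern[i] in tx:
            i += 1
    return i == len(pattern)


def estimate_support(graph: Graph, db: TxDB, pattern: Sequence[str], path_len: int,
                     m: int, rng: random.Random) -> float:
    """Unbiased estimate of how many induced transaction sequences over simple
    path_len-vertex paths contain `pattern` (importance sampling, m draws)."""
    vertices = list(graph)
    total = 0.0
    for _ in range(m):
        # Step 1: random-walk path sample; q is the probability of this walk.
        path = [rng.choice(vertices)]
        q = 1.0 / len(vertices)
        while len(path) < path_len and graph[path[-1]]:
            nbrs = graph[path[-1]]
            path.append(rng.choice(nbrs))
            q /= len(nbrs)
        if len(path) < path_len or len(set(path)) < path_len:
            continue  # dead end or repeated vertex: this draw contributes 0
        # Step 2: one transaction per vertex, drawn uniformly.
        tx_seq = [rng.choice(db[v]) for v in path]
        n_choices = 1
        for v in path:
            n_choices *= len(db[v])
        if contains(tx_seq, pattern):
            # Bias correction: weight by 1 / P(path, tx_seq) = n_choices / q.
            total += n_choices / q
    return total / m
```

Ranking candidate patterns by this estimate and keeping the $K$ largest gives an empirical top-$K$; in the framework described above, a miner such as PrefixSpan enumerates the candidates from the sample rather than scoring a fixed list.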

3. SeqTopK in Mixture-of-Experts Routing

In Mixture-of-Experts (MoE) architectures, standard routing assigns a fixed number $K$ of experts to each token independently, ignoring how token difficulty varies within a sequence. Sequence-level TopK (SeqTopK) routing instead shares the budget across the sequence: for a sequence of length $T$ and $N$ experts, the router selects the top $T \cdot K$ entries among all $T \times N$ gating scores, so difficult (high-entropy) tokens can receive more experts and easy tokens fewer, subject to per-token lower and upper bounds. The change amounts to flattening the scores, taking a top-$T \cdot K$, and masking; it requires only a minor code adjustment and adds less than 1% overhead. The snippet below shows the unconstrained selection, without the per-token bounds:

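```python
import torch

scores = torch.randn(2, 16, 8).softmax(dim=-1)  # example (B, T, N) gating scores
K = 2                                           # average experts per token
B, T, N = scores.shape
flat_scores = scores.reshape(B, T * N)          # pool every token's scores per sequence
_, idx = flat_scores.topk(T * K, dim=-1)        # sequence-level budget: T*K activations
seq_mask = torch.zeros_like(flat_scores)
seq_mask.scatter_(1, idx, 1.0)                  # mark selected (token, expert) slots
mask = seq_mask.view(B, T, N)
routed = mask * scores                          # zero out unselected gates
```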

Empirical results show consistent improvement over standard token-wise TopK, with the margin growing at higher sparsity (smaller $K$). Notably, SeqTopK exhibits robust load balancing and allocates expert capacity on its own according to token-level uncertainty (Wen et al., 9 Nov 2025).
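
The per-token lower and upper bounds are not specified in detail here. One hypothetical way to enforce them, sketched in plain Python for a single sequence: give every token its `k_min` best experts, then spend the rest of the $T \cdot K$ budget on the highest remaining scores across the whole sequence, skipping tokens that already hold `k_max` experts. The function name and the greedy rule are assumptions for illustration, not the implementation of Wen et al.

```python
from typing import List, Optional, Set, Tuple


def seq_topk_bounded(scores: List[List[float]], k: int, k_min: int = 1,
                     k_max: Optional[int] = None) -> Set[Tuple[int, int]]:
    """Hypothetical sketch: pick T*k (token, expert) pairs for one sequence,
    giving each token between k_min and k_max experts."""
    T, N = len(scores), len(scores[0])
    k_max = N if k_max is None else k_max
    budget = T * k
    assert 0 <= k_min <= k <= k_max <= N, "bounds must admit a feasible allocation"
    # Each token's experts ranked by descending gate score.
    ranked = [sorted(range(N), key=lambda e, row=row: -row[e]) for row in scores]
    chosen: Set[Tuple[int, int]] = set()
    counts = [0] * T
    # 1) Guarantee every token its k_min best experts.
    for t in range(T):
        for e in ranked[t][:k_min]:
            chosen.add((t, e))
            counts[t] += 1
    # 2) Fill the remaining budget from the sequence-wide pool, highest score first.
    pool = sorted(((scores[t][e], t, e) for t in range(T) for e in ranked[t][k_min:]),
                  reverse=True)
    for _, t, e in pool:
        if len(chosen) == budget:
            break
        if counts[t] < k_max:
            chosen.add((t, e))
            counts[t] += 1
    return chosen
```

For scores `[[0.9, 0.06, 0.04], [0.4, 0.35, 0.25]]` with `k=2`, `k_min=1`, `k_max=3`, the sketch reproduces the unconstrained top-4 over the flattened scores: the flat (higher-entropy) second token gets three experts and the peaked first token one. Lowering `k_max` to 2 moves one slot back to the first token.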

4. Differentiable SeqTopK Operators

In neural network architectures that need a differentiable relaxation of the top-$K$ operation (e.g., memory retrieval, hard attention), the Successive Halving SeqTopK operator computes an efficient, continuous relaxation through tournament selection. Each round pairs candidates, applies a two-element softmax with a boosting factor $C$, and merges each pair, halving the candidate set until $K$ remain. For a pair with scores $v_i, v_j$ and representations $E_i, E_j$:

$$w_i = \frac{e^{C v_i}}{e^{C v_i} + e^{C v_j}}, \quad w_j = 1 - w_i, \quad E' = w_i E_i + w_j E_j$$

This reduces the computational complexity from $\mathcal{O}(Kn)$ (as in global SoftTopK) to $\mathcal{O}(n \log(n/K))$. Because the tournament is only $\log_2(n/K)$ rounds deep, both the forward and backward passes have shallow dependence chains, which improves training dynamics.

Empirical evaluation demonstrates 2–5× speedups and a better cosine-similarity approximation than the global SoftTopK, especially at larger $n$ and $K$ (Pietruszka et al., 2020).
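
The tournament can be sketched in plain Python for the forward pass only. The sketch assumes $n = K \cdot 2^r$ candidates, that each round ranks candidates by score and pairs the $i$-th best with the $(i + n/2)$-th best, and that a merged candidate carries the weighted score $w_i v_i + w_j v_j$ into the next round. These pairing and score-update rules are illustrative assumptions, and an autodiff framework would supply the gradients.

```python
import math
from typing import List, Tuple

Vector = List[float]


def pair_merge(vi: float, vj: float, ei: Vector, ej: Vector,
               c: float) -> Tuple[float, Vector]:
    """Two-element softmax with boosting factor c, then a convex mix of the pair."""
    d = c * (vi - vj)
    # Numerically stable evaluation of e^{c v_i} / (e^{c v_i} + e^{c v_j}).
    wi = 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
    wj = 1.0 - wi
    merged = [wi * a + wj * b for a, b in zip(ei, ej)]
    return wi * vi + wj * vj, merged


def successive_halving_topk(scores: List[float], embs: List[Vector], k: int,
                            c: float = 10.0) -> Tuple[List[float], List[Vector]]:
    """Hypothetical forward-pass sketch of the successive halving relaxation."""
    n = len(scores)
    r = n // k
    assert n >= k and n % k == 0 and (r & (r - 1)) == 0, "assumes n = k * 2^rounds"
    v, e = list(scores), [list(x) for x in embs]
    while len(v) > k:
        # Assumed pairing rule: rank by score, pair i-th best with (i + half)-th best.
        order = sorted(range(len(v)), key=lambda i: -v[i])
        half = len(v) // 2
        pairs = [pair_merge(v[order[i]], v[order[i + half]],
                            e[order[i]], e[order[i + half]], c) for i in range(half)]
        v = [p[0] for p in pairs]
        e = [p[1] for p in pairs]
    return v, e
```

With a large $C$, each pair's output approaches its stronger member, so the sketch recovers the hard top-$K$; with $C = 0$ each pair is simply averaged. Intermediate values of $C$ trade sharpness for smoother gradients.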

5. SeqTopK in Efficient Top-K Inference and Retrieval

In large-scale multi-target learning (e.g., collaborative filtering, multi-label classification), SeqTopK refers to exact top-$K$ inference algorithms that apply the threshold algorithm to separable linear relational models. The key idea is to keep $R$ pre-sorted lists, one per embedding dimension, and explore them sequentially: at each depth $d$, the score of any unseen item is upper-bounded, and once the $K$-th best score observed so far (the lowest score in the current top-$K$) reaches that bound, the algorithm terminates with a correctness guarantee. This approach is instance-optimal among deterministic algorithms that make no wild guesses. A partial-scoring extension additionally abandons a dot product early once it can no longer exceed the current lower bound. In practice, this saves up to 100× over exhaustive scoring (Stock et al., 2016).
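
A minimal sketch of the classic threshold algorithm for a dot-product scoring model, assuming items are stored as plain $R$-dimensional vectors and scored against a query vector. It is not Stock et al.'s code, and the partial-scoring extension is omitted. The index is built once and is independent of the query; for a negative query weight, the corresponding list is simply read from the bottom. Function names are hypothetical.

```python
import heapq
from typing import List, Tuple


def build_index(items: List[List[float]]) -> List[List[int]]:
    """One list per dimension r: item ids sorted by x[r], descending."""
    dims = len(items[0])
    return [sorted(range(len(items)), key=lambda i, r=r: items[i][r], reverse=True)
            for r in range(dims)]


def score(query: List[float], x: List[float]) -> float:
    return sum(q * xi for q, xi in zip(query, x))


def threshold_topk(items: List[List[float]], index: List[List[int]],
                   query: List[float], k: int) -> List[Tuple[float, int]]:
    """Hypothetical sketch: exact top-k (score, item id) by early termination."""
    n = len(items)

    def at(r: int, d: int) -> int:
        # Read list r top-down if q_r >= 0, bottom-up otherwise, so that
        # q_r * x[r] never increases with depth d.
        return index[r][d] if query[r] >= 0 else index[r][n - 1 - d]

    seen = set()
    top: List[Tuple[float, int]] = []  # min-heap: lowest score of current top-k at top[0]
    for d in range(n):
        for r in range(len(query)):
            i = at(r, d)
            if i not in seen:
                seen.add(i)
                s = score(query, items[i])  # random access: exact full score
                if len(top) < k:
                    heapq.heappush(top, (s, i))
                elif s > top[0][0]:
                    heapq.heapreplace(top, (s, i))
        # No unseen item can score above this bound.
        bound = sum(query[r] * items[at(r, d)][r] for r in range(len(query)))
        if len(top) == k and top[0][0] >= bound:
            break
    return sorted(top, reverse=True)
```

For item vectors `[1.0, 0.0]`, `[0.0, 0.9]`, `[0.6, 0.6]`, `[0.2, 0.1]` and query `[1.0, 1.0]` with `k=2`, the sketch returns items 2 and 0, the same answer as exhaustive scoring; on large catalogs the early stop is what avoids scoring most items.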

In document retrieval, a related line of work pursues the same goal for top-$K$ term-frequency queries over a collection of $D$ documents concatenated into a text of length $n$: return the $K$ documents in which a pattern $p$ occurs most often. Using compressed suffix arrays (CSA), document arrays, and succinct range-maximum query structures, the index answers queries in sublinear time while keeping nearly optimal space: $\mathcal{O}(|CSA| + n\log D)$ bits and $\mathcal{O}(t_s(p) + K\log\log n)$ query time, where $t_s(p)$ is the cost of searching for $p$ in the CSA (Hon et al., 2011).
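
For orientation, the same top-$K$ term-frequency query can be answered naively with a plain, uncompressed suffix array and document array: find the suffix-array range of suffixes prefixed by $p$, count how often each document identifier occurs in that range, and keep the $K$ largest counts. This baseline costs quadratic construction and time proportional to the number of occurrences, which is exactly what the compressed and range-maximum structures of Hon et al. avoid; it is not their index. The NUL separator and the function names are assumptions.

```python
import heapq
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple

SEP = "\x00"  # assumed not to occur inside documents or patterns


def build_index(docs: List[str]) -> Tuple[List[str], List[int]]:
    """Naive baseline: sorted suffixes plus the document array."""
    text = SEP.join(docs) + SEP
    doc_of, d = [], 0
    for ch in text:
        doc_of.append(d)          # a separator belongs to the document it ends
        if ch == SEP:
            d += 1
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    suffixes = [text[i:] for i in sa]
    da = [doc_of[i] for i in sa]  # document array: DA[j] = document of suffix SA[j]
    return suffixes, da


def topk_documents(index: Tuple[List[str], List[int]], pattern: str,
                   k: int) -> List[Tuple[int, int]]:
    """Return up to k (document id, term frequency) pairs, most frequent first."""
    suffixes, da = index
    lo = bisect_left(suffixes, pattern)   # first suffix >= pattern
    hi = lo
    while hi < len(suffixes) and suffixes[hi].startswith(pattern):
        hi += 1                           # suffixes prefixed by pattern are contiguous
    tf = Counter(da[lo:hi])               # term frequency of pattern per document
    # Highest frequency first; ties broken by smaller document id.
    return heapq.nlargest(k, tf.items(), key=lambda kv: (kv[1], -kv[0]))
```

For the documents "banana", "band", and "cab", the query "an" with `k=2` returns document 0 (two occurrences) and then document 1 (one occurrence).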

6. Top-K High-Utility Sequential Pattern Mining

Top-$K$ methods are essential in mining high-utility sequential patterns (HUSPM), where a minimum utility threshold is difficult to specify in advance. The TKUS algorithm, an instance of SeqTopK for HUSPM, first runs a projection and threshold-raising phase that sets a dynamic minimum utility (minutil) equal to the $K$-th highest utility observed among 1-sequences, 2-sequences, and q-sequences. It then extends patterns recursively using tight upper bounds (PEU, RSU) and local projections. Aggressive subtree pruning (Sequence Utility Raising, Terminate Descendants Early, Eliminate Unpromising Items) ensures that only promising candidates are retained. Empirically, TKUS achieves 5–20× better performance and 10–30% lower memory use than prior methods on diverse benchmarks (Zhang et al., 2020).
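
The first threshold-raising step can be illustrated on its own: every 1-sequence is itself a candidate pattern, so the $K$-th highest 1-sequence utility is a safe starting minutil, because the true $K$-th best utility over all patterns can only be higher. The sketch assumes the standard HUSPM utility definition (quantity times external profit, with a pattern's utility in one q-sequence taken as its best occurrence) and a hypothetical dictionary-based data format; it does not reproduce TKUS's later raising strategies.

```python
import heapq
from collections import defaultdict
from typing import Dict, List

QSequence = List[Dict[str, int]]  # hypothetical format: ordered itemsets, item -> quantity


def one_sequence_utilities(db: List[QSequence],
                           profit: Dict[str, float]) -> Dict[str, float]:
    """Utility of each 1-sequence <item>, summed over the database."""
    total: Dict[str, float] = defaultdict(float)
    for qseq in db:
        best: Dict[str, float] = {}
        for itemset in qseq:
            for item, qty in itemset.items():
                # Within one q-sequence, a pattern's utility is its best occurrence.
                best[item] = max(best.get(item, 0.0), qty * profit[item])
        for item, u in best.items():
            total[item] += u
    return dict(total)


def initial_minutil(db: List[QSequence], profit: Dict[str, float], k: int) -> float:
    """k-th highest 1-sequence utility: a lower bound on the true k-th best utility."""
    utils = one_sequence_utilities(db, profit)
    if len(utils) < k:
        return 0.0  # fewer than k candidates seen: no threshold can be raised yet
    return heapq.nlargest(k, utils.values())[-1]
```

Starting the search at this minutil instead of 0 lets the upper-bound checks discard low-utility branches from the first expansion onward.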

7. SeqTopK in Differentially Private Top-K Selection

SeqTopK also denotes a highly efficient solution to selecting, under differential privacy, a sequence of $k$ items with the highest scores. The FastJoint algorithm formulates top-$k$ selection as a joint exponential mechanism over the space of $P(D,k)$ distinct $k$-sequences, with a loss function equal to the maximum deviation from the true top-$k$ scores. By truncating the loss and merging groups, it reduces the number of subsets to consider from $O(dk)$ to $O((k^3/\epsilon)\ln d)$. The final selection uses group-weighted Gumbel-max sampling to pick a group, followed by efficient rejection sampling within that group. Theoretical analysis guarantees $\epsilon$-DP and high-probability utility bounds. Empirically, FastJoint is orders of magnitude faster than previous joint mechanisms while maintaining equivalent accuracy (Wu et al., 2024).
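
The group-selection step relies on the Gumbel-max trick: adding independent Gumbel(0, 1) noise to log-weights and taking the argmax samples an index with probability proportional to its weight. A minimal sketch of sampling a group, assuming each group is summarized by its size (number of $k$-sequences, at least 1) and a shared truncated loss, and assuming the standard exponential-mechanism scaling $\epsilon/(2\Delta)$ for a loss with sensitivity $\Delta$. The within-group rejection sampling is omitted, and the function names are hypothetical, not FastJoint's API.

```python
import math
import random
from typing import Sequence


def gumbel(rng) -> float:
    # If E ~ Exp(1), then -log(E) ~ Gumbel(0, 1).
    return -math.log(rng.expovariate(1.0))


def sample_group(sizes: Sequence[int], losses: Sequence[float], eps: float,
                 sensitivity: float = 1.0, rng=random) -> int:
    """Hypothetical sketch: pick group g with probability proportional to
    sizes[g] * exp(-eps * losses[g] / (2 * sensitivity))."""
    # All sizes[g] outputs in a group share one loss, so the group's total
    # exponential-mechanism weight is the per-output weight times its size.
    keys = [math.log(n) - eps * loss / (2 * sensitivity) + gumbel(rng)
            for n, loss in zip(sizes, losses)]
    return max(range(len(keys)), key=keys.__getitem__)
```

Working with log-weights keeps the computation stable even when losses are large, since the weights themselves would underflow.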


References:

  • Efficient Discovering of Top-K Sequential Patterns in Event-Based Spatio-Temporal Data (Maciąg, 2017)
  • Mining Top-k Sequential Patterns in Database Graphs: A New Challenging Problem and a Sampling-based Approach (Lei et al., 2018)
  • Route Experts by Sequence, not by Token (Wen et al., 9 Nov 2025)
  • Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models (Stock et al., 2016)
  • TKUS: Mining Top-K High-Utility Sequential Patterns (Zhang et al., 2020)
  • Successive Halving Top-k Operator (Pietruszka et al., 2020)
  • Faster Differentially Private Top-k Selection: A Joint Exponential Mechanism with Pruning (Wu et al., 2024)
  • Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval (Hon et al., 2011)
