One-by-One Sequential Processing

Updated 4 July 2026

One-by-One is a research pattern where individual units are processed sequentially, allowing explicit local control and state for tasks like gating and diagnostics.
It is applied in diverse fields such as feature selection, quantum optics, and sequential decision-making to achieve precise activation, extraction, and measurement.
The method trades aggregate symmetry for finer control, leading to benefits like exact zeros and improved interpretability, while introducing sensitivity to ordering and feasibility constraints.

“One-by-One” denotes a recurring research pattern in which individual units are processed sequentially, locally, or with explicit per-item state, rather than only through aggregate operators. Across the cited literature, the expression is used for diagonal input gating in nonlinear feature selection, progressive collection of weakly supervised action evidence, ordered enumeration of tagged visual objects, sequential screening and no-recall bandit decisions, single-particle partitioning and absorption, one-at-a-time extraction of atoms, per-defect and per-pulsar diagnostics, and day-by-day assignment through balanced permutation sequences (Sudo et al., 2021, Zhong et al., 2018, Yan et al., 2024, Cohen et al., 2023, Bocquillon et al., 2012, Lopez-Eiguren et al., 2016, Agazie et al., 2024, Adams et al., 25 Feb 2026).

1. Semantic scope and recurring formalizations

In the cited work, “one-by-one” is not a single technical term but a family of constructions. Sometimes it means literal sequential processing of one entity at a time, as in vehicular edge access where only one vehicle wakes up per frame, or in online bandits where a current arm can be pulled or permanently abandoned (Jang et al., 2023, Roy et al., 2017). In other settings it denotes a structural decomposition into independent units, such as one-to-one input gates in LassoLayer or one-by-one clause imposition in compressed ALLSAT (Sudo et al., 2021, Wild, 2016). In physical sciences it often refers to elementary excitations or particles being partitioned, absorbed, or extracted individually (Bocquillon et al., 2012, Selstø, 2020, Hamodi et al., 2020).

Domain	Operational meaning	Representative paper
Feature selection	One input dimension, one gate	(Sudo et al., 2021)
Weak supervision	One classifier stage collected after another	(Zhong et al., 2018)
Multimodal LLMs	List tagged objects in numeric order	(Yan et al., 2024)
Sequential decision	One test, arm, or vehicle at a time	(Cohen et al., 2023, Roy et al., 2017, Jang et al., 2023)
Quantum and condensed matter	One excitation, particle, atom, or trap event at a time	(Bocquillon et al., 2012, Selstø, 2020, Hamodi et al., 2020, Clement et al., 2010)
Diagnostics and fairness	One defect, pulsar, or daily assignment step at a time	(Lopez-Eiguren et al., 2016, Agazie et al., 2024, Adams et al., 25 Feb 2026)

A closely related formulation is “one-to-one.” In LassoLayer, the first layer is a diagonal operator with one scalar gate per input coordinate, while in asynchronous network theory there is a one-to-one correspondence between effective connectivity and the temporal structure of pairwise averaged correlations (Sudo et al., 2021, Albada et al., 2014). The common feature is explicit locality: each unit has its own gate, test, trajectory, clause state, or assignment position.

2. Learning systems: gating, ordered supervision, and target-conditioned explanations

In nonlinear feature selection, LassoLayer turns selection into a learned diagonal gating operation. For input $x \in \mathbb{R}^d$, the layer applies

$y=\sigma_{\mathrm{out}}\!\big(w\odot \sigma_{\mathrm{in}}(x)\big),$

and in the experimental LassoMLP instantiation both activations are identity, so the layer reduces to $y=w\odot x$ (Sudo et al., 2021). Sparsity is imposed only on the gate vector $w$ through

$\mathcal{L}(\theta,w)=\mathcal{L}_{\mathrm{task}}(\theta,w)+\lambda\lVert w\rVert_1,$

followed by the proximal shrinkage operator

$S_\lambda(w_i)=\operatorname{sign}(w_i)\max(|w_i|-\lambda,0),$

which yields exact zeros and therefore true “off” switches (Sudo et al., 2021). The paper also introduces an early-phase “kicking” heuristic: during the first $K$ epochs, any zero gate is reactivated with probability $\rho$ by setting $w_i:=\pm\delta$. The reported settings use $K=1000$ together with dataset-specific values of the remaining kicking parameters for synthetic data and for MNIST (Sudo et al., 2021). On a synthetic nonlinear regression task, LassoMLP’s feature-selection AUC keeps improving across the reported sweep, whereas linear Lasso saturates (Sudo et al., 2021). On MNIST with 784 image dimensions augmented by 5000 Gaussian noise features, LassoMLP outperforms LassoNet, HSIC Lasso, PFA, and Fisher score in all but two of the reported combinations of training-set size and number of selected features (Sudo et al., 2021).
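The shrinkage operator and the kicking step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the example gate vector, and the values of `lam`, `rho`, and `delta` are assumptions chosen only for demonstration.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1: shrink every gate toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def kick(w, rho, delta, rng):
    """Reactivate each exactly-zero gate with probability rho as +delta or -delta."""
    w = w.copy()
    revive = (w == 0.0) & (rng.random(w.shape) < rho)
    signs = rng.choice([-1.0, 1.0], size=w.shape)
    w[revive] = delta * signs[revive]
    return w

def gate(x, w):
    """One-to-one diagonal gating with identity activations: y = w * x."""
    return w * x

rng = np.random.default_rng(0)
w = np.array([0.30, -0.05, 0.00, -0.80])   # hypothetical gate values
w = soft_threshold(w, lam=0.1)             # small gates become exact zeros
x = np.array([1.0, 2.0, 3.0, 4.0])
print(gate(x, w))                          # inputs behind zero gates are switched off
print(kick(w, rho=0.5, delta=0.01, rng=rng))
```

Because the shrinkage produces exact zeros rather than merely small values, the set of selected features can be read off as the nonzero entries of `w`.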

A different learning use of one-by-one supervision appears in Set-of-Mark training for multimodal LLMs. “List items one by one” asks the model to enumerate every tagged object in ascending numeric order, producing outputs such as “1. person, 2. cat, 3. dog” (Yan et al., 2024). The task is trained with standard autoregressive next-token prediction rather than explicit alignment or ordering losses. The dataset construction reported in the paper uses 10k SoM-listing pairs and 20k SoM-QA conversations on MS-COCO, mixed with LLaVA-1.5 Mix665K for a total of 695k instruction-tuning samples (Yan et al., 2024). The reported gains persist even when tags are omitted at inference: for LLaVA-1.5 Vicuna-13B, POPE F1 improves from 85.9 to 86.6, MME overall from 1531.3 to 1563.1, SEED-I from 68.2 to 69.6, and LLaVA-W from 70.7 to 75.3 (Yan et al., 2024). With tags at test time, MM-Vet reaches 37.2, which is +1.8 over the baseline LLaVA-1.5 13B (Yan et al., 2024).
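The listing target described above is easy to make concrete. The sketch below formats a set of tagged objects into the ascending numbered string that the model is asked to produce; the function name and the dictionary layout are assumptions for illustration, not the paper's data format.

```python
def list_items_one_by_one(tagged):
    """Format {tag_id: label} as a listing in ascending numeric tag order."""
    return ", ".join(f"{tag}. {label}" for tag, label in sorted(tagged.items()))

# Tags arrive in arbitrary order; the target always enumerates them ascending.
target = list_items_one_by_one({3: "dog", 1: "person", 2: "cat"})
print(target)  # 1. person, 2. cat, 3. dog
```

Since the training signal is plain next-token prediction, the ordering constraint lives entirely in how the target string is built.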

In explanation methods, GANMEX makes one-vs-one attributions depend on a generated target-class counterfactual baseline rather than on a zero or blur baseline. The generator is trained so that the baseline is realistic, classified as the target class, and close to the original input; the classifier being explained is inserted into the adversarial objective as the class discriminator (Shih et al., 2020). For Integrated Gradients, the attribution is computed on the score difference between the original and target classes along the path from the GANMEX baseline to the input. Reported quantitative gains include AOPC100 on MNIST improving from 0.614 with a zero baseline to 1.421 with GANMEX, and inverse localization on BAM improving overall from 1.591 with zero and 0.852 with MDTS to 0.747 with GANMEX (Shih et al., 2020). The paper also reports better sensitivity under cascading randomization of the classifier, addressing a standard criticism of saliency methods (Shih et al., 2020).
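To illustrate how Integrated Gradients behaves with a non-zero baseline, here is a minimal sketch on a toy softmax classifier. Everything in it is an assumption for illustration: the weight matrix `W`, the choice of class probabilities as scores, and the hand-picked `baseline`, which merely stands in for a generated target-class baseline (no GAN is trained here).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def score_diff(W, x, o, t):
    """f(x) = p_o(x) - p_t(x) for a toy classifier with logits W @ x."""
    p = softmax(W @ x)
    return p[o] - p[t]

def score_diff_grad(W, x, o, t):
    """Analytic gradient of f with respect to the input x."""
    p = softmax(W @ x)
    eye = np.eye(len(p))
    dz = p[o] * (eye[o] - p) - p[t] * (eye[t] - p)   # gradient w.r.t. logits
    return W.T @ dz

def integrated_gradients(W, x, baseline, o, t, steps=256):
    """Midpoint Riemann-sum approximation along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([score_diff_grad(W, baseline + a * (x - baseline), o, t)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

W = np.array([[2.0, -1.0], [-1.0, 1.5], [0.5, 0.5]])   # hypothetical 3-class model
x = np.array([1.0, 0.2])                               # classified as class 0
baseline = np.array([0.1, 1.0])                        # classified as class 1 (target)
attr = integrated_gradients(W, x, baseline, o=0, t=1)
gap = score_diff(W, x, 0, 1) - score_diff(W, baseline, 0, 1)
print(attr, attr.sum(), gap)   # attributions sum approximately to the score gap
```

The completeness property (attributions summing to the change in the explained score) is what makes the choice of baseline matter: the attribution explains the difference between the input and that specific reference point.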

3. Sequential discovery, screening, and resource allocation

In weakly supervised temporal action detection, “one-by-one” names the testing phase of a two-stage method: step-by-step erasion during training and one-by-one collection during inference (Zhong et al., 2018). A snippet-wise classifier is trained, the most discriminative clips are stochastically erased, a new classifier is trained on the remainder, and the process repeats up to four steps (Zhong et al., 2018). At test time, all trained classifiers are run, their soft masks and probabilities are fused, and a fully connected CRF (FC-CRF) refines temporal continuity (Zhong et al., 2018). On THUMOS’14, the reported weakly supervised detector reaches mAP@0.5 of 15.9 versus 13.7 for UntrimmedNet; on ActivityNet v1.2 it achieves average mAP 15.6, with mAP of 27.3 at a single IoU threshold (Zhong et al., 2018). The FC-CRF is a major part of the one-by-one collection stage: on THUMOS’14 it raises mAP at a fixed IoU threshold from 6.9% to 14.0%, and on ActivityNet it raises average mAP from 2.6% to 14.9% (Zhong et al., 2018).

In sequential strategic screening, one-by-one means a fixed pipeline of classifiers through which an agent can manipulate between stages (Cohen et al., 2023). The central result is the “zig-zag” phenomenon: because earlier-stage constraints need not be maintained later, sequential success can be strictly cheaper than satisfying the intersection of all positive regions simultaneously (Cohen et al., 2023). In the paper’s two-dimensional example, the sequential best response moves first to a point accepted by the first classifier and then to a point accepted by the second, and its total cost is strictly lower than that of the conjunctive best response (Cohen et al., 2023). More generally, the paper gives a convex program for optimal sequential manipulation, shows an unbounded gap in two dimensions, and proves that monotone classifiers eliminate zig-zag advantages (Cohen et al., 2023).
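The zig-zag effect can be reproduced with a small geometric sketch. The setup below is a hypothetical construction, not the paper's example: an agent starts at the origin, the two classifiers are half-planes $\{x : n_i \cdot x \ge 1\}$ with unit normals at angles $\pm\varphi$, and manipulation cost is assumed to be Euclidean distance, added across stages. The sequential path shown is a greedy one, so it only upper-bounds the optimal sequential cost.

```python
import math

def project_to_halfplane(p, n):
    """Nearest point to p in {x : n.x >= 1} (unit normal n) and the move cost."""
    slack = 1.0 - (n[0] * p[0] + n[1] * p[1])
    if slack <= 0:
        return p, 0.0                      # already accepted, no move needed
    return (p[0] + slack * n[0], p[1] + slack * n[1]), slack

def greedy_sequential_cost(phi):
    """Pass classifier 1, then classifier 2, without keeping classifier 1 satisfied."""
    n1 = (math.cos(phi), math.sin(phi))
    n2 = (math.cos(phi), -math.sin(phi))
    z1, c1 = project_to_halfplane((0.0, 0.0), n1)
    _, c2 = project_to_halfplane(z1, n2)
    return c1 + c2

def conjunctive_cost(phi):
    """Distance from the origin to the apex of the wedge accepted by both classifiers."""
    return 1.0 / math.cos(phi)

phi = math.radians(80)
print(greedy_sequential_cost(phi), conjunctive_cost(phi))
```

As $\varphi$ approaches 90 degrees the conjunctive cost grows without bound while the greedy sequential cost stays below 3, which mirrors the unbounded two-dimensional gap described above. Both normals have a negative component in some coordinate for one of the classifiers, so the pair is not monotone.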

In the online streaming multi-armed bandit model, arms arrive one at a time and cannot be revisited after being skipped (Roy et al., 2017). For Bernoulli arms with means drawn i.i.d. from a distribution characterized by its tail behavior, the paper derives lower bounds in two regimes of the tail parameter and matches them up to constant factors with threshold-style policies (Roy et al., 2017). The uniform case is worked out explicitly through a fixed-payout recursion (Roy et al., 2017). The no-revisit constraint is the defining difference from classical MAB formulations (Roy et al., 2017).
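The no-recall constraint can be made concrete with a small simulation. The policy below is a deliberately simple probe-then-commit rule and is not the paper's policy: the function name, the uniform prior on arm means, and the `probe` and `threshold` parameters are all assumptions for illustration.

```python
import random

def run_no_recall(horizon, probe, threshold, seed=0):
    """Simulate a stream of Bernoulli arms with Uniform(0, 1) means and no revisits."""
    rng = random.Random(seed)
    reward, t, arms_seen = 0, 0, 0
    while t < horizon:
        mean = rng.random()            # a fresh arm arrives; skipped arms are gone
        arms_seen += 1
        wins = pulls = 0
        while t < horizon and pulls < probe:
            hit = rng.random() < mean
            wins += hit
            reward += hit
            pulls += 1
            t += 1
        if pulls == probe and wins / probe >= threshold:
            while t < horizon:         # commit to this arm for all remaining rounds
                reward += rng.random() < mean
                t += 1
    return reward, arms_seen

print(run_no_recall(horizon=1000, probe=20, threshold=0.8))
```

The only state carried forward is the current arm, which is what distinguishes this setting from a classical bandit where every previously seen arm stays available.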

Vehicular edge computing uses the phrase more literally. The reported system adopts a one-by-one scheduling mechanism in which only one vehicle is active in uplink and only one in downlink per frame, with the optimization jointly choosing scheduling, offloading ratio, and bit allocation over a mission horizon (Jang et al., 2023). The paper gives an explicit expression for the uplink energy of each vehicle in each frame and solves the resulting mixed-integer non-convex problem through a Lagrange dual method (Jang et al., 2023). In the reported numerical experiments, optimized one-by-one access yields lower total vehicle energy than local execution, orthogonal access, and one-by-one access with equal bit allocation (Jang et al., 2023).

4. Physical one-by-one processes: partitioning, absorption, extraction, and activation

In electron quantum optics, “partitioning electrons one by one” refers to a Hanbury Brown–Twiss geometry fed by an on-demand mesoscopic capacitor that emits one electron and one hole per cycle into a quantum Hall edge channel (Bocquillon et al., 2012). A quantum point contact partitions the excitations into transmitted and reflected beams, and the low-frequency current correlations count emitted electron/hole excitations at the single-charge level (Bocquillon et al., 2012). In the equal-temperature excess-noise form used experimentally, the measured excess noise at a fixed transmission yields this count directly (Bocquillon et al., 2012). The paper also shows that thermal antibunching suppresses low-energy source excitations, making the noise a probe of the emitted energy distribution (Bocquillon et al., 2012).

In unbound quantum dynamics, one-by-one absorption is implemented with a one-body complex absorbing potential and a Lindblad hierarchy over particle-number sectors (Selstø, 2020). Each absorption event is represented as a quantum jump, and the absorbed flux is projected onto single-particle scattering states to obtain singly differential spectra without constructing full many-particle scattering states (Selstø, 2020). The explicit two-particle formulas build these spectra from the time-integrated reduced one-body kernel and from the one-particle density matrix (Selstø, 2020). This framework is presented as naturally extensible to any number of particles, while the paper works out compact formulas for the two-particle case (Selstø, 2020).

The attractive Bose–Hubbard model provides a different one-by-one mechanism: by weakly modulating the tunnelling rate, a single atom can be extracted from a small matter-wave soliton while the remaining atoms stay localized (Hamodi et al., 2020). In the strong-coupling regime, one atom is ejected when the modulation is resonant with the corresponding transition (Hamodi et al., 2020). In the reported parameter regime, the first excited band corresponding to one free atom becomes almost fully populated within a few tens of tunnelling times, and the transition rate follows a scaling law whose slope is obtained by fitting (Hamodi et al., 2020). The paper interprets the protection of the remaining cluster through energy gaps plus translation and parity-based selection rules (Hamodi et al., 2020).

One-by-one measurement also appears in two-qubit tomography. The reported protocol measures all density-matrix elements directly, rather than inferring off-diagonals from an ill-conditioned linear system (Bartkiewicz et al., 2015). The key theoretical property is the low condition number of the reconstruction matrix; the paper reports larger condition numbers for four comparison schemes (Bartkiewicz et al., 2015). Seventeen two-qubit polarization states were reconstructed, and the optimal protocol produced the smallest uncertainty circles in trace distance and disturbance (Bartkiewicz et al., 2015).

A nanoscale condensed-matter use is one-by-one trap activation in silicon nanowire transistors. As gate voltage increases, oxide traps do not generally turn on simultaneously; instead, Coulomb repulsion between occupied traps shifts neighboring trap energies and causes sequential activation (Clement et al., 2010). The paper reports non-overlapping occupancy peaks for most traps, a repulsion energy (quoted in meV) for a strongly coupled pair, and a noise reduction by more than one order of magnitude relative to a naive superposition model (Clement et al., 2010). The effect weakens with increased channel electron density because screening reduces both the effective trap charge and inter-trap repulsion (Clement et al., 2010).

5. Individualized tracking, diagnostics, and fair repetition

In cosmological defect simulations, measuring monopole velocities “one by one” means identifying every monopole and antimonopole on the lattice by its topological charge and reconstructing worldlines across analysis times (Lopez-Eiguren et al., 2016). The method replaced earlier field-averaged estimators and resolved a long-standing ambiguity in the velocity-dependent one-scale (VOS) model: the paper reports separate average velocities for the radiation and matter eras, with no evidence for a luminal branch (Lopez-Eiguren et al., 2016). The same simulations also calibrate the VOS parameters, whose values are given in the corrected tables (Lopez-Eiguren et al., 2016).

A related diagnostic strategy appears in pulsar timing arrays. NANOGrav’s 15-year analysis removes pulsars one by one and recomputes the noise-marginalized optimal statistic to test the internal consistency of a spatially correlated stochastic signal (Agazie et al., 2024). For the full 67-pulsar array, the paper reports Hellings–Downs estimates and their S/N for both the SCNMOS and MCNMOS variants of the statistic (Agazie et al., 2024). The least-noisy-first removal order exhibits abrupt changes, including a large initial drop when removing PSR J1909−3744 and later jumps when removing B1855+09 and J2317+1439, but comparison with 100 simulated arrays shows no inconsistency with a stochastic gravitational-wave background (Agazie et al., 2024).
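The remove-and-recompute pattern is generic and can be sketched independently of the pulsar-timing details. In the example below a weighted mean stands in for the noise-marginalized optimal statistic, which is not implemented here; the function name, the data, and the removal order are all hypothetical.

```python
import numpy as np

def removal_curve(values, weights, order):
    """Recompute a weighted-mean statistic after removing items one by one."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = np.ones(len(values), dtype=bool)
    curve = [np.average(values[keep], weights=weights[keep])]
    for idx in order[:-1]:             # stop while at least one item remains
        keep[idx] = False
        curve.append(np.average(values[keep], weights=weights[keep]))
    return np.array(curve)

# Hypothetical per-item contributions; remove the largest contributor first.
curve = removal_curve([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], order=[3, 2, 1, 0])
print(curve)  # [2.5 2.  1.5 1. ]
```

Large jumps in such a curve flag items that dominate the aggregate; deciding whether a jump is anomalous requires a reference distribution, which is the role the simulated arrays play in the analysis described above.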

In fair division, one-by-one refers to day-by-day assignment rather than end-of-horizon averaging. The model assigns indivisible items to players every day, with each day represented by a permutation of item ranks to players (Adams et al., 25 Feb 2026). Strong balance imposes a bound for every player, day, and rank position, while weak balance relaxes that bound (Adams et al., 25 Feb 2026). The strong condition implies ordinal PROP1 after every day, exists for all instance sizes up to a reported bound, and is impossible for many larger values, including all sizes beyond a reported threshold (Adams et al., 25 Feb 2026). Weak balance also implies ordinal PROP1, exists over a wider reported range, and fails in a reported larger case (Adams et al., 25 Feb 2026). A weaker PROP2-sufficient condition remains open in general (Adams et al., 25 Feb 2026).

These examples show that one-by-one methods are not restricted to sparse learning or sequential control. They also function as diagnostic regimes in which the fundamental object of interest is an individual trajectory, source, or day-prefix rather than an ensemble average.

6. Algorithmic and theoretical consequences

Some one-by-one constructions formalize limits rather than procedures. In asynchronous recurrent networks, there is a one-to-one correspondence between effective population-level connectivity and the temporal structure of pairwise averaged correlations, except in degenerate cases (Albada et al., 2014). For binary and LIF networks, the correlation matrices take resolvent forms, and the paper proves that preserving pairwise averaged correlations under downscaling requires preserving the effective connectivity itself (Albada et al., 2014). This yields a hard reducibility bound: there is a minimal feasible factor by which the in-degree can be scaled, so asynchronous networks cannot in general be reduced arbitrarily while keeping both mean activity and second-order structure fixed (Albada et al., 2014).

In combinatorics and graph embedding, one-by-one appears as a constructive locality condition. A particular family of sub-quadtrees is embedded into crossed cubes by placing vertices and edges one by one so that every quadtree edge maps to a path of length at most two (Selmi et al., 2022). The paper gives sufficient host dimensions for this construction, and the resulting embedding has dilation two (Selmi et al., 2022).

Boolean satisfiability offers another clause-by-clause use. “ALLSAT compressed with wildcards” imposes clauses one by one on disjoint rows built from 0, 1, the classical don’t-care symbol 2, and an additional wildcard meaning “at least one 0 here” (Wild, 2016). The method maintains disjointness, so the final row family is an orthogonal DNF after optional refinement of the wildcard groups to ordinary 012-rows (Wild, 2016). The paper’s Master Theorem bounds the total running time, with a specialized bound for CNFs stated in terms of the numbers of variables and clauses; the method is reported as most efficient for few but large clauses and extendable from clauses to superclauses (Wild, 2016).
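Clause-wise imposition on disjoint rows can be sketched with ordinary 012-rows alone. The code below is a simplified illustration and not the paper's algorithm: it omits the extra wildcard and any SAT-solver assistance, and the encoding of a clause as `(variable_index, is_positive)` pairs is an assumption. It further assumes that no variable repeats within a clause.

```python
def impose_clause(row, clause):
    """Split one 012-row into disjoint 012-rows whose models all satisfy the clause."""
    if any(row[v] == int(pos) for v, pos in clause):
        return [row]                       # a literal is already fixed to true
    free = [(v, pos) for v, pos in clause if row[v] == 2]
    out, current = [], list(row)
    for v, pos in free:
        child = current.copy()
        child[v] = int(pos)                # this literal is true in the child...
        out.append(tuple(child))
        current[v] = 1 - int(pos)          # ...and false in every later child
    return out                             # empty if the row violates the clause

def all_models_compressed(n_vars, clauses):
    """Impose clauses one by one, starting from the all-don't-care row."""
    rows = [tuple([2] * n_vars)]
    for clause in clauses:
        rows = [new for row in rows for new in impose_clause(row, clause)]
    return rows

def count_models(rows):
    """Rows are disjoint, so model counts simply add up."""
    return sum(2 ** row.count(2) for row in rows)

# (x0 or x1) and (not x0 or x2) over three variables
cnf = [[(0, True), (1, True)], [(0, False), (2, True)]]
rows = all_models_compressed(3, cnf)
print(rows, count_models(rows))  # [(1, 2, 1), (0, 1, 2)] 4
```

Because each split fixes the earlier literals to false before setting the next one to true, sibling rows never share a model, which is what keeps the final family an orthogonal DNF.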

Across these works, one-by-one procedures usually trade aggregate symmetry for explicit local state. This suggests a common pattern: the gain is finer control (exact zeros, target-conditioned explanations, trajectory-level diagnostics, day-prefix fairness, or dilation bounds), while the cost is sensitivity to ordering, feasibility checks, or impossibility thresholds. The cited literature makes that trade-off explicit through examples such as LassoLayer’s dependence on the regularization strength $\lambda$ and kicking parameters, over-erasion in weakly supervised action detection, the irreversibility of no-recall bandits and sequential screening, the reducibility limit in asynchronous networks, and the nonexistence of strong balance for all sufficiently large instances (Sudo et al., 2021, Zhong et al., 2018, Roy et al., 2017, Cohen et al., 2023, Albada et al., 2014, Adams et al., 25 Feb 2026).