
DOSS-Select: Sparse Selection & Optimization

Updated 24 December 2025
  • DOSS-Select is a framework that implements sparse selection and pruning techniques to optimize performance across various domains while handling structural and resource constraints.
  • It employs specific algorithms such as double sparsity kernel learning, dispersion-aware sequence design, and diversity-pruned sampling to extract relevant features efficiently.
  • DOSS-Select demonstrates rigorous theoretical guarantees and empirical success, yielding improved predictive accuracy, safe exploration, and efficient decentralized sensor selection.

DOSS-Select encompasses a set of strategies and algorithms unified by selection, pruning, or sparsity-optimization motifs. Its principal variants are found in diverse domains: sparse kernel learning, nonlinear sequence design for communications, deepfake detection, safe bandit optimization, and distributed sensor selection. Despite disparate application contexts, all DOSS-Select methods address the challenge of efficiently extracting, selecting, or activating relevant data, variables, or sequences to optimize predictive, transmission, or detection performance under structural or resource constraints.

1. Principles and Domain-Specific Definitions

"DOSS-Select" is canonically associated with double sparsity kernel learning, dispersion-aware sequence selection, diversity-pruned dataset construction, safety-optimized linear bandit algorithms, and distributed submodular selection.

  • Double Sparsity Kernel Learning: DOSK-Select formulates an RKHS estimator with simultaneous $\ell_1$ penalties on dual coefficients (support vector sparsity) and input variable weights (variable selection). The optimization balances empirical loss, support-vector (data) sparsity, variable selection, and RKHS-norm regularization (Chen et al., 2017).
  • Dispersion-Aware Sequence Selection: DOSS-Select (D-SS) in optical communication minimizes transmission degradation by scoring candidate $M$-QAM symbol sequences via a dispersion-aware energy dispersion index (D-EDI) metric after channel-modelled chromatic dispersion steps (Liu et al., 2023).
  • Diversity-Optimized Data Pruning: In deepfake detection, DOSS-Select implements sample capping per attack domain (source-generator pair), enforcing a nearly uniform representation capped at $N_c$, preserving a global real-to-fake ratio, and favoring diversity over volume (Huang et al., 20 Dec 2025).
  • Safe Linear Bandits: The DOSS algorithm employs double optimism: maximizing reward and relaxing safety constraints under uncertainty, leveraging dual estimates for unknown optima in polytope-constrained bandits (Gangrade et al., 2022).
  • Distributed Sensor Selection: As DOG/DOSS-Select, online submodular maximization with bandit feedback is performed in a communication-efficient, decentralized architecture for sensor query selection, maintaining low regret and resource demands (Golovin et al., 2010).

2. Core Mathematical Formulations

Several distinct DOSS-Select instantiations have well-defined optimization or selection statements:

Double Sparsity Kernel Selection: Given data $(x_i, y_i)$, the estimator solves

$$\min_{\alpha, w, b}\; \frac{1}{n} \sum_{i=1}^n L\bigl(y_i, f_{w,\alpha,b}(x_i)\bigr) + \lambda_1 \|\alpha\|_1 + \lambda_2 \|w\|_1 + \lambda_3\, \alpha^{\top} K_w \alpha$$

where $K_w(x, x') = K(w \odot x,\, w \odot x')$ and $f_{w,\alpha,b}(x) = \sum_{j=1}^n \alpha_j K_w(x, x_j) + b$ (Chen et al., 2017).
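
For concreteness, the following numpy sketch evaluates this objective under illustrative assumptions: a Gaussian base kernel and a squared-error loss (neither choice, nor any name here, is prescribed by Chen et al.):

```python
import numpy as np

def weighted_gaussian_kernel(X, w, gamma=1.0):
    """Gram matrix K_w with K_w(x, x') = K(w * x, w * x') for a Gaussian K."""
    Z = X * w                                        # elementwise variable weighting
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

def dosk_objective(alpha, w, b, X, y, lam1, lam2, lam3, gamma=1.0):
    """Double-sparsity objective with a squared-error loss (regression case)."""
    Kw = weighted_gaussian_kernel(X, w, gamma)
    f = Kw @ alpha + b                      # f_{w,alpha,b} at the training points
    loss = np.mean((y - f) ** 2)            # empirical loss (1/n) sum L(y_i, f(x_i))
    return (loss
            + lam1 * np.abs(alpha).sum()    # support-vector sparsity
            + lam2 * np.abs(w).sum()        # variable selection
            + lam3 * alpha @ Kw @ alpha)    # RKHS-norm regularization
```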

Dispersion-Aware Metric Selection: For a candidate complex 4D symbol sequence $X$, the D-EDI is

$$\Psi_D[X] = \frac{1}{m_D+1} \sum_{N=0}^{m_D} \Psi\bigl[X^{(N)}\bigr]$$

where $\Psi[\cdot]$ is the normalized variance of a sliding-window energy profile and $X^{(N)}$ is the sequence after $N$ chromatic dispersion steps (Liu et al., 2023).
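
A minimal sketch of the metric computation, assuming $\Psi$ is the variance of the sliding-window energies normalized by their squared mean, and that a caller-supplied dispersion_step emulates one chromatic dispersion stage (e.g., an FFT-based all-pass filter); the exact normalization and channel model in Liu et al. may differ:

```python
import numpy as np

def edi(x, window):
    """EDI of one sequence: normalized variance of its sliding-window energy profile."""
    energy = np.abs(x) ** 2
    win_energy = np.convolve(energy, np.ones(window), mode='valid')
    return np.var(win_energy) / np.mean(win_energy) ** 2

def d_edi(x0, dispersion_step, m_D, window=32):
    """D-EDI: mean EDI of the sequence after 0..m_D dispersion steps."""
    x, total = np.asarray(x0), 0.0
    for _ in range(m_D + 1):
        total += edi(x, window)
        x = dispersion_step(x)   # caller-supplied chromatic-dispersion emulation
    return total / (m_D + 1)
```

In D-SS, the candidates with the lowest d_edi scores would be retained for transmission.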

Diversity-Pruning for Deepfake Data: For fake domain set $\mathcal{F}$, real domains $\mathcal{R}$, saturation cap $N_c$, and ratio $\rho$:

$$s_f = \min(n_f, N_c), \qquad F_r = \sum_{f:\, \mathrm{base}(f) = r} s_f, \qquad s_r = \rho F_r$$

Samples are drawn without replacement for the pruned training set (Huang et al., 20 Dec 2025).
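
These closed-form counts translate directly into code; a short sketch with hypothetical names (the paper's data structures are not specified here):

```python
from collections import defaultdict

def pruned_counts(fake_counts, base_of, N_c, rho):
    """Per-domain sampling budgets for the diversity-pruned training set.

    fake_counts: fake domain f -> available sample count n_f
    base_of:     fake domain f -> its underlying real domain r = base(f)
    Returns (s_f budgets for fake domains, s_r budgets for real domains).
    """
    s_f = {f: min(n, N_c) for f, n in fake_counts.items()}  # cap each fake domain
    F_r = defaultdict(int)
    for f, s in s_f.items():
        F_r[base_of[f]] += s                                # fakes built on real r
    s_r = {r: round(rho * F) for r, F in F_r.items()}       # matched real budget
    return s_f, s_r
```

Sampling without replacement according to these budgets then yields the pruned set.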

Safe Bandit Play: DOSS builds confidence sets for unknown reward and constraint vectors, defines an optimistic feasible set $\widehat{K}_t$, and selects actions by maximizing estimated reward within $\widehat{K}_t$ (Gangrade et al., 2022).
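
The selection step can be sketched for a finite action set with a single linear constraint; the confidence-radius tuning and polytope handling of the actual DOSS algorithm are omitted, and all names are illustrative:

```python
import numpy as np

def doss_round(actions, V, theta_hat, a_hat, tau, beta):
    """One doubly-optimistic round over a finite action set (schematic).

    actions:   (m, d) candidate action vectors
    V:         (d, d) regularized design matrix of past plays
    theta_hat: ridge estimate of the unknown reward vector
    a_hat:     ridge estimate of a single unknown constraint vector
    tau:       safety threshold (action x is safe if a^T x <= tau)
    beta:      confidence radius
    """
    widths = np.sqrt(np.einsum('id,de,ie->i', actions, np.linalg.inv(V), actions))
    # Optimism on safety: keep x if some vector in the confidence set certifies
    # it, i.e. the lower confidence bound on a^T x lies below tau.
    feasible = actions @ a_hat - beta * widths <= tau
    # Optimism on reward: maximize the upper confidence bound over the relaxed set.
    ucb = np.where(feasible, actions @ theta_hat + beta * widths, -np.inf)
    return int(np.argmax(ucb))
```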

Online Distributed Sensor Greedy: DOG runs $k$ EXP3 bandit subroutines to select $k$ sensors per round, with expected $(1-1/e)$-regret

$$R_T \leq O\bigl(k \sqrt{n \log n \, T}\bigr)$$

under submodular utility feedback (Golovin et al., 2010).
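
A centralized, simplified sketch of this reduction (one EXP3 instance per greedy slot); the actual DOG protocol adds Poisson multinomial sampling, staging, and lazy normalization for communication efficiency, none of which is modeled here:

```python
import numpy as np

class EXP3:
    """Standard EXP3 over n arms; rewards are assumed normalized to [0, 1]."""
    def __init__(self, n, eta, rng):
        self.n, self.eta, self.rng = n, eta, rng
        self.weights = np.ones(n)

    def draw(self):
        w = self.weights / self.weights.sum()
        self.p = (1 - self.eta) * w + self.eta / self.n  # mix in uniform exploration
        self.arm = self.rng.choice(self.n, p=self.p)
        return self.arm

    def update(self, reward):
        # importance-weighted multiplicative update for the arm just played
        self.weights[self.arm] *= np.exp(self.eta * reward / (self.n * self.p[self.arm]))

def dog_round(bandits, marginal_gain):
    """One round of simplified online greedy: slot i picks a sensor and is
    rewarded with its submodular marginal gain over the sensors chosen so far."""
    chosen = []
    for b in bandits:
        s = b.draw()
        b.update(marginal_gain(s, chosen))  # caller-supplied utility oracle
        chosen.append(s)
    return chosen
```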

3. Algorithmic Frameworks and Solution Methods

DOSS-Select variants exhibit structured yet domain-adapted solution schemes:

  • Alternating Optimization with Local Approximation: DOSK-Select cycles through updates for $\alpha$ (convex), $b$ (convex scalar), and $w$ (quadratic-approximate $\ell_1$-penalized), based on Taylor expansions of the kernel (Chen et al., 2017); see the sketch after this list.
  • Candidate Evaluation via Metric Computation: In D-SS, sequences are generated via enumerative sphere shaping, dispersion is emulated, and D-EDI is computed for every candidate, retaining only those minimizing the metric (Liu et al., 2023).
  • Closed-Form Pruning: DOSS-Select in deepfake detection applies the caps, ratio, and domain mapping to per-domain sample counts in closed form, enabling direct subsampling via standard set mapping and rounding (Huang et al., 20 Dec 2025).
  • Doubly-Optimistic Linear Bandit: DOSS algorithm maintains parameter/confidence estimates for reward and constraints, computes confidence radii, and selects optimistic feasible actions while updating estimates (Gangrade et al., 2022).
  • Distributed Sampling and Weight Update: DOG/DOSS-Select leverages Poisson multinomial sampling for sensor activation, distributed local normalization, and bandit-style multiplicative weight updates within stages (Golovin et al., 2010).
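
To make the first item concrete, here is a proximal-gradient rendering of the alternating scheme, reusing weighted_gaussian_kernel from the Section 2 sketch; a finite-difference gradient stands in for the paper's Taylor-based local quadratic approximation in the $w$-step, so this illustrates the structure rather than reproducing the authors' solver:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_dosk(X, y, lam1, lam2, lam3, gamma=1.0,
             outer=5, inner=100, lr_a=1e-2, lr_w=1e-3, eps=1e-5):
    n, p = X.shape
    alpha, w, b = np.zeros(n), np.ones(p), 0.0

    def smooth_w(wv):
        # smooth part of the objective viewed as a function of w alone
        Kw = weighted_gaussian_kernel(X, wv, gamma)
        r = y - (Kw @ alpha + b)
        return np.mean(r ** 2) + lam3 * alpha @ Kw @ alpha

    for _ in range(outer):
        Kw = weighted_gaussian_kernel(X, w, gamma)   # freeze kernel for alpha-step
        for _ in range(inner):                       # alpha- and b-steps (convex)
            r = Kw @ alpha + b - y
            grad_a = 2.0 * Kw @ r / n + 2.0 * lam3 * Kw @ alpha
            alpha = soft_threshold(alpha - lr_a * grad_a, lr_a * lam1)
            b -= lr_a * 2.0 * r.mean()
        for _ in range(inner):                       # w-step (nonconvex, prox-grad)
            g = np.array([(smooth_w(w + eps * e) - smooth_w(w - eps * e)) / (2 * eps)
                          for e in np.eye(p)])       # finite-difference gradient
            w = soft_threshold(w - lr_w * g, lr_w * lam2)
    return alpha, w, b
```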

4. Theoretical Guarantees and Empirical Performance

DOSS-Select frameworks provide rigorous guarantees with demonstrated empirical efficacy:

| Variant | Main Guarantee(s) | Empirical Highlights |
| --- | --- | --- |
| Double Sparsity Kernel | $O_P(\log n / \sqrt{n})$ $L_2$ error; variable-selection consistency | Best test MSE or accuracy vs. COSSO, KNIFE, RFE+kernel, LASSO (Chen et al., 2017) |
| Dispersion-Aware D-SS | Up to +0.3 bits/4D-symbol GMI gain; matches SSFM-SS | Throughput gain in 205 km / 2400 km links; >80% of the gain with reduced metric evaluation (Liu et al., 2023) |
| Deepfake Data Pruning | Outperforms naive aggregation using just 3% of the data | 0.52% absolute EER improvement at $N_c = 500$; further gains saturate after $N_c = 2500$ (Huang et al., 20 Dec 2025) |
| Safe Linear Bandit | $O(\log^2 T)$ efficacy regret; $\widetilde{O}(\sqrt{T})$ safety violation | Instance-dependent tight bounds, robust safety under unknown constraints (Gangrade et al., 2022) |
| Distributed Sensor Selection | No-$(1-1/e)$-regret: $R_T \leq O\bigl(k \sqrt{n \log n \, T}\bigr)$ | Near offline-greedy performance, minimal communication, scaling to $n \sim 10^4$ (Golovin et al., 2010) |

These results are primarily supported by formal proofs, convergence-rate analyses, or benchmarking against standard and contemporary baselines.

5. Implementation, Complexity, and Practical Considerations

Critical implementation details and operational guidance:

  • Hyperparameter Selection: DOSK-Select requires cross-validation of penalty and kernel parameters (a sketch follows this list); D-SS needs choices of block/sequence length, window size, and number of dispersion steps; deepfake pruning needs choices of $N_c$ and $\rho$.
  • Scalability: For large $n$ or $p$, DOSK-Select applies block-wise screening; D-SS metric calculation scales with candidate count, but >80% of the gain is attainable with sparse span evaluation; deepfake DOSS-Select supports rapid pruning down to 3% of the data.
  • Communication Efficiency: DOG uses only $O(k)$ broadcasts per round (with Poisson multinomial sampling and lazy normalization); OD-DOG adapts to observation-driven activation costs (Golovin et al., 2010).
  • Preprocessing: DOSK-Select requires predictor scaling; the deepfake pruning variant maintains domain mappings and a proportional class balance.
  • Robust Initialization and Avoidance of Local Minima: Randomized or uniform initialization, multiple runs, and domain-aware pre-screening are recommended to avoid poor local minima in nonconvex optimization procedures.
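
As a sketch of the cross-validation mentioned in the first item, a plain K-fold grid search over the three penalties, built on the fit_dosk and kernel sketches above; the grid, fold count, and squared-error criterion are illustrative choices rather than the authors' protocol:

```python
import numpy as np
from itertools import product

def predict_dosk(Xnew, Xtr, alpha, w, b, gamma=1.0):
    """Evaluate f_{w,alpha,b} on new points via the cross Gram matrix."""
    Zn, Zt = Xnew * w, Xtr * w
    d2 = np.sum(Zn**2, 1)[:, None] + np.sum(Zt**2, 1)[None, :] - 2.0 * Zn @ Zt.T
    return np.exp(-gamma * d2) @ alpha + b

def cv_select_lambdas(X, y, grid, k=5, seed=0):
    """K-fold CV grid search over (lam1, lam2, lam3) using fit_dosk above."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best, best_err = None, np.inf
    for lams in product(grid, repeat=3):
        err = 0.0
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            alpha, w, b = fit_dosk(X[trn], y[trn], *lams)
            pred = predict_dosk(X[val], X[trn], alpha, w, b)
            err += np.mean((y[val] - pred) ** 2)     # held-out squared error
        if err < best_err:
            best, best_err = lams, err
    return best
```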

6. Domain-Specific Applications and Comparative Context

DOSS-Select is deployed for:

  • Sparse kernel regression/classification (Chen et al., 2017)
  • Nonlinear fiber-optic sequence shaping and selection for WDM/DMB links (Liu et al., 2023)
  • Out-of-domain generalizable speech deepfake detection with large heterogeneous datasets (Huang et al., 20 Dec 2025)
  • Safe action selection in unknown constraint industrial or online engineering applications (Gangrade et al., 2022)
  • Resource-constrained sensor network monitoring, environmental estimation, and anomaly detection (Golovin et al., 2010)

Each target domain motivates DOSS-Select’s design to maximize efficacy, sparsity, or generalizability within fundamental system, operational, or inferential constraints.

7. Broader Significance and Future Directions

The DOSS-Select paradigm, through principled selection, pruning, and sparsity-enforcing mechanisms, addresses the limitations of brute-force data aggregation, naive overparameterization, and risk-agnostic exploration. Its versatility is evidenced by broad deployment across machine learning, communications engineering, security, and network optimization. The dominance of diversity over raw volume in generalization, scalable distributed selection under submodular utilities, and dual guarantees of statistical efficiency and application-specific resource management suggest DOSS-Select approaches will continue to inform advanced learning, transmission, and detection frameworks where structured selection underpins robustness and interpretability.
