Distance-Based Randomized Smoothing
- Distance-based first-order randomized smoothing is a defense framework that certifies discrete sequence classifiers against adversarial insertions, deletions, and substitutions.
- It employs randomized deletion sampling and first-order linear approximations to derive certified robustness guarantees based on edit-distance metrics.
- Empirical evaluations, such as in malware detection, show that the deletion probability governs a trade-off between clean accuracy and certified robustness radius.
Distance-based first-order randomized smoothing encompasses a class of certified defenses for discrete sequence classifiers, wherein robustness guarantees are derived against adversarial edits measured by a sequence metric such as Levenshtein (edit) distance. Unlike smoothing for continuous domains with $\ell_p$-norm constraints, these methods specifically address the threat model of adversarial insertions, deletions, and substitutions in discrete or variable-length data, notably source code and raw binary sequences. A principal instantiation of this approach is the RS-Del mechanism, which constructs robustness certificates by randomly applying deletion edits and uses first-order linear bounds to relate adversarial edit distance to prediction invariance.
1. Formalism of Randomized Deletion Smoothing
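Let $\Omega$ denote a finite alphabet and $\Omega^\star$ the space of all finite sequences over $\Omega$. Given a base classifier $f_b : \Omega^\star \to \mathcal{Y}$, randomized smoothing constructs a smoothed classifier $f$ via the sampling of random deletions:
- For a sequence $x = x_1 x_2 \cdots x_n \in \Omega^\star$ with length $n = |x|$, index positions by $[n] = \{1, \ldots, n\}$.
- A deletion edit $\epsilon \subseteq [n]$ selects the positions that survive; the resulting subsequence is $\mathrm{apply}(x, \epsilon) = x_{i_1} x_{i_2} \cdots x_{i_k}$, where $i_1 < i_2 < \cdots < i_k$ are the elements of $\epsilon$.
- The deletion distribution independently deletes each position with probability $p_{\text{del}}$, such that $\Pr[E = \epsilon] = p_{\text{del}}^{\,n - |\epsilon|} (1 - p_{\text{del}})^{|\epsilon|}$.
- The smoothing distribution over edited sequences is $\Pr[\phi(x) = z] = \sum_{\epsilon \subseteq [n] \,:\, \mathrm{apply}(x, \epsilon) = z} \Pr[E = \epsilon]$.
- Smoothed class probabilities are given by $p_y(x) = \Pr[f_b(\phi(x)) = y]$ for each $y \in \mathcal{Y}$.
With optional class-specific thresholds $\eta_y$, the predicted class is $f(x) = \arg\max_{y \in \mathcal{Y}} \{ p_y(x) - \eta_y \}$; setting every $\eta_y = 0$ recovers the plain plurality vote.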
This framework enables robustness certification against edit-based adversaries, notably those bounded by Levenshtein distance.
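To make these definitions concrete, the smoothing distribution of a very short sequence can be computed exactly by enumerating all $2^{|x|}$ deletion edits. The Python sketch below does this under that assumption; `smoothing_distribution`, `smoothed_probs`, `smoothed_predict`, and the toy base classifier `toy_base` are hypothetical illustrations, not part of RS-Del, and realistic inputs require sampling instead of enumeration.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Optional


def smoothing_distribution(x: str, p_del: float) -> Dict[str, float]:
    """Exact Pr[phi(x) = z] for every reachable subsequence z.

    Enumerates all 2**len(x) deletion edits, so it is only for tiny inputs.
    Each position survives independently with probability 1 - p_del.
    """
    n = len(x)
    dist: Dict[str, float] = defaultdict(float)
    for keep in product((False, True), repeat=n):
        k = sum(keep)  # |epsilon|, the number of surviving positions
        prob = p_del ** (n - k) * (1 - p_del) ** k
        z = "".join(c for c, kept in zip(x, keep) if kept)
        dist[z] += prob  # different edits can produce the same z
    return dict(dist)


def smoothed_probs(
    x: str, base: Callable[[str], Hashable], p_del: float
) -> Dict[Hashable, float]:
    """p_y(x) = Pr[base(phi(x)) = y], computed exactly."""
    probs: Dict[Hashable, float] = defaultdict(float)
    for z, prob in smoothing_distribution(x, p_del).items():
        probs[base(z)] += prob
    return dict(probs)


def smoothed_predict(
    x: str,
    base: Callable[[str], Hashable],
    p_del: float,
    labels: Iterable[Hashable],
    eta: Optional[Dict[Hashable, float]] = None,
) -> Hashable:
    """f(x) = argmax_y p_y(x) - eta_y, with missing thresholds treated as 0."""
    eta = eta or {}
    probs = smoothed_probs(x, base, p_del)
    return max(labels, key=lambda y: probs.get(y, 0.0) - eta.get(y, 0.0))


def toy_base(z: str) -> int:
    """Hypothetical base classifier: class 1 iff the substring "ab" survives."""
    return int("ab" in z)


print(smoothing_distribution("aa", 0.5))  # {'': 0.25, 'a': 0.5, 'aa': 0.25}
print(smoothed_predict("ab", toy_base, 0.5, labels=[0, 1]))  # 0
```

Because distinct edits can produce the same subsequence (here, keeping either `a` of `aa`), their probabilities are summed, exactly as in the definition of the smoothing distribution.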
2. Certification of Edit-Distance Robustness
Robustness certification is achieved by establishing that the class prediction remains invariant for any input $\bar{x}$ within edit distance $r$ of $x$, where edits include insertions, deletions, and substitutions.
Main theorem (edit-distance guarantee):
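- Let the winning class be $y = f(x)$, with confidence $\mu_y = p_y(x)$.
- Define $\nu_y = \tfrac{1}{2}\big(1 + \eta_y - \min_{y' \neq y} \eta_{y'}\big)$. Because $p_{y'}(\bar{x}) \le 1 - p_y(\bar{x})$ for every $y' \neq y$, any $\bar{x}$ with $p_y(\bar{x}) > \nu_y$ satisfies $f(\bar{x}) = y$; without thresholds, $\nu_y = 1/2$.
- The certified radius is $r^\star = \max\{ r \in \mathbb{Z}_{\ge 0} : \mu_y - 1 + p_{\text{del}}^{\,r} > \nu_y \}$, equivalently $r^\star = \lceil \log_{p_{\text{del}}}(1 + \nu_y - \mu_y) \rceil - 1$ when $\nu_y < \mu_y < 1 + \nu_y$,
guaranteeing $f(\bar{x}) = y$ for all $\bar{x}$ with $\mathrm{dist}_{\mathrm{Lev}}(x, \bar{x}) \le r^\star$.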
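Once $\mu_y$ and the required level are known, the theorem reduces to scalar arithmetic. The following is a minimal Python sketch, assuming $0 < p_{\text{del}} < 1$ and taking the required level as $\nu_y = \tfrac12(1 + \eta_y - \min_{y' \ne y}\eta_{y'})$, the value $p_y(\bar x)$ must exceed for $y$ to remain the thresholded prediction; `nu_from_thresholds` and `certified_radius` are hypothetical names. It searches integer radii directly instead of evaluating the closed-form logarithm, which avoids off-by-one errors from floating-point rounding at integer boundaries.

```python
import math
from typing import Dict, Hashable, Iterable, Optional


def nu_from_thresholds(
    y: Hashable, labels: Iterable[Hashable], eta: Dict[Hashable, float]
) -> float:
    """Level p_y(x_bar) must exceed for y to stay the thresholded argmax.

    Uses p_{y'}(x_bar) <= 1 - p_y(x_bar) for every competitor y'.
    Missing thresholds are treated as 0.
    """
    others = [eta.get(c, 0.0) for c in labels if c != y]
    return (1.0 + eta.get(y, 0.0) - min(others, default=0.0)) / 2.0


def certified_radius(mu_y: float, nu_y: float, p_del: float) -> Optional[float]:
    """Largest integer r with mu_y - 1 + p_del**r > nu_y.

    Returns None (abstain) if no r >= 0 qualifies and math.inf if every r does.
    """
    if not 0.0 < p_del < 1.0:
        raise ValueError("p_del must lie strictly between 0 and 1")
    if mu_y <= nu_y:
        return None  # even r = 0 fails: abstain
    if 1.0 + nu_y - mu_y <= 0.0:
        return math.inf  # p_del**r > 0 >= 1 + nu_y - mu_y for every r
    r = 0
    # Terminates because p_del**r decays to 0 < 1 + nu_y - mu_y.
    while mu_y - 1.0 + p_del ** (r + 1) > nu_y:
        r += 1
    return r


nu = nu_from_thresholds(1, labels=[0, 1], eta={})  # 0.5 without thresholds
print(certified_radius(0.9, nu, p_del=0.9))  # 4
```

In the example, $0.9^4 = 0.6561$ exceeds $1 + \nu_y - \mu_y = 0.6$ but $0.9^5 \approx 0.590$ does not, so the search stops at $r^\star = 4$.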
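The relationship between the smoothed confidence and edit distance is further quantified using longest common subsequence (LCS) bounds. Writing $\mu_y = p_y(x)$ and $\ell = \mathrm{LCS}(x, \bar{x})$ for the length of a longest common subsequence, for any base classifier
$$p_y(\bar{x}) \ge p_{\text{del}}^{\,|\bar{x}| - \ell} \left( 1 + \frac{\mu_y - 1}{p_{\text{del}}^{\,|x| - \ell}} \right).$$
For all $\bar{x}$ within Levenshtein distance $r$ of $x$, both $|x| - \ell$ and $|\bar{x}| - \ell$ are at most $r$; substituting the worst case $|x| - \ell = |\bar{x}| - \ell = r$ gives the conservative lower bound $p_y(\bar{x}) \ge \mu_y - 1 + p_{\text{del}}^{\,r}$, from which $r^\star$ is derived by requiring this bound to exceed the level needed for $y$ to remain the predicted class.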
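The LCS bound can be evaluated directly for a concrete pair of sequences. This is a minimal Python sketch, assuming a standard dynamic-programming LCS; `lcs_length` and `lcs_lower_bound` are hypothetical names. For the textbook pair `kitten`/`sitting` (Levenshtein distance 3, LCS length 4), the LCS bound is slightly tighter than the conservative radius-3 bound.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence, O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_lower_bound(x: str, x_bar: str, mu_y: float, p_del: float) -> float:
    """Lower bound on p_y(x_bar) from mu_y = p_y(x), valid for any base model:

        p_del**(|x_bar| - L) * (1 + (mu_y - 1) / p_del**(|x| - L)),  L = LCS.

    Clipped at 0 because a probability cannot be negative.
    """
    L = lcs_length(x, x_bar)
    a, b = len(x) - L, len(x_bar) - L
    return max(0.0, p_del ** b * (1.0 + (mu_y - 1.0) / p_del ** a))


mu_y, p_del = 0.95, 0.9
x, x_bar = "kitten", "sitting"  # Levenshtein distance 3
print(lcs_length(x, x_bar))  # 4
print(lcs_lower_bound(x, x_bar, mu_y, p_del))  # LCS bound, about 0.684
print(mu_y - 1 + p_del ** 3)  # conservative radius-3 bound, about 0.679
```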
3. First-Order Approximation and Monotonicity
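Certificate computation harnesses two monotonicity properties of the LCS bound under adversarial edits, constituting what may be termed a first-order bound:
- Each edit that removes a symbol of $x$ from the common subsequence (a deletion or a substitution) increases $|x| - \mathrm{LCS}(x, \bar{x})$ by at most one, amplifying the deficit term $1 - \mu_y$ by a factor of $1/p_{\text{del}}$.
- Each edit that adds a symbol to $\bar{x}$ outside the common subsequence (an insertion or a substitution) increases $|\bar{x}| - \mathrm{LCS}(x, \bar{x})$ by at most one, multiplying the whole bound by a further factor of $p_{\text{del}}$. A substitution incurs both penalties, so $r$ substitutions give the worst case $\mu_y - 1 + p_{\text{del}}^{\,r}$.
These relationships are linear in the exponent of $p_{\text{del}}$ and stand in for an exact Neyman-Pearson analysis, which is otherwise computationally intractable for discrete sequence-editing threat models.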
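The two monotonicity properties can be checked numerically. The sketch below tallies how $r$ edits of a single kind move $a = |x| - \mathrm{LCS}(x, \bar{x})$ and $b = |\bar{x}| - \mathrm{LCS}(x, \bar{x})$, assuming every edit leaves one more symbol unmatched, and evaluates the LCS bound $p_{\text{del}}^{\,b}\,(1 + (\mu_y - 1)/p_{\text{del}}^{\,a})$. It is an illustrative Python sketch with hypothetical names (`lcs_bound`, `edit_effects`) and arbitrary example values ($\mu_y = 0.95$, $p_{\text{del}} = 0.9$, $r = 3$).

```python
from typing import Dict, Tuple


def lcs_bound(mu_y: float, p_del: float, a: int, b: int) -> float:
    """LCS bound with a = |x| - LCS(x, x_bar) and b = |x_bar| - LCS(x, x_bar)."""
    return p_del ** b * (1.0 + (mu_y - 1.0) / p_del ** a)


def edit_effects(mu_y: float, p_del: float, r: int) -> Dict[str, float]:
    """Bound after r edits of one kind, each leaving one more symbol unmatched."""
    moves: Dict[str, Tuple[int, int]] = {
        "deletions": (r, 0),      # only |x| - LCS grows
        "insertions": (0, r),     # only |x_bar| - LCS grows
        "substitutions": (r, r),  # both grow
    }
    return {kind: lcs_bound(mu_y, p_del, a, b) for kind, (a, b) in moves.items()}


effects = edit_effects(mu_y=0.95, p_del=0.9, r=3)
for kind, value in effects.items():
    print(f"{kind:>13}: {value:.3f}")
```

With these values, deletions, insertions, and substitutions give bounds of roughly 0.931, 0.693, and 0.679, so substitutions are the worst case and coincide with $\mu_y - 1 + p_{\text{del}}^{\,r}$.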
4. Sampling-Based Certification Algorithm
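Certification is performed via Monte Carlo approximation, which is significantly more scalable than exhaustive enumeration of the $2^{|x|}$ possible deletion edits:
- Draw $n_0$ samples $z_1, \ldots, z_{n_0} \sim \phi(x)$ and tally base-model votes $f_b(z_i)$ to estimate $\hat{p}_y$ for all $y \in \mathcal{Y}$.
- Predict $\hat{y} = \arg\max_{y} \{ \hat{p}_y - \eta_y \}$.
- Draw $n$ further samples to obtain a lower confidence bound (LCB) $\underline{\mu}_{\hat{y}}$ on the true smoothed probability $\mu_{\hat{y}} = p_{\hat{y}}(x)$, holding with probability at least $1 - \alpha$.
- Compute the largest $r$ satisfying $\underline{\mu}_{\hat{y}} - 1 + p_{\text{del}}^{\,r} > \nu_{\hat{y}}$, where $\nu_{\hat{y}}$ is the level $p_{\hat{y}}$ must exceed for $\hat{y}$ to remain the predicted class; abstain if $\underline{\mu}_{\hat{y}} \le \nu_{\hat{y}}$, otherwise certify radius $r^\star = r$.
Computational complexity is $O\big((n_0 + n)\, C(\bar{m})\big)$, where $C(\bar{m})$ is the cost of one base-model inference on a sequence of expected length $\bar{m} = (1 - p_{\text{del}})\,|x|$. Empirical practice suggests that sample sizes of … to … yield reliable certificates.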
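The four steps can be sketched in Python as follows. This is a minimal, stdlib-only sketch rather than the RS-Del reference implementation: it substitutes a one-sided Hoeffding bound for the LCB (randomized-smoothing certification more commonly uses an exact Clopper-Pearson bound), sets $\nu_{\hat y} = \tfrac12(1 + \eta_{\hat y} - \min_{y' \ne \hat y}\eta_{y'})$, and uses hypothetical names (`sample_deletion`, `hoeffding_lcb`, `certify`, `always_one`).

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple


def sample_deletion(x: Sequence, p_del: float, rng: random.Random) -> list:
    """One draw from phi(x): keep each element independently w.p. 1 - p_del."""
    return [c for c in x if rng.random() >= p_del]


def hoeffding_lcb(successes: int, n: int, alpha: float) -> float:
    """One-sided Hoeffding lower bound on a Bernoulli mean (holds w.p. >= 1 - alpha)."""
    return successes / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def certify(
    x: Sequence,
    base: Callable[[list], Hashable],
    labels: Sequence[Hashable],
    p_del: float,
    n0: int,
    n: int,
    alpha: float,
    eta: Optional[Dict[Hashable, float]] = None,
    seed: int = 0,
) -> Tuple[Hashable, Optional[float]]:
    """Return (predicted class, certified radius), with radius None on abstention."""
    eta = eta or {}
    rng = random.Random(seed)
    # Steps 1-2: estimate p_y from n0 votes and take the thresholded argmax.
    votes = Counter(base(sample_deletion(x, p_del, rng)) for _ in range(n0))
    y_hat = max(labels, key=lambda y: votes[y] / n0 - eta.get(y, 0.0))
    # Step 3: fresh samples give a lower confidence bound on mu_{y_hat}.
    hits = sum(base(sample_deletion(x, p_del, rng)) == y_hat for _ in range(n))
    mu_lcb = hoeffding_lcb(hits, n, alpha)
    # Level p_{y_hat}(x_bar) must exceed for y_hat to remain the prediction.
    others = [eta.get(y, 0.0) for y in labels if y != y_hat]
    nu = (1.0 + eta.get(y_hat, 0.0) - min(others, default=0.0)) / 2.0
    # Step 4: abstain, or return the largest r with mu_lcb - 1 + p_del**r > nu.
    if mu_lcb <= nu:
        return y_hat, None
    if 1.0 + nu - mu_lcb <= 0.0:
        return y_hat, math.inf
    r = 0
    while mu_lcb - 1.0 + p_del ** (r + 1) > nu:
        r += 1
    return y_hat, r


def always_one(z: list) -> int:
    """Hypothetical base classifier that ignores its input."""
    return 1


print(certify(list(range(50)), always_one, labels=[0, 1],
              p_del=0.9, n0=100, n=1000, alpha=0.001))  # (1, 5)
```

Here the Hoeffding LCB is about $1 - \sqrt{\ln(1000)/2000} \approx 0.941$, and the radius search stops at 5 because $0.9^6 \approx 0.531$ falls below $1 + 0.5 - 0.941$. Swapping `hoeffding_lcb` for an exact Clopper-Pearson bound, for example `scipy.stats.beta.ppf(alpha, k, n - k + 1)` with $k > 0$ successes, would tighten the LCB and hence the radius.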
5. Empirical Evaluation: Malware Detection Case Study
RS-Del was evaluated on the MalConv binary classifier for raw-byte Windows executables, with inputs truncated or padded to 2MB and deletion probability $p_{\text{del}} = \ldots$. Key results include:
- Certified accuracy at an edit-distance radius of $128$ bytes: … (the fraction of files that are both correctly classified and provably robust at that radius).
- Trade-off profile on the Sleipnir2 dataset: at $p_{\text{del}} = \ldots$, clean accuracy was …, with a median certified radius of $137$ bytes (… of the file).
- Increasing $p_{\text{del}}$ further increases the certified radius but decreases clean accuracy.
- Across certificate radii, RS-Del achieves strictly higher certified accuracy than randomized-ablation smoothing, which certifies only Hamming-distance substitutions; this highlights its effectiveness under the larger Levenshtein threat model.
6. Comparison With Related Smoothing Frameworks
The RS-Del mechanism represents a notable extension of randomized smoothing to discrete domains, directly addressing adversarial edits quantified by edit distance, in contrast to prior work limited to $\ell_p$-norm or Hamming-distance threat models. Importantly, deletion alone suffices to confer robustness to deletions, insertions, and substitutions: any edited input shares a long common subsequence with the original, and random deletion reaches that common subsequence from both inputs with non-negligible probability. This suggests randomized smoothing can be adapted to variable-length discrete data with appropriately chosen edit distributions and threat models.
Randomized smoothing in continuous regimes yields robustness to $\ell_2$-bounded adversaries; RS-Del generalizes this logic to sequence classifiers, substituting deletion operations for additive Gaussian noise. The first-order nature of the certificate reflects a linear approximation suitable for practical large-scale certification. Comparison with alternative frameworks indicates that RS-Del offers higher certified accuracy, particularly in domains where edit-based adversaries are realistic and impactful, such as malware detection.
7. Limitations and Practical Considerations
The RS-Del certificate's efficacy depends critically on the choice of deletion probability $p_{\text{del}}$ and on the reliability of the base classifier. Higher values of $p_{\text{del}}$ yield larger certified radii but may reduce clean accuracy. The Monte Carlo nature of the algorithm enables scalability on sequences of practical length, but it yields only a first-order bound rather than the tight Neyman-Pearson certificates obtainable in specific continuous settings.
Empirical results demonstrate strong performance in malware detection, but the approach may be constrained by domain-specific requirements such as sequence alphabet size and application threat models. A plausible implication is that while randomized smoothing via deletion supports edit-distance certification, the trade-off between clean accuracy and certified robustness must be closely managed in deployment, and domain adaptation may be necessary for non-binary or highly structured sequence data.