Adaptive Non-Maximal Suppression (ANMS) Overview

Updated 30 June 2025

Adaptive Non-Maximal Suppression (ANMS) is a context-aware method that dynamically adjusts suppression criteria to preserve true positives and ensure even spatial distribution.
It tailors suppression based on local detection density and instance difficulty, effectively improving object localization in crowded or ambiguous environments.
By integrating rule-based and learnable strategies, ANMS overcomes limitations of fixed-threshold NMS, leading to enhanced accuracy and reduced false positives.

Adaptive Non-Maximal Suppression (ANMS) refers to a class of methods that generalize the standard non-maximal suppression (NMS) procedure by introducing adaptivity—either through rules, learned strategies, or context awareness—when deciding which detections or features to retain. While classical NMS operates with fixed, globally applied suppression criteria, ANMS adapts these criteria based on instance, local context, density, or additional signal, in order to preserve true positives, improve localization in crowded or ambiguous settings, and achieve more evenly distributed or meaningful selections in spatial or feature space.

1. Foundational Principles and Motivation

Non-Maximal Suppression is a ubiquitous post-processing step in computer vision, used in object detection to select one representative bounding box for each object among overlapping detections, and in feature detection to extract salient, spatially well-distributed interest points. The canonical greedy NMS uses a global Intersection over Union (IoU) or response threshold to suppress overlapping or non-maximal candidates. This uniform approach, while simple, creates an inherent precision–recall tradeoff: a low IoU threshold suppresses aggressively and can remove true positives in crowded or overlapping cases, while a high threshold retains more of them at the cost of more duplicate false positives (Hosang et al., 2015, Bodla et al., 2017).

Adaptive Non-Maximal Suppression seeks to overcome these limitations by introducing adaptivity into the suppression process, allowing the method to consider local context, detection density, instance difficulty, or additional semantics such as feature appearance or learned scene context. The core objective is to avoid systematic error modes of fixed-threshold NMS: missed detections in crowded scenes, uneven spatial coverage, or over-suppression of valid instances.

2. Key Adaptive NMS Variants and Algorithms

Several distinct approaches to ANMS have emerged, reflecting its applicability across different tasks:

a. Rule-Based Adaptive Thresholding

An early ANMS method, frequently used in feature detection, computes an adaptive suppression radius for each candidate by measuring its distance to the nearest stronger response. Given a set of keypoints $S = \{k_i\}$ at image locations $(x_i, y_i)$ with response strengths $s_i$, the suppression radius of each keypoint is

$r_i = \min_{k_j \in S,\, s_j > s_i} \| (x_i, y_i) - (x_j, y_j) \|$

The strongest keypoint, which has no stronger neighbor, receives $r_i = \infty$. Keypoints are then sorted in descending order of $r_i$ and the top $N$ are selected. This suppresses weaker points near stronger ones, resulting in a spatially uniform, high-quality distribution (Brown et al., CVPR 2005; Syed et al., 16 Jun 2025). SuperPoint-SLAM3 implements this ANMS variant, showing major reductions in SLAM drift and error (Syed et al., 16 Jun 2025).
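The radius computation and top-$N$ selection above can be sketched in Python as follows. This is a minimal, illustrative sketch: the function name `anms_radius` and the `c_robust` parameter are assumptions introduced here, and the double loop is the naive O(N²) pairwise form rather than an optimized implementation. With `c_robust=1.0` it applies exactly the strict rule $s_j > s_i$ from the formula; Brown et al. use a robustness factor below 1 (0.9 in their paper), so that a neighbor must be noticeably stronger to suppress a point.

```python
import math


def anms_radius(points, strengths, n_keep, c_robust=1.0):
    """Adaptive non-maximal suppression by suppression radius (naive O(N^2)).

    points:    list of (x, y) keypoint locations
    strengths: detector responses s_i, one per point
    n_keep:    number of keypoints N to return
    c_robust:  robustness factor (assumed name); point j suppresses point i
               only if s_i < c_robust * s_j, so 1.0 gives the rule s_j > s_i
    Returns indices of the selected keypoints, largest radius first.
    """
    radii = []
    for i, (xi, yi) in enumerate(points):
        r = math.inf  # stays infinite if no stronger keypoint exists
        for j, (xj, yj) in enumerate(points):
            if strengths[i] < c_robust * strengths[j]:
                r = min(r, math.hypot(xi - xj, yi - yj))
        radii.append(r)
    # Sort by radius, descending; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(points)), key=lambda i: radii[i], reverse=True)
    return order[:n_keep]


# Two clusters: the two strongest points sit 1 pixel apart.
pts = [(0, 0), (1, 0), (10, 0), (10, 1)]
s = [0.9, 0.8, 0.7, 0.6]
print(anms_radius(pts, s, n_keep=2))  # [0, 2]: one point per cluster
```

Selecting the top two by strength alone would return indices 0 and 1, both from the same cluster; ranking by suppression radius picks one point from each cluster instead.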

b. Density-Aware Suppression Thresholds

In crowded object detection, ANMS can dynamically set the suppression threshold for an instance by estimating the local density of objects. In the adaptive NMS algorithm (Liu et al., 2019), a small subnetwork regresses a "density" score for each detection. Its training target is the density of the corresponding ground-truth object, defined as that object's maximum IoU with any other ground-truth object. The suppression threshold for each selected detection $M$ is then set as

$N_{M} = \max(N_t, d_M)$

where $N_t$ is the base NMS threshold and $d_M$ is the predicted density. This increases the threshold in dense regions, preventing the suppression of true positives that are close together, and maintains strict suppression in sparse areas.
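The density-adaptive threshold $N_M = \max(N_t, d_M)$ slots into ordinary greedy NMS as sketched below. This is a minimal illustration, not the reference implementation of Liu et al. (2019): the per-box densities are passed in directly as a stand-in for the density subnetwork's output, and the names `iou` and `adaptive_nms` are hypothetical. Boxes are `(x1, y1, x2, y2)` tuples, and a neighbor is suppressed when its IoU with the selected box $M$ reaches $N_M$.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def adaptive_nms(boxes, scores, densities, base_thresh=0.5):
    """Greedy NMS where the selected box M uses threshold N_M = max(N_t, d_M).

    densities: predicted density d_i per box (stand-in for the density subnet).
    Returns indices of kept boxes in selection order.
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        m = remaining.pop(0)                  # highest-scoring remaining box M
        keep.append(m)
        n_m = max(base_thresh, densities[m])  # adaptive threshold N_M
        # Suppress neighbors whose overlap with M reaches N_M.
        remaining = [i for i in remaining if iou(boxes[m], boxes[i]) < n_m]
    return keep


boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]  # IoU(0, 1) = 0.67
scores = [0.9, 0.8, 0.7]
print(adaptive_nms(boxes, scores, [0.0, 0.0, 0.0]))  # sparse: [0, 2]
print(adaptive_nms(boxes, scores, [0.7, 0.7, 0.1]))  # dense:  [0, 1, 2]
```

When every predicted density is at most $N_t$, the procedure reduces to standard greedy NMS with threshold $N_t$; only boxes predicted to lie in dense regions get a more permissive threshold.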

c. Learnable and Contextual Suppression

Other approaches make suppression fully learnable. The Tyrolean network (Tnet) (Hosang et al., 2015) is a convolutional network trained to map detection score maps and IoU patterns to a new set of scores, effectively learning when and how much suppression to apply in a data-driven way. Similarly, Gnet (Hosang et al., 2017) uses message-passing neural blocks to jointly rescore bounding boxes, suppressing duplicates in a context-aware fashion. These models move beyond hand-designed adaptation rules, offering fine-grained, contextually aware suppression.

d. Appearance- and Embedding-Based Criteria

FeatureNMS (Salscheider, 2020) supplements geometric overlap with learned appearance embeddings. Overlapping detections whose embeddings are similar (small L2 distance) are treated as duplicates and suppressed, whereas detections that overlap strongly but have dissimilar embeddings can be kept as distinct objects. This improves recall and precision in crowded scenes.
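The following hypothetical sketch illustrates embedding-gated duplicate removal in the spirit of FeatureNMS; it is a simplification under stated assumptions, not Salscheider's exact procedure. It assumes each box comes with an appearance embedding from some learned head (not modeled here), treats any IoU at or above `high` as a duplicate, never suppresses below `low`, and in the ambiguous band between them suppresses only if the embeddings are closer than `max_dist` in L2 distance. All three thresholds and the function names are illustrative.

```python
import math


def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def embedding_nms(boxes, scores, embeddings, low=0.3, high=0.8, max_dist=1.0):
    """Greedy NMS with an appearance-embedding gate (illustrative sketch).

    IoU >= high        -> duplicate, suppressed
    low <= IoU < high  -> duplicate only if embeddings are within max_dist (L2)
    IoU < low          -> never suppressed
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        duplicate = False
        for k in keep:  # compare only against boxes already kept
            o = iou(boxes[i], boxes[k])
            close = math.dist(embeddings[i], embeddings[k]) < max_dist
            if o >= high or (o >= low and close):
                duplicate = True
                break
        if not duplicate:
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (2, 0, 12, 10)]  # IoU = 0.67, inside [low, high)
print(embedding_nms(boxes, [0.9, 0.8], [(0.0, 0.0), (3.0, 0.0)]))  # [0, 1]
print(embedding_nms(boxes, [0.9, 0.8], [(0.0, 0.0), (0.2, 0.0)]))  # [0]
```

With identical geometry, the two calls differ only in the embeddings: dissimilar appearance keeps both boxes as separate objects, while near-identical appearance marks the second box as a duplicate.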

e. Asynchronous and Event-Driven ANMS

For event-based vision, ANMS has been extended to asynchronous streams (Zhou et al., 2023). Here, non-maximal suppression is performed in continuous time, event by event, using "decayed scores" that account for both spatial and temporal proximity, enabling suppression at fine temporal scales with negligible latency.
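To make the idea of decayed scores concrete, here is a hypothetical per-event sketch; it is not the algorithm of Zhou et al. (2023). It assumes events arrive in time order, each with a timestamp `t` in seconds, pixel coordinates, and a detector score; that stored scores decay exponentially with an illustrative time constant `tau`; and that an event survives only if its score exceeds the decayed score of every earlier event in a square neighborhood of half-width `radius`. The class and parameter names are assumptions. Each decision touches a constant number of pixels, which is what keeps per-event latency low.

```python
import math


class EventNMS:
    """Asynchronous NMS over a time-ordered stream of scored events (sketch)."""

    def __init__(self, radius=1, tau=0.01):
        self.radius = radius  # neighborhood half-width in pixels (assumed)
        self.tau = tau        # exponential decay time constant in seconds (assumed)
        self.latest = {}      # (x, y) -> (timestamp, score) of the last event there

    def process(self, t, x, y, score):
        """Return True if the event at (x, y) and time t survives suppression."""
        keep = True
        for dx in range(-self.radius, self.radius + 1):
            for dy in range(-self.radius, self.radius + 1):
                prev = self.latest.get((x + dx, y + dy))
                if prev is None:
                    continue
                t_j, s_j = prev
                decayed = s_j * math.exp(-(t - t_j) / self.tau)
                if decayed >= score:  # a recent, stronger neighbor wins
                    keep = False
        # Record every event, kept or not, so later events compare against it.
        self.latest[(x, y)] = (t, score)
        return keep


nms = EventNMS(radius=1, tau=0.01)
print(nms.process(0.000, 5, 5, 1.0))  # True: nothing nearby yet
print(nms.process(0.001, 6, 5, 0.5))  # False: 1.0 * exp(-0.1) is about 0.90
print(nms.process(0.100, 6, 5, 0.5))  # True: earlier scores have decayed
```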

3. Practical Impact and Benchmarks

ANMS methods have been reported to outperform traditional NMS in scenarios characterized by high object density, occlusion, or perceptual ambiguity:

On the CityPersons pedestrian benchmark (Liu et al., 2019), Adaptive NMS reduced the log-average miss rate ($\text{MR}^{-2}$, lower is better) from 14.5% (greedy NMS, Faster R-CNN) to 12.9%, and on CrowdHuman from 52.35% to 49.73% (FPN), with the largest gains on crowded subsets (density > 0.7).
SuperPoint-SLAM3 with ANMS achieved a reduction in mean translational error from 4.15% (ORB-SLAM3) to 0.34%, and mean rotational error from 0.0027 deg/m to 0.0010 deg/m on KITTI Odometry, demonstrating the impact of spatially uniform feature coverage for SLAM (Syed et al., 16 Jun 2025).
In detection evaluation, learnable adaptive NMS (Tnet) yielded substantial gains in average recall, especially under heavy occlusion or in crowded regions, outperforming traditional NMS at every threshold setting (Hosang et al., 2015, Hosang et al., 2017).

4. Implementation Considerations

Computational Overhead: The dynamic adaptation in ANMS methods often adds negligible computation—adaptive rules (radius, density) are cheap to evaluate, and density subnets are lightweight (Liu et al., 2019).
Integration: ANMS variants can typically be plugged into standard detection pipelines with only minor changes (e.g., adding a suppression threshold vector, replacing hard-threshold logic, or adding a lightweight subnet head) (Liu et al., 2019).
Scalability: With the rise of massively parallel and hardware-optimized NMS implementations (Oro et al., 1 Feb 2025, Si et al., 30 Sep 2024), even data-adaptive suppression can be applied to thousands of detections per frame in real time.

5. Comparison and Related Methods

Methodology	Adaptation Target	Efficiency	Reported Impact
Suppression radius (ANMS, SuperPoint-SLAM3)	Feature coverage	O(N log N); negligible post-processing cost	Halves SLAM error on KITTI/EuRoC
Density-adaptive NMS (Liu et al., 2019)	Crowding, occlusion	O(N²), plus a lightweight density head	Reduces miss rate by 1–3% (CrowdHuman)
Feature-based (FeatureNMS; Salscheider, 2020)	Appearance/semantics	Small embedding head	AP up by 2–3% on CrowdHuman
Learnable NMS (Tnet, Gnet)	Full context	CNN at NMS step	Surpasses recall/precision of fixed-threshold NMS at any threshold

Soft-NMS (Bodla et al., 2017) decays box scores as a continuous function of IoU, offering a simple form of adaptivity over fixed-threshold NMS. However, it lacks explicit context or density dependence and is less effective than density- or context-based ANMS in high-density settings.
IoU-Aware Calibration (Gilg et al., 2023) replaces suppression with confidence calibration: the calibrated probability of a detection is conditioned on its overlap with more confident detections, so duplicates are handled without explicit suppression. This also addresses the duplicate-modeling problem and yields further calibration gains.
Graph- and pruning-based fast NMS methods (Si et al., 30 Sep 2024, Oro et al., 1 Feb 2025) provide efficient, scalable core routines for NMS and ANMS, either by exploiting graph structure such as weakly connected components (WCCs) or by using massively parallel hardware.
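The Soft-NMS decay mentioned above can be sketched as follows, using the two decay functions described by Bodla et al. (2017): linear, $s_i \leftarrow s_i\,(1 - \mathrm{IoU}(M, b_i))$ when $\mathrm{IoU}(M, b_i) \ge N_t$, and Gaussian, $s_i \leftarrow s_i\, e^{-\mathrm{IoU}(M, b_i)^2/\sigma}$. The function name, default parameter values, and final score cutoff are illustrative choices for this sketch. The decay depends only on IoU, which is why Soft-NMS offers no explicit density awareness.

```python
import math


def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, method="gaussian", iou_thresh=0.3, sigma=0.5,
             score_thresh=0.001):
    """Soft-NMS: decay the scores of overlapping boxes instead of removing them.

    Returns (index, decayed score) pairs in selection order.
    """
    scores = list(scores)  # work on a copy; the caller's list is untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # current best box M
        remaining.remove(m)
        keep.append((m, scores[m]))
        for i in remaining:
            o = iou(boxes[m], boxes[i])
            if method == "linear":
                if o >= iou_thresh:
                    scores[i] *= 1.0 - o
            else:  # Gaussian decay applies to every remaining box
                scores[i] *= math.exp(-(o * o) / sigma)
        # Drop boxes whose decayed score has become negligible.
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep


boxes = [(0, 0, 10, 10), (2, 0, 12, 10)]  # IoU = 80 / 120
kept = soft_nms(boxes, [0.9, 0.8])
print([(i, round(s, 3)) for i, s in kept])  # [(0, 0.9), (1, 0.329)]
```

Greedy NMS at a 0.5 threshold would discard the second box entirely; Soft-NMS keeps it with a reduced score, so a genuinely distinct but overlapping object can still be recovered at evaluation time.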

6. Limitations and Open Directions

Learning-Free vs. Learning-Based ANMS: Rule-based ANMS is easier to deploy but may not capture all interactions between detections, whereas learnable approaches (e.g., Tnet, Gnet) require additional data and supervision but can adapt to more complex context.
Extension to Multi-Class and Multi-Modal Detection: Some ANMS techniques are task-specific (e.g., crowd scenes); further generalization is ongoing (Liu et al., 2019, Hosang et al., 2015).
Asynchronous and Online Settings: Event-based ANMS (Zhou et al., 2023) shows that suppression can operate directly on asynchronous data; adapting other ANMS variants to such non-synchronous settings remains an open direction.

7. Applications and Broader Implications

ANMS is widely applicable wherever the selection of salient or non-redundant points is critical and spatial/non-spatial context is variable:

Dense object detection: Pedestrian or vehicle detection in crowds, cellular/microscopy imaging.
SLAM and visual odometry: Uniform and robust keypoint selection.
Event-based and neuromorphic vision: Real-time, low-latency feature selection.
Resource-constrained and real-time robotics: Efficient, uniform, and adaptive feature extraction for downstream processing.

In summary, Adaptive Non-Maximal Suppression encompasses a class of methods that enhance non-maximal suppression by adapting suppression parameters based on detection strength, local context, spatial density, or learned criteria. This adaptation leads to improved accuracy, robustness, and spatial coverage, especially in crowded, ambiguous, or dynamically varying environments.