Localization with Sampling-Argmax (2110.08825v1)

Published 17 Oct 2021 in cs.CV and cs.LG

Abstract: Soft-argmax operation is commonly adopted in detection-based methods to localize the target position in a differentiable manner. However, training the neural network with soft-argmax makes the shape of the probability map unconstrained. Consequently, the model lacks pixel-wise supervision through the map during training, leading to performance degradation. In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints to the shape of the probability map by minimizing the expectation of the localization error. To approximate the expectation, we introduce a continuous formulation of the output distribution and develop a differentiable sampling process. The expectation can be approximated by calculating the average error of all samples drawn from the output distribution. We show that sampling-argmax can seamlessly replace the conventional soft-argmax operation on various localization tasks. Comprehensive experiments demonstrate the effectiveness and flexibility of the proposed method. Code is available at https://github.com/Jeff-sjtu/sampling-argmax

Authors (6)
  1. Jiefeng Li (22 papers)
  2. Tong Chen (200 papers)
  3. Ruiqi Shi (1 paper)
  4. Yujing Lou (10 papers)
  5. Yong-Lu Li (47 papers)
  6. Cewu Lu (203 papers)
Citations (13)

Summary

Localization with Sampling-Argmax: A Differentiable Approach to Improve Detection-Based Tasks

The paper "Localization with Sampling-Argmax" addresses a fundamental challenge in computer vision: the localization of target positions using neural networks. The common approach involves the use of probability maps and the soft-argmax operation to approximate the traditional, non-differentiable argmax operation. However, the authors identify intrinsic limitations in the soft-argmax method related to unconstrained probability map shapes, which can lead to performance degradation due to lack of pixel-wise supervision.
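To make the "unconstrained shape" problem concrete, here is a small sketch (not taken from the paper; the maps, the 1D grid, and the helper names `expectation` and `expected_abs_error` are illustrative assumptions). Two very different probability maps produce the same soft-argmax output, so a loss on that output alone cannot tell them apart, whereas the expected error can.

```python
import numpy as np

# Hypothetical 1D grid of 7 pixel positions; the target sits at position 3.
positions = np.arange(7, dtype=float)
target = 3.0

# Two illustrative probability maps (each sums to 1).
peaked = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # all mass on the target
bimodal = np.array([0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5])  # mass far from the target


def expectation(probs, xs):
    """Soft-argmax style output: the probability-weighted mean position."""
    return float(np.sum(probs * xs))


def expected_abs_error(probs, xs, x_true):
    """Expectation of the absolute error under the (discrete) map."""
    return float(np.sum(probs * np.abs(xs - x_true)))


for name, probs in [("peaked", peaked), ("bimodal", bimodal)]:
    print(name,
          "soft-argmax:", expectation(probs, positions),
          "expected error:", expected_abs_error(probs, positions, target))
```

Both maps give a soft-argmax output of 3.0, so the error of the expectation is zero in each case, yet the expected error is 0.0 for the peaked map and 3.0 for the bimodal one. This discrete version is a simplification; the paper works with a continuous formulation of the output distribution.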

Core Proposal

The paper introduces sampling-argmax, a differentiable replacement for the soft-argmax operation. It implicitly constrains the shape of the probability map by changing the training objective: instead of minimizing the error of the expected location, it minimizes the expectation of the localization error. This is achieved through a continuous formulation of the output distribution and a differentiable sampling process.
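The two objectives can be written side by side. The notation here is an assumption made for illustration and may differ from the paper's: \pi_i is the probability assigned to grid position x_i, x^{*} is the ground-truth location, d is a distance such as the L1 norm, p is the continuous output distribution, and N is the number of samples.

```latex
% Soft-argmax: penalize the error of the expectation
\hat{x} = \sum_{i} \pi_i \, x_i, \qquad
\mathcal{L}_{\text{soft}} = d\bigl(x^{*}, \hat{x}\bigr)

% Sampling-argmax: penalize the expectation of the error,
% approximated by averaging over N samples drawn from p
\mathcal{L}_{\text{sample}} = \mathbb{E}_{x \sim p}\bigl[\, d(x^{*}, x) \,\bigr]
\approx \frac{1}{N} \sum_{k=1}^{N} d\bigl(x^{*}, x_k\bigr), \qquad x_k \sim p
```

The first loss depends on the map only through its mean, while the second is small only when most of the probability mass lies near x^{*}.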

Methodological Insights

The central idea is to model the output distribution as a continuous mixture distribution. Each discrete bin of the probability map is represented by a standard basis function (uniform, triangular, or Gaussian) over its sub-interval, weighted by the probability of that bin. Sampling-argmax then draws samples through a differentiable pipeline, using the Gumbel-softmax operation to approximate categorical sampling, so that gradients can flow from the samples back to the model parameters.
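The sampling pipeline can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the temperature `tau`, the `half_width` of the triangular basis, and the way relaxed weights are combined with per-bin offsets are all choices made for this example. NumPy does not compute gradients, so a real training setup would express the same operations in an autodiff framework.

```python
import numpy as np


def soft_argmax(logits, positions):
    """Baseline: softmax over the logits, then the weighted mean position."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(np.sum(probs * positions))


def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)


def sampling_argmax_loss(logits, positions, target,
                         n_samples=32, tau=0.5, half_width=1.0, seed=0):
    """Average L1 error over samples from a triangular-basis mixture (sketch)."""
    rng = np.random.default_rng(seed)
    # One relaxed one-hot weight vector per sample: shape (n_samples, n_bins).
    tiled = np.broadcast_to(logits, (n_samples, logits.size))
    weights = gumbel_softmax(tiled, tau, rng)
    # Offsets drawn from a triangular basis centred on each bin (assumed width).
    offsets = rng.triangular(-half_width, 0.0, half_width, size=weights.shape)
    # Each sample is a weighted combination of perturbed bin positions.
    samples = np.sum(weights * (positions + offsets), axis=-1)
    loss = float(np.mean(np.abs(samples - target)))
    return loss, samples


positions = np.arange(5, dtype=float)
logits = np.array([0.0, 0.0, 5.0, 0.0, 0.0])
loss, samples = sampling_argmax_loss(logits, positions, target=2.0)
print("soft-argmax:", soft_argmax(logits, positions))
print("sampling-argmax loss:", loss)
```

Lower values of `tau` push the relaxed weights closer to one-hot vectors, so each sample falls near a single bin. Swapping `rng.triangular` for a uniform or Gaussian draw would correspond to the other basis functions mentioned above.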

Experimental Evaluation

The authors empirically validate sampling-argmax on a variety of localization tasks: 2D human pose estimation, 3D human pose estimation from RGB images, retina segmentation from OCT, and object keypoint estimation from point clouds.

  • 2D Human Pose Estimation on COCO Keypoint: Sampling-argmax, especially with triangular bases, outperforms conventional soft-argmax and its variants, with a reported improvement of 5.3 mAP over soft-argmax alone.
  • 3D Pose Estimation on Human3.6M: The method provides consistent improvements over the baselines in terms of MPJPE and PA-MPJPE metrics.
  • Retina Segmentation from OCT: The method achieves better mean absolute distance (MAD) and standard deviation (Std. Dev.) than alternative methods.
  • Object Keypoint Estimation from Point Clouds: Using Gaussian bases, the method proves effective in both supervised and unsupervised settings, with improved accuracy across 16 categories.

Implications and Future Directions

The work represents a significant methodological advancement in enhancing the flexibility and accuracy of detection-based localization tasks. Practically, the improved localization and well-calibrated probability maps contribute to more reliable predictions, which are critical in various real-world applications. Theoretically, this technique highlights the importance of incorporating differentiable sampling in neural network training. Future developments may include extending this approach to learn adaptive basis functions, potentially using normalizing flows to achieve even more accurate and flexible approximations of the underlying probability distributions.

Overall, the sampling-argmax method presents a robust solution to the shortcomings of conventional soft-argmax operations, enabling better-calibrated probability maps and improving the granularity of localization in complex tasks.
