- The paper introduces a novel listwise loss that directly optimizes mean Average Precision, eliminating the need for heuristic pre-training and complex tuning.
- It employs histogram binning for differentiable approximations, enabling simultaneous ranking of thousands of images with improved computational feasibility.
- Extensive experiments on the Revisited Oxford and Paris benchmarks show the approach outperforms traditional ranking losses in retrieval accuracy while simplifying training.
Learning with Average Precision: Training Image Retrieval with a Listwise Loss
The paper focuses on enhancing image retrieval systems by directly optimizing a key evaluation metric, mean Average Precision (mAP). Contemporary deep learning models for image retrieval typically rely on ranking losses that optimize retrieval quality only indirectly and demand substantial engineering effort, such as ad hoc pre-training phases and hard-example mining strategies. This paper proposes a novel approach that optimizes mAP directly through a listwise loss, using histogram binning to render AP differentiable and thereby enabling seamless end-to-end learning within deep neural networks.
Methodology
The authors contend that the ranking losses prevalent in image retrieval act as upper bounds on essential losses, which in turn upper-bound standard retrieval metrics like mAP. This indirect optimization risks misalignment with the actual retrieval metric and typically requires numerous heuristic tricks, complicating implementation and hyperparameter tuning. This motivates the development of a loss function that is both more theoretically sound and more practical to train with.
A hallmark of the proposed methodology is its listwise loss, which evaluates and ranks thousands of images simultaneously, in contrast to local ranking losses that operate on small subsets such as pairs or triplets of images. This bypasses the labor-intensive steps traditionally required, simplifying the training pipeline while improving mAP across standard benchmarks.
Building on recent advances in listwise losses and differentiable approximations via histogram binning, the paper integrates this loss formulation into a large-scale backpropagation framework that handles high-resolution images in a computationally feasible manner. Crucially, the described optimization scheme accommodates large batch sizes irrespective of image resolution, an attribute essential for maintaining high performance.
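To make the histogram-binning idea concrete, here is a minimal NumPy sketch (not the authors' code) of a quantized AP approximation: similarity scores are softly assigned to a fixed set of bins with a triangular kernel, and precision and incremental recall are computed per bin. Because every score contributes smoothly to its neighboring bins, the same computation expressed with autograd tensors (e.g. in PyTorch) is differentiable end-to-end; the function name, bin count, and kernel choice here are illustrative assumptions.

```python
import numpy as np

def soft_bin_ap(scores, labels, n_bins=21):
    """Histogram-binned approximation of Average Precision (illustrative sketch).

    scores: similarity scores in [-1, 1]; labels: 1.0 for relevant items, 0.0 otherwise.
    Each score is softly assigned to neighboring bins via a triangular kernel,
    which is what makes the approximation smooth in the scores.
    """
    centers = np.linspace(1.0, -1.0, n_bins)   # highest-similarity bin first
    delta = 2.0 / (n_bins - 1)                 # bin width
    # Soft assignment weights, shape (n_bins, n_items): 1 at a bin center,
    # decaying linearly to 0 one bin-width away.
    w = np.clip(1.0 - np.abs(scores[None, :] - centers[:, None]) / delta, 0.0, None)
    h_pos = w @ labels                         # soft count of positives per bin
    h_all = w.sum(axis=1)                      # soft count of all items per bin
    H_pos = np.cumsum(h_pos)                   # positives ranked at or above each bin
    H_all = np.cumsum(h_all)
    prec = H_pos / np.maximum(H_all, 1e-12)    # precision at each bin
    rec_inc = h_pos / max(labels.sum(), 1e-12) # incremental recall per bin
    return float((prec * rec_inc).sum())       # AP ~ sum of precision x delta-recall
```

For a perfect ranking (all positives scored above all negatives) this returns a value of 1.0, and it degrades as positives slip down the ranking, mirroring exact AP up to the quantization induced by the bins.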
Empirical Results
Extensive experiments affirm the superiority of the proposed approach over existing state-of-the-art methods on standard benchmarks such as the Revisited Oxford and Paris datasets. In careful ceteris paribus comparisons, the listwise loss consistently yields higher retrieval accuracy without substantial hard-example mining or pre-training, marking a significant reduction in engineering complexity and training time compared with traditional losses.
Furthermore, the paper's use of generalized-mean (GeM) pooling, a preferred alternative to regional maximum activation of convolutions (R-MAC), contributes positively to the retrieval performance metrics by fostering more discriminative image representations.
Implications and Future Directions
The work marks a notable advancement in image retrieval by demonstrating the benefits of optimizing directly for the actual evaluation metric. By offering a blueprint for incorporating listwise learning effectively, it makes a compelling case for refining retrieval systems beyond instance-level retrieval.
The findings underline key considerations for practitioners and researchers in the field, signaling a potential paradigm shift toward adopting more holistic learning approaches that align more closely with retrieval objectives. Future avenues could explore integrating the mAP optimization strategy with other domains within AI that benefit from ranked outputs, such as recommendation systems or automated content curation platforms, paving the way for cross-disciplinary applications of the findings in this paper.
In conclusion, by mitigating the requirement for elaborate auxiliary processes and placing emphasis on optimizing retrieval-relevant metrics directly, this research offers a robust enhancement to current image retrieval models and opens future research pathways that can further exploit listwise learning paradigms in AI.