- The paper introduces a multi-step contrastive learning framework using EfficientNetV2, a large memory bank, and negative embedding subtraction to improve copy detection.
- It demonstrates significant performance gains, achieving top micro-average precision and recall in challenging image-manipulation scenarios.
- The approach provides robust insights for image retrieval and digital rights management, highlighting scalable methods for future visual verification tasks.
Contrastive Learning with Large Memory Bank and Negative Embedding Subtraction for Effective Copy Detection
The paper presents a notable contribution in the domain of computer vision, particularly in the task of copy detection. The authors address the problem of identifying whether an image is a modified version of another image in a database. Traditional methods have struggled with this task due to the complexity of varied image manipulations and the vast size of image databases. This research leverages convolutional neural networks (CNNs) trained with contrastive learning to develop highly discriminative image representations, which mitigate the issues faced in existing approaches.
The main components of their approach are an EfficientNetV2 backbone trained with a multi-step contrastive learning pipeline, the incorporation of a large memory bank of negative samples, and a novel post-processing step called negative embedding subtraction. These advances prove central to achieving high copy detection accuracy, as evidenced by their top placement in the Facebook AI Image Similarity Challenge: Descriptor Track.
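To make the role of the memory bank concrete, the following is a minimal NumPy sketch of an InfoNCE-style contrastive objective in which embeddings stored in the bank act as negatives. The function name, temperature value, and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce_with_memory_bank(anchor, positive, memory_bank, temperature=0.05):
    """Contrastive (InfoNCE-style) loss: the anchor descriptor is pulled
    toward its positive (an augmented copy of the same image) and pushed
    away from every embedding in the memory bank, which acts as a large
    pool of negatives accumulated from earlier batches."""
    # L2-normalize so dot products are cosine similarities.
    anchor = anchor / np.linalg.norm(anchor)
    positive = positive / np.linalg.norm(positive)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)

    pos_logit = (anchor @ positive) / temperature   # scalar
    neg_logits = (bank @ anchor) / temperature      # (bank_size,)
    logits = np.concatenate([[pos_logit], neg_logits])

    # Cross-entropy with the positive as the correct "class"
    # (log-sum-exp computed stably).
    m = np.max(logits)
    log_denom = m + np.log(np.sum(np.exp(logits - m)))
    return pos_logit - log_denom and -(pos_logit - log_denom)
```

A larger bank supplies more (and harder) negatives per update, which is what makes the learned descriptors more discriminative for retrieval.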
Methodological Contributions
The core technical contributions are structured into three primary innovations:
- Multi-step Training with Contrastive Learning: The authors employ a progressive learning methodology with a carefully crafted data augmentation strategy that matches the types of manipulations seen in the competition dataset (DISC21). The CNNs are trained over multiple stages with progressively higher input resolutions and stronger augmentation magnitudes, encouraging the model to learn increasingly complex and robust representations.
- Negative Embedding Subtraction: A novel post-processing technique that enhances the discriminative power of the learned representations. By subtracting from each descriptor the components that point toward hard negative samples, the method refines the feature space so that copied images are more clearly separated from distractors.
- Augmentation Pipeline: The comprehensive data augmentation strategy, pivotal to their approach, spans a range of manipulations from basic geometric transformations to advanced pixelation techniques, mimicking real-world image modifications.
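The negative embedding subtraction step can be sketched roughly as follows: for each L2-normalized descriptor, find its most similar embeddings in a pool of known negatives (e.g., training-set images), subtract a scaled projection onto each, and re-normalize. This is a minimal NumPy sketch under assumed defaults (`k`, `beta`, `n_iter` are illustrative, not the authors' tuned values).

```python
import numpy as np

def negative_embedding_subtraction(descriptors, negatives, k=10, beta=0.35, n_iter=1):
    """Post-process descriptors by subtracting the components aligned with
    their k hardest negatives, then re-normalizing. This pushes query and
    reference descriptors away from distractor-like directions in cosine
    space, sharpening the copy/non-copy separation."""
    desc = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    neg = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    for _ in range(n_iter):
        sims = desc @ neg.T                         # (n_desc, n_neg) cosine sims
        topk = np.argsort(-sims, axis=1)[:, :k]     # hardest negatives per descriptor
        for i in range(desc.shape[0]):
            hard = neg[topk[i]]                     # (k, d)
            # Subtract the (scaled) projections onto each hard negative.
            proj = (desc[i] @ hard.T)[:, None] * hard
            desc[i] = desc[i] - beta * proj.sum(axis=0)
        desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    return desc
```

Because the output is re-normalized, the step can be applied purely at indexing/search time without retraining the network.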
Empirical Results
The empirical evaluation demonstrates significant improvements over baseline methods. In particular, training with ground-truth pairs, despite the constraint that query and reference images may not themselves be augmented, provided a notable boost in performance metrics such as micro-average precision (µAP) and Recall@P90. The integration of negative embedding subtraction yielded a further substantial gain, highlighting its effectiveness.
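For reference, micro-average precision pools all query-reference predictions into a single confidence-ranked list before computing average precision, rather than averaging per-query. A simplified NumPy sketch (the challenge metric additionally divides by the total number of ground-truth pairs, including any that are never retrieved; this version divides by the positives present in the prediction list):

```python
import numpy as np

def micro_average_precision(scores, labels):
    """Simplified micro-AP: sort all pooled (query, reference) predictions
    by confidence, then average the precision at the rank of each true
    positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores)                     # highest confidence first
    labels = labels[order]
    cum_pos = np.cumsum(labels)                     # true positives seen so far
    precision = cum_pos / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())
```

Because the ranking is global, a single overconfident false match on one query degrades the score for the whole submission, which is why well-calibrated descriptor similarities matter in this metric.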
During the competition, the proposed approach was validated against a diverse set of images, including those with complex manipulations that were unseen during training. The authors outperformed all other participants who disclosed their full methodologies, indicating the robustness of their pipeline.
Implications and Future Directions
From a theoretical perspective, this paper emphasizes the utility of contrastive learning frameworks in image retrieval tasks, particularly when combined with adversarially designed augmentation strategies. Practically, the robustness of the approach suggests significant applications in digital rights management and content verification across social platforms.
For future research directions, the exploration of larger and more diverse datasets could further validate the generality of the proposed methods. Additionally, investigating the transferability of these techniques to other domains of image recognition, where contrastive learning might similarly unveil hidden patterns, appears promising. The scalability of the negative embedding subtraction method also warrants exploration, potentially leveraging more sophisticated similarity metrics or hierarchical embedding spaces.
Overall, this paper presents a methodologically sound and practically impactful advancement in the field of copy detection, offering comprehensive insights into the nuances and potential trajectories for future research in visual similarity and retrieval tasks.