
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification (2101.08467v3)

Published 21 Jan 2021 in cs.CV

Abstract: Visible-Infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment. In order to mitigate the impact of large modality discrepancy, existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations. Such a manual design routine, however, highly depends on massive experiments and empirical practice, which is time consuming and labor intensive. In this paper, we systematically study the manually designed architectures, and identify that appropriately separating Batch Normalization (BN) layers is the key to bring a great boost towards cross-modality matching. Based on this observation, the essential objective is to find the optimal separation scheme for each BN layer. To this end, we propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS). It consists of a BN-oriented search space in which the standard optimization can be fulfilled subject to the cross-modality task. Equipped with the searched architecture, our method outperforms state-of-the-art counterparts in both two benchmarks, improving the Rank-1/mAP by 6.70%/6.13% on SYSU-MM01 and by 12.17%/11.23% on RegDB. Code is released at https://github.com/JDAI-CV/CM-NAS.

Citations (82)

Summary

Overview of CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification

The paper introduces a pioneering approach in the field of Visible-Infrared person re-identification (VI-ReID) by proposing a novel method called Cross-Modality Neural Architecture Search (CM-NAS). VI-ReID seeks to match pedestrian images across visible and infrared modalities, addressing limitations inherent in single-modality person identification, particularly in low-light environments. The primary challenge within VI-ReID is the substantial modality discrepancy resulting from the differing wavelengths between visible and infrared imagery.

Key Findings and Methodology

Traditionally, VI-ReID has relied on manually crafted two-stream architectures that learn both modality-specific and modality-sharable features. Designing such architectures requires extensive empirical tuning and experimental effort. The authors instead highlight that appropriately separating Batch Normalization (BN) layers across the two modalities is pivotal to cross-modality matching performance.
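The BN-separation idea above can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' released code: a layer keeps one set of normalization statistics per modality while the surrounding (convolutional) weights would remain shared.

```python
import math

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a list of scalars with the given statistics.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

class ModalitySpecificBN:
    """Toy separated-BN layer: distinct statistics for visible and
    infrared inputs (hypothetical sketch of the separation idea)."""
    def __init__(self):
        # (mean, var) tracked per modality; a shared BN would keep one pair.
        self.stats = {"visible": (0.0, 1.0), "infrared": (0.0, 1.0)}

    def __call__(self, x, modality):
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        self.stats[modality] = (mean, var)  # simplified running statistics
        return batch_norm(x, mean, var)
```

In a real two-stream network the routing decision ("visible" vs. "infrared") would be made per batch, with convolution weights shared across both branches.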

CM-NAS Framework

CM-NAS is designed to automatically determine the optimal separation scheme for BN layers in neural networks. The authors developed a BN-oriented search space that supports architecture optimization specific to cross-modality tasks. This is a marked improvement over existing NAS methods focused on single-modality tasks, such as Auto-ReID, which fail to address the modality discrepancies inherent in VI-ReID.

The paper systematically analyzes 195 different manually designed architectures, concluding that separations at the level of BN layers instead of entire blocks are critical. Leveraging NAS techniques, CM-NAS efficiently navigates the combinatorial complexity of the separation scheme space, which is infeasible to explore manually due to its size — on the order of 2^53 architectures for a ResNet50 backbone.
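The 2^53 figure follows from one binary choice (share vs. separate) per BN layer in ResNet50. The snippet below checks that arithmetic and sketches a toy differentiable relaxation in the spirit of gradient-based NAS; the blending weight `alpha` is a hypothetical illustration, not the paper's exact formulation.

```python
# ResNet50 contains 53 BN layers; each can be shared across modalities
# or split into per-modality copies, giving 2**53 candidate schemes --
# far too many to enumerate by hand.
NUM_BN_LAYERS = 53
SEARCH_SPACE_SIZE = 2 ** NUM_BN_LAYERS
print(f"search space size: {SEARCH_SPACE_SIZE}")  # 9007199254740992

def relaxed_bn(shared_out, separated_out, alpha):
    """Toy continuous relaxation (illustrative, not the authors' method):
    blend the shared-BN and separated-BN outputs with a learnable weight
    alpha in [0, 1], so the discrete choice becomes differentiable."""
    return [alpha * s + (1 - alpha) * p
            for s, p in zip(shared_out, separated_out)]
```

After search, each `alpha` would be rounded to 0 or 1 to recover a discrete separation scheme.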

Numerical Results

The empirical evaluation demonstrates that CM-NAS outperforms state-of-the-art methods across two benchmark datasets: SYSU-MM01 and RegDB. Specifically, CM-NAS improves Rank-1 accuracy and mAP on SYSU-MM01 by 6.70% and 6.13%, respectively, in a single-shot, all-search setting. On the RegDB dataset, improvements are even more pronounced, with Rank-1 and mAP scores boosted by 12.17% and 11.23%, respectively.

Implications and Future Work

The implications of this research extend beyond just the immediate problem of VI-ReID. The introduction of a cross-modality NAS approach could influence other fields where heterogeneous data sources need effective integration. The automatic architecture search tailored to modality-specific challenges could be pivotal in enhancing performance while reducing the labor-intensive manual design process prevalent in such tasks.

Future work could explore expanding CM-NAS to more complex network architectures or different sensor modalities, potentially integrating temporal dynamics for video-based surveillance systems. Moreover, as the field advances, the integration of more sophisticated loss functions and optimization strategies could further bolster cross-modality representation learning paradigms.

This innovative approach sets a compelling precedent for automated architecture search methodologies within domains characterized by significant data heterogeneity, marking progress toward more robust and scalable VI-ReID solutions.