QAConv-QA: Query Adaptive Convolution for ReID

Updated 3 February 2026

The paper introduces a module that integrates pixel-level importance weighting with bidirectional consistency, enhancing identity matching under severe clothing changes.
It fuses RGB and parsing-based features through multi-modal attention and uses dynamic query-adaptive convolution to generate robust, clothing-invariant representations.
Evaluations on PRCC, LTCC, and VC-Clothes benchmarks show significant Top-1 and mAP improvements, validating the method's effectiveness in CC-ReID.

Quality-Aware Query-Adaptive Convolution (QAConv-QA) is a module designed to enhance pixel-level matching within the dual-branch QA-ReID architecture, targeting the challenges of person re-identification (ReID) under severe clothing changes. QAConv-QA introduces two critical mechanisms—pixel-level importance weighting and explicit bidirectional consistency constraints—that together facilitate robust identity correspondence even as superficial appearance varies with clothing. This approach proves essential in clothes-changing ReID (CC-ReID), a setting characterized by strong intra-person appearance shifts.

1. Role of QAConv-QA in the Dual-Branch QA-ReID Framework

QAConv-QA is embedded within the two-branch backbone of QA-ReID, which utilizes complementary cues from RGB images and clothing-invariant structural features. The RGB branch extracts feature maps from the full image using ResNet-50 up to stage 3, producing $F_{rgb}\in\mathbb R^{C\times H\times W}$ . The parsing branch applies a human-parsing network to produce a body-part mask $M_{body}\in\{0,1\}^{H'\times W'}$ , removing the clothing regions to form $I_{par}=I\odot M_{body}$ and generating $F_{par}\in\mathbb R^{C\times H\times W}$ .

A multi-modal attention fusion module computes a joint attention map $\omega\in [0,1]^{C\times H\times W}$ , blending $F_{rgb}$ and $F_{par}$ into a fused feature map $F_{fuse}$ via: $F_{mix} = \omega\odot F_{rgb}+(1-\omega)\odot F_{par}$

$F_{fuse} = \text{Conv}_{1\times1}(F_{rgb} + F_{par} + F_{mix})$

QAConv-QA directly operates on these fused features at the pixel level, comparing query and gallery images ( $F^q_{fuse}$ , $F^g_{fuse}$ ) through a sequence of similarity calculations, weighting, and aggregation, followed by a post-processing head (bidirectional global max pooling $\rightarrow$ batch norm $\rightarrow$ MLP $\rightarrow$ sigmoid) to yield match probabilities $p_{qg}$ (Wang et al., 27 Jan 2026).

2. Pixel-Level Importance Weighting

Each spatial location $(i,j)$ on the $F_{fuse}$ feature map receives a quality score $Q_{i,j}$ reflecting the likelihood that it lies on an identity-relevant (typically non-clothing) region. The score $\bar Q_{i,j}$ is computed as the fraction of the corresponding $k\times k$ input patch covered by the body-part mask, and is normalized by a spatial softmax: $Q_{i,j} =\frac{\exp(\bar Q_{i,j})}{\sum_{h=1}^H\sum_{w=1}^W \exp(\bar Q_{h,w})}$ The pairwise cosine similarity between query and gallery pixel features $f^q_{i_1,j_1}$ , $f^g_{i_2,j_2}$ is then re-weighted: $\mathrm{sim}^1(f^q_{i_1,j_1},f^g_{i_2,j_2}) = Q^q_{i_1,j_1}\cdot Q^g_{i_2,j_2} \cdot \rho(f^q_{i_1,j_1},f^g_{i_2,j_2})$ This mechanism prioritizes features that localize to identity-stable, body-based regions and suppresses the influence of clothing-related areas.

3. Bidirectional Consistency Constraints

To further enhance reliability in pixel-level matching, QAConv-QA introduces explicit bidirectional consistency. Conditional softmaxes are defined over feature locations, establishing the probability that a given pixel in one sample is the best match for a pixel in the other, and vice versa: $\bar\rho(f^q_{i_1,j_1}\mid f^g_{i_2,j_2}) = \frac{\exp(\mathrm{sim}^1(f^q_{i_1,j_1},f^g_{i_2,j_2}))}{\sum_{h,w} \exp(\mathrm{sim}^1(f^q_{h,w},f^g_{i_2,j_2}))}$

$\bar\rho(f^g_{i_2,j_2}\mid f^q_{i_1,j_1}) = \frac{\exp(\mathrm{sim}^1(f^q_{i_1,j_1},f^g_{i_2,j_2}))}{\sum_{h,w} \exp(\mathrm{sim}^1(f^q_{i_1,j_1},f^g_{h,w}))}$

The bidirectional-consistent similarity takes the product: $\mathrm{sim}^2(f^q_{i_1,j_1},f^g_{i_2,j_2}) = \bar\rho(f^q_{i_1,j_1}\mid f^g_{i_2,j_2}) \cdot \bar\rho(f^g_{i_2,j_2}\mid f^q_{i_1,j_1})$ Aggregating $\mathrm{sim}^2$ over all pixel pairs with bidirectional global maximum pooling yields a scalar score $s_{q,g}$ , emphasizing only mutually top-matching, identity-consistent region pairs.

4. Query-Adaptive Convolution and Dynamic Filtering

QAConv-QA adopts a dynamic filter paradigm inspired by the original QAConv formulation [Shengcai Liao & Ling Shao, ECCV 2020], where each query pixel feature $f^q_{i,j}$ acts as a $1\times1$ convolutional filter upon the gallery feature map:

$W^q_{i,j} = f_\theta(f^q_{i,j}) \in \mathbb{R}^C, \quad S_{i,j}(p,q) = (W^q_{i,j})^T f^g_{p,q}$

This equates to a full query-gallery location similarity matrix. Computation is optimized batch-wise using im2col and einsum. After initial cosine similarity, QAConv-QA systematically applies pixel-level reweighting and bidirectional consistency as above.

The QAConv-QA module relies on multi-modal fusion of RGB and parsing-based features, providing joint representations for matching. The forward pass, as outlined in the implementation, comprises: fused feature extraction, pixel weight calculation, pairwise cosine similarity computation, quality reweighting, dual-direction softmax normalization, computation of bidirectionally consistent similarity, aggregation via Bi-GMP, and final post-processing through batch normalization, MLP, and sigmoid activation. The following summarizes the computational sequence:

Fq = fuse_branch(RGB_q, Parse_q)
Fg = fuse_branch(RGB_g, Parse_g)

Qq = compute_pixel_weights(ParseMask_q)
Qg = compute_pixel_weights(ParseMask_g)

S_raw = cosine_similarity(Fq, Fg)
S1 = outer(Qq, Qg) * S_raw

P_q2g = softmax_over_query_locs(S1)
P_g2q = softmax_over_gallery_locs(S1)
S2 = P_q2g * P_g2q

Sagg_q2g = max_{(i,j)} max_{(p,q)} S2
Sagg_g2q = max_{(p,q)} max_{(i,j)} S2
Sagg = (Sagg_q2g + Sagg_g2q)/2

p = sigmoid(MLP(BN(Sagg)))

6. Supervision and Training Loss Composition

The QA-ReID framework integrates three types of losses:

Identity classification loss on globally pooled features of each branch,
Triplet loss operating over these embeddings, and
Binary cross-entropy matching loss on pixel-level pairwise scores from QAConv-QA.

The total loss $L$ is given by: $L = (L_{\mathrm{cls}^{rgb}} + L_{\mathrm{cls}^{par}}) + (L_{\mathrm{tri}^{rgb}} + L_{\mathrm{tri}^{par}}) + L_{\mathrm{match}}$ This composite objective enforces both global structural identity constraints and fine-grained local alignment under varied clothing.

7. Performance in Clothes-Changing ReID

On challenging CC-ReID benchmarks—PRCC, LTCC, and VC-Clothes—QA-ReID augmented with QAConv-QA achieves state-of-the-art results under clothing-changing protocols:

Dataset	Top-1 Gain	mAP Gain
PRCC	+6.9%	+3.9%
LTCC	+0.7%	+1.9%
VC-Clothes	+3.0%	+2.8%

Ablation studies isolate the contributions of the two QAConv-QA blocks: pixel weighting alone yields +1.6% Top-1 (PRCC), bidirectional matching alone +0.7%, with the full combination providing +3.1% improvement.

Visualization of QAConv-QA attention maps demonstrates that the model attends chiefly to identity-stable regions—such as the head and limbs—rather than clothing-variant areas, confirming the intended focus on semantically stable cues (Wang et al., 27 Jan 2026).

In sum, QAConv-QA imparts quality-aware, mutual pixel-level filtering to query-adaptive convolution, crucially advancing robust ReID performance amid drastic clothing transitions through a unified mechanism of feature fusion, spatial weighting, and tightly enforced mutual consistency.

Markdown Report Issue Upgrade to Chat

References (1)

QA-ReID: Quality-Aware Query-Adaptive Convolution Leveraging Fused Global and Structural Cues for Clothes-Changing ReID (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Quality-Aware Query Adaptive Convolution (QAConv-QA).

QAConv-QA: Query Adaptive Convolution for ReID

1. Role of QAConv-QA in the Dual-Branch QA-ReID Framework

2. Pixel-Level Importance Weighting

3. Bidirectional Consistency Constraints

4. Query-Adaptive Convolution and Dynamic Filtering

6. Supervision and Training Loss Composition

7. Performance in Clothes-Changing ReID

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

QAConv-QA: Query Adaptive Convolution for ReID

1. Role of QAConv-QA in the Dual-Branch QA-ReID Framework

2. Pixel-Level Importance Weighting

3. Bidirectional Consistency Constraints

4. Query-Adaptive Convolution and Dynamic Filtering

5. Integration with Multi-Modal Fusion and Forward Pass Workflow

6. Supervision and Training Loss Composition

7. Performance in Clothes-Changing ReID

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research