
QAConv-QA: Query Adaptive Convolution for ReID

Updated 3 February 2026
  • The paper introduces a module that integrates pixel-level importance weighting with bidirectional consistency, enhancing identity matching under severe clothing changes.
  • It fuses RGB and parsing-based features through multi-modal attention and uses dynamic query-adaptive convolution to generate robust, clothing-invariant representations.
  • Evaluations on PRCC, LTCC, and VC-Clothes benchmarks show significant Top-1 and mAP improvements, validating the method's effectiveness in CC-ReID.

Quality-Aware Query-Adaptive Convolution (QAConv-QA) is a module designed to enhance pixel-level matching within the dual-branch QA-ReID architecture, targeting the challenges of person re-identification (ReID) under severe clothing changes. QAConv-QA introduces two critical mechanisms—pixel-level importance weighting and explicit bidirectional consistency constraints—that together facilitate robust identity correspondence even as superficial appearance varies with clothing. This approach proves essential in clothes-changing ReID (CC-ReID), a setting characterized by strong intra-person appearance shifts.

1. Role of QAConv-QA in the Dual-Branch QA-ReID Framework

QAConv-QA is embedded within the two-branch backbone of QA-ReID, which utilizes complementary cues from RGB images and clothing-invariant structural features. The RGB branch extracts feature maps from the full image using ResNet-50 up to stage 3, producing $F_{rgb}\in\mathbb{R}^{C\times H\times W}$. The parsing branch applies a human-parsing network to produce a body-part mask $M_{body}\in\{0,1\}^{H'\times W'}$, removes the clothing regions to form $I_{par}=I\odot M_{body}$, and generates $F_{par}\in\mathbb{R}^{C\times H\times W}$.
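
A minimal sketch of the two feature branches, assuming a torchvision ResNet-50 truncated after layer3 (stage 3) for both streams and an externally supplied binary body-part mask; whether the two branches share weights is not specified above, so they are kept separate here, and the class name TwoBranchBackbone is illustrative.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoBranchBackbone(nn.Module):
    """Sketch of the dual-branch backbone: ResNet-50 truncated after stage 3."""
    def __init__(self):
        super().__init__()
        def stage3_trunk():
            m = resnet50(weights="IMAGENET1K_V1")   # ImageNet-pretrained weights
            return nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool,
                                 m.layer1, m.layer2, m.layer3)
        self.rgb_branch = stage3_trunk()            # operates on the full image
        self.par_branch = stage3_trunk()            # operates on the clothes-masked image

    def forward(self, img, body_mask):
        # img: (B, 3, H0, W0); body_mask: (B, 1, H0, W0), 1 on retained body regions, 0 on clothing
        f_rgb = self.rgb_branch(img)                # F_rgb: (B, C, H, W), C = 1024
        i_par = img * body_mask                     # I_par = I elementwise-multiplied by M_body
        f_par = self.par_branch(i_par)              # F_par: (B, C, H, W)
        return f_rgb, f_par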

A multi-modal attention fusion module computes a joint attention map $\omega\in[0,1]^{C\times H\times W}$, blending $F_{rgb}$ and $F_{par}$ into a fused feature map $F_{fuse}$ via:

$$F_{mix} = \omega\odot F_{rgb} + (1-\omega)\odot F_{par}$$

$$F_{fuse} = \mathrm{Conv}_{1\times1}(F_{rgb} + F_{par} + F_{mix})$$
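
A minimal sketch of the fusion step, assuming the attention map $\omega$ is predicted from the concatenated branch features by a 1×1 convolution followed by a sigmoid (the text above does not specify how $\omega$ is produced).

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch of multi-modal attention fusion of F_rgb and F_par."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),                                   # omega in [0, 1]^{C x H x W}
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)   # Conv_{1x1}

    def forward(self, f_rgb, f_par):
        omega = self.attn(torch.cat([f_rgb, f_par], dim=1))
        f_mix = omega * f_rgb + (1.0 - omega) * f_par       # F_mix
        return self.fuse(f_rgb + f_par + f_mix)             # F_fuse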

QAConv-QA directly operates on these fused features at the pixel level, comparing query and gallery images ($F^q_{fuse}$, $F^g_{fuse}$) through a sequence of similarity calculations, weighting, and aggregation, followed by a post-processing head (bidirectional global max pooling → batch norm → MLP → sigmoid) to yield match probabilities $p_{qg}$ (Wang et al., 27 Jan 2026).
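
A sketch of the post-processing head under the reading that Bi-GMP reduces each query-gallery pair to a single scalar score (see Section 3); the hidden width of the MLP is an illustrative choice, not a value from the paper.

import torch
import torch.nn as nn

class MatchHead(nn.Module):
    """Sketch: batch norm -> MLP -> sigmoid on aggregated pairwise scores."""
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_dim)
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, s_agg):
        # s_agg: (num_pairs, in_dim) scores from bidirectional global max pooling
        return torch.sigmoid(self.mlp(self.bn(s_agg))).squeeze(-1)   # p_qg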

2. Pixel-Level Importance Weighting

Each spatial location $(i,j)$ on the $F_{fuse}$ feature map receives a quality score $Q_{i,j}$ reflecting the likelihood that it lies on an identity-relevant (typically non-clothing) region. The raw score $\bar Q_{i,j}$ is computed as the fraction of the corresponding $k\times k$ input patch covered by the body-part mask, and is normalized by a spatial softmax:

$$Q_{i,j} = \frac{\exp(\bar Q_{i,j})}{\sum_{h=1}^{H}\sum_{w=1}^{W}\exp(\bar Q_{h,w})}$$

The pairwise cosine similarity $\rho$ between query and gallery pixel features $f^q_{i_1,j_1}$ and $f^g_{i_2,j_2}$ is then re-weighted:

$$\mathrm{sim}^1(f^q_{i_1,j_1}, f^g_{i_2,j_2}) = Q^q_{i_1,j_1}\cdot Q^g_{i_2,j_2}\cdot \rho(f^q_{i_1,j_1}, f^g_{i_2,j_2})$$

This mechanism prioritizes features that localize to identity-stable, body-based regions and suppresses the influence of clothing-related areas.
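
A sketch of both steps, assuming the patch-coverage score is approximated by adaptively average-pooling the binary mask to the feature resolution, and that the i-th query is compared with the i-th gallery sample in a batch (full cross-pair comparison would add a pair dimension).

import torch
import torch.nn.functional as F

def pixel_quality_weights(body_mask, feat_h, feat_w):
    """Q: spatial-softmax-normalized fraction of each input patch covered by the mask."""
    q_bar = F.adaptive_avg_pool2d(body_mask.float(), (feat_h, feat_w))   # (B, 1, H, W)
    q = torch.softmax(q_bar.flatten(1), dim=1).view_as(q_bar)            # softmax over H*W
    return q.squeeze(1)                                                   # (B, H, W)

def weighted_similarity(fq, fg, qq, qg):
    """sim^1 = Q^q * Q^g * cosine(f^q, f^g) for every query/gallery location pair."""
    fq = F.normalize(fq.flatten(2), dim=1)          # (B, C, Hq*Wq), unit-norm channels
    fg = F.normalize(fg.flatten(2), dim=1)          # (B, C, Hg*Wg)
    rho = torch.einsum('bcn,bcm->bnm', fq, fg)      # cosine similarities, (B, Nq, Ng)
    w = qq.flatten(1).unsqueeze(2) * qg.flatten(1).unsqueeze(1)   # Q^q_i * Q^g_j
    return w * rho                                   # sim^1, (B, Nq, Ng)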

3. Bidirectional Consistency Constraints

To further enhance reliability in pixel-level matching, QAConv-QA introduces explicit bidirectional consistency. Conditional softmaxes are defined over feature locations, establishing the probability that a given pixel in one sample is the best match for a pixel in the other, and vice versa:

$$\bar\rho(f^q_{i_1,j_1}\mid f^g_{i_2,j_2}) = \frac{\exp(\mathrm{sim}^1(f^q_{i_1,j_1}, f^g_{i_2,j_2}))}{\sum_{h,w}\exp(\mathrm{sim}^1(f^q_{h,w}, f^g_{i_2,j_2}))}$$

$$\bar\rho(f^g_{i_2,j_2}\mid f^q_{i_1,j_1}) = \frac{\exp(\mathrm{sim}^1(f^q_{i_1,j_1}, f^g_{i_2,j_2}))}{\sum_{h,w}\exp(\mathrm{sim}^1(f^q_{i_1,j_1}, f^g_{h,w}))}$$

The bidirectionally consistent similarity takes their product:

$$\mathrm{sim}^2(f^q_{i_1,j_1}, f^g_{i_2,j_2}) = \bar\rho(f^q_{i_1,j_1}\mid f^g_{i_2,j_2})\cdot\bar\rho(f^g_{i_2,j_2}\mid f^q_{i_1,j_1})$$

Aggregating $\mathrm{sim}^2$ over all pixel pairs with bidirectional global maximum pooling yields a scalar score $s_{q,g}$, emphasizing only mutually top-matching, identity-consistent region pairs.
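
A sketch of the two conditional softmaxes and the scalar aggregation. The exact reduction behind "bidirectional global maximum pooling" is not stated above; keeping each location's best match in the other image and averaging over locations and directions is one plausible reading, not the paper's definition.

import torch

def bidirectional_consistency(sim1):
    """sim^2: product of the query-conditional and gallery-conditional softmaxes.
    sim1: (B, Nq, Ng) quality-weighted similarities."""
    p_q_given_g = torch.softmax(sim1, dim=1)    # normalize over query locations
    p_g_given_q = torch.softmax(sim1, dim=2)    # normalize over gallery locations
    return p_q_given_g * p_g_given_q             # (B, Nq, Ng)

def bi_gmp(sim2):
    """Assumed Bi-GMP reduction to one score per query-gallery pair."""
    s_q2g = sim2.max(dim=2).values.mean(dim=1)   # best gallery match per query location
    s_g2q = sim2.max(dim=1).values.mean(dim=1)   # best query match per gallery location
    return 0.5 * (s_q2g + s_g2q)                  # s_{q,g}: (B,)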

4. Query-Adaptive Convolution and Dynamic Filtering

QAConv-QA adopts a dynamic filter paradigm inspired by the original QAConv formulation [Shengcai Liao & Ling Shao, ECCV 2020], where each query pixel feature $f^q_{i,j}$ acts as a $1\times1$ convolutional filter upon the gallery feature map:

$$W^q_{i,j} = f_\theta(f^q_{i,j}) \in \mathbb{R}^{C}, \qquad S_{i,j}(p,q) = (W^q_{i,j})^{T} f^g_{p,q}$$

This produces a full similarity matrix between query and gallery locations. Computation is batched efficiently using im2col and einsum. On top of the initial cosine similarities, QAConv-QA applies the pixel-level reweighting and bidirectional consistency constraints described above.
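
The sketch below makes the dynamic-filter view literal for a single query-gallery pair, assuming $f_\theta$ is L2 normalization so that $S$ coincides with the cosine-similarity matrix between locations; this is an illustrative choice, not the paper's stated $f_\theta$.

import torch
import torch.nn.functional as F

def query_adaptive_conv(fq, fg):
    """Each query pixel feature becomes a 1x1 convolution kernel slid over the gallery map.
    fq: (C, Hq, Wq) query features; fg: (C, Hg, Wg) gallery features."""
    c, hq, wq = fq.shape
    filters = F.normalize(fq.reshape(c, -1), dim=0)     # f_theta: unit-normalize each location
    filters = filters.t().reshape(hq * wq, c, 1, 1)     # Hq*Wq kernels of shape (C, 1, 1)
    gallery = F.normalize(fg, dim=0).unsqueeze(0)       # (1, C, Hg, Wg)
    s = F.conv2d(gallery, filters)                      # (1, Hq*Wq, Hg, Wg)
    return s.squeeze(0)                                  # S_{i,j}(p,q) for all location pairs

An equivalent batched form is torch.einsum('bcn,bcm->bnm', Wq, Fg) on flattened feature maps, which avoids materializing the per-query kernels and matches the im2col/einsum optimization mentioned above.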

5. Integration with Multi-Modal Fusion and Forward Pass Workflow

The QAConv-QA module relies on multi-modal fusion of RGB and parsing-based features, providing joint representations for matching. The forward pass, as outlined in the implementation, comprises: fused feature extraction, pixel weight calculation, pairwise cosine similarity computation, quality reweighting, dual-direction softmax normalization, computation of bidirectionally consistent similarity, aggregation via Bi-GMP, and final post-processing through batch normalization, MLP, and sigmoid activation. The following summarizes the computational sequence:

# Fused feature extraction for the query and gallery images
Fq = fuse_branch(RGB_q, Parse_q)
Fg = fuse_branch(RGB_g, Parse_g)

# Pixel-level quality weights from the parsing masks (Section 2)
Qq = compute_pixel_weights(ParseMask_q)
Qg = compute_pixel_weights(ParseMask_g)

# Quality-reweighted pairwise cosine similarity: sim^1
S_raw = cosine_similarity(Fq, Fg)
S1 = outer(Qq, Qg) * S_raw

# Bidirectional consistency: conditional softmaxes and their product (Section 3)
P_q2g = softmax_over_query_locs(S1)
P_g2q = softmax_over_gallery_locs(S1)
S2 = P_q2g * P_g2q

# Bidirectional global max pooling over location pairs
Sagg_q2g = max_{(i,j)} max_{(p,q)} S2
Sagg_g2q = max_{(p,q)} max_{(i,j)} S2
Sagg = (Sagg_q2g + Sagg_g2q) / 2

# Post-processing head: batch norm -> MLP -> sigmoid
p = sigmoid(MLP(BN(Sagg)))
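
As a usage illustration (not spelled out above), the pairwise probability p can serve directly as the retrieval score for ranking gallery images against a query; model.match_prob below is a hypothetical wrapper around the pipeline sketched above, not an API from the paper.

# Hypothetical retrieval loop: score one query against every gallery image and rank by p_qg.
scores = [model.match_prob(query_img, query_mask, gal_img, gal_mask)
          for gal_img, gal_mask in gallery]
ranking = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)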

6. Supervision and Training Loss Composition

The QA-ReID framework integrates three types of losses:

  • Identity classification loss on globally pooled features of each branch,
  • Triplet loss operating over these embeddings, and
  • Binary cross-entropy matching loss on pixel-level pairwise scores from QAConv-QA.

The total loss $L$ is given by:

$$L = (L_{\mathrm{cls}}^{rgb} + L_{\mathrm{cls}}^{par}) + (L_{\mathrm{tri}}^{rgb} + L_{\mathrm{tri}}^{par}) + L_{\mathrm{match}}$$

This composite objective enforces both global structural identity constraints and fine-grained local alignment under varied clothing.
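
A minimal sketch of the composite objective, assuming an unweighted sum of the terms (as in the formula above) and a conventional triplet margin of 0.3; triplet mining and the selection of anchors, positives, and negatives are left outside the sketch.

import torch.nn as nn

ce = nn.CrossEntropyLoss()
tri = nn.TripletMarginLoss(margin=0.3)   # margin value is an assumption
bce = nn.BCELoss()

def total_loss(logits_rgb, logits_par, labels,
               anc_rgb, pos_rgb, neg_rgb, anc_par, pos_par, neg_par,
               match_probs, match_targets):
    l_cls = ce(logits_rgb, labels) + ce(logits_par, labels)                   # L_cls (both branches)
    l_tri = tri(anc_rgb, pos_rgb, neg_rgb) + tri(anc_par, pos_par, neg_par)   # L_tri (both branches)
    l_match = bce(match_probs, match_targets)                                  # L_match: BCE on p_qg
    return l_cls + l_tri + l_match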

7. Performance in Clothes-Changing ReID

On challenging CC-ReID benchmarks—PRCC, LTCC, and VC-Clothes—QA-ReID augmented with QAConv-QA achieves state-of-the-art results under clothing-changing protocols:

Dataset    | Top-1 Gain | mAP Gain
-----------|------------|---------
PRCC       | +6.9%      | +3.9%
LTCC       | +0.7%      | +1.9%
VC-Clothes | +3.0%      | +2.8%

Ablation studies isolate the contributions of the two QAConv-QA blocks: pixel weighting alone yields +1.6% Top-1 (PRCC), bidirectional matching alone +0.7%, with the full combination providing +3.1% improvement.

Visualization of QAConv-QA attention maps demonstrates that the model attends chiefly to identity-stable regions—such as the head and limbs—rather than clothing-variant areas, confirming the intended focus on semantically stable cues (Wang et al., 27 Jan 2026).

In sum, QAConv-QA brings quality-aware, mutually consistent pixel-level filtering to query-adaptive convolution, combining feature fusion, spatial weighting, and bidirectional consistency into a single mechanism that keeps ReID robust under drastic clothing changes.

References (1)
