CARE: Class-Aware Representation Refinement

Updated 29 January 2026
  • The paper introduces CARE, a framework that refines instance embeddings using class prototypes to boost discriminative power.
  • CARE employs optimal cross-domain assignment and pseudo-label refinement to align source and target features in unsupervised domain adaptation.
  • For graph classification, CARE integrates set encoding to generate class prototypes, enhancing both intra-class compactness and inter-class separation.

Class-Aware Representation Refinement (CARE) is a framework for improving discriminative power and generalization in representation learning by explicitly incorporating class information throughout the learning process. Distinct from conventional approaches that treat examples or graphs independently, CARE leverages class prototypes or global class structure to refine instance-level representations, thereby enhancing class separability and mitigating overfitting. The paradigm has been instantiated in both unsupervised domain adaptation for images and supervised graph classification, demonstrating substantial gains via optimal assignment, pseudo-label refinement, and explicit class-aware loss terms (Zhang et al., 2022, Xu et al., 2022).

1. Motivation and Scope

CARE addresses limitations inherent in standard representation learning frameworks, including:

  • In unsupervised domain adaptation (UDA), source domain bias and suboptimal feature alignment can degrade pseudo-label quality and hinder adaptation due to mismatched class structures across domains (Zhang et al., 2022).
  • In graph classification, standard Graph Neural Networks (GNNs) process each input graph or subgraph independently, neglecting relationships between graphs of the same class and failing to explicitly encourage intra-class clustering or inter-class separation. This can result in overfitting and less transferable embeddings (Xu et al., 2022).

The CARE methodology systematically injects class structure at the representation level, refining latent embeddings using global class prototypes and alignment strategies. In both domains, empirical and theoretical results validate improvements in accuracy, discriminability, and generalization.

2. Core Components and Methodology

CARE is instantiated with distinct but converging methodologies in UDA and graph classification:

2.1 Optimal Cross-Domain Assignment (Image UDA Context)

Given labeled source embeddings $\{f_i^s, y_i^s\}_{i=1}^{N_s}$ and unlabeled target embeddings $\{f_i^t\}_{i=1}^{N_t}$, CARE clusters target samples via $K$-means to obtain centroids $C^t = \{c_j^t\}_{j=1}^K$, and computes source class centroids $C^s = \{c_k^s\}_{k=1}^K$ by averaging embeddings per class. A bipartite assignment matrix $M \in \{0,1\}^{K\times K}$ aligns target clusters with source classes using the Hungarian algorithm:

$$\min_{M \in \{0,1\}^{K\times K}} \sum_{i=1}^K \sum_{j=1}^K d_{ij} m_{ij} \quad\text{s.t.}\quad \sum_{i=1}^K m_{ij} = 1\ \forall j,\quad \sum_{j=1}^K m_{ij} = 1\ \forall i$$

where $d_{ij} = \|c^s_i - c^t_j\|_2$.

Pseudo-labels $\tilde y_i^t$ for target samples are assigned via their nearest target centroid and the cluster-to-class mapping defined by $M$ (Zhang et al., 2022).
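
A minimal sketch of this assignment step, assuming the embeddings are available as NumPy arrays and using scikit-learn's KMeans and SciPy's linear_sum_assignment for the Hungarian matching; the function and variable names are illustrative and not taken from the paper's released code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def assign_pseudo_labels(src_feats, src_labels, tgt_feats, num_classes):
    """Map target K-means clusters to source classes and pseudo-label target samples."""
    # Source class centroids c^s_k: mean embedding per labeled class.
    src_centroids = np.stack(
        [src_feats[src_labels == k].mean(axis=0) for k in range(num_classes)]
    )
    # Target centroids c^t_j from K-means with K equal to the number of classes.
    km = KMeans(n_clusters=num_classes, n_init=10).fit(tgt_feats)
    tgt_centroids = km.cluster_centers_

    # Pairwise distances d_ij = ||c^s_i - c^t_j||_2.
    dist = np.linalg.norm(
        src_centroids[:, None, :] - tgt_centroids[None, :, :], axis=-1
    )
    # Hungarian algorithm: one-to-one matching M minimizing the total distance.
    row_ind, col_ind = linear_sum_assignment(dist)
    cluster_to_class = {j: i for i, j in zip(row_ind, col_ind)}

    # Each target sample inherits the class of its nearest target centroid.
    pseudo_labels = np.array([cluster_to_class[j] for j in km.labels_])
    return pseudo_labels
```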

2.2 Class Prototypes and Set Encoders (Graph Classification Context)

For each class $i \in Y$, CARE maintains a bag $B_i$ of subgraph embeddings $hg_G^{sub}$ for all training graphs $G$ with label $y_G = i$. A permutation-invariant Set-Encoder (DeepSets) summarizes $B_i$ into a class prototype $hc_i$:

$$hc_i = \rho\left( \sum_{hg \in B_i} \phi(hg) \right)$$

where $\phi(hg)$ is a mean-pooling operator and $\rho$ is an MLP with ReLU activation (Xu et al., 2022).
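
A compact sketch of this set encoder, assuming PyTorch; the hidden size, the tensor layout (one bag of shape `(num_graphs_in_class, num_subgraphs, dim)`), and the module names are illustrative assumptions rather than details from the paper:

```python
import torch
import torch.nn as nn


class SetEncoder(nn.Module):
    """Summarizes a bag B_i of subgraph embeddings into a class prototype hc_i."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # rho: small MLP with ReLU activation.
        self.rho = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, bag):
        # bag: (num_graphs_in_class, num_subgraphs, dim)
        # phi: mean pooling over each graph's subgraph embeddings.
        phi_out = bag.mean(dim=1)            # (num_graphs_in_class, dim)
        # Permutation-invariant sum over the bag, then rho.
        return self.rho(phi_out.sum(dim=0))  # prototype hc_i, shape (dim,)
```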

3. Representation Refinement and Loss Functions

3.1 Injection Mechanism

CARE refines each instance's embedding by concatenation with its class prototype:

$$hg_G' = \mathrm{Trans}([hg_G \,\|\, hc_{y_G}])$$

where $\mathrm{Trans}$ is an MLP+ReLU transformation. This encourages instance embeddings to move closer to their class centroids, increasing within-class compactness (Xu et al., 2022).
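
A minimal PyTorch sketch of this injection step, assuming instance embeddings and class prototypes share the same dimensionality; names are illustrative:

```python
import torch
import torch.nn as nn


class PrototypeInjection(nn.Module):
    """Refine graph embeddings by concatenation with their class prototype."""

    def __init__(self, dim):
        super().__init__()
        # Trans: MLP + ReLU applied to the concatenated vector [hg_G || hc_{y_G}].
        self.trans = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, hg, prototypes, labels):
        # hg: (batch, dim) graph embeddings; prototypes: (num_classes, dim);
        # labels: (batch,) class indices used to select each graph's prototype.
        return self.trans(torch.cat([hg, prototypes[labels]], dim=-1))
```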

3.2 Pseudo-Label Refinement and Confidence Filtering

In UDA, CARE employs a target-only auxiliary network for pseudo-label refinement, trained on target data and current pseudo-labels to avoid source bias. A self-paced learning objective includes only "easy" samples (with high model confidence $\varphi^i_n = e^{-\mathbb{L}(\cdot)}$), adding harder samples gradually by increasing a threshold parameter $\lambda$ per epoch (Zhang et al., 2022). Final confidence filtering retains only samples where model confidence exceeds $e^{-\lambda}$.
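
A small sketch of the confidence filter, assuming per-sample losses from the target-only auxiliary network are already computed; the schedule for $\lambda$ is illustrative:

```python
import torch


def select_easy_samples(per_sample_loss, lam):
    """Keep samples whose confidence exp(-loss) exceeds exp(-lambda)."""
    confidence = torch.exp(-per_sample_loss)
    # Equivalent to per_sample_loss < lam; written in the confidence form used in the text.
    return confidence > torch.exp(torch.tensor(-lam))


# Illustrative self-paced schedule: admit harder samples by growing lambda each epoch.
# lam = lam_init + epoch * lam_step
```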

3.3 Class-Aware Loss Design

CARE introduces explicit loss components to promote class structure at the representation level:

  • In graph classification:
    • Intra-class similarity loss: Average cosine similarity between subgraph embeddings and their class prototype.
    • Inter-class similarity loss: Average cosine similarity between different class prototypes.
    • Combined class-aware loss: $L_\mathrm{class} = \exp(L_\mathrm{inter} - \lambda_1 L_\mathrm{intra})$.
    • Total loss per batch: $L = L_\mathrm{cls} + \lambda_2 L_\mathrm{class}$ (Xu et al., 2022); a sketch of this loss appears after this list.
  • In image UDA:
    • Center-to-Center (C2C) MMD: RKHS distance between source and target class centroids.
    • Probability-to-Probability (P2P) MMD: RKHS distance between averaged class-conditional predicted probabilities in source and target.
    • Total loss includes cross-entropy on source, self-paced pseudo-label refinement, optimal assignment cost, and weighted C2C/P2P alignment terms (Zhang et al., 2022).
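
A sketch of the graph-classification class-aware loss described above, assuming PyTorch tensors; the tensor names and the way prototypes are indexed are illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def class_aware_loss(sub_embs, labels, prototypes, logits, targets, lam1=1.0, lam2=1.0):
    """Cross-entropy plus the combined class-aware term L_class."""
    # Intra-class similarity: mean cosine similarity between each (sub)graph
    # embedding and the prototype of its own class.
    l_intra = F.cosine_similarity(sub_embs, prototypes[labels], dim=-1).mean()

    # Inter-class similarity: mean cosine similarity between distinct prototypes.
    sims = F.cosine_similarity(
        prototypes.unsqueeze(0), prototypes.unsqueeze(1), dim=-1
    )
    num_classes = prototypes.size(0)
    l_inter = sims[~torch.eye(num_classes, dtype=torch.bool)].mean()

    # L_class = exp(L_inter - lambda_1 * L_intra); total L = L_cls + lambda_2 * L_class.
    l_class = torch.exp(l_inter - lam1 * l_intra)
    return F.cross_entropy(logits, targets) + lam2 * l_class
```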

4. Integration with Backbone Architectures

CARE is designed to operate as a plug-in on top of standard GNN, CNN, or other feature extraction backbones:

  • Graph domain: CARE can be integrated as a "global" block after the last pooling or readout, or hierarchically at each layer if the backbone supports multi-level pooling. The only additional computational cost arises from the subgraph selector and two small MLPs for set encoding and transformation, often resulting in negligible or even reduced total training time due to faster convergence (Xu et al., 2022). A plug-in sketch for this setting appears after this list.
  • Image domain (UDA): CARE is implemented using standard architectures such as ResNet-50/101, replacing the FC output layer with a $K$-way classifier head. Auxiliary networks replicate the primary architecture but maintain independent batch normalization layers. No adversarial discriminator is used; domain alignment is realized via the aforementioned MMD losses (Zhang et al., 2022).
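
A sketch of how the graph-side pieces compose as a plug-in around an existing backbone, reusing the illustrative `SetEncoder` and `PrototypeInjection` modules from the sketches above and assuming the backbone returns a pooled graph embedding:

```python
import torch
import torch.nn as nn


class CAREClassifier(nn.Module):
    """Backbone readout -> prototype construction -> injection -> classifier head."""

    def __init__(self, backbone, dim, num_classes):
        super().__init__()
        self.backbone = backbone               # e.g. a GNN ending in global pooling
        self.set_encoder = SetEncoder(dim)     # builds class prototypes from bags
        self.inject = PrototypeInjection(dim)  # refines embeddings with prototypes
        self.head = nn.Linear(dim, num_classes)

    def forward(self, graphs, labels, class_bags):
        hg = self.backbone(graphs)                          # (batch, dim)
        prototypes = torch.stack(
            [self.set_encoder(bag) for bag in class_bags]   # one bag per class
        )                                                   # (num_classes, dim)
        return self.head(self.inject(hg, prototypes, labels))
```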

5. Theoretical Properties

A key theoretical result is that CARE provides improved generalization guarantees relative to its backbone. VC-dimension analysis shows that when calibrated for equal parameter counts, CARE yields a strictly lower upper bound on the VC dimension:

$$\mathrm{VC}_{\mathrm{CARE}} < \mathrm{VC}_{\mathrm{Backbone}}$$

This reduction implies a smaller upper bound on the generalization gap, offering formal justification for the reduced overfitting observed empirically (Xu et al., 2022).
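
For context, a standard VC-type generalization bound (a textbook result, not a formula specific to the CARE papers) makes the dependence on the VC dimension explicit: with high probability, the gap between the true risk $R(h)$ and the empirical risk $\hat{R}_N(h)$ over $N$ samples satisfies

$$\sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| = O\!\left(\sqrt{\frac{d \ln N}{N}}\right)$$

where $d$ is the VC dimension of the hypothesis class. Because the bound increases with $d$ for fixed $N$, $\mathrm{VC}_{\mathrm{CARE}} < \mathrm{VC}_{\mathrm{Backbone}}$ translates directly into the tighter generalization guarantee cited above.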

6. Empirical Evaluation and Results

CARE has been extensively evaluated in both UDA and graph contexts:

  • Image UDA — Datasets: Office-31, ImageCLEF-DA, VisDA-2017, Digit-Five. Baselines: DAN, JAN, DANN, MADA, RevGrad, etc. Key gains: +3–10 points over the backbone (e.g., Office-31: 76.1% → 94.0%) (Zhang et al., 2022).
  • Graph classification — Datasets: DD, PROTEINS, MUTAG, NCI1, OGB-MOLHIV. Baselines: GCN, GIN, GraphSAGE, GAT, etc. Key gains: improvements in 84 of 88 backbone–dataset pairs; +1–11% accuracy; +1–5 AUC points on OGB (Xu et al., 2022).

Further analysis confirms high class separability (as measured by Silhouette, Hypothesis Margin), reduced boundary errors, and stable training behavior, with CARE demonstrating resilience to overfitting and effective cluster formation across different architectures and domains.

7. Practical Considerations and Best Practices

Hyperparameter choices are robust within reasonable intervals, e.g., hidden size $32$–$256$, subgraph selector pool ratio $0.25$–$0.75$, and loss weights $\lambda_1,\lambda_2 \in [0.1,10]$. For optimization, Adam ($\eta = 1 \times 10^{-4}$) and early stopping (patience $=25$ epochs) are recommended. Class prototype updates are online, requiring no additional momentum or batch-level synchronization. For deployment in GNNs, insertion after global pooling suffices for standard architectures, while hierarchical models benefit from per-layer integration (Xu et al., 2022). CARE imposes minimal computational overhead and is compatible with mainstream deep learning frameworks. In UDA, careful tuning of self-paced learning and assignment parameters further optimizes accuracy and stability (Zhang et al., 2022).
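
An illustrative configuration for the graph setting, with values chosen inside the ranges quoted above; the dictionary itself is a sketch rather than the papers' released configuration:

```python
care_config = {
    "hidden_size": 128,             # robust within 32-256
    "pool_ratio": 0.5,              # subgraph selector pool ratio, 0.25-0.75
    "lambda_1": 1.0,                # intra-class weight inside L_class
    "lambda_2": 1.0,                # weight of L_class in the total loss
    "optimizer": "adam",
    "learning_rate": 1e-4,          # eta = 1e-4
    "early_stopping_patience": 25,  # epochs
}
```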


References:

  • (Zhang et al., 2022): "CA-UDA: Class-Aware Unsupervised Domain Adaptation with Optimal Assignment and Pseudo-Label Refinement"
  • (Xu et al., 2022): "A Class-Aware Representation Refinement Framework for Graph Classification"
