CARE: Class-Aware Representation Refinement
- The paper introduces CARE, a framework that refines instance embeddings using class prototypes to boost discriminative power.
- CARE employs optimal cross-domain assignment and pseudo-label refinement to align source and target features in unsupervised domain adaptation.
- For graph classification, CARE integrates set encoding to generate class prototypes, enhancing both intra-class compactness and inter-class separation.
Class-Aware Representation Refinement (CARE) is a framework for improving discriminative power and generalization in representation learning by explicitly incorporating class information throughout the learning process. Distinct from conventional approaches that treat examples or graphs independently, CARE leverages class prototypes or global class structure to refine instance-level representations, thereby enhancing class separability and mitigating overfitting. The paradigm has been instantiated in both unsupervised domain adaptation for images and supervised graph classification, demonstrating substantial gains via optimal assignment, pseudo-label refinement, and explicit class-aware loss terms (Zhang et al., 2022, Xu et al., 2022).
1. Motivation and Scope
CARE addresses limitations inherent in standard representation learning frameworks, including:
- In unsupervised domain adaptation (UDA), source domain bias and suboptimal feature alignment can degrade pseudo-label quality and hinder adaptation due to mismatched class structures across domains (Zhang et al., 2022).
- In graph classification, standard Graph Neural Networks (GNNs) process each input graph or subgraph independently, neglecting relationships between graphs of the same class and failing to explicitly encourage intra-class clustering or inter-class separation. This can result in overfitting and less transferable embeddings (Xu et al., 2022).
The CARE methodology systematically injects class structure at the representation level, refining latent embeddings using global class prototypes and alignment strategies. In both domains, empirical and theoretical results validate improvements in accuracy, discriminability, and generalization.
2. Core Components and Methodology
CARE is instantiated with distinct but converging methodologies in UDA and graph classification:
2.1 Optimal Cross-Domain Assignment (Image UDA Context)
Given labeled source embeddings $\{f^s_i\}$ with labels $y^s_i$ and unlabeled target embeddings $\{f^t_j\}$, CARE clusters the target samples via $K$-means to obtain target centroids $\{c^t_1, \dots, c^t_K\}$, and computes source class centroids $\{c^s_1, \dots, c^s_K\}$ by averaging embeddings per class. A bipartite assignment matrix $M \in \{0,1\}^{K \times K}$ aligns target clusters with source classes using the Hungarian algorithm:

$$M^{*} = \arg\min_{M} \sum_{k=1}^{K} \sum_{l=1}^{K} M_{kl}\, d\!\left(c^s_k, c^t_l\right) \quad \text{s.t.}\;\; M\mathbf{1} = \mathbf{1},\; M^{\top}\mathbf{1} = \mathbf{1},$$

where $d(\cdot,\cdot)$ is a distance between centroids (e.g., Euclidean) and $K$ is the number of classes. Pseudo-labels for target samples are then assigned via their nearest target centroid and the cluster-to-class mapping defined by $M^{*}$ (Zhang et al., 2022).
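A minimal sketch of this assignment step using off-the-shelf $K$-means and the Hungarian solver; variable names and the Euclidean cost are illustrative, not the authors' exact implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def assign_pseudo_labels(src_feats, src_labels, tgt_feats, num_classes):
    """Sketch of CARE-style optimal cross-domain assignment.

    src_feats:  (N_s, d) labeled source embeddings
    src_labels: (N_s,)   integer class labels in [0, num_classes)
    tgt_feats:  (N_t, d) unlabeled target embeddings
    Returns pseudo-labels for the target samples.
    """
    # 1. Cluster target embeddings into K clusters (K = number of classes).
    km = KMeans(n_clusters=num_classes, n_init=10, random_state=0).fit(tgt_feats)
    tgt_centroids = km.cluster_centers_            # (K, d)
    tgt_cluster_ids = km.labels_                   # (N_t,)

    # 2. Source class centroids: mean embedding per class.
    src_centroids = np.stack(
        [src_feats[src_labels == c].mean(axis=0) for c in range(num_classes)]
    )                                              # (K, d)

    # 3. Cost matrix between source class centroids and target cluster centroids,
    #    solved as a bipartite matching with the Hungarian algorithm.
    cost = np.linalg.norm(src_centroids[:, None, :] - tgt_centroids[None, :, :], axis=-1)
    class_idx, cluster_idx = linear_sum_assignment(cost)

    # 4. Map each target cluster to its matched source class.
    cluster_to_class = np.empty(num_classes, dtype=int)
    cluster_to_class[cluster_idx] = class_idx

    # 5. Pseudo-label = class of the cluster containing each target sample.
    return cluster_to_class[tgt_cluster_ids]
```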
2.2 Class Prototypes and Set Encoders (Graph Classification Context)
For each class $c$, CARE maintains a bag $\mathcal{B}_c$ of subgraph embeddings from all training graphs with label $c$. A permutation-invariant Set-Encoder (DeepSets) summarizes $\mathcal{B}_c$ into a class prototype $p_c$:

$$p_c = \phi\big(\rho(\mathcal{B}_c)\big),$$

where $\rho$ is a mean-pooling operator and $\phi$ is an MLP with ReLU activation (Xu et al., 2022).
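A compact sketch of such a set encoder in PyTorch (a minimal DeepSets-style variant; layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Permutation-invariant set encoder: mean-pool a bag of subgraph
    embeddings, then transform with an MLP + ReLU."""

    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, bag):            # bag: (n_subgraphs_in_class, in_dim)
        pooled = bag.mean(dim=0)       # mean pooling is permutation invariant
        return self.mlp(pooled)        # class prototype p_c: (hidden_dim,)

# One prototype per class, computed from all training subgraph embeddings
# sharing that label (`bags` is a dict: class id -> tensor of embeddings):
# prototypes = {c: set_encoder(bags[c]) for c in bags}
```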
3. Representation Refinement and Loss Functions
3.1 Injection Mechanism
CARE refines each instance's embedding $h_i$ by concatenating it with its class prototype $p_{y_i}$:

$$\tilde{h}_i = \psi\big([\,h_i \,\Vert\, p_{y_i}\,]\big),$$

where $\psi$ is an MLP+ReLU transformation and $\Vert$ denotes concatenation. This encourages instance embeddings to move closer to their class centroids, increasing within-class compactness (Xu et al., 2022).
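A corresponding sketch of the injection step, assuming each instance embedding is concatenated with its class prototype before an MLP+ReLU transform (module and argument names are ours):

```python
import torch
import torch.nn as nn

class PrototypeInjection(nn.Module):
    """Refine an instance embedding by concatenating it with its class
    prototype and passing the result through an MLP + ReLU."""

    def __init__(self, emb_dim, proto_dim, out_dim):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(emb_dim + proto_dim, out_dim),
            nn.ReLU(),
        )

    def forward(self, h, p):            # h: (B, emb_dim), p: (B, proto_dim)
        return self.transform(torch.cat([h, p], dim=-1))
```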
3.2 Pseudo-Label Refinement and Confidence Filtering
In UDA, CARE employs a target-only auxiliary network for pseudo-label refinement, trained on target data and the current pseudo-labels to avoid source bias. A self-paced learning objective initially includes only "easy" samples for which the model is highly confident, and gradually admits harder samples by increasing a pace-threshold parameter each epoch (Zhang et al., 2022). A final confidence-filtering step retains only samples whose model confidence on the assigned pseudo-label exceeds a fixed threshold.
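A minimal sketch of this selection logic, assuming the pace parameter thresholds the per-sample loss and the final filter thresholds predicted confidence (the exact criterion and schedule in CA-UDA may differ):

```python
import torch
import torch.nn.functional as F

def self_paced_mask(logits, pseudo_labels, lam):
    """Self-paced selection (sketch): keep samples whose current loss is
    below the pace parameter `lam`; raising `lam` each epoch gradually
    admits harder samples."""
    losses = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return losses <= lam                     # boolean mask of "easy" samples

def confidence_mask(logits, pseudo_labels, tau):
    """Final filtering (sketch): keep only samples whose predicted
    probability for their pseudo-label exceeds the threshold `tau`."""
    probs = F.softmax(logits, dim=1)
    conf = probs.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1)
    return conf > tau
```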
3.3 Class-Aware Loss Design
CARE introduces explicit loss components to promote class structure at the representation level:
- In graph classification:
- Intra-class similarity loss: Average cosine similarity between subgraph embeddings and their class prototype.
- Inter-class similarity loss: Average cosine similarity between different class prototypes.
- Combined class-aware loss: the inter-class similarity minus the intra-class similarity, so that minimizing it simultaneously pulls instances toward their own prototype and pushes distinct prototypes apart.
- Total loss per batch: the supervised classification loss plus a weighted class-aware term (a sketch of these terms follows this list) (Xu et al., 2022).
- In image UDA:
- Center-to-Center (C2C) MMD: RKHS distance between source and target class centroids.
- Probability-to-Probability (P2P) MMD: RKHS distance between averaged class-conditional predicted probabilities in source and target.
- Total loss includes cross-entropy on source, self-paced pseudo-label refinement, optimal assignment cost, and weighted C2C/P2P alignment terms (Zhang et al., 2022).
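A sketch of the graph-side class-aware terms described above, built from cosine similarities to the matching prototype and between distinct prototypes; the weighting $\alpha$ and the exact combination used in the paper may differ:

```python
import torch
import torch.nn.functional as F

def class_aware_loss(subgraph_emb, labels, prototypes, alpha=1.0):
    """Class-aware loss sketch: reward intra-class similarity (embedding vs.
    its own class prototype), penalize inter-class similarity (between
    different prototypes)."""
    # Intra-class: average cosine similarity to the matching prototype.
    intra = F.cosine_similarity(subgraph_emb, prototypes[labels], dim=-1).mean()

    # Inter-class: average pairwise cosine similarity between distinct prototypes.
    p = F.normalize(prototypes, dim=-1)
    sim = p @ p.t()
    off_diag = sim[~torch.eye(len(p), dtype=torch.bool, device=p.device)]
    inter = off_diag.mean()

    # Minimizing (inter - intra) increases compactness and separation.
    return alpha * (inter - intra)

# Illustrative total batch loss: cross-entropy plus the class-aware term.
# loss = F.cross_entropy(logits, labels) + class_aware_loss(emb, labels, prototypes)
```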
4. Integration with Backbone Architectures
CARE is designed to operate as a plug-in on top of standard GNN, CNN, or other feature extraction backbones:
- Graph domain: CARE can be integrated as a "global" block after the last pooling or readout, or hierarchically at each layer if the backbone supports multi-level pooling. The only additional computational cost arises from the subgraph selector and two small MLPs for set encoding and transformation, often resulting in negligible or even reduced total training time due to faster convergence (Xu et al., 2022); a minimal plug-in sketch follows this list.
- Image domain (UDA): CARE is implemented using standard architectures such as ResNet-50/101, replacing the FC output layer with a $K$-way classifier head, where $K$ is the number of classes. Auxiliary networks replicate the primary architecture but maintain independent batch-normalization layers. No adversarial discriminator is used; domain alignment is realized via the aforementioned MMD losses (Zhang et al., 2022).
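A structural sketch of the plug-in integration, reusing the PrototypeInjection sketch above; how prototypes are produced at inference time (e.g., cached from training) is left to the surrounding training loop and is an assumption here:

```python
import torch
import torch.nn as nn

class CAREWrapper(nn.Module):
    """Illustrative wrapper showing where CARE sits in an existing pipeline:
    backbone readout -> prototype injection -> classifier. The backbone
    itself (GNN or CNN feature extractor) is left untouched."""

    def __init__(self, backbone, injection, out_dim, num_classes):
        super().__init__()
        self.backbone = backbone                  # produces (B, emb_dim) embeddings
        self.injection = injection                # PrototypeInjection sketch above
        self.classifier = nn.Linear(out_dim, num_classes)

    def forward(self, inputs, prototypes):        # prototypes: (B, proto_dim)
        h = self.backbone(inputs)                 # readout / pooled embeddings
        refined = self.injection(h, prototypes)   # class-aware refinement
        return self.classifier(refined)
```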
5. Theoretical Properties
A key theoretical result is that CARE provides improved generalization guarantees relative to its backbone. VC-dimension analysis shows that, when calibrated for equal parameter counts, CARE yields a strictly lower upper bound on the VC dimension than the plain backbone:

$$\mathrm{VCdim}(\text{CARE}) \;<\; \mathrm{VCdim}(\text{backbone}).$$
This reduction implies a smaller upper bound on the generalization gap, offering formal justification for the reduced overfitting observed empirically (Xu et al., 2022).
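For intuition on why this matters, recall the classical VC generalization bound (a standard textbook result, not one specific to either CARE paper): with probability at least $1-\delta$ over $N$ i.i.d. training samples, every hypothesis $h$ in a class of VC dimension $d$ satisfies

$$R(h) \;\le\; \widehat{R}(h) + \sqrt{\frac{d\left(\ln\frac{2N}{d} + 1\right) + \ln\frac{4}{\delta}}{N}},$$

and since the right-hand side grows with $d$, a smaller VC dimension at matched parameter count directly tightens the bound on the gap $R(h) - \widehat{R}(h)$.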
6. Empirical Evaluation and Results
CARE has been extensively evaluated in both UDA and graph contexts:
| Task Type | Datasets | Baselines | Key Gains |
|---|---|---|---|
| Image UDA | Office-31, ImageCLEF-DA, VisDA-2017, Digit-Five | DAN, JAN, DANN, MADA, RevGrad, etc. | +3–10 pts vs. backbone (e.g., Office-31: 76.1% → 94.0%) (Zhang et al., 2022) |
| Graph Classification | DD, PROTEINS, MUTAG, NCI1, OGB-MOLHIV | GCN, GIN, GraphSAGE, GAT, etc. | 84/88 pairs improved; +1–11% accuracy; +1–5 AUC on OGB (Xu et al., 2022) |
Further analysis confirms high class separability (as measured by Silhouette, Hypothesis Margin), reduced boundary errors, and stable training behavior, with CARE demonstrating resilience to overfitting and effective cluster formation across different architectures and domains.
7. Practical Considerations and Best Practices
Hyperparameter choices are robust within reasonable intervals, e.g., hidden size $32$–$256$, subgraph selector pool ratio $0.25$–$0.75$, and the weight on the class-aware loss term. For optimization, the Adam optimizer with early stopping on validation performance is recommended. Class prototype updates are online, requiring no additional momentum or batch-level synchronization. For deployment in GNNs, insertion after global pooling suffices for standard architectures, while hierarchical models benefit from per-layer integration (Xu et al., 2022). CARE imposes minimal computational overhead and is compatible with mainstream deep learning frameworks. In UDA, careful tuning of the self-paced learning schedule and assignment parameters further improves accuracy and stability (Zhang et al., 2022).
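An illustrative starting configuration within the ranges reported as robust; the specific values below are examples rather than the papers' tuned settings:

```python
# Example starting point for a CARE-style graph-classification run
# (all values are illustrative defaults within the reported ranges).
config = {
    "hidden_dim": 128,          # robust roughly within 32-256
    "pool_ratio": 0.5,          # subgraph selector pool ratio, ~0.25-0.75
    "class_loss_weight": 1.0,   # weight on the class-aware loss term (assumed)
    "optimizer": "adam",
    "early_stopping": True,     # stop when validation performance plateaus
}
```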
References:
- (Zhang et al., 2022): "CA-UDA: Class-Aware Unsupervised Domain Adaptation with Optimal Assignment and Pseudo-Label Refinement"
- (Xu et al., 2022): "A Class-Aware Representation Refinement Framework for Graph Classification"