Contrastive Relation Encoder (CORE)
- Contrastive Relation Encoder (CORE) is a neural framework that uses contrastive loss to learn relational structures by pulling together true relation pairs and pushing apart negatives.
- It integrates task-specific feature extraction backbones—such as CharacterBERT, GNNs, and CNNs—with explicit relational modeling for applications in biomedical text, knowledge graphs, and scene text detection.
- Empirical results demonstrate significant improvements in metrics like F1 score, Hits@10, and Hmean, validating CORE's effectiveness across multiple domains.
A Contrastive Relation Encoder (CORE) refers to a family of neural architectures and training strategies that employ contrastive learning objectives to impose or leverage relation structure in learned representations. The primary aim of CORE frameworks is to ensure that the representation space reflects relational inductive biases—either from linguistic graphs, knowledge graphs, or relational proposal groupings—by pulling together true (positive) relation or instance pairs and pushing apart hard negatives. This class of methods has been influential in relation extraction, inductive knowledge graph completion, and scene text detection, with instantiations in several modalities and task contexts.
1. Model Architectures and Task Domains
CORE has been instantiated in three principal domains: relation extraction from biomedical text, inductive relation prediction in knowledge graphs, and instance-aware proposal grouping for scene text detection. The architectural core is task-adaptive but consistently combines three elements:
- Feature Extraction Backbone—pretrained language models (e.g., CharacterBERT (Theodoropoulos et al., 2021)), GNNs for knowledge graphs (Wu et al., 2022), or CNN-based region proposal networks in computer vision (Lin et al., 2021).
- Relational Structure Modeling—explicit graph structure via GCNs (text), clustering and encoder partitioning (KGs), or relation blocks (scene text).
- Contrastive Head(s)—loss modules enforcing semantic similarity for positive relation/entity pairs and dissimilarity for negatives.
1.1 Text-based Relation Extraction (CORE-NLP)
- CharacterBERT Encoder: Tokenized input sentences are processed with a domain-specific CharacterBERT, freezing the lower transformer layers and fine-tuning the upper layers, to produce contextual token embeddings $h_i \in \mathbb{R}^{768}$.
- Graph Encoder (GCN): For each sentence, a relation graph is constructed over drug–adverse-effect (AE) token pairs. Node features are initialized to the BERT token embeddings $h_i$. A symmetrically normalized adjacency matrix $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ is built, and GCN propagation is carried out via $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$.
- Pooling and Pairwise Units: Two variants—CLGS uses mean/max-pooling over nodes for graph and sentence embedding alignment; CLDR constructs 2-node subgraphs per relation with custom adjacency for precise relation pair modeling and concatenation of node embeddings (Theodoropoulos et al., 2021).
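The GCN propagation step above can be sketched in a few lines of numpy. This is a minimal illustration with symmetric normalization and self-loops; the adjacency matrix, node features, and weight matrix are toy placeholders, not the paper's setup:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # linear transform + ReLU

# Toy relation graph over 3 token nodes (e.g., drug, context, AE)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 8)   # node features (stand-ins for BERT embeddings)
W = np.random.randn(8, 4)   # learnable layer weights
H_out = gcn_layer(A, H, W)
print(H_out.shape)          # (3, 4)
```

Stacking two such layers gives each node a 2-hop receptive field over the relation graph before pooling.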
1.2 Inductive Relation Prediction (CORE-KG)
- Clustered GNN Encoders: For each cluster of semantically related relations (defined via GloVe embeddings, t-SNE, and K-Means clustering), a dedicated GNN encoder is maintained. Inference on a triplet $(h, r, t)$ is routed to the corresponding encoder by the cluster assignment of $r$.
- Enclosing Subgraph Extraction: Directed $k$-hop subgraphs centered on the target pair $(h, t)$ are extracted (with label leakage mitigated by dropping the true edge). Nodes and edges are featurized by relative distances and relation types, respectively.
- Relational Message Passing: Node and edge embeddings are jointly updated through message passing. The final subgraph representation is a GraIL-style readout concatenating the pooled node embeddings with the head, tail, and target-relation embeddings, $z_G = [\bar{h}^{(L)} \,\|\, h_h^{(L)} \,\|\, h_t^{(L)} \,\|\, e_r]$.
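The enclosing-subgraph step can be sketched as follows. This is a simplified sketch that treats the graph as undirected for neighborhood extraction and omits node/edge featurization; the true $(h, t)$ edge is dropped to avoid label leakage, as described above:

```python
from collections import deque

def k_hop(adj, src, k):
    """Return the set of nodes within k hops of src (BFS over an adjacency dict)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def enclosing_subgraph(edges, head, tail, k=2):
    """Enclosing subgraph of (head, tail): intersection of k-hop
    neighborhoods, with the true (head, tail) edge removed."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)   # treat as undirected for extraction
    nodes = k_hop(adj, head, k) & k_hop(adj, tail, k)
    sub_edges = [(u, v) for u, v in edges
                 if u in nodes and v in nodes and {u, v} != {head, tail}]
    return nodes, sub_edges

edges = [("h", "a"), ("a", "t"), ("h", "t"), ("t", "b"), ("b", "c")]
nodes, sub = enclosing_subgraph(edges, "h", "t", k=1)
print(sorted(nodes))   # ['a', 'h', 't'] -- the true ("h", "t") edge is dropped
```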
1.3 Instance-Aware Scene Text Detection (CORE-Text)
- CORE Module Integration: Implemented as a two-stage plug-in to Mask R-CNN. Region proposals are encoded via appearance and geometric features, refined using stacked CORE modules consisting of relation blocks (attention over proposals) and a contrastive head encouraging embedding proximity among the same text instance (Lin et al., 2021).
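The relation block's attention over proposals can be sketched as single-head self-attention in numpy. This is an illustrative simplification: the actual module uses multiple heads (16 heads of hidden size 64, per the implementation notes below) over appearance and geometric features:

```python
import numpy as np

def relation_block(X):
    """Single-head self-attention over N proposal embeddings X of shape
    (N, d): each proposal aggregates context from all other proposals,
    with a residual connection."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                 # pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return X + attn @ X                           # residual refinement

X = np.random.randn(5, 16)     # 5 region proposals, 16-dim features
X_ref = relation_block(X)
print(X_ref.shape)             # (5, 16)
```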
2. Contrastive Learning Objectives
Contrastive learning in CORE enforces instance- or structure-level relational similarity via variants of the InfoNCE or SimCLR loss:
2.1 SimCLR-Type Loss in Text
For positive (anchor, target) pairs $(i, j)$ with in-batch negatives, the NT-Xent loss is

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(\mathrm{sim}(z_i, z_k)/\tau)},$$

where $\mathrm{sim}(u, v) = u^\top v / (\lVert u \rVert \lVert v \rVert)$ is cosine similarity and $\tau$ is the temperature (e.g., $\tau = 0.5$).
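A minimal numpy sketch of this loss follows. Batch construction is simplified: row i of the anchor matrix is treated as positive with row i of the target matrix, and all other rows as negatives (in practice positives come from the CLGS/CLDR pairings):

```python
import numpy as np

def nt_xent(z_anchor, z_target, tau=0.5):
    """SimCLR-style loss for paired embeddings of shape (N, d): row i of
    z_anchor is positive with row i of z_target; other rows are negatives."""
    za = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    zt = z_target / np.linalg.norm(z_target, axis=1, keepdims=True)
    sim = za @ zt.T / tau                  # cosine similarities / temperature
    sim -= sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))     # -log softmax of the positive

np.random.seed(0)
z = np.random.randn(8, 32)
# Perfectly aligned pairs should score a lower loss than mismatched pairs
aligned = nt_xent(z, z)
shuffled = nt_xent(z, z[::-1].copy())
print(aligned < shuffled)  # True
```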
Variants:
- CLGS: Sentence-to-graph alignment.
- CLDR: Relation-to-relation pairwise matching.
- CLNER: Entity embedding clustering by BIO tags.
2.2 Clustered Contrastive Loss in Knowledge Graphs
For each anchor triplet, positives are drawn from within the same relation cluster and negatives from other clusters; all cluster encoders are trained jointly under a shared InfoNCE-style objective.
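Cluster-aware pair sampling can be sketched as follows. The helper names here are hypothetical: `cluster_of` stands in for the K-Means assignment of relation names described in the training section:

```python
import random

def sample_contrastive(anchor, cluster_of, triplets, n_neg=2, rng=None):
    """Pick one positive triplet from the anchor's relation cluster and
    n_neg negatives from other clusters. `cluster_of` maps a relation
    name to its cluster id; triplets are (head, relation, tail)."""
    rng = rng or random.Random(0)
    c = cluster_of[anchor[1]]
    pos_pool = [t for t in triplets if t != anchor and cluster_of[t[1]] == c]
    neg_pool = [t for t in triplets if cluster_of[t[1]] != c]
    return rng.choice(pos_pool), rng.sample(neg_pool, n_neg)

# Toy relation clusters (illustrative, not the paper's)
cluster_of = {"treats": 0, "cures": 0, "located_in": 1, "capital_of": 1}
triplets = [("aspirin", "treats", "pain"),
            ("drugX", "cures", "flu"),
            ("paris", "located_in", "france"),
            ("paris", "capital_of", "france")]
pos, negs = sample_contrastive(triplets[0], cluster_of, triplets)
print(pos)  # ('drugX', 'cures', 'flu') -- same cluster as "treats"
```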
2.3 Instance-wise Contrastive Loss in Scene Text
After a relation block, proposal embeddings are processed by a two-layer MLP followed by L2 normalization, then contrasted such that sub-texts/full-texts of the same instance are positives and proposals of other instances are negatives. The InfoNCE loss is applied with a temperature $\tau$ (0.2 in the reported setup) and a weighting coefficient on the contrastive term.
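The projection step can be sketched as a two-layer MLP with L2 normalization. The weights here are random placeholders, and the 1024-to-128 dimensions are an assumption consistent with the embedding dimensions reported in the table below:

```python
import numpy as np

def projection_head(X, W1, b1, W2, b2):
    """Two-layer MLP followed by L2 normalization, mapping proposal
    embeddings onto the unit sphere before the instance-wise contrast."""
    H = np.maximum(X @ W1 + b1, 0.0)                     # ReLU hidden layer
    Z = H @ W2 + b2
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)  # unit-norm rows

np.random.seed(1)
X = np.random.randn(6, 1024)                   # 6 proposal embeddings
W1, b1 = np.random.randn(1024, 256) * 0.02, np.zeros(256)
W2, b2 = np.random.randn(256, 128) * 0.02, np.zeros(128)
Z = projection_head(X, W1, b1, W2, b2)
print(np.allclose(np.linalg.norm(Z, axis=1), 1.0))  # True
```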
3. Training Protocols and Hyperparameters
Training procedures across CORE instantiations consistently employ small batch sizes, Adam or SGD optimizers, and tuned contrastive temperatures.
| Domain | Encoder Backbone | Batch Size | Embedding Dim | Key Loss Temp. | Task-specific Features |
|---|---|---|---|---|---|
| Text (BioNLP) | CharacterBERT | 8–16 | 768 | 0.5 | GCN for relation graphs |
| Knowledge Graphs | Clustered GNN | varies | set per valid. | — | One encoder per relation cluster |
| Scene Text | ResNet-50 + FPN | 16 | 128/1024 | 0.2 | Relation block, 2-layer MLP |
Early stopping is used on validation loss. For CORE-KG, relation clusters are determined by running t-SNE followed by K-Means on GloVe relation name embeddings. For CORE-Text, anchor aspect ratios, loss temperature, and contrastive loss weight are selected by tuning for optimal detection Hmean.
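The cluster-assignment procedure for CORE-KG can be sketched with a plain K-Means pass over relation-name embeddings. This sketch omits the t-SNE dimensionality-reduction step, uses a deterministic spread initialization instead of random restarts, and substitutes toy vectors for GloVe embeddings:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's K-Means over relation-name embeddings; returns the
    cluster assignment of each row of X. Initial centers are taken at
    evenly spread row indices for determinism."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].astype(float)
    for _ in range(iters):
        # squared distances from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

# Toy "relation embeddings": two well-separated groups stand in for
# GloVe vectors of relation names (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 8)), rng.normal(5, 0.1, (5, 8))])
labels = kmeans(X, k=2)
print(labels)  # [0 0 0 0 0 1 1 1 1 1]
```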
4. Empirical Results and Benchmarks
4.1 Relation Extraction in Text (Theodoropoulos et al., 2021)
- Strict RE (with KNN classifier in learned space, ADE dataset, macro-averaged F1):
- Baseline (vanilla CharacterBERT + linear): 66.8 F1
- Linear evaluation of CharacterBERT_CLDR: 81.73 F1
- KNN in CLDR space: 79.97 F1
- RE-only (upper bound): 86.5 F1
- State of the art: 80.1–81.14 F1
- Named Entity Recognition (CLNER):
- KNN classifier on CLNER space: 88.3 F1
4.2 Inductive KG Completion (Wu et al., 2022)
- Datasets: FB15k-237 (long-tail), NELL-995.
- Metrics: AUC-PR, Hits@10 (rank among 50 negatives).
- Performance:
- ReCoLe (CORE-KG) surpasses prior art (GraIL, CoMPILE, Meta-iKG) by 1–3 pts AUC-PR and up to 7 pts Hits@10.
- On long-tail relations, CORE shows 8–10 pts AUC advantage over PathCon.
- Ablations: Removal of cluster sampling or contrastive pretraining significantly degrades performance (–10.3 and –3.0 pts Hits@10, respectively).
4.3 Scene Text Detection (CORE-Text) (Lin et al., 2021)
- Hmean improvements on four benchmarks:
- ICDAR 2017 MLT (val): Baseline 80.0%, relation module 81.1%, full CORE 82.1%
 - ICDAR 2017 MLT (test): 78.7% (baseline 77.2%)
 - ICDAR 2015: 89.3% (baseline 88.2%)
 - CTW1500: 85.7% (baseline 84.9%)
 - Total-Text: 86.3% (baseline 85.3%)
- Notably, the number of erroneous “sub-text” fragments reduced by over 400 on MLT validation.
5. Relational Structure and Negative Sampling
CORE leverages hard negative sampling and explicit modeling of relational semantics to maximize representation discriminability:
- Text: Negative relation graphs are generated by replacing one endpoint of a drug–AE pair with a random token of the opposite type.
- KGs: Positives come from the same relation cluster; negatives from other clusters, defined by unsupervised clustering of relation names.
- Vision: Positives are all proposals (sub- and full-text) for the same ground-truth text instance, negatives are proposals for other instances.
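The text-domain negative generation above can be sketched with toy vocabularies (the helper and token lists are illustrative, not the paper's data):

```python
import random

def make_negative(pair, drug_tokens, ae_tokens, rng=None):
    """Corrupt a (drug, AE) pair by replacing one endpoint with a random
    token of the opposite type, producing a hard negative pair."""
    rng = rng or random.Random(0)
    drug, ae = pair
    if rng.random() < 0.5:
        # replace the drug endpoint with a token from the AE vocabulary
        return (rng.choice([t for t in ae_tokens if t != ae]), ae)
    # otherwise replace the AE endpoint with a token from the drug vocabulary
    return (drug, rng.choice([t for t in drug_tokens if t != drug]))

drugs = ["aspirin", "ibuprofen"]
aes = ["nausea", "rash", "headache"]
neg = make_negative(("aspirin", "nausea"), drugs, aes)
print(neg != ("aspirin", "nausea"))  # True
```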
Visualization (t-SNE) demonstrates that contrastively learned relation/entity spaces yield well-separated clusters for true versus negative pairs (Fig. 6/8 in (Theodoropoulos et al., 2021)).
6. Extensions, Joint Inference, and Significance
Multiple works report that contrastive pretraining on relation structure and entity class with dedicated embedding spaces enables highly accurate non-parametric classification with simple KNN, competitive with more complex and opaque models.
- Text: Joint entity-relation inference is performed by predicting relations in the CLDR space and entities in the CLNER space, yielding near state-of-the-art performance on ADE with transparent inference schemes.
- Knowledge Graph: Inductive generalization to unseen entities and relations is natively supported, as entity IDs are not encoded, and new relations can be classified into clusters at test time.
- Scene Text: CORE modules substantially reduce “sub-text” errors and support plug-in integration into standard detectors, with robust performance gains across multiple benchmarks.
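The non-parametric KNN inference in a learned embedding space can be sketched as follows, using a toy two-class space in place of the CLDR/CLNER embeddings:

```python
import numpy as np

def knn_predict(queries, bank, labels, k=3):
    """Classify queries by cosine-similarity KNN majority vote against a
    labeled memory bank of embeddings."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sim = q @ b.T                             # cosine similarity matrix
    top = np.argsort(-sim, axis=1)[:, :k]     # k nearest neighbors per query
    preds = []
    for row in top:
        vals, counts = np.unique(labels[row], return_counts=True)
        preds.append(vals[counts.argmax()])   # majority vote
    return np.array(preds)

# Toy embedding space: class 0 clusters near e0, class 1 near e1
rng = np.random.default_rng(0)
bank = np.vstack([rng.normal(0, 0.1, (10, 16)) + np.eye(16)[0],
                  rng.normal(0, 0.1, (10, 16)) + np.eye(16)[1]])
labels = np.array([0] * 10 + [1] * 10)
queries = np.vstack([np.eye(16)[0], np.eye(16)[1]])
preds = knn_predict(queries, bank, labels)
print(preds)  # [0 1]
```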
A plausible implication is that contrastive relational learning provides a scalable, general mechanism for imparting symbolic or graph-level relational inductive bias in neural architectures across modalities. This is particularly impactful in domains where structure and instance grouping are critical for downstream reasoning and prediction.
7. Implementation Best Practices and Constraints
- For text applications: Freeze lower transformer layers during fine-tuning to preserve general representations; pool embeddings appropriately for the task variant (mean or [CLS]).
- For knowledge graphs: Cluster relations on pre-trained semantic embeddings to share statistical strength among similar relations.
- For vision: Use two CORE modules, with 16 relation heads of hidden size 64, and tune the contrastive temperature and loss weight for optimal accuracy and convergence.
- Contrastive loss implementation: Always combine with strong negative sampling, matching positive/negative ratios per anchor, and conduct ablation to assess contribution of each loss term.
These design practices collectively support the robust, generalizable, and highly effective realization of contrastive relational encoding across distinct task settings.