Joint Generalized Cosine Similarity (JGCS)
- JGCS is a family of similarity measures that extends classical cosine similarity through convex cost functions, learned metric tensors, and hypervolume angles for multi-vector and multi-modal comparisons.
- JGCS is applied in semantic alignment, cross-domain matching, and contrastive learning, enhancing tasks like word similarity, visual matching, and multi-modal classification.
- JGCS improves task adaptation and computational efficiency by incorporating context-sensitive formulations that recover classical cosine similarity as a special case under specific conditions.
Joint Generalized Cosine Similarity (JGCS) is a family of similarity measures that generalize the classical cosine similarity to accommodate a broad range of data structures, learning contexts, and multi-vector comparisons. JGCS encompasses formulations based on convex cost functions (“Bregman cosine”), metric tensors, affine-quadratic forms for cross-domain matching, and the extension to joint similarity over $k$ vectors via hypervolume angles. These generalizations enable task-adapted, context-sensitive, and multi-modal alignment, surpassing the limitations of standard pairwise cosine similarity in both expressive power and computational efficiency.
1. Mathematical Formulations of JGCS
At its core, JGCS refers to any of several extensions of cosine similarity designed to capture richer geometric, statistical, and algebraic relationships.
1.1. Bregman (Convex Cost Function) JGCS
Given a convex cost function $\varphi$, the similarity between two points $x$ and $y$ is defined as
$$\mathrm{JGCS}_\varphi(x, y) = \frac{\langle g_x, g_y \rangle}{\|g_x\|\,\|g_y\|},$$
where $g_x = \nabla\varphi(x)$ if $\varphi$ is differentiable at $x$, and otherwise any subgradient $g_x \in \partial\varphi(x)$, with practical schemes choosing the subgradient so as to maximize the similarity (Gunay et al., 2014); a minimal sketch of this construction follows the special cases below. Special cases include:
- Negative entropy: $\varphi(x) = \sum_i x_i \log x_i$, with $\nabla\varphi(x)_i = 1 + \log x_i$
- Total variation: $\varphi(x) = \sum_i |x_{i+1} - x_i|$, with the subgradient given by the signed finite differences
- Euclidean: $\varphi(x) = \tfrac{1}{2}\|x\|^2$, $\nabla\varphi(x) = x$, recovering ordinary cosine similarity
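The following minimal NumPy sketch illustrates this construction; the function names and the specific gradient choices are illustrative assumptions rather than code from Gunay et al. (2014).

```python
import numpy as np

def convex_cost_cosine(x, y, grad_phi):
    """Cosine similarity between the (sub)gradients of a convex cost phi."""
    gx, gy = grad_phi(x), grad_phi(y)
    return float(gx @ gy / (np.linalg.norm(gx) * np.linalg.norm(gy)))

# Euclidean cost phi(x) = 0.5 * ||x||^2: gradient is x, so this is plain cosine.
grad_euclidean = lambda x: x

# Negative entropy phi(x) = sum_i x_i log x_i (x > 0): gradient is 1 + log x.
grad_neg_entropy = lambda x: 1.0 + np.log(x)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.4, 0.5])
print(convex_cost_cosine(x, y, grad_euclidean))    # ordinary cosine similarity
print(convex_cost_cosine(x, y, grad_neg_entropy))  # entropy-weighted variant
```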
1.2. Metric Tensor (Learned Inner Product) JGCS
Let $M$ be a learned symmetric positive semi-definite matrix. For $x, y \in \mathbb{R}^d$,
$$\mathrm{JGCS}_M(x, y) = \frac{x^\top M y}{\sqrt{x^\top M x}\,\sqrt{y^\top M y}},$$
with $M$ parameterized via a learned linear map $W$ (typically $M = W^\top W$) to ensure $M$ is PSD (Vos et al., 2022). This enables context-sensitive and data-driven geometric adaptation.
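A minimal sketch of this learned-metric similarity, assuming the $M = W^\top W$ parameterization described above (dimensions and initialization are illustrative):

```python
import numpy as np

def metric_cosine(x, y, W):
    """Cosine similarity under the learned metric M = W^T W (PSD by construction)."""
    Wx, Wy = W @ x, W @ y                          # x^T M y = (W x) . (W y)
    num = Wx @ Wy
    den = np.sqrt(Wx @ Wx) * np.sqrt(Wy @ Wy)      # sqrt(x^T M x) * sqrt(y^T M y)
    return float(num / den)

d, k = 768, 64                                     # embedding dim, projection dim (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(k, d)) / np.sqrt(d)           # stand-in for a learned linear map
x, y = rng.normal(size=d), rng.normal(size=d)
print(metric_cosine(x, y, W))                      # W = identity recovers standard cosine
```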
1.3. Affine-Quadratic Form JGCS for Cross-Domain Matching
For $x \in \mathbb{R}^{d_x}$, $y \in \mathbb{R}^{d_y}$ (possibly high-level features from different domains), JGCS is defined as a quadratic form over the augmented vector $[x; y; 1]$:
$$S(x, y) = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}^\top \begin{bmatrix} A & C & d \\ C^\top & B & e \\ d^\top & e^\top & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},$$
where $A, B$ are PSD, $C$ is the cross-domain matrix, $d, e$ are affine terms, and $f$ is a scalar bias (Lin et al., 2016). This can be decomposed into a weighted sum of affine Mahalanobis distance and affine cosine terms, subsuming both as special cases.
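Expanded, the block quadratic form contributes Mahalanobis-like, cross-domain, and affine components. The sketch below assumes the $[x; y; 1]$ augmentation given above and uses placeholder parameters rather than the learned network weights of Lin et al. (2016).

```python
import numpy as np

def affine_quadratic_similarity(x, y, A, B, C, d_vec, e_vec, f):
    """S(x, y) = [x; y; 1]^T [[A, C, d], [C^T, B, e], [d^T, e^T, f]] [x; y; 1]."""
    return float(x @ A @ x + y @ B @ y                        # Mahalanobis-like terms
                 + 2.0 * (x @ C @ y)                          # cross-domain (cosine-like) term
                 + 2.0 * (d_vec @ x) + 2.0 * (e_vec @ y) + f) # affine terms and bias

dx, dy = 128, 64
rng = np.random.default_rng(1)
Ra, Rb = rng.normal(size=(dx, dx)), rng.normal(size=(dy, dy))
A, B = Ra @ Ra.T, Rb @ Rb.T                                   # PSD blocks
C = rng.normal(size=(dx, dy))                                 # cross-domain matrix
d_vec, e_vec, f = rng.normal(size=dx), rng.normal(size=dy), 0.0
x, y = rng.normal(size=dx), rng.normal(size=dy)
print(affine_quadratic_similarity(x, y, A, B, C, d_vec, e_vec, f))
```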
1.4. Multi-Vector/Modal JGCS via Gram Determinant Hypervolume
For vectors $v_1, \dots, v_k \in \mathbb{R}^d$, the JGCS is defined through the normalized Gram matrix $\hat G$ with entries $\hat G_{ij} = \langle v_i, v_j \rangle / (\|v_i\|\,\|v_j\|)$, whose determinant is the squared hypervolume of the parallelotope spanned by the unit-normalized vectors:
$$\mathrm{JGCS}(v_1, \dots, v_k) = \cos\theta_k = \sqrt{1 - \det \hat G}.$$
This recovers standard cosine similarity for $k = 2$ but enables well-defined joint similarity over any number of vectors (Chen et al., 6 May 2025).
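The sketch below computes this joint similarity directly from the normalized Gram matrix; it follows the formulation stated above, and the exact sign and normalization conventions of Chen et al. (6 May 2025) may differ.

```python
import numpy as np

def hypervolume_jgcs(vectors):
    """Joint similarity of k vectors via the normalized Gram determinant.

    det(G_hat) is the squared hypervolume of the parallelotope spanned by the
    unit-normalized vectors; sqrt(1 - det(G_hat)) equals |cos(theta)| for k = 2,
    1 for linearly dependent inputs, and 0 for mutually orthogonal ones.
    """
    V = np.stack([v / np.linalg.norm(v) for v in vectors])   # (k, d), unit rows
    G_hat = V @ V.T                                           # k x k normalized Gram matrix
    return float(np.sqrt(max(0.0, 1.0 - np.linalg.det(G_hat))))
```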
2. Theoretical Properties and Special Cases
JGCS possesses key invariances and compatibility with geometric intuition:
- Rotation invariance: Under orthogonal transformations, both the Bregman and hypervolume-based JGCS remain unchanged.
- Downward compatibility: The hypervolume formulation yields the ordinary cosine for $k = 2$; the metric-tensor and convex cost function approaches recover standard cosine when $M = I$ or $\varphi(x) = \tfrac{1}{2}\|x\|^2$.
- Permutation symmetry: Joint similarity is invariant to the order of vectors up to sign (hypervolume angle) (Chen et al., 6 May 2025).
- Sensitivity to data semantics: By choosing $\varphi$, $M$, or the affine maps appropriately, JGCS can emphasize distribution shape (via negative entropy), edge/variation structure (total variation), or task/context-specific axes (learned $M$ / $W$) (Gunay et al., 2014, Vos et al., 2022).
- Degeneracy/extrema: For linearly dependent vectors (in hypervolume JGCS), $\det \hat G = 0$ and JGCS $= 1$; for mutually orthogonal vectors, $\det \hat G = 1$ and JGCS $= 0$ (both checked numerically in the sketch below).
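These extrema and the $k = 2$ compatibility can be verified numerically; the snippet restates the Section 1.4 sketch in compact form so that it is self-contained.

```python
import numpy as np

def hypervolume_jgcs(vectors):
    V = np.stack([v / np.linalg.norm(v) for v in vectors])
    return float(np.sqrt(max(0.0, 1.0 - np.linalg.det(V @ V.T))))

u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
w = np.array([1.0, 1.0, 0.0])
assert abs(hypervolume_jgcs([u, v]) - 0.0) < 1e-9                 # orthogonal pair -> 0
assert abs(hypervolume_jgcs([u, w]) - np.cos(np.pi / 4)) < 1e-9   # k = 2 recovers |cos 45 deg|
assert abs(hypervolume_jgcs([u, v, w]) - 1.0) < 1e-9              # dependent triple -> 1
```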
3. Algorithmic and Computational Considerations
Most JGCS variants introduce computational overhead compared to plain cosine, but retain tractability and favorable scaling.
| Variant | Main Computational Cost | Key Notes |
|---|---|---|
| Bregman/convex cost (Gunay et al., 2014) | Gradient/subgradient evaluation | Subgradient selection needed for nondifferentiable $\varphi$ |
| Metric tensor (Vos et al., 2022) | $O(d^2)$ for $x^\top M y$ | PSD enforced via $M = W^\top W$ parameterization |
| Affine-quadratic form (Lin et al., 2016) | Quadratic form in $[x; y; 1]$; $A$/$B$/$C$ learned as weights in deep net | End-to-end learning via hinge loss |
| Hypervolume/k-modal (Chen et al., 6 May 2025) | $O(k^2 d + k^3)$ per $k$-tuple | Gram det via Cholesky; efficient for moderate $k$ |
The hypervolume JGCS's $O(k^2 d + k^3)$ scaling remains practical for typical $k$, while avoiding the combinatorial redundancy of computing all $\binom{k}{2}$ pairwise cosines as $k$ increases (Chen et al., 6 May 2025). For nondifferentiable convex $\varphi$, subgradient maximization may add further cost, though the similarity reduces to standard norms and inner products in smooth cases (Gunay et al., 2014).
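For nearly singular Gram matrices, a Cholesky factorization of a slightly shifted Gram matrix is one plausible way to keep the determinant computation stable; the shift value below is an illustrative assumption.

```python
import numpy as np

def regularized_gram_logdet(V, eps=1e-6):
    """log det of the normalized, diagonally shifted Gram matrix of the rows of V.

    Cost: O(k^2 d) for the Gram matrix plus O(k^3) for the Cholesky factor.
    """
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)   # unit-normalize each of the k rows
    G = Vn @ Vn.T + eps * np.eye(V.shape[0])            # regularized k x k Gram matrix
    L = np.linalg.cholesky(G)                           # G = L L^T, L lower-triangular
    return 2.0 * float(np.sum(np.log(np.diag(L))))      # log det G = 2 * sum log diag(L)

# The joint similarity then follows as sqrt(max(0, 1 - exp(logdet))).
```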
4. Applications
JGCS has been applied across embedding evaluation, visual domain-matching, and multi-modal contrastive learning:
- Word/semantic similarity: Metric-tensor JGCS (learned $M$) achieves higher correlation with human ratings than standard cosine, especially with contextualized BERT/GPT-2 embeddings and task-adapted context metrics (Vos et al., 2022).
- Cross-domain visual matching: The affine-quadratic JGCS enables domain-robust matching (e.g., matching ID photos to surveillance images) within a single end-to-end learned deep network, blending Mahalanobis and cosine terms (Lin et al., 2016).
- N-way semantic alignment and contrastive learning: The hypervolume-based JGCS admits direct joint alignment of an arbitrary number of modalities, reducing the need for all pairwise losses and improving convergence, resistance to semantic collapse, and noise robustness on multi-modal datasets (e.g., clinical/dermatoscopic/image metadata); a simplified loss sketch follows this list (Chen et al., 6 May 2025).
- Kernel and manifold learning: The Gram-hypervolume angle has been suggested for multi-vector kernels, regularization, and sensor fusion (Chen et al., 6 May 2025).
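A simplified PyTorch sketch of such a joint objective is shown below: it drives the normalized Gram determinant of matched $k$-tuples toward zero (i.e., joint similarity toward 1). This is an illustration of the idea only, not the exact GHA Loss of Chen et al. (6 May 2025), and it omits the treatment of mismatched (negative) tuples.

```python
import torch
import torch.nn.functional as F

def joint_alignment_loss(embeddings, eps=1e-6):
    """embeddings: list of k tensors, each of shape (batch, dim), one per modality."""
    V = torch.stack(embeddings, dim=1)                     # (batch, k, dim)
    V = F.normalize(V, dim=-1)                             # unit-normalize each embedding
    G = V @ V.transpose(1, 2)                              # (batch, k, k) normalized Gram matrices
    G = G + eps * torch.eye(G.shape[-1], device=G.device)  # diagonal shift for stability
    logdet = torch.logdet(G)                               # log squared hypervolume per tuple
    return torch.exp(logdet).mean()                        # small det <=> matched tuple nearly collinear
```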
5. Empirical Performance and Practical Impact
Empirical evaluations document consistent performance improvements from JGCS-based methods:
- Contextual word similarity: Relative correlation gains (Spearman/Pearson) of +150–652% over plain cosine similarity on SimLex-999 and task-derived datasets with BERT-class embeddings (Vos et al., 2022).
- Cross-modal vision tasks: Superior retrieval/identification accuracy over existing state-of-the-art on re-identification and multimodal face verification tasks (Lin et al., 2016).
- Multi-modal medical classification: On the Derm7pt tri-modal skin lesion dataset, GHA Loss (based on hypervolume JGCS) outperforms dual/pairwise-objective baselines by +2–5% accuracy and up to +5.4 points macro-F1, with enhanced robustness to injected Gaussian noise (mean error grows approximately linearly with the noise standard deviation over the tested range) (Chen et al., 6 May 2025).
- Computational efficiency: For practical $k$, GHA Loss remains faster than explicit pairwise InfoNCE, with $O(k^2 d + k^3)$ scaling and minimal additional cost for the $k \times k$ determinants (Chen et al., 6 May 2025).
6. Limitations and Possible Extensions
Limitations observed include:
- Metric tensor (contextual) JGCS: Linear transformations may be insufficient for highly nonlinear context adaptation; parameter counts can grow rapidly with dimensionality (Vos et al., 2022).
- Bregman/convex JGCS: Subgradient choice is nontrivial for nonsmooth $\varphi$ and may have ambiguous optimality (Gunay et al., 2014).
- Hypervolume (N-modal) JGCS: The Gram determinant is numerically sensitive for nearly collinear vectors but can be regularized by a small diagonal shift $\hat G + \epsilon I$ (Chen et al., 6 May 2025).
Suggested extensions:
- Nonlinear context metrics for capturing more complex data dependencies (Vos et al., 2022).
- Low-rank or sparse $M$ for computational/regularization benefits.
- Kernelization for multi-modal learning, e.g., replacing the inner products in the Gram matrix with kernel evaluations $k(v_i, v_j)$; see the sketch after this list (Chen et al., 6 May 2025).
- Eigendecomposition for interpretability of learned metrics (Vos et al., 2022).
- Application to multi-target tracking, sensor fusion, and attention gating (Chen et al., 6 May 2025).
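As an illustration of the kernelization idea mentioned above, the sketch below replaces the inner products in the normalized Gram matrix with kernel evaluations; the RBF kernel and function names are illustrative assumptions, not constructs from the cited work.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def kernelized_jgcs(vectors, kernel=rbf_kernel):
    """Hypervolume-style joint similarity with kernel evaluations in place of inner products."""
    K = np.array([[kernel(vi, vj) for vj in vectors] for vi in vectors])
    d = np.sqrt(np.diag(K))
    K_hat = K / np.outer(d, d)                       # normalized kernel Gram matrix
    return float(np.sqrt(max(0.0, 1.0 - np.linalg.det(K_hat))))
```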
7. Relationship to Classical Cosine and Other Similarities
JGCS constitutes a strict generalization of cosine similarity. For the Euclidean cost $\varphi(x) = \tfrac{1}{2}\|x\|^2$, identity metric $M = I$, or $k = 2$ in the hypervolume angle, all JGCS variants reduce exactly to the ordinary cosine. For more structured or context-sensitive data, the measure embodies rich task- and data-dependent inductive biases, strictly increasing discriminative and alignment capacity over traditional cosine, Euclidean, or Mahalanobis comparisons (Gunay et al., 2014, Vos et al., 2022, Lin et al., 2016, Chen et al., 6 May 2025).
A plausible implication is that as the complexity and heterogeneity of data modalities increase, JGCS's joint, geometry-aware, and context-adaptive formalism will continue to supplant pairwise similarity techniques in both supervised and unsupervised settings.