Optimization under Similarity
- Optimization under similarity is a framework that formalizes data, function, or solution likeness into explicit constraints or objectives to enable sharper convergence guarantees.
- It underpins methods such as variance reduction, gradient sliding, and metric learning that accelerate performance in distributed, federated, and deep embedding learning scenarios.
- Practical applications span robust federated learning, imaging with SSIM, and quantum chemistry, where similarity-based acceleration significantly reduces communication and computation costs.
Optimization under similarity refers to a class of mathematical and algorithmic frameworks in which the objective, the constraints, or the solution process of an optimization problem is expressed, regularized, or accelerated by leveraging some form of similarity—between data, functions, or solutions. In both continuous and discrete optimization, this encompasses pairwise similarity optimization in representation learning, federated and decentralized optimization leveraging functional or Hessian similarity, similarity-aware communication-efficient algorithms in distributed optimization, empirical and statistical learning using function closeness, and several specialized applications where the notion of similarity guides design, analysis, and convergence.
1. Formal Definitions and Models of Similarity
A fundamental organizing principle across recent research is explicit quantification of similarity in operator, function, or Hessian space:
- Hessian-Relatedness and Second-order Similarity: In distributed or federated settings, data or function similarity is often formalized as a uniform or average bound on the deviation of local Hessians from the global average, i.e., $\|\nabla^2 f_i(x) - \nabla^2 f(x)\| \le \delta$ for all $x$ and clients $i$, with $\delta$ the similarity constant. In practice, such similarity arises under mild distributional assumptions (e.g., i.i.d. data splits), yielding $\delta \ll L$ (the usual smoothness constant); a small numerical check of this appears after this list. This notion underpins the acceleration of distributed optimization and is the driver behind communication gains (Bylinkin et al., 2024, Khaled et al., 2022, Lin et al., 2023, Tian et al., 2021, Takezawa et al., 6 Jun 2025, Zhou et al., 2024).
- Function Closeness—$(\epsilon, \delta)$-closeness: Huang & Wang define a two-parameter closeness relation for functions $f$ and $g$, writing $f \approx_{(\epsilon,\delta)} g$ if the suboptimality gap of each function controls that of the other, in both directions, up to terms governed by $\epsilon$ and $\delta$. This provides a general framework for interpolating between different statistical learning and online optimization settings, and allows for transfer and unification of classical risk/concentration bounds in terms of closeness (Huang et al., 14 Jan 2025).
- Pairwise Similarity in Deep Embedding Learning: Pair similarity-based objectives, typified by maximizing within-class and minimizing between-class similarity metrics (e.g., cosine similarities between feature vectors or proxies), are structurally encoded in metric learning losses (e.g., Circle Loss, triplet loss). Optimization under such similarity is foundational for discriminative representation learning (Sun et al., 2020).
- Similarity Measures for Functions and Aggregations: Task-specific or operator-specific similarity notions—such as risk similarity in statistical learning, structural similarity in imaging (e.g., SSIM), or proxy-loss similarity for robust federated aggregation—enter either as the explicit objective or as a surrogate that facilitates accelerated computation (Otero et al., 2020, Gaucher et al., 3 Feb 2026, Naser et al., 2024).
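To make the second-order condition above concrete, the following sketch (a minimal illustration, not drawn from any of the cited works) empirically estimates the uniform similarity constant $\delta$ for ridge regression with i.i.d. data splits; the dimension, client count, sample size, and regularization value are arbitrary choices for illustration.

```python
import numpy as np

def client_hessian(X, lam):
    """Hessian of the local ridge objective (1/(2m))||Xw - y||^2 + (lam/2)||w||^2."""
    m, d = X.shape
    return X.T @ X / m + lam * np.eye(d)

rng = np.random.default_rng(0)
d, n_clients, m, lam = 20, 10, 5000, 1e-2   # hypothetical sizes for illustration

# i.i.d. splits: every client draws features from the same distribution
hessians = [client_hessian(rng.normal(size=(m, d)), lam) for _ in range(n_clients)]
H_avg = sum(hessians) / n_clients

delta = max(np.linalg.norm(H - H_avg, 2) for H in hessians)  # uniform similarity constant
L = max(np.linalg.norm(H, 2) for H in hessians)              # smoothness constant

# delta is much smaller than L and shrinks further as the per-client sample size grows
print(f"delta = {delta:.3f}   L = {L:.3f}")
```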
2. Principal Methodologies Leveraging Similarity
Across application domains, similarity serves three primary algorithmic purposes: establishing sharper convergence bounds, enabling communication or computation reductions, or directly forming the optimization objective.
- Acceleration via Similarity in Distributed and Federated Optimization:
- Variance-Reduced and Sliding Methods: When $\delta$-similarity holds, gradient sliding or variance-reduction techniques (e.g., SVRS, SVRP, AccSVRS, Catalyzed SVRP) exploit the fact that global and local objectives differ only in low curvature, thus allowing most computation to be performed locally with infrequent aggregation, yielding communication complexities of
$$\tilde{\mathcal{O}}\big(n + \sqrt{n}\,\delta/\mu\big) \quad\text{and}\quad \tilde{\mathcal{O}}\big(n + n^{3/4}\sqrt{\delta/\mu}\big)$$
for SVRS and its acceleration AccSVRS, respectively, where $n$ is the number of nodes and $\mu$ the strong convexity constant (Lin et al., 2023, Khaled et al., 2022). A schematic sketch of the mechanism these methods share appears after this group of bullets.
- Accelerated Proximal and Gradient Sliding: Composite-case acceleration (e.g., AGS, SC-AccExtragradient, AccVRCS) achieves per-group communication complexities reflecting each component's similarity parameter, e.g., separate round counts for frequent and for rare clients when Hessian similarity is treated asymmetrically (Kovalev et al., 2022, Bylinkin et al., 13 Jan 2026, Takezawa et al., 6 Jun 2025).
- Compression under Similarity: Communication cost can be further reduced by unbiased or biased gradient compression; under $\delta$-similarity, algorithms such as OLGA and EF-OLGA invoke variance reduction and error feedback to achieve complexities that scale with $\delta$ (and the number of nodes $n$) rather than with the global smoothness, for unbiased and biased compressors, respectively (Bylinkin et al., 2024).
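The mechanism these sliding and variance-reduced schemes share can be sketched on a toy problem: the server aggregates one full gradient per round, while a single client solves a proximally regularized local subproblem whose curvature differs from the global one only by $\delta$. The sketch below is not any one published algorithm; the quadratic objectives, the closed-form subproblem solve, and the prox parameter $\theta \approx 2\delta$ are illustrative assumptions chosen to show linear convergence governed by similarity rather than smoothness.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_clients, rounds = 10, 5, 40

def sym(S):
    return (S + S.T) / 2

# Local quadratics f_i(x) = 0.5 x^T A_i x - b_i^T x with similar Hessians:
# A_i = M + E_i with small E_i, so delta = max_i ||A_i - A_bar|| stays small.
M = np.eye(d) + 0.5 * np.diag(rng.uniform(size=d))
A = [M + 0.05 * sym(rng.normal(size=(d, d))) for _ in range(n_clients)]
b = [rng.normal(size=d) for _ in range(n_clients)]
A_bar, b_bar = sum(A) / n_clients, sum(b) / n_clients

x_star = np.linalg.solve(A_bar, b_bar)                    # global minimizer
delta = max(np.linalg.norm(Ai - A_bar, 2) for Ai in A)    # Hessian similarity constant
theta = 2 * delta                                         # prox parameter ~ delta

x = np.zeros(d)
for k in range(rounds):
    # one communication round: clients send gradients, server aggregates
    g_global = A_bar @ x - b_bar
    g_local = A[0] @ x - b[0]
    # heavy lifting stays local on client 0: solve
    #   argmin_z f_0(z) + <g_global - g_local, z> + (theta/2) ||z - x||^2
    x = np.linalg.solve(A[0] + theta * np.eye(d),
                        b[0] - (g_global - g_local) + theta * x)

print("distance to optimum:", np.linalg.norm(x - x_star))
```

Each outer iteration costs one gradient exchange, and the contraction factor depends on $\delta/\mu$ rather than on the smoothness of the individual quadratics, which is the essence of the communication gains described above.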
Pair Similarity and Metric Learning Losses:
In deep embedding learning, classic objectives (triplet, softmax, ArcFace) can be unified as special cases of similarity optimization losses over pairs of within-class ($s_p$) and between-class ($s_n$) similarities. Circle Loss introduces self-paced, adaptive weights:
$$\mathcal{L}_{\text{circle}} = \log\Big[\,1 + \sum_{j}\exp\big(\gamma\,\alpha_n^j\,(s_n^j - \Delta_n)\big)\sum_{i}\exp\big(-\gamma\,\alpha_p^i\,(s_p^i - \Delta_p)\big)\Big],\qquad \alpha_p^i = \big[O_p - s_p^i\big]_+,\quad \alpha_n^j = \big[s_n^j - O_n\big]_+,$$
where the coefficients $\alpha_p^i, \alpha_n^j$ adaptively amplify poorly optimized similarities, leading to a circular decision boundary in the $(s_n, s_p)$-plane—analytically guaranteeing sharper convergence behavior and empirically yielding state-of-the-art results in face recognition, re-ID, and fine-grained retrieval tasks (Sun et al., 2020).
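A compact NumPy rendition of this computation for a single anchor is given below; the margin and scale values are commonly used defaults, and the sample similarities are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def circle_loss(s_p, s_n, m=0.25, gamma=256.0):
    """Circle Loss for one anchor, given its within-class (s_p) and
    between-class (s_n) cosine similarities as 1-D arrays."""
    O_p, O_n = 1 + m, -m            # optima toward which s_p / s_n are pushed
    delta_p, delta_n = 1 - m, m     # decision margins
    alpha_p = np.clip(O_p - s_p, 0.0, None)   # weight grows when s_p lags behind O_p
    alpha_n = np.clip(s_n - O_n, 0.0, None)   # weight grows when s_n exceeds O_n
    logits_p = -gamma * alpha_p * (s_p - delta_p)
    logits_n = gamma * alpha_n * (s_n - delta_n)
    # log(1 + sum_j e^{logits_n[j]} * sum_i e^{logits_p[i]}), computed stably
    return np.logaddexp(0.0, logsumexp(logits_n) + logsumexp(logits_p))

# illustrative similarities: one well-optimized and one lagging positive pair,
# plus two negative pairs at different difficulty levels
print(circle_loss(np.array([0.9, 0.6]), np.array([0.4, 0.1])))
```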
Optimization Based on Structural or Task-specific Similarity:
- Imaging via SSIM: The Structural Similarity Index Measure (SSIM), widely used as a perceptual fidelity criterion, is employed as the core term in several imaging inverse problems. The corresponding optimization formulations—either constrained/quasiconvex or unconstrained/proximal—require specialized bisection, Newton-type, or ADMM-based solvers, leveraging the specific structure of SSIM and its gradients (Otero et al., 2020); a small numerical illustration of the SSIM fidelity term follows this list.
- Similarity-Transformed Hamiltonians in Quantum Chemistry: In many-body wave function optimization, the similarity transformation by Jastrow factors enables more effective orbital optimization by regularizing electron-electron cusp singularities. Self-consistent and one-shot schemes combining transcorrelated (TC) Hamiltonian optimization with variational Monte Carlo (VMC) iterations demonstrate systematic energy improvements in closed-shell atoms (Ochi, 2021).
- Similarity-Based Population or Solution Transformations:
- Metaheuristics with Multi-metric Solution Similarity: In nonconvex, black-box, or multi-objective optimization, algorithms such as SPINEX-Optimization build explicit similarity matrices (incorporating cosine, Pearson, Spearman, Euclidean kernels) across current populations, and use similarity-weighted linear transformations to both intensify local search and diversify global exploration. This approach is shown to achieve consistently high performance and robust explainability in benchmarking scenarios (Naser et al., 2024).
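As the small numerical illustration promised above, the sketch below evaluates the single-window (global) form of SSIM; in the cited formulations a windowed version of this measure is plugged into constrained or proximal problems and handled with the specialized solvers mentioned earlier. The image sizes and noise level here are illustrative.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Single-window (global) SSIM between two images, with the standard
    stabilization constants C1 = (0.01 R)^2 and C2 = (0.03 R)^2."""
    C1, C2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(2)
clean = rng.uniform(0, 255, size=(32, 32))
noisy = clean + rng.normal(scale=25, size=clean.shape)

print(ssim_global(clean, clean))   # 1.0 for identical images
print(ssim_global(noisy, clean))   # < 1: the fidelity term a solver seeks to maximize
```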
3. Analysis of Complexity, Communication, and Optimality
Formal exploitation of similarity constants allows optimization methods to break past traditional black-box or smoothness-governed complexity barriers:
| Method/Setting | Communication/Iteration Complexity | Similarity Parameter | Reference |
|---|---|---|---|
| SVRS (variance-reduction, finite sum) | $\tilde{\mathcal{O}}(n + \sqrt{n}\,\delta/\mu)$ | $\delta$ (AveSS) | (Lin et al., 2023) |
| AccSVRS (Katyusha-accelerated) | $\tilde{\mathcal{O}}(n + n^{3/4}\sqrt{\delta/\mu})$ | $\delta$ (AveSS) | (Lin et al., 2023) |
| OLGA/EF-OLGA (compression+acceleration) | compressor-dependent, scaling with $\delta/\mu$ | $\delta$ (Hessian) | (Bylinkin et al., 2024) |
| Proximal/Gradient Sliding (AGS) | $\tilde{\mathcal{O}}(\sqrt{\delta/\mu})$ rounds | $\delta$ (Hessian) | (Kovalev et al., 2022) |
| SPDO, ACC-SONATA (decentralized) | $\tilde{\mathcal{O}}(\sqrt{\delta/\mu})$ rounds, up to a network (spectral-gap) factor | $\delta$ (Functional) | (Takezawa et al., 6 Jun 2025, Tian et al., 2021) |
| Bregman Prox (arbitrary geometry) | $\tilde{\mathcal{O}}(\delta/\mu)$ rounds (strongly monotone VIs) | $\delta$ (Operator) | (Beznosikov et al., 2023) |
In all cases, when similarity is strong ($\delta \ll L$), the iteration or communication cost scales with $\delta/\mu$ or $\sqrt{\delta/\mu}$ rather than with the corresponding smoothness-based quantities $L/\mu$ or $\sqrt{L/\mu}$, and the rates (often up to log factors) match known lower bounds for distributed and decentralized optimization under similarity (Kovalev et al., 2022, Beznosikov et al., 2023, Tian et al., 2021, Bylinkin et al., 2024). Acceleration is typically achieved through proximal-point, extragradient, or Nesterov-type acceleration techniques, exploiting the geometry induced by function similarity.
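Schematically, and suppressing constants and logarithmic factors, the improvement over the classical smoothness-governed communication bound for accelerated first-order methods can be summarized as
$$\underbrace{\tilde{\mathcal{O}}\!\Big(\sqrt{L/\mu}\,\log\tfrac{1}{\varepsilon}\Big)}_{\text{similarity-agnostic acceleration}} \;\longrightarrow\; \underbrace{\tilde{\mathcal{O}}\!\Big(\sqrt{\delta/\mu}\,\log\tfrac{1}{\varepsilon}\Big)}_{\text{acceleration under }\delta\text{-similarity}}, \qquad \delta \ll L,$$
with the unaccelerated and variational-inequality settings replacing the square roots by a linear dependence on the respective condition-number-like quantities.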
4. Applications and Empirical Findings
Optimization under similarity underlies several key advances across distinct fields:
- Machine Learning (Deep Embedding and Metric Learning): Circle Loss brings finer control of pairwise similarity optimization, leading to empirically superior performance in face recognition, person re-identification, and fine-grained image retrieval benchmarks by ensuring embedding clusters concentrate around a single target point in the $(s_n, s_p)$ similarity plane (Sun et al., 2020).
- Distributed/Federated Machine Learning: Methods exploiting similarity constants are validated on large-scale datasets (ridge/logistic regression, MNIST, CIFAR-10), consistently outperforming both accelerated but non-similarity-based and classical uncompressed baselines, notably in high-heterogeneity regimes. For example, OLGA and EF-OLGA show faster convergence in terms of communication-equivalent floating-point operations as the similarity constant $\delta$ decreases (Bylinkin et al., 2024).
- Decentralized and Networked Systems: Stabilized or accelerated decentralized frameworks (SPDO, ACC-SONATA) achieve order-of-magnitude improvements in communication rounds over networks, with precise gains dictated by network spectral gap and functional similarity, confirmed on logistic regression over non-i.i.d. data partitions (Takezawa et al., 6 Jun 2025, Tian et al., 2021).
- Imaging and Vision: SSIM-based optimization directly yields higher subjective fidelity in denoising, deblurring, and inpainting, especially when combined with convex regularization and solved via Newton-type or ADMM splitting (Otero et al., 2020).
- Robust and Byzantine-Resistant Optimization: The PIGS ("Prox Inexact Gradient under Similarity") framework enables robust federated learning under adversarial (Byzantine) regimes, using proxy losses with small Hessian deviation and robust gradient aggregation to achieve communication complexities governed by the proxy-loss similarity rather than by global smoothness (Gaucher et al., 3 Feb 2026); a minimal robust-aggregation sketch follows this list.
- Structured Data Joins and Query Optimization: By reframing similarity-join as preference-maximizing set selection, the need for costly threshold tuning is eliminated, and efficient incremental algorithms achieve optimal or near-optimal F1 scores with substantially reduced computation (Gao et al., 2017).
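Robust aggregation of client (proxy-)gradients is the key primitive in such Byzantine-resistant schemes. The sketch below uses a coordinate-wise median—one standard robust aggregator, chosen here purely for illustration and not the specific rule analyzed in the cited work—to show how a minority of corrupted submissions is neutralized.

```python
import numpy as np

def robust_aggregate(grads):
    """Coordinate-wise median of client gradients: tolerates a minority of
    arbitrarily corrupted (Byzantine) submissions, unlike the plain mean."""
    return np.median(np.stack(grads), axis=0)

rng = np.random.default_rng(3)
honest = [np.ones(5) + 0.1 * rng.normal(size=5) for _ in range(7)]  # similar honest gradients
byzantine = [1e6 * rng.normal(size=5) for _ in range(2)]            # adversarial clients

print("mean  :", np.mean(np.stack(honest + byzantine), axis=0))  # wrecked by the outliers
print("median:", robust_aggregate(honest + byzantine))           # stays near the honest value
```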
5. Theoretical and Practical Extensions
- Adaptivity and Geometry Awareness: Recent advances generalize similarity-exploiting methods to arbitrary problem geometries using Bregman distances and mirror-prox mappings, allowing for simplex, PSD cone, or manifold constraints with matched lower bounds in monotone VI problems (Beznosikov et al., 2023).
- Composite and Heterogeneous Objectives: Frameworks have been developed to handle federated learning settings with component-wise similarity (i.e., group-separable similarity for frequent/rare clients), with probability-tuned "sliding" between components for per-group optimality (Bylinkin et al., 13 Jan 2026).
- Statistical Learning and Online Optimization: The (ε, δ)-closeness framework unifies empirical risk minimization rates and dynamic regret bounds, and renders explicit the translation of sub-optimality from empirical to population objectives (Huang et al., 14 Jan 2025).
- Natural Gradient and Riemannian Optimization: Optimizing generic similarity measures between probability distributions induces a canonical metric tensor on parameter space, justifying and generalizing natural gradient methods to arbitrary divergence functions (Mallasto et al., 2019); a minimal numerical sketch follows this list.
- Pareto-Efficient Multi-Objective Optimization: SPINEX-Optimization exemplifies the practical success of similarity-guided transformations for large-scale single, multi-, and many-objective search, yielding competitive scalability and explainability in complex benchmark suites (Naser et al., 2024).
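As the minimal numerical sketch promised above, the example below fits a univariate Gaussian by preconditioning the gradient with the Fisher metric $\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)$ that the KL divergence induces in the $(\mu, \sigma)$ parameterization; the data, step size, and iteration count are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=3.0, scale=2.0, size=1000)

def neg_log_lik_grad(mu, sigma):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    d_mu = -np.mean(data - mu) / sigma ** 2
    d_sigma = 1.0 / sigma - np.mean((data - mu) ** 2) / sigma ** 3
    return np.array([d_mu, d_sigma])

mu, sigma, eta = 0.0, 1.0, 0.1
for _ in range(200):
    g = neg_log_lik_grad(mu, sigma)
    F = np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])     # KL-induced (Fisher) metric
    mu, sigma = np.array([mu, sigma]) - eta * np.linalg.solve(F, g)

print(mu, sigma)   # approaches the sample mean and standard deviation
```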
6. Open Problems, Extensions, and Outlook
Several avenues remain active or open:
- Data-Driven or Dynamic Estimation of Similarity: Most optimality results assume a known similarity constant $\delta$; adaptively estimating or controlling similarity in real time is an open challenge.
- Beyond Second-order Similarity: Analysis and methods are being extended to capture higher-order (third-derivative) or more general functional similarities, as alluded to in recent decentralized optimization work (Takezawa et al., 6 Jun 2025).
- Unified, Black-box Lower Bounds: Theoretical lower bounds matching all aspects of communication and computation complexity under mixed similarity models remain incompletely characterized.
- Privacy and Robustness: Integration of privacy preservation and Byzantine-robustness for similarity-exploiting algorithms remains an area of rapid development (Gaucher et al., 3 Feb 2026).
- Nonconvex and Stochastic Regimes: Most analyses above apply to convex/strongly-convex objectives; nonconvex analogs, particularly for deep learning, are at the research frontier.
Optimization under similarity thus constitutes a multifaceted and strongly active research paradigm, unifying algorithmic acceleration, statistical efficiency, practical scalability, and robustness in modern large-scale learning and computational frameworks. The interplay between intrinsic problem similarity and computational architecture remains central for future advances.