Double-Bounded Optimal Transport for Advanced Clustering and Classification (2401.11418v1)
Abstract: Optimal transport (OT) is attracting increasing attention in machine learning. It aims to transport a source distribution to a target one at minimal cost. In its vanilla form, the source and target distributions are predetermined, which contrasts with real-world cases where the target is undetermined. In this paper, we propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of being fixed, thus giving the transport more freedom to find solutions. Based on the entropic regularization of DB-OT, we devise three scaling-based algorithms for computing the optimal solution. We also show that DB-OT is helpful for barycenter-based clustering, as it avoids the excessive concentration of samples in a single cluster. We then develop DB-OT techniques for long-tailed classification, an emerging and open problem. We first establish a connection between OT and classification: training involves optimizing the inverse OT to learn representations, while testing involves optimizing the OT for predictions. With this OT perspective, we apply DB-OT to improve the training loss, of which Balanced Softmax is shown to be a special case, and then apply DB-OT for inference at test time. Extensive experiments show that even with features trained by vanilla Softmax, our improved inference scheme achieves strong results in the testing stage.
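The abstract refers to scaling-based algorithms derived from the entropic regularization of DB-OT but does not spell them out. As a rough illustration only (not the paper's actual algorithms), the sketch below alternates Sinkhorn-style scalings: row sums are matched exactly to the source marginal, while column sums are merely clipped into the box `[lower, upper]`, reflecting the "two boundaries" on the target distribution. All names and data are hypothetical, and a faithful KL projection onto the box constraint would use Dykstra-style corrections rather than this plain alternation.

```python
import numpy as np

def db_ot_sinkhorn(C, a, lower, upper, eps=0.2, n_iter=1000):
    """Sinkhorn-style scaling sketch for entropic OT where the target
    marginal only has to lie between `lower` and `upper` element-wise.
    Row marginal `a` is enforced exactly; each column sum is rescaled
    toward the nearest point of its [lower_j, upper_j] interval."""
    K = np.exp(-C / eps)          # Gibbs kernel of the cost matrix
    P = K.copy()
    for _ in range(n_iter):
        # Row step: enforce P @ 1 = a exactly.
        P *= (a / P.sum(axis=1))[:, None]
        # Column step: clip each column sum into its box.
        col = P.sum(axis=0)
        P *= (np.clip(col, lower, upper) / col)[None, :]
    return P

# Toy example with hypothetical data: 4 source points, 3 clusters.
rng = np.random.default_rng(0)
C = rng.random((4, 3))            # random transport costs
a = np.full(4, 0.25)              # uniform source distribution
lower = np.full(3, 0.2)           # each cluster receives >= 0.2 ...
upper = np.full(3, 0.5)           # ... and <= 0.5 of the total mass
P = db_ot_sinkhorn(C, a, lower, upper)
```

The box constraint is what distinguishes this from vanilla Sinkhorn: when a column sum already lies inside its interval, the clip is a no-op and that column is left untouched, which is the "freedom" the abstract attributes to DB-OT and what prevents mass from over-concentrating in one cluster.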