Gradient Aligned Regression via Pairwise Losses (2402.06104v3)
Abstract: Regression is a fundamental task in machine learning that has garnered extensive attention over the past decades. The conventional approach to regression employs loss functions that focus on aligning the model prediction with the ground truth for each individual data sample. Recent work has introduced a new perspective by incorporating label similarity into regression through extra pairwise regularization on the latent feature space, and has demonstrated its effectiveness. However, these approaches have two drawbacks: i) their pairwise operation in the latent feature space is computationally more expensive than conventional regression losses; ii) they lack theoretical justification for such regularization. In this work, we propose GAR (Gradient Aligned Regression) as a competitive alternative operating in the label space, composed of a conventional regression loss and two pairwise label-difference losses that align gradients in both magnitude and direction. GAR enjoys: i) the same level of efficiency as a conventional regression loss, because the quadratic complexity of the proposed pairwise losses can be reduced to linear complexity; ii) theoretical insight connecting learning the pairwise label difference to learning the gradient of the ground-truth function. We limit the current scope to regression in the clean-data setting, without noise, outliers, or distributional shifts. We demonstrate the effectiveness of the proposed method on two synthetic datasets and on eight real-world tasks from six benchmark datasets against eight competitive baselines. Running-time experiments demonstrate the superior efficiency of GAR over existing methods with pairwise regularization in the latent feature space, and ablation studies confirm the effectiveness of each component of GAR.
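To make the efficiency claim concrete, below is a minimal PyTorch sketch, not the authors' reference implementation, of how pairwise label-difference losses over a batch can collapse from O(n^2) to O(n). The magnitude term uses the identity sum_{i,j}(r_i - r_j)^2 = 2n·sum_i r_i^2 - 2(sum_i r_i)^2 on the residuals r = pred - target; the direction term exploits the fact that the cosine similarity between the two n^2-dimensional pairwise-difference vectors equals the Pearson correlation between predictions and labels. The exact loss forms and the weights `alpha`/`beta` are illustrative assumptions, not necessarily GAR's formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_magnitude_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean over all (i, j) pairs of ((p_i - p_j) - (y_i - y_j))^2.

    With residuals r = pred - target, the double sum expands to
    2 * mean(r^2) - 2 * mean(r)^2, so it costs O(n) instead of O(n^2).
    """
    r = pred - target
    return 2.0 * (r.pow(2).mean() - r.mean().pow(2))

def pairwise_direction_loss(pred: torch.Tensor, target: torch.Tensor,
                            eps: float = 1e-8) -> torch.Tensor:
    """1 minus the cosine between the vectors of all pairwise prediction
    differences and all pairwise label differences.

    That cosine equals the Pearson correlation between pred and target,
    so it is likewise computable in O(n).
    """
    p = pred - pred.mean()
    y = target - target.mean()
    corr = (p * y).sum() / (p.norm() * y.norm() + eps)
    return 1.0 - corr

def gar_style_loss(pred, target, alpha=1.0, beta=1.0):
    """Conventional MSE plus the two pairwise terms (alpha/beta hypothetical)."""
    return (F.mse_loss(pred, target)
            + alpha * pairwise_magnitude_loss(pred, target)
            + beta * pairwise_direction_loss(pred, target))

if __name__ == "__main__":
    # Verify the O(n) magnitude term against the naive O(n^2) double sum.
    pred = torch.randn(256, dtype=torch.float64)
    target = torch.randn(256, dtype=torch.float64)
    naive = ((pred[:, None] - pred[None, :])
             - (target[:, None] - target[None, :])).pow(2).mean()
    assert torch.allclose(naive, pairwise_magnitude_loss(pred, target))
```

The `__main__` check materializes the full n-by-n difference matrices once to confirm that the closed-form O(n) expression matches the naive quadratic computation; in training, only the O(n) forms would be used.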