Delving into Deep Imbalanced Regression (2102.09554v2)

Published 18 Feb 2021 in cs.LG, cs.AI, and cs.CV

Abstract: Real-world data often exhibit imbalanced distributions, where certain target values have significantly fewer observations. Existing techniques for dealing with imbalanced data focus on targets with categorical indices, i.e., different classes. However, many tasks involve continuous targets, where hard boundaries between classes do not exist. We define Deep Imbalanced Regression (DIR) as learning from such imbalanced data with continuous targets, dealing with potential missing data for certain target values, and generalizing to the entire target range. Motivated by the intrinsic difference between categorical and continuous label space, we propose distribution smoothing for both labels and features, which explicitly acknowledges the effects of nearby targets, and calibrates both label and learned feature distributions. We curate and benchmark large-scale DIR datasets from common real-world tasks in computer vision, natural language processing, and healthcare domains. Extensive experiments verify the superior performance of our strategies. Our work fills the gap in benchmarks and techniques for practical imbalanced regression problems. Code and data are available at https://github.com/YyzHarry/imbalanced-regression.

Citations (226)

Summary

  • The paper defines deep imbalanced regression (DIR) by highlighting challenges in handling continuous targets where traditional classification methods fail.
  • The paper presents Label Distribution Smoothing (LDS) and Feature Distribution Smoothing (FDS) to adjust label and feature distributions for improved performance.
  • The paper demonstrates the effectiveness of these methods on five benchmark datasets, with the largest error reductions in underrepresented target regions.

Delving into Deep Imbalanced Regression

The paper addresses the challenges of Deep Imbalanced Regression (DIR), which arises when dealing with continuous targets and imbalanced data. This issue is prevalent in numerous applications across computer vision, NLP, and healthcare. Traditional methods mainly focus on categorical imbalance, making them unsuitable for continuous label spaces where class boundaries are non-existent.

Key Contributions

  1. DIR Definition and Challenges: DIR is defined as dealing with imbalanced continuous targets, requiring extrapolation and interpolation across the target space to generalize over the entire range. DIR differs from classification due to the absence of hard class boundaries and the meaningful distances between targets.
  2. Proposed Methods: The authors propose two techniques:
    • Label Distribution Smoothing (LDS): LDS convolves the empirical label density with a symmetric kernel, producing an effective density that accounts for the continuity of nearby targets and gives a more accurate picture of imbalance for regression tasks.
    • Feature Distribution Smoothing (FDS): FDS applies kernel smoothing to per-bin feature statistics across the target space and calibrates features accordingly, exploiting continuity in feature space to compensate for statistics that are biased in sparsely observed regions.
  3. Benchmark Datasets: The authors curate five DIR datasets spanning various domains, providing a comprehensive evaluation platform. This contribution fills the gap in benchmarking for imbalanced regression problems.
  4. Extensive Experiments: Results across the introduced datasets verify the effectiveness of the proposed methods. LDS and FDS significantly improve performance, especially in medium and few-shot regions, often outperforming existing techniques even when integrated with them.
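The LDS step described above can be sketched in a few lines of NumPy. This is an illustrative version, not the authors' released code; the bin count, kernel size, and bandwidth below are assumed defaults chosen for the example. The smoothed density is inverted into per-sample loss weights, one common way to use it.

```python
import numpy as np

def gaussian_kernel(ks=5, sigma=2.0):
    """Symmetric 1-D Gaussian kernel of odd size ks, normalized to sum to 1."""
    half = (ks - 1) // 2
    x = np.arange(-half, half + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def lds_weights(labels, num_bins=50, ks=5, sigma=2.0):
    """Per-sample weights from the kernel-smoothed (effective) label density."""
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(labels.min(), labels.max(), num_bins + 1)
    hist, _ = np.histogram(labels, bins=edges)
    # Smooth the empirical density so nearby targets share statistical strength.
    effective = np.convolve(hist.astype(float), gaussian_kernel(ks, sigma), mode="same")
    effective = np.clip(effective, 1e-6, None)
    # Inverse effective density, normalized to mean 1, as loss weights.
    idx = np.clip(np.digitize(labels, edges) - 1, 0, num_bins - 1)
    w = 1.0 / effective[idx]
    return w * len(w) / w.sum()

# Example: labels concentrated near 0 with a sparse tail near 1.
rng = np.random.default_rng(0)
labels = np.concatenate([rng.uniform(0.0, 0.2, 950), rng.uniform(0.8, 1.0, 50)])
w = lds_weights(labels)
# Samples in the rare region receive larger weights than those in the dense region.
```

The resulting weights can multiply any per-sample regression loss (e.g. weighted MSE) without changing the rest of the training pipeline.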

Numerical Results and Insights

  • The proposed methods collectively achieve notable error reductions, with consistent performance gains across varied datasets like IMDB-WIKI-DIR and STS-B-DIR, among others.
  • Notably, the combination of LDS and FDS outperformed conventional techniques in several scenarios, substantially reducing errors in underrepresented areas.

Practical and Theoretical Implications

  • Practical Implications: The methods can be seamlessly integrated into existing deep learning pipelines. Their application in real-world tasks offers improved accuracy and reliability in systems dealing with imbalanced continuous data.
  • Theoretical Implications: This work advances the understanding of imbalanced learning, highlighting the need for tailored strategies for continuous targets. It opens up discussions on new directions to handle data imbalance beyond traditional re-weighting and sampling.
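To illustrate the kind of drop-in integration described above, the feature-statistics smoothing and calibration in the spirit of FDS can be sketched as follows. This is a simplified NumPy version under stated assumptions (fixed equal-width target bins, per-dimension variances rather than full covariances, and a plain convolution that under-weights edge bins), not the authors' implementation.

```python
import numpy as np

def smooth_bin_stats(features, bin_idx, num_bins, ks=5, sigma=2.0):
    """Compute per-bin feature means/variances and smooth them across target bins."""
    half = (ks - 1) // 2
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()

    d = features.shape[1]
    mean = np.zeros((num_bins, d))
    var = np.ones((num_bins, d))
    for b in range(num_bins):
        sel = features[bin_idx == b]
        if len(sel) > 0:
            mean[b] = sel.mean(axis=0)
            var[b] = sel.var(axis=0) + 1e-6
    # Smooth statistics along the bin (target) axis, dimension by dimension,
    # so sparsely populated bins borrow statistics from their neighbors.
    s_mean = np.stack([np.convolve(mean[:, j], kernel, mode="same") for j in range(d)], axis=1)
    s_var = np.stack([np.convolve(var[:, j], kernel, mode="same") for j in range(d)], axis=1)
    return mean, var, s_mean, s_var

def calibrate(features, bin_idx, mean, var, s_mean, s_var):
    """Whiten features with their bin's own stats, re-color with smoothed stats."""
    m, v = mean[bin_idx], var[bin_idx]
    sm, sv = s_mean[bin_idx], s_var[bin_idx]
    return np.sqrt(sv / v) * (features - m) + sm
```

In a training loop, the calibration would be applied to an intermediate feature layer, with the bin statistics updated as running estimates across epochs; when a bin's statistics already agree with its smoothed neighbors, the calibration leaves features essentially unchanged.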

Future Directions

The study of DIR motivates further optimization of learning algorithms for continuous targets under imbalanced settings. Potential developments include adaptive methods for varying degrees of target continuity and imbalance. Additionally, extending these approaches to unsupervised or semi-supervised learning paradigms could enrich their applicability.

In conclusion, this paper offers significant insights and contributions to the field of imbalanced regression, providing robust methods and a foundation for future research in handling continuous targets in real-world scenarios. The curated datasets and thorough experiments enhance the understanding of DIR challenges, setting a benchmark for subsequent studies.
