- The paper introduces feature weighting using ridge regression coefficients to enhance representativeness and diversity in sample selection.
- It demonstrates that this approach consistently reduces RMSE and increases correlation coefficients across various benchmark datasets.
- The method is versatile: it benefits both single-task and multi-task regression and applies to both linear and nonlinear models.
Feature Weighting in Pool-Based Sequential Active Learning for Regression
Motivation and Context
Active learning for regression (ALR) seeks to minimize labeling effort by sequentially querying the most valuable samples from an unlabeled pool, enabling accurate model construction under budget constraints. In pool-based sequential ALR, sample selection is typically driven by informativeness, representativeness, and diversity; the latter two rely on inter-sample distances. However, standard approaches treat all features equally in the distance metric, so the distances depend on feature scales and units and can be dominated by irrelevant variables. These deficiencies lead to suboptimal sample selection.
Proposed Methodology
The paper introduces a principled enhancement to ALR sample selection for both single-task and multi-task scenarios: explicitly weighting features by their importance, as estimated from ridge regression coefficients trained on the currently labeled data. The weighting is applied before inter-sample distances are computed and clusters are assigned, so that the distances emphasize relevant features and are less sensitive to feature units and scale.
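A minimal sketch of the weighting step in Python with NumPy. It assumes that the weight of each feature is the absolute value of its ridge coefficient and that the weights multiply the features before a plain Euclidean distance is taken; the function names `ridge_feature_weights` and `weighted_distances` are illustrative, not taken from the paper.

```python
import numpy as np


def ridge_feature_weights(X_lab, y_lab, lam=1.0):
    """Return |b| for ridge coefficients b fitted on the labeled samples.

    Assumption: features and target are mean-centered so that the
    intercept is not penalized; the weight of feature i is |b_i|.
    """
    Xc = X_lab - X_lab.mean(axis=0)
    yc = y_lab - y_lab.mean()
    d = X_lab.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    return np.abs(b)


def weighted_distances(A, B, w):
    """Pairwise distances ||w * (a - b)||_2 between rows of A and rows of B."""
    diff = (A[:, None, :] - B[None, :, :]) * w
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy pool: feature 0 drives the target, feature 1 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=20)

labeled = np.arange(8)                      # pretend the first 8 samples are labeled
w = ridge_feature_weights(X[labeled], y[labeled], lam=0.1)
D = weighted_distances(X, X[labeled], w)    # shape (20, 8)
```

Because the irrelevant feature receives a near-zero weight, differences along it contribute almost nothing to `D`.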
Five methods are introduced:
- FW-RD: Improves the representativeness-diversity (RD) method by performing its clustering in the feature-weighted space once labeled samples are available to estimate the weights.
- FW-GSx: Refines greedy sampling in input space by replacing standard Euclidean distances with feature-weighted distances for diversity maximization.
- FW-iGS: Applies feature weighting to improved greedy sampling (iGS), which measures diversity in both the input and output spaces; the feature weights enter the input-space distances.
- FW-MT-GSx and FW-MT-iGS: Generalize FW-GSx and FW-iGS to multi-task regression, using task-specific regression weights and composite sample distances over all tasks.
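The single-task selection rules of FW-GSx and FW-iGS can be sketched as follows, again assuming absolute ridge coefficients as weights. The iGS score is taken here to be the minimum, over labeled samples, of the product of input and output distances, with the output distance measured between the predicted output of a candidate and each observed label; the helper names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge regression on centered data; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return b, y_mean - x_mean @ b


def weighted_dist(X_pool, unlabeled, labeled_idx, w):
    """Feature-weighted Euclidean distances, shape (n_unlabeled, n_labeled)."""
    diff = (X_pool[unlabeled][:, None, :] - X_pool[labeled_idx][None, :, :]) * w
    return np.sqrt((diff ** 2).sum(axis=-1))


def fw_gsx_select(X_pool, labeled_idx, w):
    """FW-GSx step: unlabeled sample farthest (in weighted distance) from the labeled set."""
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    d_x = weighted_dist(X_pool, unlabeled, labeled_idx, w)
    return unlabeled[np.argmax(d_x.min(axis=1))]


def fw_igs_select(X_pool, labeled_idx, y_labeled, lam=1.0):
    """FW-iGS step: maximize min_m d_x(n, m) * d_y(n, m) over unlabeled n."""
    b, b0 = ridge_fit(X_pool[labeled_idx], y_labeled, lam)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    d_x = weighted_dist(X_pool, unlabeled, labeled_idx, np.abs(b))
    y_hat = X_pool[unlabeled] @ b + b0                  # predicted outputs
    d_y = np.abs(y_hat[:, None] - y_labeled[None, :])   # output-space distances
    return unlabeled[np.argmax((d_x * d_y).min(axis=1))]


# Toy case: sample 0 is labeled; feature 1 has weight 0 (irrelevant).
X_toy = np.array([[0.0, 0.0], [1.0, 100.0], [5.0, 0.0]])
pick = fw_gsx_select(X_toy, np.array([0]), w=np.array([1.0, 0.0]))
print(pick)  # 2; plain Euclidean GSx (w = [1, 1]) would pick 1
```

With equal weights the large but irrelevant difference along feature 1 would dominate; the weighted version ignores it and selects the sample that is far away along the relevant feature.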
For each approach, the weights are derived from the fitted regression coefficients after each labeling iteration. Clustering, distance computation, and selection steps are performed in the feature-weighted space.
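The per-iteration re-estimation of the weights can be sketched as a sequential FW-GSx loop. `query_label` is a hypothetical callback standing in for the human annotator, and stopping at a fixed labeling budget is an assumption made for illustration.

```python
import numpy as np


def ridge_weights(X, y, lam=1.0):
    """Absolute ridge coefficients (assumed feature weights); intercept unpenalized."""
    Xc = X - X.mean(axis=0)
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return np.abs(b)


def fw_gsx_loop(X_pool, query_label, init_idx, budget, lam=1.0):
    """Sequential FW-GSx sketch; query_label(i) is a hypothetical oracle."""
    labeled = list(init_idx)
    y_lab = [query_label(i) for i in labeled]
    while len(labeled) < budget:
        # 1) re-estimate feature weights from all labels gathered so far
        w = ridge_weights(X_pool[labeled], np.asarray(y_lab), lam)
        # 2) weighted minimum distance of each unlabeled sample to the labeled set
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        diff = (X_pool[unlabeled][:, None, :] - X_pool[labeled][None, :, :]) * w
        d_min = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
        # 3) query the most diverse sample and add it to the labeled set
        new = int(unlabeled[np.argmax(d_min)])
        labeled.append(new)
        y_lab.append(query_label(new))
    return labeled, np.asarray(y_lab)


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y_all = X @ np.array([2.0, 0.0, -1.0, 0.0])
idx, y_sel = fw_gsx_loop(X, lambda i: y_all[i], init_idx=[0, 1, 2, 3, 4], budget=12)
```

Recomputing the weights inside the loop means that later queries benefit from the better importance estimates that more labels provide.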
Experimental Evaluation
Single-Task ALR
Experiments on 11 benchmark datasets show consistent improvements from feature weighting for all three single-task methods (FW-RD, FW-GSx, FW-iGS), regardless of feature redundancy, feature correlation, or initialization size. Aggregate metrics (normalized areas under the RMSE and CC curves) confirm that FW-iGS outperforms the unweighted competitors, achieving lower average RMSE and higher CC.
Notably, feature weighting preserves the ranking of the methods: FW-iGS outperforms FW-GSx and FW-RD, mirroring the dominance of iGS in the unweighted setting. The enhancement is robust to the regularization parameter λ and resilient to the introduction of correlated or irrelevant features.
Performance gains are preserved when nonlinear regression models (e.g., regression trees) are used for prediction, and feature weights derived from linear and nonlinear models are equally effective, making the approach broadly applicable regardless of predictor architecture.
Multi-Task ALR
On multi-task datasets (VAM, Energy Efficiency), FW-MT-GSx and FW-MT-iGS achieve lower RMSE and higher CC than their unweighted counterparts, with FW-MT-iGS performing best because its diversity criterion covers both the input and output spaces. Feature weighting consistently improves selection regardless of the balance or relative importance of the tasks.
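How task-specific weights might be combined in FW-MT-GSx is sketched below. The exact composite distance is not specified in this summary, so summing the per-task weighted distances is an assumption made purely for illustration, as are the function names.

```python
import numpy as np


def ridge_weights(X, y, lam=1.0):
    """Absolute ridge coefficients for one task (assumed feature weights)."""
    Xc = X - X.mean(axis=0)
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return np.abs(b)


def fw_mt_gsx_select(X_pool, labeled_idx, Y_labeled, lam=1.0):
    """Multi-task FW-GSx step (sketch).

    Y_labeled has one column per task. Each task gets its own feature
    weights; the composite distance is ASSUMED to be the sum of the
    task-specific weighted Euclidean distances.
    """
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    diff = X_pool[unlabeled][:, None, :] - X_pool[labeled_idx][None, :, :]
    composite = np.zeros((len(unlabeled), len(labeled_idx)))
    for k in range(Y_labeled.shape[1]):
        w_k = ridge_weights(X_pool[labeled_idx], Y_labeled[:, k], lam)
        composite += np.sqrt(((diff * w_k) ** 2).sum(axis=-1))
    # pick the unlabeled sample whose nearest labeled neighbor is farthest away
    return unlabeled[np.argmax(composite.min(axis=1))]


# Both tasks depend only on feature 0, so feature 1 gets zero weight.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
                  [1.5, 1000.0], [10.0, 0.0]])
lab = np.array([0, 1, 2, 3])
Y_lab = np.column_stack([X_toy[lab, 0], 2.0 * X_toy[lab, 0]])
pick = fw_mt_gsx_select(X_toy, lab, Y_lab)
print(pick)  # 5
```

Sample 4 is far from the labeled set only along the feature that no task uses, so the weighted composite distance prefers sample 5.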
Implications
Practical Implications
The use of ridge regression-derived feature weighting in inter-sample distance calculations leads to more robust, scale-invariant, and relevance-aware sample selection in ALR. The improvements are easy to implement and compatible with both linear and nonlinear models, making them immediately practical for real-world regression problems where labels are expensive to acquire, including affective computing, energy prediction, and industrial process monitoring.
Feature weighting mitigates issues from redundant or irrelevant features and is insensitive to feature scaling. Initialization choice and the number of labeled samples have little influence on the relative benefit of feature weighting, which can be decoupled from initial sample selection strategies.
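The scale and relevance claims can be checked numerically on synthetic data (everything here is illustrative, and weights are again assumed to be absolute ridge coefficients): rescaling one feature by 1000, as a unit change would, leaves the weighted distances unchanged when the ridge penalty is near zero, while plain Euclidean distances change drastically, and the irrelevant feature receives a near-zero weight. With a larger λ the invariance is only approximate, because the penalty acts on the rescaled coefficient differently.

```python
import numpy as np


def ridge_weights(X, y, lam):
    """Absolute ridge coefficients on centered data (assumed feature weights)."""
    Xc = X - X.mean(axis=0)
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return np.abs(b)


def weighted_pairwise(X, w):
    """All pairwise distances ||w * (x_i - x_j)||_2 within X."""
    diff = (X[:, None, :] - X[None, :, :]) * w
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Feature 2 is irrelevant: y depends only on features 0 and 1.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=50)

X_rescaled = X.copy()
X_rescaled[:, 1] *= 1000.0  # e.g. expressing feature 1 in grams instead of kilograms

lam = 1e-6  # near-zero penalty
w = ridge_weights(X, y, lam)
w_rescaled = ridge_weights(X_rescaled, y, lam)

ones = np.ones(3)
plain_change = np.abs(weighted_pairwise(X_rescaled, ones) - weighted_pairwise(X, ones)).max()
weighted_change = np.abs(weighted_pairwise(X_rescaled, w_rescaled) - weighted_pairwise(X, w)).max()
print(plain_change, weighted_change)  # large vs. essentially zero
print(w)  # weight of feature 2 is close to 0
```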
Theoretical Implications
The results empirically validate the hypothesis that feature importance should be incorporated into representativeness and diversity metrics in ALR. The approach enables richer integration of model-dependent information into the active learning loop, potentially stimulating future work on theoretical guarantees relating to convergence, sample efficiency, and statistical consistency.
Extensions
The methodology is not limited to pool-based ALR for regression. It can be applied to stream-based ALR settings and classification tasks where inter-sample distance is used for diversity/representativeness estimation. However, methods relying solely on informativeness (uncertainty, disagreement) do not directly benefit, as feature weighting impacts only distance-based selection.
Conclusion
Incorporating feature weighting based on regression coefficients into pool-based sequential ALR algorithms yields consistent performance improvements in both single-task and multi-task settings, across a variety of datasets and model architectures. The approach addresses fundamental biases in sample selection by making the distance metrics less sensitive to feature scale, units, and redundant features. Broader application to stream-based and classification ALR settings is straightforward wherever distance computations drive the selection protocol. The empirical evidence supports the adoption of feature weighting as a standard enhancement in practical ALR systems. Future avenues include theoretical analysis, integration into stream-based frameworks, and the study of sample weighting in cluster-based batch selection.