Strategic Feature Selection

Published 17 Jun 2026 in cs.LG, cs.CY, and stat.ML | (2606.18867v1)

Abstract: When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.


Authors (9)

Summary

The paper introduces a formal decomposition of strategic MSE into predictive loss, manipulability gain, and heterogeneity gap, clarifying how each trade-off affects model performance.
The study derives optimality gap bounds that guide the joint tuning of feature selection and ridge regularization, demonstrating near-optimal performance under specific manipulation cost conditions.
Empirical results from synthetic benchmarks and a Medicare Advantage simulation show that joint strategic feature selection can achieve up to a 40% reduction in post-manipulation MSE.

Strategic Feature Selection: A Formal Characterization of Joint Predictability and Manipulability

Motivation and Problem Setting

Strategic manipulation of input features in high-stakes algorithmic decision-making, such as risk adjustment in healthcare payments, compromises predictive validity and incentivizes undesirable behaviors like upcoding. Work on strategic classification typically addresses this by redesigning the predictor itself to account for strategic agents. Practical deployment is more constrained: redesigning prediction pipelines is often infeasible due to institutional inertia. In real-world policy contexts, decision makers typically rely on coarse levers, namely feature selection and ridge regularization, to mitigate strategic manipulation. This paper formally analyzes the efficacy of these levers, characterizing their interplay and the resulting performance gap relative to the strategic optimum.

Theoretical Contributions

Strategic MSE Decomposition

The paper develops a linear strategic learning framework with quadratic manipulation costs, modeling the agent's best response as $a^* = H^{-1} \theta$, where $\theta$ is the deployed coefficient vector and $H$ is the matrix defining the quadratic cost. The strategic MSE after manipulation is decomposed into three components: the predictive loss from omitting features, the strategic burden relieved by restricting the support, and a heterogeneity gap induced by the cost structure:

Predictive loss: $L_{\mathrm{pred}}(S)$ quantifies irrecoverable signal from dropped features.
Manipulability gain: $-(\Gamma([d]) - \Gamma(S))$ measures the reduction in strategic vulnerability from restricting the full feature set $[d]$ to the support $S$.
Heterogeneity gap: $C_H(S)\delta_H(S)^2$ captures loss due to manipulation cost anisotropy among retained features.

This explicit decomposition shows that feature selection based solely on manipulability is generally suboptimal. Optimal supports balance signal, manipulability, and cost geometry, as the paper illustrates in a detailed synthetic example.
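The best response and its effect on error can be sketched numerically. The snippet below is a minimal illustration, not the paper's three-term decomposition: it assumes an agent who maximizes $\theta^\top a - \tfrac{1}{2} a^\top H a$ (which gives $a^* = H^{-1}\theta$), a zero-intercept predictor, centered data, and outcomes that manipulation does not change. The data, the cost matrix `H`, and the function names are hypothetical.

```python
import numpy as np

def best_response(theta, H):
    """Shift a* = H^{-1} theta, the maximizer of theta @ a - 0.5 * a @ H @ a."""
    return np.linalg.solve(H, theta)

def strategic_mse(theta, X, y, H):
    """Empirical MSE of a zero-intercept linear predictor after every agent shifts by a*."""
    shifted = X + best_response(theta, H)
    return float(np.mean((shifted @ theta - y) ** 2))

# Hypothetical synthetic data, centered so that residuals have mean zero.
rng = np.random.default_rng(0)
n, d = 5000, 3
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)
theta = np.array([1.0, 0.5, -0.5])
y = X @ theta + 0.1 * rng.normal(size=n)
y -= y.mean()

# Assumed diagonal costs: feature 0 is cheap to manipulate, feature 2 is expensive.
H = np.diag([1.0, 4.0, 25.0])

clean = float(np.mean((X @ theta - y) ** 2))     # MSE without manipulation
burden = float(theta @ best_response(theta, H))  # theta^T H^{-1} theta
post = strategic_mse(theta, X, y, H)             # MSE after manipulation

print(f"clean MSE      = {clean:.4f}")
print(f"shift burden   = {burden:.4f}")
print(f"strategic MSE  = {post:.4f}")
print(f"clean + burden^2 = {clean + burden ** 2:.4f}")
```

Under these assumptions every prediction moves by the same amount $\theta^\top H^{-1}\theta$, so the post-manipulation MSE equals the clean MSE plus the square of that shift. Cheap-to-manipulate coordinates with large coefficients dominate the shift.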

Figure 1: The best subset balances predictive loss, manipulability gain, and heterogeneity gap—optimality is achieved by supports that exploit joint structure.

Optimality Gap Bounds

A lower bound on achievable strategic MSE is established, showing an irreducible gap between zero-intercept ridge estimators and the strategic optimum; this gap vanishes for high-cost manipulation directions (i.e., when $\theta^{*\top} H^{-1} \theta^*$ is small).

An upper bound gives conditions under which support-restricted ridge is near-optimal: there must exist a support with minimal predictive loss, sufficient reduction in strategic burden, and homogeneous manipulation costs ($\delta_H(S) \approx 0$). When manipulation costs are isotropic, scalar ridge attains the restricted oracle exactly; generalized ridge is needed only when retained features have heterogeneous costs. This is formalized for both isotropic and two-level cost regimes.
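One way to see why a scalar penalty can suffice under isotropic costs is the following sketch. It is not taken from the paper; it assumes centered features with covariance $\Sigma$, noise variance $\sigma^2$, a zero-intercept predictor $\theta$, and a strategic risk equal to the clean MSE plus the squared best-response shift $\theta^\top H^{-1}\theta$:

$$
R(\theta) = (\theta - \theta^*)^\top \Sigma\, (\theta - \theta^*) + \sigma^2 + \left(\theta^\top H^{-1} \theta\right)^2,
\qquad
\nabla R(\theta) = 2\,\Sigma(\theta - \theta^*) + 4\left(\theta^\top H^{-1}\theta\right) H^{-1}\theta .
$$

Setting the gradient to zero gives a generalized ridge form, and with $H = cI$ it collapses to scalar ridge:

$$
\theta = \left(\Sigma + \mu H^{-1}\right)^{-1} \Sigma\, \theta^*, \quad \mu = 2\,\theta^\top H^{-1}\theta
\qquad\Longrightarrow\qquad
\theta = \left(\Sigma + \lambda I\right)^{-1} \Sigma\, \theta^*, \quad \lambda = \frac{2\,\|\theta\|_2^2}{c^2} .
$$

Because $R$ is convex under these assumptions, its minimizer lies on the scalar ridge path when costs are isotropic, whereas a non-scalar $H$ calls for the penalty matrix $H^{-1}$.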

Policy Implications: Design Principles for Strategic Robustness

Joint Tuning of Feature Selection and Regularization

Empirical and theoretical results invalidate the heuristic of separately ranking features by manipulability or predictive value. The optimal support and regularization level are interdependent; regularization alters preferred supports and vice versa.
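A brute-force search makes the interdependence concrete. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it works with population quantities, uses the assumed risk "clean MSE plus squared shift $\theta^\top H^{-1}\theta$", and compares a joint search over supports and ridge levels with a heuristic that drops the cheapest-to-manipulate feature and then tunes only the ridge level. The covariance, coefficients, costs, and function names are hypothetical.

```python
import itertools
import numpy as np

def ridge_on_support(Sigma, theta_star, S, lam):
    """Population ridge coefficients restricted to support S, embedded in R^d."""
    S = list(S)
    theta = np.zeros(len(theta_star))
    if S:
        A = Sigma[np.ix_(S, S)] + lam * np.eye(len(S))
        b = Sigma[S, :] @ theta_star  # Cov(x_S, y) when y = x @ theta_star + noise
        theta[S] = np.linalg.solve(A, b)
    return theta

def strategic_risk(theta, Sigma, theta_star, H):
    """Assumed risk: clean excess MSE plus the squared shift theta^T H^{-1} theta."""
    diff = theta - theta_star
    shift = theta @ np.linalg.solve(H, theta)
    return float(diff @ Sigma @ diff + shift ** 2)

def joint_search(Sigma, theta_star, H, lambdas):
    """Exhaustive search over every support and every ridge level on the grid."""
    d = len(theta_star)
    best = (np.inf, None, None)
    for k in range(d + 1):
        for S in itertools.combinations(range(d), k):
            for lam in lambdas:
                theta = ridge_on_support(Sigma, theta_star, S, lam)
                risk = strategic_risk(theta, Sigma, theta_star, H)
                if risk < best[0]:
                    best = (risk, S, lam)
    return best

# Hypothetical instance: features 0 and 1 are correlated, feature 0 is cheap to manipulate.
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
theta_star = np.array([1.0, 0.5, 0.8])
H = np.diag([0.5, 5.0, 5.0])
lambdas = np.logspace(-3, 2, 60)

best_risk, best_S, best_lam = joint_search(Sigma, theta_star, H, lambdas)

# Heuristic baseline: exclude the most manipulable feature, then tune lambda alone.
heuristic_risk = min(
    strategic_risk(ridge_on_support(Sigma, theta_star, (1, 2), lam), Sigma, theta_star, H)
    for lam in lambdas
)

print(f"joint search: support={best_S}, lambda={best_lam:.3g}, risk={best_risk:.4f}")
print(f"drop-cheapest heuristic: risk={heuristic_risk:.4f}")
```

The joint search can never do worse than the heuristic on the same grid, since the heuristic's support is one of the candidates. The exhaustive loop is only practical for a handful of features.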

Figure 2: Manipulable groups with homogeneous costs can be retained and regularized—joint tuning with ridge yields strategic robustness.

Homogeneous Manipulable Groups and Proxy Features

Contrary to prevailing policy, groups of highly manipulable features with homogeneous manipulation costs may optimally be retained and aggressively regularized. When cost heterogeneity is pronounced, however, regularization cannot substitute for feature exclusion.

Less manipulable, correlated proxies can replace manipulable features without substantial predictive loss. As manipulability or feature correlation increases, supports optimally switch from direct to proxy features.
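The switch from direct to proxy features can be reproduced in a two-feature toy model. This is a sketch under assumptions that are not from the paper: the outcome depends only on a direct feature that is cheap to manipulate, a proxy with correlation `rho` is expensive to manipulate, costs are diagonal, and the risk is the clean excess MSE plus the squared shift $\theta^\top H^{-1}\theta$. Only the two single-feature supports are compared, and all numbers are hypothetical.

```python
import numpy as np

def best_risk_single(j, rho, h, lambdas):
    """Lowest assumed strategic risk using only feature j, over a ridge grid."""
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    theta_star = np.array([1.0, 0.0])  # outcome depends on the direct feature only
    cov_xy = Sigma @ theta_star
    best = np.inf
    for lam in lambdas:
        theta = np.zeros(2)
        theta[j] = cov_xy[j] / (Sigma[j, j] + lam)  # one-feature ridge coefficient
        diff = theta - theta_star
        shift = theta[j] ** 2 / h[j]                # theta^T H^{-1} theta for diagonal H
        best = min(best, float(diff @ Sigma @ diff + shift ** 2))
    return best

h = np.array([0.5, 10.0])  # assumed costs: direct feature cheap, proxy expensive
lambdas = np.logspace(-3, 2, 200)
rhos = [0.1, 0.3, 0.5, 0.7, 0.9, 0.95]

direct = [best_risk_single(0, r, h, lambdas) for r in rhos]
proxy = [best_risk_single(1, r, h, lambdas) for r in rhos]

for r, dr, pr in zip(rhos, direct, proxy):
    winner = "proxy" if pr < dr else "direct"
    print(f"rho={r:.2f}  direct-only={dr:.3f}  proxy-only={pr:.3f}  -> {winner}")
```

In this configuration the direct-only risk does not depend on the correlation, while the proxy-only risk falls as the correlation grows, so the preferred single-feature support flips at high correlation.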

Figure 3: Strategic feature selection exploits proxy relationships—optimal supports transition to less manipulable substitutes as correlation increases.

Interior Solutions and Retention of Intensely Coded Features

Empirical benchmarks in Medicare Advantage payment confirm the theory: blanket exclusion of diagnosis groups is not necessary for robustness. Retaining intensely coded, predictive HCCs (Hierarchical Condition Categories) under joint support-restricted ridge achieves lower post-manipulation MSE than both full-support models and heuristic exclusion.

Figure 4: Feature selection under optimal regularization retains predictive features—even intensely coded HCCs—without sacrificing strategic robustness.

Algorithmic Solutions

A computational pipeline for joint support and regularization selection is proposed: continuous-weight relaxation, rounding, and exact local refinement. Synthetic benchmarks demonstrate recovery of the exact oracle under combinatorial constraints, validating the algorithmic approach.
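The round-then-refine part of such a pipeline can be sketched as follows. This is a loose illustration, not the authors' algorithm: the summary does not specify the continuous-weight relaxation, so the weights here come from a hypothetical stand-in heuristic, the refinement is a simple single-feature flip search, and the risk is the assumed "clean excess MSE plus squared shift $\theta^\top H^{-1}\theta$" minimized over a ridge grid. A brute-force oracle is included only for comparison on this small instance.

```python
import itertools
import numpy as np

def risk_of_support(S, Sigma, theta_star, H, lambdas):
    """Best assumed strategic risk for support S over a grid of ridge levels."""
    S = sorted(S)
    best = np.inf
    for lam in lambdas:
        theta = np.zeros(len(theta_star))
        if S:
            A = Sigma[np.ix_(S, S)] + lam * np.eye(len(S))
            theta[S] = np.linalg.solve(A, Sigma[S, :] @ theta_star)
        diff = theta - theta_star
        shift = theta @ np.linalg.solve(H, theta)
        best = min(best, float(diff @ Sigma @ diff + shift ** 2))
    return best

def round_weights(weights, threshold=0.5):
    """Keep every feature whose relaxed weight reaches the threshold."""
    return frozenset(j for j, w in enumerate(weights) if w >= threshold)

def local_refine(S, score, d):
    """Flip one feature in or out at a time for as long as the score improves."""
    S = frozenset(S)
    current = score(S)
    improved = True
    while improved:
        improved = False
        for j in range(d):
            cand = S ^ {j}
            val = score(cand)
            if val < current - 1e-12:
                S, current, improved = cand, val, True
    return S, current

# Hypothetical instance with equicorrelated features and mixed manipulation costs.
d = 5
Sigma = 0.7 * np.eye(d) + 0.3 * np.ones((d, d))
theta_star = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
H = np.diag([0.5, 0.5, 4.0, 4.0, 20.0])
lambdas = np.logspace(-3, 2, 40)

def score(S):
    return risk_of_support(S, Sigma, theta_star, H, lambdas)

# Stand-in for the relaxation step (an assumption): signal strength damped by cheap costs.
h = np.diag(H)
raw = np.abs(Sigma @ theta_star) * h / (1.0 + h)
weights = raw / raw.max()

start = round_weights(weights)
start_risk = score(start)
refined, refined_risk = local_refine(start, score, d)

# Brute-force oracle over all 2^d supports, feasible only because d is tiny.
all_supports = (frozenset(c) for k in range(d + 1)
                for c in itertools.combinations(range(d), k))
oracle_risk = min(score(S) for S in all_supports)

print(f"rounded support {sorted(start)}: risk={start_risk:.4f}")
print(f"refined support {sorted(refined)}: risk={refined_risk:.4f}")
print(f"oracle risk: {oracle_risk:.4f}")
```

Flip-based refinement never increases the risk and terminates because each accepted move strictly improves it, but whether it reaches the oracle depends on the instance and the starting support.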

Figure 5: Weighted screen-and-refit achieves oracle strategic MSE across synthetic regimes—combining relaxation and local refinement is critical.

Case Study: Healthcare Payments

A simulation calibrated to real Medicare Advantage coding demonstrates the practicality of the framework and algorithm. Feature selection under optimal regularization achieves a 40% reduction in post-manipulation MSE relative to full ridge and outperforms prediction- and cost-only baselines.

Figure 6: Retained features reflect joint signal and manipulability—not all intensely coded groups are dropped; the optimal selection balances these properties.

Robustness Under Cost Uncertainty

Regularization and feature selection inherently reduce exposure to uncertain manipulation directions. Unlike intercept correction (optimal only if costs are precisely known), these levers yield smaller worst-case MSE under cost misspecification. Support restriction and shrinkage provide additional protection against estimation error in manipulation costs.
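A short calculation suggests one mechanism behind this robustness. It is a sketch rather than the paper's argument, and it assumes the best-response model $a^* = H^{-1}\theta$ together with an intercept correction $\hat b = \theta^\top \hat H^{-1}\theta$ built from an estimated cost matrix $\hat H$ (notation introduced here for illustration):

$$
\theta^\top\!\left(x + H^{-1}\theta\right) - \hat b - y
= \left(\theta^\top x - y\right) + \theta^\top\!\left(H^{-1} - \hat H^{-1}\right)\theta,
\qquad
\left|\theta^\top\!\left(H^{-1} - \hat H^{-1}\right)\theta\right|
\le \|\theta\|_2^2 \,\left\|H^{-1} - \hat H^{-1}\right\|_{\mathrm{op}} .
$$

The residual shift disappears when $\hat H = H$, and otherwise its square scales with the fourth power of $\|\theta\|_2$. Shrinking $\theta$ by a factor $s < 1$ therefore scales the misspecification term by $s^4$, and dropping a feature removes its row and column from the quadratic form, whichever cost matrix turns out to be correct.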

Practical and Theoretical Implications

The rigorous characterization reveals that robust strategic performance in algorithmic decision systems requires joint tuning of feature selection and regularization. Blanket exclusion policies, prevalent in resource allocation regimes, fail to exploit nuanced opportunities for balancing predictive utility and strategic exposure. The framework provides actionable guidance for policymakers and algorithm designers in regulated domains: structure-aware subset selection yields superior robustness and preserves signal.

Theoretically, the fine-grained decomposition offers new insights into manipulation-aware model selection, contrasting sharply with adversarial robustness paradigms focused on exogenous perturbations. The approach bridges institutional constraints with incentive-aware statistical learning.

Future Directions

Further research should extend the analysis to dynamic, multi-shot strategic interactions, explore robust optimization over cost uncertainty, and investigate grouped or per-feature regularization. Understanding endogenous deployment shifts and their impact on risk estimator tuning remains key for real-world deployments.

Conclusion

This work provides a principled, actionable framework for robust algorithmic design in strategic environments, establishing that optimal feature selection must be grounded in the joint structure of predictability and manipulability. The formal characterization and practical algorithm empower policymakers and researchers to advance incentive-aware machine learning beyond legacy heuristics.

References: "Strategic Feature Selection" (2606.18867).
