
Inference-Time Feature Manipulation

Updated 23 November 2025
  • Inference-time feature manipulation is a paradigm that intervenes in feature processing during deployment without retraining, enhancing privacy, efficiency, and robustness.
  • Adaptive techniques like sequential feature selection and network bending enable precise control over model activations with empirical evidence showing minimal use of sensitive features.
  • Post-hoc methods such as alignment vectors and robust feature inference offer flexible, cost-effective control over outputs while maintaining strong model performance across domains.

Inference-time feature manipulation refers to any intervention or procedure that modifies, selects, steers, or otherwise conditions the features used by a machine learning system, or their representation, routing, or activation, strictly at test time or deployment, without altering the underlying trained model's parameters or requiring further end-to-end training. This paradigm encompasses a spectrum of methodologies, ranging from adaptive feature selection to explicit steering of neural activations and output-space statistics. The principal motivations include privacy preservation and privacy-utility tradeoffs, computational efficiency, interpretability, robustness to distributional shift or adversarial perturbation, alignment with external preferences, and flexible post hoc control.

1. Core Concepts and Motivations

Inference-time feature manipulation is motivated by scenarios where test-time constraints diverge from those encountered at training. Common drivers include:

  • Data minimization: Only a subset of features may be necessary to reproduce the model’s prediction to within a desired tolerance, enabling reduced privacy leakage or acquisition effort, as in personalized deployment settings with sensitive attributes (Tran et al., 2023).
  • Cost-sensitive deployment: Feature acquisition may entail non-trivial costs (monetary, latency, risk), motivating joint optimization over predictive and acquisition costs at test time (Maguedong-Djoumessi, 2013).
  • Adversarial robustness: Actively projecting or selecting features to suppress sensitivity to adversarial directions (Singh et al., 2023).
  • Task-specific precision and reliability: Fine-grained steering of transformer activations to modulate outputs in safety-critical tasks or tasks with subjective dimensions (Darm et al., 18 Mar 2025, Shahriar et al., 24 Oct 2024).
  • Quality-preserving watermarking: Embedding signals into model outputs by only selecting among candidates with suitable feature statistics, without modifying model weights or logits (Yu et al., 11 Aug 2025).
  • Feature-based structural or semantic control: Manipulating internal representations for creative or analytical purposes, e.g., in image or sequence generation (Broad et al., 2020, Novack et al., 22 Jan 2024).

This class of techniques is characterized by its strict avoidance of model retraining or weight updates, instead relying on feature-level interventions, selection, or routing dictated by the inference-time context.

2. Sequential and Adaptive Feature Selection

A central application is the adaptive, instance-wise selection of a minimal subset of input features during inference, while retaining the target model’s predictive fidelity. (Tran et al., 2023) formalizes this as finding, for each input $x=(x_1,\dots,x_d)$, the smallest subset $R$ of potentially sensitive features such that the classifier output with only $x_R$ (and public features $x_P$) matches that of the full-feature model with probability at least $1-\delta$ (the core set). The sequential MinDRel algorithm greedily and adaptively reveals features that maximally reduce predictive uncertainty as quantified via entropy of the marginal prediction. At each round, the unrevealed feature with the highest expected entropy reduction is revealed, and the process halts once the prediction is statistically certain according to a $\delta$-core set criterion.
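
The following sketch illustrates this greedy, entropy-driven revelation loop in Python. The Monte Carlo imputation of unrevealed features, the entropy threshold standing in for the $\delta$-core-set stopping test, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a MinDRel-style greedy selector (illustrative, not the paper's code).
# Assumes a trained probabilistic classifier `clf` with predict_proba and a matrix
# `mc_samples` of rows drawn from the training distribution for imputation.
import numpy as np

def expected_entropy(clf, x, known, mc_samples):
    """Average predictive entropy when features outside `known` are imputed from mc_samples."""
    probs = []
    for s in mc_samples:
        z = s.copy()
        z[known] = x[known]                       # pin the features we already have
        probs.append(clf.predict_proba(z[None, :])[0])
    p = np.mean(probs, axis=0)
    return float(-(p * np.log(p + 1e-12)).sum())

def greedy_reveal(clf, x, public_idx, sensitive_idx, mc_samples, eps=0.05):
    """Reveal sensitive features one at a time until the prediction is (near-)certain."""
    known = list(public_idx)                      # public features are free to use
    remaining = list(sensitive_idx)
    revealed = []
    while remaining:
        h_now = expected_entropy(clf, x, known, mc_samples)
        if h_now < eps:                           # proxy for the delta-core-set test
            break
        gains = [h_now - expected_entropy(clf, x, known + [j], mc_samples)
                 for j in remaining]
        best = remaining.pop(int(np.argmax(gains)))
        known.append(best)
        revealed.append(best)
    return revealed                               # sensitive subset actually used
```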

This strategy yields sharp privacy-utility tradeoffs: empirical results demonstrate that, depending on tolerance, only $0$–$10\%$ of sensitive features are typically required to achieve full-feature accuracy, and the algorithm’s entropy-based selector outperforms coefficient-magnitude or random heuristics for data minimization in tabular domains such as finance, healthcare, and recruitment (Tran et al., 2023).

A related approach for cost-sensitive scenarios is model reframing via feature context change (Maguedong-Djoumessi, 2013), optimizing the deployment-time subset of features to minimize a weighted combination of misclassification and acquisition cost (joint cost, $JC$). Here, the optimal subset is determined by evaluating performance and test cost on a validation set for different subsets (using quadratic-time approximations, e.g., backward feature elimination), constructing a JROC (joint ROC) plot to visualize the cost-tradeoff frontier.
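
A minimal sketch of this validation-set subset search follows; the zero-imputation of unacquired features, the cost weighting, and the greedy backward-elimination loop are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative joint-cost subset search (assumed interface, not the paper's code).
import numpy as np

def joint_cost(clf, X_val, y_val, subset, feature_costs, alpha=1.0, beta=1.0):
    """Weighted sum of validation error and total acquisition cost for a feature subset."""
    Xm = np.zeros_like(X_val)
    Xm[:, subset] = X_val[:, subset]            # unacquired features are zero-imputed here
    err = np.mean(clf.predict(Xm) != y_val)
    return alpha * err + beta * sum(feature_costs[j] for j in subset)

def backward_elimination(clf, X_val, y_val, feature_costs, alpha=1.0, beta=1.0):
    """Greedily drop features while the joint cost keeps improving."""
    subset = list(range(X_val.shape[1]))
    best = joint_cost(clf, X_val, y_val, subset, feature_costs, alpha, beta)
    improved = True
    while improved and len(subset) > 1:
        improved = False
        for j in list(subset):                  # try dropping each remaining feature
            trial = [k for k in subset if k != j]
            c = joint_cost(clf, X_val, y_val, trial, feature_costs, alpha, beta)
            if c < best:
                best, subset, improved = c, trial, True
                break
    return subset, best
```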

3. Direct Feature and Representation Manipulation

Inference-time feature manipulation also operates at the internal feature or activation level. In generative models, “network bending” (Broad et al., 2020) introduces deterministic transformation layers at selected depths of a pre-trained generator (e.g., StyleGAN2). These transformations (scaling, translation, thresholding, spatial warps) are applied to select feature channels or activation clusters identified by unsupervised spatial clustering, enabling users to control semantically meaningful aspects of the generated output (e.g., increasing eye size or altering background texture in images) by simply editing activation values or their spatial structure at inference.
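
A minimal PyTorch sketch of one such intervention, scaling a chosen set of channels in an intermediate layer via a forward hook, is shown below; the layer path and channel indices are illustrative assumptions.

```python
# Illustrative network-bending-style channel scaling via a forward hook
# (assumed module layout; not the authors' tooling).
import torch

def bend_channels(module, channels, scale):
    """Register a hook that multiplies the selected activation channels by `scale`."""
    def hook(_, __, output):
        out = output.clone()
        out[:, channels] = out[:, channels] * scale
        return out                               # returned value replaces the module output
    return module.register_forward_hook(hook)

# Usage sketch (assumes `generator` is a pretrained torch.nn.Module exposing an
# intermediate block such as generator.blocks[4]):
# handle = bend_channels(generator.blocks[4], channels=[12, 47], scale=2.0)
# image = generator(z)          # generation now reflects the bent activations
# handle.remove()               # restores the original behaviour
```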

Transformer-based models have adopted similar techniques: interventions in the activations of specialized attention heads can reliably steer outputs. For example, shifting attention head outputs along directions that separate correct vs. incorrect task reasoning enables precise, knob-like control over outputs, with interventions discovered by hierarchical search across heads and layers (Darm et al., 18 Mar 2025). These methods enable precision–recall tradeoffs unattainable via prompting or fine-tuning alone, with each intervention strength controlling the model’s “cautiousness,” and selection of minimal head-sets reducing collateral effects.
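
The sketch below illustrates the general mechanism, adding a scaled steering direction to one head's slice of the attention output via a forward hook; the module path, head dimension, and tuple handling are assumptions about a HuggingFace-style decoder, not the authors' code.

```python
# Illustrative single-head steering at inference (assumed module layout).
import torch

def steer_head(attn_module, head_idx, direction, alpha, head_dim):
    """Shift one head's slice of the attention output by alpha * direction."""
    d = direction / direction.norm()
    def hook(_, __, output):
        out = output[0] if isinstance(output, tuple) else output
        out = out.clone()
        sl = slice(head_idx * head_dim, (head_idx + 1) * head_dim)
        out[..., sl] = out[..., sl] + alpha * d          # alpha is the "cautiousness" knob
        return (out, *output[1:]) if isinstance(output, tuple) else out
    return attn_module.register_forward_hook(hook)

# Usage sketch (assumes layer 10's attention module is model.model.layers[10].self_attn
# and `steer_dir` separates correct from incorrect reasoning):
# handle = steer_head(model.model.layers[10].self_attn, head_idx=3,
#                     direction=steer_dir, alpha=4.0, head_dim=128)
# ... generate as usual ...
# handle.remove()
```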

4. Post-hoc Alignment, Robustness, and Control

Inference-time manipulation extends to post-hoc alignment and robustification. Alignment Vectors (AVs) (Shahriar et al., 24 Oct 2024) are vector-valued differences between base and fine-tuned models on specific preference axes (e.g., expert levels within domains). At inference, the base model’s parameters are offset by user-chosen linear combinations of AVs for personalized domain behavior, admitting multidimensional alignment “knobs.” This technique achieves a $50\%$ reduction in inference cost relative to prompt engineering, maintains strong performance across domains and levels, and is highly transferable across different preference axes.
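
A minimal sketch of computing and applying such parameter-space offsets is below; the state-dict interface and function names are illustrative assumptions.

```python
# Illustrative alignment-vector arithmetic (assumed interface, not the authors' code).
import torch

def compute_av(base_state, aligned_state):
    """Alignment vector: per-parameter difference aligned - base."""
    return {k: aligned_state[k] - base_state[k] for k in base_state}

def apply_avs(model, avs, weights):
    """Offset the model in place by sum_i weights[i] * avs[i] (the alignment knobs)."""
    with torch.no_grad():
        state = model.state_dict()
        for av, w in zip(avs, weights):
            for k, delta in av.items():
                if torch.is_floating_point(delta):       # skip integer buffers
                    state[k] = state[k] + w * delta
        model.load_state_dict(state)

# Usage sketch:
# av_medical = compute_av(base.state_dict(), medical_expert.state_dict())
# av_legal   = compute_av(base.state_dict(), legal_expert.state_dict())
# apply_avs(base, [av_medical, av_legal], weights=[0.7, 0.2])
```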

For adversarial robustness, Robust Feature Inference (RFI) (Singh et al., 2023) projects penultimate-layer features onto the principal eigenspace of empirical feature covariance, justified by a lower bound on the certified robustness within generalized additive models. This approach selects, post-training, the most stable/robust subspace and modifies only the last-layer weights accordingly, providing consistent, non-trivial improvements in adversarial accuracy without added inference-time computational burden.
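
The sketch below illustrates the underlying operation, estimating the top eigenspace of the feature covariance and folding the projection into the last linear layer; the rank choice and the way features are collected are illustrative assumptions rather than the RFI recipe verbatim.

```python
# Illustrative principal-subspace projection of penultimate features (assumed setup).
import torch

def project_last_layer(last_linear, feats, k):
    """feats: (N, d) penultimate features from clean data; folds a rank-k projection
    into the final linear layer so inference implicitly uses projected features."""
    mu = feats.mean(dim=0, keepdim=True)
    cov = (feats - mu).T @ (feats - mu) / feats.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)            # eigenvalues in ascending order
    U = eigvecs[:, -k:]                                   # top-k principal directions
    P = U @ U.T                                           # symmetric projection matrix
    with torch.no_grad():
        last_linear.weight.copy_(last_linear.weight @ P)  # W <- W P, bias left unchanged
    return P
```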

5. Feature Manipulation for Communication, Scheduling, and Remote Inference

In remote inference systems, joint optimization of feature transmission (e.g., buffer length, sequence portion, or temporal freshness) and predictive performance is structurally a form of inference-time feature manipulation (Shisher et al., 2023). Here, under bandwidth constraints, the system dynamically selects which sequence window (the “feature” in the temporal sense) to transmit and when, trading off Age-of-Information (AoI) and feature richness to minimize overall expected inference error. Sophisticated schedulers (semi-Markov policies, multi-armed bandits) compute offline value functions and deploy optimal policies or resource-allocation strategies in real time, achieving orders-of-magnitude improvements in average inference error under communication constraints.
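
As a toy illustration of the tradeoff (not the scheduling policies of the paper), the sketch below picks a feature length by consulting an offline-estimated error table indexed by AoI and feature length; the delay model and the table itself are assumptions.

```python
# Illustrative AoI-aware feature-length choice (toy model, assumed error table).
import numpy as np

def choose_feature_length(err_table, current_aoi, delay_per_unit):
    """err_table[aoi, L] = expected inference error at age `aoi` with feature length L.
    Longer features are richer but arrive staler; pick the length with lowest error."""
    best_L, best_err = None, np.inf
    for L in range(1, err_table.shape[1]):
        aoi_at_delivery = min(current_aoi + L * delay_per_unit, err_table.shape[0] - 1)
        e = err_table[aoi_at_delivery, L]
        if e < best_err:
            best_L, best_err = L, e
    return best_L, best_err
```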

6. Manipulation through Output-Space Feature Steering and Post-hoc Sampling

Steering model outputs at inference can also be realized through post-hoc feature constraints applied to generated outputs (rather than model internals). SAEMark (Yu et al., 11 Aug 2025) watermarks texts generated by API-access LLMs using rejection sampling: $N$ candidate continuations are generated per segment, features (extracted by a deterministic Sparse Autoencoder) are computed for each, and the candidate whose summary statistic is closest to a key-derived target is selected. This black-box, feature-guided approach enables watermarking without any access to model logits or parameters while maintaining naturalness and strong detection guarantees.
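
A minimal sketch of this feature-guided rejection sampling is below; the key-to-target mapping, the `generate` and `sae_statistic` callables, and the per-segment selection rule are illustrative assumptions.

```python
# Illustrative best-of-N, feature-guided candidate selection (assumed interfaces).
import hashlib

def key_target(key: str, segment_idx: int) -> float:
    """Deterministic target in [0, 1) derived from the watermark key and segment index."""
    h = hashlib.sha256(f"{key}:{segment_idx}".encode()).hexdigest()
    return int(h[:8], 16) / 2**32

def watermarked_segment(generate, sae_statistic, prompt, key, segment_idx, n=16):
    """Generate n candidates via a black-box API and keep the one whose feature
    summary statistic is closest to the key-derived target."""
    target = key_target(key, segment_idx)
    candidates = [generate(prompt) for _ in range(n)]
    return min(candidates, key=lambda c: abs(sae_statistic(c) - target))
```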

In diffusion models, DITTO (Novack et al., 22 Jan 2024) frames inference-time manipulation as direct gradient-based optimization of initial noise latents, with gradients flowing through the entire generation chain to minimize differentiable feature extraction loss (e.g., musical structure, melody, intensity). This enables flexible, training-free editing, interpolation, or control in generative music without ever retraining the model.
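
The sketch below illustrates the general pattern of optimizing an initial latent through a frozen, differentiable sampling chain against a feature-matching loss; `sample_from_noise` and `feature_fn` stand in for the diffusion sampler and feature extractor and are assumptions, not the DITTO codebase.

```python
# Illustrative initial-latent optimization through a differentiable sampler.
import torch

def optimize_initial_latent(sample_from_noise, feature_fn, target, latent_shape,
                            steps=100, lr=1e-2):
    """Adjust the initial noise so that a differentiable feature of the generated
    output matches `target`; model weights stay frozen throughout."""
    z = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        out = sample_from_noise(z)                # gradients flow through the sampling chain
        loss = torch.nn.functional.mse_loss(feature_fn(out), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```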

7. Analysis, Evaluation, and Practical Considerations

Empirical evaluation across domains reveals that inference-time feature manipulation methods can preserve or enhance predictive performance, privacy, efficiency, controllability, and robustness:

  • Data minimization algorithms (e.g., MinDRel (Tran et al., 2023)) select minimal sensitive features for full-accuracy predictions, with as little as $10\%$ of features revealed.
  • Network bending (Broad et al., 2020) demonstrates semantically coherent and stable edits in image generative models, with fidelity to intended changes and no degradation from omitted retraining.
  • Transformer head interventions (Darm et al., 18 Mar 2025) reach $100\%$ precision on graph reasoning with select heads, adjustable by per-head intervention strengths and robust to temperature/layering.
  • Post-hoc alignment (Shahriar et al., 24 Oct 2024) achieves multidomain, continuous, cost-efficient alignment with negligible inference overhead.
  • Robustness projections (Singh et al., 2023) deliver behavioral guarantees (theoretical and empirical) without additional computation.
  • Communication-inference co-design (Shisher et al., 2023) provides online scheduling policies with $10^4\times$ lower inference error than classic precedence policies.

Resource and implementation factors include:

  • Quadratic-time approximations for subset selection (Maguedong-Djoumessi, 2013) enable scaling to moderate feature counts.
  • Post-processing techniques (RFI, AV editing) have $O(pC)$–$O(P)$ complexity and introduce negligible inference latency.
  • Sampling-based methods (SAEMark) leverage parallel best-of-$N$ decoding, feasible in modern batched inference.
  • All approaches are agnostic to the original model, enabling seamless integration with off-the-shelf or closed-source systems.
