SHIFT: Steering Hidden Intermediates in Flow Transformers

Published 10 Apr 2026 in cs.CV | (2604.09213v1)

Abstract: Diffusion models have become leading approaches for high-fidelity image generation. Recent DiT-based diffusion models, in particular, achieve strong prompt adherence while producing high-quality samples. We propose SHIFT, a simple but effective and lightweight framework for concept removal in DiT diffusion models via targeted manipulation of intermediate activations at inference time, inspired by activation steering in LLMs. SHIFT learns steering vectors that are dynamically applied to selected layers and timesteps to suppress unwanted visual concepts while preserving the prompt's remaining content and overall image quality. Beyond suppression, the same mechanism can shift generations into a desired style domain or bias samples toward adding or changing target objects. We demonstrate that SHIFT provides effective and flexible control over DiT generation across diverse prompts and targets without time-consuming retraining.


Authors (3)

Summary

The paper introduces an activation-space steering framework that modulates hidden layers in MM-DiTs to suppress and modify semantic features without retraining.
It employs contrastive prompt pairs and SVM-based vectors to achieve 3-4× greater suppression of undesirable content while retaining image fidelity.
The approach demonstrates flexible concept erasure, style manipulation, and cross-model transferability, offering scalable safety and generative control.


Introduction and Context

SHIFT introduces an activation-space steering framework for concept manipulation in large-scale Multimodal Diffusion Transformers (MM-DiTs), exemplified by FLUX.1[dev] and FLUX.1[schnell]. Building on linear activation steering in LLMs, SHIFT avoids the computational cost and architectural limitations of retraining-based concept erasure techniques such as ESD, CA, and EAP by intervening directly on internal representations at inference time. The approach is motivated by the growing integration of image and text semantics in unified Transformer-based generators, where explicit interfaces for steering a specific concept are harder to isolate than in decoupled UNet models.

Method: Activation Steering in MM-DiTs

Pipeline Design

SHIFT uses contrastive prompt pairs to build datasets that distinguish the presence and absence of a target concept. Activations are sampled at two points: the text encoder's pooled output and the text-token representations after attention in the MM-DiT backbone. From these samples, SHIFT computes steering vectors as either the mean difference between positive and negative activations or an SVM-derived separating direction. At inference, a scaled steering vector is added to the corresponding activations; the strength is adapted using simple classifiers (e.g., SVM) and, for the text encoder, cosine similarity.

Figure 1: Overview of the SHIFT steering pipeline and intervention points in the diffusion transformer.

This design enables robust erasure and editing of semantic features without altering the base model weights, and it generalizes to both original and guidance-distilled DiTs. The approach supports both temporally invariant and block-targeted interventions, so steering can be allocated flexibly across the diffusion process.
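
The mean-difference construction and the additive intervention described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic activations, not the paper's implementation: the function names `mean_difference_vector` and `steer`, the unit normalization, and the toy data are all assumptions made for the example.

```python
import numpy as np

def mean_difference_vector(pos_acts, neg_acts):
    """Unit-norm direction pointing from concept-absent to concept-present activations."""
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(acts, vector, strength):
    """Add a scaled steering vector to every row; a negative strength suppresses the concept."""
    return acts + strength * vector

# Synthetic stand-in for activations (hypothetical data, not model outputs):
# the "concept" shifts feature 0 by a constant offset.
rng = np.random.default_rng(0)
dim = 16
concept_dir = np.zeros(dim)
concept_dir[0] = 1.0
neg = rng.normal(size=(200, dim))                      # concept absent
pos = rng.normal(size=(200, dim)) + 3.0 * concept_dir  # concept present

v = mean_difference_vector(pos, neg)
steered = steer(pos, v, strength=-3.0)  # move concept-present samples back along v
```

Because `v` has unit norm, the mean projection of the activations onto `v` moves by exactly `strength`, which makes the scale of the intervention easy to reason about.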

Choice of Steering Locations

Effective steering in MM-DiTs exploits the locations where semantic information is introduced and propagated. Experiments in FLUX variants demonstrate superior controllability when steering simultaneously at the text encoder (typically CLIP or analogous pooled vectors) and at selected MM-DiT blocks. Targeting only the text encoder is insufficient for nontrivial concept manipulation, confirming that impactful semantic control in unified architectures requires interventions at multiple, strategically selected sites.
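
For the text-encoder site, the pipeline description mentions cosine similarity as the signal that modulates steering strength. The sketch below shows one way such gating could work on a pooled embedding; the exact gating rule, the clamp at zero, and the name `cosine_gated_steer` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def cosine_gated_steer(pooled, vector, base_strength):
    """Subtract the concept direction in proportion to how aligned the prompt is with it.

    Assumed rule: strength = base_strength * max(cos(pooled, vector), 0).
    Prompts that are orthogonal or opposed to the concept are left unchanged.
    """
    unit = vector / np.linalg.norm(vector)
    cos = float(pooled @ unit / np.linalg.norm(pooled))
    gate = max(cos, 0.0)
    return pooled - base_strength * gate * unit, cos

concept = np.array([2.0, 0.0, 0.0])      # hypothetical steering vector
aligned = np.array([3.0, 4.0, 0.0])      # cosine 0.6 with the concept
unrelated = np.array([0.0, 1.0, 0.0])    # cosine 0.0 with the concept

steered_aligned, cos_aligned = cosine_gated_steer(aligned, concept, base_strength=5.0)
steered_unrelated, cos_unrelated = cosine_gated_steer(unrelated, concept, base_strength=5.0)
```

Gating on alignment means prompts that never mention the concept pass through untouched, which is one simple way to limit collateral changes.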

Construction and Parameterization of Steering Vectors

For each concept and steering locus, paired prompt activations are gathered, and vectors representing the distinguishing activation differential are computed. Both simple mean-difference and SVM-hyperplane approaches are employed. Experiments confirm negligible sensitivity to variant parameterizations, though token-wise differences often maximize concept suppression.
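
The contrast between a single pooled difference vector and token-wise differences can be made concrete on toy tensors. The sketch assumes activations of shape (prompts, tokens, features) and reads "token-wise" as one difference vector per token position; that reading, the function names, and the synthetic data are assumptions.

```python
import numpy as np

def pooled_difference(pos, neg):
    """One (D,) vector: average over prompts and token positions."""
    return pos.mean(axis=(0, 1)) - neg.mean(axis=(0, 1))

def tokenwise_difference(pos, neg):
    """One vector per token position, shape (T, D): average over prompts only."""
    return pos.mean(axis=0) - neg.mean(axis=0)

rng = np.random.default_rng(0)
num_prompts, num_tokens, dim = 8, 4, 6
neg = rng.normal(size=(num_prompts, num_tokens, dim))
pos = neg.copy()
pos[:, 2, 0] += 5.0  # concept evidence appears only at token position 2, feature 0

pooled = pooled_difference(pos, neg)
per_token = tokenwise_difference(pos, neg)
```

In this toy setup the pooled vector dilutes the concept signal across all four positions (1.25 on feature 0), while the token-wise variant keeps the full offset of 5.0 at position 2 and zero elsewhere.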

Steering strengths are regularized with respect to residual concept evidence, as assessed by lightweight SVM classifiers applied to intermediate activations. This adaptive approach offers principled mitigation against oversteering, balancing semantic intervention and fidelity retention.
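
One way to tie steering strength to residual concept evidence is to scale the step by the positive margin of a linear classifier. The sketch below assumes a pre-trained linear classifier with weights `w` and bias `b` (for example, from an SVM) and moves only flagged activations toward its decision boundary. The projection rule, the `gain` parameter, and the function name are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def adaptive_suppress(acts, w, b, gain=1.0):
    """Steer each row against w in proportion to its positive classifier score.

    scores > 0 means the classifier still detects the concept. With gain=1.0,
    flagged rows land exactly on the decision boundary; rows with
    scores <= 0 are returned unchanged.
    """
    scores = acts @ w + b                 # signed concept evidence per row
    residual = np.maximum(scores, 0.0)    # only steer rows the classifier flags
    step = gain * residual / (w @ w)      # step length along w that cancels the residual
    return acts - step[:, None] * w

w = np.array([2.0, 0.0, 0.0])   # hypothetical classifier weights
b = -1.0                        # hypothetical classifier bias
acts = np.array([[3.0, 1.0, 1.0],    # score  5.0: concept detected
                 [0.0, 2.0, 2.0]])   # score -1.0: concept absent
out = adaptive_suppress(acts, w, b)
```

Because the step shrinks to zero as the classifier score approaches zero, this kind of rule cannot push an activation far past the boundary, which is the oversteering failure the adaptive scheme is meant to avoid.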

Empirical Results

Abstract and Concrete Concept Erasure

On the safety-critical I2P benchmark (nudity erasure), SHIFT achieves 3-4× greater suppression of undesirable content than classical baselines (CA, EAP, UCE, ESD), with competitive or minimally degraded FID and CLIP metrics and preserved generative diversity. Notably, steering vectors derived from a distilled model (FLUX.1[schnell]) remain effective when transferred to the non-distilled variant (FLUX.1[dev]), supporting the claim of cross-model vector transferability.

Figure 2: SHIFT effectively suppresses safety-critical concepts compared to prior baseline methods in FLUX.1[schnell].

Figure 3: SHIFT demonstrates both strong erasure and cross-model steering vector transferability for concrete concepts in FLUX.1[dev] and FLUX.1[schnell].

In concrete concept erasure (e.g., Snoopy), SHIFT outperforms weight-editing competitors, yielding substantial target suppression with minimal CLIP score reduction for non-target classes and lower FID, notably surpassing Concept Ablation in both semantic isolation and visual quality.

Style and Small Object Steering

Concept steering is not limited to erasure: SHIFT supports style removal and addition, such as selectively erasing artistic styles (e.g., Van Gogh) or steering generations into styles like "cyberpunk" or "impressionist". Results show that SHIFT suppresses specific styles while preserving attributes of related (non-targeted) artists, indicating that the intervention is selective rather than a global degradation of style.

Figure 4: Example of SHIFT applying style erasure for Van Gogh, outperforming ESD and UCE in selectivity and quality.

Figure 5: SHIFT selectively removes Van Gogh features while preserving other artists’ styles.

Small-scale object erasures (e.g., glasses, hats, lipstick) are also reliably achievable, further demonstrating the method’s precision.

Figure 6: SHIFT can remove localized visual concepts such as glasses, hats, and lipstick.

Add and Switch Concept Tasks, Stylization

SHIFT generalizes to adding or swapping concepts. By manipulating the activation manifold via steering vectors, the framework can robustly insert detailed attributes (e.g., smile, apple) into novel generations, or perform domain-to-domain edits (e.g., "banana → apple"), exhibiting task flexibility not attainable via block-level or prompt engineering alone.

Figure 7: Dataset construction for activation steering in “add concept” tasks, supporting attribute insertion such as hat or smile.

Figure 8: Block ablation for steering; concept addition is most effective with subsets of blocks rather than all.

Ablation Studies and Temporal Consistency

Extensive analysis reveals that steering vectors are temporally stable; a single vector, applied uniformly across all or early timesteps in the diffusion process, is sufficient for robust concept control. This stability reflects the encoding of semantic features as persistent, global directions in the DiT latent space. Block-local ablations further indicate that the efficacy of steering is block-dependent, with early blocks playing critical roles in concept formation.

Figure 9: Ablation for concrete concept steering; adjusting block/step assignments fine-tunes control specificity.
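
The block and timestep assignments discussed in this section amount to a schedule that decides where a fixed steering vector is applied. The toy loop below illustrates that idea only; the transformer blocks themselves are omitted, and the function name, arguments, and chosen blocks and steps are hypothetical.

```python
import numpy as np

def run_with_schedule(hidden, vector, strength, num_blocks, num_steps,
                      steer_blocks, steer_steps):
    """Apply the same steering vector at selected (step, block) pairs.

    A real model would run each transformer block inside the inner loop;
    that computation is left out so only the scheduling logic is shown.
    """
    applied = 0
    for step in range(num_steps):
        for block in range(num_blocks):
            if block in steer_blocks and step in steer_steps:
                hidden = hidden + strength * vector
                applied += 1
    return hidden, applied

vector = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical steering direction
hidden, applied = run_with_schedule(
    hidden=np.zeros(4),
    vector=vector,
    strength=-0.5,
    num_blocks=6,
    num_steps=4,
    steer_blocks={0, 1},     # early blocks only
    steer_steps=range(2),    # early timesteps only
)
```

Here the single vector is applied four times (two blocks on each of two steps), so narrowing or widening `steer_blocks` and `steer_steps` changes the total intervention without recomputing the vector.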

Implications and Theoretical Insights

SHIFT’s steering-based methodology represents a significant advance in practical, scalable generative model editing:

Efficiency and Flexibility: The lightweight, inference-time steering makes it feasible to modify large DiTs (up to 12B parameters) without retraining or architectural changes, scaling to deployment-relevant contexts.
Cross-Model Transfer: The observed invariance and cross-distillation transferability of steering vectors suggest that semantically grounded directions in DiT latent spaces are robust across model variants, with further implications for transfer learning, model distillation, and scalable moderation practices.
Limits of Unified Latent Manifolds: The empirical success of temporally uniform steering suggests that current DiT architectures may encode high-level concepts along simple, consistent manifolds. This opens directions for probing the geometry of generative latent spaces, understanding the interplay between disentanglement and controllability, and informing the design of models with stronger safety and interpretability guarantees.
Weaknesses for Image Editing: The paper explicitly distinguishes SHIFT from precise image editing; it enables semantic manifold steering, not pixel-level consistency. Thus, while powerful for global content and style modifications, it does not guarantee foreground/background invariance typical of inpainting or conditional editing.

Future Research Directions

Potential extensions involve geometric refinement of steering vector estimation, e.g., using more sophisticated contrastive representations, and exploration of higher-order manifold traversals. Interventions targeting compositional consistency and disentangled attribute control may further improve the decoupling of semantic features from scene structure, critical for editorial and safety applications in AI image synthesis. Additionally, theoretical investigation into the stability, locality, and communicative efficacy of steering directions within and across model architectures is warranted.

Conclusion

SHIFT advances the practical steering of generative latent spaces by leveraging linear activation interventions in MM-DiTs, providing an efficient, retraining-free framework for concept erasure, addition, and domain transfer. The demonstrated empirical performance, cross-distillation vector generality, and minimal trade-offs against output fidelity strongly support the utility of activation-based steering as a first-class tool for controllable synthesis and generative model alignment.

