Dynamic Activation Composition (Dyn)

Updated 17 November 2025
  • Dynamic Activation Composition (Dyn) is a neural network technique that combines adaptive activation mixing and dynamic steering to enhance model adaptability and control.
  • It utilizes trainable, normalized convex combinations of basis activations and contrastive steering vectors during decoding to optimize expressiveness and property-specific behavior.
  • Empirical evaluations show layer-wise activation specialization on image classification benchmarks and robust multi-property conditioning in language generation with minimal fluency loss.

Dynamic Activation Composition (Dyn) refers to a class of neural network interventions in which activation functions or steering interventions are constructed or applied dynamically with trainable coefficients, or with decoding-time adaptive control, in order to enhance model expressiveness, interpretability, or property-specific behavior. Dyn has emerged as a unifying term for two distinct but related methodologies: (1) dynamic learned mixtures of basis activation functions, and (2) dynamic composition of steering directions with adaptive intensities in LLMs.

1. Formal Definition and Motivation

Dynamic Activation Composition denotes two principal mechanisms:

  1. Adaptive Activation Mixing: Each layer's nonlinearity is defined as a convex combination of several canonical activation functions, with learnable, normalized mixture weights per layer. Concretely, for base functions $\{f_j(x)\}_{j=1}^K$ and non-negative layer-wise weights $w_j$, the Dyn activation is

$$A(x) = \sum_{j=1}^K P_j\,f_j(x), \quad \text{where} \quad P_j = \frac{w_j}{\sum_{i=1}^K w_i}$$

The network learns both the usual feature weights and these activation mixture coefficients.

  2. Dynamic Steering in LLMs: During autoregressive decoding, steering vectors $A_i^{(\mu)}$, computed via contrastive prompt pairs, are injected into attention head outputs with a stepwise scalar weight $\alpha_i^{(\mu)}$ for property $\mu$. The steering intensity is not fixed but set dynamically per token based on an information-theoretic contrast between distributions:

$$z_i' = z_i + \alpha_i^{(\mu)} A_i^{(\mu)}$$

For multi-property steering, Dyn composes:

$$z_i' = z_i + \sum_{\mu=1}^M \alpha_i^{(\mu)} A_i^{(\mu)}$$

The primary motivations are heightened adaptability to input distributions (activation mixing), and robust, minimally disruptive conditioning of model outputs (dynamic steering).
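
As a concrete instance of the activation mixture in mechanism (1): with $K = 3$ base functions $(\mathrm{ReLU}, \tanh, \sin)$ and learned weights $w = (2, 1, 1)$, the normalized coefficients are $P = (0.5, 0.25, 0.25)$, giving $A(x) = 0.5\,\mathrm{ReLU}(x) + 0.25\tanh(x) + 0.25\sin(x)$. The weight values here are illustrative rather than taken from the reported experiments.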

2. Mathematical Frameworks

2.1 Basis Activation Mixtures

Let there be $K$ candidate base activations, e.g., $\mathrm{ReLU}$, $\tanh$, $\sin$. For each layer $\ell$:

  • Parameter vector: $\mathbf{w}^\ell = (w_1^\ell, \dots, w_K^\ell)$
  • Normalized coefficients: $P_j^\ell = w_j^\ell / \sum_i w_i^\ell$
  • Layer activation: $A^\ell(x) = \sum_{j=1}^K P_j^\ell f_j(x)$

Parameterization strategies:

  • $w_j^\ell = \exp(s_j^\ell)$ with unconstrained $s_j^\ell$
  • Initialization: uniform weights, or small random noise $w_j^\ell \sim U(0.9, 1.1)$

Joint optimization alternates between network weight updates and activation weight updates, typically using an Adam-based three-phase freezing/unfreezing schedule.
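
A minimal PyTorch sketch of this layer-wise mixture is shown below. The class names (`DynActivation`, `DynMLP`), the softmax parameterization (equivalent to $w_j = \exp(s_j)$ followed by normalization), and the three-function basis are illustrative assumptions rather than a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynActivation(nn.Module):
    """Convex combination of basis activations with learnable per-layer weights."""

    def __init__(self, num_basis: int = 3):
        super().__init__()
        # Unconstrained scores s_j; softmax yields P_j = exp(s_j) / sum_i exp(s_i),
        # i.e., non-negative, normalized mixture coefficients. Zero init = uniform mix.
        self.scores = nn.Parameter(torch.zeros(num_basis))
        self.bases = [F.relu, torch.tanh, torch.sin]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coeffs = torch.softmax(self.scores, dim=0)
        return sum(p * f(x) for p, f in zip(coeffs, self.bases))

class DynMLP(nn.Module):
    """Small MLP whose nonlinearity at each layer is a learned activation mixture."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.fc1, self.act1 = nn.Linear(in_dim, hidden), DynActivation()
        self.fc2, self.act2 = nn.Linear(hidden, hidden), DynActivation()
        self.fc3 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act1(self.fc1(x))
        x = self.act2(self.fc2(x))
        return self.fc3(x)
```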

2.2 Dynamic Activation Steering (LLMs)

Activation steering in LLMs operates by adding steering vectors to intermediate activations. For step $i$ and property $\mu$, the steering intensity is inferred via the KL divergence between the unsteered and strongly-steered next-token distributions, nucleus-filtered to the top-$p$ tokens:

  • KL-guided weighting:

$$\alpha_i^{(\mu)} = \min\left\{\mathrm{KL}\!\left(\tilde p_i \,\middle\|\, \tilde p^{(\mu),\mathrm{strong}}_i\right),\; \alpha_{\max}\right\}$$

where $\tilde p_i$ and $\tilde p^{(\mu),\mathrm{strong}}_i$ are the renormalized probabilities over the nucleus set $Q_i$.

The steering vector itself is the difference of mean activations over $K$ contrastive prompt pairs $(P_k^+, P_k^-)$:

$$A_i^{(\mu)} = v_i^+ - v_i^-, \quad v_i^+ = \frac{1}{K}\sum_{k=1}^K f(P_k^+, y_{<i}), \quad v_i^- = \frac{1}{K}\sum_{k=1}^K f(P_k^-, y_{<i})$$

The procedure enables both single- and multi-property steering.
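
The KL-guided weighting above can be sketched in PyTorch as follows. The function names, the choice of the union of the two top-$p$ sets as the nucleus $Q_i$, and the placeholder values for `p_top` and `alpha_max` are assumptions for illustration, not values fixed by the method.

```python
import torch
import torch.nn.functional as F

def nucleus_mask(probs: torch.Tensor, p_top: float) -> torch.Tensor:
    """Boolean mask over the vocabulary covering the top-p nucleus of `probs`."""
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix of tokens whose cumulative mass reaches p_top.
    keep = cumulative - sorted_probs < p_top
    mask = torch.zeros_like(probs, dtype=torch.bool)
    mask[sorted_idx[keep]] = True
    return mask

def dynamic_alpha(logits_unsteered: torch.Tensor,
                  logits_strong: torch.Tensor,
                  p_top: float = 0.9,
                  alpha_max: float = 4.0) -> float:
    """KL-guided steering intensity for one decoding step and one property (a sketch)."""
    p_un = F.softmax(logits_unsteered, dim=-1)
    p_st = F.softmax(logits_strong, dim=-1)
    # Assumed: Q_i is the union of the two top-p nuclei.
    q_i = nucleus_mask(p_un, p_top) | nucleus_mask(p_st, p_top)
    # Renormalize both distributions over the shared nucleus Q_i.
    p_un_q = p_un[q_i] / p_un[q_i].sum()
    p_st_q = p_st[q_i] / p_st[q_i].sum()
    kl = torch.sum(p_un_q * (p_un_q / p_st_q).log())
    return min(kl.item(), alpha_max)
```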

3. Decoding and Optimization Procedures

3.1 Basis Activation Learning

Training schedule:

  • Epochs 1–10: Freeze activation weights $\{w_j\}$; optimize network weights $\Theta$.
  • Epochs 11–20: Freeze $\Theta$; optimize activation weights $\{w_j\}$.
  • Epochs 21–30: Freeze $\{w_j\}$; optimize $\Theta$. No explicit $\ell_1$ or $\ell_2$ regularization is applied to $w$ beyond the non-negativity and normalization constraints.

Pseudocode

for epoch = 1 to 10:
    freeze activation weights
    unfreeze network weights
    train network weights

for epoch = 11 to 20:
    freeze network weights
    unfreeze activation weights
    train activation weights

for epoch = 21 to 30:
    freeze activation weights
    unfreeze network weights
    train network weights
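
Under the assumption that the model exposes its mixture scores as parameters whose names contain "scores" (as in the `DynActivation` sketch above), the same three-phase schedule can be written in PyTorch by toggling `requires_grad` on the two parameter groups; the Adam settings are placeholders.

```python
import torch

def train_three_phase(model, loss_fn, loader, epochs=(10, 10, 10), lr=1e-3):
    """Alternating optimization of network weights and activation mixture weights (a sketch)."""
    act_params = [p for n, p in model.named_parameters() if "scores" in n]
    net_params = [p for n, p in model.named_parameters() if "scores" not in n]

    def run_phase(trainable, frozen, num_epochs):
        # Freeze one group, unfreeze the other, and train only the latter.
        for p in frozen:
            p.requires_grad_(False)
        for p in trainable:
            p.requires_grad_(True)
        opt = torch.optim.Adam(trainable, lr=lr)
        for _ in range(num_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    run_phase(net_params, act_params, epochs[0])  # epochs 1-10: network weights
    run_phase(act_params, net_params, epochs[1])  # epochs 11-20: activation weights
    run_phase(net_params, act_params, epochs[2])  # epochs 21-30: network weights
```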

3.2 Dynamic Steering in LLMs

Dyn decoding pseudocode (high-level):

procedure DynDecode(prompt q, max_length T):
    y ← []
    for i in 1…T:
        logits = f.forward(q, y; injection=0)
        p_un = softmax(logits)
        for μ in 1…M:                            # M = number of steered properties
            logits_strongμ = f.forward(q, y; injection=α_max·A_i^(μ))
            p_strongμ = softmax(logits_strongμ)
            Q_i = top_p_tokens(p_un, p_top) ∪ top_p_tokens(p_strongμ, p_top)
            p̄_unμ = renormalize(p_un over Q_i)
            p̄_strongμ = renormalize(p_strongμ over Q_i)
            α_i^(μ) = min(KL(p̄_unμ ‖ p̄_strongμ), α_max)
        Δ_i = Σ_{μ=1}^M α_i^(μ) · A_i^(μ)
        logits = f.forward(q, y; injection=Δ_i)
        p_final = softmax(logits)
        t_next = argmax p_final
        y.append(t_next)
    return y
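
The `injection` step in the pseudocode can be realized with a forward hook on the targeted attention module. The sketch below assumes a Hugging Face-style decoder in which the attention module returns a tuple whose first element is the attention output of shape (batch, seq, hidden); this layout is common but should be verified per model.

```python
import torch

def add_steering_hook(attn_module, steering_vector: torch.Tensor, alpha: float):
    """Register a hook that adds alpha * steering_vector to one layer's attention output."""
    def hook(module, inputs, output):
        if isinstance(output, tuple):
            steered = output[0] + alpha * steering_vector.to(output[0].dtype)
            return (steered,) + output[1:]
        return output + alpha * steering_vector.to(output.dtype)
    return attn_module.register_forward_hook(hook)

# Usage sketch (layer_idx, A_mu, and alpha are assumed to be given): inject, run one
# forward pass, then remove the hook so the next step recomputes alpha from scratch.
# handle = add_steering_hook(model.model.layers[layer_idx].self_attn, A_mu, alpha)
# logits = model(input_ids).logits[:, -1, :]
# handle.remove()
```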

4. Empirical Evaluations

4.1 Activation Mixing (Image Classification)

Empirical results (MNIST, FashionMNIST, KMNIST) reveal learned layer-wise preference for base activations:

  • First layer: $P_1$ (ReLU) $\approx 0.48$–$0.56$; the network exhibits classical ReLU-like behavior.
  • Second/third layers: increasing reliance on tanh/sin, with $P_3$ (sin) becoming dominant at depth, up to $\approx 0.92$.

Behavior by input range: For small $x$, mixtures mimic LeakyReLU; for large $x$, ReLU dominates due to its unbounded growth.

4.2 Dynamic Steering in LLMs (Conditioned Generation)

  • Datasets: Alpaca QA (multilingual), BeaverTails (safety), GYAFC/XFORMAL (formality)
  • Model: Mistral-7B-Instruct v0.2
  • Baselines: Start, Fixed, and Dim schedules; In-Context Learning (ICL); noICL

Metrics:

  • Conditioning strength: property-specific confidence (langdetect, Llama Guard 2 8B, XLM-R classifier)
  • Fluency: $\Delta$ perplexity vs. the ICL baseline

Results:

  • Dyn matches or outperforms static baselines, securing strong multi-property accuracy with minimal perplexity increase.
  • For multi-property prompts (“Italian + Unsafe”, “French + Informal”), Dyn delivers robust conditioning and fluency preservation, outperforming fixed and decaying schedules.
  • $\alpha$-schedules spike at segment starts and decay once the property is established.

5. Constraints, Limitations, and Extensions

Constraints:

  • Non-negativity and normalization of mixture weights.
  • The KL divergence is capped at $\alpha_{\max}$; nucleus filtering constrains the contrast computation.

Limitations:

  • Activation mixing with only three basis functions omits newer forms (Swish, GELU).
  • LLM steering vector extraction hinges on synthetic/MT-generated parallel data; human-labeled corpora may offer refinement.
  • Only Mistral-7B was studied; broader architecture validation is pending.
  • Conditioning and fluency metrics are automatic and proxy-based; human-grade evaluation is desirable.
  • Training schedules for activation mixing introduce complexity.
  • Hyperparameters ($p_{\text{top}}$, $\alpha_{\max}$) are coarsely tuned.

Potential Extensions:

  • Activation mixing over larger dictionaries (Mish, ELU, SELU).
  • End-to-end, joint weight optimization with regularization on mixture entropy.
  • Investigate alternative steering-vector extraction methods (probing classifiers, PCA).
  • Adaptive, per-property or per-layer steering schedules.

6. Significance and Impact

Dynamic Activation Composition unifies advances in neural function expressiveness (layerwise activation blending) and controlled LLM output conditioning (robust multi-property steering), yielding models capable of task- or property-specific adaptation without manual schedule tuning or loss of output fluency. The Dyn paradigm demonstrates empirical gains—layerwise specialization in image tasks and robust multi-property conditioning in LLMs—using relatively modest architectural or computational changes, with promising prospects for generalization to richer activation sets, steering vectors, and broader model classes.

Dynamic activation mixing and steering are thematically allied with dynamic network parameterization, meta-learning, and neural network interpretability. Research parallels exist in dynamic composition in tree-structured models, wherein meta-networks synthesize composition functions at each parse node (Liu et al., 2017), further highlighting the utility of adaptive mixtures and dynamic, context-sensitive interventions. A plausible implication is that further exploration into both Dyn-style activation mixtures and dynamic steering may yield new directions for universal neural adaptation mechanisms, bridging fine-grained representation, robust output control, and minimal disruption to underlying model fluency.
