Matrix-Valued modReLU in TMAF

Updated 23 February 2026
  • Matrix-Valued modReLU is a trainable extension that replaces elementwise activations with flexible matrix functions within the TMAF framework.
  • It computes activations by scaling complex or real vector pairs using a block-diagonal matrix whose entries depend on input magnitudes and bias parameters.
  • The approach ensures efficient gradient computation and numerical stability with piecewise-constant parameterization and small epsilon adjustments.

Matrix-Valued modReLU is a generalization of the modReLU activation function within the framework of trainable matrix activation functions (TMAF), as developed by Li, Liu, and Zikatanov. This approach extends the standard scalar, fixed activation procedures by allowing matrix-valued functions with entries parameterized and optimized during network training. For modReLU, the matrix-valued variant encodes the activation as a block-diagonal or more general matrix, with entries dependent on input magnitudes, enabling flexible nonlinearities and learnable parameterization.

1. Trainable Matrix Activation Functions (TMAF) Framework

In the TMAF approach, the standard elementwise scalar nonlinearity $\sigma(t)$ applied to neural activations is replaced. The activation becomes a matrix-vector product:

$$\sigma_A(y) = D_A(y)\, y,$$

where $D_A(y) \in \mathbb{R}^{n \times n}$ is a matrix whose entries are themselves functions (often piecewise-constant) of the input vector $y$. The simplest construction is diagonal:

$$D_A(y) = \mathrm{diag}\big(\alpha_1(y_1), \ldots, \alpha_n(y_n)\big),$$

with each $\alpha_i : \mathbb{R} \to \mathbb{R}$ specified by a set of "knots" (thresholds $s_{i,j}$) and "levels" ($t_{i,j}$), all of which are trainable. This matrix-valued activation can also assume block-diagonal or full-matrix forms, with entries depending on multivariate patterns from $y$ (Liu et al., 2021).
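As a concrete sketch of this construction (the function names and the tiny two-neuron setup are illustrative, not from the paper), a diagonal piecewise-constant activation can be evaluated in plain Python:

```python
def piecewise_constant(t_levels, s_knots, y):
    """Evaluate a piecewise-constant alpha(y) given sorted knots s_knots
    and levels t_levels, with len(t_levels) == len(s_knots) + 1."""
    for j, s in enumerate(s_knots):
        if y < s:
            return t_levels[j]
    return t_levels[-1]

def tmaf_diagonal(y_vec, levels, knots):
    """Apply sigma_A(y) = D_A(y) y with D_A diagonal: each entry of y is
    scaled by its own trainable piecewise-constant alpha_i(y_i)."""
    return [piecewise_constant(levels[i], knots[i], yi) * yi
            for i, yi in enumerate(y_vec)]

# Example: neuron 0 mimics ReLU (scale 0 below the knot at 0, scale 1 above);
# neuron 1 uses a leaky variant with scale 0.1 below its knot.
levels = [[0.0, 1.0], [0.1, 1.0]]
knots = [[0.0], [0.0]]
print(tmaf_diagonal([-2.0, -2.0], levels, knots))  # [-0.0, -0.2]
print(tmaf_diagonal([3.0, 3.0], levels, knots))    # [3.0, 3.0]
```

Training then amounts to optimizing the entries of `levels` (and optionally `knots`) alongside the network weights.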

2. Matrix-Valued modReLU Construction

The modReLU activation, originally defined for complex inputs $z \in \mathbb{C}$, takes the form:

$$\sigma_{\mathrm{modReLU}}(z) = \begin{cases} \dfrac{\max(0, |z| + b)}{|z|}\, z, & z \neq 0, \\ 0, & z = 0, \end{cases}$$

where $b \in \mathbb{R}$ is a bias parameter. Within TMAF, layers are $\mathbb{R}^2$-valued, with each pair representing $\mathrm{Re}\, z$ and $\mathrm{Im}\, z$. Write a pair as $(x, y) \in \mathbb{R}^2$ with $r = \sqrt{x^2 + y^2}$, and define $u(r) = \max(0, r + b)/r$, with $u(0) = 0$ by convention. The modReLU then acts as:

σmodReLU((x,y))=u(r)(x y)\sigma_{\mathrm{modReLU}}((x, y)) = u(r) \begin{pmatrix} x \ y \end{pmatrix}

This can be rewritten as a matrix activation:

$$D_A(y) = u(r)\, I_2,$$

where $I_2$ is the $2 \times 2$ identity. In effect, $D_A(y)$ is a block-diagonal matrix scaling each complex pair, with parameter set $A = \{b\}$. A piecewise formulation with a single threshold $s_1 = -b$ and two levels $t_0 = 0$, $t_1 = (r + b)/r$ reproduces modReLU exactly.
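This action on a single real/imaginary pair can be sketched directly (names and sample values are illustrative, not from the paper):

```python
import math

def modrelu_pair(x, y, b):
    """modReLU on one complex number z = x + iy, written as the matrix
    activation D_A(y) = u(r) * I_2 applied to the pair (x, y)."""
    r = math.hypot(x, y)
    if r == 0.0:
        return (0.0, 0.0)            # convention u(0) = 0
    u = max(0.0, r + b) / r          # scalar entry of the 2x2 block u(r) I_2
    return (u * x, u * y)

# With b = -1, points inside the unit disk are annihilated; points outside
# are pulled toward the origin by 1 in magnitude, with phase preserved.
print(modrelu_pair(0.3, 0.4, -1.0))  # zeroed: r = 0.5 < 1
print(modrelu_pair(3.0, 4.0, -1.0))  # r = 5, so the pair is scaled by 4/5
```

The phase-preserving behavior is exactly what the scalar block $u(r) I_2$ encodes: both components are multiplied by the same factor.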

3. Forward Pass and Computational Mechanics

Consider a layer with $k$ complex-valued units, represented as $k$ real/imaginary pairs $(x_i, y_i)$. The forward pass proceeds for $i = 1, \ldots, k$:

  • Compute $r_i = \sqrt{x_i^2 + y_i^2}$.
  • Compute the scaling factor $\alpha_i = \max(0, r_i + b)/(r_i + \varepsilon)$, with a small $\varepsilon$ for numerical stability.
  • Apply the scaling: $x^+_i = \alpha_i x_i$, $y^+_i = \alpha_i y_i$.

In matrix notation, with stacked vector $Y \in \mathbb{R}^{2k}$,

$$\sigma_{\mathrm{modReLU}}(Y) = D(Y)\, Y,$$

where $D(Y)$ is block-diagonal with the scalar $\alpha_i$ repeated on each $2 \times 2$ block. This construction preserves the structure of the input domain and is equivariant under complex-phase rotations.
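The steps above can be sketched for a layer of $k$ pairs (pure Python for clarity; a practical implementation would vectorize across units and batch, and the function name is illustrative):

```python
import math

def modrelu_forward(xs, ys, b, eps=1e-6):
    """Forward pass for k complex units given as parallel lists of real
    parts xs and imaginary parts ys. Returns the scaled pairs plus the
    radii, which the backward pass reuses."""
    rs, xs_out, ys_out = [], [], []
    for x, y in zip(xs, ys):
        r = math.sqrt(x * x + y * y)
        alpha = max(0.0, r + b) / (r + eps)   # eps guards the r = 0 case
        rs.append(r)
        xs_out.append(alpha * x)
        ys_out.append(alpha * y)
    return xs_out, ys_out, rs

# First unit (r = 5) survives with b = -1; second (r = 0.1 < 1) is zeroed.
xo, yo, rs = modrelu_forward([3.0, 0.1], [4.0, 0.0], b=-1.0)
```

Because $\alpha_i$ is shared by $x_i$ and $y_i$, this is exactly the block-diagonal product $D(Y)\,Y$ evaluated block by block.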

4. Gradient Computation and Backward Pass

For the modReLU block, the trainable parameter is $b$. With loss function $L$, its derivative with respect to $b$, accumulated over input pairs (a pair contributes when $r_i > 0$ and $r_i + b > 0$), is:

$$\frac{\partial L}{\partial b} = \sum_{i=1}^k \left[ \delta x_i\, \frac{x_i}{r_i} + \delta y_i\, \frac{y_i}{r_i} \right] \mathbf{1}_{r_i + b > 0},$$

where $\delta x_i = \partial L/\partial x^+_i$ and $\delta y_i = \partial L/\partial y^+_i$. In the general TMAF context, gradients with respect to the piecewise-constant parameters $t_{i,j}$, $s_{i,j}$ follow analogous formulas, involving batch-wise summations conditioned on the activation region.
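As a sanity check of this formula (using an illustrative toy loss $L = \sum_i (x^+_i + y^+_i)$, so every upstream delta equals 1; all names are assumptions), the analytic $\partial L/\partial b$ can be compared against a central finite difference:

```python
import math

def forward_sum(xs, ys, b):
    """Toy loss: L(b) = sum of all activated components, so that
    dL/dx+_i = dL/dy+_i = 1 for every unit."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = math.hypot(x, y)
        alpha = max(0.0, r + b) / r if r > 0 else 0.0
        total += alpha * x + alpha * y
    return total

def grad_b(xs, ys, b, dx, dy):
    """Analytic dL/db: sum over active units of (dx_i*x_i + dy_i*y_i)/r_i."""
    g = 0.0
    for x, y, gx, gy in zip(xs, ys, dx, dy):
        r = math.hypot(x, y)
        if r > 0 and r + b > 0:          # the indicator 1_{r_i + b > 0}
            g += (gx * x + gy * y) / r
    return g

xs, ys, b = [3.0, 0.1], [4.0, 0.2], -1.0
ones = [1.0, 1.0]
analytic = grad_b(xs, ys, b, ones, ones)
h = 1e-6
numeric = (forward_sum(xs, ys, b + h) - forward_sum(xs, ys, b - h)) / (2 * h)
print(abs(analytic - numeric))  # agreement to finite-difference accuracy
```

Only the first unit is active ($r_1 = 5$, $r_1 + b = 4 > 0$); the second sits in the dead region and correctly contributes nothing to either estimate.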

5. Integration into Deep Network Training

The modReLU matrix-valued activation integrates into common deep learning workflows:

  1. Compute linear pre-activations: $Z = W X_\text{batch} + b$.
  2. Split $Z$ into real/imaginary components.
  3. Compute radii $R$ for each input pair.
  4. Calculate modReLU scaling factors $A = \mathrm{ReLU}(R + b_\text{modReLU}) / R$.
  5. Apply elementwise scaling to both real and imaginary parts.
  6. Stack output and proceed through loss and optimizer updates.

In frameworks such as PyTorch, this is realized via standard tensor computations, and differentiation with respect to $b_\text{modReLU}$ is handled by the autograd system (Liu et al., 2021).
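The workflow can be miniaturized into a dependency-free sketch (one complex unit, an identity "linear layer", and a hand-coded gradient step for a squared loss; all names and values are illustrative, and a real framework would obtain every gradient from autograd):

```python
import math

def modrelu(x, y, b, eps=1e-6):
    r = math.sqrt(x * x + y * y)
    a = max(0.0, r + b) / (r + eps)
    return a * x, a * y, r

# Fit the modReLU bias b so the activated output matches a target pair,
# by plain gradient descent on L = 0.5 * ||output - target||^2.
x_in, y_in = 3.0, 4.0      # input radius 5
tx, ty = 1.8, 2.4          # target = same phase, radius 3
b = 0.0
for step in range(200):
    xo, yo, r = modrelu(x_in, y_in, b)     # steps 2-5: split, radius, scale
    dx, dy = xo - tx, yo - ty              # upstream deltas dL/dx+, dL/dy+
    if r > 0 and r + b > 0:                # step 6: update b via dL/db
        db = dx * x_in / r + dy * y_in / r
        b -= 0.1 * db
print(b)  # converges near -2, since shrinking radius 5 to 3 needs r + b = 3
```

The update uses exactly the $\partial L/\partial b$ formula from Section 4, specialized to a single active unit.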

6. Computational Complexity and Numerical Stability

For each $\mathbb{R}^2$ pair, the activation performs two squarings, one addition, one square root, one division, and one comparison, constituting $O(1)$ overhead compared to ReLU. Modern parallel hardware mitigates the cost of the square root and division. Numerical instabilities near $r = 0$ are addressed by replacing $r$ with $r = \sqrt{x^2 + y^2 + \varepsilon^2}$ for a small $\varepsilon$ (typically $10^{-6}$), ensuring well-defined operations and correct handling at the origin. Piecewise-constant extensions of the scaling factors $\alpha$ impose only a small constant-factor overhead compared to standard ReLU, since all operations remain elementary.
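The smoothed radius is a one-liner; this sketch (illustrative, not from the paper) shows that it stays strictly positive at the origin while perturbing large radii only at order $\varepsilon^2$:

```python
import math

EPS = 1e-6

def safe_radius(x, y, eps=EPS):
    """Smoothed radius sqrt(x^2 + y^2 + eps^2): strictly positive, so the
    division in max(0, r + b)/r and the gradient terms x/r, y/r remain
    well defined even at (0, 0)."""
    return math.sqrt(x * x + y * y + eps * eps)

print(safe_radius(0.0, 0.0))        # eps, not 0: no division by zero
print(safe_radius(3.0, 4.0) - 5.0)  # error is O(eps^2) away from the origin
```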

7. Functional Generalization and Applicability

The matrix-valued modReLU within the TMAF framework enables trainable, data-adaptive nonlinearities beyond fixed pointwise activations. While the two-region case with bias $b$ captures the original modReLU exactly, subdividing the input-norm axis with additional knots and levels enables fine-grained, piecewise-constant activations. The design provides a systematic mechanism for incorporating learnable matrix nonlinearities in neural architectures with complex-valued or real vector-valued input structure, while preserving computational simplicity and stability (Liu et al., 2021).
