Matrix-Valued modReLU in TMAF

Updated 23 February 2026
  • Matrix-Valued modReLU is a trainable extension that replaces elementwise activations with flexible matrix functions within the TMAF framework.
  • It computes activations by scaling complex or real vector pairs using a block-diagonal matrix whose entries depend on input magnitudes and bias parameters.
  • The approach ensures efficient gradient computation and numerical stability with piecewise-constant parameterization and small epsilon adjustments.

Matrix-Valued modReLU is a generalization of the modReLU activation function within the framework of trainable matrix activation functions (TMAF), as developed by Li, Liu, and Zikatanov. This approach extends the standard scalar, fixed activation procedures by allowing matrix-valued functions with entries parameterized and optimized during network training. For modReLU, the matrix-valued variant encodes the activation as a block-diagonal or more general matrix, with entries dependent on input magnitudes, enabling flexible nonlinearities and learnable parameterization.

1. Trainable Matrix Activation Functions (TMAF) Framework

In the TMAF approach, the standard elementwise scalar nonlinearity $\sigma(t)$ applied to neural activations is replaced. The activation becomes a matrix-vector product:

$$\sigma_A(y) = D_A(y)\, y,$$

where $D_A(y) \in \mathbb{R}^{n \times n}$ is a matrix whose entries are themselves functions (often piecewise-constant) of the input vector $y$. The simplest construction is diagonal:

$$D_A(y) = \mathrm{diag}\big(\alpha_1(y_1), \ldots, \alpha_n(y_n)\big),$$

with each $\alpha_i : \mathbb{R} \to \mathbb{R}$ specified by a set of "knots" (thresholds $s_{i,j}$) and "levels" ($t_{i,j}$), all of which are trainable. This matrix-valued activation can also assume block-diagonal or full-matrix forms, with entries depending on multivariate patterns from $y$ (Liu et al., 2021).
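As a concrete sketch of this construction (the function names and the tiny two-neuron setup are illustrative, not from the paper), a diagonal piecewise-constant activation can be evaluated in plain Python:

```python
def piecewise_constant(t_levels, s_knots, y):
    """Evaluate a piecewise-constant alpha(y) given sorted knots s_knots
    and levels t_levels, with len(t_levels) == len(s_knots) + 1."""
    for j, s in enumerate(s_knots):
        if y < s:
            return t_levels[j]
    return t_levels[-1]

def tmaf_diagonal(y_vec, levels, knots):
    """Apply sigma_A(y) = D_A(y) y with D_A diagonal: each entry of y is
    scaled by its own trainable piecewise-constant alpha_i(y_i)."""
    return [piecewise_constant(levels[i], knots[i], yi) * yi
            for i, yi in enumerate(y_vec)]

# Example: neuron 0 mimics ReLU (scale 0 below the knot at 0, scale 1 above);
# neuron 1 uses a leaky variant with scale 0.1 below its knot.
levels = [[0.0, 1.0], [0.1, 1.0]]
knots = [[0.0], [0.0]]
print(tmaf_diagonal([-2.0, -2.0], levels, knots))  # [-0.0, -0.2]
print(tmaf_diagonal([3.0, 3.0], levels, knots))    # [3.0, 3.0]
```

Training then amounts to optimizing the entries of `levels` (and optionally `knots`) alongside the network weights.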

2. Matrix-Valued modReLU Construction

The modReLU activation, originally defined for complex inputs $z \in \mathbb{C}$, takes the form:

$$\sigma_{\mathrm{modReLU}}(z) = \begin{cases} \dfrac{\max(0, |z| + b)}{|z|}\, z, & z \neq 0, \\ 0, & z = 0, \end{cases}$$

where $b \in \mathbb{R}$ is a bias parameter. Within TMAF, layers are $\mathbb{R}^2$-valued, with each pair representing $\mathrm{Re}\, z$ and $\mathrm{Im}\, z$. Write a pair as $(x, y) \in \mathbb{R}^2$ with $r = \sqrt{x^2 + y^2}$, and define $u(r) = \max(0, r + b)/r$, with $u(0) = 0$ by convention. The modReLU then acts as:

σmodReLU((x,y))=u(r)(x y)\sigma_{\mathrm{modReLU}}((x, y)) = u(r) \begin{pmatrix} x \ y \end{pmatrix}

This can be rewritten as a matrix activation:

$$D_A(y) = u(r)\, I_2,$$

where $I_2$ is the $2 \times 2$ identity. In effect, $D_A(y)$ is a block-diagonal matrix scaling each complex pair, with parameter set $A = \{b\}$. A piecewise formulation with a single threshold $s_1 = -b$ and two levels $t_0 = 0$, $t_1 = (r + b)/r$ reproduces modReLU exactly.
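This action on a single real/imaginary pair can be sketched directly (names and sample values are illustrative, not from the paper):

```python
import math

def modrelu_pair(x, y, b):
    """modReLU on one complex number z = x + iy, written as the matrix
    activation D_A(y) = u(r) * I_2 applied to the pair (x, y)."""
    r = math.hypot(x, y)
    if r == 0.0:
        return (0.0, 0.0)            # convention u(0) = 0
    u = max(0.0, r + b) / r          # scalar entry of the 2x2 block u(r) I_2
    return (u * x, u * y)

# With b = -1, points inside the unit disk are annihilated; points outside
# are pulled toward the origin by 1 in magnitude, with phase preserved.
print(modrelu_pair(0.3, 0.4, -1.0))  # zeroed: r = 0.5 < 1
print(modrelu_pair(3.0, 4.0, -1.0))  # r = 5, so the pair is scaled by 4/5
```

The phase-preserving behavior is exactly what the scalar block $u(r) I_2$ encodes: both components are multiplied by the same factor.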

3. Forward Pass and Computational Mechanics

Consider a layer with $k$ complex-valued units, represented as $k$ real/imaginary pairs $(x_i, y_i)$. The forward pass proceeds for $i = 1, \ldots, k$:

  • Compute $r_i = \sqrt{x_i^2 + y_i^2}$.
  • Compute the scaling factor $\alpha_i = \max(0, r_i + b)/(r_i + \varepsilon)$, with a small $\varepsilon$ for numerical stability.
  • Apply the scaling: $x^+_i = \alpha_i x_i$, $y^+_i = \alpha_i y_i$.

In matrix notation, with stacked vector $Y \in \mathbb{R}^{2k}$,

$$\sigma_{\mathrm{modReLU}}(Y) = D(Y)\, Y,$$

where $D(Y)$ is block-diagonal with the scalar $\alpha_i$ repeated on each $2 \times 2$ block. This construction preserves the structure of the input domain and is equivariant under complex-phase rotations.
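The steps above can be sketched for a layer of $k$ pairs (pure Python for clarity; a practical implementation would vectorize across units and batch, and the function name is illustrative):

```python
import math

def modrelu_forward(xs, ys, b, eps=1e-6):
    """Forward pass for k complex units given as parallel lists of real
    parts xs and imaginary parts ys. Returns the scaled pairs plus the
    radii, which the backward pass reuses."""
    rs, xs_out, ys_out = [], [], []
    for x, y in zip(xs, ys):
        r = math.sqrt(x * x + y * y)
        alpha = max(0.0, r + b) / (r + eps)   # eps guards the r = 0 case
        rs.append(r)
        xs_out.append(alpha * x)
        ys_out.append(alpha * y)
    return xs_out, ys_out, rs

# First unit (r = 5) survives with b = -1; second (r = 0.1 < 1) is zeroed.
xo, yo, rs = modrelu_forward([3.0, 0.1], [4.0, 0.0], b=-1.0)
```

Because $\alpha_i$ is shared by $x_i$ and $y_i$, this is exactly the block-diagonal product $D(Y)\,Y$ evaluated block by block.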

4. Gradient Computation and Backward Pass

For the modReLU block, the trainable parameter is $b$. With loss function $L$, its derivative with respect to $b$, accumulated over input pairs (a pair contributes when $r_i > 0$ and $r_i + b > 0$), is:

$$\frac{\partial L}{\partial b} = \sum_{i=1}^k \left[ \delta x_i\, \frac{x_i}{r_i} + \delta y_i\, \frac{y_i}{r_i} \right] \mathbf{1}_{r_i + b > 0},$$

where $\delta x_i = \partial L/\partial x^+_i$ and $\delta y_i = \partial L/\partial y^+_i$. In the general TMAF context, gradients with respect to the piecewise-constant parameters $t_{i,j}$, $s_{i,j}$ follow analogous formulas, involving batch-wise summations conditioned on the activation region.
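As a sanity check of this formula (using an illustrative toy loss $L = \sum_i (x^+_i + y^+_i)$, so every upstream delta equals 1; all names are assumptions), the analytic $\partial L/\partial b$ can be compared against a central finite difference:

```python
import math

def forward_sum(xs, ys, b):
    """Toy loss: L(b) = sum of all activated components, so that
    dL/dx+_i = dL/dy+_i = 1 for every unit."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = math.hypot(x, y)
        alpha = max(0.0, r + b) / r if r > 0 else 0.0
        total += alpha * x + alpha * y
    return total

def grad_b(xs, ys, b, dx, dy):
    """Analytic dL/db: sum over active units of (dx_i*x_i + dy_i*y_i)/r_i."""
    g = 0.0
    for x, y, gx, gy in zip(xs, ys, dx, dy):
        r = math.hypot(x, y)
        if r > 0 and r + b > 0:          # the indicator 1_{r_i + b > 0}
            g += (gx * x + gy * y) / r
    return g

xs, ys, b = [3.0, 0.1], [4.0, 0.2], -1.0
ones = [1.0, 1.0]
analytic = grad_b(xs, ys, b, ones, ones)
h = 1e-6
numeric = (forward_sum(xs, ys, b + h) - forward_sum(xs, ys, b - h)) / (2 * h)
print(abs(analytic - numeric))  # agreement to finite-difference accuracy
```

Only the first unit is active ($r_1 = 5$, $r_1 + b = 4 > 0$); the second sits in the dead region and correctly contributes nothing to either estimate.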

5. Integration into Deep Network Training

The modReLU matrix-valued activation integrates into common deep learning workflows:

  1. Compute linear pre-activations: $Z = W X_\text{batch} + b$.
  2. Split $Z$ into real/imaginary components.
  3. Compute radii $R$ for each input pair.
  4. Calculate modReLU scaling factors $A = \mathrm{ReLU}(R + b_\text{modReLU}) / R$.
  5. Apply elementwise scaling to both real and imaginary parts.
  6. Stack output and proceed through loss and optimizer updates.

In frameworks such as PyTorch, this is realized via standard tensor computations, and differentiation with respect to $b_\text{modReLU}$ is handled by the autograd system (Liu et al., 2021).
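The workflow can be miniaturized into a dependency-free sketch (one complex unit, an identity "linear layer", and a hand-coded gradient step for a squared loss; all names and values are illustrative, and a real framework would obtain every gradient from autograd):

```python
import math

def modrelu(x, y, b, eps=1e-6):
    r = math.sqrt(x * x + y * y)
    a = max(0.0, r + b) / (r + eps)
    return a * x, a * y, r

# Fit the modReLU bias b so the activated output matches a target pair,
# by plain gradient descent on L = 0.5 * ||output - target||^2.
x_in, y_in = 3.0, 4.0      # input radius 5
tx, ty = 1.8, 2.4          # target = same phase, radius 3
b = 0.0
for step in range(200):
    xo, yo, r = modrelu(x_in, y_in, b)     # steps 2-5: split, radius, scale
    dx, dy = xo - tx, yo - ty              # upstream deltas dL/dx+, dL/dy+
    if r > 0 and r + b > 0:                # step 6: update b via dL/db
        db = dx * x_in / r + dy * y_in / r
        b -= 0.1 * db
print(b)  # converges near -2, since shrinking radius 5 to 3 needs r + b = 3
```

The update uses exactly the $\partial L/\partial b$ formula from Section 4, specialized to a single active unit.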

6. Computational Complexity and Numerical Stability

For each $\mathbb{R}^2$ pair, the activation performs two squarings, one addition, one square root, one division, and one comparison, constituting $O(1)$ overhead compared to ReLU. Modern parallel hardware mitigates the cost of the square root and division. Numerical instabilities near $r = 0$ are addressed by replacing $r$ with $r = \sqrt{x^2 + y^2 + \varepsilon^2}$ for a small $\varepsilon$ (typically $10^{-6}$), ensuring well-defined operations and correct handling at the origin. Piecewise-constant extensions of the scaling factors $\alpha$ impose only a small constant-factor overhead compared to standard ReLU, since all operations remain elementary.
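The smoothed radius is a one-liner; this sketch (illustrative, not from the paper) shows that it stays strictly positive at the origin while perturbing large radii only at order $\varepsilon^2$:

```python
import math

EPS = 1e-6

def safe_radius(x, y, eps=EPS):
    """Smoothed radius sqrt(x^2 + y^2 + eps^2): strictly positive, so the
    division in max(0, r + b)/r and the gradient terms x/r, y/r remain
    well defined even at (0, 0)."""
    return math.sqrt(x * x + y * y + eps * eps)

print(safe_radius(0.0, 0.0))        # eps, not 0: no division by zero
print(safe_radius(3.0, 4.0) - 5.0)  # error is O(eps^2) away from the origin
```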

7. Functional Generalization and Applicability

The matrix-valued modReLU within the TMAF framework enables trainable, data-adaptive nonlinearities beyond fixed pointwise activations. While the two-region case with bias $b$ captures the original modReLU exactly, subdividing the input-norm axis with additional knots and levels enables fine-grained, piecewise-constant activations. The design provides a systematic mechanism for incorporating learnable matrix nonlinearities in neural architectures with complex-valued or real vector-valued input structure, while preserving computational simplicity and stability (Liu et al., 2021).
