Matrix-Valued modReLU in TMAF
- Matrix-Valued modReLU is a trainable extension that replaces elementwise activations with flexible matrix functions within the TMAF framework.
- It computes activations by scaling complex or real vector pairs using a block-diagonal matrix whose entries depend on input magnitudes and bias parameters.
- The approach ensures efficient gradient computation and numerical stability with piecewise-constant parameterization and small epsilon adjustments.
Matrix-Valued modReLU is a generalization of the modReLU activation function within the framework of trainable matrix activation functions (TMAF), as developed by Li, Liu, and Zikatanov. This approach extends standard fixed scalar activations by allowing matrix-valued functions whose entries are parameterized and optimized during network training. For modReLU, the matrix-valued variant encodes the activation as a block-diagonal (or more general) matrix with entries dependent on input magnitudes, enabling flexible nonlinearities and learnable parameterization.
1. Trainable Matrix Activation Functions (TMAF) Framework
In the TMAF approach, the standard elementwise scalar nonlinearity applied to neural activations is replaced. The activation becomes a matrix-vector product:

$$\sigma(x) = T(x)\,x,$$

where $T(x)$ is a matrix whose entries are themselves functions (often piecewise-constant) of the input vector $x$. The simplest construction is diagonal:

$$T(x) = \operatorname{diag}\big(t_1(x_1), \dots, t_n(x_n)\big),$$

with each $t_i$ specified by a set of "knots" (thresholds $\tau_{i,1} < \dots < \tau_{i,K}$) and "levels" ($\alpha_{i,0}, \dots, \alpha_{i,K}$), all of which are trainable. This matrix-valued activation can also assume block-diagonal or full-matrix forms, with entries depending on multivariate patterns from $x$ (Liu et al., 2021).
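As a concrete illustration, here is a minimal NumPy sketch of the diagonal, piecewise-constant construction; the function and parameter names are illustrative, not from the TMAF reference implementation:

```python
import numpy as np

def tmaf_diagonal(x, knots, levels):
    """Piecewise-constant diagonal TMAF: sigma(x) = T(x) x with
    T(x) = diag(t(x_1), ..., t(x_n)), where t is constant on each
    interval between consecutive knots. For K knots there are K + 1
    levels; knots and levels are the trainable parameters (shared
    across components in this simplified sketch)."""
    # searchsorted maps each x_i to the index of its interval
    idx = np.searchsorted(knots, x)
    return levels[idx] * x

# Example: one knot and levels (0, 1) reproduce ReLU exactly
knots = np.array([0.0])
levels = np.array([0.0, 1.0])
x = np.array([-2.0, -0.5, 0.5, 3.0])
y = tmaf_diagonal(x, knots, levels)   # zeros below the knot, identity above
```

Training then amounts to optimizing `knots` and `levels` along with the network weights.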
2. Matrix-Valued modReLU Construction
The modReLU activation, originally defined for complex inputs $z \in \mathbb{C}$, takes the form:

$$\sigma_{\mathrm{modReLU}}(z) = \begin{cases} (|z| + b)\,\dfrac{z}{|z|}, & |z| + b \ge 0, \\ 0, & \text{otherwise}, \end{cases}$$

where $b$ is a bias parameter. Within TMAF, layers are $\mathbb{R}^{2n}$-valued, with consecutive pairs $(x_{2j-1}, x_{2j})$ representing $\operatorname{Re}(z_j)$ and $\operatorname{Im}(z_j)$. Set $r_j = \sqrt{x_{2j-1}^2 + x_{2j}^2}$, with $r_j = |z_j|$. Define $s(r) = \max(0,\, 1 + b/r)$ (with $s(0) = 0$ for convention). The modReLU acts as:

$$\sigma(z_j) = s(r_j)\, z_j.$$
This can be rewritten as a matrix activation:

$$\sigma(x) = T(x)\,x, \qquad T(x) = \operatorname{blockdiag}\big(s(r_1)\,I_2,\, \dots,\, s(r_n)\,I_2\big),$$

where $I_2$ is the $2 \times 2$ identity. In effect, $T(x)$ is a block-diagonal matrix scaling each complex pair, with $s(r_j) = \max(0,\, 1 + b/r_j)$. A piecewise formulation with a single threshold $\tau = -b$ and two levels, $s = 0$ for $r_j \le -b$ and $s = 1 + b/r_j$ for $r_j > -b$, reproduces modReLU exactly.
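The equivalence between the complex and the matrix-valued forms can be checked numerically. The NumPy sketch below (helper names are illustrative) builds the block-diagonal $T(x)$ explicitly and compares it against the direct complex formula:

```python
import numpy as np

def modrelu_complex(z, b):
    # Direct complex definition: (|z| + b) * z/|z| when |z| + b > 0, else 0
    r = np.abs(z)
    return np.where(r + b > 0, (1 + b / np.maximum(r, 1e-12)) * z, 0)

def modrelu_matrix(x, b):
    # x stacks pairs (Re z_1, Im z_1, Re z_2, Im z_2, ...)
    r = np.sqrt(x[0::2]**2 + x[1::2]**2)
    s = np.maximum(0.0, 1 + b / np.maximum(r, 1e-12))
    T = np.diag(np.repeat(s, 2))      # block-diagonal: s_j * I_2 per pair
    return T @ x

z = np.array([3 + 4j, 0.1 + 0.2j])
x = np.empty(4)
x[0::2], x[1::2] = z.real, z.imag
y = modrelu_matrix(x, b=-1.0)
# Matrix form agrees with the complex definition: the first pair is
# scaled by 1 - 1/5 = 0.8, the second (below threshold) is zeroed
assert np.allclose(y[0::2] + 1j * y[1::2], modrelu_complex(z, b=-1.0))
```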
3. Forward Pass and Computational Mechanics
Consider a layer with $n$ complex-valued units, represented as real/imaginary pairs $(x_{2j-1}, x_{2j})$. The forward pass proceeds for $j = 1, \dots, n$:
- Compute $r_j = \sqrt{x_{2j-1}^2 + x_{2j}^2}$.
- Compute the scaling factor $s_j = \max(0,\, 1 + b/(r_j + \varepsilon))$, with small $\varepsilon > 0$ for numerical stability.
- Apply scaling: $y_{2j-1} = s_j x_{2j-1}$, $y_{2j} = s_j x_{2j}$.
In matrix notation, with stacked vector $x = (x_1, \dots, x_{2n})^\top$,

$$y = T(x)\,x,$$

where $T(x) = \operatorname{diag}(s_1, s_1, s_2, s_2, \dots, s_n, s_n)$ is block-diagonal with the scalar $s_j$ repeated on each $2 \times 2$ block. This construction preserves the structure of the input domain and is equivariant under complex-phase rotations.
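The phase equivariance, $\sigma(e^{i\theta} z) = e^{i\theta} \sigma(z)$, holds because the scaling $s(r_j)$ depends only on the magnitude $r_j$, which a unit-phase rotation leaves unchanged. A quick numerical check (NumPy, illustrative code):

```python
import numpy as np

def modrelu(z, b, eps=1e-12):
    # s(r) depends only on |z|, so multiplication by a unit phase
    # commutes with the activation
    r = np.abs(z)
    return np.maximum(0.0, 1 + b / (r + eps)) * z

rng = np.random.default_rng(0)
z = rng.normal(size=8) + 1j * rng.normal(size=8)
theta = 0.7
lhs = modrelu(np.exp(1j * theta) * z, b=-0.5)   # rotate, then activate
rhs = np.exp(1j * theta) * modrelu(z, b=-0.5)   # activate, then rotate
assert np.allclose(lhs, rhs)
```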
4. Gradient Computation and Backward Pass
For the modReLU block, the trainable parameter is $b$. With loss function $\mathcal{L}$, its derivative with respect to $b$ for each input pair (active when $r_j + b > 0$ and $r_j > 0$) is:

$$\frac{\partial \mathcal{L}}{\partial b} = \sum_{j:\, r_j + b > 0} \frac{1}{r_j}\left(\frac{\partial \mathcal{L}}{\partial y_{2j-1}}\, x_{2j-1} + \frac{\partial \mathcal{L}}{\partial y_{2j}}\, x_{2j}\right),$$

where $y_{2j-1} = s_j x_{2j-1}$ and $y_{2j} = s_j x_{2j}$. In the general TMAF context, gradients with respect to the piecewise-constant parameters (knots and levels) follow analogous formulas, involving batch-wise summations conditioned on the activation region.
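The analytic derivative can be validated against a central finite difference of a simple loss $\mathcal{L} = g^\top y$ with upstream gradient $g$. A hedged NumPy sketch (helper names are illustrative):

```python
import numpy as np

def modrelu_pairs(x, b, eps=1e-12):
    # Forward pass on interleaved (real, imag) pairs
    r = np.sqrt(x[0::2]**2 + x[1::2]**2)
    s = np.maximum(0.0, 1 + b / (r + eps))
    return np.repeat(s, 2) * x

def dloss_db(x, g, b, eps=1e-12):
    # Analytic dL/db: each active pair (r_j + b > 0) contributes
    # (g_{2j-1} x_{2j-1} + g_{2j} x_{2j}) / r_j, where g = dL/dy
    r = np.sqrt(x[0::2]**2 + x[1::2]**2)
    per_pair = (g[0::2] * x[0::2] + g[1::2] * x[1::2]) / (r + eps)
    return per_pair[r + b > 0].sum()

rng = np.random.default_rng(1)
x, g, b = rng.normal(size=8), rng.normal(size=8), -0.5
# Central finite difference of L = g . y with respect to b
h = 1e-6
num = (g @ modrelu_pairs(x, b + h) - g @ modrelu_pairs(x, b - h)) / (2 * h)
assert abs(num - dloss_db(x, g, b)) < 1e-4
```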
5. Integration into Deep Network Training
The modReLU matrix-valued activation integrates into common deep learning workflows:
- Compute linear pre-activations $x = Wu + c$ from the previous layer's output $u$.
- Split $x$ into real/imaginary components $(x_{2j-1}, x_{2j})$.
- Compute radii $r_j$ for each input pair.
- Calculate modReLU scaling factors $s_j = \max(0,\, 1 + b/(r_j + \varepsilon))$.
- Apply elementwise scaling to both real and imaginary parts.
- Stack output and proceed through loss and optimizer updates.
In frameworks such as PyTorch, this is realized via standard tensor computations, and differentiation with respect to $b$ is handled by autograd systems (Liu et al., 2021).
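A minimal PyTorch module along these lines might look as follows. This is a sketch under the interleaved real/imaginary layout assumed above; the class and parameter names are illustrative, not from the TMAF reference code:

```python
import torch
import torch.nn as nn

class ModReLU(nn.Module):
    """Matrix-valued modReLU on interleaved (real, imag) pairs:
    each pair is scaled by s_j = max(0, 1 + b / (r_j + eps))."""

    def __init__(self, eps=1e-8):
        super().__init__()
        self.b = nn.Parameter(torch.tensor(-0.5))  # trainable bias
        self.eps = eps

    def forward(self, x):
        # x: (..., 2n) with components (Re z_1, Im z_1, ..., Re z_n, Im z_n)
        re, im = x[..., 0::2], x[..., 1::2]
        r = torch.sqrt(re**2 + im**2)
        s = torch.clamp(1 + self.b / (r + self.eps), min=0.0)
        # Re-interleave the scaled pairs
        return torch.stack((s * re, s * im), dim=-1).flatten(start_dim=-2)

layer = ModReLU()
x = torch.randn(4, 8)      # batch of 4, n = 4 complex units
loss = layer(x).sum()
loss.backward()            # autograd populates layer.b.grad
```

Because `forward` is composed entirely of differentiable tensor operations, no manual backward pass is needed; the analytic gradient of Section 4 is what autograd computes.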
6. Computational Complexity and Numerical Stability
For each pair, the activation performs two squarings, one addition, one square root, one division, and one comparison, constituting constant per-unit overhead compared to ReLU. Modern parallel hardware mitigates the cost of the square root and division. Numerical instabilities near $r_j = 0$ are addressed by replacing $r_j$ with $r_j + \varepsilon$ for a small $\varepsilon > 0$, ensuring well-defined operations and correct handling at the origin. Piecewise-constant extensions with additional scales impose only a small constant-factor overhead compared to standard ReLU, since all operations remain elementary.
7. Functional Generalization and Applicability
The matrix-valued modReLU within the TMAF framework enables trainable, data-adaptive nonlinearities beyond fixed pointwise activations. While the two-region case with bias $b$ captures the original modReLU exactly, subdividing the input-norm axis with additional knots and levels enables fine-grained, piecewise-constant activations. The design provides a systematic mechanism for incorporating learnable matrix nonlinearities in neural architectures with complex-valued or real vector-valued input structure, while preserving computational simplicity and stability (Liu et al., 2021).