
Rectified Linear Complexity in ReLU Networks

Updated 12 January 2026
  • Rectified Linear Complexity is a metric that quantifies how the interplay of depth and width in ReLU networks governs the creation of affine (piecewise linear) regions.
  • The analysis reveals that increased depth exponentially multiplies affine segments, thereby enhancing expressivity while imposing computational challenges.
  • Theoretical results, including depth–size gap theorems and zonotope-based lower bounds, underscore the need for deep architectures to efficiently approximate complex functions.

Rectified Linear Complexity denotes the interplay among depth, width, and the number of affine (piecewise linear) regions in functions realized by deep neural networks with rectified linear units (ReLU-DNNs). It quantifies the expressivity of ReLU networks by measuring how their architecture governs the partitioning of input space into regions where the computed function is affine. The notion synthesizes structural and functional complexity of ReLU-DNNs and establishes formal lower bounds relating network architecture to function representation and training complexity (Arora et al., 2016).

1. Function Class and Complexity Measures

A ReLU-DNN with input dimension $w_0$, output dimension $w_{k+1}$, and $k$ hidden layers of widths $w_1, \dots, w_k$ implements functions:

$$f(x) = T_{k+1} \circ \sigma \circ T_k \circ \cdots \circ \sigma \circ T_1(x)$$

where $T_i : \mathbb{R}^{w_{i-1}} \rightarrow \mathbb{R}^{w_i}$ is affine for $i = 1, \dots, k$, $T_{k+1}$ is linear, and $\sigma$ applies coordinate-wise as $\sigma(t) = \max\{0, t\}$.
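
As a concrete reference, the composition above can be sketched in a few lines of NumPy (a minimal illustration; the helper name `relu_dnn` and the example weights are ours, not from the source):

```python
import numpy as np

def relu_dnn(x, weights, biases):
    """Evaluate f(x) = T_{k+1} . sigma . T_k . ... . sigma . T_1(x):
    every affine layer except the last is followed by the
    coordinate-wise ReLU sigma(t) = max{0, t}."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)   # sigma(T_i(h))
    return weights[-1] @ h + biases[-1]  # final affine/linear output layer

# One hidden layer of width 2 realizing the tent map on [0, 1]:
# f(x) = 2 max(0, x) - 4 max(0, x - 1/2).
weights = [np.array([[1.0], [1.0]]), np.array([[2.0, -4.0]])]
biases = [np.array([0.0, -0.5]), np.array([0.0])]
print(relu_dnn(np.array([0.5]), weights, biases))  # [1.]
```

The tent-map instance is used again in Section 3: composing it with itself is exactly the sawtooth construction that multiplies affine pieces.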

Key structural and functional measures:

| Term | Definition | Notation |
| --- | --- | --- |
| Depth | Total number of layers, including output | $k+1$ |
| Width | Maximum hidden layer width | $\max\{w_1, \dots, w_k\}$ |
| Size | Sum of hidden units across layers | $\sum_{i=1}^k w_i$ |
| Affine pieces | Maximal connected regions on which $f$ is affine | number of PWL regions |

Any ReLU-DNN computes a continuous piecewise linear (PWL) function. Conversely, every PWL function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ can be represented by a ReLU-DNN of depth at most $\lceil \log_2(n+1) \rceil + 1$. The number of affine regions, i.e., the number of maximal connected input regions on which $f$ is affine, serves as a fundamental complexity metric.
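
One way to make the region count tangible for a single hidden layer: each ReLU activation pattern fixes which units are active and hence one affine map, so the number of distinct patterns over a dense sample lower-bounds the number of affine regions. A sketch (the helper `count_activation_patterns` is our illustrative name, not from the source):

```python
import numpy as np

def count_activation_patterns(W, b, xs):
    """Count distinct hidden-layer ReLU activation patterns over the
    sample points xs. Each pattern fixes which units are active, hence
    one affine map; the count lower-bounds the number of affine regions."""
    patterns = {tuple(int(v > 0) for v in (W @ x + b)) for x in xs}
    return len(patterns)

# Three units max(0, x - c) with breakpoints c = 0, 1, 2 partition the
# real line into four affine regions.
W = np.array([[1.0], [1.0], [1.0]])
b = np.array([0.0, -1.0, -2.0])
xs = [np.array([t]) for t in np.linspace(-1.0, 3.0, 4001)]
print(count_activation_patterns(W, b, xs))  # 4
```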

2. Global Optimization for One Hidden Layer

Empirical risk minimization over ReLU networks with one hidden layer and a convex loss $\ell$ can be solved to global optimality:

$$\min_{A, b, a'} \frac{1}{D} \sum_{j=1}^{D} \ell\left(a' \cdot \sigma(A x_j + b),\, y_j\right)$$

A globally optimal algorithm proceeds via:

  1. Writing each hidden unit $i$ as $s_i \max\{0, \tilde{a}^i \cdot x + \tilde{b}_i\}$, with $s_i \in \{\pm 1\}$.
  2. Partitioning the data by the sign of $\tilde{a}^i \cdot x_j + \tilde{b}_i$ for each $i$.
  3. Enumerating all sign choices $(s_i)_{i=1}^w \in \{\pm 1\}^w$ and all hyperplane partitions $(P^i_+, P^i_-)$, for a total of at most $2^w D^{nw}$ combinations.
  4. For each combination, solving the induced convex program in $(\tilde{a}^i, \tilde{b}_i)$.

Total runtime:

$$O\left( 2^w D^{nw} \, \mathrm{poly}(D, n, w) \right)$$

This is polynomial in the sample size $D$ for fixed $n, w$, but exponential in $n$ and $w$, matching known computational hardness bounds.
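
The enumeration strategy can be illustrated on a deliberately tiny instance ($n = 1$, $w = 1$, squared loss): every way a single breakpoint can split the sorted data yields a least-squares subproblem, and the global optimum is the best over all of them. This is a simplified sketch (our helper name `fit_one_relu_1d`; step 4 is relaxed to an unconstrained line fit on the active cell):

```python
import numpy as np

def fit_one_relu_1d(x, y):
    """Globally fit y ~ max(0, a*x + b) in 1-D by enumerating the O(D)
    ways a breakpoint can split the sorted data (plus orientation) and
    solving least squares on the active side of each split."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    D = len(xs)
    best_loss = np.inf
    for orient in (+1, -1):                  # unit active right / left of the split
        for split in range(D + 1):
            active = np.arange(split, D) if orient == +1 else np.arange(split)
            pred = np.zeros(D)               # inactive points output 0
            if len(active) > 0:
                A = np.column_stack([xs[active], np.ones(len(active))])
                coef, *_ = np.linalg.lstsq(A, ys[active], rcond=None)
                pred[active] = A @ coef
            best_loss = min(best_loss, float(np.mean((pred - ys) ** 2)))
    return best_loss

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.maximum(0.0, x)                       # realizable target, so loss ~ 0
print(fit_one_relu_1d(x, y))
```

The nested loop over orientation and split position is the 1-D shadow of enumerating $(s_i)$ and the hyperplane partitions $(P^i_+, P^i_-)$; the combinatorial count explodes as $2^w D^{nw}$ once $n$ and $w$ grow.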

3. Depth–Size Gap Theorems

The expressivity of ReLU-DNNs grows far more rapidly with depth than with width or overall size. For integers $k \geq 1$, $w \geq 2$, there exists a one-dimensional function $f$ such that:

  • A $(k+1)$-layer ReLU net of width $w$ represents $f$.
  • Any representation by a shallower $(k'+1)$-layer net with $k' < k$ incurs a lower bound on the required size:

$$\text{size} \geq \frac{1}{2} k' w^{k/k' - 1}$$

Furthermore, for every $k \in \mathbb{N}$, there exists a function $f$ in the smoothly parameterized family $\bigcup_{M>0}\Delta_M^{k^3-1}$ such that:

  • $f$ is realized by a depth-$(k^2 + 1)$ net of size $k^3$.
  • Any depth-$(k + 1)$ ReLU net computing $f$ requires size at least:

$$\frac{1}{2} k^{k+1} - 1$$

The construction uses sawtooth-composed functions—composition amplifies the number of affine segments exponentially in depth.
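
The exponential amplification is easy to verify numerically: composing the width-2 tent map with itself $k$ times yields $2^k$ affine pieces. A sketch (we count slope changes on a dyadic grid, where the arithmetic happens to be exact in binary floating point; the helper names are ours):

```python
import numpy as np

def tent(x):
    """Width-2 sawtooth (tent) map: 2 affine pieces on [0, 1]."""
    return np.where(x <= 0.5, 2.0 * x, 2.0 - 2.0 * x)

def pieces_after_k_compositions(k, samples_per_piece=64):
    """Count affine pieces of the k-fold composition tent(tent(...)) by
    counting slope changes on a dyadic grid (exact in binary floats)."""
    n = (2 ** k) * samples_per_piece
    x = np.linspace(0.0, 1.0, n + 1)
    y = x.copy()
    for _ in range(k):
        y = tent(y)
    slopes = np.diff(y) * n          # constant on each affine piece
    return 1 + int(np.sum(slopes[1:] != slopes[:-1]))

print([pieces_after_k_compositions(k) for k in (1, 2, 3, 4)])  # [2, 4, 8, 16]
```

Each composition doubles the piece count while adding only a constant number of units, which is exactly the depth-over-size leverage the gap theorems formalize.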

4. Lower Bounds via Zonotope Constructions

A new lower bound on the affine region count of ReLU-DNNs is established via the theory of zonotopes. For vectors $v_1, \dots, v_m \in \mathbb{R}^n$, the zonotope $Z(v_1, \dots, v_m)$ is:

$$Z(v_1,\dots,v_m) = \left\{ \sum_{i=1}^m \lambda_i v_i : -1 \leq \lambda_i \leq 1 \right\}$$

and its support function:

$$\gamma_Z(x) = \max_{z \in Z(v_1,\dots,v_m)} \langle x, z \rangle = \sum_{i=1}^m |\langle x, v_i \rangle|$$

For $v_i$ in general position, $\gamma_Z$ has

$$|\mathrm{verts}(Z(v_1,\dots,v_m))| = \sum_{i=0}^{n-1} \binom{m-1}{i}$$

distinct affine pieces, and $\gamma_Z$ can be implemented by a two-layer ReLU net of size $2m$.
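
The size-$2m$ construction follows from the identity $|t| = \max\{0, t\} + \max\{0, -t\}$: pair each generator $v_i$ with its negation to obtain $2m$ hidden ReLUs whose unit-weight sum equals $\gamma_Z$. A quick numerical check (illustrative helper names, not from the source):

```python
import numpy as np

def support_direct(x, V):
    """gamma_Z(x) = sum_i |<x, v_i>| for the zonotope generated by the rows of V."""
    return np.sum(np.abs(V @ x))

def support_relu(x, V):
    """The same function as a one-hidden-layer ReLU net of size 2m:
    |t| = max(0, t) + max(0, -t), so stack (V; -V) and sum the ReLUs."""
    W = np.vstack([V, -V])                 # 2m hidden units
    return np.sum(np.maximum(0.0, W @ x))  # all output weights equal to 1

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))            # m = 5 generators in R^3
x = rng.standard_normal(3)
print(np.isclose(support_direct(x, V), support_relu(x, V)))  # True
```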

Composition with a $k$-fold sawtooth map $H_{t_1,\dots,t_k}$ yields a ReLU net of depth $k+2$, size $2m + wk$, and segment count:

$$\left( \sum_{i=0}^{n-1} \binom{m-1}{i} \right) w^k$$

Asymptotically, choosing $m \gg n$, the piece count is

$$\Omega(m^{n-1} w^k)$$

For nets of depth at most $k'+1$, matching this piece count requires size at least $\Omega\left((m-1)^{(n-1)/k'} w^{k/k'}\right)$.

5. Synthesis and Implications of Rectified Linear Complexity

The composition of depth and width exponentially increases the count of affine regions:

  • Depth acts as the exponential composition resource; each layer can multiply the region count.
  • Width (or size) determines the parallel granularity per layer.
  • The total number of affine pieces grows as $\text{width}^{\text{depth}}$ in 1D, or as $\left(\sum_{i=0}^{n-1} \binom{m-1}{i}\right) \text{width}^{\text{depth}}$ in $\mathbb{R}^n$.
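
Plugging small numbers into the zonotope-construction bound makes the asymmetry concrete (a short arithmetic check; `region_bound` is our helper evaluating the formula from Section 4):

```python
from math import comb

def region_bound(n, m, w, k):
    """Lower bound (sum_{i=0}^{n-1} C(m-1, i)) * w^k on affine regions for
    the zonotope-plus-sawtooth construction (depth k+2, size 2m + wk)."""
    return sum(comb(m - 1, i) for i in range(n)) * w ** k

# Doubling depth squares the w^k factor; widening only changes the base.
print(region_bound(n=3, m=10, w=2, k=4))   # 46 * 2**4 = 736
print(region_bound(n=3, m=10, w=2, k=8))   # 46 * 2**8 = 11776
```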

The triplet $(\text{depth}, \text{width}, \text{number of affine regions})$ forms a natural complexity measure—Rectified Linear Complexity—which encapsulates the expressive power of ReLU networks. Deeper networks attain exponential region growth with moderate width, while shallow networks require super-polynomial size to represent the same functions.

A plausible implication is that for function classes requiring exponentially many affine pieces, depth is indispensable for architectural efficiency. Furthermore, computational hardness in training aligns with the representational barriers: even for one hidden layer, the exponential increase in complexity with input dimensionality indicates that training algorithms are fundamentally limited by both representational and computational regimes (Arora et al., 2016).
