Rectified Linear Complexity in ReLU Networks
- Rectified Linear Complexity is a metric that quantifies how the interplay of depth and width in ReLU networks governs the creation of affine (piecewise linear) regions.
- The analysis reveals that increased depth exponentially multiplies affine segments, thereby enhancing expressivity while imposing computational challenges.
- Theoretical results, including depth–size gap theorems and zonotope-based lower bounds, underscore the need for deep architectures to efficiently approximate complex functions.
Rectified Linear Complexity denotes the interplay among depth, width, and the number of affine (piecewise linear) regions in functions realized by deep neural networks with rectified linear units (ReLU-DNNs). It quantifies the expressivity of ReLU networks by measuring how their architecture governs the partitioning of input space into regions where the computed function is affine. The notion synthesizes structural and functional complexity of ReLU-DNNs and establishes formal lower bounds relating network architecture to function representation and training complexity (Arora et al., 2016).
1. Function Class and Complexity Measures
A ReLU-DNN with input dimension $n$, output dimension $m$, and $k$ hidden layers of widths $w_1, \dots, w_k$ implements functions:
$$
f = A_{k+1} \circ \sigma \circ A_k \circ \cdots \circ \sigma \circ A_1,
$$
where $A_i : \mathbb{R}^{w_{i-1}} \to \mathbb{R}^{w_i}$ (with $w_0 = n$, $w_{k+1} = m$) is affine for $i = 1, \dots, k$, $A_{k+1}$ is linear, and $\sigma$ applies coordinate-wise as $\sigma(t) = \max(0, t)$.
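As a concrete sketch, the composition above is a few lines of NumPy; the weight shapes and values below are arbitrary illustrations, not from the source:

```python
import numpy as np

def relu_dnn(x, weights, biases):
    """Evaluate f = A_{k+1} ∘ σ ∘ A_k ∘ ... ∘ σ ∘ A_1 at x.

    weights/biases define the affine maps A_i; σ = max(0, ·) is applied
    coordinate-wise after every layer except the last.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)    # hidden layer: affine map, then ReLU
    W, b = weights[-1], biases[-1]
    return W @ h + b                      # output layer: affine only

# A depth-3 net (k = 2 hidden layers) from R^2 to R^1 with widths (3, 3).
rng = np.random.default_rng(0)
weights = [rng.standard_normal(s) for s in [(3, 2), (3, 3), (1, 3)]]
biases = [rng.standard_normal(s) for s in [3, 3, 1]]
y = relu_dnn(np.array([0.5, -1.0]), weights, biases)
```

The function computed is continuous and piecewise linear in the input, as the next paragraph makes precise.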
Key structural and functional measures:
| Term | Definition | Notation |
|---|---|---|
| Depth | Total number of layers, including output | $k+1$ |
| Width | Maximum hidden-layer width | $w = \max_{1 \le i \le k} w_i$ |
| Size | Total number of hidden units across layers | $s = w_1 + \cdots + w_k$ |
| Affine Pieces | Maximal connected regions on which $f$ is affine | $p(f)$ (number of PWL regions) |
Any ReLU-DNN computes a continuous piecewise linear (PWL) function. Conversely, every PWL function $\mathbb{R}^n \to \mathbb{R}$ can be represented by a ReLU-DNN of depth at most $\lceil \log_2(n+1) \rceil + 1$. The number of affine regions $p(f)$, i.e., the cardinality of maximal connected input regions mapped affinely, serves as a fundamental complexity metric.
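For 1D networks, $p(f)$ can be estimated empirically by sampling on a fine grid and counting slope changes; a minimal sketch (the helper `count_pieces` is our own device, not from the source):

```python
import numpy as np

def count_pieces(f, lo=-2.0, hi=2.0, n=4001, tol=1e-3):
    """Estimate the number of affine pieces of a PWL function R -> R by
    locating slope changes on a fine grid.  A breakpoint falling strictly
    inside a grid cell flags two consecutive cells, so runs of flagged
    positions are counted rather than individual flags."""
    xs = np.linspace(lo, hi, n)
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)
    flags = np.abs(np.diff(slopes)) > tol
    starts = flags & ~np.r_[False, flags[:-1]]   # start of each flagged run
    return int(np.sum(starts)) + 1

# |x| as a depth-2 ReLU net of size 2: |x| = max(0, x) + max(0, -x).
abs_net = lambda x: np.maximum(0, x) + np.maximum(0, -x)
print(count_pieces(abs_net))   # 2 affine pieces
```

Here $|x|$, a PWL function with two pieces, is written exactly as a shallow ReLU net, illustrating the representation direction of the equivalence.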
2. Global Optimization for One Hidden Layer
Empirical risk minimization over ReLU networks with one hidden layer of width $w$ and convex loss $\ell$ can be globally optimized as:
$$
\min_{\{a_i, b_i, c_i\}} \; \frac{1}{D} \sum_{j=1}^{D} \ell\!\left( \sum_{i=1}^{w} c_i \max(0,\, a_i \cdot x_j + b_i),\; y_j \right)
$$
over a sample $\{(x_j, y_j)\}_{j=1}^{D} \subset \mathbb{R}^d \times \mathbb{R}$.
A globally optimal algorithm proceeds via:
- Writing each hidden unit's contribution as $c_i \max(0, a_i \cdot x + b_i)$, making the output sign $s_i = \operatorname{sign}(c_i) \in \{-1, +1\}$ explicit.
- Partitioning the data by the sign of $a_i \cdot x_j + b_i$ for all $j = 1, \dots, D$.
- Enumerating all $2^w$ sign choices and all hyperplane partitions of the $D$ data points in $\mathbb{R}^d$, with possible count $O(2^w D^{dw})$.
- For each choice, solving the induced convex program in the parameters $(a_i, b_i, c_i)$.
Total runtime:
$$
O\!\left( 2^w D^{dw} \,\mathrm{poly}(D, d, w) \right).
$$
This is polynomial in the sample size $D$ for fixed $d$ and $w$, but exponential in $d$ and $w$, matching known computational hardness bounds.
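The enumeration scheme can be made concrete in the smallest case $d = 1$, $w = 1$ with squared loss, where hyperplane partitions of sorted 1D data are just prefixes and suffixes. A sketch under those assumptions (the function name is ours, and feasibility of each unconstrained least-squares solution is checked with a tolerance rather than solving the constrained program):

```python
import numpy as np
from itertools import product

def fit_relu_unit(xs, ys, tol=1e-9):
    """Globally minimize sum_j (c*max(0, a*x_j + b) - y_j)^2 for a single
    ReLU unit (w = 1) in one dimension (d = 1).  Enumerates the 2^w output
    signs and the O(D) hyperplane partitions (prefixes/suffixes of the
    sorted data), then solves least squares on each active side --
    2 * O(D) patterns in total, matching the O(2^w D^{dw}) count."""
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]
    n = len(xs)
    best = (np.sum(ys ** 2), 0.0, 0.0)         # all-inactive pattern: f == 0
    actives = [np.arange(i, n) for i in range(n)] + \
              [np.arange(0, i + 1) for i in range(n)]
    for S, s in product(actives, (+1.0, -1.0)):
        A = np.stack([xs[S], np.ones(len(S))], axis=1)
        (u, v), *_ = np.linalg.lstsq(A, ys[S], rcond=None)
        mask = np.zeros(n, dtype=bool)
        mask[S] = True
        g = u * xs + v                         # g(x) = c * (a*x + b)
        # Consistency: the sign of a*x_j + b must match the partition.
        if np.all(s * g[mask] >= -tol) and np.all(s * g[~mask] <= tol):
            loss = np.sum((g[mask] - ys[mask]) ** 2) + np.sum(ys[~mask] ** 2)
            if loss < best[0]:
                best = (loss, u, v)
    return best

xs = np.linspace(-1.0, 1.0, 21)
ys = np.maximum(0.0, xs - 0.45)                # realizable target
loss, u, v = fit_relu_unit(xs, ys)
```

For this realizable target the search recovers the generating unit ($u \approx 1$, $v \approx -0.45$) with zero loss; the same enumerate-then-solve idea yields the $O(2^w D^{dw})$ runtime for general $d$ and $w$.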
3. Depth–Size Gap Theorems
The expressivity of ReLU-DNNs grows rapidly with increased depth compared to width or overall size. For integers $k \ge 1$, $w \ge 2$, there exists a 1D function $f : \mathbb{R} \to \mathbb{R}$ such that:
- A $(k+1)$-layer ReLU net of width $w$ (size $wk$) represents $f$, with $w^k$ affine pieces.
- Any representation of $f$ by a shallower $(k'+1)$-layer net with $k' \le k$ incurs a lower bound on required size:
$$
s \;\ge\; \frac{1}{2}\, k' \, w^{k/k'} - 1.
$$
Furthermore, for every $k \ge 1$, there exists a member of a smoothly-parameterized family of hard functions $H_a$ (parameterized by $a = (a_1, \dots, a_k)$ with each $a_i \in (0,1)$, where $H_a = h_{a_k} \circ \cdots \circ h_{a_1}$ composes 2-piece tent maps $h_{a_i}$):
- $H_a$ is realized by a depth-$(k+1)$ net of size $2k$.
- Any depth-$(k'+1)$ ReLU net computing $H_a$ with $k' \le k$ requires size at least:
$$
\frac{1}{2}\, k' \, 2^{k/k'} - 1.
$$
The construction uses composed sawtooth functions: each composition multiplies the number of affine segments, so the count grows exponentially in depth.
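This amplification is easy to verify numerically: composing a 2-piece tent map (two ReLUs per layer) $k$ times produces $2^k$ affine pieces. A sketch, with a grid-based piece counter of our own:

```python
import numpy as np

def tent(x):
    """2-piece tent map on [0, 1], built from two ReLUs:
    tent(x) = 2*max(0, x) - 4*max(0, x - 0.5)."""
    return 2 * np.maximum(0, x) - 4 * np.maximum(0, x - 0.5)

def count_pieces(xs, ys, tol=1e-3):
    """Count affine pieces as 1 + the number of runs of slope changes."""
    slopes = np.diff(ys) / np.diff(xs)
    flags = np.abs(np.diff(slopes)) > tol
    starts = flags & ~np.r_[False, flags[:-1]]
    return int(np.sum(starts)) + 1

k = 6
xs = np.linspace(0.0, 1.0, 2 ** 16 + 1)   # dyadic grid: breakpoints land on it
ys = xs
for _ in range(k):                         # depth-(k+1) net of size 2k
    ys = tent(ys)
print(count_pieces(xs, ys))                # 2**6 = 64 pieces
```

Each composition doubles the piece count, while the network only grows by two units per layer.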
4. Lower Bounds via Zonotope Constructions
A new lower bound for affine region count in ReLU-DNNs is established via the theory of zonotopes. For vectors $b_1, \dots, b_m \in \mathbb{R}^n$, the zonotope is:
$$
Z(b_1, \dots, b_m) = \left\{ \sum_{i=1}^{m} \lambda_i b_i \;:\; \lambda_i \in [-1, 1] \right\},
$$
and its support function:
$$
\gamma_Z(x) = \max_{y \in Z} \langle x, y \rangle = \sum_{i=1}^{m} \left| \langle b_i, x \rangle \right|.
$$
For $b_1, \dots, b_m$ in general position, $\gamma_Z$ has
$$
2 \sum_{i=0}^{n-1} \binom{m-1}{i}
$$
distinct affine pieces, one per vertex of $Z$. $\gamma_Z$ can be implemented by a two-layer ReLU net of size $2m$, since $|\langle b_i, x\rangle| = \max(0, \langle b_i, x\rangle) + \max(0, -\langle b_i, x\rangle)$.
Composition with a $k$-fold sawtooth map of width $w$ yields a ReLU net of depth $k+2$, size $2m + wk$, and number of affine segments at least:
$$
\left( 2 \sum_{i=0}^{n-1} \binom{m-1}{i} \right) w^{k}.
$$
Asymptotically, for fixed input dimension $n$, $\sum_{i=0}^{n-1} \binom{m-1}{i} = \Theta(m^{n-1})$, so the piece count grows as $\Theta\!\left(m^{n-1} w^{k}\right)$. For depth $k'+1$ with $k' \le k$, matching this piece count requires size at least on the order of $k' \left( m^{n-1} w^{k} \right)^{1/k'}$.
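The vertex-count formula can be checked directly in $\mathbb{R}^2$, where the affine pieces of $\gamma_Z$ are cones distinguished by the sign pattern of $\langle b_i, x\rangle$. A sketch (the specific generator angles are our own illustrative choice):

```python
import numpy as np
from itertools import product
from math import comb

# m = 5 generators in general position in R^2 (fixed angles for the sketch).
angles = np.array([0.1, 0.5, 0.9, 1.7, 2.3])
B = np.stack([np.cos(angles), np.sin(angles)], axis=1)
m, n = B.shape

def gamma(x):
    """Support function of Z(b_1,...,b_m): sum_i |<b_i, x>|, realizable as
    a two-layer ReLU net of size 2m via |t| = max(0, t) + max(0, -t)."""
    z = B @ x
    return np.sum(np.maximum(0, z) + np.maximum(0, -z))

# Support-function identity: gamma(x) equals the max of <x, v> over the
# 2^m points v = sum_i lambda_i b_i with lambda_i in {-1, +1}.
x = np.array([0.7, -0.2])
best_vertex = max(np.array(lam) @ B @ x for lam in product((-1, 1), repeat=m))

# gamma is linear on each cone of constant sign pattern of <b_i, x>;
# count distinct patterns over sampled directions on the circle.
thetas = np.linspace(0, 2 * np.pi, 100000, endpoint=False)
dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
patterns = {tuple(row) for row in np.sign(dirs @ B.T)}
predicted = 2 * sum(comb(m - 1, i) for i in range(n))   # = 2m = 10 for n = 2
```

For $n = 2$ the formula reduces to $2m$ pieces, matching the number of sign-pattern cones found by sampling.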
5. Synthesis and Implications of Rectified Linear Complexity
The composition of depth and width exponentially increases the count of affine regions:
- Depth acts as the exponential composition resource; each layer can multiply the region count.
- Width (or size) determines the parallel granularity per layer.
- The total number of affine pieces grows as $w^k$ in 1D, or as $m^{n-1} w^k$ in $\mathbb{R}^n$ via the zonotope construction.
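Plugging numbers into these bounds illustrates the gap; the arithmetic below uses the depth–size lower bound of the form $\tfrac{1}{2} k' w^{k/k'} - 1$ from Section 3:

```python
# Deep net: depth k+1, width w -> size w*k hidden units, w**k affine pieces.
w, k = 2, 10
deep_size = w * k
pieces = w ** k

# A one-hidden-layer net (k' = 1) matching the same piece count needs size
# at least (k'/2) * w**(k/k') - 1 by the Section 3 bound.
k_prime = 1
shallow_lower_bound = (k_prime / 2) * w ** (k / k_prime) - 1

print(deep_size, pieces, shallow_lower_bound)   # 20 1024 511.0
```

Twenty units at depth 11 produce 1024 pieces, while a depth-2 net needs at least 511 units to match them: the exponential depth advantage in miniature.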
The triplet (depth, size, number of affine pieces) forms a natural complexity measure—Rectified Linear Complexity—which encapsulates the expressive power of ReLU networks. Deeper networks attain exponential region growth with moderate width, while shallow networks require super-polynomial size to represent the same functions.
A plausible implication is that for function classes requiring exponentially many affine pieces, depth is indispensable for architectural efficiency. Furthermore, computational hardness in training aligns with the representational barriers: even for one hidden layer, the exponential increase in complexity with input dimensionality indicates that training algorithms are fundamentally limited by both representational and computational regimes (Arora et al., 2016).