Primitive-Specific MLP Architecture
- A primitive-specific MLP is a feed-forward neural classifier that decomposes complex tasks into simple geometric primitives, enabling modular and interpretable design.
- It employs MLP-algebra operations such as union, intersection, and complement to systematically synthesize complex decision boundaries from basic subnetworks.
- The approach demonstrates practical applications like XOR classification, offering mathematical guarantees and controlled network capacity through hierarchical construction.
A primitive-specific multilayer perceptron (MLP) is a feed-forward neural architecture systematically constructed by decomposing complex classification tasks into geometric primitives, training dedicated MLPs for these primitive regions, then algebraically combining these subnetworks to yield a tailored, interpretable classifier. The resulting architecture and its construction principles are grounded in the formalism of MLP-algebra, which provides a suite of closed, compositional operations for synthesizing complex decision boundaries from a finite library of primitive MLPs (Peng, 2017).
1. The Formal Framework: Universe of MLPs and Primitives
The universe 𝕄 comprises all feed-forward MLPs with finitely many layers, arbitrary finite widths, and (unless otherwise specified) sigmoid activations. Each network is specified by its layer dimensions, its collection of weight matrices W, and its threshold vectors b. The forward map is realized by iterating x ↦ σ(Wx − b) layer by layer.
A network is designated as "primitive" if it is trainable to act as the characteristic function of a simple geometric set. Canonical examples include:
- Half-space {x : w·x ≥ b}: a single-layer perceptron σ(w·x − b).
- Ball/disk {x : ‖x − c‖ ≤ r}: a two-layer network.
- Axis-aligned box (a product of intervals): a hidden layer of size 2n in ℝⁿ, one pair of half-space neurons per coordinate.
- Cartesian products and unions of lower-dimensional primitives.
For a concrete task, a finite library 𝒫 of such primitives is selected to serve as the building blocks for higher-complexity MLP construction.
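As an illustration, the half-space and ball primitives can be written directly as sharp-sigmoid characteristic functions. This is a minimal sketch, not the source's implementation: the sharpness value β = 20 is an assumption, and the ball evaluates the exact quadratic for clarity where the source would train a two-layer net.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

BETA = 20.0  # assumed sharpness; the source leaves this tunable

# Half-space primitive {x : w.x >= b} as a single sharp-sigmoid perceptron.
def half_space(w, b, beta=BETA):
    return lambda x: sigmoid(beta * (sum(wi * xi for wi, xi in zip(w, x)) - b))

# Ball primitive {x : ||x - c|| <= r}. The source fits this with a two-layer
# net; here the exact quadratic is evaluated for clarity.
def ball(c, r, beta=BETA):
    return lambda x: sigmoid(beta * (r ** 2 - sum((xi - ci) ** 2 for xi, ci in zip(x, c))))

h = half_space([1.0, 0.0], 0.0)                  # characteristic function of x1 >= 0
inside, outside = h([2.0, 1.0]), h([-2.0, 1.0])  # ~1.0 and ~0.0
```

Points deep inside a primitive's region map near 1, points well outside map near 0, which is what lets the algebraic combiners below treat these outputs as approximate Booleans.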
2. Algebraic Operations for MLP Composition
The MLP-algebra defines a set of operators that act on compatible networks (i.e., with matching input spaces and structure) to produce new networks that remain in 𝕄. These include:
- Complement ¬N: For a 1D-output MLP N, the complement ¬N is constructed by negating the final-layer weights and thresholds, thereby inverting the decision boundary via the identity σ(−z) = 1 − σ(z).
- Sum (Union) N₁ ⊕ N₂: Combines two L-layer, scalar-output MLPs so that the resulting network represents the union of the individual decision regions. The final layer applies a sharp sigmoid with weights β and threshold β/2, enforcing a logical OR.
- Multi-Sum ⊕ᵢ Nᵢ: Generalizes the union operator to k networks, stacking hidden layers accordingly.
- Difference N₁ ∖ N₂: Set difference, realized algebraically as N₁ ⊗ ¬N₂.
- ⊗-Product (Intersection) N₁ ⊗ N₂: Forms the intersection of the positive regions of two compatible MLPs; constructed using block-diagonal combining at each hidden layer, with weights β and threshold 3β/2 in the final layer.
- Multi-⊗-Product ⊗ᵢ Nᵢ: Multi-way intersection.
- Component Extraction: Extracts the i-th output neuron from a multi-label MLP, yielding a binary classifier.
- ×-Product N₁ × N₂: Concatenates two binary classifiers into a 2-label MLP by direct-sum of hidden layers and stacking of the outputs.
- Identical Extension: Appends an identity-mapping layer (linear or ReLU) to align depth for algebraic summing or multiplication with deeper networks.
These operations are closed in 𝕄, enabling hierarchical design without leaving the universe of valid MLPs (Peng, 2017).
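A minimal sketch of the complement, sum, ⊗-product, and difference at the output level, assuming a sharp combining sigmoid with weights β and thresholds β/2 (OR) and 3β/2 (AND). Operand networks are treated here as black-box scalar functions rather than combined weight-by-weight as in the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

BETA = 20.0  # assumed sharpness of the combining layer

# Operand networks are treated as scalar functions x -> (0, 1).

def complement(net):
    # Negating the final-layer weights and threshold realizes
    # sigma(-z) = 1 - sigma(z), i.e. output 1 - net(x).
    return lambda x: 1.0 - net(x)

def union(n1, n2):
    # Sharp OR gate: weights BETA, threshold BETA / 2.
    return lambda x: sigmoid(BETA * (n1(x) + n2(x)) - BETA / 2)

def intersection(n1, n2):
    # Sharp AND gate: weights BETA, threshold 3 * BETA / 2.
    return lambda x: sigmoid(BETA * (n1(x) + n2(x)) - 3 * BETA / 2)

def difference(n1, n2):
    # Set difference as intersection with the complement.
    return intersection(n1, complement(n2))
```

On approximately Boolean operand outputs, the OR gate fires when either operand exceeds roughly one half, and the AND gate only when both do, which is exactly the gating behavior the threshold choices β/2 and 3β/2 encode.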
3. Key Algebraic Properties
Networks with 1D output under the fundamental operators obey a commutative algebra:
| Property | Formal Statement | Intuitive Explanation |
|---|---|---|
| Commutativity | N₁ ⊕ N₂ = N₂ ⊕ N₁; N₁ ⊗ N₂ = N₂ ⊗ N₁ | Order of union/intersection does not affect the result |
| Associativity | (N₁ ⊕ N₂) ⊕ N₃ = N₁ ⊕ (N₂ ⊕ N₃), etc. | Grouping does not affect the output decision boundary |
| Distributivity | N₁ ⊗ (N₂ ⊕ N₃) ≈ (N₁ ⊗ N₂) ⊕ (N₁ ⊗ N₃) (up to scaling/thresholds) | Logical AND distributes over OR at the decision level |
| Involution | ¬(¬N) = N | Double complement restores the original decision boundary |
Closure is immediate from the recursive definitions of weights, thresholds, and dimensions. Associativity and commutativity arise from block-diagonal and concatenation symmetries. Distributivity is approximate, depending on the sharpness parameter β; exactness can be enforced by further SGD fine-tuning. Involution of complement is exact and purely algebraic (Peng, 2017).
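These properties can be checked numerically on the sharp-sigmoid gates. In this small sketch (β = 20 is an assumed value), commutativity and involution hold exactly, while distributivity holds only up to the softness of the sigmoid:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

beta = 20.0  # assumed sharpness

# Gates acting on (approximately) Boolean activations y1, y2 in {0, 1}.
OR  = lambda y1, y2: sigmoid(beta * (y1 + y2) - beta / 2)
AND = lambda y1, y2: sigmoid(beta * (y1 + y2) - 3 * beta / 2)
NOT = lambda y: 1.0 - y

# Distributivity check A and (B or C) vs (A and B) or (A and C):
# equal only approximately, which is why fine-tuning can be needed.
lhs = AND(1.0, OR(1.0, 0.0))
rhs = OR(AND(1.0, 1.0), AND(1.0, 0.0))
```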
4. Construction Methodology and Pseudocode
Building a primitive-specific MLP proceeds as follows:
- For each label ℓ in {1, …, k}, partition the label-specific data Dℓ into regions approximately corresponding to primitives in 𝒫 (e.g., via clustering or geometric heuristics).
- Train an MLP for each primitive region, selecting the appropriate template (half-space, ball, box) and fine-tuning it as a characteristic function.
- Combine the label-specific primitive nets by multi-sum (union): Nℓ = Nℓ,1 ⊕ ⋯ ⊕ Nℓ,m.
- Merge all k label-specific nets by the ×-product to yield a k-output classifier.
- Optionally, fine-tune the assembled MLP on the full labeled set for several epochs to enhance logical gating.
Regions embedded in projected subspaces can be re-aligned using the ×-product operation (Peng, 2017).
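The source's pseudocode for this pipeline does not survive in this rendering. The following hypothetical Python sketch of the steps above treats trained primitive nets as black-box scalar functions and uses simple output stacking in place of the ×-product merge:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

BETA = 20.0  # assumed sharpness

def multi_sum(nets):
    # Multi-sum (union) of scalar-output nets: sharp OR over their outputs.
    return lambda x: sigmoid(BETA * sum(n(x) for n in nets) - BETA / 2)

def build_classifier(primitives_per_label):
    # primitives_per_label: one list of trained primitive nets per label.
    # Output stacking here stands in for the x-product merge step.
    label_nets = [multi_sum(nets) for nets in primitives_per_label]
    return lambda x: [n(x) for n in label_nets]

# Usage with hand-made half-space "primitives" for two labels on the line:
right = lambda x: sigmoid(BETA * x[0])   # x1 > 0
left  = lambda x: sigmoid(-BETA * x[0])  # x1 < 0
clf = build_classifier([[right], [left]])
scores = clf([2.0])  # label 0 fires, label 1 does not
```

In a faithful implementation each primitive would be trained on its region's data first; the combining layers here are fixed rather than learned.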
5. Practical Strategies for Primitive Selection and Capacity
Choosing optimal primitives:
- For clusters near-linearly separable, use half-space MLPs.
- For "blob-like" clusters, employ two-layer ball/disk templates.
- For box-shaped clusters, use axis-aligned box MLPs (hidden layer of size 2n in ℝⁿ).
- If primitive fitting is nontrivial, approximate with unions of basic shapes.
Architectural and learning considerations:
- Depth increments by one whenever a sum or ⊗-product is applied, corresponding to the added combining layer.
- Width increases via block-diagonal concatenation as networks are joined.
- The sharpness parameter β modulates how Boolean-like the logical gates become; higher β yields sharper transitions. A moderate initial value is suggested, tuned as needed.
- After construction, re-fine-tuning using SGD on the recombined data often sharpens logical implementation and mitigates softness in composite gates.
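A quick sketch of how sharpness affects gate quality: with a small β the AND gate leaks on a "half-true" input (one operand 1, the other 0), while a larger β is near-Boolean. The β values here are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# AND gate with weights beta and threshold 3*beta/2, evaluated on a
# "half-true" input (one operand 1, the other 0), which should map to ~0.
def and_gate(y1, y2, beta):
    return sigmoid(beta * (y1 + y2) - 3 * beta / 2)

soft  = and_gate(1.0, 0.0, beta=2.0)   # leaky: ~0.27
sharp = and_gate(1.0, 0.0, beta=20.0)  # near-Boolean: ~5e-5
```

The leaky value is what fine-tuning on the recombined data then sharpens away.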
6. Illustrative Example: Constructing XOR in ℝ²
For the classic XOR problem on ℝ², the positive region is the union of two opposite quadrants:
- Q₁ : x₁ > 0 and x₂ < 0
- Q₂ : x₁ < 0 and x₂ > 0
Primitive networks:
- H₁: half-space x₁ > 0 (one-layer)
- H₂: half-space x₂ > 0 (one-layer)
Compositional construction:
- Q₁ = H₁ ⊗ ¬H₂ via the ⊗-product (intersection)
- Q₂ = ¬H₁ ⊗ H₂ via the ⊗-product
- Union: N_XOR = Q₁ ⊕ Q₂
The final binary XOR network is N_XOR = (H₁ ⊗ ¬H₂) ⊕ (¬H₁ ⊗ H₂). The explicit architecture stacks the hidden representations, forms the logical AND via a sharp sigmoid (weights β, threshold 3β/2), and sums for the logical OR. SGD fine-tuning for a few epochs enables the MLP to achieve near-perfect accuracy on XOR inputs. This construction demonstrates the systematic decomposition and algebraic recombination at the heart of the MLP-algebra approach (Peng, 2017).
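The XOR composition can be sketched end-to-end with hand-set weights. This assumes a sharpness of β = 20; the source instead trains the primitives on data and fine-tunes the assembly with SGD.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

BETA = 20.0  # assumed sharpness

# Primitive half-space nets H1: x1 > 0 and H2: x2 > 0.
H1 = lambda x: sigmoid(BETA * x[0])
H2 = lambda x: sigmoid(BETA * x[1])

NOT = lambda n: (lambda x: 1.0 - n(x))                                       # complement
AND = lambda a, b: (lambda x: sigmoid(BETA * (a(x) + b(x)) - 3 * BETA / 2))  # ⊗-product
OR  = lambda a, b: (lambda x: sigmoid(BETA * (a(x) + b(x)) - BETA / 2))      # sum

Q1 = AND(H1, NOT(H2))  # x1 > 0 and x2 < 0
Q2 = AND(NOT(H1), H2)  # x1 < 0 and x2 > 0
XOR = OR(Q1, Q2)

for p in [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]:
    print(p, round(XOR(p), 3))
```

Even without any training, the composed network is already near-Boolean on the four sign quadrants; fine-tuning would further sharpen the composite gates.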
7. Significance and Interpretability
By leveraging primitive-specific construction, the resulting MLP encodes the logical structure of the classification task explicitly in its architecture. The compositionality of the framework ensures transparency: internal subnetworks correspond to interpretable geometric regions. For datasets that admit decompositions into simple primitives, this approach offers a systematic, design-theoretic alternative to unguided end-to-end training, with capacity and architectural complexity controlled through algebraic operations and hyperparameters. The method provides provable guarantees about the underlying logical form of the resulting MLP, unique among network construction methodologies (Peng, 2017).