
Primitive-Specific MLP Architecture

Updated 6 May 2026
  • Primitive-specific MLP is a feedforward neural classifier that decomposes complex tasks into simple geometric primitives, enabling modular and interpretable design.
  • It employs MLP-algebra operations such as union, intersection, and complement to systematically synthesize complex decision boundaries from basic subnetworks.
  • The approach demonstrates practical applications like XOR classification, offering mathematical guarantees and controlled network capacity through hierarchical construction.

A primitive-specific multilayer perceptron (MLP) is a feed-forward neural architecture systematically constructed by decomposing complex classification tasks into geometric primitives, training dedicated MLPs for these primitive regions, then algebraically combining these subnetworks to yield a tailored, interpretable classifier. The resulting architecture and its construction principles are grounded in the formalism of MLP-algebra, which provides a suite of closed, compositional operations for synthesizing complex decision boundaries from a finite library of primitive MLPs (Peng, 2017).

1. The Formal Framework: Universe of MLPs and Primitives

The universe $\mathcal{M}$ comprises all feed-forward MLPs with $L \geq 2$ layers, arbitrary finite widths, and (unless otherwise specified) sigmoid activations. Each network $\mathcal{N} \in \mathcal{M}$ is specified by its layer dimensions $(n_1, \dots, n_L)$, its collection of weight matrices $\omega^k \in \mathbb{R}^{n_{k+1} \times n_k}$, and its threshold vectors $\theta^k \in \mathbb{R}^{n_{k+1}}$. The forward map is realized by iterating $x \mapsto \sigma(\omega^1 x - \theta^1) \rightarrow \cdots \rightarrow \mathcal{N}(x) \in \mathbb{R}^{n_L}$.
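
As a concrete illustration, here is a minimal NumPy sketch of this forward map; the variable names and the optional sharpness factor `beta` are illustrative, not taken from the source.

```python
import numpy as np

def sigmoid(z, beta=1.0):
    # Logistic activation; beta controls the sharpness of the transition.
    return 1.0 / (1.0 + np.exp(-beta * z))

def mlp_forward(x, weights, thresholds, beta=1.0):
    """Forward map of a network N in the universe M.

    weights[k] is omega^k (shape n_{k+1} x n_k) and thresholds[k] is theta^k,
    so each layer computes sigma(omega^k h - theta^k) as defined above.
    """
    h = np.asarray(x, dtype=float)
    for W, theta in zip(weights, thresholds):
        h = sigmoid(W @ h - theta, beta)
    return h

# Example: a 2 -> 3 -> 1 network with arbitrary parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
thresholds = [rng.normal(size=3), rng.normal(size=1)]
print(mlp_forward([0.5, -1.0], weights, thresholds))
```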

A network is designated as "primitive" if it is trainable to act as the characteristic function of a simple geometric set. Canonical examples include:

  • Half-space: $\{x : a \cdot x \geq b\}$ (single-layer perceptron).
  • Ball/disk: $\{x : \lVert x - c \rVert < r\}$ (two-layer: $n \rightarrow n+1 \rightarrow 1$).
  • Axis-aligned boxes: $\{x : a_i \leq x_i \leq b_i,\ i = 1, \dots, n\}$ (two-layer, with one hidden unit per bounding half-space).
  • Cartesian products and unions of lower-dimensional primitives.

For a concrete task, a finite library of such primitives is selected to serve as the building blocks for higher-complexity MLP construction.
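
To make two of these templates concrete, the sketch below realizes a half-space primitive and an axis-aligned box primitive as approximately Boolean characteristic functions; the function names, the sharpness constant, and the AND-gate threshold are illustrative assumptions rather than the source's exact construction.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp logistic activation: large beta makes units approximately Boolean.
    return 1.0 / (1.0 + np.exp(-beta * z))

def halfspace_primitive(a, b):
    """Characteristic function of {x : a . x >= b} as a one-layer net."""
    a = np.asarray(a, dtype=float)
    return lambda x: sharp(a @ np.asarray(x, dtype=float) - b)

def box_primitive(lo, hi):
    """Characteristic function of the axis-aligned box [lo, hi], built from
    2n sharp half-space units and an AND gate on top."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    n = lo.size
    W = np.vstack([np.eye(n), -np.eye(n)])       # x_i >= lo_i and -x_i >= -hi_i
    theta = np.concatenate([lo, -hi])
    def net(x):
        h = sharp(W @ np.asarray(x, dtype=float) - theta)
        return sharp(h.sum() - (2 * n - 0.5))    # fires only if all 2n units fire
    return net

unit_box = box_primitive([0, 0], [1, 1])
print(unit_box([0.5, 0.5]), unit_box([2.0, 0.5]))   # ~1 (inside), ~0 (outside)
```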

2. Algebraic Operations for MLP Composition

The MLP-algebra defines a set of operators that act on compatible networks (i.e., with matching input spaces and structure) to produce new networks that remain in $\mathcal{M}$. These include:

  • Complement $\neg\mathcal{N}$: For a 1D-output MLP $\mathcal{N}$, the complement $\neg\mathcal{N}$ is constructed by negating the final-layer weights and thresholds, thereby inverting the decision boundary via the identity $\sigma(-z) = 1 - \sigma(z)$.
  • Sum (Union) $\mathcal{N}_1 + \mathcal{N}_2$: Combines two $L$-layer, scalar-output MLPs so that the resulting network represents the union of the individual decision regions. The final layer applies a sharp sigmoid whose weights and threshold are chosen so the output fires when either subnetwork does, enforcing logical OR.
  • Multi-Sum $\mathcal{N}_1 + \cdots + \mathcal{N}_m$: Generalizes the union operator to $m$ networks, stacking hidden layers accordingly.
  • Difference $\mathcal{N}_1 - \mathcal{N}_2$: Set difference realized algebraically as the intersection of $\mathcal{N}_1$ with the complement of $\mathcal{N}_2$, i.e., $\mathcal{N}_1 \times \neg\mathcal{N}_2$.
  • Product (Intersection) $\mathcal{N}_1 \times \mathcal{N}_2$: Forms the intersection of the positive regions of two compatible MLPs; constructed by block-diagonal combining at each hidden layer, with the final-layer weights and threshold set so that both subnetworks must fire (logical AND).
  • Multi-Product $\mathcal{N}_1 \times \cdots \times \mathcal{N}_m$: Multi-way intersection.
  • Component Extraction: Extracts the $i$-th output neuron from a multi-label MLP, yielding a binary classifier.
  • Stacking Product: Concatenates two binary classifiers into a 2-label MLP by direct sum of the hidden layers and stacking of the two outputs.
  • Identical Extension: Appends an identity-mapping layer (linear or ReLU) to align depth for algebraic summing or multiplication with deeper networks.

These operations are closed in $\mathcal{M}$, enabling hierarchical design without leaving the universe of valid MLPs (Peng, 2017).
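
A minimal functional sketch of how the complement, union, intersection, and difference operators act at the decision-function level follows; the gate thresholds (0.5, 1.5) and the sharpness constant are illustrative choices, not necessarily the source's exact weights.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp sigmoid acting as an approximate Boolean gate.
    return 1.0 / (1.0 + np.exp(-beta * z))

def complement(f):
    # Negating the final-layer weights and thresholds flips the output,
    # since sigma(-z) = 1 - sigma(z); at the function level this is 1 - f.
    return lambda x: 1.0 - f(x)

def union(f, g, beta=20.0):
    # Added combining layer acting as a logical OR on the two scalar outputs.
    return lambda x: sharp(f(x) + g(x) - 0.5, beta)

def intersection(f, g, beta=20.0):
    # Added combining layer acting as a logical AND on the two scalar outputs.
    return lambda x: sharp(f(x) + g(x) - 1.5, beta)

def difference(f, g, beta=20.0):
    # Set difference: intersect the first region with the complement of the second.
    return intersection(f, complement(g), beta)
```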

3. Key Algebraic Properties

Networks with 1D output under the fundamental operators obey a commutative algebra:

Property | Formal Statement | Intuitive Explanation
Commutativity | $\mathcal{N}_1 + \mathcal{N}_2 = \mathcal{N}_2 + \mathcal{N}_1$; $\mathcal{N}_1 \times \mathcal{N}_2 = \mathcal{N}_2 \times \mathcal{N}_1$ | Order of union/intersection does not affect the result
Associativity | $(\mathcal{N}_1 + \mathcal{N}_2) + \mathcal{N}_3 = \mathcal{N}_1 + (\mathcal{N}_2 + \mathcal{N}_3)$, etc. | Grouping does not affect the output decision boundary
Distributivity | $\mathcal{N}_1 \times (\mathcal{N}_2 + \mathcal{N}_3) \approx \mathcal{N}_1 \times \mathcal{N}_2 + \mathcal{N}_1 \times \mathcal{N}_3$ (up to scaling/thresholds) | Logical AND distributes over OR at the decision level
Involution | $\neg(\neg\mathcal{N}) = \mathcal{N}$ | Double complement restores the original decision boundary

Closure is immediate from the recursive definitions of weights, thresholds, and dimensions. Associativity and commutativity arise from block-diagonal and concatenation symmetries. Distributivity is approximate, depending on the sharpness parameter $\beta$; exactness can be enforced by further SGD fine-tuning. Involution of the complement is exact and purely algebraic (Peng, 2017).
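
For example, exactness of the involution follows in one line from the sigmoid identity used by the complement construction, writing $h(x)$ for the last hidden activations and $w$, $\theta$ for the final-layer weight and threshold:

$$
\neg\mathcal{N}(x) = \sigma\big({-}(w \cdot h(x) - \theta)\big) = 1 - \sigma\big(w \cdot h(x) - \theta\big) = 1 - \mathcal{N}(x),
\qquad
\neg(\neg\mathcal{N})(x) = 1 - \big(1 - \mathcal{N}(x)\big) = \mathcal{N}(x).
$$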

4. Construction Methodology and Pseudocode

Building a primitive-specific MLP proceeds as follows:

  1. For each label, partition the label-specific data into regions approximately corresponding to primitives in the library (e.g., via clustering or geometric heuristics).
  2. Train an MLP for each primitive region, selecting the appropriate template (half-space, ball, box) and fine-tuning it as the region's characteristic function.
  3. Combine each label's primitive nets by multi-sum (union) into a single label-specific net.
  4. Merge all the label-specific nets by the stacking product to yield a classifier with one output per label.
  5. Optionally, fine-tune the assembled MLP on the full labeled set for several epochs to enhance logical gating.

The construction is given as pseudocode in the source; a Python sketch of the same pipeline appears below. Regions embedded in projected subspaces can be re-aligned using the corresponding product operation (Peng, 2017).
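
As a stand-in for the source's pseudocode, the following sketch captures steps 1–5 above, assuming the partitioning, primitive-fitting, and combining operators are supplied as callables; all names here are illustrative.

```python
def build_primitive_specific_mlp(data_by_label, partition, fit_primitive,
                                 union, stack, fine_tune=None):
    """Assemble a primitive-specific classifier following steps 1-5 above.

    data_by_label : dict mapping each label to its array of samples
    partition     : splits one label's samples into primitive-shaped regions
    fit_primitive : trains a primitive MLP as a region's characteristic function
    union / stack : algebraic combining operators from Section 2
    fine_tune     : optional SGD pass over the assembled network
    """
    label_nets = {}
    for label, samples in data_by_label.items():
        regions = partition(samples)                           # step 1: decompose
        nets = [fit_primitive(region) for region in regions]   # step 2: fit primitives
        combined = nets[0]
        for net in nets[1:]:
            combined = union(combined, net)                    # step 3: multi-sum (union)
        label_nets[label] = combined
    classifier = stack(label_nets)                             # step 4: one output per label
    if fine_tune is not None:
        classifier = fine_tune(classifier)                     # step 5: optional fine-tuning
    return classifier
```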

5. Practical Strategies for Primitive Selection and Capacity

Choosing optimal primitives:

  • For clusters near-linearly separable, use half-space MLPs.
  • For "blob-like" clusters, employ ball/disk templates (θk∈Rnk+1\theta^k \in \mathbb{R}^{n_{k+1}}3).
  • For box-shaped clusters, use axis-aligned box MLPs (hidden layer size θk∈Rnk+1\theta^k \in \mathbb{R}^{n_{k+1}}4).
  • If primitive fitting is nontrivial, approximate with unions of basic shapes.

Architectural and learning considerations:

  • Depth increments by one whenever a sum or product is applied, corresponding to an added combining layer.
  • Width increases via block-diagonal concatenation as networks are joined.
  • The sharpness parameter $\beta$ modulates how Boolean-like the logical gates become; higher $\beta$ yields sharper transitions, with a moderate initial value suggested and tuned as needed (see the gate sketch after this list).
  • After construction, re-fine-tuning using SGD on the recombined data often sharpens logical implementation and mitigates softness in composite gates.
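
The effect of the sharpness parameter can be seen directly on a single OR-style combining gate; the gate form and the 0.5 threshold below are illustrative assumptions.

```python
import numpy as np

def or_gate(y1, y2, beta):
    # Combining-layer OR gate on two scalar subnetwork outputs in [0, 1].
    return 1.0 / (1.0 + np.exp(-beta * (y1 + y2 - 0.5)))

# One subnetwork weakly active (0.6), the other silent (0.0):
for beta in (1.0, 5.0, 20.0):
    print(beta, round(or_gate(0.6, 0.0, beta), 3))
# beta = 1  -> ~0.525  (soft, barely above chance)
# beta = 5  -> ~0.622
# beta = 20 -> ~0.881  (approaching a Boolean OR)
```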

6. Illustrative Example: Constructing XOR in $\mathbb{R}^2$

For the classic XOR problem on $\{0,1\}^2$, the positive region is the union of two quadrants:

  • $Q_1 = \{x : x_1 \geq 1/2,\ x_2 < 1/2\}$, the quadrant containing the positive point $(1,0)$
  • $Q_2 = \{x : x_1 < 1/2,\ x_2 \geq 1/2\}$, the quadrant containing the positive point $(0,1)$

Primitive networks:

  • $H_1$: half-space $\{x : x_1 \geq 1/2\}$ (one-layer)
  • $H_2$: half-space $\{x : x_2 \geq 1/2\}$ (one-layer)

Compositional construction:

  • $Q_1 = H_1 \times \neg H_2$ via the product (intersection)
  • $Q_2 = \neg H_1 \times H_2$ via the product (intersection)
  • Union: $\mathrm{XOR} = Q_1 + Q_2$

The final binary XOR network is $(H_1 \times \neg H_2) + (\neg H_1 \times H_2)$. The explicit architecture stacks the hidden representations, forms the logical AND via a sharp sigmoid (large $\beta$), and sums the results for the logical OR. SGD fine-tuning for a few epochs enables the MLP to achieve 100% accuracy on the XOR inputs. This construction demonstrates the systematic decomposition and algebraic recombination at the heart of the MLP-algebra approach (Peng, 2017).
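
A compact, runnable sketch of this XOR construction, using sharp-sigmoid half-space primitives and the union/intersection gates from Section 2, is given below; the thresholds at 1/2 and $\beta = 20$ are illustrative choices rather than the source's exact weights.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp sigmoid used as an approximate Boolean gate.
    return 1.0 / (1.0 + np.exp(-beta * z))

# Half-space primitives on the unit square (thresholds at 0.5 are illustrative).
def h_pos(i):
    return lambda x: sharp(x[i] - 0.5)      # fires when x_i >= 0.5

def h_neg(i):
    return lambda x: sharp(0.5 - x[i])      # fires when x_i <= 0.5

def AND(f, g):
    return lambda x: sharp(f(x) + g(x) - 1.5)   # intersection gate

def OR(f, g):
    return lambda x: sharp(f(x) + g(x) - 0.5)   # union gate

# Quadrant Q1: x1 high and x2 low; quadrant Q2: x1 low and x2 high.
quadrant_1 = AND(h_pos(0), h_neg(1))
quadrant_2 = AND(h_neg(0), h_pos(1))
xor_net = OR(quadrant_1, quadrant_2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(xor_net(np.array(x, dtype=float))), 3))
# Expected outputs: ~0, ~1, ~1, ~0
```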

7. Significance and Interpretability

By leveraging primitive-specific construction, the resulting MLP encodes the logical structure of the classification task explicitly in its architecture. The compositionality of the framework ensures transparency: internal subnetworks correspond to interpretable geometric regions. For datasets that admit decompositions into simple primitives, this approach offers a systematic, design-theoretic alternative to unguided end-to-end training, with capacity and architectural complexity controlled through algebraic operations and hyperparameters. The method provides provable guarantees about the underlying logical form of the resulting MLP, unique among network construction methodologies (Peng, 2017).

References (1)
