
Primitive-Specific MLP Architecture

Updated 6 May 2026
  • Primitive-specific MLP is a feedforward neural classifier that decomposes complex tasks into simple geometric primitives, enabling modular and interpretable design.
  • It employs MLP-algebra operations such as union, intersection, and complement to systematically synthesize complex decision boundaries from basic subnetworks.
  • The approach demonstrates practical applications like XOR classification, offering mathematical guarantees and controlled network capacity through hierarchical construction.

A primitive-specific multilayer perceptron (MLP) is a feed-forward neural architecture systematically constructed by decomposing complex classification tasks into geometric primitives, training dedicated MLPs for these primitive regions, then algebraically combining these subnetworks to yield a tailored, interpretable classifier. The resulting architecture and its construction principles are grounded in the formalism of MLP-algebra, which provides a suite of closed, compositional operations for synthesizing complex decision boundaries from a finite library of primitive MLPs (Peng, 2017).

1. The Formal Framework: Universe of MLPs and Primitives

The universe $\mathcal{M}$ comprises all feed-forward MLPs with $L \geq 2$ layers, arbitrary finite widths, and (unless otherwise specified) sigmoid activations. Each network $\mathcal{N} \in \mathcal{M}$ is specified by its layer dimensions $(n_1, \dots, n_L)$, its collection of weight matrices $\omega^k \in \mathbb{R}^{n_{k+1} \times n_k}$, and its threshold vectors $\theta^k \in \mathbb{R}^{n_{k+1}}$. The forward map is realized by iterating $x \mapsto \sigma(\omega^1 x - \theta^1) \rightarrow \cdots \rightarrow \mathcal{N}(x) \in \mathbb{R}^{n_L}$.
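
As a concrete illustration, here is a minimal NumPy sketch of this forward map; the variable names and the optional sharpness factor `beta` are illustrative, not taken from the source.

```python
import numpy as np

def sigmoid(z, beta=1.0):
    # Logistic activation; beta controls the sharpness of the transition.
    return 1.0 / (1.0 + np.exp(-beta * z))

def mlp_forward(x, weights, thresholds, beta=1.0):
    """Forward map of a network N in the universe M.

    weights[k] is omega^k (shape n_{k+1} x n_k) and thresholds[k] is theta^k,
    so each layer computes sigma(omega^k h - theta^k) as defined above.
    """
    h = np.asarray(x, dtype=float)
    for W, theta in zip(weights, thresholds):
        h = sigmoid(W @ h - theta, beta)
    return h

# Example: a 2 -> 3 -> 1 network with arbitrary parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
thresholds = [rng.normal(size=3), rng.normal(size=1)]
print(mlp_forward([0.5, -1.0], weights, thresholds))
```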

A network is designated as "primitive" if it is trainable to act as the characteristic function of a simple geometric set. Canonical examples include:

  • Half-space: $\{x : a \cdot x \geq b\}$ (single-layer perceptron).
  • Ball/disk: $\{x : \lVert x - c \rVert < r\}$ (two-layer: $n \rightarrow n+1 \rightarrow 1$).
  • Axis-aligned boxes: $\{x : a_i \leq x_i \leq b_i,\ i = 1, \dots, n\}$ (two-layer, with one hidden unit per bounding half-space).
  • Cartesian products and unions of lower-dimensional primitives.

For a concrete task, a finite library of such primitives is selected to serve as the building blocks for higher-complexity MLP construction.
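
To make two of these templates concrete, the sketch below realizes a half-space primitive and an axis-aligned box primitive as approximately Boolean characteristic functions; the function names, the sharpness constant, and the AND-gate threshold are illustrative assumptions rather than the source's exact construction.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp logistic activation: large beta makes units approximately Boolean.
    return 1.0 / (1.0 + np.exp(-beta * z))

def halfspace_primitive(a, b):
    """Characteristic function of {x : a . x >= b} as a one-layer net."""
    a = np.asarray(a, dtype=float)
    return lambda x: sharp(a @ np.asarray(x, dtype=float) - b)

def box_primitive(lo, hi):
    """Characteristic function of the axis-aligned box [lo, hi], built from
    2n sharp half-space units and an AND gate on top."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    n = lo.size
    W = np.vstack([np.eye(n), -np.eye(n)])       # x_i >= lo_i and -x_i >= -hi_i
    theta = np.concatenate([lo, -hi])
    def net(x):
        h = sharp(W @ np.asarray(x, dtype=float) - theta)
        return sharp(h.sum() - (2 * n - 0.5))    # fires only if all 2n units fire
    return net

unit_box = box_primitive([0, 0], [1, 1])
print(unit_box([0.5, 0.5]), unit_box([2.0, 0.5]))   # ~1 (inside), ~0 (outside)
```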

2. Algebraic Operations for MLP Composition

The MLP-algebra defines a set of operators that act on compatible networks (i.e., with matching input spaces and structure) to produce new networks that remain in $\mathcal{M}$. These include:

  • Complement $\neg\mathcal{N}$: For a 1D-output MLP $\mathcal{N}$, the complement $\neg\mathcal{N}$ is constructed by negating the final-layer weights and thresholds, thereby inverting the decision boundary via the identity $\sigma(-z) = 1 - \sigma(z)$.
  • Sum (Union) $\mathcal{N}_1 + \mathcal{N}_2$: Combines two $L$-layer, scalar-output MLPs so that the resulting network represents the union of the individual decision regions. The final layer applies a sharp sigmoid whose weights and threshold are chosen so the output fires when either subnetwork does, enforcing logical OR.
  • Multi-Sum $\mathcal{N}_1 + \cdots + \mathcal{N}_m$: Generalizes the union operator to $m$ networks, stacking hidden layers accordingly.
  • Difference $\mathcal{N}_1 - \mathcal{N}_2$: Set difference realized algebraically as the intersection of $\mathcal{N}_1$ with the complement of $\mathcal{N}_2$, i.e., $\mathcal{N}_1 \times \neg\mathcal{N}_2$.
  • Product (Intersection) $\mathcal{N}_1 \times \mathcal{N}_2$: Forms the intersection of the positive regions of two compatible MLPs; constructed by block-diagonal combining at each hidden layer, with the final-layer weights and threshold set so that both subnetworks must fire (logical AND).
  • Multi-Product $\mathcal{N}_1 \times \cdots \times \mathcal{N}_m$: Multi-way intersection.
  • Component Extraction: Extracts the $i$-th output neuron from a multi-label MLP, yielding a binary classifier.
  • Stacking Product: Concatenates two binary classifiers into a 2-label MLP by direct sum of the hidden layers and stacking of the two outputs.
  • Identical Extension: Appends an identity-mapping layer (linear or ReLU) to align depth for algebraic summing or multiplication with deeper networks.

These operations are closed in $\mathcal{M}$, enabling hierarchical design without leaving the universe of valid MLPs (Peng, 2017).
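
A minimal functional sketch of how the complement, union, intersection, and difference operators act at the decision-function level follows; the gate thresholds (0.5, 1.5) and the sharpness constant are illustrative choices, not necessarily the source's exact weights.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp sigmoid acting as an approximate Boolean gate.
    return 1.0 / (1.0 + np.exp(-beta * z))

def complement(f):
    # Negating the final-layer weights and thresholds flips the output,
    # since sigma(-z) = 1 - sigma(z); at the function level this is 1 - f.
    return lambda x: 1.0 - f(x)

def union(f, g, beta=20.0):
    # Added combining layer acting as a logical OR on the two scalar outputs.
    return lambda x: sharp(f(x) + g(x) - 0.5, beta)

def intersection(f, g, beta=20.0):
    # Added combining layer acting as a logical AND on the two scalar outputs.
    return lambda x: sharp(f(x) + g(x) - 1.5, beta)

def difference(f, g, beta=20.0):
    # Set difference: intersect the first region with the complement of the second.
    return intersection(f, complement(g), beta)
```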

3. Key Algebraic Properties

Networks with 1D output under the fundamental operators obey a commutative algebra:

Property | Formal Statement | Intuitive Explanation
Commutativity | $\mathcal{N}_1 + \mathcal{N}_2 = \mathcal{N}_2 + \mathcal{N}_1$; $\mathcal{N}_1 \times \mathcal{N}_2 = \mathcal{N}_2 \times \mathcal{N}_1$ | Order of union/intersection does not affect the result
Associativity | $(\mathcal{N}_1 + \mathcal{N}_2) + \mathcal{N}_3 = \mathcal{N}_1 + (\mathcal{N}_2 + \mathcal{N}_3)$, etc. | Grouping does not affect the output decision boundary
Distributivity | $\mathcal{N}_1 \times (\mathcal{N}_2 + \mathcal{N}_3) \approx \mathcal{N}_1 \times \mathcal{N}_2 + \mathcal{N}_1 \times \mathcal{N}_3$ (up to scaling/thresholds) | Logical AND distributes over OR at the decision level
Involution | $\neg(\neg\mathcal{N}) = \mathcal{N}$ | Double complement restores the original decision boundary

Closure is immediate from the recursive definitions of weights, thresholds, and dimensions. Associativity and commutativity arise from block-diagonal and concatenation symmetries. Distributivity is approximate, depending on the sharpness parameter $\beta$; exactness can be enforced by further SGD fine-tuning. Involution of the complement is exact and purely algebraic (Peng, 2017).
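
For example, exactness of the involution follows in one line from the sigmoid identity used by the complement construction, writing $h(x)$ for the last hidden activations and $w$, $\theta$ for the final-layer weight and threshold:

$$
\neg\mathcal{N}(x) = \sigma\big({-}(w \cdot h(x) - \theta)\big) = 1 - \sigma\big(w \cdot h(x) - \theta\big) = 1 - \mathcal{N}(x),
\qquad
\neg(\neg\mathcal{N})(x) = 1 - \big(1 - \mathcal{N}(x)\big) = \mathcal{N}(x).
$$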

4. Construction Methodology and Pseudocode

Building a primitive-specific MLP proceeds as follows:

  1. For each label, partition the label-specific data into regions approximately corresponding to primitives in the library (e.g., via clustering or geometric heuristics).
  2. Train an MLP for each primitive region, selecting the appropriate template (half-space, ball, box) and fine-tuning it as the region's characteristic function.
  3. Combine each label's primitive nets by multi-sum (union) into a single label-specific net.
  4. Merge all the label-specific nets by the stacking product to yield a classifier with one output per label.
  5. Optionally, fine-tune the assembled MLP on the full labeled set for several epochs to enhance logical gating.

The construction is given as pseudocode in the source; a Python sketch of the same pipeline appears below. Regions embedded in projected subspaces can be re-aligned using the corresponding product operation (Peng, 2017).
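
As a stand-in for the source's pseudocode, the following sketch captures steps 1–5 above, assuming the partitioning, primitive-fitting, and combining operators are supplied as callables; all names here are illustrative.

```python
def build_primitive_specific_mlp(data_by_label, partition, fit_primitive,
                                 union, stack, fine_tune=None):
    """Assemble a primitive-specific classifier following steps 1-5 above.

    data_by_label : dict mapping each label to its array of samples
    partition     : splits one label's samples into primitive-shaped regions
    fit_primitive : trains a primitive MLP as a region's characteristic function
    union / stack : algebraic combining operators from Section 2
    fine_tune     : optional SGD pass over the assembled network
    """
    label_nets = {}
    for label, samples in data_by_label.items():
        regions = partition(samples)                           # step 1: decompose
        nets = [fit_primitive(region) for region in regions]   # step 2: fit primitives
        combined = nets[0]
        for net in nets[1:]:
            combined = union(combined, net)                    # step 3: multi-sum (union)
        label_nets[label] = combined
    classifier = stack(label_nets)                             # step 4: one output per label
    if fine_tune is not None:
        classifier = fine_tune(classifier)                     # step 5: optional fine-tuning
    return classifier
```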

5. Practical Strategies for Primitive Selection and Capacity

Choosing optimal primitives:

  • For clusters near-linearly separable, use half-space MLPs.
  • For "blob-like" clusters, employ ball/disk templates (θk∈Rnk+1\theta^k \in \mathbb{R}^{n_{k+1}}3).
  • For box-shaped clusters, use axis-aligned box MLPs (hidden layer size θk∈Rnk+1\theta^k \in \mathbb{R}^{n_{k+1}}4).
  • If primitive fitting is nontrivial, approximate with unions of basic shapes.

Architectural and learning considerations:

  • Depth increments by one whenever a sum or product is applied, corresponding to an added combining layer.
  • Width increases via block-diagonal concatenation as networks are joined.
  • The sharpness parameter $\beta$ modulates how Boolean-like the logical gates become; higher $\beta$ yields sharper transitions, with a moderate initial value suggested and tuned as needed (see the gate sketch after this list).
  • After construction, re-fine-tuning using SGD on the recombined data often sharpens logical implementation and mitigates softness in composite gates.
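
The effect of the sharpness parameter can be seen directly on a single OR-style combining gate; the gate form and the 0.5 threshold below are illustrative assumptions.

```python
import numpy as np

def or_gate(y1, y2, beta):
    # Combining-layer OR gate on two scalar subnetwork outputs in [0, 1].
    return 1.0 / (1.0 + np.exp(-beta * (y1 + y2 - 0.5)))

# One subnetwork weakly active (0.6), the other silent (0.0):
for beta in (1.0, 5.0, 20.0):
    print(beta, round(or_gate(0.6, 0.0, beta), 3))
# beta = 1  -> ~0.525  (soft, barely above chance)
# beta = 5  -> ~0.622
# beta = 20 -> ~0.881  (approaching a Boolean OR)
```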

6. Illustrative Example: Constructing XOR in $\mathbb{R}^2$

For the classic XOR problem on $\{0,1\}^2$, the positive region is the union of two quadrants:

  • $Q_1 = \{x : x_1 \geq 1/2,\ x_2 < 1/2\}$, the quadrant containing the positive point $(1,0)$
  • $Q_2 = \{x : x_1 < 1/2,\ x_2 \geq 1/2\}$, the quadrant containing the positive point $(0,1)$

Primitive networks:

  • $H_1$: half-space $\{x : x_1 \geq 1/2\}$ (one-layer)
  • $H_2$: half-space $\{x : x_2 \geq 1/2\}$ (one-layer)

Compositional construction:

  • $Q_1 = H_1 \times \neg H_2$ via the product (intersection)
  • $Q_2 = \neg H_1 \times H_2$ via the product (intersection)
  • Union: $\mathrm{XOR} = Q_1 + Q_2$

The final binary XOR network is $(H_1 \times \neg H_2) + (\neg H_1 \times H_2)$. The explicit architecture stacks the hidden representations, forms the logical AND via a sharp sigmoid (large $\beta$), and sums the results for the logical OR. SGD fine-tuning for a few epochs enables the MLP to achieve 100% accuracy on the XOR inputs. This construction demonstrates the systematic decomposition and algebraic recombination at the heart of the MLP-algebra approach (Peng, 2017).
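
A compact, runnable sketch of this XOR construction, using sharp-sigmoid half-space primitives and the union/intersection gates from Section 2, is given below; the thresholds at 1/2 and $\beta = 20$ are illustrative choices rather than the source's exact weights.

```python
import numpy as np

def sharp(z, beta=20.0):
    # Sharp sigmoid used as an approximate Boolean gate.
    return 1.0 / (1.0 + np.exp(-beta * z))

# Half-space primitives on the unit square (thresholds at 0.5 are illustrative).
def h_pos(i):
    return lambda x: sharp(x[i] - 0.5)      # fires when x_i >= 0.5

def h_neg(i):
    return lambda x: sharp(0.5 - x[i])      # fires when x_i <= 0.5

def AND(f, g):
    return lambda x: sharp(f(x) + g(x) - 1.5)   # intersection gate

def OR(f, g):
    return lambda x: sharp(f(x) + g(x) - 0.5)   # union gate

# Quadrant Q1: x1 high and x2 low; quadrant Q2: x1 low and x2 high.
quadrant_1 = AND(h_pos(0), h_neg(1))
quadrant_2 = AND(h_neg(0), h_pos(1))
xor_net = OR(quadrant_1, quadrant_2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(xor_net(np.array(x, dtype=float))), 3))
# Expected outputs: ~0, ~1, ~1, ~0
```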

7. Significance and Interpretability

By leveraging primitive-specific construction, the resulting MLP encodes the logical structure of the classification task explicitly in its architecture. The compositionality of the framework ensures transparency: internal subnetworks correspond to interpretable geometric regions. For datasets that admit decompositions into simple primitives, this approach offers a systematic, design-theoretic alternative to unguided end-to-end training, with capacity and architectural complexity controlled through algebraic operations and hyperparameters. The method provides provable guarantees about the underlying logical form of the resulting MLP, unique among network construction methodologies (Peng, 2017).

References (1)
