HeMLGOP: Heterogeneous Generalized Perceptrons
- HeMLGOP is a neural architecture that extends classical MLPs by allowing each neuron to select its own transformation, pooling, and activation operators.
- The progressive network construction strategy adapts both depth and width based on operator candidate performance, resulting in compact and efficient models.
- Empirical results show that HeMLGOP achieves top accuracy with significantly smaller model sizes and faster training times compared to traditional deep neural networks.
Heterogeneous Multilayer Generalized Operational Perceptrons (HeMLGOP) define a neural network architecture that extends classical multilayer perceptrons (MLPs) by replacing the standard linear-threshold neuron with Generalized Operational Perceptrons (GOPs). In HeMLGOP, each neuron, regardless of its depth or layer, may independently select its synaptic transformation, dendritic pooling, and nonlinearity from predefined operator libraries. This neuron-level heterogeneity, coupled with a progressive algorithm for optimizing topology and operator selection, enables networks that are more compact and expressive than conventional deep neural models. HeMLGOP inherits foundational principles from Operational Neural Networks (ONNs), generalizing them to multilayer settings with automatic, data-driven architecture and operator adaptation (Tran et al., 2018, Kiranyaz et al., 2019).
1. The Generalized Operational Perceptron Formalism
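Each GOP neuron in HeMLGOP generalizes the McCulloch–Pitts model by introducing three configurable stages: a nodal operator (synaptic transformation), a pooling operator (aggregation), and an activation function (nonlinearity). For neuron $i$ in layer $l+1$, with weights $w_{ki}^{l+1}$, bias $b_i^{l+1}$, nodal operator $\psi_i^{l+1}$, pooling operator $\rho_i^{l+1}$, and activation $f_i^{l+1}$, the computation is:

$$z_{ki}^{l+1} = \psi_i^{l+1}\left(y_k^{l}, w_{ki}^{l+1}\right), \qquad x_i^{l+1} = \rho_i^{l+1}\left(z_{1i}^{l+1}, \dots, z_{N_l i}^{l+1}\right) + b_i^{l+1}, \qquad y_i^{l+1} = f_i^{l+1}\left(x_i^{l+1}\right)$$

where $y_k^{l}$ is the output of neuron $k$ in layer $l$ and $N_l$ is the number of neurons in that layer.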
The libraries may include:
- Nodal ($\Psi$): multiply, exp, sin, quadratic, Gaussian, difference-of-Gaussians, etc.
- Pooling ($\mathrm{P}$): sum, 1-correlation, max, etc.
- Activation ($F$): sigmoid, tanh, ReLU, softplus, ELU, etc.
When the operators are set to the standard choices (multiply as the nodal operator, sum as the pooling operator, and an activation such as sigmoid), the GOP reduces to a conventional MLP neuron (Tran et al., 2018, Kiranyaz et al., 2019).
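The three-stage computation can be illustrated with a minimal NumPy sketch. The dictionaries `NODAL`, `POOL`, and `ACT`, the function `gop_neuron`, and the exact functional forms of the operators are assumptions made for illustration; they are not the libraries or code from the cited papers. The sketch checks that the multiply/sum/sigmoid configuration reproduces an ordinary MLP neuron.

```python
import numpy as np

# Hypothetical operator libraries (illustrative forms, not the papers' exact ones).
NODAL = {
    "multiply": lambda y, w: w * y,
    "exp": lambda y, w: np.exp(w * y) - 1.0,
    "sin": lambda y, w: np.sin(w * y),
    "quadratic": lambda y, w: w * y ** 2,
}
POOL = {
    "sum": lambda z: z.sum(axis=-1),
    "max": lambda z: z.max(axis=-1),
}
ACT = {
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(x, 0.0),
}


def gop_neuron(y_prev, w, b, nodal, pool, act):
    """Forward pass of a single GOP neuron.

    y_prev: (batch, n_in) outputs of the previous layer
    w:      (n_in,) synaptic weights, b: scalar bias
    """
    z = NODAL[nodal](y_prev, w)  # per-synapse transformation, (batch, n_in)
    x = POOL[pool](z) + b        # aggregation over inputs, (batch,)
    return ACT[act](x)           # nonlinearity, (batch,)


rng = np.random.default_rng(0)
y_prev = rng.normal(size=(4, 3))
w = rng.normal(size=3)
b = 0.1

# multiply / sum / sigmoid is exactly a classical MLP neuron.
mlp_like = gop_neuron(y_prev, w, b, "multiply", "sum", "sigmoid")
reference = 1.0 / (1.0 + np.exp(-(y_prev @ w + b)))

# A heterogeneous choice uses the same weights but a different function class.
hetero = gop_neuron(y_prev, w, b, "sin", "max", "relu")
```

Keeping the operators in lookup tables means a neuron's configuration is just a triple of keys, which is what makes per-neuron selection cheap to enumerate.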
2. Motivation and Representational Advantages
The classical MLP applies the same affine transformation followed by a fixed nonlinearity at every neuron, which limits diversity and modeling power, especially for highly nonlinear and multimodal tasks. Biological neurons exhibit diverse synaptic and dendritic transformations; standard MLPs do not reflect this heterogeneity. GOP-based architectures, by allowing each neuron to select its own operator triple $(\psi, \rho, f)$, substantially expand the class of nonlinear functions that can be represented, often with fewer neurons and layers. Fixing operator sets per layer or per network, as in conventional architectures, results in networks that are often oversized or under-optimized for real-world tasks. The capacity for per-neuron heterogeneity is a central driver of HeMLGOP's high representational efficiency (Tran et al., 2018).
3. Progressive, Neuron-Level Network Construction
HeMLGOP adopts a progressive, block-based search strategy to jointly optimize both depth (number of layers) and width (number of neurons per layer), as well as the set of operators for each new neuron. The learning process proceeds as follows:
- Initialize with the training inputs and targets, and set the layer index to the first hidden layer.
- Layer Growth: For each new layer, start with an initial block of GOPs.
- Width Growth:
  - For each candidate operator set (nodal, pooling, activation):
    - Randomly initialize candidate weights, compute the corresponding hidden outputs, and concatenate them with the existing activations.
    - Solve a linear regression (using the Moore-Penrose pseudoinverse or Tikhonov regularization) to fit the output.
    - Compute the candidate loss.
  - Select the operator set achieving the lowest loss, and fine-tune the new neurons with backpropagation (BP) for a small number of epochs.
  - Assess the relative improvement in loss; if it is below the width threshold, stop adding neurons to the current layer; otherwise, append the new block and repeat.
- Depth Growth: After width growth stops, compute the layer-level relative improvement. If it is below the depth threshold, halt; otherwise, add the next layer with the new inputs and repeat.
- Final Fine-tuning: Unfreeze all parameters and optionally perform full-network BP (Tran et al., 2018).
Pseudocode for the progressive procedure is given in Tran et al. (2018); it incorporates randomized evaluation, batch normalization, regularization, and stopping criteria for compactness.
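The width-growth loop for a single layer can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pseudocode: the names `grow_layer`, `block_output`, `ridge_fit`, the threshold `eps`, the block size, and the toy data are all hypothetical, the operator libraries are deliberately tiny, and both BP fine-tuning and batch normalization are omitted for brevity.

```python
import itertools
import numpy as np

# Tiny hypothetical operator libraries; W has shape (n_in, block).
NODAL = {
    "multiply": lambda y, W: y[:, :, None] * W,
    "sin": lambda y, W: np.sin(y[:, :, None] * W),
}
POOL = {"sum": lambda z: z.sum(axis=1), "max": lambda z: z.max(axis=1)}
ACT = {"tanh": np.tanh, "relu": lambda x: np.maximum(x, 0.0)}


def block_output(X, W, b, ops):
    """Hidden outputs of one block of GOPs sharing the operator set `ops`."""
    nodal, pool, act = ops
    z = NODAL[nodal](X, W)               # (n, n_in, block)
    return ACT[act](POOL[pool](z) + b)   # (n, block)


def ridge_fit(H, Y, lam=1e-2):
    """Closed-form output weights with Tikhonov regularization."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ Y)


def mse(H, Y, beta):
    return float(np.mean((H @ beta - Y) ** 2))


def grow_layer(X, Y, block=4, eps=1e-2, max_blocks=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.ones((n, 1))                  # constant column acts as output bias
    prev_loss = mse(H, Y, ridge_fit(H, Y))
    chosen, history = [], [prev_loss]
    for _ in range(max_blocks):
        best = None
        # Randomized evaluation of every candidate operator set.
        for ops in itertools.product(NODAL, POOL, ACT):
            W = rng.normal(size=(d, block))
            b = rng.normal(size=block)
            H_cand = np.hstack([H, block_output(X, W, b, ops)])
            loss = mse(H_cand, Y, ridge_fit(H_cand, Y))
            if best is None or loss < best[0]:
                best = (loss, ops, H_cand)
        loss, ops, H_cand = best
        # Relative improvement decides whether the block is kept.
        improvement = (prev_loss - loss) / max(prev_loss, 1e-12)
        if improvement < eps:
            break
        H, prev_loss = H_cand, loss
        chosen.append(ops)
        history.append(loss)
    return chosen, history


# Toy regression problem (synthetic, for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
Y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2] ** 2
chosen, history = grow_layer(X, Y)
```

`chosen` lists the operator triple selected for each accepted block and `history` the training loss after each acceptance. Depth growth would wrap this function in an outer loop that feeds the accepted hidden outputs to the next layer and applies a second threshold.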
4. Training Methodology and Operator Search
Operator and weight selection proceeds in two cooperating stages, followed by an optional global pass:
- Randomized Network (RN) Evaluation: For each candidate operator triple, assign random weights, normalize hidden outputs, and solve for closed-form output weights. This enables rapid, low-cost evaluation across many operator candidates before committing to gradient-based optimization.
- Block Fine-Tuning: Only the most recent block of selected neurons undergoes BP-based optimization, with prior layers fixed, during the progressive growth. This implicitly regularizes new parameters against prior structure.
- Full-Network Optimization: Optionally, a final global fine-tuning is performed with all weights unfrozen. Batch normalization is essential, as concatenation of heterogeneous neuron blocks can skew activation statistics (Tran et al., 2018, Kiranyaz et al., 2019).
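The closed-form step used in RN evaluation can be written out explicitly. As a sketch, let $H$ denote the matrix of normalized hidden outputs (one row per sample), $Y$ the target matrix, and $\lambda > 0$ a regularization strength; these symbols are assumptions introduced here for illustration, not notation taken from the cited papers. The output weights $\beta$ are then obtained from either the pseudoinverse or the Tikhonov-regularized solution:

$$\beta_{\mathrm{pinv}} = H^{+} Y \qquad \text{or} \qquad \beta_{\lambda} = \left(H^{\top} H + \lambda I\right)^{-1} H^{\top} Y$$

Here $H^{+}$ is the Moore-Penrose pseudoinverse. The Tikhonov form is better conditioned when hidden outputs are nearly collinear, and $\beta_{\lambda} \to \beta_{\mathrm{pinv}}$ as $\lambda \to 0^{+}$. Because each candidate requires only one linear solve rather than iterative training, many operator sets can be scored cheaply.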
Regularization is implemented using weight decay or $\ell_2$-norm constraints, with dropout rates chosen between 0.1 and 0.5. The loss is typically mean squared error or negative log-likelihood, and the same training protocol is applied across the classification tasks (Tran et al., 2018).
5. Empirical Performance and Architectural Properties
HeMLGOP demonstrates state-of-the-art or near state-of-the-art results on 11 real-world classification tasks, with datasets ranging from 768 to 60,000 samples, input dimensions of 8–512, and class counts from 2 to 500. The compared methods include the Progressive Operational Perceptron (POP), progressive MLP (PMLP), Progressive Learning Network (PLN), and Broad Learning System (BLS), as well as HeMLGOP variants (e.g., HoMLRN, HeMLRN, HoMLGOP).
Key outcomes:
- HeMLGOP typically achieves top accuracy.
- Model sizes are several times smaller than POP/PMLP (the summary table reports 3–50× relative to POP) and more compact than PLN/BLS.
- Training is substantially faster than POP, and competitive with other variants.
- Inference FLOPs are among the lowest measured, facilitating efficient deployment.
- Operator selection in heterogeneous layers is highly diverse; common choices are multiply (nodal), sum (pooling), and ReLU/ELU (activation) (see the operator distribution analysis in Fig. 2 of Tran et al., 2018).
The progressive stopping criteria, based on relative loss improvement, limit overgrowth while avoiding underfitting, producing networks that are both computationally and parametrically efficient.
| Method | Model Size | Accuracy | Training Time |
|---|---|---|---|
| HeMLGOP | 3–50× smaller than POP | Top/near-top | Faster than POP |
| POP/PMLP | Baseline | Varies | Reference |
| PLN/BLS | Larger | Varies | Slower/more complex |
6. Relation to Operational Neural Networks and Theoretical Context
HeMLGOP is a direct generalization of the Operational Neural Networks (ONNs) framework, which introduced the notion of heterogeneity at the neuron and layer level by allowing arbitrary choices of nodal, pooling, and activation operators. The ONN neuron is mathematically equivalent to the GOP neuron; the two differ only in architectural application (vanilla feedforward vs. progressive multilayer search), and both approaches demand specialized backpropagation through arbitrary operator chains.
HeMLGOP extends these principles by:
- Progressively searching over architecture and operator candidates at fine granularity (per neuron).
- Employing RN-based operator evaluation, enabling efficient exploration of expressive building blocks.
- Simultaneously optimizing network width and depth via explicit stopping rules based on loss improvement.
Operational heterogeneity is shown to amplify representational power per parameter and enable compact networks adaptive to modality-specific nonlinearities (e.g., in vision tasks). Notably, ONNs matched or outperformed CNNs of similar size on vision benchmarks, indicating that the core operator-diversity principle is instrumental even outside multilayer progressive architectures (Kiranyaz et al., 2019).
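The need for backpropagation through arbitrary operator chains comes down to applying the chain rule across the nodal, pooling, and activation stages. The sketch below works this out for one hypothetical configuration (sin nodal operator, sum pooling, tanh activation) and checks the analytic gradient against central finite differences. The function names and the chosen operators are assumptions for illustration only.

```python
import numpy as np


def forward(y_prev, w, b):
    z = np.sin(w * y_prev)   # nodal operator: sin(w * y)
    x = z.sum() + b          # pooling operator: sum, plus bias
    return np.tanh(x)        # activation: tanh


def grad_w(y_prev, w, b):
    """dy/dw_k = f'(x) * d(pool)/dz_k * d(nodal)/dw_k."""
    x = np.sin(w * y_prev).sum() + b
    d_act = 1.0 - np.tanh(x) ** 2            # derivative of tanh at x
    d_pool = np.ones_like(w)                 # d(sum)/dz_k = 1 for every k
    d_nodal = y_prev * np.cos(w * y_prev)    # d sin(w*y)/dw = y * cos(w*y)
    return d_act * d_pool * d_nodal


rng = np.random.default_rng(1)
y_prev = rng.normal(size=5)
w = rng.normal(size=5)
b = 0.3

analytic = grad_w(y_prev, w, b)

# Central finite differences, one weight at a time.
h = 1e-6
numeric = np.array([
    (forward(y_prev, w + h * e, b) - forward(y_prev, w - h * e, b)) / (2 * h)
    for e in np.eye(5)
])
```

Swapping an operator changes only the corresponding factor: for max pooling, for instance, `d_pool` would be 1 at the position of the maximum and 0 elsewhere (a subgradient).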
7. Implementation Considerations and Practical Guidelines
Practitioners are advised to:
- Begin with modest initial block sizes and block increments, tuning the width and depth improvement thresholds to balance compactness against accuracy.
- Design operator libraries that span relevant linear and nonlinear relations for the target domain.
- Rely on RN evaluation to filter operator sets efficiently, shifting to backpropagation fine-tuning only for promising candidates.
- Always use normalization (e.g., Batch Normalization) when joining heterogeneous structures.
- Employ short final full-network fine-tuning as necessary, reserving further epochs only where overfitting is controlled by validation.
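The normalization guideline can be motivated with a small sketch. Blocks built from different operators can produce outputs on very different scales, so concatenating them without normalization lets one block dominate the closed-form fit. The function `standardize` below is a simplified stand-in for batch normalization (no learnable scale or shift, statistics taken over the full batch), and the two example blocks are hypothetical.

```python
import numpy as np


def standardize(H, eps=1e-5):
    """Per-feature zero-mean, unit-variance scaling over the batch axis."""
    mean = H.mean(axis=0, keepdims=True)
    var = H.var(axis=0, keepdims=True)
    return (H - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
W = rng.normal(size=(8, 4))

# Block A: multiply / sum / tanh, outputs bounded in (-1, 1).
block_a = np.tanh(X @ W)

# Block B: exp nodal operator with sum pooling, outputs unbounded.
block_b = (np.exp(X[:, :, None] * W) - 1.0).sum(axis=1)

H = np.hstack([block_a, block_b])   # heterogeneous concatenation, (256, 8)
H_norm = standardize(H)

scale_before = H.std(axis=0)        # differs widely between the two blocks
scale_after = H_norm.std(axis=0)    # close to 1 for every column
```

Comparing `scale_before` with `scale_after` shows the effect: after standardization every column contributes on a comparable scale, regardless of which operator set produced it.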
This approach unifies structural flexibility (depth/width), algorithmic efficiency (progressive/RN evaluation), and functional diversity (per-neuron operator selection), yielding compact, high-performance neural networks with broad applicability (Tran et al., 2018).