FOLLOW Rule: A Predictive Learning Framework
- The FOLLOW Rule combines information theory, statistics, and online learning to characterize locally optimal prediction under information constraints.
- SPLR's local, strictly proper scoring incentivizes truthful reporting and guides the design of hardware-friendly predictive algorithms.
- Applications in edge computing and neuromorphic architectures show that SPLR reduces the computational cost of real-time learning.
The FOLLOW Rule (Simplified Predictive Local Rule, SPLR) is a formalism for local predictive learning in which updates are driven solely by prediction errors and governed by a unique, theoretically principled scoring function. As developed in foundational works on Bayesian predictive decision theory and recently extended to efficient machine learning substrates, SPLR blends core concepts from information theory, statistics, and online learning. It yields both a characterization of locally optimal prediction under information constraints and an efficient template for hardware implementations in neuromorphic and edge computation contexts (Polson et al., 25 Dec 2025, Zang et al., 25 Dec 2025).
1. Mathematical Foundations of Local Predictive Scoring
A predictive scoring rule is a function $S(p, x)$ mapping a predicted probability distribution $p$ and an observed outcome $x$ to a real-valued score. Such a rule is proper if, whenever outcomes are drawn from $q$, reporting $q$ itself maximizes the expected score over all possible reports $p$:
$\mathbb{E}_{X \sim q}[S(q, X)] \geq \mathbb{E}_{X \sim q}[S(p, X)] \quad \forall\, p \in \Delta(\mathcal{X}).$
It is strictly proper if equality holds only at $p = q$.
It is local if the score depends on $p$ only through the mass or density $p(x)$ assigned to the realized outcome $x$; i.e., $S(p, x) = f\big(p(x), x\big)$ for some real-valued function $f$.
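Both definitions can be checked numerically. The following minimal sketch assumes a finite outcome space indexed 0..k-1 and compares the log score with two other standard rules, the Brier (quadratic) and spherical scores, written as rewards to be maximized. The helper names (`looks_proper`, `is_local_at`) are illustrative and do not come from the cited works. Checks of this kind can refute a property with a counterexample, but they cannot prove it.

```python
import math
import random


# Positively oriented scoring rules S(p, x) for a discrete distribution p (list of probabilities).
def log_score(p, x):
    return math.log(p[x])


def quadratic_score(p, x):
    # Brier score rewritten as a reward: 2 p(x) - sum_j p_j^2.
    return 2 * p[x] - sum(pj * pj for pj in p)


def spherical_score(p, x):
    return p[x] / math.sqrt(sum(pj * pj for pj in p))


def expected_score(score, p, q):
    """E_{X ~ q}[S(p, X)]: expected score of reporting p when outcomes follow q."""
    return sum(qx * score(p, x) for x, qx in enumerate(q))


def random_distribution(k, rng):
    # Strictly positive random probability vector of length k.
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [wi / total for wi in w]


def looks_proper(score, q, trials=2000, seed=0):
    """Numerically check E_q[S(q, X)] >= E_q[S(p, X)] against random misreports p."""
    rng = random.Random(seed)
    truthful = expected_score(score, q, q)
    return all(truthful >= expected_score(score, random_distribution(len(q), rng), q)
               for _ in range(trials))


def is_local_at(score, p, x, tol=1e-12):
    """Keep p(x) fixed, move all remaining mass onto one other outcome,
    and report whether S(p, x) is unchanged."""
    other = next(i for i in range(len(p)) if i != x)
    moved = [0.0] * len(p)
    moved[x] = p[x]
    moved[other] = 1.0 - p[x]
    return abs(score(p, x) - score(moved, x)) <= tol


q = [0.5, 0.3, 0.2]
p = [0.2, 0.5, 0.3]
for rule in (log_score, quadratic_score, spherical_score):
    print(rule.__name__, looks_proper(rule, q), is_local_at(rule, p, x=1))
# log_score True True
# quadratic_score True False
# spherical_score True False
```

All three rules pass the propriety check. Only the log score keeps the same value when probability mass is moved among outcomes that did not occur.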
Bernardo's characterization implies that, for outcome spaces with more than two elements, every strictly proper, local scoring rule has the form:
$S(p, x) = a \log p(x) + b(x), \qquad a > 0.$
The uniqueness, up to an affine transformation, follows from requiring additivity under partition refinement (Shannon's amalgamation invariance). Only the log-score satisfies this requirement; no other strictly proper local form is consistent with coherent predictive updating under state-space refinement (Polson et al., 25 Dec 2025).
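The additivity requirement can be illustrated directly. For the log score, scoring an outcome in one step gives the same result as first scoring the partition cell that contains it and then scoring the outcome within that cell. This is the chain rule $\log p(x) = \log P(A) + \log p(x \mid A)$. The sketch below checks the identity for an arbitrary six-outcome report and partition, both chosen only for illustration, and shows that the quadratic (Brier) score fails the same test.

```python
import math


def log_score(p, x):
    return math.log(p[x])


def quadratic_score(p, x):
    # Brier score written as a reward, for contrast.
    return 2 * p[x] - sum(pj * pj for pj in p)


def split_report(p, cells, x):
    """Decompose a fine-grained report p into a coarse report over the cells of a
    partition and the conditional report within the cell containing outcome x."""
    k = next(i for i, cell in enumerate(cells) if x in cell)
    coarse = [sum(p[i] for i in cell) for cell in cells]
    within = [p[i] / coarse[k] for i in cells[k]]
    return coarse, k, within, cells[k].index(x)


def additive_under_refinement(score, p, cells):
    """True if scoring x directly equals scoring its cell, then x within the cell."""
    for x in range(len(p)):
        coarse, k, within, j = split_report(p, cells, x)
        if not math.isclose(score(p, x), score(coarse, k) + score(within, j)):
            return False
    return True


# Illustrative six-outcome report and a three-cell partition of the outcomes.
p = [0.10, 0.15, 0.05, 0.30, 0.25, 0.15]
cells = [[0, 1, 2], [3, 4], [5]]
print(additive_under_refinement(log_score, p, cells))        # True
print(additive_under_refinement(quadratic_score, p, cells))  # False
```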
2. The Simplified Predictive Local Rule (SPLR): Canonical Form and Properties
The canonical instantiation is:
$S(p, x) = \log p(x).$
This is the SPLR. Its strict propriety ensures it incentivizes truthful reporting of the full predictive distribution. Locality ensures that the agent's predictive utility depends only on how much probability was assigned to the actual outcome.
SPLR is, therefore, not simply a modeling assumption but emerges uniquely given the desiderata of locality, strict propriety, and invariance to event amalgamation. This provides a theoretical justification for its central role in predictive decision frameworks and Bayesian models (Polson et al., 25 Dec 2025).
3. SPLR in Predictive Decision-Making, Information Theory, and Rational Inattention
In the Bayesian predictive setting, SPLR gives rise to the mutual information as the expected gain from refining predictions with data:
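$I(X; Y) = \mathbb{E}_{(X, Y)}\big[\log p(X \mid Y) - \log p(X)\big],$
that is, the expected improvement in log score when the outcome $X$ is predicted from $p(x \mid Y)$, which conditions on the data $Y$, rather than from $p(x)$.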
This operationalizes information cost as expected predictive utility, avoiding the need to postulate exogenous cost functions; information frictions become endogenous to the predictive decision setup (Polson et al., 25 Dec 2025).
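The identity between expected log-score gain and mutual information can be verified on a small discrete example. In this sketch the joint table is hypothetical, $X$ is the outcome being predicted, and $Y$ is the observed data. The two functions compute the same quantity by two routes. The first is the average improvement in log score from using $p(x \mid y)$ instead of $p(x)$. The second is the Kullback-Leibler divergence between the joint distribution and the product of its marginals.

```python
import math

# Illustrative joint distribution p(x, y): rows index the outcome X, columns the data Y.
joint = [[0.20, 0.10, 0.05],
         [0.05, 0.25, 0.35]]

px = [sum(row) for row in joint]                                   # p(x), before seeing data
py = [sum(row[y] for row in joint) for y in range(len(joint[0]))]  # marginal of the data


def expected_log_score_gain(joint, px, py):
    """E[log p(X | Y) - log p(X)]: average log-score improvement from conditioning on Y."""
    return sum(pxy * (math.log(pxy / py[y]) - math.log(px[x]))
               for x, row in enumerate(joint) for y, pxy in enumerate(row) if pxy > 0)


def mutual_information(joint, px, py):
    """I(X; Y) = KL(p(x, y) || p(x) p(y)), in nats."""
    return sum(pxy * math.log(pxy / (px[x] * py[y]))
               for x, row in enumerate(joint) for y, pxy in enumerate(row) if pxy > 0)


gain = expected_log_score_gain(joint, px, py)
# Prints the gain in nats, then True: both routes give the same number.
print(gain, math.isclose(gain, mutual_information(joint, px, py)))
```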
Optimal policies under SPLR are Gibbs-Boltzmann channels. Classical rational inattention models arise as special cases, with canonical implementations including multinomial logit under entropic regularization, James-Stein shrinkage in Gaussian learning, and linear-quadratic-Gaussian control (Polson et al., 25 Dec 2025).
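When the information cost is entropic (proportional to mutual information), the optimal channel takes the Gibbs-Boltzmann form $P(a \mid s) \propto P(a)\exp\big(u(s, a)/\lambda\big)$. This is a multinomial logit reweighted by the unconditional action probabilities. The sketch below computes it with a Blahut-Arimoto-style fixed-point iteration. The function name, the cost weight `lam`, and the toy payoffs are illustrative assumptions and not the formulation used by Polson et al.

```python
import math


def rational_inattention_channel(prior, utility, lam, iters=500):
    """Fixed-point (Blahut-Arimoto-style) solution of
        max over P(a | s) of  E[u(s, a)] - lam * I(S; A).
    prior[s] is the state probability and utility[s][a] the payoff of action a in state s.
    Returns (marginal, channel) with channel[s][a] = P(a | s)."""
    n_states, n_actions = len(prior), len(utility[0])
    marginal = [1.0 / n_actions] * n_actions  # start from a uniform action marginal
    channel = []
    for _ in range(iters):
        channel = []
        for s in range(n_states):
            # Gibbs-Boltzmann / logit weights, reweighted by the current action marginal.
            w = [marginal[a] * math.exp(utility[s][a] / lam) for a in range(n_actions)]
            z = sum(w)
            channel.append([wa / z for wa in w])
        # Unconditional action probabilities implied by the current channel.
        marginal = [sum(prior[s] * channel[s][a] for s in range(n_states))
                    for a in range(n_actions)]
    return marginal, channel


prior = [0.5, 0.5]
utility = [[1.0, 0.0],   # state 0: action 0 pays 1
           [0.0, 1.0]]   # state 1: action 1 pays 1
marginal, channel = rational_inattention_channel(prior, utility, lam=0.5)
print([round(v, 3) for v in channel[0]])  # [0.881, 0.119]
```

Lowering `lam` (cheaper information) sharpens each row toward the best action, and raising it pulls every row toward the unconditional marginal. For very small `lam` the exponentials can overflow, so a more robust version would subtract each row's maximum utility before exponentiating.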
4. SPLR as a Plasticity Rule in Online Learning and Neural Architectures
SPLR has been adapted as an efficient, local predictive plasticity rule for training the output weights of extreme learning machines (ELM). The protocol is as follows (Zang et al., 25 Dec 2025):
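- Compute the binary hidden activation vector $\mathbf{h} \in \{0,1\}^N$ by applying a Heaviside step to fixed random projections of the input.
- Compute the output scores $\mathbf{s} = W^\top \mathbf{h}$, where the $N \times C$ output weight matrix $W$ has one column $\mathbf{w}_k$ per class $k$.
- Predict the class $\hat{y} = \arg\max_k s_k$.
- If the prediction is incorrect ($\hat{y} \neq y$ for the true label $y$), update only two columns of $W$:
$\mathbf{w}_{y} \leftarrow \mathbf{w}_{y} + \mathbf{h}, \qquad \mathbf{w}_{\hat{y}} \leftarrow \mathbf{w}_{\hat{y}} - \mathbf{h},$
with weight clipping applied per element.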
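This procedure eliminates global backpropagation and requires no floating-point multipliers or accumulators. Because the hidden activations are binary, computing the scores reduces to summing the weight rows of active hidden units, and each update adds or subtracts a binary vector only when a sample is misclassified.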
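The protocol above can be sketched in plain Python. This is a minimal, hypothetical rendering and not the hardware design of Zang et al. The class name `SPLRELM`, the ±1 random projections, the unit integer increments, the symmetric clip bound `w_max`, and tie-breaking toward the lowest class index are all assumptions made for illustration.

```python
import random


class SPLRELM:
    """Hypothetical ELM whose output weights are trained with the two-column SPLR update.

    Illustrative assumptions (not from the cited work): +/-1 random projections,
    integer output weights with unit increments, and a symmetric clip bound w_max.
    """

    def __init__(self, n_inputs, n_hidden, n_classes, w_max=127, seed=0):
        rng = random.Random(seed)
        # Fixed random input-to-hidden projections; these are never trained.
        self.proj = [[rng.choice((-1, 1)) for _ in range(n_inputs)]
                     for _ in range(n_hidden)]
        # Integer output weights W[i][k]: row i = hidden unit, column k = class.
        self.W = [[0] * n_classes for _ in range(n_hidden)]
        self.n_classes = n_classes
        self.w_max = w_max

    def hidden(self, x):
        # Heaviside step over the random projections gives the binary vector h.
        return [1 if sum(r * xi for r, xi in zip(row, x)) > 0 else 0
                for row in self.proj]

    def scores(self, h):
        # s_k = sum_i h_i * W[i][k]; with binary h this only adds up rows where h_i = 1.
        s = [0] * self.n_classes
        for i, hi in enumerate(h):
            if hi:
                for k in range(self.n_classes):
                    s[k] += self.W[i][k]
        return s

    def predict(self, x):
        s = self.scores(self.hidden(x))
        return max(range(self.n_classes), key=s.__getitem__)  # ties -> lowest index

    def train_step(self, x, y):
        """Apply the SPLR update for one sample; return True if it was misclassified."""
        h = self.hidden(x)
        s = self.scores(h)
        y_hat = max(range(self.n_classes), key=s.__getitem__)
        if y_hat == y:
            return False
        for i, hi in enumerate(h):
            if hi:
                # +h on the true-class column, -h on the predicted column, clipped per element.
                self.W[i][y] = min(self.W[i][y] + 1, self.w_max)
                self.W[i][y_hat] = max(self.W[i][y_hat] - 1, -self.w_max)
        return True


# Toy check: hand-set projections so each one-hot input activates exactly one hidden unit.
model = SPLRELM(n_inputs=3, n_hidden=3, n_classes=3, w_max=2)
model.proj = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
data = [([1, 0, 0], 0), ([0, 1, 0], 1), ([0, 0, 1], 2)]
errors_per_epoch = [sum(model.train_step(x, y) for x, y in data) for _ in range(3)]
print(errors_per_epoch)  # [2, 0, 0]
```

Prediction and training share the same forward pass. Only a misclassification touches `W`, and then only the rows of active hidden units in two columns.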
5. Computational Complexity and Empirical Performance
For $N$ hidden units, the SPLR-based ELM update replaces the $\mathcal{O}(N^3)$ matrix inversion required by standard batch ELM with two $\mathcal{O}(N)$ vector operations per misclassified sample; sparse hidden activations allow further savings. Empirical evaluations on MNIST and Fashion-MNIST report training accuracy within 3.6% and test accuracy within 2.0% of batch ELM, with the error rate plateauing quickly and no need for an offline or batch training phase (Zang et al., 25 Dec 2025).
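The cost gap can be written out under the standard (unregularized) batch ELM formulation, in which output weights come from a pseudo-inverse. The symbols $M$ (number of training samples), $H \in \{0,1\}^{M \times N}$ (hidden activations), $T \in \{0,1\}^{M \times C}$ (one-hot targets), and $C$ (number of classes) are introduced here for illustration.

```latex
\begin{aligned}
W_{\text{batch}} &= H^{\dagger} T = (H^{\top} H)^{-1} H^{\top} T
  && \text{(assuming $H$ has full column rank)} \\
\text{cost}_{\text{batch}} &= \mathcal{O}(M N^{2} + N^{3} + M N C)
  && \text{(form $H^{\top} H$, invert it, multiply by $H^{\top} T$)} \\
\text{cost}_{\text{SPLR update}} &= \mathcal{O}(N) \text{ per misclassified sample}
  && \text{(two columns of length $N$)}
\end{aligned}
```

The online learner still evaluates the $C$ scores for every sample, at $\mathcal{O}(NC)$ additions when the activations are binary. However, it never forms or inverts an $N \times N$ matrix.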
6. FPGA and Hardware Implementation Characteristics
SPLR's binary, local nature enables high-throughput, energy-efficient FPGA implementation:
- On a Xilinx ZCU104 board (Zynq UltraScale+ MPSoC), training throughput reaches 63,454 frames per second (fps) and inference throughput reaches 122,336 fps, at a power cost of … W (… fps/W).
- Random input-to-hidden weights are generated on-the-fly through per-neuron LFSRs, eliminating dedicated memory for this matrix.
- The method outperforms prior event-based and spiking implementations by three to four orders of magnitude in speed while maintaining compact memory and logic usage (Zang et al., 25 Dec 2025).
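Random weights can be regenerated on the fly with a per-neuron LFSR, as in the following sketch. It assumes each hidden neuron owns a 16-bit Galois LFSR whose output bits map to ±1 input weights. The tap mask 0xB400 is a standard maximal-length choice. The per-neuron seeding rule (`neuron_seed`), the bit-to-weight mapping, and all function names are hypothetical and not taken from Zang et al.

```python
class GaloisLFSR16:
    """16-bit Galois LFSR with tap mask 0xB400 (x^16 + x^14 + x^13 + x^11 + 1),
    a maximal-length choice: any nonzero seed cycles through all 65535 nonzero states."""

    TAPS = 0xB400

    def __init__(self, seed):
        if (seed & 0xFFFF) == 0:
            raise ValueError("LFSR seed must be nonzero")
        self.state = seed & 0xFFFF

    def next_bit(self):
        # Shift right; if the bit shifted out is 1, XOR the tap mask into the state.
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= self.TAPS
        return lsb


def neuron_seed(neuron_index):
    # Hypothetical per-neuron seeding: multiply by an odd constant modulo 2^16, avoid zero.
    return ((neuron_index + 1) * 0x9E37) & 0xFFFF or 1


def input_weights(neuron_index, n_inputs):
    """Regenerate neuron i's +/-1 input-to-hidden weights on demand instead of storing them."""
    lfsr = GaloisLFSR16(neuron_seed(neuron_index))
    return [1 if lfsr.next_bit() else -1 for _ in range(n_inputs)]


def hidden_activation(x, n_hidden):
    """Binary hidden layer: Heaviside step over on-the-fly random projections."""
    return [1 if sum(w * xi for w, xi in zip(input_weights(i, len(x)), x)) > 0 else 0
            for i in range(n_hidden)]


# The same neuron always regenerates the same weights, so no weight memory is needed.
print(input_weights(7, 784) == input_weights(7, 784))  # True
```

Because each row is recomputed from a 16-bit seed, the storage per neuron drops from one entry per input weight to a single 16-bit register. The trade-off is that the row must be regenerated at every use.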
7. Comparative Significance and Theoretical Uniqueness
The SPLR is both strictly proper and local. Non-local rules such as the Brier (quadratic) and spherical scores are strictly proper but not local, and they lack Shannon-style partition additivity. This uniqueness underpins both robust predictive calibration and the analytic tractability of rational inattention, mutual information, and related rate-distortion results.
Integration of SPLR into both formal Bayesian theory and online learning hardware demonstrates its broad scope and foundational character in probabilistic modeling, efficient learning, and resource-constrained decision-making (Polson et al., 25 Dec 2025, Zang et al., 25 Dec 2025).