Aspiration-Based Perturbed Learning Automata
- APLA is an aspiration-based reinforcement learning rule that enables decentralized optimization in multi-player games with noisy payoffs.
- It employs a two-time-scale adaptation where strategies update rapidly and aspirations adjust slowly to effectively filter out noise.
- APLA guarantees stochastic selection of efficient pure Nash equilibria in weakly acyclic games, a guarantee that standard perturbed learning automata do not provide.
Aspiration-Based Perturbed Learning Automata (APLA) are a class of fully decentralized, payoff-based learning rules for distributed optimization in multi-player strategic-form games with noisy utility measurements. APLA combines reinforcement based on repeated action selection with an explicit aspiration factor that modulates learning according to an agent's level of satisfaction, that is, whether observed payoffs exceed or fall short of dynamically evolving aspiration levels. This dynamic leads to robust stochastic selection of efficient pure Nash equilibria, including payoff-dominant equilibria, in broad classes of games such as weakly acyclic and coordination games, and provides convergence guarantees that standard perturbed learning automata (PLA) cannot attain in similar settings. APLA exhibits rigorous stability and robustness properties, particularly under bounded noise, and requires no inter-agent communication or explicit knowledge of the game structure beyond local payoff observations (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025, Chasparis, 2018).
1. Formal Problem Setup and Game Model
Consider a finite-player, finite-action strategic-form game with player set $\mathcal{I} = \{1, \dots, n\}$. Each player $i$ selects actions from a finite set $\mathcal{A}_i$, and joint actions are denoted $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$. Nominal utilities for each player are given by $u_i : \mathcal{A} \to \mathbb{R}$, satisfying the positive-utility property ($u_i(\alpha) > 0$ for all $\alpha \in \mathcal{A}$). Observed utilities are subject to bounded noise: $\tilde{u}_i(t) = u_i(\alpha(t)) + \xi_i(t)$ with $|\xi_i(t)| \le \zeta$. A key focus is on weakly acyclic games, where from any joint action there exists a finite sequence of strict better replies leading to a pure Nash equilibrium.
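The setup can be made concrete with a small sketch. The following Python snippet is illustrative only; the names `noisy_payoff` and `is_pure_nash`, the payoff values, and the noise bound are assumptions, not taken from the cited papers. It encodes a two-player game with positive nominal utilities, bounded observation noise, and the pure-Nash condition that underlies the definition of weak acyclicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal utilities u_i(alpha) > 0 for a 2x2 coordination game
# (rows index player 0's action, columns index player 1's action).
U = [np.array([[4.0, 1.0],
               [3.0, 2.0]]),   # u_0
     np.array([[4.0, 3.0],
               [1.0, 2.0]])]   # u_1

ZETA = 0.2  # bound on the additive measurement noise, |xi_i(t)| <= ZETA

def noisy_payoff(i, joint_action):
    """Observed utility: nominal utility plus bounded noise."""
    xi = rng.uniform(-ZETA, ZETA)
    return U[i][joint_action] + xi

def is_pure_nash(joint_action):
    """Noise-free check of the pure Nash equilibrium condition."""
    a = list(joint_action)
    for i in range(2):
        current = U[i][tuple(a)]
        for b in range(U[i].shape[i]):
            deviation = a.copy()
            deviation[i] = b
            if U[i][tuple(deviation)] > current:
                return False
    return True

print(is_pure_nash((0, 0)), is_pure_nash((1, 0)))  # (A,A) is an equilibrium here, (B,A) is not
```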
2. APLA State Variables and Update Mechanisms
Each agent $i$ maintains
- a mixed strategy $x_i(t) \in \Delta(\mathcal{A}_i)$,
- an aspiration level $\rho_i(t) \in \mathbb{R}$,
- a set of update parameters: strategy step-size $\epsilon > 0$, a slower aspiration step-size $\epsilon_\rho > 0$ with $\epsilon_\rho \ll \epsilon$, satisfaction floor $h > 0$, aspiration scaling $c > 0$, bounded noise parameter $\zeta \ge 0$, and a "tremble" (mutation) rate $\lambda > 0$.
At each discrete time $t = 0, 1, 2, \dots$, the following steps are executed (a minimal code sketch is given after this list):
- Action selection (with tremble): $\alpha_i(t)$ is drawn from the perturbed strategy $(1-\lambda)\, x_i(t) + \frac{\lambda}{|\mathcal{A}_i|}\mathbf{1}$.
- Observation: Each agent observes payoff $\tilde{u}_i(t) = u_i(\alpha(t)) + \xi_i(t)$.
- Aspiration factor: $s_i(t) = \max\{h,\ \tilde{u}_i(t) - \rho_i(t)\}$.
- Strategy update: $x_i(t+1) = x_i(t) + \epsilon\, s_i(t)\, [\, e_{\alpha_i(t)} - x_i(t) \,]$, where $e_{\alpha_i(t)}$ denotes the unit vector on the selected action.
- Aspiration update: $\rho_i(t+1) = \rho_i(t) + \epsilon_\rho\, [\, c\, \tilde{u}_i(t) - \rho_i(t) \,]$.
Projections and clamping ensure that $x_i(t)$ remains in the simplex $\Delta(\mathcal{A}_i)$ and $\rho_i(t)$ remains in a bounded interval throughout (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025, Chasparis, 2018).
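The sketch below implements one APLA iteration for a single agent in the form written above; the function name `apla_step` and the default parameter values are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def apla_step(x, rho, observe_payoff,
              eps=0.01, eps_rho=0.001, h=0.05, c=1.0, lam=0.01):
    """One APLA update for a single agent.

    x               : mixed strategy (1-D probability vector over the agent's actions)
    rho             : current aspiration level
    observe_payoff  : callable returning the (noisy) payoff of the selected action
    """
    n = len(x)
    # Action selection with tremble: mix the strategy with the uniform distribution.
    probs = (1.0 - lam) * x + lam / n
    a = rng.choice(n, p=probs / probs.sum())

    u = observe_payoff(a)                 # noisy payoff observation
    s = max(h, u - rho)                   # aspiration factor, floored at h

    e_a = np.zeros(n)
    e_a[a] = 1.0
    x_new = x + eps * s * (e_a - x)       # satisfaction-weighted reinforcement
    x_new = np.clip(x_new, 0.0, 1.0)      # clamping ...
    x_new /= x_new.sum()                  # ... and projection back onto the simplex

    rho_new = rho + eps_rho * (c * u - rho)  # aspiration tracks (scaled) payoffs on the slow time scale
    return x_new, rho_new, a
```

In a multi-agent run, each agent calls `apla_step` with its own state and payoff observation; no information about other agents' strategies or the game structure is exchanged.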
3. Markov Chain Structure and Stochastic Stability
The state of the system is described by $z(t) = \{(x_i(t), \rho_i(t))\}_{i}$. The APLA dynamics induce a Markov chain whose ergodicity is assured by the presence of trembles ($\lambda > 0$). For small step-size $\epsilon$, the invariant probability measure concentrates on pure-strategy states, defined by $x_i = e_{\alpha_i}$ and $\rho_i = u_i(\alpha)$ for all $i$, for some pure joint action $\alpha$.
A pure-strategy state $z$ is termed stochastically stable if $\lim_{\lambda \to 0} \mu_\lambda(z) > 0$, where $\mu_\lambda$ denotes the invariant distribution of the perturbed process. The transition structure for small $\epsilon$ and $\lambda$ can be reduced to a finite Markov chain over pure-strategy states, whose unique invariant distribution characterizes the frequencies with which equilibria are visited in the limit of vanishing perturbation (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025).
4. Stochastic Stability Analysis in Weakly Acyclic Games
In weakly acyclic games, under the positive-utility property and for sufficiently small step-sizes $\epsilon, \epsilon_\rho$ and tremble rate $\lambda$, all stochastically stable states are pure Nash equilibria. If, in addition, every non-equilibrium profile admits a (possibly multi-agent) better-reply path to a payoff-dominant equilibrium, then the payoff-dominant Nash equilibria are the unique stochastically stable states.
This selection mechanism arises from the "resistance" structure of transitions: for a one-step transition $z \to z'$ between pure-strategy states, the resistance $r(z, z')$ is the exponent such that
$0 < \lim_{\lambda \to 0} \lambda^{-r(z,z')}\, P_\lambda(z, z') < \infty,$
where $P_\lambda$ denotes the transition probability of the reduced chain. The minimum total resistance of a spanning arborescence rooted at $z$ (its stochastic potential) determines the stochastically stable states. Transitions from payoff-dominant equilibria typically require an agent to be unsatisfied (payoff below aspiration), and thus incur extra resistance proportional to $1/h$; in the limit $h \to 0$, these transitions become highly unlikely, favoring the selection of payoff-dominant states (Chasparis, 23 Nov 2025).
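The tree characterization can be computed directly on small examples. The sketch below is purely illustrative: the resistance values are made up, and the brute-force enumeration is only feasible for a handful of states. It computes the stochastic potential of each pure-strategy state as the minimum total resistance over spanning arborescences rooted at that state.

```python
from itertools import product
import numpy as np

# Illustrative resistance matrix: R[z, zp] is the resistance of the transition z -> zp.
R = np.array([
    [0.0, 3.0, 3.0],
    [1.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
])

def stochastic_potential(R, root):
    """Minimum total resistance over spanning arborescences directed into `root`."""
    n = R.shape[0]
    others = [z for z in range(n) if z != root]
    best = np.inf
    # Each non-root state picks exactly one outgoing edge (its "parent").
    for choice in product(range(n), repeat=len(others)):
        parent = dict(zip(others, choice))
        if any(p == z for z, p in parent.items()):
            continue  # no self-loops
        cost = sum(R[z, p] for z, p in parent.items())
        # Valid arborescence: following parent pointers from every state must reach the root.
        valid = True
        for z in others:
            seen, cur = set(), z
            while cur != root:
                if cur in seen:
                    valid = False
                    break
                seen.add(cur)
                cur = parent[cur]
            if not valid:
                break
        if valid:
            best = min(best, cost)
    return best

potentials = [stochastic_potential(R, z) for z in range(R.shape[0])]
stable = [z for z, g in enumerate(potentials) if g == min(potentials)]
print(potentials, stable)  # here state 0 has the smallest potential and is the unique stable state
```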
5. Algorithmic Properties, Parameter Choices, and Noise Robustness
Key algorithmic features of APLA include:
- Satisficing via aspirations: Reinforcement magnitude is modulated by how much realized payoff exceeds aspiration, allowing agents to “down-weight” reinforcement when dissatisfied.
- Two-time-scale adaptation: Aspirations evolve strictly slower than strategies ($\epsilon_\rho \ll \epsilon$), providing a dynamic filter that attenuates the effect of bounded payoff noise.
- Ergodicity and robustness: The tremble $\lambda > 0$ ensures the induced process is irreducible and can escape non-equilibrium traps even in the presence of bounded noise.
- No coordination requirement: Each agent learns independently, requiring only local payoff observations.
Practical parameter selection typically uses $0 < \epsilon \ll 1$, $\epsilon_\rho \ll \epsilon$, and small $\lambda$ and $h$ to ensure slow stable learning, robust aspiration tracking, rare exploration, and strong selection of payoff-dominant equilibria (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025, Chasparis, 2018).
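For concreteness, a hypothetical parameter set consistent with these guidelines might look as follows; the numerical values are illustrative, not recommendations from the cited papers.

```python
# Illustrative APLA parameters: eps_rho is an order of magnitude slower than eps,
# and the tremble rate and satisfaction floor are kept small.
APLA_PARAMS = dict(
    eps=0.02,       # strategy step-size: small, for slow and stable learning
    eps_rho=0.002,  # aspiration step-size: much slower, for robust aspiration tracking
    lam=0.005,      # tremble rate: rare exploration, keeps the chain ergodic
    h=0.05,         # satisfaction floor: small, strengthens selection of payoff-dominant equilibria
)
```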
6. Comparison with Standard Perturbed Learning Automata and Illustrative Example
Conventional PLA lacks aspiration-based filtering, resulting in potential stochastic stability of risk-dominant or inefficient equilibria in coordination games. In the two-player Stag-Hunt game
$\begin{array}{c|cc} & A & B \\ \hline A & (a,a) & (b,c) \\ B & (c,b) & (d,d) \end{array} \qquad a > c > 0,\ d > b > 0,\ a > d,$
PLA with $\lambda$-trembles typically selects the risk-dominant equilibrium $(B,B)$ when $d - b > a - c$. In contrast, APLA with slowly adapting aspirations introduces additional resistance to transitions out of the payoff-dominant equilibrium $(A,A)$, making it uniquely stochastically stable over a wide parameter range.
Simulations confirm that under moderate noise, PLA remains trapped around less efficient equilibria, whereas APLA, with slow aspirations and a small satisfaction floor $h$, ensures almost sure convergence of empirical action frequencies to the payoff-dominant profile $(A,A)$ as time increases. This behavior is robust to noise and requires no coordination among agents (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025, Chasparis, 2018).
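A self-contained simulation sketch of this comparison is given below, assuming the update rule of Section 2; the payoff values (chosen so that $(B,B)$ is risk-dominant), noise bound, and parameter settings are illustrative choices, not the exact settings of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stag-Hunt payoffs with a > c > 0, d > b > 0, a > d and (B,B) risk-dominant (d - b > a - c).
a, b, c, d = 4.0, 1.0, 3.5, 3.0
U = np.array([[[a, a], [b, c]],    # U[a0, a1] = (payoff of player 0, payoff of player 1)
              [[c, b], [d, d]]])
ZETA = 0.3                         # bound on the additive payoff noise

def run(T=50000, use_aspiration=True, eps=0.02, eps_rho=0.002, h=0.05, lam=0.01):
    """Return the empirical frequency of the payoff-dominant profile (A,A)."""
    x = [np.full(2, 0.5), np.full(2, 0.5)]   # mixed strategies
    rho = [0.0, 0.0]                         # aspiration levels
    visits_AA = 0
    for _ in range(T):
        acts = []
        for i in range(2):
            probs = (1 - lam) * x[i] + lam / 2
            acts.append(int(rng.choice(2, p=probs / probs.sum())))
        if acts == [0, 0]:
            visits_AA += 1
        for i in range(2):
            u = U[acts[0], acts[1], i] + rng.uniform(-ZETA, ZETA)
            s = max(h, u - rho[i]) if use_aspiration else u   # PLA: raw payoff as reinforcement
            e = np.zeros(2)
            e[acts[i]] = 1.0
            x[i] = np.clip(x[i] + eps * s * (e - x[i]), 0.0, 1.0)
            x[i] /= x[i].sum()
            rho[i] += eps_rho * (u - rho[i])
    return visits_AA / T

print("APLA empirical frequency of (A,A):", run(use_aspiration=True))
print("PLA  empirical frequency of (A,A):", run(use_aspiration=False))
```

With aspirations switched off (`use_aspiration=False`), the same loop reduces to standard PLA, so the comparison isolates the effect of the aspiration factor.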
7. Theoretical and Practical Significance
APLA provides the first reinforcement-based learning rule that guarantees stochastic convergence to pure Nash equilibria, including efficient, payoff-dominant ones, in all weakly acyclic games, substantially extending guarantees that prior learning algorithms attain only in potential or coordination games. The framework accommodates boundedly noisy utility measurements and strictly decentralized setups, obviating the need for coordination, global information, or knowledge of the game structure.
The algorithm's two-time-scale nature provides strong resilience against trapping in suboptimal mixed strategies and effective filtering of reward noise, while aspiration-driven reinforcement implements "satisficing" in a manner consistent with observed learning in both engineering and behavioral contexts.
A plausible implication is that APLA can serve as a robust decentralized protocol for distributed optimization in multi-agent systems operating under partial observability or measurement noise, with rigorous equilibrium selection guarantees (Chasparis, 23 Nov 2025, Chasparis, 31 Oct 2025, Chasparis, 2018).