FM-EAC: Feature Model-Based Enhanced Actor-Critic
- Feature Model-Based Enhanced Actor-Critic is an RL framework that integrates compact feature extraction with actor–critic methods to improve sample efficiency, generalization, and transferability.
- It tightly couples planning, acting, and learning by using modular components such as CNNs, GNNs, and PANs to transform high-dimensional states into informative, low-dimensional features.
- Empirical results show FM-EAC outperforms traditional methods on single- and multi-task benchmarks by leveraging feature-informed critics and Dyna-style planning for rapid policy adaptation.
The Feature Model-Based Enhanced Actor-Critic (FM-EAC) methodology designates a class of reinforcement learning (RL) architectures that integrate low-dimensional feature modeling with an augmented actor–critic framework to address the challenges of sample efficiency, generalization, and transferability in high-dimensional, dynamic environments. FM-EAC tightly couples planning (model-based), acting (model-free), and learning via shared feature representations, enabling performance competitive with or surpassing existing state-of-the-art RL methods on both single- and multi-task benchmarks (Boney et al., 2021, Zhou et al., 17 Dec 2025).
1. Model Architecture and Key Components
FM-EAC is built upon the integration of the following core modules:
- Feature Model (FM): A parameterized feature extractor, typically denoted $f_\psi$, that maps a high-dimensional raw environmental state $s$ to a compact, task-relevant vector $z = f_\psi(s)$ with $\dim(z) \ll \dim(s)$. FM can be instantiated using convolutional neural networks (CNNs), graph neural networks (GNNs), pre-trained point array networks (PANs), or arbitrary differentiable architectures.
- Enhanced Actor-Critic (EAC): Consists of one or more actors and multiple critics that use both raw states and extracted features. The critic’s inputs may be the triple $(s, z, a)$, enabling rich, feature-informed value estimation.
- Planning Module: Utilizes the FM for short-horizon imagination rollouts or model-based policy refinement, augmenting the off-policy RL loop with synthesized experience.
The overall loop alternates between real-world execution (sampling actions via actors, collecting environment transitions and features) and batch training (updating FM, actors, and critics from pooled real and synthetic data).
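This coupling can be made concrete with a minimal, self-contained PyTorch sketch; the dimensions and module definitions below are illustrative assumptions rather than the papers' reference implementation. The key point it demonstrates is that gradients from the critic objective reach the feature model, the actor, and the critic in a single backward pass:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the papers).
S, Z, A = 16, 8, 4  # raw-state, feature, and action dimensionalities

feature_model = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, Z))
actor = nn.Sequential(nn.Linear(S + Z, 64), nn.ReLU(), nn.Linear(64, A), nn.Tanh())
critic = nn.Sequential(nn.Linear(S + Z + A, 64), nn.ReLU(), nn.Linear(64, 1))

def act(state):
    # Acting: the actor consumes the raw state and the FM features.
    z = feature_model(state)
    return actor(torch.cat([state, z], dim=-1))

def q_value(state, action):
    # Learning: the critic is feature-augmented with z = f(s).
    z = feature_model(state)
    return critic(torch.cat([state, z, action], dim=-1))

state = torch.randn(32, S)   # dummy batch standing in for pooled replay data
q = q_value(state, act(state))
q.mean().backward()          # one backward pass reaches FM, actor, and critic
```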
2. Feature Extraction Strategies
Feature extraction is central to both single-task vision-based RL (Boney et al., 2021) and multi-task, dynamic settings (Zhou et al., 17 Dec 2025). Two prominent instantiations:
a) Differentiable Feature-Point Bottleneck (Boney et al., 2021)
- A 4-layer convolutional encoder maps stacked input frames $o_t$ to $K$ spatial feature maps $F_1, \ldots, F_K \in \mathbb{R}^{h \times w}$.
- Each map is interpreted as an unnormalized log-probability over 2D coordinates, with $P_k = \operatorname{softmax}(F_k)$, from which:
  - Soft-argmax yields the spatial location $(\hat{x}_k, \hat{y}_k) = \sum_{u,v} P_k(u,v)\,(u,v)$ as the expectation over the induced probability.
  - Presence scalar: $\rho_k \in [0,1]$ (e.g., a squashed maximum of the map's activations), encoding object presence or feature confidence.
- The overall feature vector is $z_t = (\hat{x}_1, \hat{y}_1, \rho_1, \ldots, \hat{x}_K, \hat{y}_K, \rho_K)$.
- Temporal differences $z_t - z_{t-1}$ are concatenated to the feature input, ensuring velocity information is available (see the runnable sketch after this list).
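A minimal runnable sketch of the soft-argmax feature-point extraction described above; the presence formula here (a squashed maximum over each map) is one plausible choice, not necessarily the exact one used by Boney et al.:

```python
import torch
import torch.nn.functional as F

def feature_points(maps):
    """Soft-argmax feature points from conv maps of shape (B, K, H, W)."""
    b, k, h, w = maps.shape
    # Treat each map as unnormalized log-probabilities over pixel coordinates.
    probs = F.softmax(maps.flatten(2), dim=-1).view(b, k, h, w)
    ys = torch.linspace(-1.0, 1.0, h)        # normalized coordinate grids
    xs = torch.linspace(-1.0, 1.0, w)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expected y-coordinate per map
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expected x-coordinate per map
    presence = torch.sigmoid(maps.amax(dim=(2, 3)))  # assumed presence score
    return torch.stack([x, y, presence], dim=-1)     # (B, K, 3)

# K = 32 maps over a 21x21 grid -> a 96-dimensional feature vector per frame.
z_t = feature_points(torch.randn(2, 32, 21, 21)).flatten(1)
```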
b) General Feature Modeling (Zhou et al., 17 Dec 2025)
- The feature model $f_\psi$ can be a GNN (graph convolution over agent or environment relationships), a PAN (capturing geometric structure), or another user-specified network.
- The FM may optionally model environment transitions and rewards in feature space: $\hat{z}_{t+1} = g_\omega(z_t, a_t)$ and $\hat{r}_t = r_\omega(z_t, a_t)$.
- These models enable the agent to synthesize transitions via imagination rollouts, for planning or for augmenting the replay buffer (a rollout sketch follows this list).
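The following sketch shows how such feature-space models support Dyna-style imagination; the network shapes, names (dynamics, reward_fn), and horizon are hypothetical stand-ins:

```python
import torch
import torch.nn as nn

Z, A = 8, 4  # feature and action dimensions (assumptions)

# Hypothetical learned models operating purely in feature space.
dynamics = nn.Sequential(nn.Linear(Z + A, 64), nn.ReLU(), nn.Linear(64, Z))
reward_fn = nn.Sequential(nn.Linear(Z + A, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(Z, 64), nn.ReLU(), nn.Linear(64, A), nn.Tanh())

@torch.no_grad()
def imagine(z0, horizon=5):
    """Roll the policy through the learned model, yielding (z, a, r, z') tuples."""
    z, rollout = z0, []
    for _ in range(horizon):
        a = policy(z)
        za = torch.cat([z, a], dim=-1)
        z_next, r = dynamics(za), reward_fn(za)
        rollout.append((z, a, r, z_next))
        z = z_next
    return rollout  # synthetic transitions, e.g., appended to the replay buffer

synthetic = imagine(torch.randn(32, Z))
```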
3. Enhanced Actor-Critic Learning
FM-EAC leverages variants of off-policy RL with the following procedural characteristics:
- Critic(s): evaluate state–action pairs with feature augmentation for improved expressivity. Twin critics and clipped double Q-learning are standard for stability and bias reduction.
- Actor: outputs either a stochastic (Gaussian) or deterministic policy.
- Losses (a runnable loss-computation sketch follows this list):
  - Critic: $\mathcal{L}_Q(\theta_i) = \mathbb{E}\big[(Q_{\theta_i}(s, z, a) - y)^2\big]$
  - Target: $y = r + \gamma \min_{i=1,2} Q_{\bar{\theta}_i}(s', z', a')$ with $z' = f_\psi(s')$ and $a' \sim \pi_\phi(\cdot \mid s', z')$
  - Actor: $\mathcal{L}_\pi(\phi) = -\,\mathbb{E}\big[Q_{\theta_1}(s, z, \pi_\phi(s, z))\big]$ for a deterministic policy; the stochastic (SAC-style) variant adds an entropy term $\alpha \log \pi_\phi(a \mid s, z)$
- Parameter Update: End-to-end differentiation updates FM, EAC, and supporting networks via gradients from actor-critic objectives.
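Putting the pieces together, a runnable sketch of the feature-augmented clipped double-Q update; the hyperparameters, shapes, and deterministic-actor surrogate are illustrative assumptions:

```python
import torch
import torch.nn as nn

S, Z, A, GAMMA = 16, 8, 4, 0.99  # illustrative shapes and discount

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.ReLU(), nn.Linear(64, o))

feature_model = mlp(S, Z)
actor = nn.Sequential(mlp(S + Z, A), nn.Tanh())
critics = [mlp(S + Z + A, 1) for _ in range(2)]   # twin critics
targets = [mlp(S + Z + A, 1) for _ in range(2)]   # Polyak-averaged copies

def q_all(nets, s, a):
    x = torch.cat([s, feature_model(s), a], dim=-1)
    return [net(x) for net in nets]

# Dummy replay batch standing in for pooled real + synthetic transitions.
s, a = torch.randn(32, S), torch.randn(32, A)
r, s2, done = torch.randn(32, 1), torch.randn(32, S), torch.zeros(32, 1)

with torch.no_grad():  # clipped double-Q target: y = r + gamma * min_i Q_i
    a2 = actor(torch.cat([s2, feature_model(s2)], dim=-1))
    y = r + GAMMA * (1 - done) * torch.min(*q_all(targets, s2, a2))

critic_loss = sum(((qi - y) ** 2).mean() for qi in q_all(critics, s, a))
pi = actor(torch.cat([s, feature_model(s)], dim=-1))
actor_loss = -q_all(critics, s, pi)[0].mean()  # deterministic (TD3-style) surrogate
(critic_loss + actor_loss).backward()          # gradients also reach the FM
```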
4. Network Customization and Modularity
FM-EAC accommodates significant architectural flexibility:
| Module | Candidate Networks (examples) | Use Case |
|---|---|---|
| Feature Model | GNN, PAN, BPN, Conv/FCN, handcrafted | General/structured features |
| Actor | SAC, TD3, PPO, hybrids | Discrete/continuous, hybrid |
| Critic | Standard Q, value, or distributional critics | Value estimation |
Sub-networks can be selected to exploit specific inductive biases: GNNs for relational reasoning (multi-UAV), PANs for geometric structure, BPNs for task-specific information (e.g., energy-aware policies), and so on. Actor–critic variants can be matched to task requirements (action type, policy shape).
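This plug-and-play structure can be expressed as a simple registry; the keys and constructors below are hypothetical stand-ins for the GNN/PAN/BPN variants named above, not real implementations of them:

```python
import torch.nn as nn

# Hypothetical registry of feature-model backbones.
FEATURE_MODELS = {
    "conv": lambda: nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten()),
    "fcn": lambda: nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8)),
    # "gnn": relational reasoning (multi-UAV); "pan": geometric structure; ...
}

def build_agent(fm_key, z_dim=8, a_dim=4):
    """Swap the feature model without touching actor/critic definitions."""
    return {
        "feature_model": FEATURE_MODELS[fm_key](),
        "actor": nn.Linear(z_dim, a_dim),
        "critic": nn.Linear(z_dim + a_dim, 1),
    }

agent = build_agent("fcn")
```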
5. Empirical Performance and Sample Efficiency
FM-EAC demonstrates strong empirical performance across both image-based single-task RL (Boney et al., 2021) and multi-task environment benchmarks (Zhou et al., 17 Dec 2025):
- On DeepMind Control Suite tasks, feature-point FM-EAC variants match or nearly match state-based SAC and outperform pixel-based SAC, DrQ, CURL, SAC-AE, and Dreamer in sample efficiency and return.
- On urban and agricultural tasks (e.g., multi-UAV package delivery and sensing), customized FM-EAC variants (PAN-EAC, GNN-EAC) achieve the highest average rewards and the fastest convergence on unseen maps. Typical metrics include urban quality of service (QoS), agricultural age-of-information (AoI), and reward.
| Algorithm | Reward (std) | Online (ms) | Offline (ms) | Urban QoS | Agri AoI |
|---|---|---|---|---|---|
| TD3 | $15.2$ | $79.5$ | $8.26$ | $1.77$ | — |
| SAC | $17.3$ | $74.3$ | $7.47$ | $1.80$ | — |
| MBPO | $44.9$ | $73.5$ | $7.29$ | $1.97$ | — |
| PAN-EAC | $17.0$ | $69.5$ | $8.00$ | $1.10$ | — |
| GNN-EAC | $35.9$ | $36.3$ | $8.08$ | $1.27$ | — |
Ablations establish that feature-point count is not critical for asymptotic returns, but temporal features (e.g., explicit velocities) accelerate learning. Auxiliary losses or decoders are unnecessary, as the bottlenecked geometric representation suffices for effective control. Pretrained FM layers do not generalize well across environments compared to end-to-end adaptation.
6. Computational Complexity and Implementation Insights
FM-EAC can be adapted to available computational budgets via selection of base networks and rollout lengths. Computational complexity of core modules is determined by:
- GNNs: $\mathcal{O}(N^2)$ for dense adjacency operations ($N$ = number of nodes).
- PANs: $\mathcal{O}(P)$ in the number of points $P$, enabling lightweight execution.
- Actor–Critic: $\mathcal{O}(B \cdot |\theta|)$ per update, where $B$ is the batch size and $|\theta|$ denotes the number of parameters.
Efficient implementations utilize batch-optimized forward passes, separable softmax (for feature-point extraction), and Polyak-averaged target networks to ensure stability. Real-time transfer to field-scale urban and agricultural deployment is feasible by selecting minimal FM architectures or fixed feature matrices.
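For instance, the Polyak-averaged target update mentioned above takes only a few lines; the value of tau is a typical, assumed choice:

```python
import torch
import torch.nn as nn

online, target = nn.Linear(8, 1), nn.Linear(8, 1)
target.load_state_dict(online.state_dict())  # start from identical weights

@torch.no_grad()
def polyak_update(target_net, online_net, tau=0.005):
    """In-place update: theta_target <- (1 - tau) * theta_target + tau * theta."""
    for pt, p in zip(target_net.parameters(), online_net.parameters()):
        pt.mul_(1.0 - tau).add_(p, alpha=tau)

polyak_update(target, online)  # called after each gradient step for stability
```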
7. Generalizability, Transfer, and Applications
FM-EAC’s modularity and reliance on task-relevant features confer robust transferability across unseen task instances and environments:
- Generalization: The shared FM supports spatio-temporal pattern transfer (e.g., from urban to agricultural maps) without retraining.
- Sample Efficiency: Off-policy actor-critic loop with imagination rollouts supports Dyna-style planning, reducing real environment sample counts by 3×–10× compared to standard SAC on pixels.
- Applications: Includes multi-UAV package delivery with mobile edge computing, precision agriculture/wireless sensing, and extensions to multi-robot systems, autonomous driving fleets, and multiplayer games.
By comparison, meta-heuristics (e.g., ant colony optimization, genetic algorithms, particle swarm optimization) exhibit inferior sample efficiency and break down in sequential, dynamic domains.
FM-EAC unifies model-based planning and model-free actor-critic learning via a feature-centric representation. By bottlenecking high-dimensional observations to compact, disentangled features and propagating gradients through actor–critic losses, FM-EAC supports efficient learning, broad generalization, and modular extensibility for future RL applications in complex, dynamic environments (Boney et al., 2021, Zhou et al., 17 Dec 2025).