Equivariant Soft Actor-Critic (Equi-SAC)
- Equi-SAC is a reinforcement learning algorithm that embeds exact SO(2)-equivariance through steerable group convolutions in both actor and critic networks.
- The architecture integrates equivariant convolutional layers and group-max pooling to ensure invariant value estimates and equivariant policy outputs under rotational transformations.
- Empirical evaluations demonstrate that Equi-SAC significantly improves sample efficiency and task performance in robotic visual-control benchmarks compared to standard CNN-based methods.
Equivariant Soft Actor-Critic (Equi-SAC) is a reinforcement learning (RL) algorithm that enforces exact SO(2)-equivariance in both actor and critic networks via steerable group convolutions. By embedding symmetry properties at the algorithmic and architectural levels, Equi-SAC achieves substantial improvements in sample efficiency, particularly in robotic manipulation tasks characterized by underlying rotational symmetries (Wang et al., 2022).
1. Theoretical Foundations
Equi-SAC is grounded in the mathematical theory of group actions and equivariant representations. Consider the group $SO(2)$, the group of planar rotations, approximated in practice by its discrete cyclic subgroup $C_n = \{\mathrm{Rot}_\theta : \theta = \tfrac{2\pi i}{n},\ 0 \le i < n\}$.
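For concreteness, $C_n$ can be represented by its $2 \times 2$ rotation matrices; the short check below (illustrative only, with $n = 8$) confirms the cyclic group structure, namely that composing two group elements stays inside $C_n$ with indices adding modulo $n$:

```python
import numpy as np

n = 8

def rot(i):
    """Rotation matrix for the i-th element of C_n."""
    t = 2.0 * np.pi * i / n
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# closure: composing two group elements stays in C_n (indices add mod n)
assert np.allclose(rot(3) @ rot(7), rot((3 + 7) % n))
```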
States are encoded as $m$-channel images $s \in \mathbb{R}^{m \times h \times w}$, with the group action implemented as rotational shifts in pixel space,

$(g \cdot s)(x, y) = s\big(g^{-1} \cdot (x, y)\big),$

where channel values follow the trivial representation $\rho_0(g) = 1$.
Actions are partitioned as

$a = (a_{\mathrm{equiv}}, a_{\mathrm{inv}}),$

with $a_{\mathrm{equiv}}$ transforming under the standard representation $\rho_1$, that is,

$g \cdot a = \big(\rho_1(g)\, a_{\mathrm{equiv}},\ a_{\mathrm{inv}}\big), \qquad \rho_1(g) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$
Under these definitions, the optimal value and policy functions exhibit strict group-theoretic structure:

$Q^*(g \cdot s, g \cdot a) = Q^*(s, a), \qquad \pi^*(g \cdot s) = g \cdot \pi^*(s).$
Equi-SAC enforces these properties by construction:
- Critic invariance: $Q(g \cdot s, g \cdot a) = Q(s, a)$ for all $g \in C_n$.
- Actor equivariance: $\pi(g \cdot s) = g \cdot \pi(s)$ for all $g \in C_n$.
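These two properties can be verified numerically on a toy example. The sketch below (an illustration, not the paper's networks) uses the image-centroid offset as a hand-built equivariant "policy" and the inner product of two equivariant vectors as an invariant "critic"; `np.rot90` implements the 90° element of $C_4$ on image states:

```python
import numpy as np

def pi(s):
    """Toy equivariant policy: intensity centroid offset from image center."""
    c = (s.shape[0] - 1) / 2.0
    rows, cols = np.indices(s.shape)
    x, y = cols - c, c - rows          # x rightward, y upward
    m = s.sum()
    return np.array([(s * x).sum() / m, (s * y).sum() / m])

def Q(s, a_equiv):
    """Toy invariant critic: inner product of two equivariant vectors."""
    return float(pi(s) @ a_equiv)

R90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])          # rho_1 of the 90-degree rotation

s = np.random.rand(5, 5)
a = np.array([0.3, -0.7])

assert np.allclose(pi(np.rot90(s)), R90 @ pi(s))        # actor equivariance
assert np.isclose(Q(np.rot90(s), R90 @ a), Q(s, a))     # critic invariance
```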
2. Network Architectures
Equivariant layers in Equi-SAC are implemented using steerable group convolutions whose kernels satisfy the constraint

$K(g \cdot x) = \rho_{\mathrm{out}}(g)\, K(x)\, \rho_{\mathrm{in}}(g)^{-1} \quad \text{for all } g \in C_n.$

These layers are instantiated with the E2CNN library and respect the rotational symmetries of $C_n$.
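For intuition (a from-scratch illustration, not the E2CNN implementation), when both $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$ are trivial, the constraint says the kernel itself must be rotation-invariant, and correlation with such a kernel commutes with rotation of the input:

```python
import numpy as np

def corr2d(s, k):
    """'Same' cross-correlation with zero padding (slow reference version)."""
    H, W = s.shape
    kh, kw = k.shape
    p = np.pad(s, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(s, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (p[i:i + kh, j:j + kw] * k).sum()
    return out

# project an arbitrary kernel onto the C4-invariant subspace:
# K(g x) = K(x), i.e. rho_in = rho_out = trivial
k0 = np.random.rand(3, 3)
k_inv = sum(np.rot90(k0, j) for j in range(4)) / 4.0

s = np.random.rand(6, 6)
# equivariance: rotating the input rotates the output
assert np.allclose(corr2d(np.rot90(s), k_inv), np.rot90(corr2d(s, k_inv)))
```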
Actor Network
- Input: depth image, encoded in the trivial representation $\rho_0$.
- Core: steerable convolutional layers with ReLU activations, equivariant under $C_n$.
- Output: a mixed-representation tensor comprising:
- One $\rho_1$ vector (two degrees of freedom) for $a_{\mathrm{equiv}}$.
- Eight $\rho_0$ scalars for $a_{\mathrm{inv}}$.
The steerable basis lets a single convolution share weights across all rotated versions of each filter.
Critic Network
- Encoder: steerable convolutional layers mapping the depth image (trivial representation) to regular-representation features.
- Action concatenation: the actor's output ($\rho_1$ vector plus $\rho_0$ scalars) is appended to the encoder output.
- Heads: Two $Q$-functions, each a stack of two convolutional layers (regular to trivial representations), concluding with max-pooling over group channels to overcome Schur's Lemma constraints.
- Output: Two scalar, rotation-invariant values $Q_1(s, a)$ and $Q_2(s, a)$.
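The role of the final group-max pool can be seen in isolation: maximizing over the group channels turns any per-rotation feature into an invariant one, which is what lets the critic emit an unconstrained scalar despite Schur's Lemma restrictions on linear equivariant maps. A minimal NumPy sketch, where `phi` is an arbitrary stand-in rather than the paper's encoder:

```python
import numpy as np

def phi(s):
    # arbitrary, deliberately non-invariant feature extractor
    w = np.arange(s.size, dtype=float).reshape(s.shape)
    return float((s * w).sum())

def q_head(s):
    # lift over C4 (evaluate phi on every rotation), then max-pool over the group
    return max(phi(np.rot90(s, k)) for k in range(4))

s = np.random.rand(6, 6)
# the pooled output is invariant under any C4 rotation of the input
assert all(np.isclose(q_head(np.rot90(s, k)), q_head(s)) for k in range(4))
```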
Architecture Sketch

- Actor: depth image ($\rho_0$) → steerable conv trunk (regular representations) → mixed output ($a_{\mathrm{equiv}}$ as a $\rho_1$ vector, $a_{\mathrm{inv}}$ as $\rho_0$ scalars).
- Critic: depth image ($\rho_0$) → steerable encoder → concatenate action → conv heads → group-max pool → $Q_1$, $Q_2$.
3. Algorithm and Training Protocol
The Equi-SAC learning process follows an adaptation of the standard Soft Actor-Critic (SAC) framework, with explicit equivariant/invariant losses and update rules:
- Replay Buffer: transitions are stored in a replay buffer $\mathcal{D}$, with optional demonstration data.
- Action Sampling: the actor $\pi_\phi$ outputs a mean and standard deviation; the action is generated via the reparameterization trick. Equivariant components are rotated into a base frame before execution.
- Critic Update: the target value is computed using soft target networks and incorporates the log probability under the policy, maintaining critic invariance under $C_n$.
- Actor Update: the loss combines entropy maximization with critic evaluation, enforcing actor equivariance through the network structure.
- Temperature Update: the entropy temperature $\alpha$ is optionally updated to match the target entropy.
- Target Network: soft (Polyak) updates with rate $\tau$ are used for the critic target networks.
All convolutional operations inherit $C_n$-equivariance, ensuring that both actor and critic respect the symmetry constraints throughout optimization.
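The action-sampling step above can be sketched as follows. This is the generic SAC-style tanh-squashed Gaussian sample with its change-of-variables log-probability correction; the equivariant network and the base-frame rotation are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(mu, log_std):
    """Reparameterization trick: a = tanh(mu + std * eps), eps ~ N(0, I)."""
    std = np.exp(log_std)
    eps = rng.standard_normal(mu.shape)
    a = np.tanh(mu + std * eps)
    # diagonal-Gaussian log-density plus the tanh change-of-variables term
    log_prob = (-0.5 * eps**2 - log_std - 0.5 * np.log(2 * np.pi)).sum()
    log_prob -= np.log(1.0 - a**2 + 1e-6).sum()
    return a, log_prob

a, logp = sample_action(np.zeros(5), np.full(5, -1.0))
assert np.all(np.abs(a) < 1.0)   # squashed into the action bounds
```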
Pseudocode (abridged)

```
initialize actor π_φ, critics Q_θ1, Q_θ2, targets Q_θ1', Q_θ2', buffer D
for each environment step:
    sample a ~ π_φ(·|s) via reparameterization; execute; store (s, a, r, s') in D
    sample a minibatch from D
    # critic update (soft Bellman target, invariant under C_n by construction)
    a' ~ π_φ(·|s')
    y = r + γ (min_i Q_θi'(s', a') − α log π_φ(a'|s'))
    minimize Σ_i (Q_θi(s, a) − y)²
    # actor update (entropy-regularized, equivariant by construction)
    maximize min_i Q_θi(s, ã) − α log π_φ(ã|s),  with ã ~ π_φ(·|s)
    # temperature and target updates
    adjust α toward the target entropy; θi' ← τ θi + (1 − τ) θi'
```
4. Empirical Evaluation
Benchmarks
Equi-SAC was benchmarked on six robotic visual-control tasks:
- Simple: Block Pulling, Object Picking, Drawer Opening
- Hard: Block Stacking, House Building, Corner Picking
Each environment used depth images as state observations, continuous action spaces, and sparse rewards (+1 for task success).
Sample Efficiency
Steps required to achieve 80% success rate (mean across 4 seeds):
| Task | Equi-SAC | CNN-SAC | DrQ | RAD | FERM |
|---|---|---|---|---|---|
| Block Pull | 45k | 180k | 150k | 170k | 130k |
| Object Pick | 80k | × | × | × | × |
| Drawer Open | 90k | × | × | × | × |
(× indicates failure to solve within 300k steps.)
Ablation Study
Effect of equivariant actors and critics (final reward after 200k steps):
| Task | EqActor+EqCrit | EqActor+CNNCrit | CNNActor+EqCrit |
|---|---|---|---|
| Pull | 0.97 | 0.82 | 0.75 |
| Pick | 0.92 | 0.70 | 0.65 |
| Open | 0.95 | 0.78 | 0.72 |
$C_8$ symmetry (eightfold rotation) consistently outperformed $C_4$ (fourfold) in 5 of 6 domains.
5. Implementation and Practical Considerations
Key engineering and hyperparameter guidelines:
- Steerable CNNs: Use E2CNN or similar libraries for group convolutions. Specify input/output representations ($\rho_0$, $\rho_1$, or regular) to enable weight sharing across group actions.
- Critic Pooling: Apply group-max pooling before the final scalar output to avoid overconstraint (Schur’s Lemma).
- Action Augmentation: When employing buffer augmentations, rotate $a_{\mathrm{equiv}}$ by the same SO(2) element as the state.
- Critical Hyperparameters:
- Soft update rate $\tau$ for the target networks
- Actor and critic learning rates (Adam optimizer)
- Entropy temperature $\alpha$ initialization and target entropy
- Batch size
- Replay buffer capacity
This structure ensures exact $C_n$-equivariance, resulting in marked improvements in both sample efficiency and final policy performance on visual-control benchmarks.
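The action-augmentation rule above (rotate the stored action's equivariant part together with the state) can be sketched as follows, assuming a $C_4$ action on image states via `np.rot90`; the function name is illustrative:

```python
import numpy as np

def augment(s, a_equiv, a_inv, k):
    """Rotate state and the equivariant action part by the same C4 element."""
    t = k * np.pi / 2.0
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return np.rot90(s, k), R @ a_equiv, a_inv   # invariant part untouched

s = np.random.rand(4, 4)
s2, a2, _ = augment(s, np.array([1.0, 0.0]), np.array([0.5]), 1)
assert np.allclose(a2, [0.0, 1.0])   # 90-degree rotation of the action vector
```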
6. Significance and Context
Equi-SAC demonstrates that explicitly modeling group symmetry within RL architectures can lead to substantial empirical gains in data efficiency and task performance, particularly for domains where physical laws or robot/environment geometry induce SO(2)-invariant MDPs. These results provide a basis for adopting symmetry-preserving architectures in broader robot learning applications and for extending equivariant RL principles to other symmetry groups beyond SO(2) (Wang et al., 2022).