gr-envs for Dynamic Goal Recognition
- gr-envs is a suite of open-source, Gymnasium-compatible environments tailored for online dynamic goal recognition, supporting dynamic goal updates and partial, noisy observation streams.
- It employs observation augmentation and goal control wrappers to seamlessly integrate with standard RL toolkits, enhancing algorithm development and benchmarking.
- The framework enables reproducible and modular experiments in both discrete and continuous domains, advancing research in reinforcement learning and human-robot interaction.
gr-envs is a suite of open-source, Gymnasium-compatible environments designed specifically to support research in Online Dynamic Goal Recognition (ODGR). These environments are curated and wrapped to facilitate algorithm development, evaluation, and comparison in dynamic, goal-directed settings that accurately reflect the requirements of modern goal recognition tasks in artificial intelligence, human-robot interaction, and reinforcement learning. The framework ensures reproducibility, extensibility, and seamless integration with standard RL toolkits.
1. Formalization and Problem Setting
gr-envs grounds its design in the formal ODGR task structure. An ODGR problem is defined as a tuple ⟨D, {G_i}, {O_i}⟩, where D is the domain theory (state and action spaces), each G_i is a set of possible goals introduced during the i-th goal adaptation phase, and O_i is the corresponding set of observation sequences. This reflects a realistic scenario where agents may adapt to new sets of goals (representing, e.g., changing tasks or user intentions) over time, and a recognition system must infer the agent's ultimate intent from partial, noisy, or dynamically truncated observation streams. The framework supports this structure by allowing environment instantiation with arbitrary base goal sets (G_0), runtime goal updates, and observation window management.
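As a concrete reading of this tuple, the sketch below shows one possible in-memory representation of an ODGR problem instance. The class names `DomainTheory`, `AdaptationPhase`, and `ODGRProblem` are illustrative assumptions for exposition only and are not part of the gr-envs API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence, Tuple

# Illustrative types only: gr-envs does not necessarily expose these classes.
State = Any          # element of the domain's state space
Action = Any         # element of the domain's action space
Goal = Any           # a goal specification (e.g., a target position)
Observation = Tuple[State, Action]  # one step of an observation sequence

@dataclass
class DomainTheory:
    """D: the domain theory, reduced here to state and action spaces."""
    state_space: Any
    action_space: Any

@dataclass
class AdaptationPhase:
    """The i-th goal adaptation phase: candidate goals G_i and the
    observation sequences O_i emitted while those goals are active."""
    goals: List[Goal]
    observations: List[Sequence[Observation]] = field(default_factory=list)

@dataclass
class ODGRProblem:
    """An ODGR instance <D, {G_i}, {O_i}> as described above."""
    domain: DomainTheory
    phases: List[AdaptationPhase]
```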
2. Environment Curation and Adaptation
gr-envs provides a diverse and curated suite of discrete and continuous environments, each adapted for dynamic, goal-directed behavior:
- Discrete domains: e.g., grid-worlds such as Minigrid’s SimpleCrossing and LavaCrossing, where the agent's start positions, target goals, and obstacle layouts can vary between episodes and phases.
- Continuous control domains: e.g., PointMaze environments (in Obstacle and FourRooms variants), Parking (from the highway-env package), and Panda-Gym Reach, supporting complex navigation and manipulation tasks with multi-dimensional state and action spaces.
Each environment has been systematically modified so that episodes begin with a well-specified initial state and target goal, arbitrary goals can be specified or updated during or between episodes, and both the agent's achieved and desired goals are exposed as explicit, structured representations. This configuration enables systematic study of GR algorithms as they adapt to evolving task regimes, shifting target distributions, and unanticipated changes in the set of candidate goals.
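To illustrate the intended interaction pattern, here is a minimal usage sketch under the standard Gymnasium API. The environment id "PointMaze-FourRooms-DynamicGoal-v0" and the assumption that importing the package registers its environments are illustrative; the ids actually registered by gr-envs may differ.

```python
import gymnasium as gym
# import gr_envs  # assumption: importing the package registers its environments

# Hypothetical environment id; the actual registered ids in gr-envs may differ.
env = gym.make("PointMaze-FourRooms-DynamicGoal-v0")

obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```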
3. Gymnasium Compatibility and Integration
A central design principle is full API compatibility with the Gymnasium (formerly OpenAI Gym) interface, including standard methods such as reset, step, and render. This makes all gr-envs domains immediately usable by mainline RL toolkits such as Stable Baselines3. Key technical aspects include:
- Observation augmentation: Wrappers systematically inject information about current and desired goals into the observation dictionary, ensuring any RL or GR algorithm can condition its policy or inference procedure on up-to-date goal information.
- Goal control: Environments expose programmatic interfaces to set or alter the current target goal mid-episode, supporting advanced scenarios where agents may change intent or where experimenters can simulate goal switches for stress-test evaluation.
- Compatibility layers: For environments that originally lack sufficient information (such as Minigrid's omission of explicit goal states in observations), wrappers clone, extend, and synchronize environment state to present a consistent, goal-conditioned interface.
This design enables not only standardized benchmarking across algorithms but also direct reuse in RL training pipelines, transfer learning, and meta-learning setups.
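To make the observation-augmentation idea concrete, the following is a generic sketch built on Gymnasium's standard ObservationWrapper. It is not the wrapper shipped with gr-envs; the dict keys follow the desired_goal/achieved_goal convention familiar from goal-conditioned Gymnasium-Robotics environments, and the two callables are stand-ins for whatever goal bookkeeping the real wrappers perform.

```python
import gymnasium as gym
import numpy as np

class GoalAugmentationWrapper(gym.ObservationWrapper):
    """Generic sketch: expose the current desired and achieved goals
    alongside the raw observation as a dict."""

    def __init__(self, env, get_desired_goal, get_achieved_goal):
        super().__init__(env)
        # Callables supplied by the experimenter; gr-envs' own wrappers
        # derive this information from the wrapped environment instead.
        self._get_desired_goal = get_desired_goal
        self._get_achieved_goal = get_achieved_goal
        # A complete implementation would also redefine
        # self.observation_space as a gym.spaces.Dict to match.

    def observation(self, observation):
        # Called by ObservationWrapper on every reset() and step().
        return {
            "observation": observation,
            "desired_goal": np.asarray(self._get_desired_goal(self.env)),
            "achieved_goal": np.asarray(self._get_achieved_goal(self.env)),
        }
```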
4. Wrappers and Abstractions
gr-envs employs a set of robust Gymnasium wrappers to facilitate advanced goal recognition experiments. These wrappers serve several critical functions:
| Function | Description |
|---|---|
| Observation augmentation | Adds both the achieved and desired goal to each observation tuple for consumption by learning agents |
| Goal control | Allows outside processes (recognition algorithms, evaluation scripts) to switch the target goal mid-episode |
| Visualization and diagnostics | Supports color-coded trajectory overlays, visualization hooks, and logging for qualitative analysis |
By abstracting away differences in base environment implementations and ensuring all necessary GR-relevant features are exposed, these wrappers enable modular, reproducible experiments without the need for environment-specific code modifications.
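A goal-control hook of the kind listed in the table can be sketched as a thin gym.Wrapper. The set_goal method and the apply_goal callback are illustrative names, not the interface gr-envs actually exposes.

```python
import gymnasium as gym

class GoalControlWrapper(gym.Wrapper):
    """Sketch of a wrapper letting an outside process (e.g. an evaluation
    script) switch the active target goal during or between episodes."""

    def __init__(self, env, apply_goal):
        super().__init__(env)
        # apply_goal(env, goal) writes the new target into the base
        # environment; how this is done is environment-specific.
        self._apply_goal = apply_goal
        self.current_goal = None

    def set_goal(self, goal):
        """Switch the desired goal, possibly mid-episode."""
        self.current_goal = goal
        self._apply_goal(self.env, goal)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        if self.current_goal is not None:
            # Re-apply the externally chosen goal after each reset.
            self._apply_goal(self.env, self.current_goal)
        return obs, info
```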
5. Support for Dynamic and Noisy Evaluation Protocols
gr-envs accommodates dynamic and adversarial experimental protocols fundamental to rigorous GR evaluation:
- Dynamic goal sets: In each adaptation phase, the set of candidate goals can be programmatically altered to emulate environment changes or task redefinitions.
- Partial observations: The environment natively supports systematic observation truncation, sampling, and corruption (for example, introducing observation noise), crucial for simulating limited sensor modalities or adversarial conditions.
- Visualization: Built-in tools overlay agent trajectories, goal locations, and relevant metrics on environment renderings for immediate, qualitative inspection.
This enables controlled experimentation under varying degrees of prior knowledge, observation completeness, and environmental complexity.
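The partial-observation protocols above amount to post-processing of recorded trajectories. The sketch below assumes observations are stored as numeric NumPy arrays and is independent of gr-envs internals; the function name and parameters are illustrative.

```python
import numpy as np

def degrade_observations(trajectory, keep_fraction=0.5, noise_std=0.05, rng=None):
    """Sketch of a partial/noisy evaluation protocol: truncate a trajectory
    to its first `keep_fraction`, randomly subsample the remaining steps,
    and add Gaussian sensor noise."""
    rng = np.random.default_rng() if rng is None else rng
    traj = np.asarray(trajectory, dtype=float)

    # 1. Truncation: keep only the observation prefix.
    cutoff = max(1, int(len(traj) * keep_fraction))
    prefix = traj[:cutoff]

    # 2. Subsampling: drop roughly half of the remaining steps at random.
    mask = rng.random(len(prefix)) < 0.5
    mask[0] = True  # always keep the first observation
    sampled = prefix[mask]

    # 3. Corruption: additive Gaussian observation noise.
    return sampled + rng.normal(scale=noise_std, size=sampled.shape)
```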
6. Benchmarking and Reproducibility
gr-envs, together with the complementary gr-libs package, forms a standardized and extensible experiment suite for algorithm comparison and benchmarking. Recognizers, planners, and learning agents developed with gr-libs can be deployed seamlessly across the full range of environments in gr-envs, with consistent result logging and diagnostics. Both packages are open-source, versioned, and distributed via GitHub and PyPI, supporting rigorous, reproducible experimentation and enabling direct performance comparison across research groups.
7. Implications and Applications
The versatility and careful design of gr-envs enable a wide array of research directions, including:
- Benchmarking and evaluation of ODGR algorithms spanning the spectrum from model-free to MDP-based inferential approaches.
- Study of domain adaptation, transfer, and zero-shot recognition in both synthetic and realistic physical domains (through continuous environments).
- Application to human-robot interaction, surveillance, and assistive systems, where dynamic and adaptive goals are the norm and robust online recognition is required.
A plausible implication is that gr-envs advances the empirical rigor and interoperability of dynamic goal recognition research, aligning the field with contemporary standards in reinforcement learning and broadening its real-world applicability.