
Masked Inverse Reinforcement Learning

Updated 20 November 2025
  • Masked Inverse Reinforcement Learning is a framework that obfuscates reward-relevant information using techniques like action mixing and summary-based data masking.
  • It employs diverse strategies—including Bayesian adversarial games, constraint perturbation, and LLM-guided invariance—to hinder direct reward recovery.
  • Practical applications span cybersecurity, radar evasion, and robot learning, achieving improved sample efficiency and robust performance under uncertainty.

Masked Inverse Reinforcement Learning (Masked IRL) encompasses a family of approaches in inverse reinforcement learning in which the observable behavior, the reward structure, or the relevant state space is deliberately obfuscated or only partially revealed. This obfuscation, whether strategic, adversarial, or the byproduct of natural data summarization, prevents straightforward recovery of the underlying reward function from demonstrations. Masked IRL thus arises in diverse settings: strategic multi-agent domains with information asymmetry, IRL from summary or partial data, adversarial scenarios where agents actively prevent being reverse-engineered, and recent advances leveraging LLMs to help disambiguate user intent from ambiguous demonstrations. These variants share the goal of controlling, limiting, or strategically shaping the information available to the inverse learner.

1. Mathematical Formalism and Canonical Settings

Masked IRL formalizations vary widely depending on the information restrictions and strategic behavior involved:

Non-cooperative/Bayesian Adversarial Games

In the "Non-Cooperative Inverse Reinforcement Learning" (N-CIRL) framework, Masked IRL is realized as a two-player zero-sum Markov game with one-sided incomplete information. The informed player ("attacker") knows the true reward parameter $\theta$, while the other player ("defender") must infer $\theta$ and act optimally against the worst case (Zhang et al., 2019). The Markov game is defined by:

  • State space $S$ and action sets $A_1(s), A_2(s)$ for both players.
  • Transition kernel $P(s' \mid s, a, d)$.
  • Finite intent parameter set $\Theta$ from which $\theta$ is drawn, known only to the attacker.
  • Both players choose mixed strategies, with the defender maintaining a belief distribution $\mu_t \in \Delta(\Theta)$ and the attacker leveraging this belief to "mask" intent via action randomization (see the belief-update sketch below).
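To make the belief dynamics concrete, the following minimal sketch (the `update_belief` helper and array layout are illustrative assumptions, not code from Zhang et al., 2019) shows the defender's one-step Bayesian update of $\mu_t$ after observing an attacker action. When the attacker mixes identically across all $\theta$, the likelihood term is constant in $\theta$ and the belief never sharpens, which is exactly the masking effect.

```python
import numpy as np

def update_belief(mu, attacker_policy, state, action):
    """One-step Bayesian update of the defender's belief over the intent set Theta.

    mu              : current belief over Theta, shape (n_theta,)
    attacker_policy : array pi[theta, state, action] giving the attacker's (mixed)
                      action probabilities for each candidate intent theta
    """
    likelihood = attacker_policy[:, state, action]   # P(a | s, theta) for each theta
    posterior = mu * likelihood
    total = posterior.sum()
    if total == 0.0:
        return mu   # observed action had zero probability under the prior; keep the prior
    return posterior / total

# If the attacker mixes identically for every theta (maximal masking), the likelihood
# is constant in theta and the posterior equals the prior: no intent is revealed.
```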

IRL from Summary/Masked Data

When only partial or summarized demonstrations are available, Masked IRL can be cast in terms of a summarizing function $\sigma$, so that only $\xi_\sigma = \sigma(\tau)$ is observed for a full trajectory $\tau$ (Kangasrääsiö et al., 2017). The generative model is:

  • $p(\xi_\sigma \mid \theta) = \sum_{\tau} P(\xi_\sigma \mid \tau)\, P(\tau \mid \theta)$, where $P(\tau \mid \theta)$ is the trajectory probability induced by the (unknown) optimal policy under reward parameters $\theta$.
  • Inference over $\theta$ requires marginalizing over all plausible trajectories consistent with each summary; a Monte Carlo form of this marginal is shown below.
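Concretely, Bayesian inference targets the posterior over $\theta$; because the sum over trajectories is generally intractable, it can be approximated by simulating trajectories from the policy induced by $\theta$. The following display is a standard Monte Carlo form of this marginal, written here for illustration rather than quoted from the paper:

$$p(\theta \mid \xi_\sigma) \;\propto\; p(\theta)\, p(\xi_\sigma \mid \theta) = p(\theta) \sum_{\tau} P(\xi_\sigma \mid \tau)\, P(\tau \mid \theta), \qquad p(\xi_\sigma \mid \theta) \approx \frac{1}{N} \sum_{i=1}^{N} P(\xi_\sigma \mid \tau_i), \quad \tau_i \sim P(\cdot \mid \theta).$$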

Strategic Masking Against IRL Attackers

In the "Inverse-Inverse RL" (I-IRL) setting, the agent faces an adversary performing IRL with the goal of reconstructing the agent’s constraints or utility (Pattanayak et al., 2022). The agent deliberately perturbs its responses—solving

$$x_k^* = \arg\max_x u_k(x) \quad \text{s.t.} \quad g(x) \leq \gamma_k$$

with thresholds $\gamma_k \approx p_k$ but chosen to collapse the adversary's margin in revealed-preference-based IRL reconstruction.
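Schematically, and glossing over the exact revealed-preference margin used by Pattanayak et al. (2022), the masking step can be read as a trade-off of the following form, where $\mathrm{margin}(\gamma_{1:K})$ denotes the adversary's IRL feasibility margin and $\varepsilon$ bounds the agent's acceptable utility loss; both symbols are introduced here purely for illustration:

$$\min_{\gamma_1,\dots,\gamma_K} \ \mathrm{margin}(\gamma_{1:K}) \quad \text{s.t.} \quad \bigl| u_k\bigl(x_k^*(\gamma_k)\bigr) - u_k\bigl(x_k^*(p_k)\bigr) \bigr| \le \varepsilon \quad \forall k.$$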

Masked IRL via Language-Guided Structure Learning

Recent approaches leverage LLMs to infer masks $m \in \{0,1\}^d$ indicating state-dimension relevance, enforcing invariance of the reward function to irrelevant elements (Hwang et al., 18 Nov 2025). The composite loss is

$$\mathcal{J}(\theta) = \mathcal{L}_{\mathrm{IRL}}(\theta) + \lambda\, \mathcal{L}_{\mathrm{mask}}(\theta)$$

where

$$\mathcal{L}_{\mathrm{mask}}(\theta) = \mathbb{E}_{(\tau, \ell, m)} \sum_{s \in \tau} \sum_{j=1}^{d} \bigl(1 - m^{(j)}\bigr)\, \bigl| r_\theta(s^{(j)} \mid \ell) - r_\theta(s \mid \ell) \bigr|$$

where $s^{(j)}$ is $s$ with the $j$th coordinate perturbed and $\ell$ denotes the accompanying language instruction.
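As a minimal sketch of how this invariance penalty can be implemented (assuming a PyTorch-style reward model `reward_fn(states, instruction)` and a per-demonstration mask vector; the names and the Gaussian perturbation are illustrative, not the authors' code):

```python
import torch

def mask_invariance_loss(reward_fn, states, instruction, mask, noise_scale=0.1):
    """Penalize reward sensitivity to state dimensions the mask marks irrelevant.

    reward_fn   : callable (states, instruction) -> per-state rewards, shape (T,)
    states      : tensor of shape (T, d), states of one demonstration
    instruction : language conditioning passed through to the reward model
    mask        : tensor of shape (d,); 1 = relevant dimension, 0 = irrelevant
    """
    base_reward = reward_fn(states, instruction)          # r_theta(s | l)
    loss = states.new_zeros(())
    for j in range(states.shape[1]):
        if mask[j] == 1:                                  # only irrelevant dims are penalized
            continue
        noise = torch.zeros_like(states)
        noise[:, j] = noise_scale * torch.randn_like(noise[:, j])
        perturbed_reward = reward_fn(states + noise, instruction)  # r_theta(s^(j) | l)
        loss = loss + (perturbed_reward - base_reward).abs().sum()
    return loss

# Composite objective (sketch): total_loss = irl_loss + lam * mask_invariance_loss(...)
```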

2. Masking Mechanisms and Strategic Obfuscation

The core mechanism in Masked IRL is the active, passive, or induced obfuscation of reward-relevant information:

  • Bayesian Masking in Adversarial Markov Games: The informed player mixes its actions such that the updated opponent belief $\mu'$ remains diffuse, trading off immediate reward for future information asymmetry (Zhang et al., 2019).
  • Summary IRL Masking: Only low-dimensional or coarse summaries (e.g., path length, total reward) are observed. The mapping $\sigma$ may have a large preimage and need not factor over time, strongly masking the agent's full behavior (Kangasrääsiö et al., 2017).
  • Adversarial Constraint Spoofing (I-IRL): The agent minimizes the adversary's revealed-preference IRL margin subject to bounded performance loss, ensuring that the reconstructed constraints are unreliable (Pattanayak et al., 2022).
  • Explicit State-Dimension Masking: An LLM interprets ambiguous instructions, predicts relevant dimensions, and a loss function penalizes reward sensitivity to irrelevant components, enforcing functional invariance to masked variables (Hwang et al., 18 Nov 2025).

A summary of major masking strategies across settings:

| Setting | Masking Mechanism | Benefit Sought |
| --- | --- | --- |
| N-CIRL (zero-sum Markov games) | Action mixing, belief confusion | Slowed reward inference, deception |
| IRL from summary data | Arbitrary $\sigma$-summary filter | Privacy, tractability, robustness |
| Inverse-Inverse RL (I-IRL) | Constraint perturbation | Constraints resistant to IRL reconstruction |
| LLM-guided reward invariance | State masking via LLM and data augmentation | Disambiguation, generalization |

3. Algorithmic Approaches and Inference Procedures

Algorithmic development in Masked IRL centers on handling combinatorial or continuous uncertainty and optimizing under information constraints:

  • Recursive Games with One-Sided Information: The N-CIRL recursion for the informed player computes the game value via backward induction over $(s, \mu)$, using a contraction mapping (backup operator $G$) to yield equilibrium strategies. The dual game and operator $H$ analogously support robust defender strategies (Zhang et al., 2019). Approximate solutions are attained via Non-Cooperative PBVI, interpolating value functions and updating through linear programs at selected belief and surrogate points.
  • Marginalized and Likelihood-Free Inference: In IRL from summary data, exact Bayesian inference marginalizes over all possible hidden paths. Monte Carlo (importance sampling) and Approximate Bayesian Computation (ABC) offer scalable surrogates, operating with only black-box trajectory and summary generators (Kangasrääsiö et al., 2017); a rejection-ABC sketch is given after this list. Bayesian optimization with Gaussian processes expedites likelihood-surface exploration when evaluations are expensive.
  • Bi-level Optimization in I-IRL: The agent solves for minimal deviation masks that collapse the IRL feasibility margin—algorithmically, this is a quadratic program with a margin constraint (Pattanayak et al., 2022).
  • Invariance-Augmented Gradient Descent: In LLM-guided Masked IRL, mask generation (via prompt-based LLM querying) is integrated with IRL gradient descent, with a masking loss enforced through data augmentation and explicit penalization of reward variations w.r.t. masked dimensions. Disambiguation is embedded via LLM question-answering that contextualizes language with demonstration content (Hwang et al., 18 Nov 2025).
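For the summary-data setting in particular, a likelihood-free rejection-ABC loop is straightforward to state. The sketch below passes in hypothetical helpers (`sample_prior`, `simulate_trajectory`, `summarize`, `distance`); the actual discrepancy functions, proposal mechanisms, and Bayesian-optimization acceleration of Kangasrääsiö et al. (2017) are more involved.

```python
import numpy as np

def abc_posterior_samples(observed_summary, sample_prior, simulate_trajectory,
                          summarize, distance, eps, n_samples=1000):
    """Rejection ABC for IRL from summary (masked) data.

    sample_prior        : () -> theta, draws reward parameters from the prior
    simulate_trajectory : theta -> trajectory under a (near-)optimal policy for theta
    summarize           : trajectory -> summary statistic xi_sigma
    distance            : (summary, summary) -> nonnegative discrepancy
    eps                 : acceptance tolerance
    """
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior()                       # propose reward parameters
        tau = simulate_trajectory(theta)             # forward-simulate behavior
        if distance(summarize(tau), observed_summary) <= eps:
            accepted.append(theta)                   # summary matches: keep theta
    return np.array(accepted)
```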

4. Applications and Empirical Findings

Masked IRL has been validated across adversarial problem domains, privacy-aware modeling, and robot task generalization:

  • Cybersecurity and Intrusion Detection: N-CIRL is applied to settings where the defender does not know which asset is threatened. Masking retards defensive learning and enables attacker deception (Zhang et al., 2019).
  • Human Task Modeling with Partial Observability: Summarized demonstrations (e.g., total completion time, path length only) are shown to permit recovery of plausible reward models and uncertainty quantification when full trajectories are missing (Kangasrääsiö et al., 2017).
  • Meta-Cognitive Radar Evasion: In radar waveform management, adversarially perturbing resource constraints ensures that an IRL attacker fails to reconstruct true limitations, with provable sample complexity bounds under Gaussian probe noise (Pattanayak et al., 2022).
  • Robot Learning from Language and Demos: Masked IRL leveraging LLM-guided masking achieves up to 15% improvement in average win rate and 4.7× reduction in demonstration sample complexity over baselines, robustly disambiguates intent, and demonstrates substantial regret reductions and improved reward invariance in real-world tasks (Hwang et al., 18 Nov 2025).

5. Theoretical Guarantees and Limitations

Each Masked IRL instantiation comes with distinct analytical tradeoffs and computational properties:

  • Contraction Guarantees: Both the primal (attacker) and dual (defender) backup operators in N-CIRL are $\gamma$-contractions, yielding unique fixed points and convergence of iterative schemes to equilibrium within $\epsilon$ when belief/surrogate spaces are sufficiently dense (Zhang et al., 2019).
  • Sample Complexity Bounds: The I-IRL strategy, under Gaussian perturbations and margin collapse rate $(1-\eta)$, prescribes a number of spoofing rounds $K \geq O(\log(1/\delta))$ to drive the failure probability below $\delta$ (Pattanayak et al., 2022).
  • Likelihood and Posterior Scalability: Exact IRL-from-summary is exponentially hard in horizon/action space, but MC/ABC methods scale linearly with the number of forward simulations. Identifiability strongly depends on the summary function $\sigma$; severe masking broadens posteriors and may render parameters unidentifiable (Kangasrääsiö et al., 2017).
  • LLM Masking Performance: Mask prediction accuracy in ambiguous instructions reaches F1=0.67, increasing to F1=0.78 after LLM disambiguation and F1=0.88 with oracle ground-truth. Regret reduction and reward variance improvements are directly tied to accurate mask extraction from ambiguous input (Hwang et al., 18 Nov 2025).

6. Broader Implications and Extensions

Masked IRL formalizes and extends numerous settings where direct recovery of objectives is impossible, undesirable, or adversarially prevented. Its methods underpin practical tools for privacy in behavior modeling, security in adversarial environments, and robust generalization in embodied learning. Capabilities for handling arbitrary information summarization, masking via language, and derived sample complexity and convergence bounds position Masked IRL as a foundational paradigm for research in incomplete information games, human-AI alignment under ambiguity, and strategic deception or privacy.

Future directions suggested in the literature include adaptive selection of discrepancy functions in ABC, dimensionality reduction in high-dimensional reward spaces, learning or co-inferring summary/masking operators, and integrating Masked IRL principles with broader partially observable MDP frameworks (Kangasrääsiö et al., 2017, Hwang et al., 18 Nov 2025). Developments in LLM-guided task structure extraction further allow Masked IRL to extend into fields where unstructured natural language and demonstration are the primary modes of communication and supervision.
