Polynomial vs pseudo-polynomial dependence on game size in QFR’s guarantees

Determine whether the convergence guarantees of the Q-Function based Regret Minimization (QFR) algorithm for two-player zero-sum imperfect-information extensive-form games can be strengthened so that the dependence on the size of the extensive-form game (e.g., the game tree size or depth) is polynomial rather than pseudo-polynomial.

Background

The paper introduces Q-Function based Regret Minimization (QFR), a policy-gradient approach that uses trajectory Q-values and a bidilated regularizer to achieve best-iterate convergence in two-player zero-sum imperfect-information extensive-form games, including under stochastic trajectory rollouts.
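The high-level recipe behind such algorithms (policy-gradient updates driven by action-value estimates plus a regularizer) can be illustrated on a toy zero-sum matrix game. The following is a minimal sketch, not QFR itself: an entropy term stands in for the bidilated regularizer, exact expected Q-values replace stochastic trajectory rollouts, and all names and constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of regularized policy-gradient play on
# rock-paper-scissors, a toy two-player zero-sum matrix game.
# The entropy regularizer below is a simple stand-in for QFR's
# bidilated regularizer; step sizes are illustrative.

rng = np.random.default_rng(0)
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])  # payoff matrix for player x (maximizer)

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

x_logits = rng.normal(size=3)
y_logits = rng.normal(size=3)
eta, tau = 0.1, 0.05         # step size, regularization weight

for _ in range(5000):
    x, y = softmax(x_logits), softmax(y_logits)
    # Exact expected Q-values per action; a trajectory-based method
    # would replace these with sampled, importance-weighted estimates.
    qx = A @ y               # player x's action values
    qy = -(A.T @ x)          # player y minimizes x's payoff
    # Entropy-regularized gradient step in logit space.
    x_logits += eta * (qx - tau * np.log(x + 1e-12))
    y_logits += eta * (qy - tau * np.log(y + 1e-12))

x, y = softmax(x_logits), softmax(y_logits)
# Best-response gap (exploitability) of the final iterate;
# it approaches 0 as play nears the uniform equilibrium of RPS.
exploit = (A @ y).max() - (A.T @ x).min()
```

With the regularizer, the iterates themselves approach the (regularized) equilibrium rather than merely their time average, which mirrors the best-iterate convergence property the paper establishes for QFR; the open question concerns how the rate of such guarantees scales with the size of the game.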

While QFR provides theoretical convergence guarantees, the current analysis yields only pseudo-polynomial dependence on the game size. A "Lazy QFR" variant achieves polynomial dependence for certain feedback types, but its lazy updates are impractical in non-tabular settings. Whether QFR's guarantees can be strengthened to polynomial dependence thus remains unresolved.

References

Further, whether it is possible to achieve polynomial, instead of pseudo-polynomial, dependence on the game size is unknown.

A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence (2408.00751 - Liu et al., 1 Aug 2024) in Conclusions and Future Work (end of paper)