Scale-invariant additive error: sufficiency and regimes
Ascertain whether the scale-invariant additive error condition |V̂(x,y_{1:h})−V*(x,y_{1:h})| ≤ V·(x) suffices for efficient test-time alignment when the mean outcome-level reward (x) is unknown; determine whether an average-case variant of this scale-invariant additive condition is sufficient; and characterize the regimes of the parameter V in which the condition enables improvements over ActionLevelRS.
References
However, it remains unclear whether \cref{ass:scale-invariant-additive} is sufficient when $(x)$ is unknown. It is also unclear whether an average-case variant of \cref{ass:scale-invariant-additive} could suffice for efficient test-time alignment: in the proof of \cref{prop:scale-invariant-additive}, the first part goes through with an average-case error bound under $$, but the second part does not. Finally, it is open in which regimes of $V$ \cref{ass:scale-invariant-additive} would enable improving upon the error guarantees of ActionLevelRS. We leave these open questions for future work.
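For concreteness, the worst-case condition and one possible average-case variant can be written side by side. This is a sketch under assumed notation, not the paper's own: $\varepsilon$ stands for the parameter called $V$ above, $\mu(x)$ for the mean outcome-level reward whose symbol is missing from this excerpt, and $\rho$ for the unspecified distribution over prefixes $y_{1:h}$. All three names are hypothetical placeholders, and the source does not state which form the average-case variant should take.

```latex
\begin{align*}
  % Worst case: the bound must hold for every prefix y_{1:h}
  &\bigl|\hat V(x, y_{1:h}) - V^\star(x, y_{1:h})\bigr| \le \varepsilon \cdot \mu(x)
    \quad \text{for all } y_{1:h}, \\
  % Average case (hypothetical form): the bound holds only in expectation under \rho
  &\mathbb{E}_{y_{1:h} \sim \rho}\Bigl[\bigl|\hat V(x, y_{1:h}) - V^\star(x, y_{1:h})\bigr|\Bigr] \le \varepsilon \cdot \mu(x).
\end{align*}
```

Because the right-hand side depends only on $x$, a pointwise bound is preserved under taking expectations. The worst-case condition therefore implies the average-case one for any choice of $\rho$. The average-case variant is strictly weaker, which is why its sufficiency is a separate question.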