Learning low logit rank models using only conditional sampling

Determine whether approximately low logit rank language models can be learned to total variation error ε in time and queries polynomial in T, d, |Σ|, α, 1/δ, and 1/ε using only a conditional sampling oracle that returns y_{t+1} sampled from M(· | y_{1:t}), i.e., without logit query access.
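To make the access model concrete, here is a minimal sketch (hypothetical names and interface, not from the paper) of what a conditional sampling oracle exposes: the learner supplies a prefix y_{1:t} and receives a single token y_{t+1} drawn from M(· | y_{1:t}), with no logits or probabilities returned.

```python
from typing import Protocol, Sequence

Token = str  # an element of the vocabulary Sigma


class ConditionalSamplingOracle(Protocol):
    """Hypothetical interface for the weaker access model: given a prefix
    y_{1:t}, return one token y_{t+1} sampled from M(. | y_{1:t}).
    Only samples are available; no logit queries."""

    def __call__(self, prefix: Sequence[Token]) -> Token:
        ...
```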

Background

The main algorithm assumes logit-query access, which many practical APIs may not provide. The authors discuss converting conditional samples to logits, but note that this conversion can require a number of samples exponential in the magnitude of the logits, making it impractical.
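To illustrate why the naive conversion is expensive, here is a minimal sketch (hypothetical helper names; the oracle `sample_next_token` is an assumption, not the paper's API) of estimating next-token log-probabilities by Monte Carlo from conditional samples. A token whose logit sits about α below the maximum has conditional probability on the order of e^{-α}, so the empirical estimate needs on the order of e^{α} samples before that token is even observed.

```python
import math
import random
from collections import Counter


def estimate_next_token_log_probs(sample_next_token, prefix, vocab, num_samples=100_000):
    """Naive conversion from conditional samples to log-probabilities
    (logits up to an additive constant) for the next token after `prefix`.

    `sample_next_token(prefix)` is a hypothetical conditional sampling oracle
    returning one token drawn from M(. | prefix).  Resolving a token whose
    conditional probability is about exp(-alpha) needs roughly exp(alpha)
    samples, which is the exponential dependence discussed above.
    """
    counts = Counter(sample_next_token(prefix) for _ in range(num_samples))
    log_probs = {}
    for token in vocab:
        p_hat = counts[token] / num_samples
        # Tokens never observed get -inf; their logits cannot be estimated
        # without exponentially many more samples.
        log_probs[token] = math.log(p_hat) if p_hat > 0 else float("-inf")
    return log_probs


if __name__ == "__main__":
    # Toy two-token model: token "b" has probability exp(-alpha), so on the
    # order of exp(alpha) samples are needed before its estimate is nonzero.
    alpha = 12.0
    p_rare = math.exp(-alpha)  # roughly 6e-6

    def toy_oracle(prefix):
        return "b" if random.random() < p_rare else "a"

    est = estimate_next_token_log_probs(toy_oracle, prefix=("a",), vocab=["a", "b"])
    print(est)  # "b" is typically never sampled, so its estimate stays -inf
```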

They pose the problem of obtaining the same learning guarantees using only conditional sampling, with polynomial complexity in all relevant parameters, and without the exponential overhead inherent in naive conversions from samples to logits.

References

We ask if it is possible to obtain our results only under this weaker access (without suffering exponential dependence on the value of the logits, as in the remark on conditional sampling): Can we learn (approximately) low-logit rank models to error $\varepsilon$ in $\mathrm{poly}(T, d, |\Sigma|, \alpha, 1/\delta, 1/\varepsilon)$ time using only a conditional sampling oracle?

Golowich et al., "Provably Learning from Modern Language Models via Low Logit Rank," arXiv:2512.09892, 10 Dec 2025. See the Conclusions and Future Directions section, "Learning from conditional samples" paragraph.