Reliability and appropriate complexity of probe methods
Ascertain how reliable probe-based methods are for evaluating whether intermediate neural representations encode world-model state. Determine the appropriate function complexity for the probes used in such analyses, so that conclusions drawn about a model's internal structure are valid.
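To make the complexity question concrete, the following is a minimal sketch on synthetic data, not the setup of any cited work. It assumes random Gaussian vectors as stand-ins for hidden representations, two hypothetical binary "states" (one linearly encoded, one XOR-encoded), and closed-form ridge-regression probes in place of trained classifiers. All function names (`fit_ridge_probe`, `quadratic_features`, `run_probe_comparison`) are illustrative.

```python
import numpy as np


def fit_ridge_probe(X, y, lam=1e-2):
    """Fit a ridge-regression probe onto +/-1 targets (closed form)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ (2.0 * y - 1.0))


def probe_accuracy(X, y, w):
    """Held-out accuracy of a probe that predicts state 1 when its score is > 0."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))


def quadratic_features(X):
    """Raw features plus all degree-2 products: a higher-complexity probe input."""
    i, j = np.triu_indices(X.shape[1])
    return np.hstack([X, X[:, i] * X[:, j]])


def run_probe_comparison(seed=0, n=4000, d=4):
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((n, d))  # assumed stand-in for hidden representations
    states = {
        # state readable by a linear function of H
        "linear": (H[:, 0] > 0).astype(int),
        # state that no linear function of H can read out
        "xor": ((H[:, 0] > 0) ^ (H[:, 1] > 0)).astype(int),
    }
    probes = {"linear": H, "quadratic": quadratic_features(H)}
    split = n // 2  # first half trains the probe, second half evaluates it
    results = {}
    for state_name, y in states.items():
        for probe_name, feats in probes.items():
            w = fit_ridge_probe(feats[:split], y[:split])
            results[(state_name, probe_name)] = probe_accuracy(
                feats[split:], y[split:], w
            )
    return results


results = run_probe_comparison()
for (state_name, probe_name), acc in sorted(results.items()):
    print(f"state={state_name:6s} probe={probe_name:9s} held-out acc={acc:.3f}")
```

In this toy setting the linear probe should stay near chance on the XOR-encoded state while the quadratic probe recovers it, so a negative result from a low-complexity probe does not show that the state is absent. The opposite risk also applies: a sufficiently expressive probe may compute the state itself rather than read it out of the representation, which is why the choice of probe complexity affects what can be concluded.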
References
However, there are open questions about the reliability of probes \citep{belinkov2022probing}, such as appropriate function complexity \citep{alain2018understanding, cao2021low, li2023othello}.
— What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
(2507.06952 - Vafa et al., 9 Jul 2025) in Section 6 (Related Work)