
Self-evaluation of language model capabilities

Determine how to measure the capabilities of a language model that are unknown a priori, including whether and how the language model itself can be used to evaluate its own capabilities or those of similar models, and establish externally verifiable procedures to ensure the validity of such measurements.


Background

The authors caution against relying on an LLM to verify its own capabilities, highlighting the risks inherent in using the subject of analysis as its own evaluator. They pose a direct question about measuring capabilities that are not known in advance, underscoring a methodological gap.

They suggest a potential direction: expanding a set of objectively verifiable characteristics and checking them externally. They also note evidence that LM-as-judge methods can be unreliable and inconsistent across models, which motivates robust, externally validated approaches to capability measurement.
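The contrast between LM-as-judge scoring and externally verifiable checks can be illustrated with a minimal sketch. Everything here is hypothetical: `model_sort` stands in for a model call, and list sorting stands in for a capability whose correctness admits an objective predicate, so a programmatic checker (rather than another LM) decides pass/fail.

```python
import random

# Hypothetical stand-in for an LLM call; a real harness would query a model
# and parse its response. Here it simply returns a sorted copy.
def model_sort(xs):
    return sorted(xs)

def externally_verified_pass_rate(model_fn, n_trials=100, seed=0):
    """Score a claimed capability (list sorting) with a programmatic checker
    instead of an LM judge: correctness is decided by an objective predicate,
    so the measurement is externally verifiable and reproducible."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(n_trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(1, 20))]
        out = model_fn(list(xs))
        # Objective check: the output equals the sorted input.
        if out == sorted(xs):
            passes += 1
    return passes / n_trials

rate = externally_verified_pass_rate(model_sort)
```

The open problem, of course, is that such objective predicates exist only for a narrow slice of capabilities; the sketch shows the kind of external check the authors' proposed direction would need to generalize.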

References

How can a system measure capabilities we don't know it has?

Benchmarks as Microscopes: A Call for Model Metrology (2407.16711 - Saxon et al., 22 Jul 2024) in Limitations (and rebuttals), paragraph titled “LMs evaluating LMs?”