Open Question: Interpreting the Magnitude of Cross-Modal Alignment Scores
Determine whether a mutual nearest-neighbor alignment score of approximately 0.16 between language and vision model representations reflects strong alignment, with the residual gap attributable to noise, or poor alignment, with substantial representational differences left to explain.
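To make the magnitude of such a score concrete, below is a minimal sketch of how a mutual k-nearest-neighbor alignment score of this kind can be computed between two sets of paired representations. The function name `mutual_knn_alignment`, the choice of cosine similarity, and the neighborhood size `k` are illustrative assumptions, not the paper's exact implementation; the score is the average fraction of nearest neighbors shared across the two spaces, so 0.16 means roughly 16% neighbor overlap on average.

```python
import numpy as np

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mutual k-nearest-neighbor alignment between two paired feature sets.

    feats_a, feats_b: shape (n_samples, dim_a) and (n_samples, dim_b), where
    row i of each array corresponds to the same underlying input.
    Returns the mean fraction of shared neighbors per sample (0 to 1).
    NOTE: illustrative sketch; the paper's exact metric choices may differ.
    """
    def knn_indices(feats: np.ndarray) -> np.ndarray:
        # Cosine-similarity nearest neighbors, excluding each point itself.
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)
        return np.argsort(-sims, axis=1)[:, :k]

    nn_a = knn_indices(feats_a)
    nn_b = knn_indices(feats_b)

    overlaps = [
        len(set(row_a) & set(row_b)) / k
        for row_a, row_b in zip(nn_a, nn_b)
    ]
    return float(np.mean(overlaps))

# Example: unrelated random features give near-chance overlap (about k / n),
# while well-aligned representations push the score toward 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 64))
y = rng.normal(size=(1000, 32))
print(mutual_knn_alignment(x, y, k=10))  # roughly 0.01 for random features
```

Under this reading, 0.16 sits well above the chance baseline for large sample sets but far below 1, which is exactly why its interpretation is left open.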
References
Is a score of $0.16$ indicative of strong alignment with the remaining gap being “noise” or does it signify poor alignment with major differences left to explain? We leave this as an open question.
                — The Platonic Representation Hypothesis
                
                (Huh et al., arXiv:2405.07987, 13 May 2024), Section 6 (Counterexamples and limitations), paragraph “Lots left to explain”