Explain bimodal similarity in non-CC-licensed Grokipedia articles
Determine the underlying cause(s) of the bimodal distribution observed in per-article average chunk cosine similarity between non-Creative-Commons-licensed Grokipedia articles and their corresponding English Wikipedia articles. In particular, rigorously evaluate whether article length or chunk position explains the higher-similarity mode.
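As a starting point, the two candidate explanations could be probed roughly as follows. This is a minimal sketch and not the method of the cited paper. It assumes each article is already available as a pair of chunk-embedding matrices (one row per chunk), and that each Grokipedia chunk is scored against its best-matching Wikipedia chunk. The function names (`chunk_similarities`, `length_vs_similarity`, `similarity_by_position`) and the matching rule are hypothetical choices made for illustration.

```python
import numpy as np

def chunk_similarities(grok_emb, wiki_emb):
    """Cosine similarity of each Grokipedia chunk to its best-matching
    Wikipedia chunk (the matching rule is an assumption of this sketch)."""
    g = grok_emb / np.linalg.norm(grok_emb, axis=1, keepdims=True)
    w = wiki_emb / np.linalg.norm(wiki_emb, axis=1, keepdims=True)
    return (g @ w.T).max(axis=1)

def spearman(x, y):
    """Spearman rank correlation without tie correction.
    Article lengths often tie, so a tie-aware routine such as
    scipy.stats.spearmanr would be preferable in a real analysis."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def length_vs_similarity(articles):
    """articles: list of (grok_emb, wiki_emb) pairs, one pair per article.
    Returns chunk counts, per-article mean similarity, and their rank correlation."""
    lengths = np.array([g.shape[0] for g, _ in articles])
    means = np.array([chunk_similarities(g, w).mean() for g, w in articles])
    return lengths, means, spearman(lengths, means)

def similarity_by_position(articles, max_pos):
    """Mean similarity at each chunk index, averaged over the articles
    that have a chunk at that index (NaN where no article does)."""
    sums = np.zeros(max_pos)
    counts = np.zeros(max_pos)
    for g, w in articles:
        s = chunk_similarities(g, w)[:max_pos]
        sums[:len(s)] += s
        counts[:len(s)] += 1
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

A strongly negative rank correlation from `length_vs_similarity` would be consistent with the length hypothesis. A profile from `similarity_by_position` that is high for early chunks and decays afterwards would point instead to chunk position, since short articles consist mostly of early chunks. Separating the two would likely require comparing articles of different lengths at the same chunk positions.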
References
We do not know precisely why the non-CC-licensed entry distribution shows bimodality, but speculate that the higher peak corresponds to shorter non-CC-licensed articles.
— What did Elon change? A comprehensive analysis of Grokipedia
(2511.09685 - Triedman et al., 12 Nov 2025) in Section 3.2 Similarity