Impact of extended Levi graph tokenization on GLM behavior
Determine how removing the whitespace between concepts and edges affects the Graph Language Model (GLM). The whitespace is lost because the graph preprocessing pipeline tokenizes each node of the Levi graph individually. The open question is how this changes the resulting tokenization and the representations produced when encoding Graphs of Triplets.
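The effect can be illustrated with a minimal sketch. It uses a toy tokenizer, not the GLM's actual tokenizer: the assumption is that a word preceded by a space is marked with a leading "_", loosely mimicking how some subword tokenizers encode whitespace. The function names and the example triplet are hypothetical and chosen only for illustration.

```python
# Toy illustration only: this is NOT the tokenizer used by the GLM.
# Assumption: a word that follows a space is marked with a leading "_".

def toy_tokenize(text):
    """Split on single spaces; prefix every word after the first with '_'."""
    words = text.split(" ")
    return [words[0]] + ["_" + w for w in words[1:]]

def tokenize_jointly(triplet):
    """Tokenize the verbalized triplet as one whitespace-joined string."""
    return toy_tokenize(" ".join(triplet))

def tokenize_per_node(triplet):
    """Tokenize each Levi graph node (head, relation, tail) on its own,
    then concatenate the token lists."""
    tokens = []
    for node in triplet:
        tokens.extend(toy_tokenize(node))
    return tokens

triplet = ("black poodle", "is a", "dog")  # hypothetical example triplet
joint = tokenize_jointly(triplet)
per_node = tokenize_per_node(triplet)

# Positions where the two token sequences disagree.
differing = [i for i, (a, b) in enumerate(zip(joint, per_node)) if a != b]

print(joint)
print(per_node)
print(differing)
```

In this toy setting, the first token of every node after the first loses its whitespace marker when nodes are tokenized individually, so the two sequences differ exactly at node boundaries. Whether and how such differences change the representations of a real pretrained model is the question left open.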
References
This removes whitespace between concepts and edges, which impacts tokenization. We leave investigation of the impact of this effect to future work.
— Graph Language Models
(2401.07105 - Plenz et al., 13 Jan 2024) in Section 4, Graph Language Model — Graph preprocessing (footnote)