Scalability of multi-language syntax highlighting models beyond six languages

Determine how the convolutional neural network-based multi-language and few-shot syntax highlighting models introduced in this study (ML32/ML64/ML128 and FS32/FS64/FS128, with and without Token Normalization) perform when trained and evaluated on datasets covering more than six programming languages, in order to assess scalability and to identify potential limitations in handling larger and more diverse multilingual datasets.

Background

The paper evaluates CNN-based syntax highlighting models (single-language, multi-language, and few-shot variants) across six mainstream programming languages—Java, Kotlin, Python, C++, C#, and JavaScript—using a large, deterministic dataset derived from brute-force syntax highlighters. The proposed Token Normalization technique improves cross-language generalization, and the multi-language models match single-language performance on these six languages.

However, the authors note a limitation: the experiments and validations are confined to six languages, and the behavior of the proposed multi-language and few-shot models in scenarios involving a larger number of languages remains untested. Given that state-of-practice tools like Pygments support hundreds of languages, establishing performance beyond six languages is critical to understanding the scalability and real-world applicability of the approach.
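To make the proposed extension concrete, the sketch below shows one way a larger multilingual dataset could be assembled using Pygments lexers as the deterministic labeling oracle, mirroring the brute-force-highlighter-derived data described in the paper. This is an illustrative assumption, not the authors' pipeline: the extra language list, the coarse token-to-class mapping, and the output format are hypothetical, while the Pygments calls (`get_lexer_by_name`, `get_tokens`) are standard API.

```python
"""Illustrative sketch (not from the paper): extending the labeled dataset
beyond six languages by using Pygments lexers as a deterministic oracle.
The language list and label mapping below are assumptions for illustration."""
from pygments.lexers import get_lexer_by_name
from pygments.token import Token

# Hypothetical additional languages for a scalability experiment.
EXTRA_LANGUAGES = ["rust", "go", "ruby", "php", "swift", "scala"]


def coarse_class(token_type):
    """Map a Pygments token type to a coarse highlighting class.
    The paper's actual label set may differ; this mapping is assumed."""
    if token_type in Token.Comment:
        return "comment"
    if token_type in Token.String:
        return "string"
    if token_type in Token.Keyword:
        return "keyword"
    if token_type in Token.Name.Function or token_type in Token.Name.Class:
        return "declaration"
    return "other"


def label_snippet(source_code: str, language: str):
    """Produce (lexeme, class) pairs for one snippet, with Pygments as oracle."""
    lexer = get_lexer_by_name(language)
    return [
        (value, coarse_class(ttype))
        for ttype, value in lexer.get_tokens(source_code)
        if value.strip()  # skip pure whitespace tokens
    ]


if __name__ == "__main__":
    sample = 'fn main() { println!("hello"); } // entry point'
    for pair in label_snippet(sample, "rust"):
        print(pair)
```

Because the labels are produced deterministically per language, the same procedure could in principle be repeated for any of the hundreds of languages Pygments supports, which is what a scalability study beyond six languages would require.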

References

"However, the performance of these models in scenarios involving more than six languages has not been investigated." — Multi Language Models for On-the-Fly Syntax Highlighting (2510.04166, Palma et al., 5 Oct 2025), Section 3.5 (Threats to Validity)