Cause of degraded vector-retrieval performance for smaller code LLMs
Determine whether the markedly worse vector-retrieval baseline performance observed with StarCoder2-15B, relative to a larger model (GPT-4), stems from heightened sensitivity to erroneous syntax introduced into the prompt by chunk truncation, and quantify the extent to which such truncation-induced syntax errors account for the performance gap.
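The mechanism under investigation can be illustrated with a minimal sketch. The snippet below (a hypothetical example, not code from the cited work) splits a source file into fixed-size character chunks, as a naive vector-retrieval pipeline might, and uses Python's `ast` module to check which chunks remain syntactically valid. Truncation that ignores syntax boundaries typically yields invalid fragments of exactly the kind the conjecture blames for degrading a smaller model's completions.

```python
import ast

# Hypothetical source file to be chunked for retrieval.
SOURCE = (
    "def area(w, h):\n"
    "    return w * h\n"
    "\n"
    "def perimeter(w, h):\n"
    "    return 2 * (w + h)\n"
)

def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size chunks, ignoring syntax boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as syntactically valid Python."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

chunks = chunk(SOURCE, 40)
validity = [is_valid_python(c) for c in chunks]
# The full file parses, but fixed-size chunks cut functions mid-definition,
# so at least one chunk fails to parse.
```

If a smaller model such as StarCoder2-15B is more sensitive than GPT-4 to such malformed fragments appearing in its prompt, this chunking artifact alone could account for part of the observed gap.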
References
Vector retrieval baseline performance was significantly worse (in absolute and relative terms) than with the larger model. We conjecture that this is due to a heightened sensitivity to erroneous syntax in the prompt created by chunk truncation.
— Statically Contextualizing Large Language Models with Typed Holes
(2409.00921 - Blinn et al., 2 Sep 2024) in Subsection “Hazel StarCoder2-15B Results”