Understanding performance degradation with more in-context examples
Determine why performance in many-shot in-context learning can degrade as the number of in-context examples in the prompt increases, focusing on the Hendrycks MATH dataset, where accuracy declines as the shot count grows, and explain why negative log-likelihood trends fail to account for this behavior.
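One way to make the mismatch between the two signals concrete is to compare them step by step across shot counts. The sketch below is illustrative only and is not the paper's analysis: the function name `divergent_steps` and its inputs are hypothetical, and it assumes you already have a per-shot-count negative log-likelihood (lower is better) and accuracy (higher is better) from your own evaluation runs. It flags consecutive shot counts where NLL improved while accuracy fell, which is the pattern that an NLL trend alone would not predict.

```python
from typing import List, Sequence, Tuple


def divergent_steps(
    shots: Sequence[int],
    nll: Sequence[float],
    accuracy: Sequence[float],
) -> List[Tuple[int, int]]:
    """Return (k_prev, k_next) pairs where NLL fell but accuracy also fell.

    Hypothetical helper: `shots` is an increasing list of shot counts, and
    `nll` / `accuracy` hold the metric measured at each shot count.
    """
    if not (len(shots) == len(nll) == len(accuracy)):
        raise ValueError("shots, nll and accuracy must have equal length")

    flagged = []
    for i in range(1, len(shots)):
        nll_improved = nll[i] < nll[i - 1]          # lower NLL is better
        acc_dropped = accuracy[i] < accuracy[i - 1]  # lower accuracy is worse
        if nll_improved and acc_dropped:
            flagged.append((shots[i - 1], shots[i]))
    return flagged
```

An empty result means the two metrics moved consistently at every step; any flagged pair marks a range of shot counts where some other explanation is needed.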
References
Another limitation of our work is that we don't completely understand why performance can sometimes degrade with more examples in the prompt (for example, for MATH). Our analysis found that negative log-likelihood trends are insufficient to explain this degradation, and future work should focus on investigating new research directions to shed light on the matter.
Many-Shot In-Context Learning (arXiv:2404.11018, Agarwal et al., 17 Apr 2024), Limitations paragraph (following Conclusion)