Evaluation methodologies for long-context abilities in LLMs
Develop rigorous, standardized evaluation methodologies for assessing the long-context abilities of large language models on input sequences that exceed their pretraining context window lengths.
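One common instance of such an evaluation is a passkey-retrieval ("needle in a haystack") sweep over context length and needle depth. The sketch below is illustrative only, not a method from the cited paper: the function names (`build_passkey_prompt`, `run_eval`), the filler text, and the `model_fn` callable are all assumptions, and `model_fn` stands in for whatever model API is under test.

```python
import random


def build_passkey_prompt(passkey: str, filler_sentences: int, depth: float) -> str:
    """Embed a passkey sentence at a relative depth inside filler text.

    `depth` is in [0, 1]: 0 places the needle at the start, 1 at the end.
    """
    sentences = ["The grass is green", "The sky is blue"] * (filler_sentences // 2)
    pos = int(len(sentences) * depth)
    sentences.insert(pos, f"The pass key is {passkey}. Remember it")
    context = ". ".join(sentences) + "."
    return f"{context}\n\nWhat is the pass key?"


def score_retrieval(model_answer: str, passkey: str) -> bool:
    """Exact-match scoring: did the model's answer contain the passkey?"""
    return passkey in model_answer


def run_eval(model_fn, lengths, depths, trials=5, seed=0):
    """Sweep (context length x needle depth); return per-cell accuracy."""
    rng = random.Random(seed)
    results = {}
    for n in lengths:
        for d in depths:
            correct = 0
            for _ in range(trials):
                key = str(rng.randint(10000, 99999))
                prompt = build_passkey_prompt(key, n, d)
                correct += score_retrieval(model_fn(prompt), key)
            results[(n, d)] = correct / trials
    return results
```

A standardized harness would additionally fix the filler distribution, tokenizer-based length accounting, and depth grid, so that scores are comparable across models and across context-extension methods.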
References
Additionally, evaluation methodologies for assessing long context abilities remain open research questions.
— LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
(Jin et al., arXiv:2401.01325, 2 Jan 2024), in "Conclusion and Discussion: Limitations"