Bias introduced by chunkization in long-document summarization
Determine whether the chunkization approach to summarizing long documents introduces systematic bias relative to a single-pass summary. Under chunkization, the document is split into smaller segments, each segment is summarized with a large language model such as GPT-3.5-turbo or GPT-4, and the segment summaries are aggregated; the baseline is a summary generated in one pass by a large language model whose context window is large enough to hold the whole document. One potential bias to check is inflation of the combined summary's length.
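The comparison described above can be sketched in a few lines of Python. This is only an illustrative setup, not a method taken from the cited paper: the function names (`split_into_chunks`, `chunked_summary`, `length_inflation`) are hypothetical, word counts stand in for tokens, aggregation is assumed to be plain concatenation, and `toy_summarize` is a placeholder for a real model call that simply keeps the first few words.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], chunk_size: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most chunk_size words."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def chunked_summary(text: str, summarize: Callable[[str], str], chunk_size: int) -> str:
    """Summarize each chunk separately, then aggregate by concatenation (an assumption)."""
    chunks = split_into_chunks(text.split(), chunk_size)
    partial_summaries = [summarize(" ".join(chunk)) for chunk in chunks]
    return " ".join(partial_summaries)


def length_inflation(text: str, summarize: Callable[[str], str], chunk_size: int) -> float:
    """Ratio of the combined chunk-summary length to the single-pass summary length, in words."""
    single_pass = summarize(text)
    combined = chunked_summary(text, summarize, chunk_size)
    return len(combined.split()) / max(1, len(single_pass.split()))


def toy_summarize(text: str, max_words: int = 20) -> str:
    """Placeholder for an LLM call (hypothetical): keep the first max_words words."""
    return " ".join(text.split()[:max_words])


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(100))
    print(length_inflation(document, toy_summarize, chunk_size=25))
```

Because the placeholder returns a fixed-length summary for every input, the ratio it produces only demonstrates the measurement; it says nothing about how an actual model behaves. To study the open question, `toy_summarize` would be replaced by calls to the model under test, and the ratio would be one of several statistics compared between the chunked and single-pass summaries.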
References
It is not clear whether chunkization may introduce bias for long documents.
— A Scoping Review of ChatGPT Research in Accounting and Finance
(2412.05731 - Dong et al., 7 Dec 2024) in Appendix: Technical Guide — Context Window