Detailed evaluation of square-root context growth in gather configuration

Evaluate the effectiveness of using a square-root growth function to control the amount of peripheral context included by the gather operator as the document-to-chunk size ratio increases, and determine whether this choice outperforms alternative growth strategies for complex document processing tasks.

Background

In the generation agent’s strategy for configuring gather operations, the authors choose a square-root function to scale the amount of peripheral context as the ratio of document size to chunk size increases. This heuristic is intended to avoid overwhelming the LLM with excessive context while preserving useful surrounding information.

The paper notes that this choice is based on empirical observations, and that a more rigorous analysis comparing this function against alternatives has not yet been conducted.
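To make the comparison concrete, the following is a minimal sketch of how square-root growth differs from two plausible alternatives (logarithmic and linear) as the document-to-chunk ratio increases. The function name `peripheral_chunks`, the `growth` parameter, and the choice to count peripheral context in whole chunks are assumptions made for illustration; the paper does not give the exact formula used by the generation agent.

```python
import math


def peripheral_chunks(doc_size: int, chunk_size: int, growth: str = "sqrt") -> int:
    """Hypothetical helper: how many neighboring chunks to include as
    peripheral context, given a growth function of doc_size / chunk_size.

    This is an illustrative assumption, not DocETL's actual implementation.
    """
    if doc_size <= 0 or chunk_size <= 0:
        raise ValueError("doc_size and chunk_size must be positive")

    ratio = doc_size / chunk_size

    if growth == "sqrt":
        raw = math.sqrt(ratio)        # sublinear: benefit assumed to taper off
    elif growth == "log":
        raw = math.log2(ratio + 1)    # grows even more slowly than sqrt
    elif growth == "linear":
        raw = ratio                   # context grows in step with the document
    else:
        raise ValueError(f"unknown growth function: {growth!r}")

    # Always include at least one peripheral chunk.
    return max(1, math.floor(raw))


# Compare the three strategies at a fixed chunk size of 1,000 tokens.
for ratio in (4, 16, 64, 256):
    counts = {
        g: peripheral_chunks(ratio * 1000, 1000, g)
        for g in ("log", "sqrt", "linear")
    }
    print(ratio, counts)
```

Under these assumptions, linear growth quickly includes far more context than the other two as the ratio rises, while square-root growth sits between logarithmic and linear. Which of these gives the best task accuracy is exactly the question the paper leaves open.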

References

The choice of square root is based on empirical observations that the benefit of additional context tends to diminish more drastically as more context is added; a detailed evaluation is left for future work.

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing (2410.12189 - Shankar et al., 16 Oct 2024) in Section 4.2 Agent and System Implementation (Generation Agents — Chunk Sizes)