Impact of text-as-image prompting at extremely large context lengths

Determine the effectiveness and limitations of representing text as images for multimodal LLMs when contexts span tens of thousands of tokens, quantifying its impact on accuracy, efficiency, and latency, and assessing whether specialized techniques are required to ensure reliable performance at this scale.

References

Despite showing promising token savings in short- to medium-context scenarios, our work has not yet fully evaluated the impact of text-as-image prompting on extremely large contexts that span tens of thousands of tokens or more.