
Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs (2305.16344v2)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: LLMs demonstrate exceptional performance in textual understanding and tabular reasoning tasks. However, their ability to comprehend and analyze hybrid text, which contains both textual and tabular data, remains underexplored. In this research, we focus on harnessing the potential of LLMs to comprehend critical information in financial reports, which are hybrid long documents. We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports. To evaluate AFIE, we develop a Financial Reports Numerical Extraction (FINE) dataset and conduct an extensive experimental analysis. We validate the framework on GPT-3.5 and GPT-4, where it yields average accuracy increases of 53.94% and 33.77%, respectively, compared to a naive method. These results suggest that the AFIE framework improves the accuracy of automated numerical extraction from complex, hybrid documents.

Authors (9)
  1. Chongjian Yue (3 papers)
  2. Xinrun Xu (15 papers)
  3. Xiaojun Ma (13 papers)
  4. Lun Du (50 papers)
  5. Hengyu Liu (30 papers)
  6. Zhiming Ding (14 papers)
  7. Yanbing Jiang (4 papers)
  8. Shi Han (74 papers)
  9. Dongmei Zhang (193 papers)
Citations (3)