
PDFTriage: Question Answering over Long, Structured Documents (2309.08872v2)

Published 16 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document and representing it as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on GitHub.

Overview of "AAAI Press Formatting Instructions for Authors Using LaTeX"

The paper "AAAI Press Formatting Instructions for Authors Using LaTeX" is a detailed guide that helps authors prepare manuscripts for submission to Association for the Advancement of Artificial Intelligence (AAAI) conferences. AAAI sets rigid formatting requirements and uses specific LaTeX style files to keep submissions uniform. Authored by the AAAI Press staff with contributions from other experienced individuals, the document provides the detailed specifications required for publishing accepted papers.

Key Instructions and Requirements

The primary objective of the document is to establish a standardized approach for preparing papers, focusing on the use of the LaTeX document preparation system. It delineates the mandatory steps authors must follow, with a substantial emphasis on ensuring that papers are consistently formatted according to the prescribed AAAI style.

  • Usage of Style Files: Authors are required to apply the 2024 AAAI Press LaTeX style file, aaai24.sty, along with the aaai24.bst bibliography style file, which are both included in the AAAI Author Kit. This condition is non-negotiable, ensuring that all manuscripts maintain a consistent appearance across the publication.
  • Document Compliance: Comprehensive guidelines for document formatting include constraints on font types, document margins, line spacing, and the prohibition of certain commands that could alter the preset template. Authors must ensure no modifications to the style file and adhere strictly to the guidelines presented, failing which the paper will not be accepted.
  • Submission Specifications: There is a clear listing of submission components, notably requiring a compliant PDF and the LaTeX source file. Submissions are examined to ensure they compile without error using a standard TeX distribution, furthering the emphasis on consistency across different platforms.
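
The style-file requirement above can be illustrated with a minimal manuscript skeleton. This is only a sketch, assuming that aaai24.sty and aaai24.bst from the AAAI Author Kit sit in the same directory as the source file; the file name references.bib and the placeholder text are hypothetical, and the template shipped in the Author Kit should be treated as authoritative for any detail not shown here (for example, whether the bibliography style must be declared explicitly).

```latex
% Minimal sketch, not the official template.
% Assumes aaai24.sty and aaai24.bst (from the AAAI Author Kit) are on the TeX path.
\documentclass[letterpaper]{article}
\usepackage{aaai24}  % required style file; must be used unmodified

\begin{document}

% Title, authors, abstract, and body go here, following the kit's template.
Placeholder body text.

% Assumption: references.bib is a hypothetical bibliography file.
% The kit's aaai24.bst supplies the citation style; if the style file
% already selects it, no separate \bibliographystyle line is needed.
\bibliography{references}

\end{document}
```

Compiling the source with a standard TeX distribution before submission mirrors the check described in the submission specifications.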

Technical and Practical Implications

The stringent requirements outlined in the guide highlight the importance AAAI places on consistency, readability, and professional presentation of academic work. By standardizing the format of submissions, AAAI streamlines the editorial process and makes the published papers easier to read, so that each piece of research is presented on an equal footing.

  • Technical Rigor: The guide reflects a high degree of technical rigor, demanding comprehensive attention to detail from researchers during the preparation of their manuscripts. Such a process necessitates a strong familiarity with LaTeX, challenging authors to refine their presentation skills in scientific writing.
  • Facilitation of Review and Publication Processes: Standardization aids the review process, enabling reviewers to focus on content without distractions from formatting issues. Additionally, uniform formatting simplifies the production of proceedings, thereby expediting the publication timeline.

Anticipated Future Developments

Given the fast pace of technological advancements, future iterations of the AAAI Author Kit might integrate more advanced tools that leverage artificial intelligence for automatic compliance checks. Looking ahead, researchers can expect more interactive systems that offer real-time feedback on formatting issues, potentially reducing common errors prior to submission.

Overall, the "Formatting Instructions" paper serves as a comprehensive manual that underlines AAAI's expectations for manuscript preparation, facilitating the clarity and uniformity essential in high-quality academic dissemination.

Authors (7)
  1. Jon Saad-Falcon (19 papers)
  2. Joe Barrow (12 papers)
  3. Alexa Siu (13 papers)
  4. Ani Nenkova (26 papers)
  5. David Seunghyun Yoon (3 papers)
  6. Ryan A. Rossi (124 papers)
  7. Franck Dernoncourt (161 papers)
Citations (10)