Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering (2210.01959v3)

Published 4 Oct 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Fine-tuning QA systems requires access to labeled data (tuples of context, question, and answer). However, data curation for document QA is uniquely challenging because the context (i.e., the answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from the extracted texts to form well-posed contexts; (3) QA over those contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
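A minimal sketch of how such a three-stage pipeline fits together, for illustration only. This is not the authors' DRC implementation: the layout-aware "detect" stage is replaced by a naive paragraph split, retrieval uses BM25 via the rank_bm25 package, and comprehension uses an off-the-shelf extractive QA model (deepset/roberta-base-squad2 is an assumed stand-in); pdfminer.six handles text extraction.

    # Sketch of a detect/retrieve/comprehend pipeline from off-the-shelf parts.
    # NOT the paper's DRC system; every component is an illustrative stand-in.
    from pdfminer.high_level import extract_text  # stage 1: PDF -> raw text
    from rank_bm25 import BM25Okapi               # stage 2: sparse retrieval
    from transformers import pipeline             # stage 3: reading comprehension

    def answer_from_pdf(pdf_path: str, question: str, top_k: int = 3) -> str:
        # Stage 1 ("detect"): extract raw text and split it into paragraph-level
        # passages -- a crude substitute for layout-aware region detection.
        text = extract_text(pdf_path)
        passages = [p.strip() for p in text.split("\n\n") if len(p.strip()) > 40]

        # Stage 2 ("retrieve"): rank passages against the question with BM25
        # and keep the top-k as candidate evidence contexts.
        bm25 = BM25Okapi([p.lower().split() for p in passages])
        contexts = bm25.get_top_n(question.lower().split(), passages, n=top_k)

        # Stage 3 ("comprehend"): run an extractive QA model over each context
        # and return the highest-scoring answer span.
        qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
        best = max((qa(question=question, context=c) for c in contexts),
                   key=lambda r: r["score"])
        return best["answer"]

    if __name__ == "__main__":
        print(answer_from_pdf("paper.pdf", "Which dataset is used for evaluation?"))

In practice the comprehension stage would also need to handle abstractive and Boolean answers (the paper supports all three answer types); a sequence-to-sequence reader could replace the extractive model for that.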

Authors (8)
  1. Tavish McDonald (2 papers)
  2. Brian Tsan (7 papers)
  3. Amar Saini (6 papers)
  4. Juanita Ordonez (2 papers)
  5. Luis Gutierrez (4 papers)
  6. Phan Nguyen (7 papers)
  7. Blake Mason (14 papers)
  8. Brenda Ng (5 papers)
Citations (3)