Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System (2401.09150v1)

Published 17 Jan 2024 in cs.CL

Abstract: In the contemporary information era, significantly accelerated by the advent of Large Language Models (LLMs), the proliferation of scientific literature is reaching unprecedented levels. Researchers urgently require efficient tools for reading and summarizing academic papers, uncovering significant scientific literature, and employing diverse interpretative methodologies. To address this burgeoning demand, automated scientific literature interpretation systems have become paramount. However, prevailing models, both commercial and open-source, face notable challenges: they often overlook multimodal data, struggle to summarize over-length texts, and lack diverse user interfaces. In response, we introduce an open-source Multi-Modal Automated Academic Paper Interpretation System (MMAPIS) built around a three-stage process and incorporating LLMs to augment its functionality. Our system first employs a hybrid modality preprocessing and alignment module to separately extract plain text and tables or figures from documents. It then aligns this information by the section names it belongs to, ensuring that data with identical section names are grouped under the same section. Next, we introduce a hierarchical discourse-aware summarization method that uses the extracted section names to divide the article into shorter text segments, enabling targeted summarization both within and between sections via LLMs with section-specific prompts. Finally, we design four diversified user interfaces, including paper recommendation, multimodal Q&A, audio broadcasting, and an interpretation blog, which can be applied across a wide range of scenarios. Our qualitative and quantitative evaluations underscore the system's superiority, especially in scientific summarization, where it outperforms solutions relying solely on GPT-4.
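
The hierarchical discourse-aware summarization stage is the core idea: split the paper at section boundaries, summarize each section on its own, then fuse the per-section summaries into a document-level summary. Below is a minimal Python sketch of that two-pass scheme, assuming the pipeline's intermediate format is Markdown with #/## headings (as PDF-to-Markdown converters typically produce); call_llm is a hypothetical stand-in for whatever chat-completion API is used, not the authors' actual implementation.

    import re

    def split_by_sections(markdown_text):
        """Split a Markdown document into (section_name, body) pairs,
        treating top- and second-level headings as discourse boundaries."""
        # With a capturing group, re.split keeps the headings in the output:
        # [preamble, heading1, body1, heading2, body2, ...]
        parts = re.split(r"^(#{1,2}\s+.+)$", markdown_text, flags=re.MULTILINE)
        sections = []
        for i in range(1, len(parts), 2):
            name = parts[i].lstrip("# ").strip()
            body = parts[i + 1].strip()
            sections.append((name, body))
        return sections

    def call_llm(prompt):
        # Hypothetical placeholder: wire this to an actual LLM client
        # (e.g. a chat-completion endpoint) before use.
        raise NotImplementedError

    def hierarchical_summary(markdown_text):
        # Pass 1 (intra-section): summarize each section with a
        # section-specific prompt, so every call stays short.
        section_summaries = []
        for name, body in split_by_sections(markdown_text):
            prompt = (f"Summarize the '{name}' section of a research paper "
                      f"in 2-3 sentences:\n\n{body}")
            section_summaries.append((name, call_llm(prompt)))
        # Pass 2 (inter-section): fuse the short per-section summaries
        # into one coherent document-level summary.
        joined = "\n".join(f"{n}: {s}" for n, s in section_summaries)
        return call_llm("Combine these per-section summaries into a "
                        "coherent summary of the whole paper:\n\n" + joined)

Because each pass operates on section-sized segments rather than the full paper, no single LLM call needs the whole document in context, which is what lets the method handle over-length papers that a single GPT-4 call cannot.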

Authors (3)
  1. Feng Jiang (97 papers)
  2. Kuang Wang (3 papers)
  3. Haizhou Li (285 papers)
Citations (3)