Towards Controlled Table-to-Text Generation with Scientific Reasoning (2312.05402v1)

Published 8 Dec 2023 in cs.CL

Abstract: The sheer volume of scientific experimental results and complex technical statements, often presented in tabular form, poses a formidable barrier to readers seeking the information they care about. Scientific reasoning and content generation that adheres to user preferences each present distinct challenges. In this work, we introduce a new task: generating fluent and logical descriptions that match user preferences over scientific tabular data, with the aim of automating scientific document analysis. To facilitate research in this direction, we construct CTRLSciTab, a new and challenging dataset of table-description pairs extracted from the scientific literature, annotated with highlighted cells and a corresponding domain-specific knowledge base. We evaluate popular pre-trained LLMs to establish a baseline and propose a novel architecture that outperforms competing approaches. The results show that large models struggle to produce accurate content that aligns with user preferences. As the first work of its kind, ours should motivate further research in scientific domains.
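Based only on the abstract's description of CTRLSciTab (table-description pairs with highlighted cells and a domain-specific knowledge base), a single example might be organized as in the minimal Python sketch below; the class name, field names, and values are hypothetical illustrations, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of one CTRLSciTab-style record, inferred from the
# abstract; field names and contents are assumptions, not the real schema.
@dataclass
class CTRLSciTabExample:
    table: List[List[str]]                  # scientific table as rows of cell strings
    highlighted_cells: List[Tuple[int, int]]  # (row, col) indices reflecting user preference
    knowledge: List[str]                    # domain-specific knowledge statements
    description: str                        # target fluent, logical description

example = CTRLSciTabExample(
    table=[["Model", "BLEU"], ["Baseline", "27.3"], ["Ours", "31.8"]],
    highlighted_cells=[(1, 1), (2, 1)],
    knowledge=["BLEU measures n-gram overlap between generated and reference text."],
    description="The proposed model improves BLEU from 27.3 to 31.8 over the baseline.",
)
print(example.description)
```

Under this reading, the generation task conditions on the table, the highlighted cells, and the retrieved knowledge to produce the description.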

References (18)
  1. “Table-to-text generation by structure-aware seq2seq learning,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  2. “Logical natural language generation from open-domain tables,” arXiv preprint arXiv:2004.10404, 2020.
  3. “Challenges in data-to-document generation,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2253–2263.
  4. “Totto: A controlled table-to-text generation dataset,” arXiv preprint arXiv:2004.14373, 2020.
  5. “The wikipedia xml corpus,” in ACM SIGIR Forum. ACM New York, NY, USA, 2006, vol. 40, pp. 64–69.
  6. “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  7. “The generality/specificity of expertise in scientific reasoning,” Cognitive science, vol. 23, no. 3, pp. 337–370, 1999.
  8. “Learning and scientific reasoning,” Science, vol. 323, no. 5914, pp. 586–587, 2009.
  9. Corinne Zimmerman, “The development of scientific reasoning skills,” Developmental review, vol. 20, no. 1, pp. 99–149, 2000.
  10. “Scigen: a dataset for reasoning-aware text generation from scientific tables,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
  11. “Towards table-to-text generation with numerical reasoning,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 1451–1465.
  12. “Building applied natural language generation systems,” Natural Language Engineering, vol. 3, no. 1, pp. 57–87, 1997.
  13. “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
  14. “Tsdae: Using transformer-based sequential denoising auto-encoderfor unsupervised sentence embedding learning,” in Findings of the Association for Computational Linguistics: EMNLP 2021, 2021, pp. 671–688.
  15. “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7871–7880.
  16. “Exploring the limits of transfer learning with a unified text-to-text transformer.,” J. Mach. Learn. Res., vol. 21, no. 140, pp. 1–67, 2020.
  17. Akiko Aizawa, “An information-theoretic perspective of tf–idf measures q,” Information Processing and Management, vol. 39, pp. 45–65, 2003.
  18. “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.