gTBLS: Generating Tables from Text by Conditional Question Answering (2403.14457v1)

Published 21 Mar 2024 in cs.CL, cs.LG, and cs.IR

Abstract: Distilling large, unstructured text into a structured, condensed form such as tables is an open research problem. One of the primary challenges in automatically generating tables is ensuring their syntactic validity. Prior approaches address this challenge by including additional parameters in the Transformer's attention mechanism to attend to specific rows and column headers. In contrast to this single-stage method, this paper presents a two-stage approach called Generative Tables (gTBLS). The first stage infers table structure (row and column headers) from the text. The second stage formulates questions using these headers and fine-tunes a causal LLM to answer them. Furthermore, the gTBLS approach is amenable to the utilization of pre-trained LLMs in a zero-shot configuration, presenting a solution for table generation in situations where fine-tuning is not feasible. gTBLS improves prior approaches by up to 10% in BERTScore on the table construction task and up to 20% on the table content generation task of the E2E, WikiTableText, WikiBio, and RotoWire datasets.
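The two-stage pipeline described in the abstract can be sketched as plain code. This is a hypothetical illustration only: the function names, question template, and toy stand-in models below are assumptions for demonstration, not the authors' implementation (stage 1 in the paper is a model that infers headers from text; stage 2 is a fine-tuned or zero-shot causal language model answering one question per cell).

```python
from typing import Callable, Dict, List, Tuple

def formulate_questions(row_headers: List[str],
                        col_headers: List[str]) -> List[Tuple[str, str, str]]:
    """Stage 2, part 1: turn every (row, column) header pair into a
    natural-language question. The template here is illustrative, not
    the paper's exact prompt."""
    return [(r, c, f"What is the {c} of {r}?")
            for r in row_headers
            for c in col_headers]

def generate_table(
    text: str,
    infer_structure: Callable[[str], Tuple[List[str], List[str]]],
    answer: Callable[[str, str], str],
) -> Dict[Tuple[str, str], str]:
    """gTBLS-style two-stage generation: stage 1 infers row and column
    headers from the text; stage 2 fills each cell by answering the
    corresponding question with a language model (fine-tuned, or a
    pre-trained model used zero-shot where fine-tuning is infeasible)."""
    rows, cols = infer_structure(text)                  # stage 1
    table: Dict[Tuple[str, str], str] = {}
    for r, c, question in formulate_questions(rows, cols):  # stage 2
        table[(r, c)] = answer(text, question)
    return table

# Toy stand-ins so the sketch runs end to end; a real system would back
# these with trained models.
def toy_structure(_text: str) -> Tuple[List[str], List[str]]:
    return ["Alice", "Bob"], ["Team", "Points"]

def toy_answer(_text: str, question: str) -> str:
    return f"<answer to: {question}>"

table = generate_table("Alice scored 12 points for the Red team ...",
                       toy_structure, toy_answer)
```

Because each cell is produced by an independent question-answering call conditioned on the source text and its headers, the table is syntactically valid by construction, in contrast to single-stage decoders that must learn row/column alignment inside the attention mechanism.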

References (36)
  1. TEXT2TABLE: medical text summarization system based on named entity recognition and modality identification. In Proceedings of the Workshop on BioNLP - BioNLP ’09, page 185, Boulder, Colorado. Association for Computational Linguistics.
  2. Table-to-Text: Describing Table Region with Natural Language. ArXiv:1805.11234 [cs].
  3. Generating a table-of-contents. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 544–551, Prague, Czech Republic. Association for Computational Linguistics.
  4. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  5. FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pages 2074–2080, Macao, China. International Joint Conferences on Artificial Intelligence Organization.
  6. Jiaao Chen and Diyi Yang. 2020. Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4106–4118, Online. Association for Computational Linguistics.
  7. Scaling Instruction-Finetuned Language Models. ArXiv:2210.11416 [cs].
  8. Structure-grounded pretraining for text-to-SQL. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1337–1350, Online. Association for Computational Linguistics.
  9. QA-driven zero-shot slot filling with weak supervision pretraining. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 654–664, Online. Association for Computational Linguistics.
  10. Improved and efficient conversational slot labeling through question answering.
  11. A weakly-supervised approach for discovering new user intents from search query logs. In Interspeech 2013, pages 3780–3784. ISCA.
  12. Larry Heck and Simon Heck. 2020. Zero-shot visual slot filling as question answering. CoRR, abs/2011.12340.
  13. mForms: Multimodal form-filling with question answering. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin, Italy.
  14. Dr. summarize: Global summarization of medical dialogue by exploiting local structures. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3755–3763, Online. Association for Computational Linguistics.
  15. Neural text generation from structured data with application to the biography domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203–1213, Austin, Texas. Association for Computational Linguistics.
  16. Zero-shot relation extraction via reading comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 333–342, Vancouver, Canada. Association for Computational Linguistics.
  17. A unified MRC framework for named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5849–5859, Online. Association for Computational Linguistics.
  18. Decoupled weight decay regularization.
  19. Unified structure generation for universal information extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5755–5772, Dublin, Ireland. Association for Computational Linguistics.
  20. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. ArXiv:1808.09602 [cs].
  21. HybriDialogue: An information-seeking dialogue dataset grounded on tabular and textual data. In Findings of the Association for Computational Linguistics: ACL 2022, pages 481–492, Dublin, Ireland. Association for Computational Linguistics.
  22. Abstractive text summarization using sequence-to-sequence RNNs and beyond.
  23. Language model is all you need: Natural language understanding as question answering. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7803–7807.
  24. The E2E Dataset: New Challenges For End-to-End Generation. ArXiv:1706.09254 [cs].
  25. Data synthesis based on generative adversarial networks. Proceedings of the VLDB Endowment, 11(10):1071–1083.
  26. STable: Table Generation Framework for Encoder-Decoder Models. ArXiv:2206.04045 [cs].
  27. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics.
  28. Unsupervised abstractive meeting summarization with multi-sentence compression and budgeted submodular maximization. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 664–674, Melbourne, Australia. Association for Computational Linguistics.
  29. Anirudh S. Sundar and Larry Heck. 2023. cTBLS: Augmenting large language models with conversational tables. In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023), pages 59–70, Toronto, Canada. Association for Computational Linguistics.
  30. Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? ArXiv:2309.08963 [cs].
  31. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  32. Attention is all you need. Advances in neural information processing systems, 30.
  33. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark. Association for Computational Linguistics.
  34. Text-to-table: A new way of information extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2518–2533, Dublin, Ireland. Association for Computational Linguistics.
  35. Lei Xu and Kalyan Veeramachaneni. 2018. Synthesizing Tabular Data using Generative Adversarial Networks. ArXiv:1811.11264 [cs, stat].
  36. BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations.
Authors (3)
  1. Anirudh Sundar (8 papers)
  2. Christopher Richardson (8 papers)
  3. Larry Heck (41 papers)
Citations (4)
