iTBLS: A Dataset of Interactive Conversations Over Tabular Information (2404.12580v1)
Abstract: This paper introduces Interactive Tables (iTBLS), a dataset of interactive conversations situated in tables from scientific articles. This dataset is designed to facilitate human-AI collaborative problem-solving through AI-powered multi-task tabular capabilities. In contrast to prior work that models interactions as factoid QA or procedure synthesis, iTBLS broadens the scope of interactions to include mathematical reasoning, natural language manipulation, and expansion of existing tables from natural language conversation. It does so by categorizing each interaction as one of three tasks: interpretation, modification, or generation. Additionally, the paper presents a suite of baseline approaches to iTBLS, using zero-shot prompting and parameter-efficient fine-tuning for different computing settings. We also introduce a novel multi-step approach and show how it can be combined with parameter-efficient fine-tuning to achieve state-of-the-art results on iTBLS, outperforming standard parameter-efficient fine-tuning by up to 15% on interpretation, 18% on modification, and 38% on generation.
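To make the three-way task split concrete, here is a minimal sketch on a toy table. The schema (`Task`, `Interaction`), the helper names (`interpret_mean`, `modify_cell`, `generate_row`), and the list-of-rows table format are all hypothetical illustrations, not the iTBLS data format or the authors' models. In iTBLS each request arrives as a natural-language conversation turn and is handled by a language model, not by hand-written functions like these.

```python
# Hypothetical sketch: not the iTBLS schema or the authors' method.
from dataclasses import dataclass
from enum import Enum
from typing import List

# Assumed table format: first row is the header, all cells are strings.
Table = List[List[str]]


class Task(Enum):
    INTERPRETATION = "interpretation"  # answer questions, e.g. arithmetic over cells
    MODIFICATION = "modification"      # edit existing cells
    GENERATION = "generation"          # expand the table with new content


@dataclass
class Interaction:
    task: Task     # each conversation turn is labeled with exactly one task
    request: str   # the user's natural-language turn


def interpret_mean(table: Table, column: int) -> float:
    # Interpretation example (mathematical reasoning): average a numeric
    # column, skipping the header row.
    values = [float(row[column]) for row in table[1:]]
    return sum(values) / len(values)


def modify_cell(table: Table, row: int, column: int, value: str) -> Table:
    # Modification example: return a copy with one cell replaced,
    # leaving the input table untouched.
    new_table = [list(r) for r in table]
    new_table[row][column] = value
    return new_table


def generate_row(table: Table, new_row: List[str]) -> Table:
    # Generation example: expand the table by appending a row whose
    # width matches the header.
    if len(new_row) != len(table[0]):
        raise ValueError("row width must match the header")
    return [list(r) for r in table] + [list(new_row)]
```

For instance, with `table = [["Model", "F1"], ["A", "80.0"], ["B", "70.0"]]`, `interpret_mean(table, 1)` returns `75.0`, `modify_cell(table, 2, 1, "72.5")` yields a copy with B's score changed, and `generate_row(table, ["C", "65.0"])` yields a copy with a fourth row. Returning copies keeps each turn's input table intact, which loosely mirrors how a multi-turn conversation builds on earlier table states.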