Automatic Logical Forms improve fidelity in Table-to-Text generation (2310.17279v2)
Abstract: Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LFs) that represent the selected content and the semantics of the target text. Because of that manual step, it was not clear whether automatic LFs would be effective, or whether the improvement came from content selection alone. We present TlT which, given a table and a selection of the content, first produces LFs and then the textual statement. We show for the first time that automatic LFs improve quality, with an increase in fidelity of 30 points over a comparable system not using LFs. Our experiments allow us to quantify the remaining challenges for high factual correctness: automatic selection of content comes first, followed by better Logic-to-Text generation and, to a lesser extent, better Table-to-Logic parsing.
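The two-stage flow described in the abstract (table plus content selection, then a logical form, then a statement) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `ArgmaxHop` class, the function names, the bracketed LF notation, and the example table are hypothetical, and the rule-based stages stand in for the learned Table-to-Logic and Logic-to-Text models, which are not described in enough detail here to reproduce.

```python
from typing import Dict, List, NamedTuple, Union

Row = Dict[str, Union[str, int]]


class ArgmaxHop(NamedTuple):
    """Hypothetical logical form: take the row maximizing `rank_by`, read `report`."""
    rank_by: str
    report: str

    def __str__(self) -> str:
        # Illustrative bracketed notation, not necessarily the paper's exact syntax.
        return f"hop {{ argmax {{ all_rows ; {self.rank_by} }} ; {self.report} }}"


def table_to_logic(selection: Dict[str, str]) -> ArgmaxHop:
    """Stage 1 (toy stand-in): map a content selection to a logical form."""
    return ArgmaxHop(rank_by=selection["rank_by"], report=selection["report"])


def logic_to_text(table: List[Row], lf: ArgmaxHop) -> str:
    """Stage 2 (toy stand-in): execute the logical form and verbalize the result."""
    best_row = max(table, key=lambda row: row[lf.rank_by])
    return f"The {lf.report} with the highest {lf.rank_by} is {best_row[lf.report]}."


# Hypothetical example table and content selection.
table: List[Row] = [
    {"team": "Lions", "points": 48},
    {"team": "Hawks", "points": 61},
    {"team": "Bears", "points": 55},
]
selection = {"rank_by": "points", "report": "team"}

logical_form = table_to_logic(selection)
statement = logic_to_text(table, logical_form)
print(logical_form)
print(statement)
```

Because the statement is derived by executing the logical form against the table, it cannot contradict the table in this toy setting, which is the intuition behind using LFs as an intermediate step to improve fidelity.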
Authors: Eneko Agirre, Iñigo Alonso