
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models (2405.12174v1)

Published 20 May 2024 in cs.CL

Abstract: Text-to-table generation aims to produce structured tables that convey the key information in unstructured documents. Existing text-to-table datasets are typically English-oriented, limiting research in other languages. Meanwhile, LLMs such as ChatGPT have shown great success as general-purpose task solvers in multilingual settings, in principle enabling text-to-table in other languages. In this paper, we propose CT-Eval, a Chinese text-to-table dataset for benchmarking LLMs on this task. Our preliminary analysis of English text-to-table datasets highlights two key factors in dataset construction: data diversity and data hallucination. Accordingly, CT-Eval draws its source material from a popular Chinese multidisciplinary online encyclopedia and covers 28 domains to ensure data diversity. To minimize data hallucination, we first train an LLM to judge and filter out task samples containing hallucinations, then employ human annotators to clean the hallucinations in the validation and test sets. After this process, CT-Eval contains 88.6K task samples. Using CT-Eval, we evaluate the performance of open-source and closed-source LLMs. Our results reveal that zero-shot LLMs (including GPT-4) still fall well short of human performance. After fine-tuning, however, open-source LLMs can substantially improve their text-to-table ability, outperforming GPT-4 by a large margin. In short, CT-Eval not only helps researchers evaluate and quickly understand the Chinese text-to-table ability of existing LLMs, but also serves as a valuable resource for significantly improving the text-to-table performance of LLMs.
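The task format the abstract describes can be illustrated with a minimal sketch: a document is wrapped in a zero-shot instruction asking the model to emit a table, and the model's tabular output is parsed back into structured rows. The prompt template, Markdown-table output format, and parser below are illustrative assumptions for exposition, not the CT-Eval authors' exact implementation.

```python
# Hypothetical sketch of zero-shot text-to-table prompting and output
# parsing. The instruction wording and the Markdown-table convention
# are assumptions; CT-Eval's actual prompts and table format may differ.

def build_prompt(document: str) -> str:
    """Wrap a source document in a zero-shot text-to-table instruction."""
    return (
        "Extract the key information from the following text "
        "as a Markdown table with a header row.\n\n"
        f"Text: {document}\n\nTable:"
    )

def parse_markdown_table(table_text: str) -> list[dict]:
    """Parse a model's Markdown table into a list of row dictionaries."""
    lines = [ln.strip() for ln in table_text.strip().splitlines() if ln.strip()]
    rows = [
        [cell.strip() for cell in ln.strip("|").split("|")]
        for ln in lines
        if not set(ln) <= {"|", "-", " ", ":"}  # drop the |---|---| separator row
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

# Example model output for a short encyclopedia-style passage:
example_output = """
| Name | Founded |
| ---- | ------- |
| Fudan University | 1905 |
"""
print(parse_markdown_table(example_output))
# → [{'Name': 'Fudan University', 'Founded': '1905'}]
```

In an evaluation loop, the parsed rows would then be compared cell-by-cell against the gold table to score an LLM's text-to-table output.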

Authors (5)
  1. Haoxiang Shi
  2. Jiaan Wang
  3. Jiarong Xu
  4. Cen Wang
  5. Tetsuya Sakai
