LEGOBench: Scientific Leaderboard Generation Benchmark (2401.06233v2)

Published 11 Jan 2024 in cs.CL

Abstract: The ever-increasing volume of paper submissions makes it difficult to stay informed about the latest state-of-the-art research. To address this challenge, we introduce LEGOBench, a benchmark for evaluating systems that generate scientific leaderboards. LEGOBench is curated from 22 years of preprint submission data on arXiv and more than 11k machine learning leaderboards on the PapersWithCode portal. We present four graph-based and two LLM-based leaderboard generation task configurations. We evaluate popular encoder-only scientific language models as well as decoder-only LLMs across these task configurations. State-of-the-art models showcase significant performance gaps in automatic leaderboard generation on LEGOBench. The code is available on GitHub (https://github.com/lingo-iitgn/LEGOBench) and the dataset is hosted on OSF (https://osf.io/9v2py/?view_only=6f91b0b510df498ba01595f8f278f94c).
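To make the task framing concrete, here is a minimal, hypothetical sketch of an LLM-based configuration like the ones the abstract describes: given a task, dataset, and metric, a system must order candidate papers into a leaderboard. The prompt template, function names, and toy scores below are illustrative assumptions, not the authors' released code; see the GitHub repository above for the actual benchmark implementation.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    paper: str    # paper title or arXiv identifier
    score: float  # metric value reported for a (task, dataset, metric) triple

def build_leaderboard(entries: list[Entry], higher_is_better: bool = True) -> list[Entry]:
    """Order candidate (paper, score) pairs into a reference leaderboard."""
    return sorted(entries, key=lambda e: e.score, reverse=higher_is_better)

def make_prompt(task: str, dataset: str, metric: str, papers: list[str]) -> str:
    """Hypothetical prompt asking an LLM to rank candidate papers."""
    listing = "\n".join(f"- {p}" for p in papers)
    return (
        f"Task: {task}\nDataset: {dataset}\nMetric: {metric}\n"
        f"Rank the following papers by their reported {metric} on {dataset}:\n{listing}"
    )

# Toy example with placeholder scores (not real results):
toy = [Entry("Paper A", 88.5), Entry("Paper B", 91.2), Entry("Paper C", 90.0)]
for rank, entry in enumerate(build_leaderboard(toy), start=1):
    print(rank, entry.paper, entry.score)
```

An evaluation along these lines would compare the model's ranking, elicited with a prompt like the one above, against the reference ordering, e.g. via a rank-correlation or overlap metric.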
