
Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language (2312.13694v1)

Published 21 Dec 2023 in cs.CL

Abstract: To reduce the manual cost of designing ER models, recent approaches address the task of NL2ERM, i.e., automatically generating entity-relationship (ER) models from natural language (NL) utterances such as software requirements. These approaches are typically rule-based: they rely on rigid heuristic rules and therefore generalize poorly to the various linguistic ways of describing the same requirement. Deep-learning-based models generalize better than rule-based approaches, but they are lacking for NL2ERM because no large-scale dataset is available. To address this issue, we report our insight that the task of NL2ERM is highly similar to the increasingly popular task of text-to-SQL, and we propose a data transformation algorithm that converts existing text-to-SQL data into NL2ERM data. We apply the algorithm to Spider, one of the most popular text-to-SQL datasets, and additionally collect data entries with different NL types, to obtain a large-scale NL2ERM dataset. Because NL2ERM can be seen as a special information extraction (IE) task, we train two state-of-the-art IE models on our dataset. The experimental results show that both models achieve high performance and outperform existing baselines.
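
The abstract does not spell out the transformation algorithm, so the following is only a minimal illustrative sketch of why a text-to-SQL schema resembles an ER model, not the authors' method. It assumes a simplified, hypothetical schema layout (a dict of tables and a list of foreign-key tuples) and a naive mapping: tables become entities, non-foreign-key columns become attributes, and foreign keys become relationships. The names `ERModel` and `schema_to_er` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ERModel:
    # entity name -> list of attribute names (hypothetical container)
    entities: dict = field(default_factory=dict)
    # list of (entity_a, entity_b) pairs linked by a foreign key
    relationships: list = field(default_factory=list)


def schema_to_er(tables, foreign_keys):
    """Map a relational schema to a rough ER model.

    tables: {table_name: [column_name, ...]}
    foreign_keys: [(child_table, child_column, parent_table, parent_column), ...]
    Assumption: every table is an entity and every foreign key is a
    binary relationship; real schemas (e.g. join tables) need more care.
    """
    fk_columns = {(table, column) for table, column, _, _ in foreign_keys}
    model = ERModel()
    for table, columns in tables.items():
        # Foreign-key columns encode relationships, not attributes.
        model.entities[table] = [
            c for c in columns if (table, c) not in fk_columns
        ]
    for child, _, parent, _ in foreign_keys:
        pair = (child, parent)
        if pair not in model.relationships:
            model.relationships.append(pair)
    return model


# Toy schema, made up for illustration.
tables = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "concert_name", "singer_id"],
}
foreign_keys = [("concert", "singer_id", "singer", "singer_id")]

model = schema_to_er(tables, foreign_keys)
print(model.entities)
print(model.relationships)
```

In a dataset-construction setting, each NL utterance paired with such a schema would then need labels for the ER elements it mentions; that step is not shown here because the abstract gives no details about it.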

