Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language (2312.13694v1)
Abstract: To reduce the manual cost of designing entity-relationship (ER) models, recent approaches address the task of NL2ERM, i.e., automatically generating ER models from natural language (NL) utterances such as software requirements. These approaches are typically rule-based: they rely on rigid heuristic rules and therefore generalize poorly to the varied linguistic ways of describing the same requirement. Deep-learning-based models generalize better than rule-based approaches, but they are lacking for NL2ERM because no large-scale dataset exists. To address this issue, we report our insight that the task of NL2ERM is highly similar to the increasingly popular task of text-to-SQL, and we propose a data transformation algorithm that converts existing text-to-SQL data into NL2ERM data. We apply the algorithm to Spider, one of the most popular text-to-SQL datasets, and additionally collect data entries with different NL types, to obtain a large-scale NL2ERM dataset. Because NL2ERM can be seen as a special information extraction (IE) task, we train two state-of-the-art IE models on our dataset. The experimental results show that both models achieve high performance and outperform existing baselines.
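To make the claimed similarity between text-to-SQL and NL2ERM concrete, here is a minimal Python sketch. It is not the paper's transformation algorithm, which the abstract does not specify. It assumes a toy mapping in which tables named in a SQL query become entities, qualified columns become attributes, and join conditions become relationships. The names `ERModel` and `sql_to_er` and the example query are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ERModel:
    """Hypothetical container for the ER elements recovered from one query."""
    entities: set = field(default_factory=set)
    attributes: set = field(default_factory=set)      # (entity, attribute) pairs
    relationships: set = field(default_factory=set)   # frozensets of two entities


def sql_to_er(sql: str) -> ERModel:
    """Toy mapping (an assumption, not the paper's method):
    tables -> entities, qualified columns -> attributes,
    equality join conditions -> relationships."""
    model = ERModel()

    # Tables named after FROM or JOIN are treated as entities.
    for table in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, flags=re.IGNORECASE):
        model.entities.add(table.lower())

    # Conditions of the form a.x = b.y are treated as a link between two entities.
    join_columns = set()
    for lt, lc, rt, rc in re.findall(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", sql):
        model.relationships.add(frozenset({lt.lower(), rt.lower()}))
        join_columns.add((lt.lower(), lc.lower()))
        join_columns.add((rt.lower(), rc.lower()))

    # Every other qualified column of a known entity is treated as an attribute.
    for table, column in re.findall(r"(\w+)\.(\w+)", sql):
        pair = (table.lower(), column.lower())
        if pair[0] in model.entities and pair not in join_columns:
            model.attributes.add(pair)

    return model


if __name__ == "__main__":
    # Hypothetical query of the kind paired with an NL utterance in text-to-SQL data.
    query = (
        "SELECT student.name FROM student "
        "JOIN enrollment ON student.id = enrollment.student_id "
        "WHERE enrollment.grade = 'A'"
    )
    er = sql_to_er(query)
    print(sorted(er.entities))
    print(sorted(er.attributes))
    print([sorted(r) for r in er.relationships])
```

Pairing the NL utterance of a text-to-SQL example with ER elements recovered this way would give one (utterance, ER model) training entry. A real transformation would also need to handle aliases, subqueries, and unqualified column names, which this sketch ignores.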