CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking (2403.02231v4)
Abstract: Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector cannot reach its full potential unless the interpretation of building regulations is automated. Converting textual rules into machine-readable formats is challenging because of the complexity of natural language and the scarcity of resources for advanced Machine Learning (ML). To address these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only self-contained sentences, which express complete rules without needing additional context, were included, as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, and the annotations were then carefully curated to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories and serves as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables recent approaches, such as deep neural networks and large language models (LLMs), to be applied to ACC.
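To make the entity and relation annotation concrete, the following is a minimal sketch of how one annotated sentence might be represented and converted to token-level BIO tags, a common encoding for entity recognition. Everything specific here is an assumption made for illustration: the example sentence is invented rather than taken from the dataset, the labels `object`, `property`, `value`, `has-property` and `at-least` are hypothetical and not CODE-ACCORD's actual categories, and the released data may well use a different file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token in the span (inclusive)
    end: int    # index one past the last token in the span (exclusive)
    label: str  # hypothetical entity category


@dataclass(frozen=True)
class Relation:
    head: int   # position of the head entity in the entity list
    tail: int   # position of the tail entity in the entity list
    label: str  # hypothetical relation category


def to_bio(tokens, entities):
    """Convert token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for ent in entities:
        if not (0 <= ent.start < ent.end <= len(tokens)):
            raise ValueError(f"span out of range: {ent}")
        if any(tag != "O" for tag in tags[ent.start:ent.end]):
            raise ValueError(f"overlapping span: {ent}")
        tags[ent.start] = f"B-{ent.label}"
        for i in range(ent.start + 1, ent.end):
            tags[i] = f"I-{ent.label}"
    return tags


# Invented example sentence (not from the dataset), split on whitespace.
tokens = "The door width must be at least 800 mm".split()

# Hypothetical annotations: three entities and two relations between them.
entities = [
    Entity(1, 2, "object"),    # "door"
    Entity(2, 3, "property"),  # "width"
    Entity(7, 9, "value"),     # "800 mm"
]
relations = [
    Relation(0, 1, "has-property"),  # door -> width
    Relation(1, 2, "at-least"),      # width -> 800 mm
]

bio_tags = to_bio(tokens, entities)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Keeping relations as index pairs over the entity list keeps the two annotation layers separate, so the same record could feed an entity recognition model through `to_bio` and a relation extraction model through `relations`.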