CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking (2403.02231v4)

Published 4 Mar 2024 in cs.IR

Abstract: Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector requires automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging because of the complexity of natural language and the scarcity of resources for advanced Machine Learning (ML). To address these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only self-contained sentences, which express complete rules without needing additional context, were included, as these are essential for ACC. A team of 12 annotators manually annotated each sentence with entities and relations to facilitate machine-readable rule generation, and the annotations were then carefully curated to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and NLP tasks, including text classification, entity recognition, and relation extraction, and enables the application of recent techniques, such as deep neural networks and large language models (LLMs), to ACC.
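To illustrate how sentence-level entity and relation annotations can feed entity recognition and relation extraction, here is a minimal sketch. Everything in it is an assumption: the abstract does not describe CODE-ACCORD's file format or its entity and relation categories. The example sentence is made up, and the labels "object", "property", "value", "has_property", and "at_least" are hypothetical placeholders. The sketch projects character-level entity spans onto token-level BIO (begin/inside/outside) tags, a common convention for NER, and lists (head, relation, tail) triples for relation extraction.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical entity category


@dataclass
class Relation:
    head: int   # index into AnnotatedSentence.entities
    tail: int   # index into AnnotatedSentence.entities
    label: str  # hypothetical relation category


@dataclass
class AnnotatedSentence:
    text: str
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_entity(self, phrase, label):
        """Annotate the first occurrence of `phrase` and return its entity index."""
        start = self.text.index(phrase)
        self.entities.append(Entity(start, start + len(phrase), label))
        return len(self.entities) - 1


def tokenize(text):
    """Split text into word and punctuation tokens with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_bio(sentence):
    """Project character-level entity spans onto token-level BIO tags (NER input)."""
    tokens = tokenize(sentence.text)
    tags = ["O"] * len(tokens)
    for ent in sentence.entities:
        # Tokens that lie entirely inside the entity span belong to it.
        covered = [i for i, (_, s, e) in enumerate(tokens) if s >= ent.start and e <= ent.end]
        for k, i in enumerate(covered):
            tags[i] = ("B-" if k == 0 else "I-") + ent.label
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


def relation_triples(sentence):
    """List (head text, relation label, tail text) triples (relation extraction targets)."""
    text, ents = sentence.text, sentence.entities
    return [
        (text[ents[r.head].start:ents[r.head].end], r.label,
         text[ents[r.tail].start:ents[r.tail].end])
        for r in sentence.relations
    ]


# Made-up example sentence; the labels are placeholders, not CODE-ACCORD's schema.
example = AnnotatedSentence("The width of the stair must be at least 800 mm.")
stair = example.add_entity("stair", "object")
width = example.add_entity("width", "property")
value = example.add_entity("800 mm", "value")
example.relations.append(Relation(stair, width, "has_property"))
example.relations.append(Relation(width, value, "at_least"))

print(to_bio(example))
print(relation_triples(example))
```

Storing character offsets rather than tokens keeps the annotation independent of any particular tokenizer. The BIO view and the triple view are both derived from the same spans, so one annotated sentence can serve entity recognition and relation extraction alike.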
