SNOWTEC : Synthetic Natural language Oversampling With Transformer-based information ExtraCtion for automated compliance checking

Hettiarachchi, H. and Gaber, M.M. and Parsafard, P. and Vakaj, E. (2026) SNOWTEC : Synthetic Natural language Oversampling With Transformer-based information ExtraCtion for automated compliance checking. Machine learning with applications, 24: 100911. ISSN 2666-8270

Full text not available from this repository.

Abstract

Optimising Automated Compliance Checking (ACC) in Architecture, Engineering, and Construction (AEC) requires translating building codes into machine-processable formats. Because these codes exist primarily as text, Information Extraction (IE) has become integral to decoding them, prompting a range of IE techniques that span manual, rule-based, and machine-learning methodologies. Recent research has shown promise in adopting deep learning; however, as far as we know, the potential of transformers and language models remains unexplored within AEC, even though they achieve state-of-the-art performance across various text-based tasks. To address this gap, we propose Synthetic Natural language Oversampling With Transformer-based information ExtraCtion (SNOWTEC), an approach designed to extract entities and relations from regulatory text and convert them into machine-processable knowledge graphs. We employ transformer-based architectures and introduce an innovative data oversampling (augmentation) approach to address data scarcity, which impedes model performance. Our experiments across multiple sub-domains highlight the strength of transformers in identifying relations but also reveal challenges in recognising entities within the AEC domain, providing insights for future research. Data oversampling played a crucial role in improving relation extraction, resulting in a notable 26% average F1 increase.
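
To make the target representation concrete, the following is a minimal sketch of how extracted entities and relations might be assembled into a small knowledge graph. It is not the SNOWTEC implementation: the example clause ("fire doors shall have a width of at least 800 mm"), the entity types, the relation labels, and the `build_graph` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical extractor output for an invented clause:
# "Fire doors shall have a width of at least 800 mm."
# Entity ids, types, and relation labels are assumptions, not the paper's schema.
entities = {
    "e1": {"text": "fire door", "type": "object"},
    "e2": {"text": "width", "type": "property"},
    "e3": {"text": "800 mm", "type": "value"},
}
relations = [
    ("e1", "has_property", "e2"),
    ("e2", "greater_equal", "e3"),
]


def build_graph(entities, relations):
    """Return an adjacency map: head text -> list of (relation, tail text)."""
    graph = defaultdict(list)
    for head, label, tail in relations:
        # Skip relations that point at entities the extractor did not return.
        if head not in entities or tail not in entities:
            continue
        graph[entities[head]["text"]].append((label, entities[tail]["text"]))
    return dict(graph)


graph = build_graph(entities, relations)
for head, edges in graph.items():
    for label, tail in edges:
        print(f"({head}) -[{label}]-> ({tail})")
```

A plain adjacency map keeps the sketch dependency-free; a real pipeline would more likely emit RDF triples or load the edges into a graph database.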

Item Type: Journal Article
Journal or Publication Title: Machine learning with applications
ID Code: 237771
Deposited By:
Deposited On: 04 Jun 2026 09:40
Refereed?: Yes
Published?: Published
Last Modified: 04 Jun 2026 23:44