doi: 10.17586/2226-1494-2021-21-2-256-266


Building knowledge graphs of regulatory documentation based on semantic modeling and automatic term extraction

D. I. Mouromtsev, I. A. Shilin, D. A. Pliukhin, I. R. Baimuratov, R. R. Khaydarova, Y. Y. Dementyeva, D. A. Ozhigin, T. A. Malysheva


Article in Russian

For citation:

Mouromtsev D.I., Shilin I.A., Pliukhin D.A., Baimuratov I.R., Khaydarova R.R., Dementyeva Yu.Yu., Ozhigin D.A., Malysheva T.A. Building knowledge graphs of regulatory documentation based on semantic modeling and automatic term extraction. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 2, pp. 256–266 (in Russian). doi: 10.17586/2226-1494-2021-21-2-256-266



Abstract

The paper proposes a new integrated solution for automatic analysis and term identification in regulatory and technical documentation (RTD). Term identification in such documentation is one of the key issues in the digitalization of the design and construction of buildings and structures. At present, RTD requirements are searched for and verified manually, which leads to a significant number of errors. Automating these tasks will significantly improve the quality of computer-aided design. The developed algorithm is based on natural language analysis methods such as tokenization, search for lemmas and stems, stop-word analysis, word embeddings applied to tokens and phrases, part-of-speech tagging, and syntactic annotation. Experiments on the automatic extraction of terms from regulatory documents show that the proposed algorithm is promising and can be applied to building knowledge graphs in the design domain. For 202 documents selected by experts, the recognition accuracy was 79% for the coincidence of names and 37% for the coincidence of term identifiers. This result is comparable to those of known approaches to the problem. The results of the work can be used in computer-aided design systems based on Building Information Modeling (BIM) models, as well as to automate the examination of design documentation.
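
To illustrate the first preprocessing steps named in the abstract (tokenization, stop-word analysis, stemming), the following minimal sketch extracts candidate term phrases from a text. It is not the authors' algorithm: the stop-word list, the suffix-stripping rule, and the names `tokenize`, `stem`, and `candidate_terms` are assumptions made for this example, and word embeddings, part-of-speech tagging, and syntactic annotation are left out entirely.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use
# a full list for the language of the documents.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "for", "to",
              "is", "are", "be", "shall", "with", "by", "on", "or"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z]+|[.,;:!?]", text.lower())


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def candidate_terms(text, max_len=3):
    """Count n-grams (up to max_len stems) that contain no stop words
    and do not cross punctuation."""
    counts = Counter()
    run = []  # current sequence of stems between stop words / punctuation
    for tok in tokenize(text) + ["."]:  # trailing "." flushes the last run
        if tok in STOP_WORDS or not tok[0].isalpha():
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    counts[" ".join(run[i:i + n])] += 1
            run = []
        else:
            run.append(stem(tok))
    return counts


if __name__ == "__main__":
    sample = ("Bearing walls shall be inspected. "
              "The bearing walls of the building are fire resistant.")
    for phrase, freq in candidate_terms(sample).most_common(5):
        print(freq, phrase)
```

Candidates that recur across a document, such as the stemmed phrase for "bearing walls" in the sample, would then be passed to later stages (for example, matching against term identifiers in a knowledge graph), which this sketch does not cover.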


Keywords: semantic text analysis, ontologies, term extraction, word embeddings, deep neural networks



Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
