For citation: Yu Chuqiao. A method of automatic open relation extraction from Chinese texts.
Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 1, pp. 163–165 (in Russian). doi: 10.17586/2226-1494-2018-18-1-163-165
Abstract
The paper considers the problem of Chinese open relation extraction, where relations are represented as subject-predicate-object triples. In contrast to well-known multi-phase methods that include word segmentation, part-of-speech tagging, and syntactic analysis, we propose a role approach that detects the parts of a sentence without preliminary word segmentation. The key idea is to use syntactic words, prepositions, and postpositions as attributes that indicate part of speech and sentence member. Coupled with a small dictionary, this is enough to extract facts by query. Experiments conducted on a real technical text show satisfactory results, comparable to those of the traditional approach.
Keywords: fact extraction, Chinese language, role approach, text analysis, dictionary, phrase segmentation, part-of-speech tagging
References
1. Banko M., Cafarella M.J., Soderland S., Broadhead M., Etzioni O. Open information extraction from the Web. Proc. 20th Int. Joint Conf. on Artificial Intelligence, IJCAI’07. Hyderabad, India, 2007, pp. 2670–2676.
2. Tseng Y.H., Lee L.H., Lin S.Y., Liao B.S., Liu M.J., Chen H.H., Etzioni O., Fader A. Chinese open relation extraction for knowledge acquisition. Proc. 14th Conf. of the European Chapter of the Association for Computational Linguistics, EACL. Gothenburg, Sweden, 2014, vol. 2, pp. 12–16. doi: 10.3115/v1/e14-4003
3. Zeng D., Wei D., Chau M., Wang F. Domain-specific Chinese word segmentation using suffix tree and mutual information. Information Systems Frontiers, 2011, vol. 13, no. 1, pp. 115–125. doi: 10.1007/s10796-010-9278-5
4. Zhao J., Qiu X., Zhang S., Ji F., Huang X. Part-of-speech tagging for Chinese-English mixed texts with dynamic features. Proc. 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Jeju Island, Korea, 2012, pp. 1379–1388.
5. Basili R. A contrastive approach to term extraction. Proc. 4th Terminological and Artificial Intelligence Conference, TIA2001. Nancy, France, 2001.
6. Lopes L., Fernandes P., Vieira R. Estimating term domain relevance through term frequency, disjoint corpora frequency – TF-DCF. Knowledge-Based Systems, 2016, vol. 97, pp. 237–249. doi: 10.1016/j.knosys.2015.12.015
7. Zhu Q., Cheng X.Y. The overview of Chinese information extraction. IJCSNS International Journal of Computer Science and Network Security, 2010, vol. 10, no. 9, pp. 171–174.
8. Wong W. Determination of unithood and termhood for term recognition. In Text and Web Mining Technologies. IGI Global, 2008, pp. 500–529. doi: 10.4018/978-1-59904-990-8.ch030
9. Nugumanova A., Bessmertny I.A., Baiburin Y., Mansurova M. A new operationalization of contrastive term extraction approach based on recognition of both representative and specific terms. Communications in Computer and Information Science, 2016, vol. 649, pp. 103–118. doi: 10.1007/978-3-319-45880-9_9
10. Bessmertny I.A., Yu Chuqiao, Ma Pengyu. Statistical method of term extraction from Chinese texts without preliminary segmentation of phrases. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2016, vol. 16, no. 6, pp. 1096–1102. doi: 10.17586/2226-1494-2016-16-6-1096-1102