DOI: 10.17586/2226-1494-2015-15-5-869-876


SEMSIN SEMANTIC AND SYNTACTIC PARSER

K. K. Boyarsky, E. A. Kanevskiy


Article in Russian

For citation: Boyarsky K.K., Kanevsky E.A. SemSin semantic and syntactic parser. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015, vol. 15, no. 5, pp. 869–876.

Abstract

The paper describes the operating principles of SemSin, a semantic and syntactic parser that builds dependency trees for Russian sentences. The parser consists of four blocks: a dictionary, a morphological analyzer, production rules, and a lexical analyzer. An important logical part of the parser is the pre-syntactic module, which reconciles and supplements the results of morphological analysis, splits text paragraphs into individual sentences, and performs preliminary disambiguation. A characteristic feature of the parser is its open type of control: the analysis is driven by a set of production rules. A varied set of commands supports both morphological and semantic-syntactic analysis of a sentence. The paper presents the order in which the rules are applied and examples of their operation. A specific feature of the rules is that the decision to establish a syntactic link is made together with the removal of morphological and semantic ambiguity. The lexical analyzer executes the commands and rules and controls the parser in manual or automatic text analysis mode. In manual mode, the analysis is interactive, with the option of step-by-step rule execution and inspection of the resulting parse tree. In automatic mode, the analysis results are written to an XML file. Active use of syntactic and semantic dictionary information significantly reduces parsing ambiguity. In addition to text markup, the parser can also serve as a tool for extracting information from natural-language texts.
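The idea of a rule that establishes a syntactic link and removes ambiguity in the same step can be illustrated with a minimal Python sketch. It is not the SemSin rule language: the names `Reading`, `Token`, `adj_noun_rule` and `parse`, the tag values, and the single agreement rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Reading:
    """One morphological interpretation of a word form (hypothetical tags)."""
    lemma: str
    pos: str                      # e.g. "ADJ", "NOUN", "VERB"
    case: Optional[str] = None    # e.g. "nom"; None where case does not apply


@dataclass
class Token:
    form: str
    readings: List[Reading]
    head: Optional[int] = None      # index of the governing token, if linked
    relation: Optional[str] = None  # label of the dependency link


def adj_noun_rule(tokens: List[Token], i: int) -> bool:
    """Attach an adjective to the next noun when their cases agree.

    Firing the rule also discards every reading of both tokens that is
    incompatible with the link, so linking and disambiguation happen together.
    """
    if i + 1 >= len(tokens):
        return False
    left, right = tokens[i], tokens[i + 1]
    pairs = [(a, n) for a in left.readings for n in right.readings
             if a.pos == "ADJ" and n.pos == "NOUN" and a.case == n.case]
    if not pairs:
        return False
    # Keep only the readings that took part in an agreeing pair (order kept).
    left.readings = list(dict.fromkeys(a for a, _ in pairs))
    right.readings = list(dict.fromkeys(n for _, n in pairs))
    left.head, left.relation = i + 1, "attribute"
    return True


def parse(tokens: List[Token],
          rules: List[Callable[[List[Token], int], bool]]) -> None:
    """Apply the rules in order until none of them fires any more."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i, tok in enumerate(tokens):
                if tok.head is None and rule(tokens, i):
                    changed = True


# "стекло" is ambiguous: the noun "glass" or a past-tense form of "стечь".
tokens = [
    Token("оконное", [Reading("оконный", "ADJ", "nom")]),
    Token("стекло", [Reading("стекло", "NOUN", "nom"),
                     Reading("стечь", "VERB")]),
]
parse(tokens, [adj_noun_rule])
```

After the call, the adjective is linked to the noun and the verb reading of "стекло" has been dropped, which mirrors, in a much simplified form, the joint link-building and disambiguation described in the abstract.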


Keywords: automatic text analysis, actants, dependency tree, semantic classes, token, parser, production rules, semantics.




This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
