doi: 10.17586/2226-1494-2018-18-4-690-694


N. V. Dobrenko

Article in Russian

For citation: Dobrenko N.V. Algorithm composition of text thematic segmentation as intellectualization instrument for design of technical systems. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 4, pp. 690–694 (in Russian). doi: 10.17586/2226-1494-2018-18-4-690-694


The paper considers the problem of thematic segmentation of extended texts as a means of supporting the work of technical system designers. An example shows that different segmentation algorithms single out substantively different text fragments, so composing the algorithms in the classical way, that is, combining their results in order to select the best one, appears to be inappropriate. At the same time, displaying several versions of the thematic segmentation simultaneously gives the reader an integral picture of the text structure and thereby helps to choose an effective strategy for working through the text. The paper describes the developed system for visualizing the thematic segmentation of extended texts, which enables the user to select and analyze not the whole text but only the fragments relevant to his or her current information needs. The system makes it possible to view the segmentation results produced by different algorithms simultaneously. As a result, the user can analyze and absorb large amounts of textual information more quickly and efficiently.
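To illustrate the idea of showing several segmentations together instead of merging them, here is a minimal Python sketch. It is not the author's system, and the abstract does not name the algorithms the paper compares: the two segmenters (`fixed_size_boundaries`, a fixed-length baseline, and `cohesion_boundaries`, a simplified lexical-cohesion method loosely modelled on TextTiling), the `side_by_side` renderer, the threshold and segment-size values, and the four sample sentences are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interface: a segmenter maps a list of sentences to the set of
# indices where a new segment starts (index 0 is an implicit start, not stored).
Segmenter = Callable[[List[str]], Set[int]]


def fixed_size_boundaries(sentences: List[str], size: int = 3) -> Set[int]:
    """Toy baseline: start a new segment every `size` sentences."""
    return set(range(size, len(sentences), size))


def _words(sentence: str) -> Set[str]:
    """Lower-cased word set with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {t for t in tokens if t}


def cohesion_boundaries(sentences: List[str], threshold: float = 0.15) -> Set[int]:
    """Simplified lexical-cohesion segmenter (loosely in the spirit of TextTiling):
    start a new segment before sentence i when the Jaccard overlap of its
    words with sentence i-1 falls below `threshold`."""
    boundaries = set()
    for i in range(1, len(sentences)):
        a, b = _words(sentences[i - 1]), _words(sentences[i])
        union = a | b
        similarity = len(a & b) / len(union) if union else 0.0
        if similarity < threshold:
            boundaries.add(i)
    return boundaries


def side_by_side(sentences: List[str], segmenters: Dict[str, Segmenter]) -> List[str]:
    """Render every segmentation in its own column instead of merging them:
    '*' marks a sentence that opens a segment, '.' one that continues it."""
    results = {name: seg(sentences) for name, seg in segmenters.items()}
    names = list(segmenters)
    rows = ["  ".join(names) + "  sentence"]
    for i, sentence in enumerate(sentences):
        cells = [("*" if i == 0 or i in results[n] else ".").ljust(len(n)) for n in names]
        rows.append("  ".join(cells) + "  " + sentence)
    return rows


# Made-up sample text used only for this illustration.
SENTENCES = [
    "The hull is built from oak planks.",
    "Oak planks are bent with steam.",
    "Sails are cut from canvas.",
    "Canvas sails need regular repair.",
]
SEGMENTERS: Dict[str, Segmenter] = {
    "fixed": fixed_size_boundaries,
    "cohesion": cohesion_boundaries,
}

for row in side_by_side(SENTENCES, SEGMENTERS):
    print(row)
```

On the sample sentences the two segmenters place their single boundary at different sentences (index 3 for the fixed-size baseline, index 2 for the cohesion method). The column layout keeps both visible rather than voting one of them away, which mirrors the abstract's argument against classical aggregation of results.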

Keywords: thematic segmentation, algorithm composition, visualization system

Acknowledgements. The work was supported by SRR-FUND 617042 at ITMO University.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
© 2001-2024 Scientific and Technical Journal of Information Technologies, Mechanics and Optics. All rights reserved.