doi: 10.17586/2226-1494-2021-21-1-102-108


A QUANTUM-LIKE SEMANTIC MODEL FOR TEXT RETRIEVAL IN ARABIC

A. Shaker, I. A. Bessmertny, L. A. Miroslavskaya, J. A. Koroleva


Article in Russian

For citation:
Shaker A., Bessmertny I.A., Miroslavskaya L.A., Koroleva Ju.A. A quantum-like semantic model for text retrieval in Arabic. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 1, pp. 102–108 (in Russian). doi: 10.17586/2226-1494-2021-21-1-102-108


Abstract
The subject of study. The paper focuses on extracting semantics from Arabic texts. In particular, it investigates whether the Bell test applied to word pairs can serve as a measure of the semantic relatedness of words in a context. The study applies the quantum formalism to information retrieval in Arabic texts and presents the results. The authors also examine how context width affects retrieval effectiveness. Method. The research is based on a vector representation of context. It uses the well-known approach that combines the HAL (Hyperspace Analogue to Language) matrix with the Bell test. The HAL matrix takes into account both how often words occur in the context and their distance to the target word. Quantum theory operates with probability density matrices and allows probabilities to be described in a vector space in a more natural way, i.e., words can be represented as vectors. Main results. The results demonstrate that applying the Bell test to Arabic texts provides a better ranking of search results than the ranking returned by search services. Practical significance. The results can be used in the development of information retrieval systems and in the further development of methods based on the distributional hypothesis.
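To illustrate the HAL matrix mentioned in the Method part, here is a minimal sketch of a HAL-style co-occurrence table. It is not the authors' implementation: the function name `build_hal`, the `window` parameter (standing in for the context width), the linear weighting `window - d + 1`, and the English sample tokens are all assumptions made for illustration.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style co-occurrence table (illustrative sketch).

    hal[target][context] accumulates a weight for every context word that
    precedes the target within `window` positions. Closer words get a
    larger weight: distance 1 adds `window`, distance `window` adds 1.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d  # position of the context word d steps to the left
            if j < 0:
                break  # reached the start of the text
            hal[target][tokens[j]] += window - d + 1
    return hal


# Hypothetical sample text; any tokenized text (including Arabic) works.
tokens = "the cat sat on the mat".split()
hal = build_hal(tokens, window=2)

# "cat" directly precedes "sat" (distance 1), "the" is at distance 2.
print(dict(hal["sat"]))  # {'cat': 2, 'the': 1}
```

Each row of the table can then be read as the context vector of a word. Changing `window` changes how much distant words contribute, which is the kind of context-width effect the abstract says the authors examine.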

Keywords: Bell inequality, quantum entanglement, information retrieval, HAL, IR algorithms, quantum theory, Arabic language, natural language processing

 


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License