An algorithm for detecting leaks of insider information of financial markets in investment consulting.

Alisa A. Vorobeva, Vladislav V. Gerasimov, Yulia V. Li

2021, Volume 21, Number 3 (May-June)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2021-21-3-394-400

Article in Russian

For citation:

Vorobeva A.A., Gerasimov V.V., Li Yu.V. An algorithm for detecting leaks of insider information of financial markets in investment consulting. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 3, pp. 394–400 (in Russian). doi: 10.17586/2226-1494-2021-21-3-394-400

Abstract

The paper addresses the detection of leaks of financial-market insider information during investment consulting. The authors created an original dataset of consultant-client conversations, recorded as dialogs in text format, and studied whether machine learning methods can automate the detection of leaks that arise in such conversations. The following supervised machine learning methods were examined for constructing and training a classifier: probabilistic (Naïve Bayes classifier), metric (k-nearest neighbors algorithm), logical (random forest), linear (support vector machine), and methods based on artificial neural networks. The paper considers several approaches to building a natural language text model: tokenization (bag of words; word n-grams, specifically bigrams and trigrams) and vectorization (one-hot encoding). The proposed algorithm for detecting leaks of financial-market insider information combines a support vector machine (SVM) with bigram tokenization; in the reported experiments, this combination gave the highest leak detection accuracy among the methods compared. The results can be used in the development of cybersecurity tools and in the further elaboration of natural language processing methods for information security problems.
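As an illustration only, the combination of bigram tokenization, binary (one-hot style) vectorization, and a linear SVM can be sketched in plain Python. This is not the authors' implementation: the six dialog snippets are invented for the example, the function names are hypothetical, and the training loop (full-batch subgradient descent on the regularized hinge loss) and its hyperparameters `lam`, `lr`, and `epochs` are assumptions chosen to keep the sketch short.

```python
def word_ngrams(text, n=2):
    """Split text into lowercase word n-grams (bigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_vocabulary(texts, n=2):
    """Assign a column index to every n-gram seen in the training texts."""
    vocab = {}
    for text in texts:
        for gram in word_ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text, vocab, n=2):
    """Binary presence vector: 1.0 where a known n-gram occurs in the text."""
    vec = [0.0] * len(vocab)
    for gram in word_ngrams(text, n):
        if gram in vocab:
            vec[vocab[gram]] = 1.0
    return vec


def decision(x, w, b):
    """Signed score of a vector relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM trained by full-batch subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    n_samples = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(xi, w, b) < 1:  # sample violates the margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n_samples
                grad_b -= yi / n_samples
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(text, vocab, w, b):
    """Return +1 (possible leak) or -1 (no leak) for a piece of dialog text."""
    return 1 if decision(vectorize(text, vocab), w, b) > 0 else -1


# Invented toy data (NOT the dataset from the paper): +1 = leak, -1 = no leak.
texts = [
    "the company will announce a merger next week buy now",
    "i know from the board that earnings will beat forecasts",
    "keep this quiet the regulator will approve the deal tomorrow",
    "the published annual report shows stable revenue growth",
    "a diversified portfolio reduces your overall risk",
    "past performance does not guarantee future results",
]
labels = [1, 1, 1, -1, -1, -1]

vocab = build_vocabulary(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = train_linear_svm(X, labels)

for phrase in ("the company will announce a merger soon",
               "a diversified portfolio reduces risk"):
    print(phrase, "->", predict(phrase, vocab, w, b))
```

On a dataset of realistic size, an established library would normally replace the hand-written parts, for example scikit-learn's `CountVectorizer` with `ngram_range=(2, 2)` and `binary=True` followed by `LinearSVC`; whether the authors used such tools is not stated in the abstract.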

Keywords: natural language processing, machine learning, neural networks, compliance risks, insider information

Acknowledgements. The paper was prepared at ITMO University within the framework of the scientific project No. 50449 “Development of cyberspace protection algorithms for solving applied problems of ensuring cybersecurity of banking organizations”.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License