doi: 10.17586/2226-1494-2022-22-6-1178-1186


Multi-agent adaptive routing by multi-head attention-based twin agents using reinforcement learning

T. A. Gribanov, A. A. Filchenkov, A. A. Azarov, A. A. Shalyto


Article in Russian

For citation:
Gribanov T.A., Filchenkov A.A., Azarov A.A., Shalyto A.A. Multi-agent adaptive routing by multi-head attention-based twin agents using reinforcement learning. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 6, pp. 1178–1186 (in Russian). doi: 10.17586/2226-1494-2022-22-6-1178-1186


Abstract
A condition common to packet routing, cargo transportation, and flow control problems is the variability of the underlying graph. Adaptive routing algorithms based on reinforcement learning are designed to solve the routing problem under this condition. However, when the graph changes significantly, existing routing algorithms require complete retraining. To address this challenge, we propose a novel method based on multi-agent modeling with twin agents, for which a new neural network architecture with multi-head internal attention is proposed, pre-trained within the multi-view learning paradigm. An agent in this paradigm takes a vertex as input; twins of the main agent are placed at the vertices of the graph and select the neighbor to which the object should be transferred. We carried out a comparative analysis with the existing DQN-LE-routing multi-agent routing algorithm at two stages: pre-training and simulation. In both cases, we considered runs in which the topology changes during testing or simulation. Experiments have shown that the proposed adaptability enhancement method provides global adaptability, increasing delivery time by only 14.5 % after global changes occur. The proposed method can be used to solve routing problems with complex path evaluation functions and dynamically changing graph topologies, for example, in transport logistics and for managing conveyor belts in production.
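The abstract gives no implementation details, but the described architecture admits a simple illustration. Below is a minimal PyTorch sketch of one plausible reading: a single shared network whose copies ("twins") sit at every vertex, receive embeddings of the current vertex, the destination, and the neighbors, apply multi-head self-attention, and score each neighbor as the next hop. All class names, dimensions, and design choices here are assumptions for illustration, not the authors' code.

# Hypothetical sketch of a twin-agent routing policy with multi-head
# self-attention, loosely following the abstract; names, dimensions,
# and design choices are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class TwinRoutingAgent(nn.Module):
    """One shared network; its copies ("twins") sit at every vertex."""

    def __init__(self, emb_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # Joint encoding of (vertex, destination) embedding pairs.
        self.encoder = nn.Linear(2 * emb_dim, emb_dim)
        # Multi-head self-attention over the current vertex and its neighbors.
        self.attention = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.scorer = nn.Linear(emb_dim, 1)  # per-neighbor score (Q-value)

    def forward(self, current: torch.Tensor, destination: torch.Tensor,
                neighbors: torch.Tensor) -> torch.Tensor:
        # current, destination: (batch, emb_dim); neighbors: (batch, k, emb_dim)
        dest = destination.unsqueeze(1).expand_as(neighbors)
        tokens = self.encoder(torch.cat([neighbors, dest], dim=-1))
        cur = self.encoder(torch.cat([current, destination], dim=-1)).unsqueeze(1)
        seq = torch.cat([cur, tokens], dim=1)           # (batch, k + 1, emb_dim)
        attended, _ = self.attention(seq, seq, seq)     # self-attention pass
        return self.scorer(attended[:, 1:, :]).squeeze(-1)  # (batch, k)

# Usage: the twin at a vertex scores its k neighbors and forwards the
# object to the best-scoring one (greedy here; epsilon-greedy in training).
agent = TwinRoutingAgent()
scores = agent(torch.randn(1, 64), torch.randn(1, 64), torch.randn(1, 5, 64))
next_hop = scores.argmax(dim=-1)

Because the network is shared, a topology change only alters which neighbor embeddings each twin sees, which is consistent with the adaptability claim in the abstract.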

Keywords: routing, multi-agent learning, reinforcement learning, adaptive routing

Acknowledgements. The study was supported by a grant from the Russian Science Foundation (project no. 20-19-00700).

References
  1. Toth P., Vigo D. An overview of vehicle routing problems. The Vehicle Routing Problem. SIAM, 2002, pp. 1–26. https://doi.org/10.1137/1.9780898718515.ch1
  2. Vutukury S., Garcia-Luna-Aceves J.J. MDVA: A distance-vector multipath routing protocol. Proc. 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2001, vol. 1, pp. 557–564. https://doi.org/10.1109/INFCOM.2001.916780
  3. Clausen T., Jacquet P. Optimized link state routing protocol (OLSR). RFC 3626, 2003. https://doi.org/10.17487/RFC3626
  4. Sweda T.M., Dolinskaya I.S., Klabjan D. Adaptive routing and recharging policies for electric vehicles. Transportation Science, 2017, vol. 51, no. 4, pp. 1326–1348. https://doi.org/10.1287/trsc.2016.0724
  5. Puthal M.K., Singh V., Gaur M.S., Laxmi V. C-Routing: An adaptive hierarchical NoC routing methodology. Proc. of the 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, 2011, pp. 392–397. https://doi.org/10.1109/VLSISoC.2011.6081616
  6. Zeng S., Xu X., Chen Y. Multi-agent reinforcement learning for adaptive routing: A hybrid method using eligibility traces. Proc. of the 16th IEEE International Conference on Control & Automation (ICCA'20), 2020, pp. 1332–1339. https://doi.org/10.1109/ICCA51439.2020.9264518
  7. Ibrahim A.M., Yau K.L.A., Chong Y.W., Wu C. Applications of multi-agent deep reinforcement learning: models and algorithms. Applied Sciences, 2021, vol. 11, no. 22, pp. 10870. https://doi.org/10.3390/app112210870
  8. Bono G., Dibangoye J.S., Simonin O., Matignon L., Pereyron F. Solving multi-agent routing problems using deep attention mechanisms. IEEE Transactions on Intelligent Transportation Systems, 2021, vol. 22, no. 12, pp. 7804–7813. https://doi.org/10.1109/TITS.2020.3009289
  9. Kang Y., Wang X., Lan Z. Q-adaptive: A multi-agent reinforcement learning based routing on dragonfly network. Proc. of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021, pp. 189–200. https://doi.org/10.1145/3431379.3460650
  10. Choi S., Yeung D.Y. Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. Advances in Neural Information Processing Systems, 1995, vol. 8, pp. 945–951.
  11. Watkins C.J., Dayan P. Q-learning. Machine Learning, 1992, vol. 8, no. 3, pp. 279–292. https://doi.org/10.1023/A:1022676722315
  12. Mnih V., Kavukcuoglu K., Silver D., Graves A., Antonoglou I., Wierstra D., Riedmiller M. Playing Atari with deep reinforcement learning. arXiv, 2013, arXiv:1312.5602. https://doi.org/10.48550/arXiv.1312.5602
  13. Mukhutdinov D., Filchenkov A., Shalyto A., Vyatkin V. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Generation Computer Systems, 2019, vol. 94, pp. 587–600. https://doi.org/10.1016/j.future.2018.12.037
  14. Gao B., Pavel L. On the properties of the softmax function with application in game theory and reinforcement learning. arXiv, 2017, arXiv:1704.00805. https://doi.org/10.48550/arXiv.1704.00805
  15. Mukhutdinov D. Decentralized conveyor system control algorithm using multi-agent reinforcement learning methods. MSc Dissertation. St. Petersburg, ITMO University, 2019, 92 p. Available at: http://is.ifmo.ru/diploma-theses/2019/2_5458464771026191430.pdf (accessed: 01.10.2022). (in Russian)
  16. Belkin M., Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 2001, pp. 585–591. https://doi.org/10.7551/mitpress/1120.003.0080
  17. Benea M.T., Florea A.M., Seghrouchni A.E.F. CAmI: An agent-oriented language for the collective development of AmI environments. Proc. of the 20th International Conference on Control Systems and Computer Science (CSCS), 2015, pp. 749–756. https://doi.org/10.1109/CSCS.2015.136
  18. Wang Y., Yao Q., Kwok J.T., Ni L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys, 2020, vol. 53, no. 3, pp. 63. https://doi.org/10.1145/3386252
  19. Liu J., Chen S., Wang B., Zhang J., Li N., Xu T. Attention as relation: learning supervised multi-head self-attention for relation extraction. Proc. of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, pp. 3787–3793. https://doi.org/10.24963/ijcai.2020/524
  20. Sola J., Sevilla J. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science, 1997, vol. 44, no. 3, pp. 1464–1468. https://doi.org/10.1109/23.589532
  21. Baldi P., Sadowski P.J. Understanding dropout. Advances in Neural Information Processing Systems, 2013, vol. 26, pp. 26–35.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.