doi: 10.17586/2226-1494-2022-22-6-1178-1186


Multi-agent adaptive routing by multi-head attention-based twin agents using reinforcement learning

T. A. Gribanov, A. A. Filchenkov, A. A. Azarov, A. A. Shalyto


Article in Russian

For citation:
Gribanov T.A., Filchenkov A.A., Azarov A.A., Shalyto A.A. Multi-agent adaptive routing by multi-head attention-based twin agents using reinforcement learning. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 6, pp. 1178–1186 (in Russian). doi: 10.17586/2226-1494-2022-22-6-1178-1186


Abstract
A condition common to packet routing, cargo transportation, and flow control problems is the variability of the underlying graph. Adaptive routing algorithms based on reinforcement learning are designed to solve the routing problem under this condition. However, when the graph changes significantly, existing routing algorithms require complete retraining. To address this challenge, we propose a novel method based on multi-agent modeling with twin agents, for which a new neural network architecture with multi-head internal attention is proposed, pre-trained within the multi-view learning paradigm. An agent in this paradigm takes a vertex as input; twins of the main agent are placed at the vertices of the graph and select the neighbor to which the object should be transferred. We carried out a comparative analysis with the existing DQN-LE-routing multi-agent routing algorithm at two stages: pre-training and simulation. In both cases, we considered runs in which the topology changes during testing or simulation. Experiments have shown that the proposed adaptability enhancement method provides global adaptability, increasing delivery time by only 14.5 % after global changes occur. The proposed method can be used to solve routing problems with complex path evaluation functions and dynamically changing graph topologies, for example, in transport logistics and for managing conveyor belts in production.
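The abstract gives no implementation details, but the described architecture admits a simple illustration. Below is a minimal PyTorch sketch of one plausible reading: a single shared network whose copies ("twins") sit at every vertex, receive embeddings of the current vertex, the destination, and the neighbors, apply multi-head self-attention, and score each neighbor as the next hop. All class names, dimensions, and design choices here are assumptions for illustration, not the authors' code.

# Hypothetical sketch of a twin-agent routing policy with multi-head
# self-attention, loosely following the abstract; names, dimensions,
# and design choices are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class TwinRoutingAgent(nn.Module):
    """One shared network; its copies ("twins") sit at every vertex."""

    def __init__(self, emb_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # Joint encoding of (vertex, destination) embedding pairs.
        self.encoder = nn.Linear(2 * emb_dim, emb_dim)
        # Multi-head self-attention over the current vertex and its neighbors.
        self.attention = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.scorer = nn.Linear(emb_dim, 1)  # per-neighbor score (Q-value)

    def forward(self, current: torch.Tensor, destination: torch.Tensor,
                neighbors: torch.Tensor) -> torch.Tensor:
        # current, destination: (batch, emb_dim); neighbors: (batch, k, emb_dim)
        dest = destination.unsqueeze(1).expand_as(neighbors)
        tokens = self.encoder(torch.cat([neighbors, dest], dim=-1))
        cur = self.encoder(torch.cat([current, destination], dim=-1)).unsqueeze(1)
        seq = torch.cat([cur, tokens], dim=1)           # (batch, k + 1, emb_dim)
        attended, _ = self.attention(seq, seq, seq)     # self-attention pass
        return self.scorer(attended[:, 1:, :]).squeeze(-1)  # (batch, k)

# Usage: the twin at a vertex scores its k neighbors and forwards the
# object to the best-scoring one (greedy here; epsilon-greedy in training).
agent = TwinRoutingAgent()
scores = agent(torch.randn(1, 64), torch.randn(1, 64), torch.randn(1, 5, 64))
next_hop = scores.argmax(dim=-1)

Because the network is shared, a topology change only alters which neighbor embeddings each twin sees, which is consistent with the adaptability claim in the abstract.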

Keywords: routing, multi-agent learning, reinforcement learning, adaptive routing

Acknowledgements. The study was supported by a grant from the Russian Science Foundation (project no. 20-19-00700).

References
  1. Toth P., Vigo D. An overview of vehicle routing problems. The Vehicle Routing Problem. SIAM, 2002, pp. 1–26. https://doi.org/10.1137/1.9780898718515.ch1
  2. Vutukury S., Garcia-Luna-Aceves J.J. MDVA: A distance-vector multipath routing protocol. Proc. 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2001, vol. 1, pp. 557–564. https://doi.org/10.1109/INFCOM.2001.916780
  3. Clausen T., Jacquet P. Optimized link state routing protocol (OLSR). RFC 3626, 2003. https://doi.org/10.17487/RFC3626
  4. Sweda T.M., Dolinskaya I.S., Klabjan D. Adaptive routing and recharging policies for electric vehicles. Transportation Science, 2017, vol. 51, no. 4, pp. 1326–1348. https://doi.org/10.1287/trsc.2016.0724
  5. Puthal M.K., Singh V., Gaur M.S., Laxmi V. C-Routing: An adaptive hierarchical NoC routing methodology. Proc. of the 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, 2011, pp. 392–397. https://doi.org/10.1109/VLSISoC.2011.6081616
  6. Zeng S., Xu X., Chen Y. Multi-agent reinforcement learning for adaptive routing: A hybrid method using eligibility traces. Proc. of the 16th IEEE International Conference on Control & Automation (ICCA'20), 2020, pp. 1332–1339. https://doi.org/10.1109/ICCA51439.2020.9264518
  7. Ibrahim A.M., Yau K.L.A., Chong Y.W., Wu C. Applications of multi-agent deep reinforcement learning: models and algorithms. Applied Sciences, 2021, vol. 11, no. 22, pp. 10870. https://doi.org/10.3390/app112210870
  8. Bono G., Dibangoye J.S., Simonin O., Matignon L., Pereyron F. Solving multi-agent routing problems using deep attention mechanisms. IEEE Transactions on Intelligent Transportation Systems, 2021, vol. 22, no. 12, pp. 7804–7813. https://doi.org/10.1109/TITS.2020.3009289
  9. Kang Y., Wang X., Lan Z. Q-adaptive: A multi-agent reinforcement learning based routing on dragonfly network. Proc. of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021, pp. 189–200. https://doi.org/10.1145/3431379.3460650
  10. Choi S., Yeung D.Y. Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. Advances in Neural Information Processing Systems, 1995, vol. 8, pp. 945–951.
  11. Watkins C.J., Dayan P. Q-learning. Machine Learning, 1992, vol. 8, no. 3, pp. 279–292. https://doi.org/10.1023/A:1022676722315
  12. Mnih V., Kavukcuoglu K., Silver D., Graves A., Antonoglou I., Wierstra D., Riedmiller M. Playing Atari with deep reinforcement learning. arXiv, 2013, arXiv:1312.5602. https://doi.org/10.48550/arXiv.1312.5602
  13. Mukhutdinov D., Filchenkov A., Shalyto A., Vyatkin V. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Generation Computer Systems, 2019, vol. 94, pp. 587–600. https://doi.org/10.1016/j.future.2018.12.037
  14. Gao B., Pavel L. On the properties of the softmax function with application in game theory and reinforcement learning. arXiv, 2017, arXiv:1704.00805. https://doi.org/10.48550/arXiv.1704.00805
  15. Mukhutdinov D. Decentralized conveyor system control algorithm using multi-agent reinforcement learning methods. MSc Dissertation. St. Petersburg, ITMO University, 2019, 92 p. Available at: http://is.ifmo.ru/diploma-theses/2019/2_5458464771026191430.pdf (accessed: 01.10.2022). (in Russian)
  16. Belkin M., Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 2001, pp. 585–591. https://doi.org/10.7551/mitpress/1120.003.0080
  17. Benea M.T., Florea A.M., Seghrouchni A.E.F. CAmI: An agent-oriented language for the collective development of AmI environments. Proc. of the 20th International Conference on Control Systems and Computer Science (CSCS), 2015, pp. 749–756. https://doi.org/10.1109/CSCS.2015.136
  18. Wang Y., Yao Q., Kwok J.T., Ni L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys, 2020, vol. 53, no. 3, pp. 63. https://doi.org/10.1145/3386252
  19. Liu J., Chen S., Wang B., Zhang J., Li N., Xu T. Attention as relation: learning supervised multi-head self-attention for relation extraction. Proc. of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, pp. 3787–3793. https://doi.org/10.24963/ijcai.2020/524
  20. Sola J., Sevilla J. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science, 1997, vol. 44, no. 3, pp. 1464–1468. https://doi.org/10.1109/23.589532
  21. Baldi P., Sadowski P.J. Understanding dropout. Advances in Neural Information Processing Systems, 2013, vol. 26, pp. 26–35.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.