doi: 10.17586/2226-1494-2025-25-6-1117-1124
Compound quality model for recommender system evaluation
For citation:
Tsyplov A.M., Boukhanovsky A.V. Compound quality model for recommender system evaluation. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 6, pp. 1117–1124 (in Russian). doi: 10.17586/2226-1494-2025-25-6-1117-1124
Abstract
The study examines approaches to quantifying effects such as position bias, popularity bias, and others in recommender systems. A new quality model for recommendation algorithms is proposed that reduces the selected metrics to a single unit of measurement and, for each effect, determines its impact on the system. The resulting scores enable a deeper comparative analysis of different algorithms as well as examination of an algorithm's behavior across user segments. Within the model, two conditional marginal distribution densities are built for each metric: one over relevant and one over irrelevant recommendations. By comparing these densities, the set of possible metric values is divided into a normal and a critical region. The model estimates the impact of each effect on the system from the frequency with which the values of the corresponding metric fall into its critical region. To demonstrate how the model works, four recommendation algorithms were analyzed on the MovieLens-100K academic dataset. The testing assessed popularity bias, the lack of novelty in recommendations, and the tendency of algorithms to recommend items solely on the basis of user demographic data. For each effect, an estimate of its impact on the system is constructed, and an example is given of predicting an upper bound on system quality if the corresponding effect is eliminated. The study showed that the metrics of effects such as popularity or position bias can have different distributions of absolute values depending on the system, so the proposed quality model offers a more reliable way to compare recommendation algorithms. The model is suitable for evaluating personalized recommendations regardless of the application domain and the algorithm used to build them.
Keywords: recommendation systems, ranking, evaluation of the quality of recommendations, popularity bias, position bias, machine learning
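
To illustrate the density-comparison idea described in the abstract, below is a minimal Python sketch. It is not the authors' implementation: it assumes the conditional densities are approximated with shared-bin histograms, that a metric value is "critical" when the density conditioned on irrelevant recommendations exceeds the density conditioned on relevant ones, and that an effect's impact is the share of observed values falling into critical bins. The function name effect_impact and the toy popularity data are illustrative only.

```python
# Minimal sketch of the density-comparison scheme described in the abstract.
# Assumptions (not taken from the paper): densities are approximated with
# shared-bin histograms, and a bin is "critical" when the density of the
# metric conditioned on irrelevant recommendations exceeds the density
# conditioned on relevant ones. The effect score is the share of all
# observed metric values that fall into critical bins.
import numpy as np

def effect_impact(metric_values, is_relevant, n_bins=30):
    """Estimate an effect's impact from per-recommendation metric values.

    metric_values : 1-D array with the effect's metric, one value per recommendation
    is_relevant   : boolean array of the same length (True = relevant recommendation)
    """
    metric_values = np.asarray(metric_values, dtype=float)
    is_relevant = np.asarray(is_relevant, dtype=bool)

    # Shared bin edges so the two conditional densities are directly comparable.
    edges = np.histogram_bin_edges(metric_values, bins=n_bins)
    dens_rel, _ = np.histogram(metric_values[is_relevant], bins=edges, density=True)
    dens_irr, _ = np.histogram(metric_values[~is_relevant], bins=edges, density=True)

    # Critical region: bins where the irrelevant-conditioned density dominates.
    critical_bins = dens_irr > dens_rel

    # Impact: frequency with which observed metric values land in the critical region.
    bin_idx = np.clip(np.digitize(metric_values, edges) - 1, 0, n_bins - 1)
    return float(np.mean(critical_bins[bin_idx]))

# Toy usage: a popularity-style metric where irrelevant recommendations
# tend to take higher values than relevant ones.
rng = np.random.default_rng(0)
popularity = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
relevant = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])
print(f"estimated popularity-bias impact: {effect_impact(popularity, relevant):.2f}")
```

Histograms are used here only to keep the sketch dependency-free; any density estimator that yields comparable conditional densities would fit the same scheme.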

