doi: 10.17586/2226-1494-2025-25-5-844-855
A universal architecture model of a crowdsourcing medical data labeling system designed
For citation:
Kovalenko L.A., Blekanov I.S., Ezhov F.V., Larin E.S., Kim G.I. A universal architecture model of a crowdsourcing medical data labeling system designed. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 5, pp. 844–855 (in Russian). doi: 10.17586/2226-1494-2025-25-5-844-855
Abstract
Machine Learning (ML) and Artificial Intelligence (AI) methods are used to process and intelligently analyze medical data. Applying ML/AI methods requires large, specialized sets of labeled medical data. Organizing high-quality medical data labeling requires the involvement of a large number of assessors and specialists in a particular field of medicine, as well as specialized tools that optimize the labeling process while accounting for the specifics of medical data processing. This paper proposes a universal architectural model of a crowdsourcing system designed specifically for medical data labeling. The model supports processing of diverse medical data formats and incorporates data anonymization mechanisms and multi-level quality control, while enabling a distributed annotation process with expert community involvement. As a result, a classification of current problems in medical data collection and labeling was formulated, along with quality and safety criteria for the comparative analysis of medical data labeling systems. A scheme of the generalized scenario of user group interaction with a crowdsourcing system in the context of solving AI problems in medicine was proposed. A universal model of such a system's architecture was designed, and a specialized crowdsourcing system for medical data labeling based on Computer Vision Annotation Tool was implemented on its basis. The implemented system was tested and validated at the Pirogov Clinic of High Medical Technologies. The proposed universal model of crowdsourcing system architecture can be used to improve the efficiency and safety of organizing the labeling of patients' medical data in the context of solving various applied ML/AI tasks, such as semantic segmentation of internal organs and their pathologies, and detection and classification of diseases based on medical images (e.g. computed tomography scans). The developed solution can be used by doctors of various specializations, researchers, and developers working on AI methods and technologies in medicine.
Keywords: crowdsourcing, medical data annotation, software architecture model, quality criteria for crowdsourcing systems, medical data processing, use case
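The abstract mentions multi-level quality control without describing its mechanics. As a purely illustrative sketch, and not the authors' implementation, one common two-level scheme takes a majority vote among assessors and escalates items with weak consensus to an expert. The names `Decision`, `aggregate_labels`, and the `min_agreement` threshold of 0.7 are hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    label: Optional[str]   # agreed label, or None when the item is escalated
    agreement: float       # share of assessors who chose the most common label
    needs_expert: bool     # True when the item should go to an expert reviewer


def aggregate_labels(votes: List[str], min_agreement: float = 0.7) -> Decision:
    """Hypothetical two-level check: majority vote, then expert escalation."""
    if not votes:
        # Nothing to aggregate, so the item is routed to an expert.
        return Decision(None, 0.0, True)
    top_label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement >= min_agreement:
        # Level 1: assessors agree strongly enough to accept the label.
        return Decision(top_label, agreement, False)
    # Level 2: consensus is too weak, so an expert makes the final call.
    return Decision(None, agreement, True)
```

For example, `aggregate_labels(["tumor", "tumor", "tumor", "normal"])` accepts the majority label, while an evenly split vote is flagged for expert review. The threshold is an assumption that a real deployment would tune per task.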

