A study of human motion in computer vision systems based on a skeletal model

Sophia A. Kazakova, Polina A. Leonteva, Maria I. Frolova, Julia V. Donetskaya, Ilya Yu. Popov, Alexander Yu. Kouznetsov

2021, VOLUME 21, NUMBER 4 (July–August)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Editor-in-Chief: Vladimir O. Nikiforov, D.Sc., Prof.

doi: 10.17586/2226-1494-2021-21-4-571-577

Article in Russian

For citation:

Kazakova S.A., Leonteva P.A., Frolova M.I., Donetskaya Ju.V., Popov I.Yu., Kuznetsov A.Yu. A study of human motion in computer vision systems based on a skeletal model. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 4, pp. 571–577 (in Russian). doi: 10.17586/2226-1494-2021-21-4-571-577

Abstract

Methods of studying human motion in computer vision systems fall into two types: analysis in two-dimensional space and analysis in three-dimensional space. Two-dimensional analysis uses an image from a single camera and/or multiple body sensors. This approach accumulates error rapidly and, consequently, represents the human figure with low accuracy. Three-dimensional analysis usually relies on multiple cameras and represents objects as sets of volumetric elements. Although this method is highly accurate, it has high computational complexity and places a heavy load on the internal network. The purpose of this paper is to develop a model that uses a single camera yet approaches the accuracy of three-dimensional analysis methods. In this paper, the human figure is represented as a skeleton described by an acyclic connected graph. The general structure of the human figure is analyzed, fifteen basic points are selected, and the physical and logical connections between them are studied and described mathematically. The velocity and spatial characteristics of the points and connections outline the general dynamics of motion. The study describes a model of human motion and shows how to construct it using a particular image as an example. The developed algorithm for collecting and analyzing information estimates the relative positions and velocity characteristics of the graph elements. The model can be used to acquire information about the reference dynamics of human movements. If major differences between the reference and the observed motion are detected, the behavior is classified as deviant. Thus, the proposed algorithm can be applied in computer vision systems to detect and analyze human movements.
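The abstract describes a skeleton of fifteen points connected into an acyclic connected graph, whose relative positions and velocities are compared against reference motion to detect deviant behavior. The Python sketch below illustrates these ideas under explicit assumptions. The joint names and bone list are a hypothetical 15-point layout modeled on common skeleton formats, not necessarily the paper's own set. Joints are treated as 2D image coordinates from a single camera, velocities are plain finite differences between two consecutive frames, and the deviation rule (any joint velocity farther than a fixed threshold from the reference) is an illustrative stand-in, since the abstract does not specify the paper's criterion.

```python
"""Hypothetical sketch: a 15-joint skeleton as a tree, per-joint velocities,
and a simple threshold-based deviation check against reference motion."""
import math

# Hypothetical 15-point layout (modeled on common 15-joint skeleton formats);
# the paper's exact set of points may differ.
JOINTS = [
    "head", "neck", "torso",
    "l_shoulder", "l_elbow", "l_hand",
    "r_shoulder", "r_elbow", "r_hand",
    "l_hip", "l_knee", "l_foot",
    "r_hip", "r_knee", "r_foot",
]

# Hypothetical physical connections (bones): 14 edges over 15 nodes.
BONES = [
    ("head", "neck"), ("neck", "torso"),
    ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand"),
    ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_hand"),
    ("torso", "l_hip"), ("l_hip", "l_knee"), ("l_knee", "l_foot"),
    ("torso", "r_hip"), ("r_hip", "r_knee"), ("r_knee", "r_foot"),
]


def is_tree(nodes, edges):
    """A graph is acyclic and connected iff it is connected and |E| = |V| - 1."""
    if len(edges) != len(nodes) - 1:
        return False
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # Depth-first search from the first node to test connectivity.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def velocities(prev, curr, dt):
    """Finite-difference velocity (pixels per second) of each joint between two frames."""
    return {
        j: ((curr[j][0] - prev[j][0]) / dt, (curr[j][1] - prev[j][1]) / dt)
        for j in JOINTS
    }


def bone_vectors(frame):
    """Relative location of the second joint of each bone with respect to the first."""
    return {(a, b): (frame[b][0] - frame[a][0], frame[b][1] - frame[a][1]) for a, b in BONES}


def is_deviant(observed, reference, threshold):
    """Illustrative rule: deviant if any joint velocity is farther than threshold from the reference."""
    return any(math.dist(observed[j], reference[j]) > threshold for j in JOINTS)


if __name__ == "__main__":
    assert is_tree(JOINTS, BONES)
    # Synthetic frames: every joint moves 3 px right and 4 px down in 0.5 s.
    f0 = {j: (float(i), float(2 * i)) for i, j in enumerate(JOINTS)}
    f1 = {j: (x + 3.0, y + 4.0) for j, (x, y) in f0.items()}
    v = velocities(f0, f1, dt=0.5)
    print(v["head"])  # (6.0, 8.0)
    reference = {j: (6.0, 8.0) for j in JOINTS}
    print(is_deviant(v, reference, threshold=1.0))  # False
```

Checking `|E| = |V| - 1` plus connectivity is sufficient for the tree property, which is why adding any extra bone (for example, linking the two hands) makes `is_tree` return `False`. A practical system would likely smooth velocities over several frames rather than rely on a single difference, and the threshold would have to be tuned to the data.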

Keywords: computer vision, human motion analysis, behavioral analytics, motion detection, skeletal model

Acknowledgements. This work was partially supported by the Ministry of Science and Higher Education of the Russian Federation, state assignment (goszadanie) passport no. 2019-0898.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License