Joint recognition of text and layout in historical Russian documents

Mohammed Samah, Teslya Nikolay

2023, Volume 23, Number 3 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2023-23-3-585-594

Article in English

For citation:

Mohammed S., Teslya N. Joint recognition of text and layout in historical Russian documents. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 3, pp. 585–594. doi: 10.17586/2226-1494-2023-23-3-585-594

Abstract

In this paper, we evaluate the Document Attention Network (DAN), the first end-to-end segmentation-free architecture, on historical Russian documents. The DAN model jointly recognizes text and layout: it takes whole documents of any size as input and outputs the text together with logical layout tokens. For comparison purposes, we conduct our experiments on the Digital Peter dataset, which has previously been recognized at the line level. The dataset consists of manuscripts of Peter the Great; the ground truth is represented according to a sophisticated XML schema that enables an accurate, detailed definition of layout and text regions. At the page level, we achieved a Character Error Rate (CER) of 18.71 %, a Word Error Rate (WER) of 39.7 %, a Layout Ordering Error Rate (LOER) of 14.11 %, and a mean Average Precision (mAP) of 66.67 %.
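
As an illustration of the two text metrics named in the abstract, the following minimal Python sketch computes CER and WER from edit distances. It is not the authors' evaluation code: the function names are hypothetical, words are assumed to be separated by whitespace, and the rates are assumed to be normalized by the total reference length over all pages, which is a common convention but is not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit distance divided by total reference length (assumed convention)."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_len += len(ref_tokens)
    if total_len == 0:
        raise ValueError("reference texts are empty")
    return total_edits / total_len


def cer(refs, hyps):
    """Character Error Rate: edit distance over characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word Error Rate: edit distance over whitespace-separated words."""
    return error_rate(refs, hyps, str.split)
```

For example, `cer(["kitten"], ["sitting"])` gives 0.5 (three edits over six reference characters), and `wer(["the cat sat"], ["the cat sit"])` gives one wrong word out of three.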

Keywords: document understanding, handwritten text recognition, layout analysis, fully connected networks, transformers

Acknowledgements. The study was carried out at the expense of state funding, topic project No. FFZF-2022-0005.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License