Development of a prototype optical character recognition system for low-quality images

Mykola Baranov, Serhii Ivanov, Yaroslav Sokolovskyi, Yuliana Yurchenko


This paper considers the problems that arise when recognizing characters in low-quality images with a high level of digital noise, blurring, and distortions introduced by digital processing. In this work a new dataset is proposed, consisting of synthetic images of text overlaid on a white background. A wide range of distortions was obtained by applying resize functions, noise functions, blurring, and rotation operators. The applied transformations have random, uniformly distributed intensity, which simulates the image distortions that occur in real life.
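The distortion pipeline described above can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' code: the function names, parameter ranges, and the nearest-neighbour resize are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, max_sigma=25.0):
    """Add Gaussian noise whose strength is drawn uniformly at random."""
    sigma = rng.uniform(0.0, max_sigma)
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255)

def box_blur(img, max_radius=2):
    """Blur with a box filter of uniformly random radius."""
    r = int(rng.integers(0, max_radius + 1))
    if r == 0:
        return img
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):            # sum the k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def down_up(img, max_factor=3):
    """Shrink and re-enlarge by a random factor (nearest neighbour),
    simulating low-resolution capture."""
    f = int(rng.integers(1, max_factor + 1))
    small = img[::f, ::f]
    big = np.repeat(np.repeat(small, f, axis=0), f, axis=1)
    return big[:img.shape[0], :img.shape[1]]

def distort(img):
    """Apply the three distortions in sequence to a grayscale image."""
    return add_noise(box_blur(down_up(img)))
```

Because every transformation keeps the image shape and value range, the generator can label each distorted sample with the original rendered text, yielding labeled low-quality data in any quantity.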
Creating data in this way makes it possible to obtain labeled low-quality text images in any quantity. We used the Keras library to build a CRNN and instantiated a custom endpoint layer to implement the CTC loss. A novel data preprocessing pipeline is suggested to increase the accuracy of OCR results: an algorithm for horizontal alignment of the text image and an algorithm for cutting a multi-line text image into several single-line text images. This allows us to reuse a model suited to single-line text and achieve a similar accuracy score on multi-line text images.
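The abstract does not spell out the line-cutting algorithm; a common approach, sketched below under that assumption, uses a horizontal projection profile: rows containing ink are grouped into maximal runs, and each run becomes one single-line image.

```python
import numpy as np

def split_lines(img, ink_threshold=128, min_height=2):
    """Split a dark-text-on-light-background grayscale image into
    single-line images using a horizontal projection profile.

    A row counts as 'ink' if it has at least one pixel darker than
    ink_threshold; maximal runs of ink rows become line segments.
    """
    ink_rows = (img < ink_threshold).any(axis=1)
    lines, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            if y - start >= min_height:    # ignore one-pixel specks
                lines.append(img[start:y])
            start = None
    if start is not None and len(img) - start >= min_height:
        lines.append(img[start:])          # line touching the bottom edge
    return lines
```

Each returned segment can then be fed to the single-line CRNN unchanged, which is what lets the same model handle multi-line inputs.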
We defined an error metric to compare the character sequences of labels and model predictions as the ratio of the Levenshtein distance between the label and the model prediction to the label length. This score exposes how often the model mismatches a single character. Our model achieved a value of 0.02 on this metric when recognizing the text of the test dataset. Testing was also performed for the state-of-the-art OCR models Tesseract OCR v5.0.0 (alpha) and the Google Cloud Vision API. The results show that the built neural network architecture and the described image preprocessing algorithms are effective for recognizing text in low-quality images.
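The error metric above (Levenshtein distance divided by label length) can be computed with the standard dynamic-programming edit distance; a self-contained sketch, with function names chosen here for illustration:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))         # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def char_error_rate(label: str, prediction: str) -> float:
    """Ratio of edit distance to label length, as the metric is defined."""
    return levenshtein(label, prediction) / len(label)
```

For example, `char_error_rate("hello", "hallo")` is 0.2, i.e. one mismatched character out of five; the paper's reported value of 0.02 corresponds to roughly one character error per fifty label characters.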


1. Almazan J. Word spotting and recognition with embedded attributes / J. Almazan, A. Gordo, A. Fornes, E. Valveny // IEEE Trans. Pattern Anal. Mach. Intell. 2014. Vol. 36 (12). P. 2552-2566.
2. Dalal N. Histograms of oriented gradients for human detection / N. Dalal, B. Triggs // In CVPR. 2005.
3. GitHub repository for Tesseract OCR v5.0.0 (alpha). Available from: https://github.com
4. Google Cloud Vision API. Available from: https://cloud.google.com/vision
5. Jaderberg M. Deep structured output learning for unconstrained text recognition / M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman // In ICLR. 2015.
6. Jaderberg M. Reading text in the wild with convolutional neural networks / M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman // Int. J. Comput. Vision. 2015.
7. Neumann L. Real-time scene text localization and recognition / L. Neumann, J. Matas // In
CVPR. 2012.
8. Shi B. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition / B. Shi, X. Bai, C. Yao // IEEE Trans. Pattern Anal. Mach. Intell. 2017.
9. Smith R. End-to-End Interpretation of the French Street Name Signs Dataset / R. Smith, C. Gu, D.-S. Lee, H. Hu, R. Unnikrishnan, Ju. Ibarz, S. Arnoud, S. Lin
10. Su B. Accurate scene text recognition based on recurrent neural network / B. Su, S. Lu // In
ACCV. 2014.
11. Wang K. End-to-end scene text recognition / K. Wang, B. Babenko, S. Belongie // In ICCV. 2011.
12. Wang K. Word spotting in the wild / K. Wang, S. Belongie // In ECCV. 2010.
13. Yao C. Strokelets: A learned multi-scale representation for scene text recognition / C. Yao, X. Bai, B. Shi, W. Liu // In CVPR. 2014.

DOI: http://dx.doi.org/10.30970/vam.2021.29.11344
