Correction of Distortion in Scene Text Recognition


Addis Ababa University


Text found in scene images carries valuable meaning. Scene text recognition is the process of converting text regions in an image into machine-readable and editable symbols. Scene text can appear in a regular or an irregular layout, and irregular text is common. Text with an irregular layout is difficult to recognize because of the various forms of distortion it exhibits. Correcting these distortions without losing any desired information is one of the major challenges in computer vision. Several approaches have been proposed to address distortion in scene text recognition. Based on their techniques, these approaches fall into four categories: character-level strong supervision, rectification, multi-direction encoding, and attention-based approaches. The state of the art is the attention-based approach, which predicts characters from scene text image features using an encoder-decoder architecture with attention. However, the performance of attention-based approaches is limited, mainly because they are unable to extract detailed image features. They underperform particularly on long scene text sequences, because encoder outputs are used inconsistently, and to a decreasing extent, across decoding time steps. They also suffer from attention mismatch on severely distorted text. To tackle these problems in the attention-based encoder-decoder approach, we propose a global attention mechanism with a Bi-LSTM decoder that can handle any type of text distortion implicitly. The proposed approach is trained on 6,000 regular and irregular scene text images randomly drawn from the publicly available SYN90K synthetic dataset, which is widely used to train scene text recognizers. Preprocessing (image rescaling and noise removal) is applied only to the training data.
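The abstract does not describe how the global attention is implemented. As an illustration only, the sketch below shows Luong-style global attention with dot-product scoring in plain Python. At each decoding step, the current decoder state is scored against every encoder output, so all encoder time steps contribute to the context vector. This is the property that targets the inconsistent use of encoder outputs described above. The function names, the dot-product score, and the toy vectors are assumptions rather than details from the thesis. The Bi-LSTM decoder itself is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(decoder_state: Vector,
                     encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical sketch of Luong-style global attention (dot score).

    Every encoder output is scored against the current decoder state,
    so all encoder time steps are considered at each decoding step.
    Returns (attention weights, context vector).
    """
    # score(h_t, h_s) = h_t . h_s for every encoder time step s
    scores = [sum(d * e for d, e in zip(decoder_state, h_s))
              for h_s in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of all encoder outputs
    dim = len(encoder_states[0])
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy encoder outputs for three time steps (illustrative values only)
    encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = global_attention([2.0, 0.0], encoder_outputs)
    print(weights, context)
```

In a real recognizer, the context vector would be combined with the decoder state to predict the next character. Dot-product scoring is only one option. Luong et al. also describe "general" (bilinear) and "concat" scores, which add learned parameters.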
The proposed approach is evaluated on four regular and three irregular scene text image datasets. It outperforms the state-of-the-art approach by an average of 1.58% on the regular datasets and by an average of 1.85% on the irregular datasets. In addition, incorporating the Bi-LSTM decoder increases recognition performance by an average of 5.24% on regular scene text and by an average of 3.05% on irregular scene text.



Text Recognition, Correction, Distortion