Correction of Distortion in Scene Text Recognition
Date
2022-04
Authors
Publisher
Addis Ababa University
Abstract
Text that appears in scene images carries valuable meaning. Scene text recognition is the process
of converting text regions in an image into machine-readable and editable symbols. Scene text can
appear in a regular or an irregular layout, and irregular text is common. Text with an irregular
layout is difficult to recognize because of different forms of distortion. Correcting these
distortions without losing any desired information is one of the major challenges in computer
vision. Different approaches have been proposed to solve the problem of distortion in scene text
recognition. Based on their techniques, these approaches fall into four categories: Character
Level Strong Supervision, Rectification, Multi Direction Encoding and Attention-based approaches.
The state of the art is the attention-based approach, which predicts characters from scene text
image features by using an encoder and a decoder with attention. The performance of
attention-based approaches, however, remains limited, mainly because they are unable to extract
detailed image features. They underperform particularly on long scene text sequences because their
use of the encoder output is inconsistent and decreases at each decoding time step. They also
suffer from attention mismatch on severely distorted text.
To tackle these problems in the attention-based encoder-decoder approach, we propose a global
attention-based mechanism with a Bi-LSTM decoder that can implicitly handle any type of text
distortion. The proposed approach is trained with 6,000 regular and irregular scene text images
randomly taken from the publicly available SYN90K synthetic dataset, which is widely used to
train scene text recognizers. Preprocessing tasks, namely image rescaling and noise removal, are
performed only for training.
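The idea of global attention can be illustrated with a minimal sketch: at each decoding step, every encoder output is scored against the current decoder state, so the whole sequence of image features is used each time. The abstract does not state the scoring function, so the dot-product score below is an assumption, and the names `softmax` and `global_attention` are hypothetical, not taken from the thesis.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(decoder_state, encoder_outputs):
    """Attend over ALL encoder outputs for one decoding step.

    Assumption: dot-product scoring; the thesis may use a different score.
    Returns the attention weights and the weighted context vector.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder feature vectors and one decoder state.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = global_attention(decoder_state, encoder_outputs)
```

In this toy example the second and third encoder outputs align with the decoder state and receive equal, larger weights than the first. In a full recognizer the context vector would be passed to the decoder to predict the next character.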
The proposed approach is evaluated using four classes of regular scene text image datasets and
three classes of irregular scene text image datasets. It outperforms the state-of-the-art approach
by an average of 1.58% on regular scene text image datasets and by an average of 1.85% on
irregular scene text image datasets. In addition, incorporating the Bi-LSTM decoder into the
proposed approach increases recognition performance by an average of 5.24% for regular scene
texts and by an average of 3.05% for irregular scene texts.
Description
Keywords
Text Recognition, Correction, Distortion