Multilingual Text Detection and Script Recognition from Video Scene using Deeplearning
Date
2019-10
Authors
Publisher
Addis Ababa University
Abstract
Scene text appears frequently in videos and may carry crucial information, such as location and
time. In Ethiopia, most information posted on the streets is written in Ethiopic (Geez) and Latin
scripts. In this research work we studied multilingual text detection, script identification and
character recognition from video scenes using a deep learning neural network model.
Videos captured by a digital camera are processed, and keyframes are extracted using a keyframe
selection algorithm. Text regions are detected using a trained convolutional neural network, and
the regions found by bounding box regression are cropped out using their bounding box values.
Using a Faster R-CNN that includes a dropout layer, text detection achieved 91% precision, 92.9%
recall and an execution time of 7.5 sec during testing of the network. The cropped text blocks are
then classified into their script classes by a network trained through transfer learning. Script
identification achieved 88.5% accuracy without the dropout layer and 93.3% accuracy with it.
Following script identification, line segmentation, word segmentation and character segmentation
are performed using horizontal and vertical projection profiles; these are the preprocessing steps
for optical character recognition. The final phase of this work is character recognition, which
builds on the preceding text detection and script identification phases. Different numbers of
epochs were considered during training to maximize the network's ability to recognize characters.
The network trained for 200 epochs achieved an error of 0.0076% during testing. This shows that
increasing the number of epochs in the training options improves character recognition performance
while reducing the error.
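The projection-profile segmentation step mentioned in the abstract can be illustrated with a minimal
sketch. This is not the authors' implementation: it assumes the cropped text block has already been
binarized (1 = text pixel, 0 = background), and every name in it (runs_of_ink, segment_lines,
segment_characters) is hypothetical. Word segmentation, which would additionally need a gap-width
threshold, is left out.

```python
from typing import List, Tuple

# Assumed input format: a binary image as rows of 0/1 values (1 = text pixel).
Image = List[List[int]]


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of text begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # a blank gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def horizontal_profile(img: Image) -> List[int]:
    """Text-pixel count per row; zero rows separate text lines."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Text-pixel count per column; zero columns separate characters."""
    return [sum(col) for col in zip(*img)]


def segment_lines(img: Image) -> List[Image]:
    """Split a text block into lines using the horizontal profile."""
    return [img[top:bottom] for top, bottom in runs_of_ink(horizontal_profile(img))]


def segment_characters(line: Image) -> List[Image]:
    """Split one text line into characters using the vertical profile."""
    return [
        [row[left:right] for row in line]
        for left, right in runs_of_ink(vertical_profile(line))
    ]


# Toy block: two text lines separated by one blank row.
block = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
lines = segment_lines(block)
characters = [segment_characters(line) for line in lines]
```

On real scene text, touching or overlapping characters leave no empty column between them, so a
sketch like this would need a tolerance on the profile values rather than a strict zero test.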
Keywords
Faster R-CNN, Deep Learning Neural Network, Optical Character Recognition, AlexNet