Multilingual Text Detection and Script Recognition from Video Scene using Deeplearning

Kirubel, Gebrehiwot

Files

Kirubel Gebrehiwot.pdf (1.88 MB)

Date

2019-10

Authors

Kirubel, Gebrehiwot

Publisher

Addis Ababa University

Abstract

Scene text occurs frequently in videos and may carry crucial information, such as location and time. In Ethiopia, most information on the streets is posted in Ethiopic (Geez) and Latin scripts. In this research we studied multilingual text detection, script identification, and character recognition from video scenes using deep learning neural network models. Videos captured by a digital camera are processed, and keyframes are extracted using a keyframe selection algorithm. Text regions are detected by a trained convolutional neural network, and the regions found by bounding box regression are cropped out using their bounding box values. Faster R-CNN with a dropout layer, used for text detection, achieved 91% precision, 92.9% recall, and an execution time of 7.5 sec when testing the network. The cropped text blocks are then classified into their script classes by a network trained through transfer learning. Script identification achieved 88.5% accuracy without a dropout layer and 93.3% accuracy with a dropout layer. Following script identification, line segmentation, word segmentation, and character segmentation are performed using horizontal and vertical projection profiles; these are the preprocessing steps for optical character recognition. The final phase of this work is character recognition, which builds on the preceding text detection and script identification phases. Different numbers of epochs were considered when training the network in order to maximize its ability to recognize characters. The network trained for 200 epochs achieved 0.0076% error during testing. This indicates that increasing the number of epochs in the training options improves character recognition performance and reduces the error.
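The projection-profile segmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binarized text block stored as a list of rows, with 1 for ink and 0 for background. The function names (find_runs, segment_lines, segment_characters) and the zero threshold are illustrative assumptions, not taken from the thesis, and word segmentation (which would additionally look at gap widths) is left out.

```python
def find_runs(profile, threshold=0):
    """Return (start, end) pairs of consecutive entries above threshold.

    The end index is exclusive. Each run corresponds to one text line
    (horizontal profile) or one character (vertical profile).
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a gap closes the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the border
    return runs


def horizontal_profile(image):
    """Sum of ink pixels in each row."""
    return [sum(row) for row in image]


def vertical_profile(image):
    """Sum of ink pixels in each column."""
    return [sum(column) for column in zip(*image)]


def segment_lines(image):
    """Split a text block into lines using the horizontal profile."""
    return find_runs(horizontal_profile(image))


def segment_characters(image, line):
    """Split one line (top, bottom) into characters using the vertical profile."""
    top, bottom = line
    return find_runs(vertical_profile(image[top:bottom]))


# Hypothetical 6x7 binarized block: two text lines separated by a blank row.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(image)
characters = [segment_characters(image, line) for line in lines]
print(lines)
print(characters)
```

On real keyframes a small positive threshold is usually needed instead of zero, because noise and touching characters rarely leave perfectly empty rows or columns.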

Keywords

Faster R-CNN, Deep Learning Neural Network, Optical Character Recognition, AlexNet

URI

http://etd.aau.edu.et/handle/123456789/20927

Collections

Computer Engineering
