Ethiopic and Latin Multilingual Text Detection and Script Identification from Videos and Images
Date
2018-04-10
Authors
Publisher
Addis Ababa University
Abstract
Caption and scene texts found in images and video frames contain
valuable information. These texts can be used in many applications to answer questions
such as what, when, where, and by whom, giving context to the images and video frames.
Automatic text detection therefore enhances the user's understanding of the media content.
In Ethiopia, most street signs and promotional boards are written in multiple
scripts, such as Latin (English, Afaan Oromo, etc.) and Ethiopic (Amharic, Tigrigna,
etc.). In this work, we study Ethiopic and Latin multilingual text detection and
script identification from videos and images, for both caption and scene texts.
After the images and video frames are pre-processed, the maximally stable extremal region
(MSER) algorithm is used to extract candidate text regions, and aspect ratio together with
the stroke width transform (SWT) is used to discriminate non-text patterns from text. Then
texture features are computed from the extracted regions using local binary patterns (LBP).
Finally, a support vector machine (SVM) classifies each region as text or
non-text using the computed LBP features. In the next phase of our work, which is
script identification, the detected text regions are binarized using Niblack's algorithm.
The Radon transform is applied to the binarized text regions to detect and correct skew.
When a text region contains more than one line of text, lines are segmented using the
horizontal projection profile, followed by word segmentation using the vertical
projection profile. From the resulting words, texture features are computed again using
LBP, and the words are assigned to their respective script classes using an SVM.
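
The line and word segmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_ink` and `min_gap` parameters, and the list-of-lists image representation (1 for ink, 0 for background, already binarized and deskewed) are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: rows of 0/1 pixels, 1 = ink


def runs_of_ink(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile has ink."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ends at a blank position
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Horizontal projection profile: count ink pixels in each row."""
    profile = [sum(row) for row in binary]
    return runs_of_ink(profile)


def segment_words(binary: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Vertical projection profile within one line: count ink per column.

    Gaps narrower than min_gap (a hypothetical threshold) are treated as
    spaces between characters, not between words, and are merged.
    """
    top, bottom = line
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom))
               for c in range(width)]
    merged: List[Tuple[int, int]] = []
    for start, end in runs_of_ink(profile):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
for line in segment_lines(page):
    print(line, segment_words(page, line))
```

In practice the gap threshold would likely need tuning per font size, since inter-character gaps in scene text vary widely.
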
We used the International Conference on Document Analysis and Recognition (ICDAR)
2003 dataset and also prepared a new multilingual Ethiopic and Latin script image
dataset to evaluate our method. On the ICDAR 2003 dataset, our text detection method
outperforms the state-of-the-art method by 5% in precision, 10% in recall, and 8% in
F-measure. Text detection was also evaluated on our dataset, where it achieved 81%
precision and 74% recall, with an F-measure of 77%. The overall system achieves
79.9% accuracy in script identification.
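
As a quick consistency check on the figures reported for the new dataset, the sketch below computes the F-measure from precision and recall. It assumes the balanced F-measure (the harmonic mean of the two), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Reported values on the authors' dataset: 81% precision, 74% recall.
print(round(f_measure(0.81, 0.74), 2))  # prints 0.77
```

Under that assumption, the result agrees with the reported 77% F-measure.
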
Keywords
Multilingual Text Detection, Maximally Stable Extremal Region, Stroke Width Transform, Support Vector Machine, Optical Character Recognition