Ethiopic and Latin Multilingual Text Detection and Script Identification from Videos and Images

Date

2018-04-10

Publisher

Addis Ababa University

Abstract

Caption and scene texts found in images and video frames both contain valuable information. These texts can be used in many applications to answer questions such as what, when, where, and by whom, giving context to the images and video frames. Automatic text detection therefore enhances the user's understanding of the media content. In Ethiopia, most street posts and promotional boards are written in multiple scripts, such as Latin (English, Afaan Oromo, etc.) and Ethiopic (Amharic, Tigrigna, etc.). In this work, we study Ethiopic and Latin multilingual text detection and script identification from videos and images for both caption and scene texts. After the images and video frames are pre-processed, the maximally stable extremal region (MSER) algorithm, aspect ratio, and the stroke width transform (SWT) algorithm are used to extract text regions and to discriminate non-text patterns from text. Texture features are then computed from the extracted regions using the local binary pattern (LBP). Finally, a support vector machine (SVM) classifies each region as text or non-text using the computed LBP features. In the next phase of the work, script identification, the detected text regions are binarized using Niblack's algorithm. The Radon transform is applied to the binarized text regions to detect and correct skew. When a text region contains more than one line of text, lines are segmented using the horizontal projection profile, followed by word segmentation using the vertical projection profile. Texture features are computed again from the resulting text words using LBP, and the words are assigned to their respective script classes using an SVM. To evaluate our method, we used the International Conference on Document Analysis and Recognition (ICDAR) 2003 dataset and also prepared a new multilingual Ethiopic and Latin script image dataset.
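The line and word segmentation step can be illustrated with a minimal sketch. It assumes a binarized, deskewed text region stored as a list of rows in which 1 marks a text pixel; the function names and the `min_gap` parameter are hypothetical and are not taken from the thesis.

```python
def runs(profile, min_gap=1):
    """Return (start, end) pairs (end exclusive) of non-zero runs in a profile.

    Runs separated by fewer than `min_gap` zero entries are merged, so small
    gaps between characters do not split a word. `min_gap` is an assumption.
    """
    segments = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, len(profile) - gap))
    return segments


def segment_lines(image):
    """Horizontal projection profile: count text pixels in each row."""
    profile = [sum(row) for row in image]
    return runs(profile)


def segment_words(image, line, min_gap=2):
    """Vertical projection profile within one line: count text pixels per column."""
    top, bottom = line
    width = len(image[0])
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile, min_gap)


# Toy region: two text lines; the first line holds two words.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
]

lines = segment_lines(image)
words = [segment_words(image, line) for line in lines]
print(lines)  # [(1, 3), (4, 5)]
print(words)  # [[(0, 4), (7, 9)], [(1, 5)]]
```

In this toy example the one-pixel gap inside the first word is bridged, while the three-pixel gap separates the two words. A real system would choose the gap threshold from the character spacing of the data.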
Our text detection method outperforms the state-of-the-art method on the ICDAR 2003 dataset, with improvements of 5% in precision, 10% in recall, and 8% in f-measure. The text detection was also evaluated on our own dataset, where it achieved 81% precision and 74% recall, with an f-measure of 77%. The overall system achieves 79.9% accuracy in script identification.
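As a quick consistency check on the figures reported for the new dataset, the sketch below assumes the f-measure is the balanced harmonic mean of precision and recall (F1). The abstract does not state which variant was used, and the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced f-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values on the authors' dataset: 81% precision, 74% recall.
print(round(f_measure(0.81, 0.74) * 100))  # 77
```

Under that assumption, the reported precision and recall reproduce the stated 77% f-measure.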

Keywords

Multilingual Text Detection, Maximally Stable Extremal Region, Stroke Width Transform, Support Vector Machine, Optical Character Recognition
