Ethiopic and Latin Multilingual Text Detection and Script Identification from Videos and Images

Atirsaw, Awoke

dc.contributor.advisor	Menore, Tekeba (Mr.)
dc.contributor.author	Atirsaw, Awoke
dc.date.accessioned	2019-05-30T10:01:07Z
dc.date.accessioned	2023-11-04T15:14:40Z
dc.date.available	2019-05-30T10:01:07Z
dc.date.available	2023-11-04T15:14:40Z
dc.date.issued	2018-04-10
dc.description.abstract	Both caption and scene texts found in images and video frames contain valuable information. These texts can be used in many applications to answer questions such as what, when, where, and by whom, giving context to the images and video frames. Automatic text detection therefore enhances the user's understanding of the media content. In Ethiopia, most street posts and promotional boards are written in multiple scripts, such as Latin (English, Afaan Oromo, etc.) and Ethiopic (Amharic, Tigrigna, etc.). In this work, we have studied Ethiopic and Latin multilingual text detection and script identification from videos and images for both caption and scene texts. After the images and video frames are pre-processed, the maximally stable extremal region (MSER) algorithm is used to extract candidate text regions, and aspect ratio and the stroke width transform (SWT) algorithm are used to discriminate non-text patterns from text. Texture features are then computed from the extracted regions using the local binary pattern (LBP). Finally, a support vector machine (SVM) classifies each region as text or non-text using the computed LBP features. In the next phase of our work, script identification, the detected text regions are binarized using Niblack's algorithm. The Radon transform is applied to the binarized text regions to detect and correct skew. When a text region contains more than one line of text, lines are segmented using the horizontal projection profile, followed by word segmentation using the vertical projection profile. From the resulting text words, texture features are again computed using LBP, and the words are assigned to their respective script classes using an SVM. We used the International Conference on Document Analysis and Recognition (ICDAR) 2003 dataset and also prepared a new multilingual Ethiopic and Latin script image dataset to evaluate our method. 
Our text detection method outperforms the state-of-the-art method by 5% in precision, 10% in recall and 8% in f-measure on the ICDAR 2003 dataset. The text detection was also evaluated on our dataset, where 81% precision and 74% recall, with an f-measure of 77%, were obtained. The overall system gives 79.9% script identification accuracy.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/18344
dc.language.iso	en_US	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Multilingual Text Detection	en_US
dc.subject	Maximally Stable Extremal Region	en_US
dc.subject	Stroke Width Transform	en_US
dc.subject	Support Vector Machine	en_US
dc.subject	Optical Character Recognition	en_US
dc.title	Ethiopic and Latin Multilingual Text Detection and Script Identification from Videos and Images	en_US
dc.type	Thesis	en_US
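
The line and word segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (runs_of_ink, segment_lines, segment_words), the min_gap parameter, and the convention that 1 marks a text pixel in an already binarized, deskewed region are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed convention: binarized image, 1 = text (ink) pixel, 0 = background.
Image = List[List[int]]


def runs_of_ink(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero.

    Ranges separated by fewer than `min_gap` empty positions are merged,
    so small gaps between characters do not split a word.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at the first empty position
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Horizontal projection profile: count ink pixels in every row."""
    profile = [sum(row) for row in image]
    return runs_of_ink(profile)


def segment_words(image: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Vertical projection profile restricted to the rows of one text line."""
    top, bottom = line
    width = len(image[0]) if image else 0
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(profile, min_gap)


# Hypothetical 5x8 region: two text lines, the first holding two words.
demo = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]

lines = segment_lines(demo)
words_per_line = [segment_words(demo, line) for line in lines]
```

In practice the gap threshold would have to be tuned to the character spacing of the script and font size; the value used here is only a placeholder.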

Files

Original bundle

Name: Atirsaw Awoke.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format

Collections

Computer Engineering