Ethiopic and Latin Multilingual Text Detection and Script Identification from Videos and Images

dc.contributor.advisorMenore, Tekeba (Mr.)
dc.contributor.authorAtirsaw, Awoke
dc.date.accessioned2019-05-30T10:01:07Z
dc.date.accessioned2023-11-04T15:14:40Z
dc.date.available2019-05-30T10:01:07Z
dc.date.available2023-11-04T15:14:40Z
dc.date.issued2018-04-10
dc.description.abstractBoth caption and scene texts found in images and video frames contain valuable information. These texts can be used in many applications to answer questions such as what, when, where, and by whom, giving context to the images and video frames. Automatic text detection therefore enhances the user's understanding of the media content. In Ethiopia, most street posts and promotional boards are written in more than one script, such as Latin (English, Afaan Oromo, etc.) and Ethiopic (Amharic, Tigrigna, etc.). In this work, we study Ethiopic and Latin multilingual text detection and script identification from videos and images for both caption and scene texts. After the images and video frames are pre-processed, the maximally stable extremal region (MSER) algorithm is used to extract candidate text regions, and the aspect ratio and the stroke width transform (SWT) algorithm are used to discriminate non-text patterns from text. Texture features are then computed from the extracted regions using local binary patterns (LBP). Finally, a support vector machine (SVM) classifies each region as text or non-text using the computed LBP features. In the next phase of our work, script identification, the detected text regions are binarized using Niblack's algorithm. The Radon transform is applied to the binarized text regions to detect and correct skew. When a text region contains more than one line of text, lines are segmented using the horizontal projection profile, followed by word segmentation using the vertical projection profile. From the resulting text words, texture features are computed again using LBP, and the words are assigned to their respective script classes using an SVM. We used the International Conference on Document Analysis and Recognition (ICDAR) 2003 dataset and also prepared a new multilingual Ethiopic and Latin script image dataset to evaluate our method.
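The projection-profile segmentation step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`segment_runs`, `segment_lines`, `segment_words`), the `min_gap` parameter, and the toy image are all assumptions made for illustration, and the input is assumed to be an already binarized, deskewed region where text pixels are 1 and background pixels are 0.

```python
import numpy as np

def segment_runs(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    active = profile >= min_count
    # Pad with False so every run has a detectable start and end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]

def segment_lines(binary):
    """Split a binary text region into lines via the horizontal projection profile."""
    return segment_runs(binary.sum(axis=1))

def segment_words(line, min_gap=2):
    """Split one text line into words via the vertical projection profile.

    Gaps narrower than min_gap columns (a hypothetical threshold) are treated
    as spacing between characters and merged into the same word.
    """
    merged = []
    for start, end in segment_runs(line.sum(axis=0)):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Toy region: two text lines; the first line holds two words.
binary = np.zeros((7, 12), dtype=int)
binary[1:3, 0:3] = 1    # first character of word 1
binary[1:3, 4:6] = 1    # second character of word 1 (1-column gap)
binary[1:3, 9:11] = 1   # word 2 (3-column gap)
binary[4:6, 2:8] = 1    # second text line

lines = segment_lines(binary)
words = segment_words(binary[lines[0][0]:lines[0][1]])
print(lines)
print(words)
```

In practice the gap threshold would have to be tuned to the font size and to the script, since character spacing differs between Ethiopic and Latin text.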
Our text detection method outperforms the state-of-the-art method by 5% in precision, 10% in recall, and 8% in F-measure on the ICDAR 2003 dataset. The text detection was also evaluated on our own dataset, where it achieved 81% precision and 74% recall, with an F-measure of 77%. The overall system achieves 79.9% accuracy in script identification.en_US
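As a quick consistency check on the reported figures, the F-measure can be recomputed from the stated precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which weighting was used, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed weighting)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported for the new Ethiopic and Latin dataset.
score = f_measure(0.81, 0.74)
print(f"{score:.2f}")  # prints 0.77
```

Under that assumption, the reported 81% precision and 74% recall are consistent with the reported 77% F-measure.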
dc.identifier.urihttp://etd.aau.edu.et/handle/123456789/18344
dc.language.isoen_USen_US
dc.publisherAddis Ababa Universityen_US
dc.subjectMultilingual Text Detectionen_US
dc.subjectMaximally Stable Extremal Regionen_US
dc.subjectStroke Width Transformen_US
dc.subjectSupport Vector Machineen_US
dc.subjectOptical Character Recognitionen_US
dc.titleEthiopic and Latin Multilingual Text Detection and Script Identification from Videos and Imagesen_US
dc.typeThesisen_US

Files

Original bundle
Name:
Atirsaw Awoke.pdf
Size:
2.8 MB
Format:
Adobe Portable Document Format