Morphological Analysis of Ge’ez Verbs Using Memory Based Learning

Date

11/7/2014

Publisher

Addis Ababa University

Abstract

Ge’ez is the classical language of Ethiopia and is still used as the liturgical language of the Ethiopian Orthodox Tewahedo Church (EOTC). Many ancient works of literature were written in Ge’ez, including religious texts and secular writings. Ethiopia’s ancient philosophy, traditions, history, and knowledge were recorded in Ge’ez. Automatic analysis of these documents requires Ge’ez morphological analysis. A morphological analyzer is one of the most important basic tools in the automatic processing of any human language: it analyzes the naturally occurring word forms in a sentence and identifies the root word and its features. In this study, we used memory-based learning (MBL) to automatically analyze the morphology of Ge’ez verbs. The system has two components: training and analysis. In the training phase, we defined the annotation process for our dataset using a character-based representation of features. The annotated data are then converted into fixed-length instance vectors using a windowing method. Next, the instances are passed to the memory-based learning tool (TiMBL), and the learning model is built. The analysis phase, in turn, builds instances by extracting features from the given text so that they have the same structure as the instances held in memory. The extracted features are then passed to the morpheme identification process, where they are compared with the individual instances in memory, and stems are extracted together with their morpheme functions. Finally, the roots are extracted from the stems. The system was developed in Python, using TiMBL’s IB2 and TRIBL2 algorithms. Its performance was evaluated using 10-fold cross-validation, with both the default and the optimized parameter settings. With optimized parameters, the overall accuracy was 93.24% for IB2 and 92.31% for TRIBL2. The overall precision, recall, and F-score with optimized parameters were 55.6%, 56.3%, and 59.95% for IB2, and 58.8%, 60.3%, and 59.54% for TRIBL2. A learning curve was also drawn; it showed that accuracy on unseen data increases as the amount of training data increases. Overall, the IB2 algorithm gives better results than the TRIBL2 algorithm for Ge’ez verb morphology.
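The windowing and memory lookup steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use TiMBL: it stores every training instance and classifies by a plain nearest-neighbour search with an overlap distance, which is closer in spirit to a basic IB1-style learner than to IB2 or TRIBL2. The padding symbol, the window size of three characters on each side, the transliterated toy words, and the two-symbol label scheme (`S` for a stem character, `X` for a suffix character) are all assumptions made for illustration.

```python
from collections import Counter

PAD = "_"  # hypothetical padding symbol for positions outside the word

# Hypothetical toy training data: (transliterated word, one label per character).
# "S" marks a stem character and "X" a suffix character (illustrative scheme only).
TRAINING = [
    ("qatalku", "SSSSSXX"),
    ("qatalat", "SSSSSXX"),
    ("nabarku", "SSSSSXX"),
]


def make_windows(word, labels=None, left=3, right=3):
    """Return one fixed-length instance per character of `word`.

    Each instance is (features, label): `features` holds the focus character
    with `left` characters of context before it and `right` after it.
    """
    padded = PAD * left + word + PAD * right
    instances = []
    for i in range(len(word)):
        features = tuple(padded[i:i + left + 1 + right])
        label = labels[i] if labels is not None else None
        instances.append((features, label))
    return instances


def build_memory(training):
    """Store every training instance; memory-based learners keep examples as-is."""
    memory = []
    for word, labels in training:
        memory.extend(make_windows(word, labels))
    return memory


def overlap_distance(a, b):
    """Count the positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(features, memory, k=1):
    """Majority label among the k stored instances nearest to `features`."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(features, inst[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def analyze(word, memory, k=1):
    """Label every character of an unseen word using the stored instances."""
    return "".join(classify(f, memory, k) for f, _ in make_windows(word))


if __name__ == "__main__":
    memory = build_memory(TRAINING)
    print(analyze("nabarat", memory))  # prints SSSSSXX
```

In this toy setting the unseen form `nabarat` receives its labels from the nearest stored windows of the three training words. A real system would rely on TiMBL’s feature weighting and instance editing instead of the unweighted overlap distance used here.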

Keywords

Ge’ez Verbs, Ge’ez Morphology, Morphological Analyzer, Memory Based Learning, Character Based Analysis, Cross Validation, Feature Extraction
