Computer Engineering
Browsing Computer Engineering by Subject "Accuracy"
Item: Machine Learning Approach for Morphological Analysis of Tigrigna Verbs (Addis Ababa University, 2018-10)
Gebrearegay Kalayu; Getachew Alemu (PhD)
Morphology, in linguistics, is the study of the forms of words; it deals with the internal structure of words and with word formation. Morphological analysis is a basic task in natural language processing, defined as the process of segmenting words into morphemes and analyzing how the words are formed. It is often an initial step for various types of text analysis in any language. The rule-based approach and the machine learning approach are the two basic mechanisms for morphological analysis. The rule-based method is popular but is limited by the effort and time it requires, because a language can have many rules for a single word, especially in the case of verbs. It is also difficult to include every word that needs its own rules, so a rule-based system cannot accommodate words that are missing from its database, which can also reduce its efficiency. In this work, a system for morphological analysis of Tigrigna verbs is designed and implemented using a machine learning approach. It is intended to automatically segment a given input verb into morphemes and give their categories based on prefix-stem-suffix segmentation. By detecting the correct morpheme boundaries, it gives the inflectional categories indicated by the subject and object markers of verbs, which include gender, number and person. The negative, causative and passive prefixes are also considered. Because the language is under-resourced, the data needed for training and testing was collected from scratch and annotated manually. After annotation, an automatic method implemented in Java preprocessed the annotated verbs to produce a list of instances for training and testing.
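As a rough illustration of the instance-based classification evaluated in this work, the following Python sketch shows nearest-neighbour classification under the overlap metric, with and without information-gain feature weighting, and with either majority or inverse-distance voting. It is not the thesis implementation: the feature tuples, the labels, and all function names are hypothetical, and `k` here counts nearest instances, which is a simplification.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(instances, labels, i):
    """Reduction in class entropy from knowing the value of feature i."""
    by_value = defaultdict(list)
    for features, label in zip(instances, labels):
        by_value[features[i]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in by_value.values())
    return entropy(labels) - remainder

def overlap_distance(a, b, weights):
    """Sum of the weights of the features on which a and b disagree."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def classify(query, instances, labels, k=1, use_ig=False, inverse_distance=False):
    """Label a query by its k nearest stored instances.

    use_ig=False gives uniform weights (IB1-style); use_ig=True weights each
    feature by its information gain (IB1-IG-style).
    """
    n = len(query)
    if use_ig:
        weights = [information_gain(instances, labels, i) for i in range(n)]
    else:
        weights = [1.0] * n
    ranked = sorted(
        (overlap_distance(query, inst, weights), label)
        for inst, label in zip(instances, labels)
    )
    votes = Counter()
    for dist, label in ranked[:k]:
        # Majority voting counts each neighbour once; inverse-distance
        # voting lets closer neighbours count for more.
        votes[label] += 1.0 / (dist + 1e-9) if inverse_distance else 1.0
    return votes.most_common(1)[0][0]

# Made-up symbolic features and labels, for illustration only.
instances = [
    ("a", "s", "q", "r"),
    ("a", "s", "t", "u"),
    ("x", "p", "t", "r"),
    ("x", "s", "q", "u"),
]
labels = ["prefix", "prefix", "stem", "stem"]
query = ("a", "p", "t", "r")

print(classify(query, instances, labels, k=1))               # stem
print(classify(query, instances, labels, k=1, use_ig=True))  # prefix
```

In this toy data only the first feature separates the classes, so weighting by information gain changes which stored instance counts as nearest. That says nothing about which variant works better on real data; the abstract reports its own comparison.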
The instance-based algorithm was used with the overlap metric, both with information gain weighting of the features (IB1-IG) and without weighting (IB1). Experiments were performed by varying the number of nearest neighbors from one up to seventeen, at which point the accuracies had almost saturated for both IB1 and IB1-IG. The majority class voting and inverse distance weighted decision methods were also compared. The best performance was obtained with IB1, using either decision method, when the number of nearest neighbors was small. Performance decreased as the number of nearest neighbors increased for both decision methods, with greater variation in the case of majority class voting. Similarly, IB1-IG performed better with a small number of nearest neighbors for both decision methods, and its performance decreased as the number of nearest neighbors increased, with a larger drop in the case of majority voting. IB1 achieved better performance than IB1-IG. The highest accuracies, 91.56% and 89.15%, were achieved with IB1 and IB1-IG respectively, with the number of nearest neighbors set to 1 for IB1 and 2 for IB1-IG. This encouraging result shows that the instance-based algorithm is able to automate the morphological analysis of Tigrigna verbs.

Item: Signal-based Ethiopian Languages Identification using Gaussian Mixture Model (Addis Ababa University, 2017-02)
Wondimu, Mikias; Menor, Mr
Language Identification (LID) refers to the task of identifying an unknown language from test utterances. The core problem in the LID task is to find a way of reducing the complexity of human language so that an automatic algorithm can determine the language from a relatively brief audio sample. A review of the existing approaches to LID shows that very few attempts have been made to build language identification systems for African languages.
The importance of language identification for African languages is increasing dramatically due to the development of telecommunication infrastructure and the resulting growth in the volume of data and speech traffic in public networks. By automatically processing the raw speech data, the vital assistance given to people in distress can be sped up by referring their calls to a person knowledgeable in that language. An LID system for four Ethiopian languages, namely Amharic, Guragegna, Oromiffa and Tigregna, is built using Gaussian mixture models (GMMs). The system is intended to identify which of these four languages a speaker is using, from audio utterances of some phrases of some duration. A dataset consisting of recordings of 7 different speakers for each language was prepared. After the database is preprocessed to mono channel, features are extracted using Mel frequency cepstral coefficients (MFCC) and classification is done using GMMs. To test the performance of the LID system, experimental scenarios were designed and carried out by taking two, three and four languages at a time. The system was tested in both utterance dependent and utterance independent settings: in the utterance (speech) dependent setting the same speech is used for both training and testing, and in the utterance (speech) independent setting the test speech differs from the training utterances. Achieving good LID performance in the utterance independent setting is more challenging with such a small recorded database. In addition, the system was tested in a speaker independent setting. For the four-language task, the utterance dependent LID performance was about 93% accurate and the utterance independent LID performance was about 70% accurate on average. The speaker independent LID performance for the four-language task was about 91%.
Keywords: Language identification, Languages, MFCC, GMM, Accuracy, Utterance
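
The classification step described in the abstract above, scoring an utterance's MFCC frames against one GMM per language and picking the best-scoring language, can be sketched in Python as follows. This is only a sketch under stated assumptions, not the thesis implementation: the models are taken as already trained, diagonal covariances are assumed, MFCC extraction is left out, and every number, dimension, and function name below is hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frame, gmm):
    """Log p(frame | gmm) for a mixture given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify_language(frames, models):
    """Return the language whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        lang: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-dimensional stand-ins for MFCC vectors; all values are made up.
models = {
    "Amharic": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "Tigregna": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]

best, scores = identify_language(frames, models)
print(best)  # Tigregna
```

Averaging the per-frame log-likelihoods keeps scores comparable across utterances of different lengths. A real system would use many more mixture components and higher-dimensional features than this toy example.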