Browsing by Author "Assabie, Yaregal (PHD)"
Design and Development of Amharic Grammar Checker (Addis Ababa University, 2013-03) Temesgen, Aynadis; Assabie, Yaregal (PHD)
Most human knowledge is recorded in natural language. The records are kept in computers or on paper to be manipulated and preserved for future use. Natural Language Processing plays an important role in increasing computers' capability to understand natural languages. Designing and implementing computer programs that can understand natural language is the aim of work in the area of Natural Language Processing. Grammatical correctness is crucial for communicating through natural languages. Therefore, natural language processing applications should be able to recognize the grammatical errors in natural language texts. This process is known as grammar checking. This work introduces the design and development of an Amharic grammar checker. Two grammar checking approaches have been used in this research. The first approach is rule-based and is tested on simple sentences. The rules are constructed manually and matched against the patterns of the sentence to be checked. The second approach is statistical and is tested on both simple and complex sentences. In the statistical Amharic grammar checker, n-gram and probabilistic methods are used to check for grammatical errors in an Amharic sentence. The patterns and their corresponding probabilities of occurrence are automatically extracted from the training corpus and stored in a repository. The probability of a sentence is calculated using these patterns and probabilities. Then, the sentence probability and a specified threshold are used to determine the correctness of the sentence. The corpus, for both the training and test sets, is prepared from a manually part-of-speech-tagged text of the language. The evaluation is made in two test cases. The first case is done on simple sentences. In this test case, 92.45% precision and 94.03% recall are obtained for the rule-based Amharic grammar checker.
On the same test case, the statistical (trigram) Amharic grammar checker shows precision and recall of 67.14% and 90.38%, respectively. In the second test case, the statistical Amharic grammar checker is tested on complex sentences, and 63.76% of the errors are detected. The evaluation results show that each approach is capable of detecting multiple errors in a sentence. The false alarms are due to incomplete grammatical rules and the quality of the statistical data. The accuracy of the morphological analyzer also affects the grammar checking results in both approaches.
Keywords: Statistical grammar checker, rule-based grammar checker, n-gram, POS tag sequence

Development of Automatic Maize Quality Assessment System Using Image Processing Techniques (Addis Ababa University, 2015-11) Hailemichael, Daniel; Assabie, Yaregal (PHD)
Maize is an important crop whose circulation in the market has to conform to the rules of quality inspection.
Currently, maize sample quality inspection is performed manually by human experts through visual evaluation, and the constituents are classified into foreign matter and rotten and diseased, healthy, broken, discolored, shriveled, and pest-damaged kernels. However, visual evaluation requires a significant amount of time as well as trained and experienced people. Besides, it is affected by the bias and inconsistencies associated with human nature. Such an approach will not be satisfactory for large-scale inspection and grading unless fully automated. The goal of this research is to develop a system capable of assessing the quality of maize sample constituents using digital image processing techniques and an artificial neural network classifier, based on the standard for maize set by the Ethiopian Standards Agency. A novel segmentation technique is proposed to segment the images and lay the foundation for feature extraction. A total of 24 features (14 color, 8 shape, and 2 size) have been identified to model maize sample constituents. For classification of maize samples, a feedforward artificial neural network classifier with the backpropagation learning algorithm has been designed, with 24 input and 7 output nodes corresponding to the number of features and classes, respectively. The network is trained, and its performance is compared against other classifiers both empirically and based on supporting facts from the literature. For the purpose of training the classifier, a total of 534 kernels and foreign matter items have been collected from the Ethiopian Grain Trade Enterprise. The data is randomly apportioned into training (70%) and testing (30%) sets. The classifier achieved an overall classification accuracy of 97.8%. The success rates for detecting foreign matter and rotten and diseased, healthy, broken, discolored, shriveled, and pest-damaged kernels are 100%, 95.2%, 98.6%, 98.8%, 100%, 98.4%, and 94.8%, respectively.
Keywords: Artificial neural network, Maize quality assessment, Reconstructed image, Merged image, Color image segmentation, Digital image processing, Color structure tensor

A Hybrid Approach to Amharic Base Phrase Chunking and Parsing (Addis Ababa University, 2013-03) Ibrahim, Abeba; Assabie, Yaregal (PHD)
Natural Language Processing (NLP) is concerned with the interaction between computers and human natural languages. The most difficult task in NLP is enabling the computer to learn natural languages. Enabling computers to understand natural language involves assigning words their parts of speech, extracting phrases, extracting meaning, and so on from natural language sentences. Text chunking and sentence parsing are among the tasks of NLP. Text chunking, or shallow parsing, divides a stream of text into groups of syntactically correlated words. It is an intermediate step toward full parsing. Text chunking can also be used as a precursor for many natural language processing tasks, such as information retrieval, named entity extraction, and text summarization. The objective of this research is to extract different types of Amharic phrases by grouping syntactically correlated words, which are found at different levels of the parser, using a Hidden Markov Model (HMM), and to transform the chunker into a parser. Some rules are also used in this study to correct some outputs of the HMM-based chunker.
A bottom-up approach with a transformation algorithm is used to transform the chunker into the parser. The IOB2 chunk specification is used in this study to identify the boundaries of the phrases. Sentences are collected from Amharic grammar books and from news of the Walta Information Center (WIC) for the training and testing datasets. Unlike the data collected from WIC, the data collected from Amharic grammar books are not tagged at all. Thus, these data were analyzed and tagged manually and used as a corpus for chunking. The entire data set was chunk-tagged manually for training and approved by linguistic professionals. Experiments have been conducted using the training and testing datasets, which are prepared using 10-fold cross-validation. The experiments on Amharic sentence chunking showed an average accuracy of 85.31% on the test set before applying the correction rules and 93.75% after applying them. The experiment on Amharic sentence parsing also showed an average accuracy of 93.75%.
Keywords: Amharic Text chunking, Amharic partial parsing, Amharic shallow parsing, Amharic Parsing

Information Extraction from Amharic Language Text: Knowledge-Poor Approach (Addis Ababa University, 2015-06) Worku, Bekele; Assabie, Yaregal (PHD)
Over the last two decades, with accelerated Internet development, a great amount of data has been accumulated and stored on the Web. We are drowning in data at the office and at home, in either printed or electronic form. Finding the relevant information in this mass of data is therefore critical. To this end, information extraction is a technology that creates a structured representation of unstructured texts by extracting relevant entities from them, thereby making data analysis feasible. This work focuses on developing an information extraction system for Amharic language text. The proposed system is developed in the GATE (General Architecture for Text Engineering) text processing environment using a knowledge-poor approach on the infrastructure domain. By knowledge-poor approach, we mean that we use simple rules and gazetteer lists for entity identification. Our proposed Amharic text information extractor consists of three phases, namely preprocessing, extraction, and post-processing. The preprocessing phase handles language-specific issues and prepares the environment for the extraction process. The second phase is the main unit of our model. It performs named entity recognition, coreference resolution, and relation extraction, and extracts the relevant text. The post-processing step annotates the selected data and presents the extracted information in a structured form. Various evaluation techniques were used to evaluate the performance of our proposed model. The usual precision, recall, and F-measure were used to measure the effectiveness of the proposed work. We used 24760 instances for training and testing our model.
Our evaluation was conducted on the named entity recognition component separately and on the overall system as an information extraction component. Accordingly, the system achieves an F-measure of 89.1% on named entity recognition and an overall F-measure of 89.8%.
Keywords: Information Extraction, Amharic Text Information Extraction, Coreference Resolution, Relation Extraction, GATE

Online Handwritten Amharic Word Recognition Using Fisher Discriminant Analysis and Hidden Markov Model (Addis Ababa University, 2014-10) Mamo, Bekalu; Assabie, Yaregal (PHD)
Technological advancement has enabled human beings to use electronic devices for recognizing and processing human languages. Amharic, which is the working language of the Ethiopian government and has its own script, can also be entered into computers with available computer keyboards. The purpose of this research is to develop an online handwritten Amharic word recognition system that allows handheld devices to be used for writing Amharic script. In this thesis, a writer-independent, online Amharic word recognition system is presented along with different tests for character recognition. The underlying principle for word recognition is that a word is composed of characters. Hence, by segmenting a given word into character blocks and using a character recognition engine, the Amharic word for a given input sequence can be predicted. Finally, hypothesis filtering limits the number of words hypothesized. As part of character recognition, three approaches were adopted and tested. The first uses Fisher's Linear Discriminant Analysis (FDA) to discriminate vectors.
The second approach extracts features from a given input sequence using a predefined set of primitives and an HMM. The third approach scans the input sequence horizontally, vertically, and with a hybrid of the two; the scanned points are collected into a vector, and FDA is used for vector classification to discriminate the characters. For training and testing of characters, data from 108 users, with 264 characters from each user, were used. Likewise, data from 34 users, each of whom wrote 200 words, were used for word recognition. For the first approach, character recognition performance decreases as the number of characters increases. With the HMM, the character recognition engine achieved an average of 3.94%. Using the scanning approach, a vector of length 300 was used first and resulted in an average of 40.51% for horizontal scanning, 44.41% for vertical scanning, and 63.11% for the hybrid. However, when the vector size is reduced to 70 to speed up operation, the results drop to 25.66% for horizontal scanning, 18.77% for vertical scanning, and 39.85% for the hybrid approach. Word recognition using the hybrid approach resulted in 37.9% recognition performance.
Keywords: Online Handwriting Recognition, Amharic Handwriting Recognition, Fisher Discriminant Analysis, Hidden Markov Model

Public Transport Route Planner for Addis Ababa (Addis Ababa University, 2010-06) Andarge, Addisu; Assabie, Yaregal (PHD)
Public transport route planners are widely used in many large cities of the world. Route planners are helpful in identifying the best route one should follow in order to travel between two locations. While the public transport network in Addis Ababa is extensive and the majority of the city's population depends on it, there is no route planner system for the city. The desktop application developed in this project helps users identify service routes that enable them to travel from a given origin to a destination in Addis Ababa with fewer transfers. It models transport services offered by taxis and Anbessa city bus at 140 locations in the city. Moreover, users can view the tariff for the two transport services and update it whenever necessary. It also allows users to see the route (as a list of location names) for a given service number of Anbessa city bus.
Keywords: public transport, public transport route planner, public transport of Addis Ababa, public transport route planner for Addis Ababa, transfer matrix
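
The idea of finding a trip with few transfers, mentioned in the route planner abstract above, can be illustrated with a minimal Python sketch. It is not the method used in the project, whose algorithm the abstract does not describe; it is a plain breadth-first search over service routes, assuming each route is served in both directions. The function name `fewest_transfers`, the `ROUTES` table, and all stop and service names are hypothetical.

```python
from collections import deque


def fewest_transfers(routes, origin, destination):
    """Return (transfers, [route names]) for a trip with the fewest transfers.

    routes maps a service name to the list of stops it serves.
    Returns None when no sequence of services connects the two stops.
    """
    if origin == destination:
        return 0, []

    # Index: stop -> names of the routes that serve it.
    routes_at = {}
    for name, stops in routes.items():
        for stop in stops:
            routes_at.setdefault(stop, set()).add(name)

    # Breadth-first search over routes: each level means one more boarding,
    # so the first route that reaches the destination uses the fewest transfers.
    queue = deque()
    seen = set()
    for name in sorted(routes_at.get(origin, ())):
        queue.append((name, [name]))
        seen.add(name)

    while queue:
        name, path = queue.popleft()
        if destination in routes[name]:
            return len(path) - 1, path
        for stop in routes[name]:
            for nxt in sorted(routes_at[stop]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [nxt]))
    return None


# Hypothetical toy network; stop and service names are illustrative only.
ROUTES = {
    "bus-1": ["A", "B", "C"],
    "bus-2": ["C", "D"],
    "taxi-1": ["D", "E"],
}

if __name__ == "__main__":
    print(fewest_transfers(ROUTES, "A", "E"))
```

Searching over routes rather than over individual stops keeps the search depth equal to the number of boardings, which is why the first match is also a fewest-transfer answer. Travel time and tariff are ignored in this sketch.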