Browsing by Author "Hailemariam, Sebsibe(PhD)"
Now showing 1 - 10 of 10
Item Automatic Fraud Detection Model from Customs Data in Ethiopian Revenues and Customs Authority (Addis Ababa, Ethiopia, 2013-03) Muhammed, Meriem; Hailemariam, Sebsibe(PhD)
Customs, one of the three wings of the Ethiopian Revenues and Customs Authority (ERCA), was established to secure national revenues by controlling imports and exports and by collecting governmental taxes and duties. This research focuses on the identification, modeling and analysis of various conflicting issues that Ethiopian customs faces. One of the major problems identified during problem understanding is the control and management of fraudulent behavior by foreign traders. Declarants engage in various types of fraudulent activity, which calls for serious inspection of declarations; at the same time, the huge number of declarations per day demands significant human resources and time. Recognizing this critical problem, ERCA adopted the Automated System for Customs Data (ASYCUDA). ASYCUDA attempts to minimize the problem by recommending a risk level for each declaration, using a selectivity method based on five parameters from the declarants' information. The fundamental problem with ASYCUDA risk leveling is that it restricts the variables used to assign the risk level, which may direct a declaration into an incorrect channel. This research proposes a machine learning approach that models the fraudulent behavior of importers by identifying appropriate parameters from the observed data, in order to improve the quality of service at Customs, ERCA. The researcher proposes automated fraud detection models that predict the fraud behavior of imported cargo and thereby minimize the problem associated with ASYCUDA risk leveling. The models were built with machine learning techniques using past data collected from the customs records of ERCA. The analysis was done on inspected cargo records with 74,033 instances and 24 attributes. Four prediction models were proposed. The first is a fraud prediction model, which predicts whether incoming cargo is fraudulent or not. The second is a fraud category prediction model, which identifies the specific fraud category among the ten identified categories. The third is a fraud level prediction model, which classifies the fraud level as high or low. The last is a fraud risk level prediction model, which classifies the risk level of imported cargo as high, medium or low. Moreover, following the recommendation of IEEE, four of the best machine learning approaches were tested for each of the prediction models: C4.5, CART, KNN and Naive Bayes. Based on the results of various experimental analyses, C4.5 was found to be the best algorithm for building all four types of prediction model. The accuracies obtained in the first, second, third and fourth scenarios using C4.5 are 93.4%, 84.4%, 89.4% and 86.8% respectively. The next best algorithm, Classification and Regression Tree (CART), achieved accuracies of 92.9%, 80.1%, 89.4% and 85.3% for the first, second, third and fourth scenarios respectively. The researcher observed that both C4.5 and CART perform better for fraud prediction and fraud level classification than for fraud category and risk level prediction. Moreover, the Naive Bayes statistical approach was found to perform very poorly.
Key words: Fraud prediction, fraud category prediction, fraud level prediction, fraud risk level prediction, classification, machine learning algorithm, ASYCUDA.
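The fraud detection abstract above names C4.5 as its best-performing algorithm. C4.5 is usually described as choosing the attribute to split on by gain ratio, so the following is a minimal sketch of that criterion only. The abstract does not say which implementation or settings were used, and the records and attribute names (`channel`, `origin`, `fraud`) are hypothetical, not taken from the ERCA data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Weighted entropy of the target after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy declarations; not real customs records.
rows = [
    {"channel": "red", "origin": "A", "fraud": "yes"},
    {"channel": "red", "origin": "B", "fraud": "yes"},
    {"channel": "green", "origin": "A", "fraud": "no"},
    {"channel": "green", "origin": "B", "fraud": "no"},
]

for attr in ("channel", "origin"):
    print(attr, round(gain_ratio(rows, attr, "fraud"), 3))
```

In this toy set `channel` separates the classes perfectly while `origin` carries no information, so a tree built on gain ratio would split on `channel` first.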
Item Image Analysis for Ethiopian Coffee Classification (Addis Ababa University, 2008-01) Minassie, Habtamu; Hailemariam, Sebsibe(PhD)
Ethiopia is the homeland of coffee. Coffee is a major export commodity of Ethiopia and plays a significant role in earning foreign currency. There are different varieties of coffee in Ethiopia, and they are classified by growing region. In view of this, a digital image analysis technique based on morphological and color features was developed to classify varieties of Ethiopian coffee by growing region. Sample coffees were taken from six coffee growing regions (Bale, Harar, Jimma, Limu, Sidamo and Welega) that are popular and widely planted in Ethiopia. On average, 56 images were taken from each region. In total, 309 images containing 4844 coffee beans were taken. For the classification analysis, ten morphological and six color features were extracted from each coffee bean image. The processing type of the coffee (washed or unwashed) was also predefined during the analysis. Naïve Bayes and neural network classifiers were compared on each feature set: morphology, color, and the combination of the two. To evaluate classification accuracy, 80% of the 4844 data sets were used for training and the remaining 20% for testing. The classification was supervised, with the growing regions as the predefined classes. The neural network classifier performed better than the Naïve Bayes classifier. It was also shown that morphological features discriminate better than color features, and that classification accuracy increased when both morphology and color features were used together. The best classification accuracies (80.7%, 72.6%, 56.8%, 96.77%, 95.42% and 69.9% for Bale, Harar, Jimma, Limu, Sidamo and Welega respectively) were obtained using neural networks with both morphology and color features. The overall classification accuracy was 77.4%.
Keywords: Ethiopian coffee, Coffee Bean, Image Analysis, Classification, Neural Networks
Item Language Modeling for Amharic Automatic Speech Recognition Systems (Addis Ababa University, 2013-03) Mekonnen, Mulugeta; Hailemariam, Sebsibe(PhD)
For automatic speech recognition and other NLP tasks to be effective, a language model (LM) plays a critical role by assigning a probability to a hypothesized word sequence. Much research has been done on acoustic modelling to improve the performance of Amharic speech recognition systems (SRS), with no considerable effort to supplement it with a proper LM. The aim of this research is to build an LM for Amharic, the official language of the federal government of Ethiopia, and to study how it improves the performance of Amharic SRS. Accordingly, a text corpus of 9,079,766 tokens was prepared, and various word n-gram, class-based and interpolated LMs were built using the SRILM tool. Both perplexity and word recognition accuracy (WRA) were used to evaluate the LMs. Although LMs of order 2 to 7 were built, the 4-gram LM turned out to be the best n-gram LM. The relative performance of different smoothing algorithms was also compared, and unmodified Kneser-Ney smoothing outperformed all others. Moreover, interpolated models performed better than back-off models. To tackle the data sparsity problem, class-based LMs were also developed, using IBM clustering algorithms to group words into clusters automatically.
On their own, class-based LMs performed worse than word-based LMs because of their generic nature. However, interpolating class-based with word-based models led to a considerable perplexity reduction over the pure word-based and class-based LMs. The word n-gram, class-based and interpolated LMs were finally integrated, in a lattice rescoring framework, into the baseline speech recognizer, which has a WRA of 74.52%. As a result, WRAs of 80.9%, 66.0% and 82.7% were achieved using word-based n-gram, class-based and interpolated LMs respectively. Overall, an absolute WRA gain of 8.18% (from 74.52% to 82.7%) was achieved by applying the interpolated class-based LMs to the baseline recognizer, which clearly shows that the LM is an indispensable part of the speech recognition task. Class-based language models improved perplexity and WRA only when combined with word-based models. Therefore, using class-based language models as a complement to word-based models is rewarding.
Keywords: Amharic language modeling, Amharic class-based language modeling, Amharic text corpus.
Item Predictive Model for ECX Coffee Contracts (Addis Ababa University, 2014-10) Mulugeta, Frehiwot; Hailemariam, Sebsibe(PhD)
The Ethiopia Commodity Exchange (ECX) is a commodity market that transforms the traditional agricultural marketing system into a modern and transparent market.
Ethiopia is known for its high-quality and highly diversified coffee; ECX has designed detailed coffee contracts, and the market executes many trades for these contracts. This research aims to study the relationship between ECX coffee contracts and to propose a prediction model that helps the market run an efficient coffee trading system. The price prediction model will be used to predict the daily selling price of all coffee contracts. The prediction model was developed with the most widely used machine learning method, the artificial neural network (ANN). Five and a half years of ECX coffee trading data were used to analyze the problem and to train and test the models. The trading data were studied intensively using correlation coefficients and scatter plot matrices, together with the volume of coffee traded in the market and the availability of each contract during the year. It was found that the washed Sidama coffee A grade 3 (WSDA3) contract was traded in larger volume and was available throughout the year. Moreover, the contract is highly correlated with most coffee contracts. WSDA3 was therefore selected as a reference contract to represent all export coffee contracts. Daily price data for coffee contracts show non-linear characteristics. Traditional statistical methods are unable to develop prediction models for non-linear data, whereas artificial neural networks can flexibly model linear or non-linear relationships between variables. Among neural network algorithms, the radial basis function (RBF) network and the multilayer perceptron (MLP) are used to approximate any linear or non-linear function. MLP and RBF methods were employed to develop the coffee contract price prediction models. Three experiments were designed to build the models: one for washed Sidama coffee contracts, one for unwashed Sidama coffee contracts, and one for contracts of non-Sidama origin. Model performance was evaluated on the test data set using the coefficient of determination (R2) and the mean squared error (MSE). The experimental results show that MLP-based models obtained larger R2 values with smaller variance than RBF-based models. Moreover, MLP-based models showed the smallest MSE with small variance compared with models built by the RBF algorithm. The results of the study showed that MLP networks are better able to predict the daily price of coffee contracts than RBF networks, because MLP networks are global function approximators. Among MLP-based models, the largest R2 with the smallest variance was achieved for Sidama washed coffee and different-origin washed coffee contracts. Similarly, the smallest MSE with minimum variance was achieved for Sidama washed coffee and different-origin washed coffee contracts. The accuracy of the MLP models is higher for washed coffee contracts than for unwashed contracts. In general, contracts with the same processing type as the reference contract (WSDA3) have higher accuracy than contracts with a different processing type.
Key words: ANN, Coffee Contract, ECX, MLP, Machine Learning, RBF, Price Prediction
Item Selecting Appropriate Amharic Unit for Domain Specific Speech Synthesis: A Case for Mobile Phones (Addis Ababa University, 2008-10) Petros, Workagegnehu; Hailemariam, Sebsibe(PhD)
Speech synthesis, the production of artificial speech, has many applications. Applying speech synthesis to mobile phones for the Amharic language would be an important success in language technology. Mobile phones are characterized by limited memory and processing capacity. The choice of unit for concatenation affects the quality of the synthetic speech produced, the size of the database that stores the speech units, and the time required to synthesize speech. In this thesis, three Amharic units (phonemes, diphones and syllables) are compared in terms of naturalness, intelligibility, memory requirement and processing time.
The results show that the diphone-based speech synthesis approach is the appropriate alternative, since it requires less memory and time and provides reasonably acceptable naturalness and intelligibility. The overall Mean Opinion Scores obtained for intelligibility and naturalness are 4.10 and 3.69 respectively.
Keywords: Speech Synthesis, Mobile Phones, Diphones, Amharic Language
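The speech synthesis abstract above reports Mean Opinion Scores without describing how they were computed. As a rough sketch only: a Mean Opinion Score is conventionally the arithmetic mean of listener ratings, here assumed to be on a 1 to 5 scale. The ratings below are hypothetical and are not the thesis data, and the function name `mean_opinion_score` is illustrative.

```python
def mean_opinion_score(ratings, low=1, high=5):
    """Arithmetic mean of listener ratings, checked against an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
    return sum(ratings) / len(ratings)


# Hypothetical listener ratings for one synthesized utterance.
intelligibility = [4, 5, 4, 4, 3]
naturalness = [4, 3, 4, 3, 4]

print("intelligibility MOS:", mean_opinion_score(intelligibility))
print("naturalness MOS:", mean_opinion_score(naturalness))
```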