Unsupervised Machine Learning Approach for Word sense Disambiguation to Amharic Words

Assemu Solomon

dc.contributor.advisor	Abebe Ermias (Ato)
dc.contributor.author	Assemu Solomon
dc.date.accessioned	2018-11-30T11:29:24Z
dc.date.accessioned	2023-11-18T12:44:11Z
dc.date.available	2018-11-30T11:29:24Z
dc.date.available	2023-11-18T12:44:11Z
dc.date.issued	2011-06
dc.description.abstract	Word Sense Disambiguation (WSD) in text remains a difficult problem because the best supervised methods require laborious and costly manual preparation of tagged training data. This work presents a corpus-based approach to word sense disambiguation that requires only information that can be extracted automatically from untagged text. We use unsupervised techniques to address the problem of automatically deciding the correct sense of an ambiguous word based on its surrounding context. The work was motivated by the use of WSD in many crucial applications such as Information Retrieval (IR), Information Extraction (IE), and Machine Translation (MT). For this study, we report experiments on five selected ambiguous Amharic words: አጠና (eTena), መሳል (mesal), መሣሣት (me`sa`sat), መጥራት (metrat), and ቀረጸ (qereSe). An unsupervised machine learning technique was applied to a corpus of Amharic sentences to acquire disambiguation information automatically. A total of 1045 English sense examples for the five ambiguous words were collected from the British National Corpus (BNC). The sense examples were translated into Amharic using the Amharic-English dictionary and preprocessed to make them ready for experimentation. We tested five clustering algorithms (simple k-means; hierarchical agglomerative clustering with single, average, and complete link; and Expectation Maximization) using their existing implementations in the Weka 3.6.4 package. The “Class to cluster” evaluation mode was selected to evaluate these algorithms on the preprocessed dataset. The results were encouraging: the best clustering algorithms came close in accuracy to supervised machine learning approaches on the same dataset using the same features. However, further experiments on other ambiguous words and with different approaches are needed for better natural language understanding of Amharic.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/14757
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Machine Learning	en_US
dc.title	Unsupervised Machine Learning Approach for Word sense Disambiguation to Amharic Words	en_US
dc.type	Thesis	en_US
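
The "Class to cluster" evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified variant in which each cluster is labelled with its majority gold sense; Weka's own class-to-cluster mapping may differ in detail. The function name, the sense labels, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def classes_to_clusters_accuracy(cluster_ids, gold_senses):
    """Label each cluster with its majority gold sense and return the
    fraction of instances whose gold sense matches that label.

    Simplified illustration only; not Weka's exact mapping procedure.
    """
    if len(cluster_ids) != len(gold_senses):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how often each gold sense occurs inside each cluster.
    senses_by_cluster = defaultdict(Counter)
    for cluster, sense in zip(cluster_ids, gold_senses):
        senses_by_cluster[cluster][sense] += 1

    # Instances that agree with their cluster's majority sense count as correct.
    correct = sum(
        counts.most_common(1)[0][1] for counts in senses_by_cluster.values()
    )
    return correct / len(cluster_ids)


# Hypothetical toy data: six contexts of one ambiguous word, two clusters.
clusters = [0, 0, 0, 1, 1, 1]
senses = ["sense_a", "sense_a", "sense_b", "sense_b", "sense_b", "sense_a"]
accuracy = classes_to_clusters_accuracy(clusters, senses)
```

The gold senses are used only after clustering, for scoring, so the clustering step itself stays unsupervised.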

Files

Original bundle

Name: Solomon Assemu.pdf
Size: 829.4 KB
Format: Adobe Portable Document Format

Collections

Information Sciences