The Application of Decision Tree for Part of Speech (POS) Tagging for Amharic

dc.contributor.advisor: Abebe Ermias (Ato)
dc.contributor.author: Kebede Gebeyehu
dc.date.accessioned: 2020-06-03T10:44:03Z
dc.date.accessioned: 2023-11-18T12:45:08Z
dc.date.available: 2020-06-03T10:44:03Z
dc.date.available: 2023-11-18T12:45:08Z
dc.date.issued: 2009-09
dc.description.abstract [en_US]: Automatic understanding of natural languages requires a set of language processing tools. A POS tagger, which assigns the proper part of speech (such as noun, verb, or adjective) to each word in a sentence, is one of these tools. This study investigates the possibility of applying a decision-tree-based POS tagger to Amharic. The tagger was developed using the J48 decision tree classifier algorithm, which is Weka's implementation of the C4.5 algorithm. In the process, a corpus developed by the ELRC annotation team was used to obtain the data required for training and testing the models. The dataset comprises 1065 news documents (210,000 words). A sample of some 800 sentences was selected and used for model development and evaluation. The dataset was processed in line with the requirements of the Weka data mining tool. To support the decision tree classification models, a table containing contextual and orthographic information was constructed semi-automatically and used as the training and testing dataset. The tags of the right and left neighboring words of each word are used as contextual information. Moreover, orthographic information about the word, such as its first and last characters, its prefix and suffix, the presence of a numeric digit within the word, and so on, is included in the table to provide useful information about the word to be tagged. Performance tests were conducted at various stages using the 10-fold cross-validation test option. Experimental results show that only the tags of the two successive left and right words provide useful contextual information; contextual information beyond two words does not provide useful information but rather noise. In the end, an overall correctness (accuracy) of 84.9%, including ambiguous and unknown words, was obtained using the 10-fold cross-validation test option.
Although the accuracy achieved in this study is encouraging, further study to improve the accuracy to a level suitable for implementation is recommended.
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/21414
dc.language.iso [en_US]: en
dc.publisher [en_US]: Addis Ababa University
dc.subject [en_US]: Information Science
dc.title [en_US]: The Application of Decision Tree for Part of Speech (POS) Tagging for Amharic
dc.type [en_US]: Thesis
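
Illustrative note: the abstract describes a table of contextual features (tags of the two left and two right neighboring words) and orthographic features (first and last character, prefix, suffix, digits within the word) that was fed to the decision tree. A minimal Python sketch of how one such row per word might be built is given below. It is not the thesis's actual procedure: the function name build_rows, the column names, the NONE padding tag, the affix length of 2, and the toy tokens and tags are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

PAD = "NONE"  # hypothetical placeholder tag for positions outside the sentence


def build_rows(
    sentence: List[Tuple[str, str]],
    window: int = 2,      # two neighbors on each side, as the abstract reports
    affix_len: int = 2,   # assumed affix length; the abstract does not give one
) -> List[Dict[str, object]]:
    """Build one feature row per (word, tag) pair of a tagged sentence.

    Words are assumed to be non-empty strings.
    """
    tags = [tag for _, tag in sentence]
    rows = []
    for i, (word, tag) in enumerate(sentence):
        # Orthographic features taken from the word itself.
        row: Dict[str, object] = {
            "first_char": word[0],
            "last_char": word[-1],
            "prefix": word[:affix_len],
            "suffix": word[-affix_len:],
            "has_digit": any(ch.isdigit() for ch in word),
        }
        # Contextual features: tags of the neighboring words.
        for k in range(1, window + 1):
            row[f"left_tag_{k}"] = tags[i - k] if i - k >= 0 else PAD
            row[f"right_tag_{k}"] = tags[i + k] if i + k < len(tags) else PAD
        row["tag"] = tag  # class label the classifier would learn to predict
        rows.append(row)
    return rows


# Toy tokens and tags, not taken from the corpus used in the thesis.
example = [("abc", "N"), ("de5", "ADJ"), ("fgh", "V")]
for r in build_rows(example):
    print(r)
```

In this sketch the neighbor tags are read from an already annotated sentence, which suits building training and test tables; the abstract does not state how neighbor tags were obtained for words that had not yet been tagged.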

Files

Original bundle
Name:
Gebeyehu Kebede.pdf
Size:
24.51 MB
Format:
Adobe Portable Document Format