Information Extraction Model from Amharic News Texts

dc.contributor.advisor: Atnafu, Solomon (PhD)
dc.contributor.author: Tsedalu, Getasew
dc.date.accessioned: 2018-06-20T06:55:39Z
dc.date.accessioned: 2023-11-04T12:22:22Z
dc.date.available: 2018-06-20T06:55:39Z
dc.date.available: 2023-11-04T12:22:22Z
dc.date.issued: 2010-11
dc.description.abstract: As the volume of unstructured documents on the web and on intranets keeps growing, a tool that can extract relevant data to facilitate decision making is becoming crucial. Information extraction (IE) is concerned with extracting relevant information from text and storing it in a database for easy use and management. As the first comprehensive work on IE from Amharic text, we designed a model that is generic enough to deal with different domains in the Amharic language. The proposed model has four main components: document preprocessing, text categorization, learning and extraction, and post-processing. The document preprocessing component normalizes the document, the text categorization component categorizes the news text, and the learning and extraction component extracts the predefined relevant information from the categorized text. The post-processing component formats the extracted data and saves it to the database. The evaluation techniques commonly used to assess classifier machine learning algorithms are applied to both IE and text categorization. Among the classifier algorithms used for the text categorization component, the Naïve Bayes algorithm performs best, correctly classifying 92.83% of the 1200 news texts used as a dataset. For the information extraction component, 1422 instances are used for training and testing. Different scenarios are used to evaluate the role of the different features in predicting the category of the candidate texts. Across the scenarios considered and the machine learning algorithms employed, the SMO algorithm correctly classified 94.58% of the instances when all the features are considered, which yields higher precision and recall for the attributes considered for extraction.
Key words: Amharic Text Information Extraction, Machine Learning Approach to Information Extraction, Amharic Text Categorization, Information Extraction [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/2007
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Amharic Text Information Extraction; Machine Learning Approach to Information Extraction; Amharic Text Categorization; Information Extraction [en_US]
dc.title: Information Extraction Model from Amharic News Texts [en_US]
dc.type: Thesis [en_US]
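
The text categorization step named in the abstract can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing over lowercase whitespace tokens. This record does not state the thesis's tokenizer, features, or toolkit, so the class name, the helper functions, and the toy English training texts below are all hypothetical and stand in for real Amharic news data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens. Real Amharic preprocessing
    # (character normalization, stemming, stop words) is not modeled here.
    return text.lower().split()


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of texts whose predicted category matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data, not taken from the thesis dataset.
train_texts = [
    "team wins the football match",
    "coach praises the team after the match",
    "bank raises interest rates",
    "market prices rise as bank lends more",
]
train_labels = ["sport", "sport", "economy", "economy"]

model = NaiveBayesCategorizer().fit(train_texts, train_labels)
```

Calling `model.predict(...)` on a new text returns the most probable category, and `accuracy(...)` gives the fraction of correctly classified texts, the same kind of figure the abstract reports for its classifiers.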

Files

Original bundle
Name: Getasew Tsedalu.pdf
Size: 580.94 KB
Format: Adobe Portable Document Format