Information Extraction Model from Amharic News Texts

Tsedalu, Getasew

dc.contributor.advisor	Atnafu, Solomon (PhD)
dc.contributor.author	Tsedalu, Getasew
dc.date.accessioned	2018-06-20T06:55:39Z
dc.date.accessioned	2023-11-04T12:22:22Z
dc.date.available	2018-06-20T06:55:39Z
dc.date.available	2023-11-04T12:22:22Z
dc.date.issued	2010-11
dc.description.abstract	As the volume of unstructured documents on the web and on intranets keeps growing, tools that can extract relevant data to support decision making are becoming crucial. Information extraction (IE) is concerned with extracting relevant information from text and storing it in a database for easy use and management. As the first comprehensive work on IE from Amharic text, we designed a model that is generic enough to deal with different domains in the Amharic language. The proposed model has four main components: document preprocessing, text categorization, learning and extraction, and post-processing. The document preprocessing component normalizes the document; the text categorization component categorizes the news text; and the learning and extraction component extracts the predefined relevant information from the categorized text. The post-processing component formats the extracted data and saves it to the database. The evaluation techniques commonly used to assess classifier machine learning algorithms are applied to both IE and text categorization. Among the classifier algorithms used for the text categorization component, the Naïve Bayes algorithm performed best, correctly classifying 92.83% of the 1200 news texts used as a dataset. For the information extraction component, 1422 instances were used for training and testing. Different scenarios were used to evaluate the role of the different features in predicting the category of the candidate texts. Across the scenarios considered and the machine learning algorithms employed, the SMO algorithm correctly classified 94.58% of the instances when all the features were used, which also yielded higher precision and recall for the attributes considered for extraction.
Keywords: Amharic Text Information Extraction, Machine Learning Approach to Information Extraction, Amharic Text Categorization, Information Extraction	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/2007
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Amharic Text Information Extraction; Machine Learning Approach to Information Extraction; Amharic Text Categorization; Information Extraction	en_US
dc.title	Information Extraction Model from Amharic News Texts	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: Getasew Tsedalu.pdf
Size: 580.94 KB
Format: Adobe Portable Document Format

Collections

Computer Science