The Application of Machine Learning Technique (Naïve Bayes) For Automatic Text Summarization [The Case of Amharic News Texts]

Date

2005-06

Publisher

Addis Ababa University

Abstract

This study presents an approach to automatic summarization of Amharic news texts by extracting sentences from a given document. The objective of the study is to investigate the application of a machine learning technique (the naive Bayes method) to automatic summarization of Amharic news items. The focus is on how to use the naive Bayes classifier to extract sentences, i.e., on how to train naive Bayes to classify sentences from Amharic news texts. First, each sentence is represented by a set of predefined features (attributes): the location of the sentence in the document, title words occurring in the sentence, and cue words occurring in the sentence, which Edmundson (1969) found to be good indicators for producing an optimum summary of scientific papers, and, in addition, thematic words occurring in the sentence. Then the naive Bayes algorithm is trained to classify sentences as "a summary" or "not a summary" based on the feature vectors. For the purpose of this study, 480 Amharic news articles are used. The results of the experiments are evaluated using 10-fold cross-validation. The experiments show that, when features are used individually, the location feature gives the best result in the classification of sentences. Combinations of features that include the location feature also give better results than combinations that exclude it. Based on the feature values estimated during training for the combination of all the features, a prototype summarizer is developed that extracts sentences to a desired compression level.
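The classification and extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how features are encoded, so the sketch assumes each of the four features is a binary flag, uses add-one (Laplace) smoothing, and introduces hypothetical names (`FEATURES`, `train`, `summary_probability`, `summarize`, `compression`) purely for illustration.

```python
import math
from collections import defaultdict

# Assumed binary encoding of the four features named in the abstract.
FEATURES = ("location", "title_words", "cue_words", "thematic_words")


def train(examples, smoothing=1.0):
    """Estimate naive Bayes parameters.

    examples: list of (feature_dict, label) pairs, where feature_dict maps
    each name in FEATURES to True/False and label is True for a summary
    sentence. Add-one smoothing is an assumption of this sketch.
    """
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for features, label in examples:
        class_counts[label] += 1
        for name in FEATURES:
            if features[name]:
                feature_counts[label][name] += 1

    total = sum(class_counts.values())
    model = {"prior": {}, "likelihood": {}}
    for label in (True, False):
        model["prior"][label] = (class_counts[label] + smoothing) / (total + 2 * smoothing)
        model["likelihood"][label] = {
            name: (feature_counts[label][name] + smoothing)
            / (class_counts[label] + 2 * smoothing)
            for name in FEATURES
        }
    return model


def summary_probability(model, features):
    """Posterior probability that a sentence belongs to the summary class."""
    log_scores = {}
    for label in (True, False):
        score = math.log(model["prior"][label])
        for name in FEATURES:
            p = model["likelihood"][label][name]
            # Features are treated as conditionally independent given the class.
            score += math.log(p if features[name] else 1.0 - p)
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


def summarize(model, sentences, feature_rows, compression=0.3):
    """Keep the highest-scoring fraction of sentences, in document order."""
    keep = max(1, round(len(sentences) * compression))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: summary_probability(model, feature_rows[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Ranking by posterior probability and cutting at a fixed fraction is one simple way to reach a desired compression level; the record does not state which selection rule the prototype summarizer uses.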

Keywords

Information Science