The Application of Machine Learning Technique (Naïve Bayes) For Automatic Text Summarization [The Case of Amharic News Texts]
Date
2005-06
Publisher
Addis Ababa University
Abstract
This study presents an approach to automatic summarization of Amharic news
texts by extracting sentences from a given document. The objective of this study is
to investigate the application of a machine learning technique (the naive Bayes
method) to automatic summarization of Amharic news items. The focus is on
how to use the naive Bayes classifier to extract sentences for automatic Amharic
news text summarization, i.e. on how to train the naive Bayes classifier to
classify sentences from Amharic news texts.
First, each sentence is represented by a set of predefined features (attributes),
i.e. the location of the sentence in the document, title words occurring in the sentence,
and cue words occurring in the sentence, which Edmundson (1969) found to be
good indicators for producing an optimum summary of scientific papers. In addition,
thematic words occurring in the sentence are used as a feature. Then the naive Bayes
algorithm is trained to classify sentences as "a summary" or "not a summary" based
on the feature vectors.
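
As an illustration of the feature representation and training step described above, the following is a minimal sketch in Python, not the implementation used in the study. It assumes binary features, a location feature that fires only for the first or last sentence, whitespace tokenization, and Laplace smoothing; the names `extract_features` and `BernoulliNaiveBayes` and the label strings are hypothetical.

```python
import math


def extract_features(sentence, position, total, title_words, cue_words, thematic_words):
    """Map one sentence to four binary features.

    Assumptions (not from the study): whitespace tokenization, and a
    location feature that is true only for the first or last sentence.
    """
    tokens = set(sentence.split())
    return {
        "location": position == 0 or position == total - 1,
        "title": bool(tokens & title_words),
        "cue": bool(tokens & cue_words),
        "thematic": bool(tokens & thematic_words),
    }


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, feature_dicts, labels):
        self.classes = sorted(set(labels))
        self.features = sorted(feature_dicts[0])
        self.log_prior = {}
        self.prob_true = {}
        for c in self.classes:
            rows = [f for f, y in zip(feature_dicts, labels) if y == c]
            # Class prior P(c), estimated from label frequencies.
            self.log_prior[c] = math.log(len(rows) / len(labels))
            # Smoothed estimate of P(feature is true | c).
            self.prob_true[c] = {
                name: (sum(r[name] for r in rows) + 1) / (len(rows) + 2)
                for name in self.features
            }
        return self

    def log_scores(self, feats):
        """Return log P(c) + sum of log P(feature value | c) for each class."""
        scores = {}
        for c in self.classes:
            total = self.log_prior[c]
            for name in self.features:
                p = self.prob_true[c][name]
                total += math.log(p if feats[name] else 1.0 - p)
            scores[c] = total
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)
```

Each sentence becomes one feature dictionary, and the classifier picks the label with the higher posterior score under the naive independence assumption.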
For the purpose of this study, 480 Amharic news articles are used. The
results of the experiments are evaluated using 10-fold cross-validation. The
results show that the location feature gives the best result in the
classification of sentences when individual features are used. Combinations of
feature sets that include the location feature show better
results than those that do not.
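
The 10-fold cross-validation procedure can be sketched as follows. This is a generic illustration under stated assumptions, not the evaluation code of the study: the abstract does not say whether folds are formed over articles or sentences, nor which metric is reported, so the sketch splits a flat list of items and uses accuracy. The function names `k_fold_indices` and `cross_validate` are hypothetical.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(items, labels, train_fn, predict_fn, k=10):
    """Return mean accuracy over k folds (accuracy is an assumed metric)."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(items), k):
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, items[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

With 480 items and `k=10`, each round trains on 432 items and tests on the remaining 48, and every item is used for testing exactly once.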
Based on the feature values estimated by the training program for the
combination of all the features, a prototype summarizer is developed that
extracts sentences to a desired compression level.
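
Extraction to a desired compression level can be sketched as below. This is a hedged illustration rather than the prototype itself: it assumes that the compression level is the fraction of sentences to keep, that each sentence already has a numeric score (for example, a classifier's estimate that it belongs in the summary), and that selected sentences are returned in their original order. The function name `summarize` is hypothetical.

```python
import math


def summarize(sentences, scores, compression=0.3):
    """Keep the highest-scoring sentences, returned in document order.

    `compression` is assumed to mean the fraction of sentences kept.
    """
    if not 0 < compression <= 1:
        raise ValueError("compression must be in the interval (0, 1]")
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    keep = max(1, math.ceil(len(sentences) * compression))
    # Rank sentence positions by score, best first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order so the extract reads naturally.
    chosen = sorted(ranked[:keep])
    return [sentences[i] for i in chosen]
```

For a four-sentence article and `compression=0.5`, the two best-scoring sentences are returned in the order in which they appear in the article.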
Keywords
Information Science