Automatic Summarization for Amharic Text Using Open Text Summarizer





Addis Ababa University


Information overload is a problem in the current information era, driven by the mass production of information in many formats and amplified by internet technology. Amharic text documents are part of this mass production. Automatic text summarization plays a decisive role in extracting the useful information from a given text document within a short time. Several studies have addressed Amharic text summarization, but more research is needed to match the results achieved for other languages such as English. The objective of this study is therefore to investigate the applicability of the Open Text Summarizer (OTS) to Amharic news text summarization. OTS is an open source, language-independent, single-document text summarization tool. It combines term frequency and sentence position methods to rank the sentences of an article. Forty news articles on different issues were gathered from the EPA, WIC and RANP web pages, from which a corpus of 30 news articles was prepared for the experiments. Some modifications were made to the interface of the tool, which was designed in the C# programming language. The OTS tool was customized in two ways, one for each of the two experiments. The first customization leaves the code of the tool largely unchanged, apart from a few modifications to the punctuation rules, and adds a dictionary file that holds the Amharic language lexicons. The system uses language-specific lexicons, which include lists of affixes, abbreviations, stop words, synonyms, compound words and other rules. The second customization replaces the Porter stemmer of the tool with an Amharic stemmer. Both systems were tested by generating 90 summaries per system, one for each news article at each of the 10%, 20% and 30% extraction rates. The performance of the two systems was evaluated both subjectively and objectively. Subjective evaluation was carried out on 45 summaries extracted in experiment one, and a good result was obtained.
Objective evaluation was carried out on all the summaries generated in both experiments by comparing them with an ideal manual summary using the F-measure. For the first experiment, the highest score is 75.65%, obtained at the 30% extraction rate for medium-sized articles, with a corpus average score of 66.23%; for the second experiment, the highest score is 72.83%, obtained at the 30% extraction rate for large news articles, with a corpus average score of 72.37%. The system with the Amharic stemmer performed better than the other at a given extraction rate regardless of the size of the original article, with better corpus average scores at 20% and 30%. The system also showed consistent improvement in performance as the extraction rate increased.
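
The ranking and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the OTS implementation: the stop-word list, the scoring formula, the `position_weight` parameter and all function names are assumptions made for illustration, and the F-measure is computed here at sentence level, which the abstract does not specify.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would load the Amharic lexicon file.
STOP_WORDS = {"the", "a", "of", "is", "in", "and"}


def split_sentences(text):
    # Split after Latin end punctuation or the Amharic full stop (።).
    parts = re.split(r"(?<=[.!?።])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def score_sentences(sentences, position_weight=0.5):
    # Tokenize each sentence and drop stop words.
    tokens = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    # Term frequency over the whole document.
    tf = Counter(w for sent in tokens for w in sent)
    n = len(sentences)
    scores = []
    for i, sent in enumerate(tokens):
        # Average term frequency of the sentence's words.
        tf_score = sum(tf[w] for w in sent) / len(sent) if sent else 0.0
        # Earlier sentences receive a higher position score (assumed scheme).
        pos_score = (n - i) / n
        scores.append(tf_score + position_weight * pos_score)
    return scores


def summarize(text, rate=0.3):
    # Keep the top-scoring sentences at the given extraction rate,
    # returned in their original document order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = max(1, int(len(sentences) * rate + 0.5))
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def f_measure(system_summary, reference_summary):
    # Harmonic mean of precision and recall over shared sentences.
    sys_set, ref_set = set(system_summary), set(reference_summary)
    overlap = len(sys_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(sys_set)
    recall = overlap / len(ref_set)
    return 2 * precision * recall / (precision + recall)
```

Calling `summarize(article_text, rate=0.1)`, `rate=0.2` and `rate=0.3` on the same article would produce the three extracts per article, and `f_measure` would then compare each extract with a manually prepared reference summary.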


