Afan Oromo news text summarizer

No Thumbnail Available



Journal Title

Journal ISSN

Volume Title


Addis Ababa University


Information overload is a global problem that requires solution. Automatic text summarization is one of the natural language processing technologies that have got researchers focus to help information users. It is a computer program that summarizes a text. A summarizer removes redundant information from the input text and produces a shorter non-redundant output text. In this study, a generic automatic text summarizer for Afan Oromo news text has been developed based upon the Open Text Summarizer (OTS). OTS summarizes texts in English, German, Spanish, Russian, Hebrew, Esperanto and other languages. For this master’s thesis most of the work done is customizing the OTS code so that it can make use of the Afan Oromo lexicons and work for the Afan Oromo language. The summarizer basically uses the combinations of term frequency and sentence position methods with language specific lexicons in order to identify the most important sentence for extractive summary. In this study we have developed three methods for Afan Oromo news text summarization and tested their performance both objectively and subjectively. These three summarizers are: M1 that uses term frequency and position methods without Afan Oromo stemmer and other lexicons (synonyms and abbreviations), M2 is a summarizer with combination of term frequency and position methods with Afan Oromo stemmer and language specific lexicons (synonyms and abbreviations) and M3 is with improved position method and term frequency as well as the stemmer and language specific lexicons (synonyms and abbreviations). The performance of the summarizers was measured based on subjective as well as objective evaluation methods. The result of objective evaluation shows that the three summarizers: M1, M2 and M3 registered f-measure values of 34%, 47% and 81% respectively i.e. M3 outperformed the two summarizers ( M1 and M2 ) by 47% and 34 % . Moreover, the subjective evaluation result shows that the three summarizers’ (M1, M2 and M3) performances with informativeness, linguistic quality and coherence and structure are: (34.37 %, 37%, and 62.5%), (59.37%, 60% and 65%) and (21.87%, 28.12% and 75%) respectively as it is judged by human evaluators. In both subjective and objective evaluation, the results are consistent. Summarizer M3 that uses the combination of term frequency and improved position methods outperform other summarizers followed by M2.



global problem that requires solution.