Modeling an Automatic Amharic text Summarizer: Abstractive Approach

Date

9/1/2016

Publisher

Addis Ababa University

Abstract

The need for automatic text summarization systems increases as the number of electronic documents dealing with specific information grows on the web. The two basic approaches to text summarization are extractive and abstractive. The extractive approach selects the most important sentences from the input document using different algorithms and presents the selected sentences as the summary. The abstractive approach tries to generate novel sentences that may not be present in the input document but still represent its main idea, and it is based on the semantic representation of the input sentences. This thesis proposes an automatic Amharic text summarizer using an abstractive approach based on the Universal Networking Language (UNL), which is one of the semantic representations of natural language sentences. The system uses several components related to UNL representation. Related sentences in the input document are clustered, and one sentence is generated for each cluster to be used in the summary. Thus, the number of summary sentences is determined by the number of clusters formed from the input document. The text preprocessing stage, which involves normalization, stop-word removal, and stemming, makes the input data suitable for the clustering component by supplying the root forms or stems of the relevant words in each input sentence. The conversion between natural language sentences and UNL expressions is done using EnConversion rules (natural language to UNL) and DeConversion rules (UNL to natural language), together with the morphological properties of each word in an input sentence. A further component, UNL analysis, derives a common UNL expression from a group of UNL expressions. To evaluate the performance of the proposed system, we use Amharic input documents and human evaluators, who rate the output on two parameters: the grammar of the summary sentences and the idea represented in the summary. The results of this subjective evaluation of the summary sentences are promising.
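
The preprocessing and clustering stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the character map, the stop-word list, the suffix-stripping stemmer, the Jaccard similarity measure, and the threshold are all assumptions chosen for illustration, and the toy sentences are English placeholders rather than Amharic text. The UNL EnConversion, analysis, and DeConversion steps are not modeled here.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def preprocess(sentence: str,
               char_map: Dict[str, str],
               stop_words: Set[str],
               stem: Callable[[str], str]) -> FrozenSet[str]:
    """Normalize characters, drop stop words, and stem the remaining tokens."""
    # Normalization: map variant characters to one canonical character.
    normalized = sentence.translate(str.maketrans(char_map))
    tokens = normalized.split()
    return frozenset(stem(t) for t in tokens if t not in stop_words)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of stems two sentences have in common (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(stem_sets: List[FrozenSet[str]],
                      threshold: float) -> List[List[int]]:
    """Greedy single-pass clustering (an assumed technique, not the thesis's).

    A sentence joins the first cluster that contains a sufficiently similar
    member; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for i, stems in enumerate(stem_sets):
        for cluster in clusters:
            if any(jaccard(stems, stem_sets[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical toy resources standing in for real Amharic ones.
CHAR_MAP = {"k": "c"}            # pretend "k" is a spelling variant of "c"
STOP_WORDS = {"the", "a"}


def toy_stem(token: str) -> str:
    """Strip a trailing "s" from longer tokens (placeholder stemmer)."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


sentences = [
    "the summarizer clusters related sentences",
    "related sentences form a kluster",
    "the stemmer strips suffixes",
]
stem_sets = [preprocess(s, CHAR_MAP, STOP_WORDS, toy_stem) for s in sentences]
clusters = cluster_sentences(stem_sets, threshold=0.5)

# One summary sentence would be generated per cluster.
print(clusters)
print("summary sentences:", len(clusters))
```

In this toy run the first two sentences share enough stems to fall into one cluster only after normalization maps "kluster" to "cluster", which is the role the abstract assigns to preprocessing. The number of clusters then fixes the number of summary sentences.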

Keywords

Text Summarization, Universal Networking Language (UNL), Enconversion, Deconversion
