Automatic Amharic Text Summarization Using Latent Semantic Analysis
Date
2009-10
Publisher
Addis Ababa University
Abstract
With the continuous increase in the number of electronic documents, there is a growing need for faster techniques to assess document relevance. An ideal summary conveys the main themes of a document so that the reader can decide whether the complete document is relevant. Automatic text summarization is a technique in which a program takes a text as input and returns a shorter, less redundant extract of it. This thesis proposes two generic text summarization methods that create summaries by ranking and extracting sentences from the original documents. The first method, TopicLSA, employs Latent Semantic Analysis (LSA) to identify the main topics of a document. The identified topics, along with document genre information, are used to select semantically important sentences for the summary. The second method, LSAGraph, combines LSA with graph-based ranking algorithms to compute the relevance of sentences for inclusion in the summary. LSAGraph also uses document genre information to penalize sentences that do not belong to the main topic of the document. To evaluate the proposed methods, a prototype Amharic news text summarization system was built on them. The system's summaries were compared with manual summaries produced by six independent human evaluators. Despite their very different approaches, the two methods achieved comparable performance scores. To gauge their relative success, the evaluation also compared the proposed methods with previous summarization methods based on LSA and graph-based ranking algorithms. The results show that the proposed methods performed significantly better than these previous methods.
Keywords: Summarization, Latent Semantic Analysis, Graph-based ranking algorithms.
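The abstract names two ingredients, LSA and graph-based ranking, without giving formulas. The sketch below is a generic illustration of both ideas, not the thesis's TopicLSA or LSAGraph methods: it omits the genre information entirely, and every function name, the whitespace tokenizer, the raw term-frequency weighting, the vector-length score, and the damping factor of 0.85 are assumptions chosen for illustration. Real Amharic text would need language-specific tokenization and stemming.

```python
import numpy as np

def term_sentence_matrix(sentences):
    """Build a raw term-frequency matrix (terms x sentences).
    Assumption: whitespace tokenization, no stemming or stop-word removal."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0
    return A

def latent_sentence_vectors(A, k):
    """Represent each sentence in a k-dimensional latent space via truncated SVD."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    # Scale each right singular vector by its singular value.
    return (S[:k, None] * Vt[:k, :]).T  # shape: sentences x k

def lsa_scores(vectors):
    """Score a sentence by the length of its latent vector (one common LSA heuristic)."""
    return np.linalg.norm(vectors, axis=1)

def graph_scores(vectors, damping=0.85, iterations=50):
    """PageRank-style scores over a graph weighted by cosine similarity in latent space."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = vectors / norms
    W = np.abs(unit @ unit.T)
    np.fill_diagonal(W, 0.0)  # no self-links
    n = len(W)
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalize; sentences with no neighbours link uniformly to all sentences.
    P = np.where(row_sums > 0, W / safe, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iterations):
        r = (1.0 - damping) / n + damping * (P.T @ r)
    return r

def summarize(sentences, scores, n):
    """Return the n highest-scoring sentences in their original order."""
    top = sorted(np.argsort(scores)[::-1][:n])
    return [sentences[i] for i in top]

if __name__ == "__main__":
    docs = [
        "the government announced a new education policy",
        "the policy targets primary education in rural areas",
        "heavy rain delayed the football match",
        "teachers welcomed the new education policy",
    ]
    vecs = latent_sentence_vectors(term_sentence_matrix(docs), k=2)
    print(summarize(docs, lsa_scores(vecs), 2))
    print(summarize(docs, graph_scores(vecs), 2))
```

Both scorers share the same latent vectors, so they can be compared on identical input; a genre-based penalty of the kind the abstract mentions would be an additional step applied to the scores before selection.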