Automatic Text Summarizer for Tigrinya Language

Date

2017-02-07

Publisher

Addis Ababa University

Abstract

With the continuous increase in the number of electronic documents, the need emerges for faster techniques to assess the relevance of documents. An ideal summary conveys the main themes of a document to the reader, who can then determine whether the complete document is relevant. Automatic text summarization is a technique in which a program condenses a longer text into a shorter, non-redundant extract of the original. In this thesis, two generic text summarization methods are proposed that create summaries by ranking and extracting sentences from the original documents. The first method, term frequency, uses word frequency to identify relevant sentences: the most frequent words in the original document are selected, and sentences that contain these words are treated as important for summary generation. The second method, title words, identifies the words in the document's title and extracts the sentences that contain them for inclusion in the summary. For the experiments we used 30 news articles collected from the Aiga Forum and Dmtsi Woyane Tigray web sites. The summarization system was evaluated by comparing its summaries with manual summaries produced by human evaluators. In the experiments, the system registered 0.46 (46%), 0.47 (46%) and 0.46 (46%) for recall, precision and F-score respectively with the term frequency feature. With the title word feature, the recall, precision and F-score values were 0.46 (46%), 0.50 (50%) and 0.48 (48%) respectively, which shows an improvement of the summarizer with this method. Overall, the experimental results show that title word performed better than term frequency in both subjective and objective evaluations.
The main challenge in the study was the lack of a standardized and well-prepared Tigrinya corpus, which is required for conducting conclusive experiments on the proposed system. Preparing such a corpus is a future research direction in this area that would contribute to the improvement of the system.
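
The two sentence-extraction methods and the evaluation measures described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the regular-expression tokenizer, the sentence splitter (which assumes the Ethiopic full stop "።" or Latin punctuation), the parameters `k` and `n`, and the sentence-level overlap used for precision and recall are all assumptions made for illustration. No stop-word removal or stemming is performed, although a real Tigrinya system would likely need both.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: sentences end with the Ethiopic full stop or Latin punctuation.
    parts = re.split(r"[።.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # \w matches Unicode letters, including Ethiopic script, for str patterns.
    return re.findall(r"\w+", sentence.lower())


def top_words(sentences, k):
    # The k most frequent words of the whole document.
    counts = Counter(w for s in sentences for w in tokenize(s))
    return {w for w, _ in counts.most_common(k)}


def rank_by_keywords(sentences, keywords, n):
    # Score each sentence by how many distinct keywords it contains,
    # keep the n best, and return them in original document order.
    scored = [(len(keywords & set(tokenize(s))), i) for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    return [sentences[i] for _, i in sorted(best, key=lambda t: t[1])]


def summarize_tf(text, k=5, n=2):
    # Term frequency method: keywords are the document's top-k frequent words.
    sents = split_sentences(text)
    return rank_by_keywords(sents, top_words(sents, k), n)


def summarize_title(title, text, n=2):
    # Title word method: keywords are the words of the title.
    sents = split_sentences(text)
    return rank_by_keywords(sents, set(tokenize(title)), n)


def f_score(precision, recall):
    # Harmonic mean of precision and recall.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f(system, reference):
    # Assumption: summaries are compared as sets of extracted sentences.
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    p = overlap / len(sys_set) if sys_set else 0.0
    r = overlap / len(ref_set) if ref_set else 0.0
    return p, r, f_score(p, r)


if __name__ == "__main__":
    title = "Rain floods city"
    text = ("Heavy rain hit the city. The market was closed. "
            "Rain floods damaged the city roads. People stayed home.")
    print(summarize_title(title, text, n=2))
    print(summarize_tf(text, k=3, n=2))
    print(round(f_score(0.50, 0.46), 2))
```

On the toy English text, both methods select the first and third sentences. Note that with `k=3` the term frequency keywords include "the", which is why stop-word filtering matters in practice. Applying `f_score` to the title word precision (0.50) and recall (0.46) reported in the abstract gives 0.48 after rounding, which agrees with the reported F-score.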

Keywords

Summarizer for Tigrinya Language