Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/16493
Title: Automatic Text Summarizer for Tigrinya Language
Contributors: Dr. Wondwossen Mulugeta
Amiha, Guesh
Issue Date: Feb-2017
Publisher: Addis Ababa University
Abstract: With the continuous increase in the number of electronic documents, faster techniques are needed to assess the relevance of documents. An ideal summary conveys the main themes of a document, so that the reader can then determine whether the complete document is relevant. Automatic text summarization is a technique in which a program condenses a longer text into a shorter, non-redundant extract of the original. In this thesis, two generic text summarization methods are proposed that create summaries by ranking and extracting sentences from the original documents. The first method, term frequency, uses word frequency to identify relevant sentences: the most frequent words of the original document are identified, and the sentences that contain (intersect with) these top frequent words are treated as important for summary generation. The second method, title words, identifies the title words of the document and extracts the sentences that contain them for inclusion in the summary. For experimentation, 30 news articles collected from the Aiga Forum and Dmtsi Woyane Tigray websites were used. The summarization system was evaluated by comparing its summaries with manual summaries produced by human evaluators. With the term frequency feature, the system achieved a recall, precision and F-score of 0.46 (46%), 0.47 (47%) and 0.46 (46%), respectively. With the title word feature, recall, precision and F-score were 0.46 (46%), 0.50 (50%) and 0.48 (48%), respectively, an improvement over term frequency. Overall, the experimental results show that the title word feature performed better than term frequency in both subjective and objective evaluations.
The main challenge in this study was the lack of a standardized, well-prepared Tigrinya corpus, which is needed to conduct conclusive experiments on the proposed system. Building such a corpus is a direction for future research in this area that would contribute to improving the system.
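The two sentence-ranking methods and the evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the sentence-end characters (Ethiopic full stop "።", Ethiopic question mark "፧", "?" and "!"), the optional stopword set, the summary length `n`, the `top_k` cut-off for "top frequent words", and sentence-overlap scoring against a manual extract are all assumptions. Every function name (`split_sentences`, `term_frequency_summary`, `title_word_summary`, `evaluate`, `f_score`) is hypothetical.

```python
import re
from collections import Counter

# Assumption: Tigrinya sentences end with the Ethiopic full stop "።",
# the Ethiopic question mark "፧", or ASCII "?" / "!".
_SENTENCE = re.compile(r"[^።፧?!]+[።፧?!]?")
# In Python 3 str patterns, \w matches Ethiopic letters (Unicode category Lo).
_WORD = re.compile(r"\w+")


def split_sentences(text):
    """Split a document into sentences, keeping their end punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def words(sentence, stopwords=frozenset()):
    """Lower-cased word tokens with an (assumed) stopword list removed."""
    return [w.lower() for w in _WORD.findall(sentence) if w.lower() not in stopwords]


def _select(sentences, scores, n):
    """Pick the n highest-scoring sentences (score > 0), kept in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(i for i in ranked[:n] if scores[i] > 0)
    return [sentences[i] for i in chosen]


def term_frequency_summary(text, n=3, top_k=10, stopwords=frozenset()):
    """Method 1 (sketch): score each sentence by how many of the document's
    top_k most frequent words it contains."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in words(s, stopwords))
    top = {w for w, _ in counts.most_common(top_k)}
    scores = [len(top & set(words(s, stopwords))) for s in sentences]
    return _select(sentences, scores, n)


def title_word_summary(title, text, n=3, stopwords=frozenset()):
    """Method 2 (sketch): score each sentence by how many title words it contains."""
    title_words = set(words(title, stopwords))
    sentences = split_sentences(text)
    scores = [len(title_words & set(words(s, stopwords))) for s in sentences]
    return _select(sentences, scores, n)


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(system, reference):
    """Sentence-level (recall, precision, F-score) against a manual extract."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    recall = overlap / len(reference) if reference else 0.0
    precision = overlap / len(system) if system else 0.0
    return recall, precision, f_score(precision, recall)
```

As a sanity check on the reported figures, `round(f_score(0.47, 0.46), 2)` gives 0.46 and `round(f_score(0.50, 0.46), 2)` gives 0.48. Note that F-scores averaged over individual articles need not equal the F-score of averaged precision and recall. Without a stopword list, function words tend to dominate the top frequent words, which is why the sketch exposes a `stopwords` parameter.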
Description: A Thesis Submitted to College of Natural Science of Addis Ababa University in Partial Fulfillment of the Requirement for the Degree of Master of Science in Information Science
URI: http://hdl.handle.net/123456789/16493
Appears in Collections: Thesis - Information Science

Files in This Item:
File: Guesh Amiha Birhanu.pdf (880.02 kB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.