Topic-Based Tigrigna Text Summarization Using Wordnet
Date
2017-10-02
Publisher
Addis Ababa University
Abstract
Text summarization is the process of distilling the most relevant information from a large document, which accounts for its popularity.
This thesis presents topic-based automatic Tigrigna text summarization using WordNet, which selects sentences on the basis of their semantic similarity and the topics of the document. The topic modeling approach uses the term-by-concept matrix produced by a probabilistic latent semantic analysis model to select the keyword-based features that best summarize the document, while WordNet is used to measure the semantic similarity between the terms that occur in the document, which keeps redundancy to a minimum. The comparison is conducted by calculating the similarity between keywords and synsets to obtain the most relevant sentences from the original document. We experiment with news articles, and the summary always includes the first sentence of the document, since news reports tend to put important information at the beginning of the document rather than at the end.
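The selection step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the keywords are assumed to have already been produced by the topic model, the `synsets` list is a hypothetical toy stand-in for a Tigrigna WordNet, the scoring function (share of tokens that match a keyword or one of its synset members) is an assumption chosen for simplicity, and English placeholder tokens are used in place of Tigrigna text.

```python
import math


def expand(keywords, synsets):
    """Return the keywords plus every word that shares a synset with one of them."""
    expanded = set(keywords)
    for synset in synsets:
        if synset & set(keywords):
            expanded |= synset
    return expanded


def summarize(sentences, keywords, synsets, rate=0.25):
    """Pick about `rate` of the sentences, always keeping the first one."""
    if not sentences:
        return []
    n_keep = max(1, math.ceil(rate * len(sentences)))
    terms = expand(keywords, synsets)

    def score(sentence):
        # Assumed scoring rule: fraction of tokens matching an expanded keyword.
        tokens = sentence.lower().split()
        return sum(t in terms for t in tokens) / len(tokens) if tokens else 0.0

    chosen = {0}  # the first sentence of a news article is always included
    ranked = sorted(range(1, len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    for i in ranked:
        if len(chosen) >= n_keep:
            break
        chosen.add(i)
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical example data (placeholders, not taken from the thesis).
sentences = [
    "the council met on monday",
    "rain fell in the city",
    "the budget funds new schools",
    "people walked home",
    "a dog barked",
    "markets closed early",
    "the road was busy",
    "night came",
]
keywords = {"budget", "school"}
synsets = [{"school", "schools", "academy"}, {"budget", "funds"}]

summary = summarize(sentences, keywords, synsets, rate=0.25)
```

With eight sentences and a 25% extraction rate, two sentences are kept: the first sentence and the one whose tokens best match the expanded keyword set.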
We evaluated the proposed algorithm in terms of precision and recall at an extraction rate of 25% of the original text, and the proposed approach achieves a best precision/recall of 0.5014. In addition, we compared our system with previous summarization methods that have been developed for other languages, and our approach improves summarization quality significantly.
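As a rough illustration of the evaluation measure, the sketch below computes sentence-level precision and recall of an extracted summary against a reference extract. The function name and the sentence indices are hypothetical, and the abstract does not say how the reference summaries were built. One point worth noting: when the system and reference extracts contain the same number of sentences, precision and recall are necessarily equal, which may be why a single precision/recall figure is reported, although that is an assumption here.

```python
def precision_recall(system, reference):
    """Sentence-level precision and recall of a system extract.

    `system` and `reference` are collections of sentence indices.
    """
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical extracts of equal size (4 of 16 sentences, i.e. a 25% rate).
system_extract = {0, 2, 5, 7}
reference_extract = {0, 2, 3, 6}

precision, recall = precision_recall(system_extract, reference_extract)
```

Here two of the four selected sentences also appear in the reference, so both precision and recall come out to 0.5.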
Keywords
Tigrigna Text Summarization, Probabilistic Latent Semantic Analysis, Wordnet, Semantic Similarity, Synsets