Afaan Oromo Text Summarization Using Word Embedding

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Tashoma, Lamesa
dc.date.accessioned: 2021-03-26T11:20:36Z
dc.date.accessioned: 2023-11-04T12:23:08Z
dc.date.available: 2021-03-26T11:20:36Z
dc.date.available: 2023-11-04T12:23:08Z
dc.date.issued: 11/4/2020
dc.description.abstract: As technology grows, we are increasingly overloaded with information, which makes it hard to identify which information is worth reading. Automatic text summarization has emerged to address this problem: a computer program summarizes a text by removing redundant information from the input and producing a shorter, non-redundant output. This study develops a generic automatic text summarizer for Afaan Oromo text using word embedding. Language-specific resources, such as a stop-word list and a stemmer, are used to develop the summarizer. A graph-based PageRank algorithm selects the summary-worthy sentences from the document, and cosine similarity measures the similarity between sentences. The data used in this work was collected from both primary and secondary sources. The Afaan Oromo stop-word list, suffixes and other language-specific lexicons were gathered from previous work on Afaan Oromo. To develop the Word2Vec model, we gathered Afaan Oromo texts from different sources, such as the Internet, organizations and individuals. For validation and testing, 22 newspaper topics were collected; 13 of them were used for validation and the remaining 9 for testing. The system was evaluated in three experimental scenarios, both subjectively and objectively. The subjective evaluation focuses on qualities of the summary such as informativeness, coherence, referential clarity, non-redundancy and grammar. The objective evaluation uses precision, recall and F-measure. The subjective evaluation yielded 83.33% for informativeness, 78.8% for referential integrity and grammar, and 76.66% for structure and coherence. On the data we gathered, this work achieved 0.527 precision, 0.422 recall and 0.468 F-measure.
However, when evaluated on the same data used in previous works, the summarizer outperformed those works by 0.648 precision, 0.626 recall and 0.058 F-measure. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/25715
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Automatic Text Summarization [en_US]
dc.subject: Word Embedding [en_US]
dc.subject: Sentence Vector [en_US]
dc.subject: PageRank [en_US]
dc.subject: Cosine Similarity [en_US]
dc.title: Afaan Oromo Text Summarization Using Word Embedding [en_US]
dc.type: Thesis [en_US]
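
The abstract names the main components of the summarizer (word embeddings, sentence vectors, cosine similarity, graph-based PageRank) but gives no implementation details. The following is a minimal sketch of how such a pipeline can fit together, not the thesis's actual code. It assumes that a sentence vector is the average of its word vectors (the thesis may compose them differently), that negative similarities are clamped to zero, that tokens have already had stop words removed and been stemmed, and that `embeddings` is a plain dictionary standing in for a trained Word2Vec model. All function names are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding (assumed composition)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a non-negative weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    rank += weights[j][i] / out_sum[j] * scores[j]
                else:
                    # A sentence with no edges spreads its score evenly.
                    rank += scores[j] / n
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, embeddings, dim, k):
    """Return the indices of the k top-ranked sentences, in document order."""
    vectors = [sentence_vector(s, embeddings, dim) for s in sentences]
    n = len(sentences)
    # Similarity graph: no self-loops, negative similarities clamped to zero.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy 2-dimensional embeddings with placeholder tokens (not real Afaan Oromo data).
toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.8, 0.2], "d": [0.0, 1.0]}
toy_sentences = [["a", "b"], ["a", "b", "c"], ["d"]]
selected = summarize(toy_sentences, toy_embeddings, dim=2, k=2)
```

In this toy input the first two sentences are close in embedding space and reinforce each other in the graph, so they are ranked above the third. Returning indices in document order keeps the extracted summary in its original reading order.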

Files

Original bundle
Name: Lamesa Tashoma 2020.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format