Compression of Amharic Text Using Prediction by Partial Match (PPM) Context-Modeling Algorithm

dc.contributor.advisor: Dereje, Hailemariam (PhD)
dc.contributor.author: Yalemsew, Abate
dc.date.accessioned: 2019-10-21T08:16:18Z
dc.date.accessioned: 2023-11-28T14:09:10Z
dc.date.available: 2019-10-21T08:16:18Z
dc.date.available: 2023-11-28T14:09:10Z
dc.date.issued: 2019-05
dc.description.abstract: A recent study on entropy estimation of the Amharic language showed that its 16-bit representation in Unicode Transformation Format (UTF-8) is very high compared to the entropy of the language. The study showed that a minimum of 1.074 bits/symbol and a maximum of 7.981 bits/symbol can be sufficient for transmitting text sources written in Amharic through telecom networks. In digital communication, the source encoding operation produces a compressed representation of an information source so that communication resources such as bandwidth and energy are used efficiently. Practical source encoding approaches in text compression use Statistical Language Models (SLMs) based on Markov processes to model the redundancies exhibited in a language. The Prediction by Partial Match (PPM) context-modeling algorithm is capable of high compression rates and is well suited to multiple-alphabet sources such as textual data. PPM adaptively combines Markov models of different orders to capture dependencies between successive symbols in a text. In this thesis, the PPM algorithm is used to show the advantages gained by context-modeling techniques in Amharic text source encoding and to demonstrate how close practical compression gets to the estimated entropy of the Amharic language. Two versions of the PPM algorithm, namely PPMC and PPMD, were used to model and encode eight source files written in Amharic. It is shown that the optimum order for efficient encoding is order-3 and that an average reduction of 84.2% in file size is achievable. Using both algorithms, an average compression rate of 3.3 bits/symbol is attainable for source encoding and storage applications. Modeling Amharic text sources using context models in general, and PPM in particular, can help to maximize efficiency in communication networks by reducing the average number of bits required for coding text sources. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/19539
dc.language.iso: en_US (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Amharic (en_US)
dc.subject: Entropy (en_US)
dc.subject: Source Encoding (en_US)
dc.subject: Context (en_US)
dc.subject: Modeling (en_US)
dc.subject: Coding (en_US)
dc.subject: PPM (en_US)
dc.title: Compression of Amharic Text Using Prediction by Partial Match (PPM) Context-Modeling Algorithm (en_US)
dc.type: Thesis (en_US)
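
The abstract measures compression in bits/symbol and attributes the gain to Markov context models of increasing order. The sketch below illustrates that idea only: it estimates the empirical order-k conditional entropy of a string, which is a simpler stand-in and not the PPMC or PPMD coder used in the thesis. The function name `conditional_entropy` is hypothetical. Because the statistics are fitted on the same text they are scored against, the estimate is optimistic on short samples; an adaptive coder such as PPM also pays for escape symbols and for contexts it has not yet seen.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(text: str, order: int) -> float:
    """Empirical conditional entropy of `text` in bits/symbol.

    Each symbol is predicted from the `order` symbols that precede it
    (order 0 means no context). Illustrative helper, not thesis code.
    """
    if order < 0:
        raise ValueError("order must be non-negative")

    # Count how often each symbol follows each context of length `order`.
    context_counts = defaultdict(Counter)
    for i in range(order, len(text)):
        context = text[i - order:i]
        context_counts[context][text[i]] += 1

    total = sum(sum(c.values()) for c in context_counts.values())
    if total == 0:
        return 0.0

    # Sum -log2 p(symbol | context) over every predicted symbol.
    bits = 0.0
    for counter in context_counts.values():
        n = sum(counter.values())
        for count in counter.values():
            bits -= count * math.log2(count / n)
    return bits / total


if __name__ == "__main__":
    sample = "abababab"
    for k in range(3):
        print(k, conditional_entropy(sample, k))
```

For the toy string `abababab`, order 0 gives 1.0 bit/symbol (two equally likely symbols), while order 1 gives 0.0 because each symbol is fully determined by its predecessor. Python strings are sequences of Unicode code points, so the same function can be applied directly to Amharic text.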

Files

Original bundle
Name: Yalemsew Abate.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format