Entropy Estimation and Entropy Based Encoding of Written Afaan Oromo for its Efficient Digital Transmission and Storage

Date

2021-02

Journal Title

Journal ISSN

Volume Title

Publisher

Addis Ababa University

Abstract

According to the Ethiopian population census, Oromo Language is estimated to be spoken by 36.4% of the local population. The language is also spoken outside Ethiopia, for instance in a small portion of Kenya, bringing the total number of speakers to an estimated fifty million. In addition to the spoken form, a considerable portion of the language's speakers can understand its written form, known as Qubee. The introduction of Qubee in the mid-nineties opened doors for its utilization in modern-day communication systems. However, from the perspective of information theory and communication channels, both symbol utilization schemes are found to be inefficient, because fixed-length encoding mechanisms such as ASCII-8 for Latin symbols and UTF-16 for Amharic symbols poorly model written natural language. With the expected increasing demand for the language in telecom services in mind, in this thesis we mainly aim at estimating the Oromo Language's entropy. The estimation sets the optimum number of bits per symbol needed to efficiently transmit written Oromo Language over communication systems. To achieve our objective, we have modeled the source, i.e., written Oromo Language, as an Nth-order Markov chain random process. Based on this modeling scheme, we have studied the distribution of symbols in ten pieces of literature written in Oromo Language. The study reveals the language can be transmitted using 4.31 bits/symbol when modeled as a first-order Markov chain source, whereas the zero-crossing entropy of the source was estimated on average at N = 19.5, giving an entropy estimate of 0.85 bits/symbol with a redundancy of 89.36%. Additionally, we have applied two entropy-based compression algorithms, namely Huffman and Arithmetic coding, to test the validity of our estimation.
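The Nth-order estimation described above amounts to computing the conditional entropy of each symbol given its preceding context. A minimal sketch of that computation is below; it is an illustration of the standard n-gram estimator, not the thesis's actual implementation, and the sample string stands in for the ten-book corpus:

```python
from collections import Counter
from math import log2

def conditional_entropy(text, n):
    """Estimate the per-symbol entropy of an nth-order model:
    H = -sum over n-grams of P(context, symbol) * log2 P(symbol | context).
    n = 1 gives the memoryless (unigram) entropy."""
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    contexts = Counter(text[i:i + n - 1] for i in range(len(text) - n + 1))
    total = sum(ngrams.values())
    h = 0.0
    for gram, count in ngrams.items():
        p_joint = count / total                 # P(context, symbol)
        p_cond = count / contexts[gram[:-1]]    # P(symbol | context)
        h -= p_joint * log2(p_cond)
    return h

# Toy corpus (placeholder for the Afaan Oromo literature sample).
sample = "barumsa afaan oromoo " * 50
for n in range(1, 4):
    print(n, conditional_entropy(sample, n))
```

As n grows, the conditional entropy falls toward the source's true entropy rate, which is the trend the thesis tracks out to N = 19.5. Redundancy then follows as 1 - H / log2(alphabet size).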
The Huffman algorithm was able to compress our sample corpora on average by 42.17% to 64.88% for N = 1 to 5. These compression results confirm the results of our Nth-order estimation of the language's entropy by approaching their theoretical limits.
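The "theoretical limit" invoked here is the classical bound that a Huffman code's average length lies within one bit of the source entropy. A minimal sketch of the standard Huffman construction (not the thesis code; the sample string is a placeholder) makes the comparison concrete:

```python
import heapq
from collections import Counter
from math import log2

def huffman_code(text):
    """Build a Huffman code table {symbol: bitstring} from symbol counts."""
    freq = Counter(text)
    # Heap entries: (count, tiebreak, tree); tree is a symbol or (left, right).
    heap = [(c, i, s) for i, (s, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:                    # merge the two rarest subtrees
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tiebreak, (t1, t2)))
        tiebreak += 1
    code = {}
    def walk(tree, prefix):                 # read codewords off the tree
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix
    walk(heap[0][2], "")
    return code

sample = "barumsa afaan oromoo"
code = huffman_code(sample)
freq = Counter(sample)
avg_bits = sum(freq[s] * len(code[s]) for s in freq) / len(sample)
entropy = -sum((c / len(sample)) * log2(c / len(sample)) for c in freq.values())
# Huffman optimality guarantees: entropy <= avg_bits < entropy + 1.
```

Running the same comparison with n-gram "symbols" for N = 1 to 5 is how one checks that the measured compression ratios approach the estimated entropies, as the abstract reports.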

Description

Keywords

Entropy, Encoding, Oromo Language, Language

Citation