Prosody Based Automatic Speech Segmentation for Amharic

Addis Ababa University


Many speech processing systems require the speech waveform to be segmented into its principal acoustic units. Speech segmentation is the process of identifying the boundaries between paragraphs, sentences, words, syllables, and phonemes in spoken natural language, and it is a fundamental first step in speech technology. Automatic speech segmentation uses algorithms developed for this purpose to segment any of the discrete units that occur in a continuous speech signal. The task is challenging because the cues available for segmenting text are absent from continuous speech. The main goal of this work is to develop a sentence-level automatic speech segmentation system for Amharic. Sentence segmentation is the process of identifying the end of a sentence. In this study, the sentence segmentation system is implemented using two approaches. In the first approach, we used an automatic tool to segment and label Amharic speech data. An acoustic model is created by taking speech recordings and their text transcripts and compiling them into a statistical representation of the sounds that make up words; this is done through HMM modeling. Segmentation in the first approach is performed by forced alignment, and we used a rule-based method and AdaBoost to discriminate true boundaries from false ones. In the second approach, we extracted prosodic features directly from the speech waveform and again used a statistical method, AdaBoost. The evaluation shows that the monosyllable acoustic model yields more accurate forced alignment than the monophone and tied-state tri-syllable models. The AdaBoost classifier also showed consistently good results, especially with the decision tree classifier. In all experiments, read-aloud speech achieved higher accuracy than spontaneous speech, which indicates that spontaneous speech is more difficult to segment because it contains more noise and disfluencies.
The evaluation in phase two indicates that the pause feature is a basic discriminator of Amharic sentence boundaries, and that performance increases when prosodic features are introduced. The scope of this research is limited to sentence-level segmentation; further research is needed on automatic speech segmentation of other discrete units.
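
To illustrate why a pause feature can act as a basic discriminator for sentence boundaries, the following is a minimal sketch in Python. It is not the system described in this work: it swaps forced alignment and AdaBoost for a simple energy threshold over fixed-length frames. The function names, the silence threshold, and the minimum pause length of 30 frames are hypothetical values chosen for illustration only.

```python
from typing import List, Tuple


def find_pauses(energies: List[float], silence_thresh: float = 0.01) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans of consecutive low-energy frames.

    end_frame is exclusive. silence_thresh is an assumed value, not one
    taken from the thesis.
    """
    pauses = []
    start = None
    for i, energy in enumerate(energies):
        if energy < silence_thresh:
            if start is None:
                start = i  # a silent run begins here
        elif start is not None:
            pauses.append((start, i))  # the silent run ended at frame i
            start = None
    if start is not None:
        pauses.append((start, len(energies)))  # signal ends in silence
    return pauses


def candidate_boundaries(pauses: List[Tuple[int, int]], min_pause_frames: int = 30) -> List[int]:
    """Keep only pauses at least min_pause_frames long.

    Each kept pause is reported as its midpoint frame, a candidate
    sentence boundary. min_pause_frames is an assumed threshold.
    """
    return [(s + e) // 2 for s, e in pauses if e - s >= min_pause_frames]


if __name__ == "__main__":
    # Synthetic frame energies: speech, a long pause, speech, a short pause, speech.
    energies = [1.0] * 50 + [0.0] * 40 + [1.0] * 20 + [0.0] * 10 + [1.0] * 30
    pauses = find_pauses(energies)
    print(pauses)
    print(candidate_boundaries(pauses))
```

In this toy signal only the longer pause is kept as a candidate boundary. A pause-only rule of this kind cannot separate sentence-final pauses from hesitations, which is consistent with the observation above that adding prosodic features improves performance.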



Sentence Segmentation, Acoustic Model, Prosody