Prosody Based Authomatic Speech Segmentation for Amharic

dc.contributor.advisor: Teferra, Solomon (PhD)
dc.contributor.author: Mekonen, Rahel
dc.date.accessioned: 2019-05-08T13:10:45Z
dc.date.accessioned: 2023-11-18T12:47:26Z
dc.date.available: 2019-05-08T13:10:45Z
dc.date.available: 2023-11-18T12:47:26Z
dc.date.issued: 2019-02-05
dc.description.abstract: Many speech processing systems require segmentation of the speech waveform into principal acoustic units. Speech segmentation is the process of identifying the boundaries between paragraphs, sentences, words, syllables, and phonemes in spoken natural language. It is a primary step in speech technology. Automatic speech segmentation identifies any of these discrete units in a continuous speech signal using algorithms developed for the purpose. Speech segmentation is challenging because the cues available for segmenting text are absent in continuous speech. The main goal of this work is to develop a sentence-level automatic speech segmentation system for Amharic. Sentence segmentation is the process of identifying the end of a sentence. In this study, the sentence segmentation system is implemented using two approaches. In the first approach, we used an automatic tool for segmenting and labeling Amharic speech data. An acoustic model is created from speech recordings and their text transcripts, which are compiled into a statistical representation of the sounds that make up words. This is done through HMM modeling. Segmentation in the first approach is performed by forced alignment, and rule-based and AdaBoost methods are used to discriminate true boundaries from false ones. In the second approach, we extracted prosodic features directly from the speech waveform and again used a statistical method, AdaBoost. The evaluation shows that the monosyllable acoustic model gives more accurate forced alignment than the monophone and tied-state tri-syllable models. The AdaBoost classifier also showed consistently good results, especially with a decision tree classifier. In all experiments, read-aloud speech achieved higher accuracy than spontaneous speech. This indicates that spontaneous speech is more difficult to segment than read-aloud speech, because it contains more noise and disfluencies.
The evaluation in phase two indicates that the pause feature is a basic discriminator for Amharic sentence boundaries, and that performance increases when prosodic features are introduced. The scope of this research is limited to sentence-level segmentation; further research is needed on automatic segmentation of other discrete units. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/18214
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Sentence Segmentation [en_US]
dc.subject: Acoustic Model [en_US]
dc.subject: Prosody [en_US]
dc.title: Prosody Based Authomatic Speech Segmentation for Amharic [en_US]
dc.type: Thesis [en_US]
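
The abstract names the pause feature as a basic discriminator for Amharic sentence boundaries. The following is a minimal sketch of how pause candidates might be located from short-time energy; it is not the thesis's implementation. The function names, the 10 ms frame size, the energy threshold, and the 200 ms minimum pause are all illustrative assumptions, and the sketch covers only the pause cue, not the forced alignment or AdaBoost stages.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def pause_candidates(samples, sample_rate, frame_ms=10,
                     energy_threshold=1e-4, min_pause_ms=200):
    """Return (start_sec, duration_sec) for each low-energy run that
    lasts at least min_pause_ms. All default values are assumptions."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    min_frames = math.ceil(min_pause_ms / frame_ms)

    pauses = []
    run_start = None
    # A trailing infinite-energy sentinel closes a pause at end of signal.
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < energy_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_len = i - run_start
            if run_len >= min_frames:
                pauses.append((run_start * frame_ms / 1000,
                               run_len * frame_ms / 1000))
            run_start = None
    return pauses
```

Each returned pause is only a candidate boundary. In a setup like the one the abstract describes, pause duration would be one input feature alongside other prosodic features passed to a classifier that separates true sentence boundaries from false ones.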

Files

Original bundle
Name: Rahel Mekonen 2019.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format