Amharic Speech Recognition System Using Joint Transformer and Connectionist Temporal Classification with External Language Model Integration

dc.contributor.advisor: Bisrat Derebssa (PhD)
dc.contributor.author: Alemayehu Yilma
dc.date.accessioned: 2023-12-05T06:54:05Z
dc.date.available: 2023-12-05T06:54:05Z
dc.date.issued: 2023-06
dc.description.abstract: Sequence-to-sequence (S2S) attention-based models are deep neural network models that have produced remarkable results in automatic speech recognition (ASR) research. Among these models, the Transformer architecture has been widely used to solve a variety of S2S transformation problems, such as machine translation and ASR. Unlike recurrent neural networks (RNNs), the Transformer does not rely on sequential computation, which gives it a fast iteration rate during training. However, according to the literature, the Transformer converges more slowly overall than RNN-based ASR models. To accelerate the convergence of the Transformer model, this research proposes a joint Transformer and connectionist temporal classification (CTC) model for an Amharic speech recognition system. The research also investigates which recognition units (characters, subwords, or syllables) are appropriate for Amharic end-to-end speech recognition systems. The accuracy of character- and subword-based end-to-end speech recognition systems is compared for the target language. For the character-based model with a character-level language model (LM), the best character error rate is 8.84%, and for the subword-based model with a subword-level LM, the best word error rate is 24.61%. Furthermore, the syllable-based end-to-end model achieves a 7.05% phoneme error rate and a 13.3% syllable error rate without integrating any LM.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/249
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.title: Amharic Speech Recognition System Using Joint Transformer and Connectionist Temporal Classification with External Language Model Integration
dc.type: Thesis
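
The character, word, syllable, and phoneme error rates quoted in the abstract are conventionally defined as the edit distance between the reference and the recognized token sequence, divided by the reference length. The record does not say which scoring tool the thesis used, so the following is only a minimal sketch of that conventional definition. The function names and the toy Amharic strings are illustrative assumptions, not taken from the thesis, and this sketch counts spaces as characters, which real scoring tools may or may not do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(ref, hyp):
    # Tokens are whitespace-separated words.
    return error_rate(ref.split(), hyp.split())


def char_error_rate(ref, hyp):
    # Tokens are Unicode code points; spaces are counted (an assumption).
    return error_rate(list(ref), list(hyp))


if __name__ == "__main__":
    reference = "ሰላም ለዓለም"   # toy example, not from the thesis data
    hypothesis = "ሰላም ዓለም"
    print("WER:", word_error_rate(reference, hypothesis))
    print("CER:", char_error_rate(reference, hypothesis))
```

The same `error_rate` helper would apply to syllable or phoneme sequences once the text is split into those units. How that segmentation is done for Amharic is not described in this record.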

Files

Original bundle
Name: Alemayehu Yilma.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format